airflow.providers.apache.sqoop.operators.sqoop

This module contains a Sqoop 1 operator.

Module Contents

Classes

SqoopOperator | Execute a Sqoop job.
- class airflow.providers.apache.sqoop.operators.sqoop.SqoopOperator(*, conn_id='sqoop_default', cmd_type='import', table=None, query=None, target_dir=None, append=False, file_type='text', columns=None, num_mappers=None, split_by=None, where=None, export_dir=None, input_null_string=None, input_null_non_string=None, staging_table=None, clear_staging_table=False, enclosed_by=None, escaped_by=None, input_fields_terminated_by=None, input_lines_terminated_by=None, input_optionally_enclosed_by=None, batch=False, direct=False, driver=None, verbose=False, relaxed_isolation=False, properties=None, hcatalog_database=None, hcatalog_table=None, create_hcatalog_table=False, extra_import_options=None, extra_export_options=None, schema=None, **kwargs)
Bases: airflow.models.BaseOperator
Execute a Sqoop job. Documentation for Apache Sqoop can be found here: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html
- Parameters
conn_id (str) -- Reference to the Sqoop connection to use (default: "sqoop_default")
cmd_type (str) -- The Sqoop command to execute, either "export" or "import" (default: "import")
schema (Optional[str]) -- Schema name
table (Optional[str]) -- Table to read
query (Optional[str]) -- Import result of arbitrary SQL query. Instead of using the table, columns and where arguments, you can specify a SQL statement with the query argument. Must also specify a destination directory with target_dir.
target_dir (Optional[str]) -- HDFS destination directory where the data from the RDBMS will be written
append (bool) -- Append data to an existing dataset in HDFS
file_type (str) -- Imports data in the specified format: "avro", "sequence", or "text". Defaults to text.
columns (Optional[str]) -- <col,col,col> Columns to import from table
num_mappers (Optional[int]) -- Use n mapper tasks to import/export in parallel
split_by (Optional[str]) -- Column of the table used to split work units
where (Optional[str]) -- WHERE clause to use during import
export_dir (Optional[str]) -- HDFS Hive database directory whose contents will be exported to the RDBMS
input_null_string (Optional[str]) -- The string to be interpreted as null for string columns
input_null_non_string (Optional[str]) -- The string to be interpreted as null for non-string columns
staging_table (Optional[str]) -- The table in which data will be staged before being inserted into the destination table
clear_staging_table (bool) -- Indicate that any data present in the staging table can be deleted
enclosed_by (Optional[str]) -- Sets a required field enclosing character
escaped_by (Optional[str]) -- Sets the escape character
input_fields_terminated_by (Optional[str]) -- Sets the input field separator
input_lines_terminated_by (Optional[str]) -- Sets the input end-of-line character
input_optionally_enclosed_by (Optional[str]) -- Sets a field enclosing character
batch (bool) -- Use batch mode for underlying statement execution
direct (bool) -- Use direct export fast path
driver (Optional[Any]) -- Manually specify JDBC driver class to use
verbose (bool) -- Switch to more verbose logging for debug purposes
relaxed_isolation (bool) -- Use the read-uncommitted transaction isolation level
hcatalog_database (Optional[str]) -- Specifies the database name for the HCatalog table
hcatalog_table (Optional[str]) -- The argument value for this option is the HCatalog table
create_hcatalog_table (bool) -- Whether to have Sqoop create the given HCatalog table
properties (Optional[Dict[str, Any]]) -- Additional JVM properties passed to Sqoop
extra_import_options (Optional[Dict[str, Any]]) -- Extra import options to pass as a dict. If a key does not take a value, pass an empty string for it. Do not include the -- prefix on Sqoop option names.
extra_export_options (Optional[Dict[str, Any]]) -- Extra export options to pass as a dict. If a key does not take a value, pass an empty string for it. Do not include the -- prefix on Sqoop option names.
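To make the option conventions above concrete, the sketch below shows how operator-style parameters map onto a `sqoop import` command line, including how an `extra_import_options` dict becomes `--key value` pairs (or a bare `--key` flag for an empty-string value). This is an illustration only, not the operator's actual implementation; the helper name `build_sqoop_import_cmd` is hypothetical.

```python
def build_sqoop_import_cmd(table, target_dir, num_mappers=None,
                           split_by=None, file_type="text",
                           extra_import_options=None):
    """Build an illustrative `sqoop import` argument list."""
    cmd = ["sqoop", "import", "--table", table, "--target-dir", target_dir]
    if num_mappers is not None:
        cmd += ["--num-mappers", str(num_mappers)]
    if split_by:
        cmd += ["--split-by", split_by]
    # file_type selects the on-disk format, as documented above.
    if file_type == "avro":
        cmd.append("--as-avrodatafile")
    elif file_type == "sequence":
        cmd.append("--as-sequencefile")
    else:
        cmd.append("--as-textfile")
    # Keys are given without the -- prefix; an empty-string value
    # yields a bare flag, per the extra_import_options convention.
    for key, value in (extra_import_options or {}).items():
        cmd.append(f"--{key}")
        if value:
            cmd.append(str(value))
    return cmd


cmd = build_sqoop_import_cmd(
    "orders", "/data/orders", num_mappers=4, split_by="order_id",
    extra_import_options={"compress": "", "fetch-size": "1000"},
)
print(" ".join(cmd))
# → sqoop import --table orders --target-dir /data/orders --num-mappers 4
#   --split-by order_id --as-textfile --compress --fetch-size 1000
```

In the real operator these choices are expressed as constructor arguments (`cmd_type='import'`, `table=...`, `extra_import_options={...}`), and the hook assembles and runs the equivalent command against the configured connection.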