airflow.providers.apache.sqoop.hooks.sqoop¶
This module contains a Sqoop 1.x hook.
Module Contents¶
Classes¶
SqoopHook | This hook is a wrapper around the sqoop 1 binary. To be able to use the hook it is required that "sqoop" is in the PATH.
- class airflow.providers.apache.sqoop.hooks.sqoop.SqoopHook(conn_id=default_conn_name, verbose=False, num_mappers=None, hcatalog_database=None, hcatalog_table=None, properties=None)[source]¶
Bases: airflow.hooks.base.BaseHook
This hook is a wrapper around the sqoop 1 binary. To be able to use the hook it is required that "sqoop" is in the PATH.
Additional arguments that can be passed via the 'extra' JSON field of the sqoop connection:
job_tracker: Job tracker local|jobtracker:port.
namenode: Namenode.
lib_jars: Comma separated jar files to include in the classpath.
files: Comma separated files to be copied to the map reduce cluster.
archives: Comma separated archives to be unarchived on the compute machines.
password_file: Path to file containing the password.
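For illustration only, a minimal sketch of a Sqoop connection whose 'extra' JSON carries the fields above; the connection id, host, and paths are assumptions, not part of this API:

```python
import json

from airflow.models import Connection

# Hypothetical connection: only the "extra" keys mirror the list above;
# the id, host, and paths are illustrative.
sqoop_conn = Connection(
    conn_id="sqoop_default",
    conn_type="sqoop",
    host="jdbc:mysql://db.example.com/sales",
    extra=json.dumps(
        {
            "job_tracker": "local",
            "namenode": "hdfs://namenode:8020",
            "lib_jars": "/opt/jars/extra-serde.jar",
            "files": "/opt/conf/site.xml",
            "archives": "/opt/archives/deps.zip",
            "password_file": "/user/etl/.sqoop-password",
        }
    ),
)
```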
- Parameters
conn_id (str) -- Reference to the sqoop connection.
verbose (bool) -- Set sqoop to verbose.
num_mappers (Optional[int]) -- Number of map tasks to import in parallel.
hcatalog_database (Optional[str]) -- HCatalog database name.
hcatalog_table (Optional[str]) -- HCatalog table name.
properties (Optional[Dict[str, Any]]) -- Properties to set via the -D argument.
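A minimal instantiation sketch using the constructor arguments above; the connection id is assumed to exist and the values are illustrative:

```python
from airflow.providers.apache.sqoop.hooks.sqoop import SqoopHook

# Minimal sketch: properties are forwarded to Sqoop via -D;
# "mapreduce.job.queuename" is an example Hadoop property.
hook = SqoopHook(
    conn_id="sqoop_default",
    verbose=True,
    num_mappers=4,
    properties={"mapreduce.job.queuename": "etl"},
)
```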
- popen(self, cmd, **kwargs)[source]¶
Remote Popen.
- Parameters
cmd (List[str]) -- command to remotely execute
kwargs (Any) -- extra arguments to Popen (see subprocess.Popen)
- Returns
None; the command's output is streamed to the log, and a non-zero exit raises an error
- Return type
None
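As a sketch, popen runs an arbitrary Sqoop command line; "sqoop version" here is only an illustrative command, and the connection id is an assumption:

```python
from airflow.providers.apache.sqoop.hooks.sqoop import SqoopHook

# Output is streamed to the log; a non-zero exit raises an error.
hook = SqoopHook(conn_id="sqoop_default")
hook.popen(["sqoop", "version"])
```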
- import_table(self, table, target_dir=None, append=False, file_type='text', columns=None, split_by=None, where=None, direct=False, driver=None, extra_import_options=None, schema=None)[source]¶
Imports a table from a remote location to a target dir. Arguments are copies of direct Sqoop command-line arguments.
- Parameters
table (str) -- Table to read
schema (Optional[str]) -- Schema name
target_dir (Optional[str]) -- HDFS destination dir
append (bool) -- Append data to an existing dataset in HDFS
file_type (str) -- "avro", "sequence", "text" or "parquet". Imports data into the specified format. Defaults to text.
columns (Optional[str]) -- <col,col,col…> Columns to import from table
split_by (Optional[str]) -- Column of the table used to split work units
where (Optional[str]) -- WHERE clause to use during import
direct (bool) -- Use direct connector if exists for the database
driver (Any) -- Manually specify JDBC driver class to use
extra_import_options (Optional[Dict[str, Any]]) -- Extra import options to pass as a dict. If a key doesn't have a value, just pass an empty string to it. Don't include the -- prefix for Sqoop options.
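A hedged import sketch; the connection id, table, and paths are assumptions, and the extra options show the empty-string convention with the real Sqoop flags --compress and --fetch-size:

```python
from airflow.providers.apache.sqoop.hooks.sqoop import SqoopHook

hook = SqoopHook(conn_id="sqoop_default", num_mappers=4)
hook.import_table(
    table="orders",                   # hypothetical source table
    target_dir="/user/etl/orders",    # hypothetical HDFS path
    file_type="parquet",
    split_by="order_id",
    where="created_at >= '2021-01-01'",
    extra_import_options={
        "compress": "",               # flag-style option: empty value
        "fetch-size": "10000",        # note: no leading --
    },
)
```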
- import_query(self, query, target_dir=None, append=False, file_type='text', split_by=None, direct=None, driver=None, extra_import_options=None)[source]¶
Imports a specific query from the RDBMS to HDFS.
- Parameters
query (str) -- Free format query to run
target_dir (Optional[str]) -- HDFS destination dir
append (bool) -- Append data to an existing dataset in HDFS
file_type (str) -- "avro", "sequence", "text" or "parquet". Imports data into HDFS in the specified format. Defaults to text.
split_by (Optional[str]) -- Column of the table used to split work units
direct (Optional[bool]) -- Use direct import fast path
driver (Optional[Any]) -- Manually specify JDBC driver class to use
extra_import_options (Optional[Dict[str, Any]]) -- Extra import options to pass as a dict. If a key doesn't have a value, just pass an empty string to it. Don't include the -- prefix for Sqoop options.
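A free-form query sketch with assumed names; note that Sqoop itself requires the literal $CONDITIONS token in the WHERE clause of a free-form import so it can partition work across mappers:

```python
from airflow.providers.apache.sqoop.hooks.sqoop import SqoopHook

hook = SqoopHook(conn_id="sqoop_default", num_mappers=2)
hook.import_query(
    # $CONDITIONS is required by Sqoop for free-form query imports.
    query="SELECT id, amount FROM orders WHERE $CONDITIONS",
    target_dir="/user/etl/orders_by_query",
    split_by="id",
)
```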
- export_table(self, table, export_dir=None, input_null_string=None, input_null_non_string=None, staging_table=None, clear_staging_table=False, enclosed_by=None, escaped_by=None, input_fields_terminated_by=None, input_lines_terminated_by=None, input_optionally_enclosed_by=None, batch=False, relaxed_isolation=False, extra_export_options=None, schema=None)[source]¶
Exports a Hive table to a remote location. Arguments are copies of direct Sqoop command-line arguments.
- Parameters
table (str) -- Table remote destination
schema (Optional[str]) -- Schema name
export_dir (Optional[str]) -- HDFS directory containing the data to export
input_null_string (Optional[str]) -- The string to be interpreted as null for string columns
input_null_non_string (Optional[str]) -- The string to be interpreted as null for non-string columns
staging_table (Optional[str]) -- The table in which data will be staged before being inserted into the destination table
clear_staging_table (bool) -- Indicate that any data present in the staging table can be deleted
enclosed_by (Optional[str]) -- Sets a required field enclosing character
escaped_by (Optional[str]) -- Sets the escape character
input_fields_terminated_by (Optional[str]) -- Sets the field separator character
input_lines_terminated_by (Optional[str]) -- Sets the end-of-line character
input_optionally_enclosed_by (Optional[str]) -- Sets a field enclosing character
batch (bool) -- Use batch mode for underlying statement execution
relaxed_isolation (bool) -- Transaction isolation to read uncommitted for the mappers
extra_export_options (Optional[Dict[str, Any]]) -- Extra export options to pass as a dict. If a key doesn't have a value, just pass an empty string to it. Don't include the -- prefix for Sqoop options.
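A hedged export sketch; the table, paths, and Hive-style \N null markers are assumptions, and the extra options use the real Sqoop export flags --update-key and --update-mode:

```python
from airflow.providers.apache.sqoop.hooks.sqoop import SqoopHook

hook = SqoopHook(conn_id="sqoop_default")
hook.export_table(
    table="orders_report",              # hypothetical destination table
    export_dir="/user/etl/orders_out",  # HDFS directory holding the rows
    input_null_string="\\N",            # Hive's default null marker
    input_null_non_string="\\N",
    input_fields_terminated_by=",",
    batch=True,
    extra_export_options={
        "update-key": "id",
        "update-mode": "allowinsert",   # upsert semantics; no leading --
    },
)
```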