airflow.contrib.hooks.sqoop_hook¶
This module contains a Sqoop 1.x hook.
Module Contents¶
- 
class airflow.contrib.hooks.sqoop_hook.SqoopHook(conn_id='sqoop_default', verbose=False, num_mappers=None, hcatalog_database=None, hcatalog_table=None, properties=None)[source]¶
- Bases: airflow.hooks.base_hook.BaseHook

  This hook is a wrapper around the sqoop 1 binary. To be able to use the hook it is required that “sqoop” is in the PATH.

  Additional arguments that can be passed via the ‘extra’ JSON field of the sqoop connection (see the instantiation sketch after the parameter list):

  - job_tracker: Job tracker local|jobtracker:port.
  - namenode: Namenode.
  - lib_jars: Comma separated jar files to include in the classpath.
  - files: Comma separated files to be copied to the map reduce cluster.
  - archives: Comma separated archives to be unarchived on the compute machines.
  - password_file: Path to file containing the password.
- Parameters
  - conn_id – Reference to the sqoop connection to use.
  - verbose – Set sqoop to verbose.
  - num_mappers – Number of map tasks to import in parallel.
  - hcatalog_database – HCatalog database name.
  - hcatalog_table – HCatalog table name.
  - properties – Properties to set via the -D argument.
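A minimal instantiation sketch, assuming an Airflow connection named sqoop_example has been defined and that its ‘extra’ JSON carries some of the optional settings listed above; the connection id, extras, and property values are illustrative, not defaults shipped with Airflow.

    from airflow.contrib.hooks.sqoop_hook import SqoopHook

    # Hypothetical connection "sqoop_example"; its 'extra' JSON field could look like:
    #   {"namenode": "hdfs://namenode:8020",
    #    "job_tracker": "jobtracker:8021",
    #    "password_file": "/user/airflow/.sqoop.password"}
    hook = SqoopHook(
        conn_id="sqoop_example",   # illustrative; the default is "sqoop_default"
        verbose=True,
        num_mappers=4,
        properties={"mapreduce.job.queuename": "default"},  # illustrative -D properties
    )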
 - 
Popen(self, cmd, **kwargs)[source]¶
- Remote Popen; a short call sketch follows the return description.

  - Parameters
    - cmd – command to remotely execute
    - kwargs – extra arguments to Popen (see subprocess.Popen)
  - Returns
    - handle to subprocess
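A minimal call sketch, using the hook constructed above and assuming a sqoop binary on the PATH; the command and keyword argument are illustrative.

    # Illustrative: run "sqoop version" through the hook's subprocess wrapper.
    # Extra keyword arguments are forwarded to subprocess.Popen.
    hook.Popen(["sqoop", "version"], cwd="/tmp")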
 
 - 
_import_cmd(self, target_dir, append, file_type, split_by, direct, driver, extra_import_options)[source]¶
 - 
import_table(self, table, target_dir=None, append=False, file_type='text', columns=None, split_by=None, where=None, direct=False, driver=None, extra_import_options=None)[source]¶
- Imports table from remote location to target dir. Arguments are copies of direct sqoop command line arguments; a usage sketch follows the parameter list.

  - Parameters
- table – Table to read 
- target_dir – HDFS destination dir 
- append – Append data to an existing dataset in HDFS 
- file_type – “avro”, “sequence”, “text” or “parquet”. Imports data into the specified format. Defaults to text.
- columns – <col,col,col…> Columns to import from table 
- split_by – Column of the table used to split work units 
- where – WHERE clause to use during import 
- direct – Use the direct connector if one exists for the database
- driver – Manually specify JDBC driver class to use 
- extra_import_options – Extra import options to pass as a dict. If a key doesn’t have a value, just pass an empty string to it. Don’t include the -- prefix for sqoop options.
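A hedged usage sketch for import_table, using the hook constructed above; the table name, columns, HDFS path, and extra options are illustrative and not taken from the Airflow source.

    # Illustrative: import one table into Parquet files on HDFS.
    hook.import_table(
        table="customers",
        target_dir="/user/airflow/imports/customers",
        file_type="parquet",
        columns="id,name,created_at",
        split_by="id",
        where="created_at >= '2019-01-01'",
        extra_import_options={
            "fetch-size": "1000",        # rendered as --fetch-size 1000
            "delete-target-dir": "",     # value-less flag: pass an empty string
        },
    )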
 
 
 - 
import_query(self, query, target_dir, append=False, file_type='text', split_by=None, direct=None, driver=None, extra_import_options=None)[source]¶
- Imports a specific query from the RDBMS to HDFS; a usage sketch follows the parameter list.

  - Parameters
- query – Free format query to run 
- target_dir – HDFS destination dir 
- append – Append data to an existing dataset in HDFS 
- file_type – “avro”, “sequence”, “text” or “parquet”. Imports data to HDFS in the specified format. Defaults to text.
- split_by – Column of the table used to split work units 
- direct – Use direct import fast path 
- driver – Manually specify JDBC driver class to use 
- extra_import_options – Extra import options to pass as a dict. If a key doesn’t have a value, just pass an empty string to it. Don’t include the -- prefix for sqoop options.
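A hedged sketch of a free-form query import with the hook above; the query and target path are illustrative. $CONDITIONS is Sqoop’s own placeholder for the per-mapper split predicate in free-form queries.

    # Illustrative: import the result of a free-form query as Avro files.
    hook.import_query(
        query="SELECT id, total FROM orders WHERE $CONDITIONS",
        target_dir="/user/airflow/imports/orders",
        file_type="avro",
        split_by="id",
        extra_import_options={"compress": ""},  # --compress takes no value
    )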
 
 
 - 
_export_cmd(self, table, export_dir, input_null_string, input_null_non_string, staging_table, clear_staging_table, enclosed_by, escaped_by, input_fields_terminated_by, input_lines_terminated_by, input_optionally_enclosed_by, batch, relaxed_isolation, extra_export_options)[source]¶
 - 
export_table(self, table, export_dir, input_null_string, input_null_non_string, staging_table, clear_staging_table, enclosed_by, escaped_by, input_fields_terminated_by, input_lines_terminated_by, input_optionally_enclosed_by, batch, relaxed_isolation, extra_export_options=None)[source]¶
- Exports Hive table to remote location. Arguments are copies of direct sqoop command line arguments; a usage sketch follows the parameter list.

  - Parameters
- table – Table remote destination 
- export_dir – Hive table to export 
- input_null_string – The string to be interpreted as null for string columns 
- input_null_non_string – The string to be interpreted as null for non-string columns 
- staging_table – The table in which data will be staged before being inserted into the destination table 
- clear_staging_table – Indicate that any data present in the staging table can be deleted 
- enclosed_by – Sets a required field enclosing character 
- escaped_by – Sets the escape character 
- input_fields_terminated_by – Sets the field separator character 
- input_lines_terminated_by – Sets the end-of-line character 
- input_optionally_enclosed_by – Sets a field enclosing character 
- batch – Use batch mode for underlying statement execution 
- relaxed_isolation – Transaction isolation to read uncommitted for the mappers 
- extra_export_options – Extra export options to pass as a dict. If a key doesn’t have a value, just pass an empty string to it. Don’t include the -- prefix for sqoop options.
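A hedged sketch of an export call with the hook above. Every parameter up to relaxed_isolation has no default in the signature, so each must be supplied; the values shown, and the assumption that None or False simply leaves an option out, are illustrative.

    # Illustrative: export an HDFS directory back into a database table.
    hook.export_table(
        table="customers",                              # remote destination table
        export_dir="/user/airflow/exports/customers",   # data to export
        input_null_string="\\N",                        # null marker for string columns
        input_null_non_string="\\N",                    # null marker for other columns
        staging_table="customers_staging",
        clear_staging_table=True,
        enclosed_by=None,                               # assumed: None omits the option
        escaped_by=None,
        input_fields_terminated_by=",",
        input_lines_terminated_by="\n",
        input_optionally_enclosed_by='"',
        batch=True,
        relaxed_isolation=False,
        extra_export_options={"update-key": "id"},      # rendered as --update-key id
    )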