airflow.providers.apache.sqoop.hooks.sqoop¶
This module contains a Sqoop 1.x hook.
Module Contents¶
class airflow.providers.apache.sqoop.hooks.sqoop.SqoopHook(conn_id: str = default_conn_name, verbose: bool = False, num_mappers: Optional[int] = None, hcatalog_database: Optional[str] = None, hcatalog_table: Optional[str] = None, properties: Optional[Dict[str, Any]] = None)[source]¶

Bases: airflow.hooks.base.BaseHook

This hook is a wrapper around the sqoop 1 binary. To be able to use the hook, it is required that "sqoop" is in the PATH.
Additional arguments that can be passed via the 'extra' JSON field of the sqoop connection:
job_tracker: Job tracker local|jobtracker:port.
namenode: Namenode.
lib_jars: Comma separated jar files to include in the classpath.
files: Comma separated files to be copied to the map reduce cluster.
archives: Comma separated archives to be unarchived on the compute machines.
password_file: Path to file containing the password.
- Parameters
conn_id -- Reference to the sqoop connection
verbose -- Set sqoop to verbose
num_mappers -- Number of map tasks to import in parallel
hcatalog_database -- Specifies the database name for the HCatalog table
hcatalog_table -- The argument value for this option is the HCatalog table
properties -- Properties to set via the -D argument
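A minimal usage sketch (the connection id matches the hook's default; the verbosity, mapper count and 'extra' values are illustrative assumptions):

    from airflow.providers.apache.sqoop.hooks.sqoop import SqoopHook

    # The sqoop connection's 'extra' JSON field may carry the options listed
    # above, e.g. {"files": "settings.cfg", "password_file": "/secrets/sqoop_pwd"}
    # (illustrative values only).
    hook = SqoopHook(
        conn_id="sqoop_default",  # reference to an existing sqoop connection
        verbose=True,             # run sqoop with verbose output
        num_mappers=4,            # number of parallel map tasks
    )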
popen(self, cmd: List[str], **kwargs)[source]¶

Remote Popen
- Parameters
cmd -- command to remotely execute
kwargs -- extra arguments to Popen (see subprocess.Popen)
- Returns
handle to subprocess
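A minimal sketch of calling popen directly, assuming the sqoop binary is on the PATH (most workflows use the import/export helpers below instead):

    hook = SqoopHook(conn_id="sqoop_default")
    # Run an arbitrary sqoop command as a subprocess; keyword arguments are
    # forwarded to subprocess.Popen.
    hook.popen(["sqoop", "version"])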
import_table(self, table: str, target_dir: Optional[str] = None, append: bool = False, file_type: str = 'text', columns: Optional[str] = None, split_by: Optional[str] = None, where: Optional[str] = None, direct: bool = False, driver: Any = None, extra_import_options: Optional[Dict[str, Any]] = None, schema: Optional[str] = None)[source]¶

Imports a table from a remote location to a target dir. Arguments are copies of direct sqoop command line arguments.
- Parameters
table -- Table to read
schema -- Schema name
target_dir -- HDFS destination dir
append -- Append data to an existing dataset in HDFS
file_type -- "avro", "sequence", "text" or "parquet". Imports data into the specified format. Defaults to text.
columns -- <col,col,col…> Columns to import from table
split_by -- Column of the table used to split work units
where -- WHERE clause to use during import
direct -- Use direct connector if exists for the database
driver -- Manually specify JDBC driver class to use
extra_import_options -- Extra import options to pass as dict. If a key doesn't have a value, just pass an empty string to it. Don't include prefix of -- for sqoop options.
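A hedged usage sketch of import_table (the table name, split column and HDFS path are illustrative assumptions; a configured sqoop connection is required):

    hook = SqoopHook(conn_id="sqoop_default", num_mappers=4)
    # Import the whole table into HDFS as Parquet files.
    hook.import_table(
        table="orders",                                # hypothetical table
        target_dir="/user/hive/warehouse/orders",      # hypothetical HDFS dir
        file_type="parquet",
        split_by="order_id",                           # column to split work on
        extra_import_options={"fetch-size": "1000"},   # rendered as --fetch-size 1000
    )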
import_query(self, query: str, target_dir: Optional[str] = None, append: bool = False, file_type: str = 'text', split_by: Optional[str] = None, direct: Optional[bool] = None, driver: Optional[Any] = None, extra_import_options: Optional[Dict[str, Any]] = None)[source]¶

Imports a specific query from the RDBMS to HDFS.
- Parameters
query -- Free format query to run
target_dir -- HDFS destination dir
append -- Append data to an existing dataset in HDFS
file_type -- "avro", "sequence", "text" or "parquet". Imports data to HDFS in the specified format. Defaults to text.
split_by -- Column of the table used to split work units
direct -- Use direct import fast path
driver -- Manually specify JDBC driver class to use
extra_import_options -- Extra import options to pass as dict. If a key doesn't have a value, just pass an empty string to it. Don't include prefix of -- for sqoop options.
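A hedged sketch of a free-form query import (the query text and target path are illustrative; sqoop's free-form query imports expect the $CONDITIONS placeholder so work can be split across mappers):

    hook = SqoopHook(conn_id="sqoop_default")
    # Import only the rows matching the query, writing Avro files to HDFS.
    hook.import_query(
        query="SELECT id, amount FROM orders WHERE status = 'OPEN' AND $CONDITIONS",
        target_dir="/tmp/open_orders",                 # hypothetical HDFS dir
        file_type="avro",
        split_by="id",
    )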
export_table(self, table: str, export_dir: Optional[str] = None, input_null_string: Optional[str] = None, input_null_non_string: Optional[str] = None, staging_table: Optional[str] = None, clear_staging_table: bool = False, enclosed_by: Optional[str] = None, escaped_by: Optional[str] = None, input_fields_terminated_by: Optional[str] = None, input_lines_terminated_by: Optional[str] = None, input_optionally_enclosed_by: Optional[str] = None, batch: bool = False, relaxed_isolation: bool = False, extra_export_options: Optional[Dict[str, Any]] = None, schema: Optional[str] = None)[source]¶

Exports a Hive table to a remote location. Arguments are copies of direct sqoop command line arguments.
- Parameters
table -- Remote destination table
schema -- Schema name
export_dir -- HDFS directory of the Hive table to export
input_null_string -- The string to be interpreted as null for string columns
input_null_non_string -- The string to be interpreted as null for non-string columns
staging_table -- The table in which data will be staged before being inserted into the destination table
clear_staging_table -- Indicate that any data present in the staging table can be deleted
enclosed_by -- Sets a required field enclosing character
escaped_by -- Sets the escape character
input_fields_terminated_by -- Sets the field separator character
input_lines_terminated_by -- Sets the end-of-line character
input_optionally_enclosed_by -- Sets a field enclosing character
batch -- Use batch mode for underlying statement execution
relaxed_isolation -- Use read-uncommitted transaction isolation for the mappers
extra_export_options -- Extra export options to pass as dict. If a key doesn't have a value, just pass an empty string to it. Don't include prefix of -- for sqoop options.
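A hedged sketch of export_table (the table, staging table and HDFS path are illustrative assumptions):

    hook = SqoopHook(conn_id="sqoop_default")
    # Export the Hive table's files back to the database, staging the rows
    # first so the destination table is only touched once.
    hook.export_table(
        table="orders_summary",                           # hypothetical destination table
        export_dir="/user/hive/warehouse/orders_summary", # hypothetical HDFS dir
        input_fields_terminated_by="\t",
        staging_table="orders_summary_staging",
        clear_staging_table=True,
        batch=True,
    )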