`airflow.contrib.hooks.sqoop_hook`

This module contains a hook for sqoop 1.x.

Module Contents

class airflow.contrib.hooks.sqoop_hook.SqoopHook(conn_id='sqoop_default', verbose=False, num_mappers=None, hcatalog_database=None, hcatalog_table=None, properties=None)

Bases: airflow.hooks.base_hook.BaseHook

This hook is a wrapper around the sqoop 1 binary. To use the hook, the “sqoop” executable must be on the PATH.

Additional arguments that can be passed via the ‘extra’ JSON field of the sqoop connection:

job_tracker: Job tracker local|jobtracker:port.

namenode: Namenode.

lib_jars: Comma-separated jar files to include in the classpath.

files: Comma-separated files to be copied to the map reduce cluster.

archives: Comma-separated archives to be unarchived on the compute machines.

password_file: Path to file containing the password.
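
As a rough illustration of the fields listed above, the ‘extra’ value of a connection is a JSON string, so it can be assembled from a plain dictionary. This is a minimal sketch: only the key names come from this page, and every host name and path below is a hypothetical placeholder.

```python
import json

# Hypothetical values for illustration; only the key names are documented above.
extra = {
    "job_tracker": "jobtracker.example.com:8021",
    "namenode": "hdfs://namenode.example.com:8020",
    "lib_jars": "/opt/jars/a.jar,/opt/jars/b.jar",  # comma-separated, no spaces
    "password_file": "/user/airflow/sqoop.password",
}

# Serialize the dictionary to the JSON string stored in the 'extra' field.
extra_json = json.dumps(extra, indent=2)
print(extra_json)
```

The printed string can be pasted into the ‘extra’ field of the sqoop connection. Keys that are not needed can simply be left out.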

Parameters

conn_id (str) – Reference to the sqoop connection.
verbose (bool) – Set sqoop to verbose.
num_mappers (int) – Number of map tasks to import in parallel.
properties (dict) – Properties to set via the -D argument.

get_conn(self)

cmd_mask_password(self, cmd_orig)

Popen(self, cmd, **kwargs)

Remote Popen

Parameters

cmd – command to remotely execute
kwargs – extra arguments to Popen (see subprocess.Popen)

Returns

handle to subprocess
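
The kwargs mentioned above are those accepted by subprocess.Popen, such as env or cwd. The sketch below is not the hook's implementation; it only shows, with the standard library, how extra keyword arguments can be forwarded to subprocess.Popen. The helper name run_and_capture and the DEMO_MESSAGE variable are hypothetical.

```python
import os
import subprocess
import sys


def run_and_capture(cmd, **kwargs):
    """Start cmd with subprocess.Popen, forwarding kwargs, and collect its output."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        universal_newlines=True,   # decode output as text
        **kwargs
    )
    output, _ = proc.communicate()  # wait for the process to finish
    return proc, output


# Forward an 'env' keyword argument to the child process.
child_env = dict(os.environ, DEMO_MESSAGE="hello from child")
proc, output = run_and_capture(
    [sys.executable, "-c", "import os; print(os.environ['DEMO_MESSAGE'])"],
    env=child_env,
)
print(proc.returncode, output.strip())
```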

_prepare_command(self, export=False)

static _get_export_format_argument(file_type='text')

_import_cmd(self, target_dir, append, file_type, split_by, direct, driver, extra_import_options)

import_table(self, table, target_dir=None, append=False, file_type='text', columns=None, split_by=None, where=None, direct=False, driver=None, extra_import_options=None)

Imports a table from a remote location to the target dir. The arguments mirror the corresponding sqoop command line arguments.

Parameters

table – Table to read
target_dir – HDFS destination dir
append – Append data to an existing dataset in HDFS
file_type – “avro”, “sequence”, “text” or “parquet”. Imports data into the specified format. Defaults to text.
columns – <col,col,col…> Columns to import from table
split_by – Column of the table used to split work units
where – WHERE clause to use during import
direct – Use direct connector if exists for the database
driver – Manually specify JDBC driver class to use
extra_import_options – Extra import options to pass as a dict. If a key doesn’t have a value, pass an empty string for it. Don’t include the “--” prefix of sqoop options in the keys.
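
To make the dict convention for extra_import_options concrete, the following sketch shows one plausible way such a dict could be turned into command line arguments. The function render_extra_options is hypothetical and is not claimed to be the hook's own code; the option names are ordinary sqoop import flags chosen for illustration.

```python
def render_extra_options(options):
    """Turn {'name': 'value'} pairs into ['--name', 'value'] arguments.

    Keys are given without the leading '--'. An empty string marks a
    flag that takes no value.
    """
    args = []
    for key, value in options.items():
        args.append("--{}".format(key))
        if value != "":
            args.append(str(value))
    return args


extra_import_options = {
    "fetch-size": "1000",     # option with a value
    "delete-target-dir": "",  # value-less flag: pass an empty string
}
print(render_extra_options(extra_import_options))
```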

import_query(self, query, target_dir, append=False, file_type='text', split_by=None, direct=None, driver=None, extra_import_options=None)

Imports the result of a specific query from the RDBMS to HDFS.

Parameters

query – Free format query to run
target_dir – HDFS destination dir
append – Append data to an existing dataset in HDFS
file_type – “avro”, “sequence”, “text” or “parquet”. Imports data to HDFS in the specified format. Defaults to text.
split_by – Column of the table used to split work units
direct – Use direct import fast path
driver – Manually specify JDBC driver class to use
extra_import_options – Extra import options to pass as a dict. If a key doesn’t have a value, pass an empty string for it. Don’t include the “--” prefix of sqoop options in the keys.
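
A call to import_query might look like the sketch below. Sqoop free-form query imports expect the $CONDITIONS token in the WHERE clause and a split column for parallel imports, which is why both appear here. The table, columns and target directory are hypothetical, and RecordingHook is a stand-in used so the snippet runs without Airflow; in a real DAG the hook would presumably be created with SqoopHook(conn_id='sqoop_default').

```python
class RecordingHook(object):
    """Stand-in that only records calls; NOT the real SqoopHook."""

    def __init__(self):
        self.calls = []

    def import_query(self, query, target_dir, **kwargs):
        self.calls.append(dict(query=query, target_dir=target_dir, **kwargs))


def import_recent_orders(hook, target_dir):
    """Import a free-form query as Avro files, split on the 'id' column."""
    # Sqoop replaces $CONDITIONS with a per-mapper range condition.
    query = "SELECT id, total FROM orders WHERE $CONDITIONS"
    hook.import_query(
        query=query,
        target_dir=target_dir,
        file_type="avro",
        split_by="id",
    )


hook = RecordingHook()
import_recent_orders(hook, "/data/raw/orders")
print(hook.calls[0])
```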

_export_cmd(self, table, export_dir, input_null_string, input_null_non_string, staging_table, clear_staging_table, enclosed_by, escaped_by, input_fields_terminated_by, input_lines_terminated_by, input_optionally_enclosed_by, batch, relaxed_isolation, extra_export_options)

export_table(self, table, export_dir, input_null_string, input_null_non_string, staging_table, clear_staging_table, enclosed_by, escaped_by, input_fields_terminated_by, input_lines_terminated_by, input_optionally_enclosed_by, batch, relaxed_isolation, extra_export_options=None)

Exports a Hive table to a remote location. The arguments mirror the corresponding sqoop command line arguments.

Parameters

table – Remote destination table
export_dir – Hive table to export
input_null_string – The string to be interpreted as null for string columns
input_null_non_string – The string to be interpreted as null for non-string columns
staging_table – The table in which data will be staged before being inserted into the destination table
clear_staging_table – Indicate that any data present in the staging table can be deleted
enclosed_by – Sets a required field enclosing character
escaped_by – Sets the escape character
input_fields_terminated_by – Sets the field separator character
input_lines_terminated_by – Sets the end-of-line character
input_optionally_enclosed_by – Sets a field enclosing character
batch – Use batch mode for underlying statement execution
relaxed_isolation – Set the transaction isolation for the mappers to read uncommitted
extra_export_options – Extra export options to pass as a dict. If a key doesn’t have a value, pass an empty string for it. Don’t include the “--” prefix of sqoop options in the keys.
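
According to the signature above, every export_table parameter except extra_export_options has no default, so callers must supply all of them. The hypothetical helper below builds a complete keyword dictionary and fills unused options with None. It assumes the hook treats None as “option not set”, which should be checked against the hook's source before relying on it. The table name and directory are placeholders.

```python
# Parameters of export_table that have no default, besides table and export_dir.
OPTIONAL_EXPORT_ARGS = (
    "input_null_string",
    "input_null_non_string",
    "staging_table",
    "clear_staging_table",
    "enclosed_by",
    "escaped_by",
    "input_fields_terminated_by",
    "input_lines_terminated_by",
    "input_optionally_enclosed_by",
    "batch",
    "relaxed_isolation",
)


def build_export_kwargs(table, export_dir, **overrides):
    """Return a full keyword dict for export_table; unset options become None."""
    unknown = set(overrides) - set(OPTIONAL_EXPORT_ARGS)
    if unknown:
        raise TypeError("unknown export arguments: {}".format(sorted(unknown)))
    kwargs = {name: None for name in OPTIONAL_EXPORT_ARGS}
    kwargs.update(overrides)
    kwargs["table"] = table
    kwargs["export_dir"] = export_dir
    return kwargs


export_kwargs = build_export_kwargs(
    "orders_export",                      # hypothetical destination table
    "/user/hive/warehouse/orders",        # hypothetical export directory
    input_fields_terminated_by=",",
    batch=True,
)
print(sorted(export_kwargs))
```

With a hook instance at hand, the result would be passed as hook.export_table(**export_kwargs).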
