airflow.providers.apache.sqoop.hooks.sqoop

This module contains a sqoop 1.x hook.

Module Contents

Classes

SqoopHook

Wrapper around the sqoop 1 binary.

class airflow.providers.apache.sqoop.hooks.sqoop.SqoopHook(conn_id=default_conn_name, verbose=False, num_mappers=None, hcatalog_database=None, hcatalog_table=None, properties=None, libjars=None, extra_options=None)

Bases: airflow.hooks.base.BaseHook

Wrapper around the sqoop 1 binary.

To use the hook, the “sqoop” binary must be on the PATH.
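Because the hook runs the sqoop binary as a subprocess, a quick pre-flight check can confirm the binary is reachable before a task runs. This is a minimal sketch using the standard library's shutil.which; the helper name sqoop_on_path is hypothetical and not part of the provider.

```python
from __future__ import annotations

import shutil


def sqoop_on_path(binary: str = "sqoop") -> str | None:
    """Return the absolute path of the binary if it is on PATH, else None.

    Hypothetical helper for illustration; SqoopHook does not expose it.
    """
    return shutil.which(binary)


if __name__ == "__main__":
    location = sqoop_on_path()
    if location is None:
        print("sqoop not found on PATH; the hook will not be able to run commands")
    else:
        print(f"sqoop found at {location}")
```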

Additional arguments that can be passed via the ‘extra’ JSON field of the sqoop connection:

  • job_tracker: Job tracker, either local or jobtracker:port.

  • namenode: Namenode.

  • files: Comma-separated files to be copied to the MapReduce cluster.

  • archives: Comma-separated archives to be unarchived on the compute machines.

  • password_file: Path to the file containing the password.
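The extra keys correspond closely to Hadoop's generic options (-jt, -fs, -files, -archives), which sqoop accepts before its tool-specific arguments. The sketch below assumes that mapping and shows how an extra JSON field might be turned into command-line arguments; the helper extras_to_generic_args and the mapping table are hypothetical illustrations, not the provider's actual code.

```python
from __future__ import annotations

import json
from typing import Any

# Assumed mapping from connection extra keys to Hadoop generic options.
EXTRA_TO_GENERIC_OPTION = {
    "job_tracker": "-jt",
    "namenode": "-fs",
    "files": "-files",
    "archives": "-archives",
}


def extras_to_generic_args(extra_json: str) -> list[str]:
    """Translate a connection's 'extra' JSON string into sqoop arguments.

    Hypothetical helper: generic options come first, then --password-file,
    which is a sqoop tool argument rather than a Hadoop generic option.
    """
    extra: dict[str, Any] = json.loads(extra_json) if extra_json else {}
    args: list[str] = []
    for key, flag in EXTRA_TO_GENERIC_OPTION.items():
        value = extra.get(key)
        if value:
            args += [flag, str(value)]
    if extra.get("password_file"):
        args += ["--password-file", str(extra["password_file"])]
    return args


if __name__ == "__main__":
    extra = json.dumps(
        {
            "job_tracker": "local",
            "namenode": "hdfs://namenode:8020",
            "files": "a.txt,b.txt",
            "password_file": "/user/airflow/.sqoop.pw",
        }
    )
    print(extras_to_generic_args(extra))
```

Keeping the generic options ahead of tool arguments matters because Hadoop's option parser only recognizes them at the front of the argument list.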

Parameters
  • conn_id (str) – Reference to the sqoop connection.

  • verbose (bool) – Set sqoop to verbose.

  • num_mappers (int | None) – Number of map tasks to import in parallel.

  • hcatalog_database (str | None) – Database name for the HCatalog table.

  • hcatalog_table (str | None) – HCatalog table name.

  • properties (dict[str, Any] | None) – Properties to set via the -D argument.

  • libjars (str | None) – Optional comma-separated jar files to include in the classpath.

  • extra_options (dict[str, Any] | None) – Extra import/export options to pass as a dict. If an option takes no value, pass an empty string as its value. Do not include the -- prefix in the option names.
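The properties and extra_options conventions can be made concrete with a small sketch. It assumes each property becomes a "-D key=value" pair and each extra option key gets a "--" prefix, with an empty-string value producing a bare flag, as the parameter descriptions state. The helper names are hypothetical; --fetch-size and --delete-target-dir are real sqoop import options used only as sample input.

```python
from __future__ import annotations

from typing import Any


def properties_to_args(properties: dict[str, Any] | None) -> list[str]:
    """Turn {"key": "value"} into ["-D", "key=value", ...] (hypothetical helper)."""
    args: list[str] = []
    for key, value in (properties or {}).items():
        args += ["-D", f"{key}={value}"]
    return args


def extra_options_to_args(extra_options: dict[str, Any] | None) -> list[str]:
    """Prefix each key with '--'; an empty-string value yields a bare flag.

    Hypothetical helper mirroring the documented extra_options convention.
    """
    args: list[str] = []
    for key, value in (extra_options or {}).items():
        args.append(f"--{key}")
        if value is not None and value != "":
            args.append(str(value))
    return args


if __name__ == "__main__":
    print(properties_to_args({"mapreduce.job.queuename": "etl"}))
    print(extra_options_to_args({"fetch-size": 1000, "delete-target-dir": ""}))
```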

conn_name_attr = 'conn_id'
default_conn_name = 'sqoop_default'
conn_type = 'sqoop'
hook_name = 'Sqoop'
get_conn()

Return connection for the hook.

cmd_mask_password(cmd_orig)

Mask the password in a sqoop command so that the command can be logged safely.
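A minimal sketch of password masking, assuming the password appears as the argument immediately after a --password flag in the command list. The function mask_password is a hypothetical stand-in, not the hook's method; it returns a copy so the command that actually runs keeps the real password.

```python
from __future__ import annotations


def mask_password(cmd: list[str], mask: str = "MASKED") -> list[str]:
    """Return a copy of cmd with the value after '--password' replaced.

    Hypothetical helper; the original list is never modified.
    """
    masked = list(cmd)
    try:
        index = masked.index("--password")
    except ValueError:
        return masked  # no password flag present, nothing to hide
    if index + 1 < len(masked):
        masked[index + 1] = mask
    return masked


if __name__ == "__main__":
    cmd = ["sqoop", "import", "--username", "bob", "--password", "s3cret", "--table", "t"]
    print(" ".join(mask_password(cmd)))
```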

popen(cmd, **kwargs)

Execute a command in a local subprocess.

Parameters
  • cmd (list[str]) – Command to execute

  • kwargs (Any) – Extra arguments to Popen (see subprocess.Popen)

Return type

None
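The pattern behind a method like popen can be sketched with subprocess.Popen: merge stderr into stdout, log each line as it arrives, and fail on a non-zero exit code. This is an illustration under assumptions, not the hook's code: the helper run_and_stream is hypothetical, it raises RuntimeError (the hook's exception type may differ), and unlike popen it returns the collected lines so the output can be inspected.

```python
from __future__ import annotations

import logging
import subprocess
import sys
from typing import Any

log = logging.getLogger(__name__)


def run_and_stream(cmd: list[str], **kwargs: Any) -> list[str]:
    """Run cmd, log each output line as it arrives, raise on non-zero exit."""
    lines: list[str] = []
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # sqoop logs to stderr, so merge the streams
        text=True,
        **kwargs,
    ) as proc:
        for line in proc.stdout:
            line = line.rstrip("\n")
            log.info(line)
            lines.append(line)
        proc.wait()
    if proc.returncode != 0:
        raise RuntimeError(f"Command failed with exit code {proc.returncode}")
    return lines


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    run_and_stream([sys.executable, "-c", "print('hello from a subprocess')"])
```

Passing the command as a list (no shell) means arguments such as SQL text are not subject to shell quoting or expansion.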

import_table(table, target_dir=None, append=False, file_type='text', columns=None, split_by=None, where=None, direct=False, driver=None, schema=None)

Import a table from the remote database into the target directory.

The arguments mirror the corresponding sqoop command-line arguments.

Parameters
  • table (str) – Table to read

  • schema (str | None) – Schema name

  • target_dir (str | None) – HDFS destination dir

  • append (bool) – Append data to an existing dataset in HDFS

  • file_type (str) – “avro”, “sequence”, “text” or “parquet”. Imports data into the specified format. Defaults to “text”.

  • columns (str | None) – Comma-separated list of columns to import from the table (<col,col,col…>)

  • split_by (str | None) – Column of the table used to split work units

  • where (str | None) – WHERE clause to use during import

  • direct (bool) – Use the direct connector, if one exists for the database

  • driver (Any) – Manually specify JDBC driver class to use
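A sketch of how these parameters could translate into a sqoop import command line, assuming each maps to the sqoop flag of the same name and file_type maps to one of sqoop's --as-* format flags. The builder build_import_table_args is hypothetical; it omits connection arguments (--connect, credentials) and the schema parameter for brevity.

```python
from __future__ import annotations

# Sqoop's file-format flags, keyed by the documented file_type values.
FILE_TYPE_FLAGS = {
    "avro": "--as-avrodatafile",
    "sequence": "--as-sequencefile",
    "text": "--as-textfile",
    "parquet": "--as-parquetfile",
}


def build_import_table_args(
    table: str,
    target_dir: str | None = None,
    append: bool = False,
    file_type: str = "text",
    columns: str | None = None,
    split_by: str | None = None,
    where: str | None = None,
    direct: bool = False,
    driver: str | None = None,
) -> list[str]:
    """Assemble sqoop import arguments for a whole-table import (hypothetical)."""
    if file_type not in FILE_TYPE_FLAGS:
        raise ValueError(f"file_type must be one of {sorted(FILE_TYPE_FLAGS)}")
    args = ["sqoop", "import", "--table", table, FILE_TYPE_FLAGS[file_type]]
    if target_dir:
        args += ["--target-dir", target_dir]
    if append:
        args.append("--append")
    if columns:
        args += ["--columns", columns]
    if split_by:
        args += ["--split-by", split_by]
    if where:
        args += ["--where", where]
    if direct:
        args.append("--direct")
    if driver:
        args += ["--driver", driver]
    return args


if __name__ == "__main__":
    print(build_import_table_args("orders", target_dir="/data/orders", file_type="parquet"))
```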

import_query(query, target_dir=None, append=False, file_type='text', split_by=None, direct=None, driver=None)

Import the result of a free-form query from the RDBMS into HDFS.

Parameters
  • query (str) – Free format query to run

  • target_dir (str | None) – HDFS destination dir

  • append (bool) – Append data to an existing dataset in HDFS

  • file_type (str) – “avro”, “sequence”, “text” or “parquet”. Imports data into HDFS in the specified format. Defaults to “text”.

  • split_by (str | None) – Column of the table used to split work units

  • direct (bool | None) – Use direct import fast path

  • driver (Any | None) – Manually specify JDBC driver class to use
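Free-form query imports carry extra rules from the Sqoop user guide: the query must contain the $CONDITIONS token, a --target-dir is required, and a parallel import needs --split-by unless a single mapper is used. The sketch below checks those rules up front; build_import_query_args is a hypothetical helper, not the hook's API, and it omits connection arguments.

```python
from __future__ import annotations


def build_import_query_args(
    query: str,
    target_dir: str,
    split_by: str | None = None,
    num_mappers: int | None = None,
) -> list[str]:
    """Assemble sqoop arguments for a free-form query import (hypothetical)."""
    if "$CONDITIONS" not in query:
        raise ValueError("free-form queries must contain the $CONDITIONS token")
    if split_by is None and num_mappers != 1:
        raise ValueError("parallel query imports need split_by (or num_mappers=1)")
    args = ["sqoop", "import", "--query", query, "--target-dir", target_dir]
    if split_by:
        args += ["--split-by", split_by]
    if num_mappers is not None:
        args += ["--num-mappers", str(num_mappers)]
    return args


if __name__ == "__main__":
    query = "SELECT o.id, o.amount FROM orders o WHERE o.amount > 0 AND $CONDITIONS"
    print(build_import_query_args(query, "/data/orders_positive", split_by="o.id"))
```

Because the arguments go to the subprocess as a list rather than through a shell, $CONDITIONS needs no escaping here.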

export_table(table, export_dir=None, input_null_string=None, input_null_non_string=None, staging_table=None, clear_staging_table=False, enclosed_by=None, escaped_by=None, input_fields_terminated_by=None, input_lines_terminated_by=None, input_optionally_enclosed_by=None, batch=False, relaxed_isolation=False, schema=None)

Export a Hive table to a remote database.

The arguments mirror the corresponding Sqoop command-line arguments.

Parameters
  • table (str) – Destination table in the remote database

  • schema (str | None) – Schema name

  • export_dir (str | None) – HDFS directory containing the data to export (for example, the Hive table's warehouse directory)

  • input_null_string (str | None) – The string to be interpreted as null for string columns

  • input_null_non_string (str | None) – The string to be interpreted as null for non-string columns

  • staging_table (str | None) – The table in which data will be staged before being inserted into the destination table

  • clear_staging_table (bool) – Indicate that any data present in the staging table can be deleted

  • enclosed_by (str | None) – Sets a required field enclosing character

  • escaped_by (str | None) – Sets the escape character

  • input_fields_terminated_by (str | None) – Sets the field separator character

  • input_lines_terminated_by (str | None) – Sets the end-of-line character

  • input_optionally_enclosed_by (str | None) – Sets a field enclosing character

  • batch (bool) – Use batch mode for underlying statement execution

  • relaxed_isolation (bool) – Set the mappers' transaction isolation level to read uncommitted
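Since the export parameters mirror Sqoop's flags, a compact way to picture the resulting command is a name-based mapping: underscores become hyphens, True booleans become bare flags, and False or None values are dropped. This is a sketch under that naming assumption; build_export_args is hypothetical, omits connection arguments and schema, and is not the hook's implementation. The "\\001" value is Sqoop's octal escape for Hive's default Ctrl-A field delimiter.

```python
from __future__ import annotations

from typing import Any


def build_export_args(table: str, export_dir: str, **options: Any) -> list[str]:
    """Map keyword options to sqoop export flags by replacing '_' with '-'.

    Hypothetical helper: True becomes a bare flag, False/None are skipped,
    any other value is passed as the flag's argument.
    """
    args = ["sqoop", "export", "--table", table, "--export-dir", export_dir]
    for name, value in options.items():
        if value is None or value is False:
            continue
        flag = "--" + name.replace("_", "-")
        if value is True:
            args.append(flag)
        else:
            args += [flag, str(value)]
    return args


if __name__ == "__main__":
    print(
        build_export_args(
            "sales",
            "/user/hive/warehouse/sales",
            input_fields_terminated_by="\\001",
            staging_table="sales_stage",
            clear_staging_table=True,
            batch=True,
        )
    )
```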
