airflow.providers.apache.sqoop.operators.sqoop

This module contains a Sqoop 1 operator.

Module Contents

Classes

SqoopOperator

Execute a Sqoop job.

class airflow.providers.apache.sqoop.operators.sqoop.SqoopOperator(*, conn_id='sqoop_default', cmd_type='import', table=None, query=None, target_dir=None, append=False, file_type='text', columns=None, num_mappers=None, split_by=None, where=None, export_dir=None, input_null_string=None, input_null_non_string=None, staging_table=None, clear_staging_table=False, enclosed_by=None, escaped_by=None, input_fields_terminated_by=None, input_lines_terminated_by=None, input_optionally_enclosed_by=None, batch=False, direct=False, driver=None, verbose=False, relaxed_isolation=False, properties=None, hcatalog_database=None, hcatalog_table=None, create_hcatalog_table=False, extra_options=None, schema=None, libjars=None, **kwargs)[source]

Bases: airflow.models.BaseOperator

Execute a Sqoop job.

Documentation for Apache Sqoop can be found here: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html

Parameters
  • conn_id (str) – The connection id of the Sqoop connection to use

  • cmd_type (str) – Command to execute, either “export” or “import”

  • schema (str | None) – Schema name

  • table (str | None) – Table to read

  • query (str | None) – Import result of arbitrary SQL query. Instead of using the table, columns and where arguments, you can specify a SQL statement with the query argument. Must also specify a destination directory with target_dir.

  • target_dir (str | None) – HDFS destination directory where the data from the RDBMS will be written

  • append (bool) – Append data to an existing dataset in HDFS

  • file_type (str) – Imports data in the specified format: “avro”, “sequence” or “text”. Defaults to text.

  • columns (str | None) – <col,col,col> Columns to import from table

  • num_mappers (int | None) – Use n mapper tasks to import/export in parallel

  • split_by (str | None) – Column of the table used to split work units

  • where (str | None) – WHERE clause to use during import

  • export_dir (str | None) – HDFS Hive database directory to export to the RDBMS

  • input_null_string (str | None) – The string to be interpreted as null for string columns

  • input_null_non_string (str | None) – The string to be interpreted as null for non-string columns

  • staging_table (str | None) – The table in which data will be staged before being inserted into the destination table

  • clear_staging_table (bool) – Indicate that any data present in the staging table can be deleted

  • enclosed_by (str | None) – Sets a required field enclosing character

  • escaped_by (str | None) – Sets the escape character

  • input_fields_terminated_by (str | None) – Sets the input field separator

  • input_lines_terminated_by (str | None) – Sets the input end-of-line character

  • input_optionally_enclosed_by (str | None) – Sets a field enclosing character

  • batch (bool) – Use batch mode for underlying statement execution

  • direct (bool) – Use direct export fast path

  • driver (Any | None) – Manually specify JDBC driver class to use

  • verbose (bool) – Switch to more verbose logging for debug purposes

  • relaxed_isolation (bool) – Use read uncommitted isolation level

  • hcatalog_database (str | None) – Specifies the database name for the HCatalog table

  • hcatalog_table (str | None) – The argument value for this option is the HCatalog table

  • create_hcatalog_table (bool) – Have sqoop create the hcatalog table passed in or not

  • properties (dict[str, Any] | None) – Additional JVM properties passed to Sqoop

  • extra_options (dict[str, Any] | None) – Extra import/export options to pass as a dict to the SqoopHook. If a key doesn’t have a value, pass an empty string for it (as in the usage sketch after this list). Don’t include the “--” prefix for Sqoop options.

  • libjars (str | None) – Optional comma-separated jar files to include in the classpath.

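A minimal usage sketch follows. It assumes an Airflow 2.4+ style DAG definition; the connection id, table name, HDFS paths and option values are illustrative placeholders rather than defaults shipped with the provider.

from datetime import datetime

from airflow import DAG
from airflow.providers.apache.sqoop.operators.sqoop import SqoopOperator

with DAG(dag_id="example_sqoop_import", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    import_orders = SqoopOperator(
        task_id="sqoop_import_orders",
        conn_id="sqoop_default",
        cmd_type="import",
        table="orders",                             # source RDBMS table
        target_dir="/user/hive/warehouse/orders",   # HDFS destination directory
        split_by="order_id",                        # column used to split work units
        num_mappers=4,                              # run four mapper tasks in parallel
        file_type="avro",                           # "avro", "sequence" or "text"
        extra_options={"compress": ""},             # flag-style option: empty value, no "--" prefix
    )

The SqoopHook assembles the actual sqoop import command from these arguments; anything without a dedicated parameter can be passed through extra_options.
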
template_fields: Sequence[str] = ('conn_id', 'cmd_type', 'table', 'query', 'target_dir', 'file_type', 'columns', 'split_by',...[source]
template_fields_renderers[source]
ui_color = '#7D8CA4'[source]
execute(context)[source]

Execute sqoop job.

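For cmd_type="export" the same operator drives the reverse direction, reading from HDFS and writing to the RDBMS. A hedged sketch, assuming the same illustrative DAG context as above, with placeholder table and directory names:

    export_summary = SqoopOperator(
        task_id="sqoop_export_summary",
        conn_id="sqoop_default",
        cmd_type="export",
        table="orders_summary",                             # destination RDBMS table
        export_dir="/user/hive/warehouse/orders_summary",   # HDFS directory to export
        staging_table="orders_summary_stage",               # stage rows before the final insert
        clear_staging_table=True,                           # allow Sqoop to empty the staging table first
        batch=True,                                         # batch the underlying statement execution
        input_null_string="\\N",                            # interpret \N as NULL for string columns
    )
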
on_kill()[source]

Override this method to clean up subprocesses when a task instance gets killed.

Any use of the threading, subprocess or multiprocessing module within an operator needs to be cleaned up, or it will leave ghost processes behind.
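As an illustration of that contract (not of SqoopOperator's internals), a custom operator that launches its own subprocess would keep a handle to it and terminate it in on_kill. The class and attribute names below are hypothetical:

import subprocess

from airflow.models import BaseOperator


class ShellSleepOperator(BaseOperator):
    """Hypothetical operator used only to illustrate the on_kill contract."""

    def execute(self, context):
        # Keep a handle to the child process so on_kill can reach it.
        self.sub_process = subprocess.Popen(["sleep", "300"])
        self.sub_process.wait()

    def on_kill(self):
        # Terminate the child so no ghost process outlives the task instance.
        if getattr(self, "sub_process", None) is not None:
            self.sub_process.terminate()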
