airflow.providers.apache.sqoop.operators.sqoop

This module contains a Sqoop 1 operator.

Module Contents

Classes

SqoopOperator

Execute a Sqoop job.

class airflow.providers.apache.sqoop.operators.sqoop.SqoopOperator(*, conn_id='sqoop_default', cmd_type='import', table=None, query=None, target_dir=None, append=False, file_type='text', columns=None, num_mappers=None, split_by=None, where=None, export_dir=None, input_null_string=None, input_null_non_string=None, staging_table=None, clear_staging_table=False, enclosed_by=None, escaped_by=None, input_fields_terminated_by=None, input_lines_terminated_by=None, input_optionally_enclosed_by=None, batch=False, direct=False, driver=None, verbose=False, relaxed_isolation=False, properties=None, hcatalog_database=None, hcatalog_table=None, create_hcatalog_table=False, extra_options=None, schema=None, libjars=None, **kwargs)[source]

Bases: airflow.models.BaseOperator

Execute a Sqoop job.

Documentation for Apache Sqoop can be found here: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html

Parameters
  • conn_id (str) – The connection id of the Sqoop connection to use

  • cmd_type (str) – Command to execute, either “export” or “import”

  • schema (str | None) – Schema name

  • table (str | None) – Table to read

  • query (str | None) – Import result of arbitrary SQL query. Instead of using the table, columns and where arguments, you can specify a SQL statement with the query argument. Must also specify a destination directory with target_dir.

  • target_dir (str | None) – HDFS destination directory where the data from the RDBMS will be written

  • append (bool) – Append data to an existing dataset in HDFS

  • file_type (str) – Imports data in the specified format: “avro”, “sequence” or “text”. Defaults to text.

  • columns (str | None) – <col,col,col> Columns to import from table

  • num_mappers (int | None) – Use n mapper tasks to import/export in parallel

  • split_by (str | None) – Column of the table used to split work units

  • where (str | None) – WHERE clause to use during import

  • export_dir (str | None) – HDFS Hive database directory to export to the RDBMS

  • input_null_string (str | None) – The string to be interpreted as null for string columns

  • input_null_non_string (str | None) – The string to be interpreted as null for non-string columns

  • staging_table (str | None) – The table in which data will be staged before being inserted into the destination table

  • clear_staging_table (bool) – Indicate that any data present in the staging table can be deleted

  • enclosed_by (str | None) – Sets a required field enclosing character

  • escaped_by (str | None) – Sets the escape character

  • input_fields_terminated_by (str | None) – Sets the input field separator

  • input_lines_terminated_by (str | None) – Sets the input end-of-line character

  • input_optionally_enclosed_by (str | None) – Sets a field enclosing character

  • batch (bool) – Use batch mode for underlying statement execution

  • direct (bool) – Use direct export fast path

  • driver (Any | None) – Manually specify JDBC driver class to use

  • verbose (bool) – Switch to more verbose logging for debug purposes

  • relaxed_isolation (bool) – Use read uncommitted isolation level

  • hcatalog_database (str | None) – Specifies the database name for the HCatalog table

  • hcatalog_table (str | None) – The argument value for this option is the HCatalog table

  • create_hcatalog_table (bool) – Have sqoop create the hcatalog table passed in or not

  • properties (dict[str, Any] | None) – Additional JVM properties passed to Sqoop

  • extra_options (dict[str, Any] | None) – Extra import/export options to pass as a dict to the SqoopHook. If a key doesn’t have a value, pass an empty string for it (as in the usage sketch after this list). Don’t include the “--” prefix for Sqoop options.

  • libjars (str | None) – Optional comma-separated jar files to include in the classpath.

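A minimal usage sketch follows. It assumes an Airflow 2.4+ style DAG definition; the connection id, table name, HDFS paths and option values are illustrative placeholders rather than defaults shipped with the provider.

from datetime import datetime

from airflow import DAG
from airflow.providers.apache.sqoop.operators.sqoop import SqoopOperator

with DAG(dag_id="example_sqoop_import", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    import_orders = SqoopOperator(
        task_id="sqoop_import_orders",
        conn_id="sqoop_default",
        cmd_type="import",
        table="orders",                             # source RDBMS table
        target_dir="/user/hive/warehouse/orders",   # HDFS destination directory
        split_by="order_id",                        # column used to split work units
        num_mappers=4,                              # run four mapper tasks in parallel
        file_type="avro",                           # "avro", "sequence" or "text"
        extra_options={"compress": ""},             # flag-style option: empty value, no "--" prefix
    )

The SqoopHook assembles the actual sqoop import command from these arguments; anything without a dedicated parameter can be passed through extra_options.
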
template_fields: Sequence[str] = ('conn_id', 'cmd_type', 'table', 'query', 'target_dir', 'file_type', 'columns', 'split_by',...[source]
template_fields_renderers[source]
ui_color = '#7D8CA4'[source]
execute(context)[source]

Execute sqoop job.

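For cmd_type="export" the same operator drives the reverse direction, reading from HDFS and writing to the RDBMS. A hedged sketch, assuming the same illustrative DAG context as above, with placeholder table and directory names:

    export_summary = SqoopOperator(
        task_id="sqoop_export_summary",
        conn_id="sqoop_default",
        cmd_type="export",
        table="orders_summary",                             # destination RDBMS table
        export_dir="/user/hive/warehouse/orders_summary",   # HDFS directory to export
        staging_table="orders_summary_stage",               # stage rows before the final insert
        clear_staging_table=True,                           # allow Sqoop to empty the staging table first
        batch=True,                                         # batch the underlying statement execution
        input_null_string="\\N",                            # interpret \N as NULL for string columns
    )
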
on_kill()[source]

Override this method to clean up subprocesses when a task instance gets killed.

Any use of the threading, subprocess or multiprocessing module within an operator needs to be cleaned up, or it will leave ghost processes behind.
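As an illustration of that contract (not of SqoopOperator's internals), a custom operator that launches its own subprocess would keep a handle to it and terminate it in on_kill. The class and attribute names below are hypothetical:

import subprocess

from airflow.models import BaseOperator


class ShellSleepOperator(BaseOperator):
    """Hypothetical operator used only to illustrate the on_kill contract."""

    def execute(self, context):
        # Keep a handle to the child process so on_kill can reach it.
        self.sub_process = subprocess.Popen(["sleep", "300"])
        self.sub_process.wait()

    def on_kill(self):
        # Terminate the child so no ghost process outlives the task instance.
        if getattr(self, "sub_process", None) is not None:
            self.sub_process.terminate()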
