airflow.contrib.operators.qubole_operator

Module Contents

class airflow.contrib.operators.qubole_operator.QuboleOperator(qubole_conn_id='qubole_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Execute tasks (commands) on QDS (https://qubole.com).

Parameters

qubole_conn_id (str) – Connection ID that holds the QDS auth_token

kwargs:
command_type

type of command to be executed, e.g. hivecmd, shellcmd, hadoopcmd

tags

array of tags to be assigned to the command

cluster_label

cluster label on which the command will be executed

name

name to be given to the command

notify

whether to send an email on command completion (default is False)
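
A minimal sketch of wiring the operator into a DAG, using the hivecmd arguments described below; the cluster label, query, tag, and connection values are placeholders:

    from airflow import DAG
    from airflow.contrib.operators.qubole_operator import QuboleOperator
    from airflow.utils.dates import days_ago

    dag = DAG(dag_id='qubole_example', start_date=days_ago(1), schedule_interval=None)

    show_tables = QuboleOperator(
        task_id='hive_show_tables',
        command_type='hivecmd',            # type of command to be executed
        query='show tables',               # inline query statement (hivecmd)
        cluster_label='default',           # placeholder cluster label
        tags=['airflow_example_run'],      # tags assigned to the command
        notify=False,
        qubole_conn_id='qubole_default',
        dag=dag,
    )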

Arguments specific to command types

hivecmd:
query

inline query statement

script_location

s3 location containing query statement

sample_size

size of the sample in bytes on which to run the query

macros

macro values to be used in the query
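
Continuing the DAG sketch above, a hivecmd can also point at a script in S3 instead of an inline query; the S3 path and macro values here are hypothetical:

    hive_s3_query = QuboleOperator(
        task_id='hive_s3_script',
        command_type='hivecmd',
        script_location='s3://example-bucket/scripts/show_table.hql',  # hypothetical S3 path
        macros='[{"date": "{{ ds }}"}]',   # macro values used in the query (templated)
        cluster_label='default',           # placeholder cluster label
        dag=dag,                           # 'dag' from the sketch above
    )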

prestocmd:
query

inline query statement

script_location

s3 location containing query statement

macros

macro values to be used in the query
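
A prestocmd takes the same query/script_location/macros arguments; for example (placeholder values, continuing the DAG sketch above):

    presto_query = QuboleOperator(
        task_id='presto_show_tables',
        command_type='prestocmd',
        query='show tables',
        cluster_label='presto-cluster',   # placeholder cluster label
        dag=dag,
    )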

hadoopcmd:
sub_command

must be one of these [“jar”, “s3distcp”, “streaming”] followed by 1 or more args
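
For example, a jar sub_command might look like this (the jar and S3 paths are hypothetical, continuing the DAG sketch above):

    hadoop_jar = QuboleOperator(
        task_id='hadoop_jar_cmd',
        command_type='hadoopcmd',
        sub_command='jar s3://example-bucket/jars/wordcount.jar '
                    's3://example-bucket/input/ s3://example-bucket/output/',
        cluster_label='default',   # placeholder cluster label
        dag=dag,
    )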

shellcmd:
script

inline command with args

script_location

s3 location containing the shell script

files

list of files in an s3 bucket, in file1,file2 format. These files will be copied into the working directory where the Qubole command is being executed.

archives

list of archives in an s3 bucket, in archive1,archive2 format. These will be unarchived into the working directory where the Qubole command is being executed.

parameters

any extra args which need to be passed to the script (only when script_location is supplied)
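
A shellcmd sketch using a script in S3 together with parameters and files (all S3 paths are hypothetical, continuing the DAG sketch above):

    shell_task = QuboleOperator(
        task_id='shell_cmd',
        command_type='shellcmd',
        script_location='s3://example-bucket/scripts/job.sh',   # hypothetical script
        parameters='param1 param2',                             # extra args for the script
        files='s3://example-bucket/files/file1.txt,s3://example-bucket/files/file2.txt',
        cluster_label='default',                                # placeholder cluster label
        dag=dag,
    )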

pigcmd:
script

inline query statement (latin_statements)

script_location

s3 location containing pig query

parameters

any extra args which need to be passed to the script (only when script_location is supplied)
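
A pigcmd sketch, again with a hypothetical S3 script location (continuing the DAG sketch above):

    pig_task = QuboleOperator(
        task_id='pig_cmd',
        command_type='pigcmd',
        script_location='s3://example-bucket/scripts/wordcount.pig',  # hypothetical
        parameters='key1=value1 key2=value2',   # only allowed with script_location
        cluster_label='default',                # placeholder cluster label
        dag=dag,
    )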

sparkcmd:
program

the complete Spark program in Scala, SQL, Command, R, or Python

cmdline

spark-submit command line; all required information must be specified in the cmdline itself.

sql

inline sql query

script_location

s3 location containing query statement

language

language of the program: Scala, SQL, Command, R, or Python

app_id

ID of a Spark job server app

arguments

spark-submit command line arguments

user_program_arguments

arguments that the user program takes in

macros

macro values to be used in the query
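
A sparkcmd sketch running an inline program; the language value and spark-submit arguments shown are assumptions, not an exhaustive reference (continuing the DAG sketch above):

    spark_task = QuboleOperator(
        task_id='spark_cmd',
        command_type='sparkcmd',
        program='print("hello from Spark")',          # complete program text
        language='python',                            # assumed lowercase language name
        arguments='--conf spark.executor.memory=2g',  # spark-submit command line arguments
        cluster_label='spark-cluster',                # placeholder cluster label
        dag=dag,
    )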

dbtapquerycmd:
db_tap_id

data store ID of the target database in Qubole.

query

inline query statement

macros

macro values to be used in the query
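
A dbtapquerycmd sketch; the data store ID is a placeholder (continuing the DAG sketch above):

    db_query = QuboleOperator(
        task_id='db_query',
        command_type='dbtapquerycmd',
        db_tap_id=2064,          # placeholder data store ID in Qubole
        query='show tables',
        dag=dag,
    )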

dbexportcmd:
mode

1 (simple) or 2 (advanced)

hive_table

name of the Hive table

partition_spec

partition specification for Hive table.

dbtap_id

data store ID of the target database in Qubole.

db_table

name of the db table

db_update_mode

allowinsert or updateonly

db_update_keys

columns used to determine the uniqueness of rows

export_dir

HDFS/S3 location from which data will be exported.

fields_terminated_by

hex value of the character used as the column separator in the dataset
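
A dbexportcmd sketch exporting a Hive table in simple mode; the table names, partition spec, and data store ID are placeholders (continuing the DAG sketch above):

    db_export = QuboleOperator(
        task_id='db_export',
        command_type='dbexportcmd',
        mode=1,                              # 1 = simple mode
        hive_table='example_hive_table',     # placeholder Hive table
        db_table='example_db_table',         # placeholder target table
        partition_spec='dt=20200101',        # placeholder partition
        dbtap_id=2064,                       # placeholder data store ID
        dag=dag,
    )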

dbimportcmd:
mode

1 (simple) or 2 (advanced)

hive_table

name of the Hive table

dbtap_id

data store ID of the target database in Qubole.

db_table

name of the db table

where_clause

where clause, if any

parallelism

number of parallel db connections to use for extracting data

extract_query

SQL query to extract data from the db. $CONDITIONS must be part of the WHERE clause.

boundary_query

query used to get the range of row IDs to be extracted

split_column

column used as the row ID to split data into ranges (mode 2)
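
A dbimportcmd sketch importing into a Hive table in simple mode; the names and IDs are placeholders (continuing the DAG sketch above):

    db_import = QuboleOperator(
        task_id='db_import',
        command_type='dbimportcmd',
        mode=1,                              # 1 = simple mode
        hive_table='example_hive_table',     # placeholder Hive table
        db_table='example_db_table',         # placeholder source table
        where_clause='id < 10',              # optional filter
        parallelism=2,                       # parallel db connections
        dbtap_id=2064,                       # placeholder data store ID
        dag=dag,
    )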

template_fields = ['query', 'script_location', 'sub_command', 'script', 'files', 'archives', 'program', 'cmdline', 'sql', 'where_clause', 'tags', 'extract_query', 'boundary_query', 'macros', 'name', 'parameters', 'dbtap_id', 'hive_table', 'db_table', 'split_column', 'note_id', 'db_update_keys', 'export_dir', 'partition_spec', 'qubole_conn_id', 'arguments', 'user_program_arguments', 'cluster_label'][source]
template_ext = ['.txt'][source]
ui_color = '#3064A1'[source]
ui_fgcolor = '#fff'[source]
execute(self, context)[source]
on_kill(self, ti=None)[source]
get_results(self, ti=None, fp=None, inline=True, delim=None, fetch=True)[source]
get_log(self, ti)[source]
get_jobs_id(self, ti)[source]
get_hook(self)[source]
__getattribute__(self, name)[source]
__setattr__(self, name, value)[source]
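
The result-handling helpers can be called on the operator instance from a downstream task. A sketch using a PythonOperator, assuming get_results() writes the command output to a local file and returns its path (as in the shipped example DAG); 'show_tables' and 'dag' are from the sketch at the top of this page:

    from airflow.operators.python_operator import PythonOperator

    def read_results(**kwargs):
        ti = kwargs['ti']
        # Fetch the results of the 'show_tables' task; assumed to return a local file path.
        path = show_tables.get_results(ti)
        with open(path) as f:
            return f.read()

    fetch_results = PythonOperator(
        task_id='fetch_results',
        python_callable=read_results,
        provide_context=True,   # needed on Airflow 1.10 so 'ti' is passed in kwargs
        dag=dag,
    )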