`airflow.contrib.hooks.spark_sql_hook`

Module Contents

class airflow.contrib.hooks.spark_sql_hook.SparkSqlHook(sql, conf=None, conn_id='spark_sql_default', total_executor_cores=None, executor_cores=None, executor_memory=None, keytab=None, principal=None, master='yarn', name='default-name', num_executors=None, verbose=True, yarn_queue='default')

Bases: airflow.hooks.base_hook.BaseHook

This hook is a wrapper around the spark-sql binary. It requires that the “spark-sql” binary is in the PATH.

Parameters

sql (str) – The SQL query to execute
conf (str (format: PROP=VALUE)) – arbitrary Spark configuration property
conn_id (str) – ID of the Airflow connection to use (Default: “spark_sql_default”)
total_executor_cores (int) – (Standalone & Mesos only) Total cores for all executors (Default: all the available cores on the worker)
executor_cores (int) – (Standalone & YARN only) Number of cores per executor (Default: 2)
executor_memory (str) – Memory per executor (e.g. 1000M, 2G) (Default: 1G)
keytab (str) – Full path to the file that contains the keytab
master (str) – spark://host:port, mesos://host:port, yarn, or local
name (str) – Name of the job.
num_executors (int) – Number of executors to launch
verbose (bool) – Whether to pass the verbose flag to spark-sql
yarn_queue (str) – The YARN queue to submit to (Default: “default”)

get_conn(self)

_prepare_command(self, cmd)

Construct the spark-sql command to execute. Verbose output is enabled by default.

Parameters: cmd (str) – command to append to the spark-sql command
Returns: full command to be executed
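
The way constructor arguments can translate into a spark-sql command line is sketched below. This is an illustration only, not the hook's actual implementation: `build_spark_sql_command` is a hypothetical helper, the comma-separated handling of `conf` and the `.sql` file check are assumptions, and only a subset of the parameters is covered. The flags themselves (`--conf`, `--executor-cores`, `--executor-memory`, `--num-executors`, `-e`, `-f`, `--master`, `--name`, `--verbose`, `--queue`) are standard spark-sql / spark-submit options.

```python
def build_spark_sql_command(sql, conf=None, executor_cores=None,
                            executor_memory=None, num_executors=None,
                            master="yarn", name="default-name",
                            verbose=True, yarn_queue="default", extra=None):
    """Hypothetical helper: map hook-style arguments onto spark-sql flags."""
    command = ["spark-sql"]
    if conf:
        # Assumption: several PROP=VALUE pairs may be given comma-separated.
        for prop in conf.split(","):
            command += ["--conf", prop.strip()]
    if executor_cores:
        command += ["--executor-cores", str(executor_cores)]
    if executor_memory:
        command += ["--executor-memory", executor_memory]
    if num_executors:
        command += ["--num-executors", str(num_executors)]
    # Assumption: a value ending in .sql is a script file, anything else is
    # an inline query.
    sql = sql.strip()
    command += ["-f", sql] if sql.endswith(".sql") else ["-e", sql]
    if master:
        command += ["--master", master]
    if name:
        command += ["--name", name]
    if verbose:
        command.append("--verbose")
    if yarn_queue:
        command += ["--queue", yarn_queue]
    # Extra arguments are appended last, mirroring the cmd parameter.
    return command + list(extra or [])


example = build_spark_sql_command(
    "SELECT 1",
    conf="spark.sql.shuffle.partitions=8",
    executor_memory="2G",
)
print(" ".join(example))
```

Building the command as a list of arguments rather than a single string avoids shell quoting problems when the query contains spaces or quotes.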

run_query(self, cmd='', **kwargs)

Execute the Spark SQL query by running the prepared spark-sql command through Popen.

Parameters

cmd – command to remotely execute
kwargs – extra arguments to Popen (see subprocess.Popen)
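
A minimal sketch of running a prepared command with `subprocess.Popen` and forwarding extra keyword arguments follows. `run_command` is a hypothetical helper, not part of the hook, and the error handling shown is an assumption. A short Python one-liner stands in for the spark-sql command so the sketch runs without a Spark installation.

```python
import subprocess
import sys


def run_command(command, **kwargs):
    """Hypothetical helper: run a command with Popen and collect its output."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        universal_newlines=True,   # decode output as text
        **kwargs                   # e.g. cwd=..., env=...
    )
    lines = []
    # Read output line by line until the process closes its stdout.
    for line in process.stdout:
        lines.append(line.rstrip("\n"))
    returncode = process.wait()
    if returncode != 0:
        raise RuntimeError(
            "command {} failed with exit code {}".format(command, returncode)
        )
    return lines


# Stand-in for a spark-sql invocation so the sketch runs anywhere.
output = run_command([sys.executable, "-c", "print('query finished')"])
print(output)
```

Because this sketch sets `stdout`, `stderr`, and `universal_newlines` itself, passing any of them again through `kwargs` raises a `TypeError`.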

kill(self)
