`airflow.providers.apache.spark.hooks.spark_sql`

Module Contents

Classes

SparkSqlHook

This hook is a wrapper around the spark-sql binary.

class airflow.providers.apache.spark.hooks.spark_sql.SparkSqlHook(sql, conf=None, conn_id=default_conn_name, total_executor_cores=None, executor_cores=None, executor_memory=None, keytab=None, principal=None, master=None, name='default-name', num_executors=None, verbose=True, yarn_queue=None)

Bases: airflow.hooks.base.BaseHook

This hook is a wrapper around the spark-sql binary. It requires that the "spark-sql" binary is in the PATH.

Parameters

sql (str) -- The SQL query to execute
conf (Optional[str]) -- Arbitrary Spark configuration property
conn_id (str) -- The ID of the Airflow connection to use
total_executor_cores (Optional[int]) -- (Standalone & Mesos only) Total cores for all executors (Default: all the available cores on the worker)
executor_cores (Optional[int]) -- (Standalone & YARN only) Number of cores per executor (Default: 2)
executor_memory (Optional[str]) -- Memory per executor (e.g. 1000M, 2G) (Default: 1G)
keytab (Optional[str]) -- Full path to the file that contains the keytab
master (Optional[str]) -- spark://host:port, mesos://host:port, yarn, or local (Default: The host and port set in the Connection, or "yarn")
name (str) -- Name of the job.
num_executors (Optional[int]) -- Number of executors to launch
verbose (bool) -- Whether to pass the verbose flag to spark-sql
yarn_queue (Optional[str]) -- The YARN queue to submit to (Default: The queue value set in the Connection, or "default")
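
To illustrate how the parameters above could map onto a spark-sql invocation, here is a minimal, stdlib-only sketch. It is not the hook's actual implementation: the helper `build_spark_sql_command` is hypothetical, the connection lookup is omitted (so `master` and `yarn_queue` simply fall back to the documented "yarn" and "default"), and `conf` is assumed to hold a single `key=value` property. The flag names are the standard spark-sql / spark-submit command-line options.

```python
from typing import List, Optional


def build_spark_sql_command(
    sql: str,
    conf: Optional[str] = None,
    total_executor_cores: Optional[int] = None,
    executor_cores: Optional[int] = None,
    executor_memory: Optional[str] = None,
    keytab: Optional[str] = None,
    principal: Optional[str] = None,
    master: str = "yarn",
    name: str = "default-name",
    num_executors: Optional[int] = None,
    verbose: bool = True,
    yarn_queue: str = "default",
) -> List[str]:
    """Hypothetical helper: assemble a spark-sql argument list."""
    cmd = ["spark-sql"]
    if conf:
        # Assumption: `conf` is one "key=value" Spark property.
        cmd += ["--conf", conf]

    # Optional flags are emitted only when a value was supplied.
    optional_flags = [
        ("--total-executor-cores", total_executor_cores),
        ("--executor-cores", executor_cores),
        ("--executor-memory", executor_memory),
        ("--keytab", keytab),
        ("--principal", principal),
        ("--num-executors", num_executors),
    ]
    for flag, value in optional_flags:
        if value is not None:
            cmd += [flag, str(value)]

    cmd += ["--master", master, "--name", name]
    if verbose:
        cmd.append("--verbose")
    # Simplification: the queue flag is always emitted, whatever the master is.
    cmd += ["--queue", yarn_queue]
    # The query string is passed inline with -e.
    cmd += ["-e", sql]
    return cmd


command = build_spark_sql_command(
    "SELECT 1", master="local", executor_memory="2G", verbose=False
)
print(command)
```

With the provider installed, the hook itself would be constructed with the same keyword arguments, for example `SparkSqlHook(sql="SELECT 1", master="local")`, followed by a call to `run_query()`.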

conn_name_attr = conn_id

default_conn_name = spark_sql_default

conn_type = spark_sql

hook_name = Spark SQL

get_conn(self)

Returns connection for the hook.

run_query(self, cmd='', **kwargs)

Runs the spark-sql command through a remote Popen call, which is what actually executes the query.

Parameters

cmd (str) -- command to append to the spark-sql command
kwargs (Any) -- extra arguments to Popen (see subprocess.Popen)
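
The pattern described here (append an extra command string, then forward keyword arguments to `subprocess.Popen`) can be sketched with the standard library alone. This is an illustration under assumptions, not the hook's code: `run_command` is a hypothetical helper, a Python child process stands in for the spark-sql binary, and raising `RuntimeError` on a non-zero exit code is a choice made for the example.

```python
import os
import shlex
import subprocess
import sys
from typing import Any, List


def run_command(base_cmd: List[str], extra: str = "", **popen_kwargs: Any) -> List[str]:
    """Hypothetical helper: append `extra` to the command, run it, collect output."""
    cmd = base_cmd + shlex.split(extra)
    # stdout/stderr are fixed here, so they must not appear in popen_kwargs.
    process = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        **popen_kwargs,
    )
    lines = []
    for line in process.stdout:
        # Read output line by line while the child is running.
        lines.append(line.rstrip("\n"))
    returncode = process.wait()
    if returncode != 0:
        raise RuntimeError(f"Command {cmd} exited with return code {returncode}")
    return lines


# Stand-in for spark-sql: a child Python process that echoes an
# environment variable and its first extra argument.
stand_in = [
    sys.executable,
    "-c",
    "import os, sys; print(os.environ['DEMO_VAR'], sys.argv[1])",
]
output = run_command(stand_in, "hello", env={**os.environ, "DEMO_VAR": "set"})
print(output)
```

Here `env` plays the role of an extra Popen argument passed through `kwargs`; any other `subprocess.Popen` keyword, such as `cwd`, would be forwarded the same way.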

kill(self)

Kills the Spark job.
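
The source does not describe how the job is killed. One plausible pattern, assuming the job runs as a local child process started with `subprocess.Popen`, is to kill it only if it is still alive. The helper `kill_if_running` below is hypothetical, and a sleeping Python process stands in for a running Spark job.

```python
import subprocess
import sys
from typing import Optional


def kill_if_running(process: Optional[subprocess.Popen]) -> bool:
    """Hypothetical helper: kill the child only if it has not exited yet."""
    if process is not None and process.poll() is None:
        process.kill()
        process.wait()  # reap the child so no zombie process is left behind
        return True
    return False


# Stand-in for a long-running job.
sleeper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
killed = kill_if_running(sleeper)        # the child is alive, so it gets killed
killed_again = kill_if_running(sleeper)  # already exited, nothing to do
print(killed, killed_again)
```

Checking `poll()` first makes the call safe to repeat, which matters when cleanup code may run more than once.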
