airflow.providers.apache.spark.hooks.spark_sql

Module Contents

class airflow.providers.apache.spark.hooks.spark_sql.SparkSqlHook(sql: str, conf: Optional[str] = None, conn_id: str = default_conn_name, total_executor_cores: Optional[int] = None, executor_cores: Optional[int] = None, executor_memory: Optional[str] = None, keytab: Optional[str] = None, principal: Optional[str] = None, master: str = 'yarn', name: str = 'default-name', num_executors: Optional[int] = None, verbose: bool = True, yarn_queue: str = 'default')[source]

Bases: airflow.hooks.base.BaseHook

This hook is a wrapper around the spark-sql binary. It requires that the "spark-sql" binary is in the PATH.

Parameters
  • sql (str) -- The SQL query to execute

  • conf (str (format: PROP=VALUE)) -- arbitrary Spark configuration property

  • conn_id (str) -- connection_id string

  • total_executor_cores (int) -- (Standalone & Mesos only) Total cores for all executors (Default: all the available cores on the worker)

  • executor_cores (int) -- (Standalone & YARN only) Number of cores per executor (Default: 2)

  • executor_memory (str) -- Memory per executor (e.g. 1000M, 2G) (Default: 1G)

  • keytab (str) -- Full path to the file that contains the keytab

  • master (str) -- spark://host:port, mesos://host:port, yarn, or local

  • name (str) -- Name of the job.

  • num_executors (int) -- Number of executors to launch

  • verbose (bool) -- Whether to pass the verbose flag to spark-sql

  • yarn_queue (str) -- The YARN queue to submit to (Default: "default")

conn_name_attr = conn_id[source]
default_conn_name = spark_sql_default[source]
conn_type = spark_sql[source]
hook_name = Spark SQL[source]
get_conn(self)[source]
_prepare_command(self, cmd: Union[str, List[str]])[source]

Construct the spark-sql command to execute. Verbose output is enabled as default.

Parameters

cmd (str or list[str]) -- command to append to the spark-sql command

Returns

full command to be executed

run_query(self, cmd: str = '', **kwargs)[source]

Remote Popen (actually execute the Spark-sql query)

Parameters
  • cmd (str or list[str]) -- command to append to the spark-sql command

  • kwargs (dict) -- extra arguments to Popen (see subprocess.Popen)

kill(self)[source]

Kill Spark job

Was this entry helpful?