airflow.providers.apache.spark.operators.spark_sql

Module Contents

Classes

SparkSqlOperator

Execute Spark SQL query.

class airflow.providers.apache.spark.operators.spark_sql.SparkSqlOperator(*, sql, conf=None, conn_id='spark_sql_default', total_executor_cores=None, executor_cores=None, executor_memory=None, keytab=None, principal=None, master=None, name='default-name', num_executors=None, verbose=True, yarn_queue=None, **kwargs)

Bases: airflow.models.BaseOperator

Execute Spark SQL query.

See also

For more information on how to use this operator, take a look at the guide: SparkSqlOperator

Parameters
  • sql (str) – The SQL query to execute. (templated)

  • conf (str | None) – Arbitrary Spark configuration properties as comma-separated key=value pairs, each passed to spark-sql via --conf

  • conn_id (str) – The Airflow connection ID for the Spark connection (Default: "spark_sql_default")

  • total_executor_cores (int | None) – (Standalone & Mesos only) Total cores for all executors (Default: all the available cores on the worker)

  • executor_cores (int | None) – (Standalone & YARN only) Number of cores per executor (Default: 2)

  • executor_memory (str | None) – Memory per executor (e.g. 1000M, 2G) (Default: 1G)

  • keytab (str | None) – Full path to the file that contains the keytab

  • principal (str | None) – The name of the kerberos principal registered in the keytab

  • master (str | None) – spark://host:port, mesos://host:port, yarn, or local (Default: The host and port set in the Connection, or "yarn")

  • name (str) – Name of the job

  • num_executors (int | None) – Number of executors to launch

  • verbose (bool) – Whether to pass the verbose flag to spark-sql

  • yarn_queue (str | None) – The YARN queue to submit to (Default: The queue value set in the Connection, or "default")
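A minimal usage sketch, assuming Airflow 2.x with this provider installed; the DAG id, table name, and job name below are illustrative:

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_sql import SparkSqlOperator

    with DAG(
        dag_id="example_spark_sql",  # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule=None,
        catchup=False,
    ):
        # Run an inline statement through spark-sql on YARN.
        count_rows = SparkSqlOperator(
            task_id="count_rows",
            sql="SELECT COUNT(*) FROM my_table",  # hypothetical table
            conn_id="spark_sql_default",
            master="yarn",
            name="airflow-spark-sql-example",
        )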

template_fields: Sequence[str] = ('sql',)
template_ext: Sequence[str] = ('.sql', '.hql')
template_fields_renderers
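Because sql is a templated field and .sql/.hql are recognized template extensions, the query can also be kept in a separate file that is rendered with Jinja before submission. A brief sketch; the file path and macro are hypothetical:

    from airflow.providers.apache.spark.operators.spark_sql import SparkSqlOperator

    # "queries/daily_agg.sql" is resolved against the DAG's template search
    # path, and its contents may contain Jinja macros such as {{ ds }}.
    daily_agg = SparkSqlOperator(
        task_id="daily_agg",
        sql="queries/daily_agg.sql",
    )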
execute(context)

Call the SparkSqlHook to run the provided SQL query.

on_kill()

Override this method to clean up subprocesses when a task instance gets killed.

Any use of the threading, subprocess or multiprocessing module within an operator needs to be cleaned up, or it will leave ghost processes behind.
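A minimal sketch of that pattern, assuming a custom operator that launches its own subprocess (the operator, command, and attribute names are hypothetical):

    import subprocess

    from airflow.models import BaseOperator

    class LongRunningShellOperator(BaseOperator):
        # Hypothetical operator: keeps a handle to its child process so that
        # on_kill can terminate it instead of leaving a ghost process behind.
        def execute(self, context):
            self._proc = subprocess.Popen(["sleep", "3600"])
            self._proc.wait()

        def on_kill(self):
            proc = getattr(self, "_proc", None)
            if proc is not None and proc.poll() is None:
                proc.terminate()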
