airflow.providers.apache.spark.operators.spark_sql

Module Contents

Classes

SparkSqlOperator

Execute Spark SQL query

class airflow.providers.apache.spark.operators.spark_sql.SparkSqlOperator(*, sql, conf=None, conn_id='spark_sql_default', total_executor_cores=None, executor_cores=None, executor_memory=None, keytab=None, principal=None, master=None, name='default-name', num_executors=None, verbose=True, yarn_queue=None, **kwargs)[source]

Bases: airflow.models.BaseOperator

Execute Spark SQL query

See also

For more information on how to use this operator, take a look at the guide: SparkSqlOperator

Parameters
  • sql (str) -- The SQL query to execute. (templated)

  • conf (Optional[str]) -- arbitrary Spark configuration property

  • conn_id (str) -- connection_id string

  • total_executor_cores (Optional[int]) -- (Standalone & Mesos only) Total cores for all executors (Default: all the available cores on the worker)

  • executor_cores (Optional[int]) -- (Standalone & YARN only) Number of cores per executor (Default: 2)

  • executor_memory (Optional[str]) -- Memory per executor (e.g. 1000M, 2G) (Default: 1G)

  • keytab (Optional[str]) -- Full path to the file that contains the keytab

  • master (Optional[str]) -- spark://host:port, mesos://host:port, yarn, or local (Default: The host and port set in the Connection, or "yarn")

  • name (str) -- Name of the job

  • num_executors (Optional[int]) -- Number of executors to launch

  • verbose (bool) -- Whether to pass the verbose flag to spark-sql

  • yarn_queue (Optional[str]) -- The YARN queue to submit to (Default: The queue value set in the Connection, or "default")

template_fields :Sequence[str] = ['_sql'][source]
template_ext :Sequence[str] = ['.sql', '.hql'][source]
template_fields_renderers[source]
execute(self, context)[source]

Call the SparkSqlHook to run the provided sql query

on_kill(self)[source]

Override this method to cleanup subprocesses when a task instance gets killed. Any use of the threading, subprocess or multiprocessing module within an operator needs to be cleaned up or it will leave ghost processes behind.

Was this entry helpful?