airflow.providers.apache.spark.operators.spark_sql

Module Contents

class airflow.providers.apache.spark.operators.spark_sql.SparkSqlOperator(*, sql: str, conf: Optional[str] = None, conn_id: str = 'spark_sql_default', total_executor_cores: Optional[int] = None, executor_cores: Optional[int] = None, executor_memory: Optional[str] = None, keytab: Optional[str] = None, principal: Optional[str] = None, master: str = 'yarn', name: str = 'default-name', num_executors: Optional[int] = None, verbose: bool = True, yarn_queue: str = 'default', **kwargs)

Bases: airflow.models.BaseOperator

Execute a Spark SQL query.

See also

For more information on how to use this operator, take a look at the guide: SparkSqlOperator

Parameters
  • sql (str) – The SQL query to execute. (templated)

  • conf (str, format: PROP=VALUE) – An arbitrary Spark configuration property

  • conn_id (str) – The connection ID to use (default: spark_sql_default)

  • total_executor_cores (int) – (Standalone & Mesos only) Total cores for all executors (Default: all the available cores on the worker)

  • executor_cores (int) – (Standalone & YARN only) Number of cores per executor (Default: 2)

  • executor_memory (str) – Memory per executor (e.g. 1000M, 2G) (Default: 1G)

  • keytab (str) – Full path to the file that contains the keytab

  • principal (str) – The name of the kerberos principal registered in the keytab

  • master (str) – spark://host:port, mesos://host:port, yarn, or local

  • name (str) – Name of the job

  • num_executors (int) – Number of executors to launch

  • verbose (bool) – Whether to pass the verbose flag to spark-sql

  • yarn_queue (str) – The YARN queue to submit to (Default: “default”)
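
A minimal usage sketch, assuming a standard Airflow 2 DAG file; the DAG id, table name, and resource settings below are illustrative, not prescribed by the operator:

from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_sql import SparkSqlOperator

with DAG(
    dag_id="example_spark_sql",  # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Run an ad-hoc query on YARN; the query and resource
    # settings are illustrative only.
    count_rows = SparkSqlOperator(
        task_id="count_rows",
        sql="SELECT COUNT(*) FROM my_table",  # hypothetical table
        master="yarn",
        executor_cores=2,
        executor_memory="2G",
        verbose=False,
    )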

template_fields = ['_sql']
template_ext = ['.sql', '.hql']
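
Because sql is a templated field and files ending in .sql or .hql are resolved as templates, the query can live in a separate file and be rendered with Jinja. A short sketch; the file path and its contents are assumptions:

from airflow.providers.apache.spark.operators.spark_sql import SparkSqlOperator

# Inside a ``with DAG(...)`` block, as in the sketch above. ``sql`` may
# point at a file (resolved against the DAG's template search path)
# thanks to template_ext; the path below is hypothetical.
templated = SparkSqlOperator(
    task_id="run_templated_query",
    sql="queries/daily_count.sql",
    # The file itself can use Jinja, e.g.:
    #   SELECT COUNT(*) FROM events WHERE ds = '{{ ds }}'
)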
execute(self, context: Dict[str, Any])

Call the SparkSqlHook to run the provided SQL query.

on_kill(self)
_get_hook(self)

Return a SparkSqlHook configured from the operator's arguments.
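
For reference, execute() is roughly equivalent to building and running the hook directly. A sketch assuming the SparkSqlHook shipped in this provider package; the query and connection values are illustrative:

from airflow.providers.apache.spark.hooks.spark_sql import SparkSqlHook

hook = SparkSqlHook(
    sql="SELECT 1",               # illustrative query
    conn_id="spark_sql_default",
    master="yarn",
)
hook.run_query()  # builds and runs the spark-sql command line
# on_kill() delegates to hook.kill(), which terminates the running process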
