This module contains the Apache Livy operator.

Module Contents




class airflow.providers.apache.livy.operators.livy.LivyOperator(*, file, class_name=None, args=None, conf=None, jars=None, py_files=None, files=None, driver_memory=None, driver_cores=None, executor_memory=None, executor_cores=None, num_executors=None, archives=None, queue=None, name=None, proxy_user=None, livy_conn_id='livy_default', polling_interval=0, extra_options=None, extra_headers=None, retry_args=None, **kwargs)[source]

Bases: airflow.models.BaseOperator

This operator wraps the Apache Livy batch REST API, allowing you to submit a Spark application to the underlying cluster.

Parameters

  • file (str) – path of the file containing the application to execute (required).

  • class_name (Optional[str]) – name of the application Java/Spark main class.

  • args (Optional[Sequence[Union[str, int, float]]]) – application command line arguments.

  • jars (Optional[Sequence[str]]) – jars to be used in this session.

  • py_files (Optional[Sequence[str]]) – python files to be used in this session.

  • files (Optional[Sequence[str]]) – files to be used in this session.

  • driver_memory (Optional[str]) – amount of memory to use for the driver process.

  • driver_cores (Optional[Union[int, str]]) – number of cores to use for the driver process.

  • executor_memory (Optional[str]) – amount of memory to use per executor process.

  • executor_cores (Optional[Union[int, str]]) – number of cores to use for each executor.

  • num_executors (Optional[Union[int, str]]) – number of executors to launch for this session.

  • archives (Optional[Sequence[str]]) – archives to be used in this session.

  • queue (Optional[str]) – name of the YARN queue to which the application is submitted.

  • name (Optional[str]) – name of this session.

  • conf (Optional[Dict[Any, Any]]) – Spark configuration properties.

  • proxy_user (Optional[str]) – user to impersonate when running the job.

  • livy_conn_id (str) – reference to a pre-defined Livy Connection.

  • polling_interval (int) – time in seconds between polling for job completion. Don’t poll for values <= 0.

  • extra_options (Optional[Dict[str, Any]]) – A dictionary of options, where key is string and value depends on the option that’s being modified.

  • extra_headers (Optional[Dict[str, Any]]) – A dictionary of headers passed to the HTTP request to livy.

  • retry_args (Optional[Dict[str, Any]]) – Arguments which define the retry behaviour. See the Tenacity documentation for details.
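As an illustration of how these parameters travel to the cluster, the sketch below assembles the kind of JSON body that the Livy batch REST API (`POST /batches`) expects. The field names (`className`, `pyFiles`, `driverMemory`, …) follow the Livy REST specification; the helper function itself is hypothetical and not part of this provider.

```python
# Hypothetical helper: builds a POST /batches payload from
# operator-style keyword arguments. Field names follow the Livy
# batch REST API; None values are dropped, since Livy expects
# only the keys that are actually set.
def build_batch_payload(file, class_name=None, args=None, conf=None,
                        jars=None, py_files=None, files=None,
                        driver_memory=None, executor_memory=None,
                        num_executors=None, name=None, proxy_user=None):
    payload = {
        "file": file,                  # required: application to execute
        "className": class_name,       # Java/Spark main class
        "args": args,                  # command line arguments
        "conf": conf,                  # Spark configuration properties
        "jars": jars,
        "pyFiles": py_files,
        "files": files,
        "driverMemory": driver_memory,
        "executorMemory": executor_memory,
        "numExecutors": num_executors,
        "name": name,
        "proxyUser": proxy_user,
    }
    return {k: v for k, v in payload.items() if v is not None}

payload = build_batch_payload(
    file="hdfs://path/to/app.jar",
    class_name="org.example.SparkPi",
    args=["100"],
    executor_memory="2g",
    num_executors=4,
)
```

Only the keys explicitly set appear in the resulting payload, mirroring how optional operator arguments left as None are omitted from the request.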

template_fields: Sequence[str] = ['spark_params'][source]

get_hook(self)[source]

Get valid hook.

Return type

airflow.providers.apache.livy.hooks.livy.LivyHook
execute(self, context)[source]

This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

poll_for_termination(self, batch_id)[source]

Poll Livy for batch termination.


Parameters

batch_id (Union[int, str]) – id of the batch session to monitor.
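The polling behaviour can be pictured with a minimal sketch (a simplified stand-in, not the provider's actual implementation): the batch state is fetched repeatedly, sleeping `polling_interval` seconds between checks, until a terminal state is reached.

```python
import time

# Terminal batch states per the Livy REST API.
TERMINAL_STATES = {"success", "dead", "killed", "error"}

def poll_for_termination(get_state, batch_id, polling_interval):
    """Simplified stand-in for the operator's polling loop:
    call get_state(batch_id) until a terminal state appears."""
    state = get_state(batch_id)
    while state not in TERMINAL_STATES:
        time.sleep(polling_interval)
        state = get_state(batch_id)
    return state
```

In the real operator, `get_state` corresponds to a Livy hook call that queries the batch session over HTTP; here it is any callable mapping a batch id to a state string.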


on_kill(self)[source]

Override this method to cleanup subprocesses when a task instance gets killed. Any use of the threading, subprocess or multiprocessing module within an operator needs to be cleaned up or it will leave ghost processes behind.


kill(self)[source]

Delete the current batch session.
