airflow.providers.apache.livy.operators.livy

This module contains the Apache Livy operator.

Module Contents

Classes

LivyOperator

This operator wraps the Apache Livy batch REST API, allowing you to submit a Spark application to the underlying cluster.

class airflow.providers.apache.livy.operators.livy.LivyOperator(*, file, class_name=None, args=None, conf=None, jars=None, py_files=None, files=None, driver_memory=None, driver_cores=None, executor_memory=None, executor_cores=None, num_executors=None, archives=None, queue=None, name=None, proxy_user=None, livy_conn_id='livy_default', polling_interval=0, extra_options=None, extra_headers=None, retry_args=None, **kwargs)

Bases: airflow.models.BaseOperator

This operator wraps the Apache Livy batch REST API, allowing you to submit a Spark application to the underlying cluster. A usage sketch follows the parameter list below.

Parameters
  • file (str) -- path of the file containing the application to execute (required).

  • class_name (Optional[str]) -- name of the application Java/Spark main class.

  • args (Optional[Sequence[Union[str, int, float]]]) -- application command line arguments.

  • jars (Optional[Sequence[str]]) -- jars to be used in this session.

  • py_files (Optional[Sequence[str]]) -- python files to be used in this session.

  • files (Optional[Sequence[str]]) -- files to be used in this session.

  • driver_memory (Optional[str]) -- amount of memory to use for the driver process.

  • driver_cores (Optional[Union[int, str]]) -- number of cores to use for the driver process.

  • executor_memory (Optional[str]) -- amount of memory to use per executor process.

  • executor_cores (Optional[Union[int, str]]) -- number of cores to use for each executor.

  • num_executors (Optional[Union[int, str]]) -- number of executors to launch for this session.

  • archives (Optional[Sequence[str]]) -- archives to be used in this session.

  • queue (Optional[str]) -- name of the YARN queue to which the application is submitted.

  • name (Optional[str]) -- name of this session.

  • conf (Optional[Dict[Any, Any]]) -- Spark configuration properties.

  • proxy_user (Optional[str]) -- user to impersonate when running the job.

  • livy_conn_id (str) -- reference to a pre-defined Livy Connection.

  • polling_interval (int) -- time in seconds between polling for job completion. Don't poll for values <= 0.

  • extra_options (Optional[Dict[str, Any]]) -- A dictionary of options, where key is string and value depends on the option that's being modified.

  • extra_headers (Optional[Dict[str, Any]]) -- A dictionary of headers passed to the HTTP request to livy.

  • retry_args (Optional[Dict[str, Any]]) -- Arguments which define the retry behaviour. See Tenacity documentation at https://github.com/jd/tenacity
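
A minimal usage sketch (the DAG id, application path, main class, arguments, and Spark settings below are placeholder values, not part of this reference):

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.apache.livy.operators.livy import LivyOperator

    with DAG(
        dag_id="example_livy",                  # hypothetical DAG id
        start_date=datetime(2022, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        submit_app = LivyOperator(
            task_id="submit_app",
            file="/path/to/spark-app.jar",      # placeholder application path (required)
            class_name="com.example.SparkApp",  # placeholder main class
            args=["--input", "/data/in"],       # placeholder command line arguments
            num_executors=2,
            executor_memory="2g",
            polling_interval=30,                # check the batch state every 30 seconds
            livy_conn_id="livy_default",
        )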

template_fields: Sequence[str] = ['spark_params']
get_hook(self)

Get a valid hook.

Returns

hook

Return type

LivyHook
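
The returned hook can be used to talk to the Livy batch REST API directly; a sketch under the assumption that LivyHook exposes get_batch_state (the file path and batch id are placeholders):

    from airflow.providers.apache.livy.operators.livy import LivyOperator

    op = LivyOperator(task_id="submit", file="/path/to/app.jar")  # placeholder values
    hook = op.get_hook()
    # Query the state of a previously submitted batch session (id 42 is hypothetical).
    state = hook.get_batch_state(42)
    print(state)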

execute(self, context)

This is the main method to derive when creating an operator. Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.

poll_for_termination(self, batch_id)

Poll Livy for batch termination. A usage sketch follows the parameter below.

Parameters

batch_id (Union[int, str]) -- id of the batch session to monitor.
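
Since execute is assumed here to return the id of the submitted batch session, a subclass can combine submission with an explicit wait (a hedged sketch; BlockingLivyOperator is hypothetical, not part of this provider):

    from airflow.providers.apache.livy.operators.livy import LivyOperator

    class BlockingLivyOperator(LivyOperator):  # hypothetical subclass for illustration
        """Submit the batch, then always wait for Livy to report termination."""

        def execute(self, context):
            # execute() is assumed to return the id of the submitted batch session.
            batch_id = super().execute(context)
            # Block until the batch finishes; a positive polling_interval set on the
            # operator controls the sleep between state checks.
            self.poll_for_termination(batch_id)
            return batch_id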

on_kill(self)

Override this method to clean up subprocesses when a task instance gets killed. Any use of the threading, subprocess or multiprocessing module within an operator needs to be cleaned up, or it will leave ghost processes behind.

kill(self)

Delete the current batch session.
