airflow.providers.apache.livy.operators.livy

This module contains the Apache Livy operator.

Module Contents

class airflow.providers.apache.livy.operators.livy.LivyOperator(*, file: str, class_name: Optional[str] = None, args: Optional[Sequence[Union[str, int, float]]] = None, conf: Optional[Dict[Any, Any]] = None, jars: Optional[Sequence[str]] = None, py_files: Optional[Sequence[str]] = None, files: Optional[Sequence[str]] = None, driver_memory: Optional[str] = None, driver_cores: Optional[Union[int, str]] = None, executor_memory: Optional[str] = None, executor_cores: Optional[Union[int, str]] = None, num_executors: Optional[Union[int, str]] = None, archives: Optional[Sequence[str]] = None, queue: Optional[str] = None, name: Optional[str] = None, proxy_user: Optional[str] = None, livy_conn_id: str = 'livy_default', polling_interval: int = 0, extra_options: Optional[Dict[str, Any]] = None, **kwargs)

Bases: airflow.models.BaseOperator

This operator wraps the Apache Livy batch REST API, allowing you to submit a Spark application to the underlying cluster.

Parameters
  • file (str) -- path of the file containing the application to execute (required).

  • class_name (str) -- name of the application Java/Spark main class.

  • args (list) -- application command line arguments.

  • jars (list) -- jars to be used in this session.

  • py_files (list) -- python files to be used in this session.

  • files (list) -- files to be used in this session.

  • driver_memory (str) -- amount of memory to use for the driver process.

  • driver_cores (str, int) -- number of cores to use for the driver process.

  • executor_memory (str) -- amount of memory to use per executor process.

  • executor_cores (str, int) -- number of cores to use for each executor.

  • num_executors (str, int) -- number of executors to launch for this session.

  • archives (list) -- archives to be used in this session.

  • queue (str) -- name of the YARN queue to which the application is submitted.

  • name (str) -- name of this session.

  • conf (dict) -- Spark configuration properties.

  • proxy_user (str) -- user to impersonate when running the job.

  • livy_conn_id (str) -- reference to a pre-defined Livy Connection.

  • polling_interval (int) -- time in seconds between polling for job completion. Don't poll for values <= 0.
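
A minimal usage sketch follows; the DAG id, file path, application class, and connection name are illustrative placeholders rather than values required by the operator:

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.apache.livy.operators.livy import LivyOperator

    with DAG(
        dag_id="example_livy",
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        # Submit a Spark application as a Livy batch and wait for it to finish.
        submit_spark_app = LivyOperator(
            task_id="submit_spark_app",
            file="hdfs:///apps/spark-examples.jar",           # application to execute (required)
            class_name="org.apache.spark.examples.SparkPi",   # Java/Spark main class
            args=[10],
            num_executors=2,
            executor_memory="1g",
            livy_conn_id="livy_default",
            polling_interval=30,  # poll every 30 seconds until the batch terminates
        )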

template_fields = ['spark_params']
get_hook(self)

Get valid hook.

Returns

hook

Return type

LivyHook
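
get_hook() returns a LivyHook configured from livy_conn_id, so the same connection can be reused for ad-hoc calls against the Livy REST API. A hedged sketch, assuming an operator instance such as submit_spark_app from the example above and an already known batch id:

    # Illustrative only: obtain the hook and query the state of an existing batch.
    hook = submit_spark_app.get_hook()       # LivyHook built from livy_conn_id
    state = hook.get_batch_state(batch_id)   # batch_id assumed to be known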

execute(self, context: Dict[Any, Any])
poll_for_termination(self, batch_id: Union[int, str])

Poll Livy for batch termination.

Parameters

batch_id (int, str) -- id of the batch session to monitor.
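
The operator polls only when polling_interval is greater than zero. A conceptual sketch of such a loop, assuming the LivyHook helpers get_batch_state and TERMINAL_STATES; the actual polling is done by the operator itself:

    import time

    from airflow.providers.apache.livy.hooks.livy import LivyHook

    def wait_for_batch(hook: LivyHook, batch_id, polling_interval: int = 30):
        # Poll Livy until the batch reaches a terminal state (success, dead, killed or error).
        state = hook.get_batch_state(batch_id)
        while state not in hook.TERMINAL_STATES:
            time.sleep(polling_interval)
            state = hook.get_batch_state(batch_id)
        return state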

on_kill(self)
kill(self)

Delete the current batch session.
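
kill() removes the current batch from Livy; on_kill() is the task-teardown entry point that triggers it. The equivalent effect through the hook, shown only as an illustration and assuming the batch id is at hand:

    # Illustrative only: delete a batch session directly through the hook.
    hook.delete_batch(batch_id)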
