airflow.providers.apache.livy.operators.livy
¶
This module contains the Apache Livy operator.
Module Contents¶
Classes¶
Wraps the Apache Livy batch REST API, allowing to submit a Spark application to the underlying cluster. |
- class airflow.providers.apache.livy.operators.livy.LivyOperator(*, file, class_name=None, args=None, conf=None, jars=None, py_files=None, files=None, driver_memory=None, driver_cores=None, executor_memory=None, executor_cores=None, num_executors=None, archives=None, queue=None, name=None, proxy_user=None, livy_conn_id='livy_default', livy_conn_auth_type=None, polling_interval=0, extra_options=None, extra_headers=None, retry_args=None, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶
Bases:
airflow.models.BaseOperator
Wraps the Apache Livy batch REST API, allowing to submit a Spark application to the underlying cluster.
- Parameters
file (str) – path of the file containing the application to execute (required). (templated)
class_name (str | None) – name of the application Java/Spark main class. (templated)
args (collections.abc.Sequence[str | int | float] | None) – application command line arguments. (templated)
jars (collections.abc.Sequence[str] | None) – jars to be used in this sessions. (templated)
py_files (collections.abc.Sequence[str] | None) – python files to be used in this session. (templated)
files (collections.abc.Sequence[str] | None) – files to be used in this session. (templated)
driver_memory (str | None) – amount of memory to use for the driver process. (templated)
driver_cores (int | str | None) – number of cores to use for the driver process. (templated)
executor_memory (str | None) – amount of memory to use per executor process. (templated)
executor_cores (int | str | None) – number of cores to use for each executor. (templated)
num_executors (int | str | None) – number of executors to launch for this session. (templated)
archives (collections.abc.Sequence[str] | None) – archives to be used in this session. (templated)
queue (str | None) – name of the YARN queue to which the application is submitted. (templated)
name (str | None) – name of this session. (templated)
conf (dict[Any, Any] | None) – Spark configuration properties. (templated)
proxy_user (str | None) – user to impersonate when running the job. (templated)
livy_conn_id (str) – reference to a pre-defined Livy Connection.
livy_conn_auth_type (Any | None) – The auth type for the Livy Connection.
polling_interval (int) – time in seconds between polling for job completion. Don’t poll for values <= 0
extra_options (dict[str, Any] | None) – A dictionary of options, where key is string and value depends on the option that’s being modified.
extra_headers (dict[str, Any] | None) – A dictionary of headers passed to the HTTP request to livy.
retry_args (dict[str, Any] | None) – Arguments which define the retry behaviour. See Tenacity documentation at https://github.com/jd/tenacity
deferrable (bool) – Run operator in the deferrable mode
- template_fields: collections.abc.Sequence[str] = ('spark_params',)[source]¶
- execute(context)[source]¶
Derive when creating an operator.
Context is the same dictionary used as when rendering jinja templates.
Refer to get_template_context for more context.