airflow.providers.apache.livy.hooks.livy

This module contains the Apache Livy hook.

Module Contents

Classes

BatchState

Batch session states

LivyHook

Hook for Apache Livy through the REST API.

LivyAsyncHook

Hook for Apache Livy through the REST API asynchronously

class airflow.providers.apache.livy.hooks.livy.BatchState[source]

Bases: enum.Enum

Batch session states

NOT_STARTED = 'not_started'[source]
STARTING = 'starting'[source]
RUNNING = 'running'[source]
IDLE = 'idle'[source]
BUSY = 'busy'[source]
SHUTTING_DOWN = 'shutting_down'[source]
ERROR = 'error'[source]
DEAD = 'dead'[source]
KILLED = 'killed'[source]
SUCCESS = 'success'[source]
class airflow.providers.apache.livy.hooks.livy.LivyHook(livy_conn_id=default_conn_name, extra_options=None, extra_headers=None, auth_type=None)[source]

Bases: airflow.providers.http.hooks.http.HttpHook, airflow.utils.log.logging_mixin.LoggingMixin

Hook for Apache Livy through the REST API.

Parameters
  • livy_conn_id (str) – reference to a pre-defined Livy Connection.

  • extra_options (dict[str, Any] | None) – A dictionary of options passed to Livy.

  • extra_headers (dict[str, Any] | None) – A dictionary of headers passed to the HTTP request to livy.

  • auth_type (Any | None) – The auth type for the service.

See also

For more details refer to the Apache Livy API reference: https://livy.apache.org/docs/latest/rest-api.html

TERMINAL_STATES[source]
conn_name_attr = 'livy_conn_id'[source]
default_conn_name = 'livy_default'[source]
conn_type = 'livy'[source]
hook_name = 'Apache Livy'[source]
get_conn(headers=None)[source]

Returns http session for use with requests

Parameters

headers (dict[str, Any] | None) – additional headers to be passed through as a dictionary

Returns

requests session

Return type

Any

run_method(endpoint, method='GET', data=None, headers=None, retry_args=None)[source]

Wrapper for HttpHook, allows to change method on the same HttpHook

Parameters
  • method (str) – http method

  • endpoint (str) – endpoint

  • data (Any | None) – request payload

  • headers (dict[str, Any] | None) – headers

  • retry_args (dict[str, Any] | None) – Arguments which define the retry behaviour. See Tenacity documentation at https://github.com/jd/tenacity

Returns

http response

Return type

Any

post_batch(*args, **kwargs)[source]

Perform request to submit batch

Returns

batch session id

Return type

int

get_batch(session_id)[source]

Fetch info about the specified batch

Parameters

session_id (int | str) – identifier of the batch sessions

Returns

response body

Return type

dict

get_batch_state(session_id, retry_args=None)[source]

Fetch the state of the specified batch

Parameters
Returns

batch state

Return type

BatchState

delete_batch(session_id)[source]

Delete the specified batch

Parameters

session_id (int | str) – identifier of the batch sessions

Returns

response body

Return type

dict

get_batch_logs(session_id, log_start_position, log_batch_size)[source]

Gets the session logs for a specified batch. :param session_id: identifier of the batch sessions :param log_start_position: Position from where to pull the logs :param log_batch_size: Number of lines to pull in one batch

Returns

response body

Return type

dict

dump_batch_logs(session_id)[source]

Dumps the session logs for a specified batch

Parameters

session_id (int | str) – identifier of the batch sessions

Returns

response body

Return type

None

static build_post_batch_body(file, args=None, class_name=None, jars=None, py_files=None, files=None, archives=None, name=None, driver_memory=None, driver_cores=None, executor_memory=None, executor_cores=None, num_executors=None, queue=None, proxy_user=None, conf=None)[source]

Build the post batch request body.

See also

For more information about the format refer to https://livy.apache.org/docs/latest/rest-api.html

Parameters
  • file (str) – Path of the file containing the application to execute (required).

  • proxy_user (str | None) – User to impersonate when running the job.

  • class_name (str | None) – Application Java/Spark main class string.

  • args (Sequence[str | int | float] | None) – Command line arguments for the application s.

  • jars (list[str] | None) – jars to be used in this sessions.

  • py_files (list[str] | None) – Python files to be used in this session.

  • files (list[str] | None) – files to be used in this session.

  • driver_memory (str | None) – Amount of memory to use for the driver process string.

  • driver_cores (int | str | None) – Number of cores to use for the driver process int.

  • executor_memory (str | None) – Amount of memory to use per executor process string.

  • executor_cores (int | None) – Number of cores to use for each executor int.

  • num_executors (int | str | None) – Number of executors to launch for this session int.

  • archives (list[str] | None) – Archives to be used in this session.

  • queue (str | None) – The name of the YARN queue to which submitted string.

  • name (str | None) – The name of this session string.

  • conf (dict[Any, Any] | None) – Spark configuration properties.

Returns

request body

Return type

dict

class airflow.providers.apache.livy.hooks.livy.LivyAsyncHook(livy_conn_id=default_conn_name, extra_options=None, extra_headers=None)[source]

Bases: airflow.providers.http.hooks.http.HttpAsyncHook, airflow.utils.log.logging_mixin.LoggingMixin

Hook for Apache Livy through the REST API asynchronously

Parameters
  • livy_conn_id (str) – reference to a pre-defined Livy Connection.

  • extra_options (dict[str, Any] | None) – A dictionary of options passed to Livy.

  • extra_headers (dict[str, Any] | None) – A dictionary of headers passed to the HTTP request to livy.

See also

For more details refer to the Apache Livy API reference: https://livy.apache.org/docs/latest/rest-api.html

TERMINAL_STATES[source]
conn_name_attr = 'livy_conn_id'[source]
default_conn_name = 'livy_default'[source]
conn_type = 'livy'[source]
hook_name = 'Apache Livy'[source]
async run_method(endpoint, method='GET', data=None, headers=None)[source]

Wrapper for HttpAsyncHook, allows to change method on the same HttpAsyncHook

Parameters
  • method (str) – http method

  • endpoint (str) – endpoint

  • data (Any | None) – request payload

  • headers (dict[str, Any] | None) – headers

Returns

http response

Return type

Any

async get_batch_state(session_id)[source]

Fetch the state of the specified batch asynchronously.

Parameters

session_id (int | str) – identifier of the batch sessions

Returns

batch state

Return type

Any

async get_batch_logs(session_id, log_start_position, log_batch_size)[source]

Gets the session logs for a specified batch asynchronously.

Parameters
  • session_id (int | str) – identifier of the batch sessions

  • log_start_position (int) – Position from where to pull the logs

  • log_batch_size (int) – Number of lines to pull in one batch

Returns

response body

Return type

Any

async dump_batch_logs(session_id)[source]

Dumps the session logs for a specified batch asynchronously

Parameters

session_id (int | str) – identifier of the batch sessions

Returns

response body

Return type

Any

static build_post_batch_body(file, args=None, class_name=None, jars=None, py_files=None, files=None, archives=None, name=None, driver_memory=None, driver_cores=None, executor_memory=None, executor_cores=None, num_executors=None, queue=None, proxy_user=None, conf=None)[source]

Build the post batch request body.

Parameters
  • file (str) – Path of the file containing the application to execute (required).

  • proxy_user (str | None) – User to impersonate when running the job.

  • class_name (str | None) – Application Java/Spark main class string.

  • args (Sequence[str | int | float] | None) – Command line arguments for the application s.

  • jars (list[str] | None) – jars to be used in this sessions.

  • py_files (list[str] | None) – Python files to be used in this session.

  • files (list[str] | None) – files to be used in this session.

  • driver_memory (str | None) – Amount of memory to use for the driver process string.

  • driver_cores (int | str | None) – Number of cores to use for the driver process int.

  • executor_memory (str | None) – Amount of memory to use per executor process string.

  • executor_cores (int | None) – Number of cores to use for each executor int.

  • num_executors (int | str | None) – Number of executors to launch for this session int.

  • archives (list[str] | None) – Archives to be used in this session.

  • queue (str | None) – The name of the YARN queue to which submitted string.

  • name (str | None) – The name of this session string.

  • conf (dict[Any, Any] | None) – Spark configuration properties.

Returns

request body

Return type

dict[str, Any]

Was this entry helpful?