airflow.providers.apache.livy.hooks.livy

Classes

BatchState

Batch session states.

LivyHook

Hook for Apache Livy through the REST API.

LivyAsyncHook

Hook for Apache Livy through the REST API asynchronously.

Functions

sanitize_endpoint_prefix(endpoint_prefix)

Ensure that the endpoint prefix is prefixed with a slash.

Module Contents

class airflow.providers.apache.livy.hooks.livy.BatchState[source]

Bases: enum.Enum

Batch session states.

NOT_STARTED = 'not_started'[source]
STARTING = 'starting'[source]
RUNNING = 'running'[source]
IDLE = 'idle'[source]
BUSY = 'busy'[source]
SHUTTING_DOWN = 'shutting_down'[source]
ERROR = 'error'[source]
DEAD = 'dead'[source]
KILLED = 'killed'[source]
SUCCESS = 'success'[source]
airflow.providers.apache.livy.hooks.livy.sanitize_endpoint_prefix(endpoint_prefix)[source]

Ensure that the endpoint prefix is prefixed with a slash.

class airflow.providers.apache.livy.hooks.livy.LivyHook(livy_conn_id=default_conn_name, extra_options=None, extra_headers=None, auth_type=None, endpoint_prefix=None)[source]

Bases: airflow.providers.http.hooks.http.HttpHook

Hook for Apache Livy through the REST API.

Parameters:
  • livy_conn_id (str) – reference to a pre-defined Livy Connection.

  • extra_options (dict[str, Any] | None) – A dictionary of options passed to Livy.

  • extra_headers (dict[str, Any] | None) – A dictionary of headers passed to the HTTP request to livy.

  • auth_type (Any | None) – The auth type for the service.

See also

For more details refer to the Apache Livy API reference: https://livy.apache.org/docs/latest/rest-api.html

TERMINAL_STATES[source]
conn_name_attr = 'livy_conn_id'[source]
default_conn_name = 'livy_default'[source]
conn_type = 'livy'[source]
hook_name = 'Apache Livy'[source]
default_headers[source]
method = 'POST'[source]
http_conn_id = 'livy_default'[source]
extra_headers[source]
extra_options[source]
endpoint_prefix = ''[source]
run_method(endpoint, method='GET', data=None, headers=None, retry_args=None)[source]

Wrap HttpHook; allows to change method on the same HttpHook.

Parameters:
  • method (str) – http method

  • endpoint (str) – endpoint

  • data (Any | None) – request payload

  • headers (dict[str, Any] | None) – headers

  • retry_args (dict[str, Any] | None) – Arguments which define the retry behaviour. See Tenacity documentation at https://github.com/jd/tenacity

Returns:

http response

Return type:

Any

post_batch(*args, **kwargs)[source]

Perform request to submit batch.

Returns:

batch session id

Return type:

int

get_batch(session_id)[source]

Fetch info about the specified batch.

Parameters:

session_id (int | str) – identifier of the batch sessions

Returns:

response body

Return type:

dict

get_batch_state(session_id, retry_args=None)[source]

Fetch the state of the specified batch.

Parameters:
Returns:

batch state

Return type:

BatchState

delete_batch(session_id)[source]

Delete the specified batch.

Parameters:

session_id (int | str) – identifier of the batch sessions

Returns:

response body

Return type:

dict

get_batch_logs(session_id, log_start_position, log_batch_size)[source]

Get the session logs for a specified batch.

Parameters:
  • session_id (int | str) – identifier of the batch sessions

  • log_start_position – Position from where to pull the logs

  • log_batch_size – Number of lines to pull in one batch

Returns:

response body

Return type:

dict

dump_batch_logs(session_id)[source]

Dump the session logs for a specified batch.

Parameters:

session_id (int | str) – identifier of the batch sessions

Returns:

response body

Return type:

None

static build_post_batch_body(file, args=None, class_name=None, jars=None, py_files=None, files=None, archives=None, name=None, driver_memory=None, driver_cores=None, executor_memory=None, executor_cores=None, num_executors=None, queue=None, proxy_user=None, conf=None)[source]

Build the post batch request body.

See also

For more information about the format refer to https://livy.apache.org/docs/latest/rest-api.html

Parameters:
  • file (str) – Path of the file containing the application to execute (required).

  • proxy_user (str | None) – User to impersonate when running the job.

  • class_name (str | None) – Application Java/Spark main class string.

  • args (collections.abc.Sequence[str | int | float] | None) – Command line arguments for the application s.

  • jars (list[str] | None) – jars to be used in this sessions.

  • py_files (list[str] | None) – Python files to be used in this session.

  • files (list[str] | None) – files to be used in this session.

  • driver_memory (str | None) – Amount of memory to use for the driver process string.

  • driver_cores (int | str | None) – Number of cores to use for the driver process int.

  • executor_memory (str | None) – Amount of memory to use per executor process string.

  • executor_cores (int | None) – Number of cores to use for each executor int.

  • num_executors (int | str | None) – Number of executors to launch for this session int.

  • archives (list[str] | None) – Archives to be used in this session.

  • queue (str | None) – The name of the YARN queue to which submitted string.

  • name (str | None) – The name of this session string.

  • conf (dict[Any, Any] | None) – Spark configuration properties.

Returns:

request body

Return type:

dict

class airflow.providers.apache.livy.hooks.livy.LivyAsyncHook(livy_conn_id=default_conn_name, extra_options=None, extra_headers=None, endpoint_prefix=None)[source]

Bases: airflow.providers.http.hooks.http.HttpAsyncHook

Hook for Apache Livy through the REST API asynchronously.

Parameters:
  • livy_conn_id (str) – reference to a pre-defined Livy Connection.

  • extra_options (dict[str, Any] | None) – A dictionary of options passed to Livy.

  • extra_headers (dict[str, Any] | None) – A dictionary of headers passed to the HTTP request to livy.

See also

For more details refer to the Apache Livy API reference: https://livy.apache.org/docs/latest/rest-api.html

TERMINAL_STATES[source]
conn_name_attr = 'livy_conn_id'[source]
default_conn_name = 'livy_default'[source]
conn_type = 'livy'[source]
hook_name = 'Apache Livy'[source]
method = 'POST'[source]
http_conn_id = 'livy_default'[source]
extra_headers[source]
extra_options[source]
endpoint_prefix = ''[source]
async run_method(endpoint, method='GET', data=None, headers=None)[source]

Wrap HttpAsyncHook; allows to change method on the same HttpAsyncHook.

Parameters:
  • method (str) – http method

  • endpoint (str) – endpoint

  • data (Any | None) – request payload

  • headers (dict[str, Any] | None) – headers

Returns:

http response

Return type:

Any

async get_batch_state(session_id)[source]

Fetch the state of the specified batch asynchronously.

Parameters:

session_id (int | str) – identifier of the batch sessions

Returns:

batch state

Return type:

Any

async get_batch_logs(session_id, log_start_position, log_batch_size)[source]

Get the session logs for a specified batch asynchronously.

Parameters:
  • session_id (int | str) – identifier of the batch sessions

  • log_start_position (int) – Position from where to pull the logs

  • log_batch_size (int) – Number of lines to pull in one batch

Returns:

response body

Return type:

Any

async dump_batch_logs(session_id)[source]

Dump the session logs for a specified batch asynchronously.

Parameters:

session_id (int | str) – identifier of the batch sessions

Returns:

response body

Return type:

Any

static build_post_batch_body(file, args=None, class_name=None, jars=None, py_files=None, files=None, archives=None, name=None, driver_memory=None, driver_cores=None, executor_memory=None, executor_cores=None, num_executors=None, queue=None, proxy_user=None, conf=None)[source]

Build the post batch request body.

Parameters:
  • file (str) – Path of the file containing the application to execute (required).

  • proxy_user (str | None) – User to impersonate when running the job.

  • class_name (str | None) – Application Java/Spark main class string.

  • args (collections.abc.Sequence[str | int | float] | None) – Command line arguments for the application s.

  • jars (list[str] | None) – jars to be used in this sessions.

  • py_files (list[str] | None) – Python files to be used in this session.

  • files (list[str] | None) – files to be used in this session.

  • driver_memory (str | None) – Amount of memory to use for the driver process string.

  • driver_cores (int | str | None) – Number of cores to use for the driver process int.

  • executor_memory (str | None) – Amount of memory to use per executor process string.

  • executor_cores (int | None) – Number of cores to use for each executor int.

  • num_executors (int | str | None) – Number of executors to launch for this session int.

  • archives (list[str] | None) – Archives to be used in this session.

  • queue (str | None) – The name of the YARN queue to which submitted string.

  • name (str | None) – The name of this session string.

  • conf (dict[Any, Any] | None) – Spark configuration properties.

Returns:

request body

Return type:

dict[str, Any]

Was this entry helpful?