airflow.providers.amazon.aws.sensors.emr

Module Contents

Classes

EmrBaseSensor

Contains general sensor behavior for EMR.

EmrContainerSensor

Asks for the state of the job run until it reaches a failure state or success state.

EmrJobFlowSensor

Asks for the state of the EMR JobFlow (Cluster) until it reaches any of the target states.

EmrStepSensor

Asks for the state of the step until it reaches any of the target states.

class airflow.providers.amazon.aws.sensors.emr.EmrBaseSensor(*, aws_conn_id='aws_default', **kwargs)[source]

Bases: airflow.sensors.base.BaseSensorOperator

Contains general sensor behavior for EMR.

Subclasses should implement the following methods:
  • get_emr_response()

  • state_from_response()

  • failure_message_from_response()

Subclasses should set target_states and failed_states fields.
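The contract above can be illustrated with a minimal, Airflow-free sketch. SketchSensor, its flat response shape ({"State": ..., "Message": ...}) and the RuntimeError are assumptions made for illustration only; the real base class extends BaseSensorOperator, and its poke() may differ in detail.

```python
from typing import Any, Dict, Iterable, Optional


class SketchSensor:
    """Stand-in for an EmrBaseSensor subclass; not the Airflow class itself."""

    def __init__(self, responses: Iterable[Dict[str, Any]]):
        # Canned responses replace the real boto3 call in this sketch.
        self._responses = iter(responses)
        # Subclasses are expected to set these two fields.
        self.target_states = ["COMPLETED"]
        self.failed_states = ["FAILED"]

    def get_emr_response(self) -> Dict[str, Any]:
        return next(self._responses)

    @staticmethod
    def state_from_response(response: Dict[str, Any]) -> str:
        return response["State"]  # hypothetical response shape

    @staticmethod
    def failure_message_from_response(response: Dict[str, Any]) -> Optional[str]:
        return response.get("Message")

    def poke(self, context: Any = None) -> bool:
        # One poll: fetch the response, read the state, then decide.
        response = self.get_emr_response()
        state = self.state_from_response(response)
        if state in self.target_states:
            return True
        if state in self.failed_states:
            message = self.failure_message_from_response(response)
            raise RuntimeError(f"EMR job failed: {message}")
        return False  # still in progress; the sensor will be poked again
```

Keeping the API call, the state lookup and the failure message in separate methods means a subclass only has to describe how to read its own response, while the polling decision stays in one place.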

Parameters

aws_conn_id (str) -- AWS connection to use

ui_color = #66c3ff[source]
get_hook(self)[source]

Get EmrHook

poke(self, context)[source]

Function that the sensors defined while deriving this class should override.

abstract get_emr_response(self)[source]

Make an API call with boto3 and get response.

Returns

response

Return type

dict[str, Any]

abstract static state_from_response(response)[source]

Get state from response dictionary.

Parameters

response (Dict[str, Any]) -- response from AWS API

Returns

state

Return type

str

abstract static failure_message_from_response(response)[source]

Get failure message from response dictionary.

Parameters

response (Dict[str, Any]) -- response from AWS API

Returns

failure message

Return type

Optional[str]

class airflow.providers.amazon.aws.sensors.emr.EmrContainerSensor(*, virtual_cluster_id, job_id, max_retries=None, aws_conn_id='aws_default', poll_interval=10, **kwargs)[source]

Bases: airflow.sensors.base.BaseSensorOperator

Asks for the state of the job run until it reaches a failure state or success state. If the job run fails, the task will fail.

Parameters
  • job_id (str) -- job_id to check the state of

  • max_retries (Optional[int]) -- Number of times to poll for the job run state before returning the current state, defaults to None

  • aws_conn_id (str) -- aws connection to use, defaults to 'aws_default'

  • poll_interval (int) -- Time in seconds to wait between two consecutive calls to check the job run status, defaults to 10

INTERMEDIATE_STATES = ['PENDING', 'SUBMITTED', 'RUNNING'][source]
FAILURE_STATES = ['FAILED', 'CANCELLED', 'CANCEL_PENDING'][source]
SUCCESS_STATES = ['COMPLETED'][source]
template_fields :Sequence[str] = ['virtual_cluster_id', 'job_id'][source]
template_ext :Sequence[str] = [][source]
ui_color = #66c3ff[source]
poke(self, context)[source]

Function that the sensors defined while deriving this class should override.

hook(self)[source]

Create and return an EmrContainerHook
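As a rough illustration of how the state lists interact with max_retries and poll_interval, the following sketch polls an injected state source. poll_job_state and its get_state and sleep arguments are hypothetical helpers, not part of the provider; the actual polling is delegated to EmrContainerHook and may behave differently.

```python
import time
from typing import Callable, Optional

# State lists copied from the class attributes documented above.
INTERMEDIATE_STATES = ["PENDING", "SUBMITTED", "RUNNING"]
FAILURE_STATES = ["FAILED", "CANCELLED", "CANCEL_PENDING"]
SUCCESS_STATES = ["COMPLETED"]


def poll_job_state(
    get_state: Callable[[], str],
    max_retries: Optional[int] = None,
    poll_interval: float = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a non-intermediate state is seen or max_retries polls were made."""
    attempts = 0
    while True:
        state = get_state()
        attempts += 1
        if state not in INTERMEDIATE_STATES:
            return state  # success or failure: nothing left to wait for
        if max_retries is not None and attempts >= max_retries:
            return state  # give up and report the current state
        sleep(poll_interval)
```

A caller would then compare the returned state against SUCCESS_STATES and FAILURE_STATES to decide whether the task succeeds, fails, or should be poked again.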

class airflow.providers.amazon.aws.sensors.emr.EmrJobFlowSensor(*, job_flow_id, target_states=None, failed_states=None, **kwargs)[source]

Bases: EmrBaseSensor

Asks for the state of the EMR JobFlow (Cluster) until it reaches any of the target states. If the job flow reaches a failed state, the sensor raises an error and the task fails.

With the default target states, the sensor waits for the cluster to be terminated. When target_states is set to ['RUNNING', 'WAITING'], the sensor waits until the job flow is ready (after the 'STARTING' and 'BOOTSTRAPPING' states).
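A usage sketch for the "wait until ready" case might look like the following. The helper name make_cluster_ready_sensor and the default task_id are illustrative assumptions; task_id is the standard operator argument passed through **kwargs, and the sensor would normally be created inside a DAG definition.

```python
READY_STATES = ["RUNNING", "WAITING"]


def make_cluster_ready_sensor(job_flow_id: str, task_id: str = "wait_for_cluster_ready"):
    """Build a sensor that succeeds once the job flow is ready to accept work."""
    # Imported here so this module loads even where the Amazon provider is absent.
    from airflow.providers.amazon.aws.sensors.emr import EmrJobFlowSensor

    return EmrJobFlowSensor(
        task_id=task_id,
        job_flow_id=job_flow_id,
        target_states=READY_STATES,  # overrides the default "wait for termination"
    )
```

Because job_flow_id is a templated field, the value passed in can also be a Jinja expression, for example one that pulls the cluster ID from an upstream task.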

Parameters
  • job_flow_id (str) -- job_flow_id to check the state of

  • target_states (Optional[Iterable[str]]) -- the target states, sensor waits until job flow reaches any of these states

  • failed_states (Optional[Iterable[str]]) -- the failure states, sensor fails when job flow reaches any of these states

template_fields :Sequence[str] = ['job_flow_id', 'target_states', 'failed_states'][source]
template_ext :Sequence[str] = [][source]
get_emr_response(self)[source]

Make an API call with boto3 and get cluster-level details.

Returns

response

Return type

dict[str, Any]

static state_from_response(response)[source]

Get state from response dictionary.

Parameters

response (Dict[str, Any]) -- response from AWS API

Returns

current state of the cluster

Return type

str

static failure_message_from_response(response)[source]

Get failure message from response dictionary.

Parameters

response (Dict[str, Any]) -- response from AWS API

Returns

failure message

Return type

Optional[str]
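To make the two static methods concrete, here is a sketch that reads a response shaped like the one boto3's EMR describe_cluster call returns (Cluster, then Status, then State and StateChangeReason). The function names and the wording of the message are assumptions, not the provider's exact output.

```python
from typing import Any, Dict, Optional


def cluster_state(response: Dict[str, Any]) -> str:
    """Return the current cluster state, e.g. 'RUNNING' or 'TERMINATED'."""
    return response["Cluster"]["Status"]["State"]


def cluster_failure_message(response: Dict[str, Any]) -> Optional[str]:
    """Summarise StateChangeReason if present, otherwise return None."""
    reason = response["Cluster"]["Status"].get("StateChangeReason")
    if not reason:
        return None
    code = reason.get("Code", "No code")
    message = reason.get("Message", "Unknown")
    return f"code: {code}, message: {message}"
```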

class airflow.providers.amazon.aws.sensors.emr.EmrStepSensor(*, job_flow_id, step_id, target_states=None, failed_states=None, **kwargs)[source]

Bases: EmrBaseSensor

Asks for the state of the step until it reaches any of the target states. If the step reaches a failed state, the sensor raises an error and the task fails.

With the default target states, the sensor waits for the step to be completed.

Parameters
  • job_flow_id (str) -- job_flow_id of the cluster that contains the step to check the state of

  • step_id (str) -- step to check the state of

  • target_states (Optional[Iterable[str]]) -- the target states, sensor waits until step reaches any of these states

  • failed_states (Optional[Iterable[str]]) -- the failure states, sensor fails when step reaches any of these states

template_fields :Sequence[str] = ['job_flow_id', 'step_id', 'target_states', 'failed_states'][source]
template_ext :Sequence[str] = [][source]
get_emr_response(self)[source]

Make an API call with boto3 and get details about the cluster step.

Returns

response

Return type

dict[str, Any]

static state_from_response(response)[source]

Get state from response dictionary.

Parameters

response (Dict[str, Any]) -- response from AWS API

Returns

execution state of the cluster step

Return type

str

static failure_message_from_response(response)[source]

Get failure message from response dictionary.

Parameters

response (Dict[str, Any]) -- response from AWS API

Returns

failure message

Return type

Optional[str]
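The step-level counterparts can be sketched the same way, assuming a response shaped like the one boto3's EMR describe_step call returns (Step, then Status, then State and FailureDetails). The function names and message format below are illustrative assumptions.

```python
from typing import Any, Dict, Optional


def step_state(response: Dict[str, Any]) -> str:
    """Return the execution state of the step, e.g. 'PENDING' or 'COMPLETED'."""
    return response["Step"]["Status"]["State"]


def step_failure_message(response: Dict[str, Any]) -> Optional[str]:
    """Summarise FailureDetails if present, otherwise return None."""
    details = response["Step"]["Status"].get("FailureDetails")
    if not details:
        return None
    return (
        f"reason: {details.get('Reason')}, "
        f"message: {details.get('Message')}, "
        f"log file: {details.get('LogFile')}"
    )
```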
