airflow.providers.amazon.aws.triggers.emr

Module Contents

Classes

EmrAddStepsTrigger

Trigger that waits for the steps added to an EMR job flow to finish executing.

EmrCreateJobFlowTrigger

Trigger for EmrCreateJobFlowOperator.

EmrTerminateJobFlowTrigger

Trigger that terminates a running EMR Job Flow.

EmrContainerSensorTrigger

Poll for the status of an EMR container job until it reaches a terminal state.

class airflow.providers.amazon.aws.triggers.emr.EmrAddStepsTrigger(job_flow_id, step_ids, aws_conn_id, max_attempts, poll_interval)[source]

Bases: airflow.triggers.base.BaseTrigger

AWS EMR Add Steps Trigger. The trigger will asynchronously poll the boto3 API and wait for the steps to finish executing.

Parameters
  • job_flow_id – The id of the job flow.

  • step_ids – The ids of the steps being waited upon.

  • poll_interval – The amount of time in seconds to wait between attempts.

  • max_attempts – The maximum number of attempts to be made.

  • aws_conn_id – The Airflow connection used for AWS credentials.

serialize()[source]

Returns the information needed to reconstruct this Trigger.

Returns

Tuple of (class path, keyword arguments needed to re-instantiate).

Return type

tuple[str, dict[str, Any]]
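The shape of the serialized value can be illustrated with a small stand-in class. This is a minimal sketch, not the provider's implementation: `SketchTrigger` and the `my_package.triggers.SketchTrigger` class path are hypothetical, and only the constructor argument names are taken from the signature above.

```python
from typing import Any

# Hypothetical class path; a real trigger would report its own importable path.
SKETCH_CLASSPATH = "my_package.triggers.SketchTrigger"


class SketchTrigger:
    """Stand-in that mimics the serialize() contract; not the Airflow class."""

    def __init__(self, job_flow_id, step_ids, aws_conn_id, max_attempts, poll_interval):
        self.job_flow_id = job_flow_id
        self.step_ids = step_ids
        self.aws_conn_id = aws_conn_id
        self.max_attempts = max_attempts
        self.poll_interval = poll_interval

    def serialize(self) -> tuple[str, dict[str, Any]]:
        # Return (class path, keyword arguments needed to re-instantiate).
        return (
            SKETCH_CLASSPATH,
            {
                "job_flow_id": self.job_flow_id,
                "step_ids": self.step_ids,
                "aws_conn_id": self.aws_conn_id,
                "max_attempts": self.max_attempts,
                "poll_interval": self.poll_interval,
            },
        )
```

Because the keyword arguments mirror the constructor, calling `SketchTrigger(**kwargs)` with the second tuple element rebuilds an equivalent object, which is the property the return value is meant to provide.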

async run()[source]

Runs the trigger in an asynchronous context.

The trigger should yield an Event whenever it wants to fire off an event, and return None if it is finished. Single-event triggers should thus yield and then immediately return.

If it yields, it is likely that it will be resumed very quickly, but it may not be (e.g. if the workload is being moved to another triggerer process, or a multi-event trigger was being used for a single-event task defer).

In either case, Trigger classes should assume they will be persisted, and then rely on cleanup() being called when they are no longer needed.
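The yield-then-return pattern for a single-event trigger can be sketched with plain `asyncio`. This is an illustration under stated assumptions, not the provider's code: `poll_until_done`, `make_fake_fetcher`, and `demo` are hypothetical helpers, the state names are placeholders, and a plain dict stands in for the event object a real trigger would yield.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable


async def poll_until_done(
    fetch_state: Callable[[], Awaitable[str]],
    poll_interval: float,
    max_attempts: int,
    done_states: frozenset = frozenset({"COMPLETED"}),
) -> AsyncIterator[dict[str, Any]]:
    """Yield exactly one event dict, then return (single-event pattern)."""
    for attempt in range(1, max_attempts + 1):
        state = await fetch_state()
        if state in done_states:
            yield {"status": "success", "state": state, "attempts": attempt}
            return
        if attempt < max_attempts:
            # Wait between attempts without blocking the event loop.
            await asyncio.sleep(poll_interval)
    yield {"status": "failure", "message": "max attempts reached"}


def make_fake_fetcher(states):
    """Hypothetical stand-in for an AWS status call; replays canned states."""
    remaining = iter(states)

    async def fetch() -> str:
        return next(remaining)

    return fetch


async def demo(states, max_attempts):
    """Collect every event the generator yields for the given canned states."""
    generator = poll_until_done(make_fake_fetcher(states), 0, max_attempts)
    return [event async for event in generator]
```

With a `poll_interval` of `p` seconds and `max_attempts` of `n`, a loop of this shape waits roughly `p * (n - 1)` seconds plus the time spent in the status calls before giving up.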

class airflow.providers.amazon.aws.triggers.emr.EmrCreateJobFlowTrigger(job_flow_id, poll_interval, max_attempts, aws_conn_id)[source]

Bases: airflow.triggers.base.BaseTrigger

Trigger for EmrCreateJobFlowOperator. The trigger will asynchronously poll the boto3 API and wait for the JobFlow to finish executing.

Parameters
  • job_flow_id (str) – The id of the job flow to wait for.

  • poll_interval (int) – The amount of time in seconds to wait between attempts.

  • max_attempts (int) – The maximum number of attempts to be made.

  • aws_conn_id (str) – The Airflow connection used for AWS credentials.

serialize()[source]

Returns the information needed to reconstruct this Trigger.

Returns

Tuple of (class path, keyword arguments needed to re-instantiate).

Return type

tuple[str, dict[str, Any]]

async run()[source]

Runs the trigger in an asynchronous context.

The trigger should yield an Event whenever it wants to fire off an event, and return None if it is finished. Single-event triggers should thus yield and then immediately return.

If it yields, it is likely that it will be resumed very quickly, but it may not be (e.g. if the workload is being moved to another triggerer process, or a multi-event trigger was being used for a single-event task defer).

In either case, Trigger classes should assume they will be persisted, and then rely on cleanup() being called when they are no longer needed.

class airflow.providers.amazon.aws.triggers.emr.EmrTerminateJobFlowTrigger(job_flow_id, poll_interval, max_attempts, aws_conn_id)[source]

Bases: airflow.triggers.base.BaseTrigger

Trigger that terminates a running EMR Job Flow. The trigger will asynchronously poll the boto3 API and wait for the JobFlow to finish terminating.

Parameters
  • job_flow_id (str) – The id of the EMR Job Flow to terminate.

  • poll_interval (int) – The amount of time in seconds to wait between attempts.

  • max_attempts (int) – The maximum number of attempts to be made.

  • aws_conn_id (str) – The Airflow connection used for AWS credentials.

serialize()[source]

Returns the information needed to reconstruct this Trigger.

Returns

Tuple of (class path, keyword arguments needed to re-instantiate).

Return type

tuple[str, dict[str, Any]]

async run()[source]

Runs the trigger in an asynchronous context.

The trigger should yield an Event whenever it wants to fire off an event, and return None if it is finished. Single-event triggers should thus yield and then immediately return.

If it yields, it is likely that it will be resumed very quickly, but it may not be (e.g. if the workload is being moved to another triggerer process, or a multi-event trigger was being used for a single-event task defer).

In either case, Trigger classes should assume they will be persisted, and then rely on cleanup() being called when they are no longer needed.

class airflow.providers.amazon.aws.triggers.emr.EmrContainerSensorTrigger(virtual_cluster_id, job_id, aws_conn_id='aws_default', poll_interval=30, **kwargs)[source]

Bases: airflow.triggers.base.BaseTrigger

Poll for the status of an EMR container job until it reaches a terminal state.

Parameters
  • virtual_cluster_id (str) – The id of the EMR virtual cluster.

  • job_id (str) – The id of the job whose state is checked.

  • aws_conn_id (str) – The id of the AWS connection to use.

  • poll_interval (int) – The polling period, in seconds, between status checks.
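What "terminal state" means for the polling decision can be sketched as a small classifier. The grouping below is an assumption for illustration only: the state names follow the job run states reported by the EMR containers API, but which of them this trigger treats as success, failure, or still in progress is not stated on this page, and `classify_state` is a hypothetical helper.

```python
from typing import Optional

# Assumed grouping of job run states; not confirmed by this page.
SUCCESS_STATES = frozenset({"COMPLETED"})
FAILURE_STATES = frozenset({"FAILED", "CANCELLED", "CANCEL_PENDING"})
TERMINAL_STATES = SUCCESS_STATES | FAILURE_STATES


def classify_state(state: str) -> Optional[str]:
    """Return 'success' or 'error' for a terminal state, None to keep polling."""
    if state in SUCCESS_STATES:
        return "success"
    if state in FAILURE_STATES:
        return "error"
    return None
```

A polling loop would call a helper like this after each status check and sleep for `poll_interval` seconds whenever it returns `None`.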

hook()[source]

serialize()[source]

Serializes EmrContainerSensorTrigger arguments and classpath.

async run()[source]

Runs the trigger in an asynchronous context.

The trigger should yield an Event whenever it wants to fire off an event, and return None if it is finished. Single-event triggers should thus yield and then immediately return.

If it yields, it is likely that it will be resumed very quickly, but it may not be (e.g. if the workload is being moved to another triggerer process, or a multi-event trigger was being used for a single-event task defer).

In either case, Trigger classes should assume they will be persisted, and then rely on cleanup() being called when they are no longer needed.
