airflow.providers.amazon.aws.operators.emr_containers

Module Contents

class airflow.providers.amazon.aws.operators.emr_containers.EMRContainerOperator(*, name: str, virtual_cluster_id: str, execution_role_arn: str, release_label: str, job_driver: dict, configuration_overrides: Optional[dict] = None, client_request_token: Optional[str] = None, aws_conn_id: str = 'aws_default', poll_interval: int = 30, max_tries: Optional[int] = None, **kwargs)[source]

Bases: airflow.models.BaseOperator

An operator that submits jobs to EMR on EKS virtual clusters.

Parameters
  • name (str) – The name of the job run.

  • virtual_cluster_id (str) – The EMR on EKS virtual cluster ID

  • execution_role_arn (str) – The IAM role ARN associated with the job run.

  • release_label (str) – The Amazon EMR release version to use for the job run.

  • job_driver (dict) – Job configuration details, e.g. the Spark job parameters.

  • configuration_overrides (dict) – The configuration overrides for the job run, specifically either application configuration or monitoring configuration.

  • client_request_token (str) – The client idempotency token of the job run request. Use this if you want to specify a unique ID to prevent two jobs from getting started. If no token is provided, a UUIDv4 token will be generated for you.

  • aws_conn_id (str) – The Airflow connection used for AWS credentials.

  • poll_interval (int) – Time (in seconds) to wait between two consecutive calls to check query status on EMR

  • max_tries (int) – Maximum number of times to wait for the job run to finish. Defaults to None, which will poll until the job is not in a pending, submitted, or running state.

template_fields = ['name', 'virtual_cluster_id', 'execution_role_arn', 'release_label', 'job_driver'][source]
ui_color = #f9c915[source]
hook(self)[source]

Create and return an EMRContainerHook.

execute(self, context: dict)[source]

Run job on EMR Containers

on_kill(self)[source]

Cancel the submitted job run

Was this entry helpful?