airflow.providers.amazon.aws.operators.emr_containers
¶
Module Contents¶
-
class
airflow.providers.amazon.aws.operators.emr_containers.
EMRContainerOperator
(*, name: str, virtual_cluster_id: str, execution_role_arn: str, release_label: str, job_driver: dict, configuration_overrides: Optional[dict] = None, client_request_token: Optional[str] = None, aws_conn_id: str = 'aws_default', poll_interval: int = 30, max_tries: Optional[int] = None, **kwargs)[source]¶ Bases:
airflow.models.BaseOperator
An operator that submits jobs to EMR on EKS virtual clusters.
- Parameters
name (str) – The name of the job run.
virtual_cluster_id (str) – The EMR on EKS virtual cluster ID
execution_role_arn (str) – The IAM role ARN associated with the job run.
release_label (str) – The Amazon EMR release version to use for the job run.
job_driver (dict) – Job configuration details, e.g. the Spark job parameters.
configuration_overrides (dict) – The configuration overrides for the job run, specifically either application configuration or monitoring configuration.
client_request_token (str) – The client idempotency token of the job run request. Use this if you want to specify a unique ID to prevent two jobs from getting started. If no token is provided, a UUIDv4 token will be generated for you.
aws_conn_id (str) – The Airflow connection used for AWS credentials.
poll_interval (int) – Time (in seconds) to wait between two consecutive calls to check query status on EMR
max_tries (int) – Maximum number of times to wait for the job run to finish. Defaults to None, which will poll until the job is not in a pending, submitted, or running state.