airflow.providers.amazon.aws.operators.emr¶
Module Contents¶
Classes¶
| An operator that adds steps to an existing EMR job_flow. | |
| An operator that starts an EMR notebook execution. | |
| An operator that stops a running EMR notebook execution. | |
| An operator that creates EMR on EKS virtual clusters. | |
| An operator that submits jobs to EMR on EKS virtual clusters. | |
| Creates an EMR JobFlow, reading the config from the EMR connection. | |
| An operator that modifies an existing EMR cluster. | |
| Operator to terminate EMR JobFlows. | |
| Operator to create Serverless EMR Application. | |
| Operator to start EMR Serverless job. | |
| Operator to stop an EMR Serverless application. | |
| Operator to delete EMR Serverless application. | 
- class airflow.providers.amazon.aws.operators.emr.EmrAddStepsOperator(*, job_flow_id=None, job_flow_name=None, cluster_states=None, aws_conn_id='aws_default', steps=None, wait_for_completion=False, waiter_delay=30, waiter_max_attempts=60, execution_role_arn=None, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶
- Bases: - airflow.models.BaseOperator- An operator that adds steps to an existing EMR job_flow. - See also - For more information on how to use this operator, take a look at the guide: Add Steps to an EMR job flow - Parameters
- job_flow_id (str | None) – id of the JobFlow to add steps to. (templated) 
- job_flow_name (str | None) – name of the JobFlow to add steps to. Use as an alternative to passing job_flow_id. will search for id of JobFlow with matching name in one of the states in param cluster_states. Exactly one cluster like this should exist or will fail. (templated) 
- cluster_states (list[str] | None) – Acceptable cluster states when searching for JobFlow id by job_flow_name. (templated) 
- aws_conn_id (str) – aws connection to uses 
- steps (list[dict] | str | None) – boto3 style steps or reference to a steps file (must be ‘.json’) to be added to the jobflow. (templated) 
- wait_for_completion (bool) – If True, the operator will wait for all the steps to be completed. 
- execution_role_arn (str | None) – The ARN of the runtime role for a step on the cluster. 
- do_xcom_push – if True, job_flow_id is pushed to XCom with key job_flow_id. 
- wait_for_completion – Whether to wait for job run completion. (default: True) 
- deferrable (bool) – If True, the operator will wait asynchronously for the job to complete. This implies waiting for completion. This mode requires aiobotocore module to be installed. (default: False) 
 
 - template_fields: Sequence[str] = ('job_flow_id', 'job_flow_name', 'cluster_states', 'steps', 'execution_role_arn')[source]¶
 
- class airflow.providers.amazon.aws.operators.emr.EmrStartNotebookExecutionOperator(editor_id, relative_path, cluster_id, service_role, notebook_execution_name=None, notebook_params=None, notebook_instance_security_group_id=None, master_instance_security_group_id=None, tags=None, wait_for_completion=False, aws_conn_id='aws_default', waiter_max_attempts=NOTSET, waiter_delay=NOTSET, waiter_countdown=25 * 60, waiter_check_interval_seconds=60, **kwargs)[source]¶
- Bases: - airflow.models.BaseOperator- An operator that starts an EMR notebook execution. - See also - For more information on how to use this operator, take a look at the guide: Start an EMR notebook execution - Parameters
- editor_id (str) – The unique identifier of the EMR notebook to use for notebook execution. 
- relative_path (str) – The path and file name of the notebook file for this execution, relative to the path specified for the EMR notebook. 
- cluster_id (str) – The unique identifier of the EMR cluster the notebook is attached to. 
- service_role (str) – The name or ARN of the IAM role that is used as the service role for Amazon EMR (the EMR role) for the notebook execution. 
- notebook_execution_name (str | None) – Optional name for the notebook execution. 
- notebook_params (str | None) – Input parameters in JSON format passed to the EMR notebook at runtime for execution. 
- tags (list | None) – Optional list of key value pair to associate with the notebook execution. 
- waiter_max_attempts (int | None | airflow.utils.types.ArgNotSet) – Maximum number of tries before failing. 
- waiter_delay (int | None | airflow.utils.types.ArgNotSet) – Number of seconds between polling the state of the notebook. 
- waiter_countdown (int) – Total amount of time the operator will wait for the notebook to stop. Defaults to 25 * 60 seconds. (Deprecated. Please use waiter_max_attempts.) 
- waiter_check_interval_seconds (int) – Number of seconds between polling the state of the notebook. Defaults to 60 seconds. (Deprecated. Please use waiter_delay.) 
 
- Param
- notebook_instance_security_group_id: The unique identifier of the Amazon EC2 security group to associate with the EMR notebook for this notebook execution. 
- Param
- master_instance_security_group_id: Optional unique ID of an EC2 security group to associate with the master instance of the EMR cluster for this notebook execution. 
 
- class airflow.providers.amazon.aws.operators.emr.EmrStopNotebookExecutionOperator(notebook_execution_id, wait_for_completion=False, aws_conn_id='aws_default', waiter_max_attempts=NOTSET, waiter_delay=NOTSET, waiter_countdown=25 * 60, waiter_check_interval_seconds=60, **kwargs)[source]¶
- Bases: - airflow.models.BaseOperator- An operator that stops a running EMR notebook execution. - See also - For more information on how to use this operator, take a look at the guide: Stop an EMR notebook execution - Parameters
- notebook_execution_id (str) – The unique identifier of the notebook execution. 
- wait_for_completion (bool) – If True, the operator will wait for the notebook. to be in a STOPPED or FINISHED state. Defaults to False. 
- aws_conn_id (str) – aws connection to use. 
- waiter_max_attempts (int | None | airflow.utils.types.ArgNotSet) – Maximum number of tries before failing. 
- waiter_delay (int | None | airflow.utils.types.ArgNotSet) – Number of seconds between polling the state of the notebook. 
- waiter_countdown (int) – Total amount of time the operator will wait for the notebook to stop. Defaults to 25 * 60 seconds. (Deprecated. Please use waiter_max_attempts.) 
- waiter_check_interval_seconds (int) – Number of seconds between polling the state of the notebook. Defaults to 60 seconds. (Deprecated. Please use waiter_delay.) 
 
 
- class airflow.providers.amazon.aws.operators.emr.EmrEksCreateClusterOperator(*, virtual_cluster_name, eks_cluster_name, eks_namespace, virtual_cluster_id='', aws_conn_id='aws_default', tags=None, **kwargs)[source]¶
- Bases: - airflow.models.BaseOperator- An operator that creates EMR on EKS virtual clusters. - See also - For more information on how to use this operator, take a look at the guide: Create an Amazon EMR EKS virtual cluster - Parameters
- virtual_cluster_name (str) – The name of the EMR EKS virtual cluster to create. 
- eks_cluster_name (str) – The EKS cluster used by the EMR virtual cluster. 
- eks_namespace (str) – namespace used by the EKS cluster. 
- virtual_cluster_id (str) – The EMR on EKS virtual cluster id. 
- aws_conn_id (str) – The Airflow connection used for AWS credentials. 
- tags (dict | None) – The tags assigned to created cluster. Defaults to None 
 
 
- class airflow.providers.amazon.aws.operators.emr.EmrContainerOperator(*, name, virtual_cluster_id, execution_role_arn, release_label, job_driver, configuration_overrides=None, client_request_token=None, aws_conn_id='aws_default', wait_for_completion=True, poll_interval=30, max_tries=None, tags=None, max_polling_attempts=None, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶
- Bases: - airflow.models.BaseOperator- An operator that submits jobs to EMR on EKS virtual clusters. - See also - For more information on how to use this operator, take a look at the guide: Submit a job to an Amazon EMR virtual cluster - Parameters
- name (str) – The name of the job run. 
- virtual_cluster_id (str) – The EMR on EKS virtual cluster ID 
- execution_role_arn (str) – The IAM role ARN associated with the job run. 
- release_label (str) – The Amazon EMR release version to use for the job run. 
- job_driver (dict) – Job configuration details, e.g. the Spark job parameters. 
- configuration_overrides (dict | None) – The configuration overrides for the job run, specifically either application configuration or monitoring configuration. 
- client_request_token (str | None) – The client idempotency token of the job run request. Use this if you want to specify a unique ID to prevent two jobs from getting started. If no token is provided, a UUIDv4 token will be generated for you. 
- aws_conn_id (str) – The Airflow connection used for AWS credentials. 
- wait_for_completion (bool) – Whether or not to wait in the operator for the job to complete. 
- poll_interval (int) – Time (in seconds) to wait between two consecutive calls to check query status on EMR 
- max_tries (int | None) – Deprecated - use max_polling_attempts instead. 
- max_polling_attempts (int | None) – Maximum number of times to wait for the job run to finish. Defaults to None, which will poll until the job is not in a pending, submitted, or running state. 
- tags (dict | None) – The tags assigned to job runs. Defaults to None 
- deferrable (bool) – Run operator in the deferrable mode. 
 
 
- class airflow.providers.amazon.aws.operators.emr.EmrCreateJobFlowOperator(*, aws_conn_id='aws_default', emr_conn_id='emr_default', job_flow_overrides=None, region_name=None, wait_for_completion=False, waiter_max_attempts=NOTSET, waiter_delay=NOTSET, waiter_countdown=None, waiter_check_interval_seconds=60, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶
- Bases: - airflow.models.BaseOperator- Creates an EMR JobFlow, reading the config from the EMR connection. - A dictionary of JobFlow overrides can be passed that override the config from the connection. - See also - For more information on how to use this operator, take a look at the guide: Create an EMR job flow - Parameters
- aws_conn_id (str) – The Airflow connection used for AWS credentials. If this is None or empty then the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, then default boto3 configuration would be used (and must be maintained on each worker node) 
- emr_conn_id (str | None) – Amazon Elastic MapReduce Connection. Use to receive an initial Amazon EMR cluster configuration: - boto3.client('emr').run_job_flowrequest body. If this is None or empty or the connection does not exist, then an empty initial configuration is used.
- job_flow_overrides (str | dict[str, Any] | None) – boto3 style arguments or reference to an arguments file (must be ‘.json’) to override specific - emr_conn_idextra parameters. (templated)
- region_name (str | None) – Region named passed to EmrHook 
- wait_for_completion (bool) – Whether to finish task immediately after creation (False) or wait for jobflow completion (True) 
- waiter_max_attempts (int | None | airflow.utils.types.ArgNotSet) – Maximum number of tries before failing. 
- waiter_delay (int | None | airflow.utils.types.ArgNotSet) – Number of seconds between polling the state of the notebook. 
- waiter_countdown (int | None) – Max. seconds to wait for jobflow completion (only in combination with wait_for_completion=True, None = no limit) (Deprecated. Please use waiter_max_attempts.) 
- waiter_check_interval_seconds (int) – Number of seconds between polling the jobflow state. Defaults to 60 seconds. (Deprecated. Please use waiter_delay.) 
- deferrable (bool) – If True, the operator will wait asynchronously for the crawl to complete. This implies waiting for completion. This mode requires aiobotocore module to be installed. (default: False) 
 
 - template_fields: Sequence[str] = ('job_flow_overrides', 'waiter_delay', 'waiter_max_attempts')[source]¶
 
- class airflow.providers.amazon.aws.operators.emr.EmrModifyClusterOperator(*, cluster_id, step_concurrency_level, aws_conn_id='aws_default', **kwargs)[source]¶
- Bases: - airflow.models.BaseOperator- An operator that modifies an existing EMR cluster. - See also - For more information on how to use this operator, take a look at the guide: Modify Amazon EMR container - Parameters
 
- class airflow.providers.amazon.aws.operators.emr.EmrTerminateJobFlowOperator(*, job_flow_id, aws_conn_id='aws_default', waiter_delay=60, waiter_max_attempts=20, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶
- Bases: - airflow.models.BaseOperator- Operator to terminate EMR JobFlows. - See also - For more information on how to use this operator, take a look at the guide: Terminate an EMR job flow - Parameters
- job_flow_id (str) – id of the JobFlow to terminate. (templated) 
- aws_conn_id (str) – aws connection to uses 
- waiter_delay (int) – Time (in seconds) to wait between two consecutive calls to check JobFlow status 
- waiter_max_attempts (int) – The maximum number of times to poll for JobFlow status. 
- deferrable (bool) – If True, the operator will wait asynchronously for the crawl to complete. This implies waiting for completion. This mode requires aiobotocore module to be installed. (default: False) 
 
 
- class airflow.providers.amazon.aws.operators.emr.EmrServerlessCreateApplicationOperator(release_label, job_type, client_request_token='', config=None, wait_for_completion=True, aws_conn_id='aws_default', waiter_countdown=NOTSET, waiter_check_interval_seconds=NOTSET, waiter_max_attempts=NOTSET, waiter_delay=NOTSET, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶
- Bases: - airflow.models.BaseOperator- Operator to create Serverless EMR Application. - See also - For more information on how to use this operator, take a look at the guide: Create an EMR Serverless Application - Parameters
- release_label (str) – The EMR release version associated with the application. 
- job_type (str) – The type of application you want to start, such as Spark or Hive. 
- wait_for_completion (bool) – If true, wait for the Application to start before returning. Default to True. If set to False, - waiter_max_attemptsand- waiter_delaywill only be applied when waiting for the application to be in the- CREATEDstate.
- client_request_token (str) – The client idempotency token of the application to create. Its value must be unique for each request. 
- config (dict | None) – Optional dictionary for arbitrary parameters to the boto API create_application call. 
- aws_conn_id (str) – AWS connection to use 
- waiter_countdown (int | airflow.utils.types.ArgNotSet) – (deprecated) Total amount of time, in seconds, the operator will wait for the application to start. Defaults to 25 minutes. 
- waiter_check_interval_seconds (int | airflow.utils.types.ArgNotSet) – (deprecated) Number of seconds between polling the state of the application. Defaults to 60 seconds. 
- waiter_delay (int | airflow.utils.types.ArgNotSet) – Number of seconds between polling the state of the application. 
- deferrable (bool) – If True, the operator will wait asynchronously for application to be created. This implies waiting for completion. This mode requires aiobotocore module to be installed. (default: False, but can be overridden in config file by setting default_deferrable to True) 
 
- Waiter_max_attempts
- Number of times the waiter should poll the application to check the state. If not set, the waiter will use its default value. 
 
- class airflow.providers.amazon.aws.operators.emr.EmrServerlessStartJobOperator(application_id, execution_role_arn, job_driver, configuration_overrides, client_request_token='', config=None, wait_for_completion=True, aws_conn_id='aws_default', name=None, waiter_countdown=NOTSET, waiter_check_interval_seconds=NOTSET, waiter_max_attempts=NOTSET, waiter_delay=NOTSET, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶
- Bases: - airflow.models.BaseOperator- Operator to start EMR Serverless job. - See also - For more information on how to use this operator, take a look at the guide: Start an EMR Serverless Job - Parameters
- application_id (str) – ID of the EMR Serverless application to start. 
- execution_role_arn (str) – ARN of role to perform action. 
- job_driver (dict) – Driver that the job runs on. 
- configuration_overrides (dict | None) – Configuration specifications to override existing configurations. 
- client_request_token (str) – The client idempotency token of the application to create. Its value must be unique for each request. 
- config (dict | None) – Optional dictionary for arbitrary parameters to the boto API start_job_run call. 
- wait_for_completion (bool) – If true, waits for the job to start before returning. Defaults to True. If set to False, - waiter_countdownand- waiter_check_interval_secondswill only be applied when waiting for the application be to in the- STARTEDstate.
- aws_conn_id (str) – AWS connection to use. 
- name (str | None) – Name for the EMR Serverless job. If not provided, a default name will be assigned. 
- waiter_countdown (int | airflow.utils.types.ArgNotSet) – (deprecated) Total amount of time, in seconds, the operator will wait for the job finish. Defaults to 25 minutes. 
- waiter_check_interval_seconds (int | airflow.utils.types.ArgNotSet) – (deprecated) Number of seconds between polling the state of the job. Defaults to 60 seconds. 
- waiter_delay (int | airflow.utils.types.ArgNotSet) – Number of seconds between polling the state of the job run. 
- deferrable (bool) – If True, the operator will wait asynchronously for the crawl to complete. This implies waiting for completion. This mode requires aiobotocore module to be installed. (default: False, but can be overridden in config file by setting default_deferrable to True) 
 
- Waiter_max_attempts
- Number of times the waiter should poll the application to check the state. If not set, the waiter will use its default value. 
 - template_fields: Sequence[str] = ('application_id', 'config', 'execution_role_arn', 'job_driver', 'configuration_overrides')[source]¶
 
- class airflow.providers.amazon.aws.operators.emr.EmrServerlessStopApplicationOperator(application_id, wait_for_completion=True, aws_conn_id='aws_default', waiter_countdown=NOTSET, waiter_check_interval_seconds=NOTSET, waiter_max_attempts=NOTSET, waiter_delay=NOTSET, force_stop=False, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶
- Bases: - airflow.models.BaseOperator- Operator to stop an EMR Serverless application. - See also - For more information on how to use this operator, take a look at the guide: Stop an EMR Serverless Application - Parameters
- application_id (str) – ID of the EMR Serverless application to stop. 
- wait_for_completion (bool) – If true, wait for the Application to stop before returning. Default to True 
- aws_conn_id (str) – AWS connection to use 
- waiter_countdown (int | airflow.utils.types.ArgNotSet) – (deprecated) Total amount of time, in seconds, the operator will wait for the application be stopped. Defaults to 5 minutes. 
- waiter_check_interval_seconds (int | airflow.utils.types.ArgNotSet) – (deprecated) Number of seconds between polling the state of the application. Defaults to 60 seconds. 
- force_stop (bool) – If set to True, any job for that app that is not in a terminal state will be cancelled. Otherwise, trying to stop an app with running jobs will return an error. If you want to wait for the jobs to finish gracefully, use - airflow.providers.amazon.aws.sensors.emr.EmrServerlessJobSensor
- waiter_delay (int | airflow.utils.types.ArgNotSet) – Number of seconds between polling the state of the application. Default is 60 seconds. 
- deferrable (bool) – If True, the operator will wait asynchronously for the application to stop. This implies waiting for completion. This mode requires aiobotocore module to be installed. (default: False, but can be overridden in config file by setting default_deferrable to True) 
 
- Waiter_max_attempts
- Number of times the waiter should poll the application to check the state. Default is 25. 
 
- class airflow.providers.amazon.aws.operators.emr.EmrServerlessDeleteApplicationOperator(application_id, wait_for_completion=True, aws_conn_id='aws_default', waiter_countdown=NOTSET, waiter_check_interval_seconds=NOTSET, waiter_max_attempts=NOTSET, waiter_delay=NOTSET, force_stop=False, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶
- Bases: - EmrServerlessStopApplicationOperator- Operator to delete EMR Serverless application. - See also - For more information on how to use this operator, take a look at the guide: Delete an EMR Serverless Application - Parameters
- application_id (str) – ID of the EMR Serverless application to delete. 
- wait_for_completion (bool) – If true, wait for the Application to be deleted before returning. Defaults to True. Note that this operator will always wait for the application to be STOPPED first. 
- aws_conn_id (str) – AWS connection to use 
- waiter_countdown (int | airflow.utils.types.ArgNotSet) – (deprecated) Total amount of time, in seconds, the operator will wait for each step of first,the application to be stopped, and then deleted. Defaults to 25 minutes. 
- waiter_check_interval_seconds (int | airflow.utils.types.ArgNotSet) – (deprecated) Number of seconds between polling the state of the application. Defaults to 60 seconds. 
- waiter_delay (int | airflow.utils.types.ArgNotSet) – Number of seconds between polling the state of the application. Defaults to 60 seconds. 
- deferrable (bool) – If True, the operator will wait asynchronously for application to be deleted. This implies waiting for completion. This mode requires aiobotocore module to be installed. (default: False, but can be overridden in config file by setting default_deferrable to True) 
- force_stop (bool) – If set to True, any job for that app that is not in a terminal state will be cancelled. Otherwise, trying to delete an app with running jobs will return an error. If you want to wait for the jobs to finish gracefully, use - airflow.providers.amazon.aws.sensors.emr.EmrServerlessJobSensor
 
- Waiter_max_attempts
- Number of times the waiter should poll the application to check the state. Defaults to 25. 
 
