airflow.providers.amazon.aws.operators.emr
Module Contents
Classes
| EmrAddStepsOperator | An operator that adds steps to an existing EMR job_flow. |
| EmrContainerOperator | An operator that submits jobs to EMR on EKS virtual clusters. |
| EmrClusterLink | Operator link for EmrCreateJobFlowOperator. It allows users to access the EMR Cluster |
| EmrCreateJobFlowOperator | Creates an EMR JobFlow, reading the config from the EMR connection. |
| EmrModifyClusterOperator | An operator that modifies an existing EMR cluster. |
| EmrTerminateJobFlowOperator | Operator to terminate EMR JobFlows. |
- class airflow.providers.amazon.aws.operators.emr.EmrAddStepsOperator(*, job_flow_id=None, job_flow_name=None, cluster_states=None, aws_conn_id='aws_default', steps=None, **kwargs)
- Bases: airflow.models.BaseOperator
- An operator that adds steps to an existing EMR job_flow.
- Parameters
- job_flow_id (Optional[str]) -- id of the JobFlow to add steps to. (templated) 
- job_flow_name (Optional[str]) -- name of the JobFlow to add steps to. Use as an alternative to passing job_flow_id. The operator searches for the id of a JobFlow with a matching name in one of the states given in cluster_states. Exactly one such cluster must exist, otherwise the task fails. (templated)
- cluster_states (Optional[List[str]]) -- Acceptable cluster states when searching for JobFlow id by job_flow_name. (templated) 
- aws_conn_id (str) -- aws connection to use
- steps (Optional[Union[List[dict], str]]) -- boto3 style steps or reference to a steps file (must be '.json') to be added to the jobflow. (templated) 
- do_xcom_push -- if True, job_flow_id is pushed to XCom with key job_flow_id. 
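
As a minimal sketch of what a boto3-style steps value can look like, the example below builds one step dict and hands it to the operator. The helper names build_spark_step and make_add_steps_task, the task_id, and the S3 path are hypothetical and only serve the illustration. The Airflow import is deferred into the second function so that the step-building part runs without Airflow installed.

```python
def build_spark_step(name, script_uri):
    """Return one boto3-style EMR step dict (hypothetical helper)."""
    return {
        "Name": name,
        # What EMR should do with the cluster if this step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
        },
    }


def make_add_steps_task(steps, job_flow_id):
    """Create the operator; requires the Amazon provider package."""
    # Deferred import: only needed when the task is actually created.
    from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator

    return EmrAddStepsOperator(
        task_id="add_steps",  # hypothetical task id
        job_flow_id=job_flow_id,
        aws_conn_id="aws_default",
        steps=steps,
    )


# Illustrative values, not taken from any real deployment.
example_steps = [build_spark_step("run_etl", "s3://example-bucket/etl.py")]
```

Passing job_flow_name together with cluster_states instead of job_flow_id would follow the same pattern.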
 
 
- class airflow.providers.amazon.aws.operators.emr.EmrContainerOperator(*, name, virtual_cluster_id, execution_role_arn, release_label, job_driver, configuration_overrides=None, client_request_token=None, aws_conn_id='aws_default', poll_interval=30, max_tries=None, **kwargs)
- Bases: airflow.models.BaseOperator
- An operator that submits jobs to EMR on EKS virtual clusters.
- Parameters
- name (str) -- The name of the job run. 
- virtual_cluster_id (str) -- The EMR on EKS virtual cluster ID 
- execution_role_arn (str) -- The IAM role ARN associated with the job run. 
- release_label (str) -- The Amazon EMR release version to use for the job run. 
- job_driver (dict) -- Job configuration details, e.g. the Spark job parameters. 
- configuration_overrides (Optional[dict]) -- The configuration overrides for the job run, specifically either application configuration or monitoring configuration. 
- client_request_token (Optional[str]) -- The client idempotency token of the job run request. Use this if you want to specify a unique ID to prevent two jobs from getting started. If no token is provided, a UUIDv4 token will be generated for you. 
- aws_conn_id (str) -- The Airflow connection used for AWS credentials. 
- poll_interval (int) -- Time (in seconds) to wait between two consecutive calls to check the job run status on EMR.
- max_tries (Optional[int]) -- Maximum number of times to wait for the job run to finish. Defaults to None, which will poll until the job is not in a pending, submitted, or running state. 
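
The parameters above can be sketched in plain Python as follows. This is not the provider's implementation, only an illustration of the described behaviour: a job_driver dict for a Spark job, a UUIDv4 idempotency token, and a polling loop that stops once the job leaves the pending, submitted, or running states or max_tries is reached. All function names are hypothetical, and get_state stands in for whatever call reports the job run state.

```python
import time
import uuid

# States in which a job run is still in progress, per the max_tries description.
IN_PROGRESS_STATES = {"PENDING", "SUBMITTED", "RUNNING"}


def build_job_driver(entry_point, spark_params=None):
    """Return a job_driver dict for a Spark job (hypothetical helper)."""
    driver = {"entryPoint": entry_point}
    if spark_params:
        driver["sparkSubmitParameters"] = spark_params
    return {"sparkSubmitJobDriver": driver}


def new_request_token():
    """Generate a UUIDv4 string usable as client_request_token."""
    return str(uuid.uuid4())


def poll_until_finished(get_state, poll_interval=30, max_tries=None, sleep=time.sleep):
    """Poll get_state() until the job leaves the in-progress states.

    Returns the last observed state. With max_tries=None the loop only
    ends when the job is no longer pending, submitted, or running.
    """
    tries = 0
    while True:
        state = get_state()
        if state not in IN_PROGRESS_STATES:
            return state
        tries += 1
        if max_tries is not None and tries >= max_tries:
            return state  # gave up while the job was still in progress
        sleep(poll_interval)
```

The sleep argument is injectable so the loop can be exercised in tests without actually waiting.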
 
 
- class airflow.providers.amazon.aws.operators.emr.EmrClusterLink
- Bases: airflow.models.BaseOperatorLink
- Operator link for EmrCreateJobFlowOperator. It allows users to access the EMR Cluster.
- get_link(self, operator, dttm)
- Get link to EMR cluster.
- Parameters
- operator (airflow.models.BaseOperator) -- operator 
- dttm (datetime.datetime) -- datetime 
 
- Returns
- url link 
- Return type
 
 
- class airflow.providers.amazon.aws.operators.emr.EmrCreateJobFlowOperator(*, aws_conn_id='aws_default', emr_conn_id='emr_default', job_flow_overrides=None, region_name=None, **kwargs)
- Bases: airflow.models.BaseOperator
- Creates an EMR JobFlow, reading the config from the EMR connection. A dictionary of JobFlow overrides can be passed to override the config from the connection.
- Parameters
- aws_conn_id (str) -- aws connection to use
- emr_conn_id (str) -- emr connection to use 
- job_flow_overrides (Optional[Union[str, Dict[str, Any]]]) -- boto3 style arguments or reference to an arguments file (must be '.json') to override emr_connection extra. (templated) 
- region_name (Optional[str]) -- Region name passed to EmrHook
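
To illustrate how job_flow_overrides relate to the config stored in the EMR connection, here is a minimal sketch. It assumes a shallow, top-level merge in which override keys replace connection keys; that merge strategy is an assumption of this example, not something stated above. The function name and all config values are hypothetical.

```python
def merge_job_flow_config(connection_extra, job_flow_overrides=None):
    """Combine connection config with overrides (assumed shallow merge)."""
    config = dict(connection_extra)  # copy so the input is not mutated
    config.update(job_flow_overrides or {})
    return config


# Hypothetical defaults as they might be stored in the EMR connection extra.
connection_extra = {
    "Name": "default-cluster",
    "ReleaseLabel": "emr-6.5.0",
    "JobFlowRole": "EMR_EC2_DefaultRole",
    "ServiceRole": "EMR_DefaultRole",
}

# boto3-style arguments that a DAG author might pass as job_flow_overrides.
overrides = {
    "Name": "nightly-etl",
    "Instances": {"KeepJobFlowAliveWhenNoSteps": False},
}

merged = merge_job_flow_config(connection_extra, overrides)
```

Under this assumption a nested key such as Instances is replaced as a whole rather than merged field by field, so overrides for nested structures should be given in full.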
 
 
- class airflow.providers.amazon.aws.operators.emr.EmrModifyClusterOperator(*, cluster_id, step_concurrency_level, aws_conn_id='aws_default', **kwargs)
- Bases: airflow.models.BaseOperator
- An operator that modifies an existing EMR cluster.
- Parameters
- cluster_id -- cluster identifier
- step_concurrency_level -- Concurrency of the cluster
- aws_conn_id -- aws connection to use
- do_xcom_push -- if True, cluster_id is pushed to XCom with key cluster_id.
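
A hedged usage sketch for this operator follows. The helper names, the task_id, and the cluster id are hypothetical. The range check assumes that EMR accepts step concurrency levels from 1 to 256; that limit comes from the EMR service, not from this page, so treat it as an assumption. The Airflow import is deferred so the argument-building part runs without Airflow installed.

```python
def modify_cluster_kwargs(cluster_id, step_concurrency_level):
    """Build keyword arguments for EmrModifyClusterOperator (hypothetical helper)."""
    # Assumed service limit: 1 to 256 concurrent steps.
    if not 1 <= step_concurrency_level <= 256:
        raise ValueError("step_concurrency_level must be between 1 and 256")
    return {
        "task_id": "modify_cluster",  # hypothetical task id
        "cluster_id": cluster_id,
        "step_concurrency_level": step_concurrency_level,
        "aws_conn_id": "aws_default",
    }


def make_modify_cluster_task(cluster_id, step_concurrency_level):
    """Create the operator; requires the Amazon provider package."""
    from airflow.providers.amazon.aws.operators.emr import EmrModifyClusterOperator

    return EmrModifyClusterOperator(
        **modify_cluster_kwargs(cluster_id, step_concurrency_level)
    )
```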