airflow.providers.amazon.aws.hooks.emr
Module Contents
Classes
| EmrHook | Interact with Amazon Elastic MapReduce Service (EMR). |
| EmrServerlessHook | Interact with Amazon EMR Serverless. |
| EmrContainerHook | Interact with Amazon EMR Containers (Amazon EMR on EKS). |
- class airflow.providers.amazon.aws.hooks.emr.EmrHook(emr_conn_id=default_conn_name, *args, **kwargs)
- Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook
- Interact with Amazon Elastic MapReduce Service (EMR).
- Provide a thick wrapper around boto3.client("emr").
- Parameters
- emr_conn_id (str | None) – Amazon Elastic MapReduce Connection. This attribute is only necessary when using airflow.providers.amazon.aws.hooks.emr.EmrHook.create_job_flow().
- Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.
 - get_cluster_id_by_name(emr_cluster_name, cluster_states)
- Fetch the ID of the EMR cluster with the given name and (optional) states. An ID is returned only if a single matching ID is found.
 - create_job_flow(job_flow_overrides)
- Create and start running a new cluster (job flow).
- This method uses EmrHook.emr_conn_id to receive the initial Amazon EMR cluster configuration. If EmrHook.emr_conn_id is empty or the connection does not exist, then an empty initial configuration is used.
- Parameters
- job_flow_overrides (dict[str, Any]) – Overrides for the parameters in the initial Amazon EMR cluster configuration. The resulting configuration is passed to EMR.Client.run_job_flow().
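The override behaviour described above can be sketched as follows. This is a minimal illustration, not the hook's implementation: `merge_job_flow_config` and `create_cluster` are hypothetical helpers, the configuration values are invented, and the sketch assumes a shallow, top-level merge and that `create_job_flow()` returns the `run_job_flow` response. The Airflow import is kept inside the function so the merge part runs without the provider installed.

```python
from typing import Any


def merge_job_flow_config(
    initial: dict[str, Any], job_flow_overrides: dict[str, Any]
) -> dict[str, Any]:
    """Hypothetical helper: apply overrides on top of an initial configuration.

    Assumption: the merge is shallow, so an overridden top-level key replaces
    the whole value, including any nested dictionary.
    """
    config = dict(initial)  # copy so the initial configuration is not mutated
    config.update(job_flow_overrides)
    return config


# Illustrative values only; in Airflow the initial configuration would come
# from the connection referenced by emr_conn_id.
initial_config = {
    "Name": "default-cluster",
    "ReleaseLabel": "emr-6.15.0",
    "Instances": {"InstanceCount": 3, "KeepJobFlowAliveWhenNoSteps": False},
}
overrides = {"Name": "nightly-etl", "Instances": {"InstanceCount": 5}}
merged_config = merge_job_flow_config(initial_config, overrides)


def create_cluster(job_flow_overrides: dict[str, Any]) -> str:
    """Hypothetical wrapper; requires the Amazon provider and valid connections."""
    from airflow.providers.amazon.aws.hooks.emr import EmrHook

    hook = EmrHook(aws_conn_id="aws_default", emr_conn_id="emr_default")
    # Assumption: the return value is the boto3 run_job_flow response.
    response = hook.create_job_flow(job_flow_overrides)
    return response["JobFlowId"]
```

Under the shallow-merge assumption, overriding `Instances` replaces the entire nested dictionary, so any nested key that should survive has to be repeated in the overrides.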
 
 - add_job_flow_steps(job_flow_id, steps=None, wait_for_completion=False, waiter_delay=None, waiter_max_attempts=None, execution_role_arn=None)
- Add new steps to a running cluster.
- Parameters
- job_flow_id (str) – The ID of the job flow to which the steps are being added.
- steps (list[dict] | str | None) – A list of the steps to be executed by the job flow.
- wait_for_completion (bool) – If True, wait for the steps to be completed. Default is False.
- waiter_delay (int | None) – The amount of time in seconds to wait between attempts. Default is 5.
- waiter_max_attempts (int | None) – The maximum number of attempts to be made. Default is 100.
- execution_role_arn (str | None) – The ARN of the runtime role for a step on the cluster. 
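As a rough illustration of the parameters above, the sketch below builds one step definition in the shape accepted by the EMR `add_job_flow_steps` API and passes it to the hook. `spark_submit_step` and `add_steps` are hypothetical helpers, and the bucket, script path, and waiter values are placeholders. The Airflow import is deferred so the step-building part runs on its own.

```python
from typing import Any


def spark_submit_step(
    name: str, script_uri: str, action_on_failure: str = "CONTINUE"
) -> dict[str, Any]:
    """Hypothetical helper: build one EMR step that runs spark-submit."""
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
        },
    }


# Placeholder script location, for illustration only.
steps = [spark_submit_step("compute-stats", "s3://example-bucket/jobs/stats.py")]


def add_steps(job_flow_id: str, step_definitions: list[dict[str, Any]]):
    """Hypothetical wrapper; requires the Amazon provider and AWS credentials."""
    from airflow.providers.amazon.aws.hooks.emr import EmrHook

    hook = EmrHook(aws_conn_id="aws_default")
    # Block until the steps finish, polling every 30 seconds, at most 60 times.
    return hook.add_job_flow_steps(
        job_flow_id=job_flow_id,
        steps=step_definitions,
        wait_for_completion=True,
        waiter_delay=30,
        waiter_max_attempts=60,
    )
```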
 
 
 - test_connection()
- Return a failed state when testing the Amazon Elastic MapReduce Connection, which is untestable.
- This method is overridden because this hook is based on AwsGenericHook; otherwise, it would try to test the connection to AWS STS by using the default boto3 credential strategy.
 
- class airflow.providers.amazon.aws.hooks.emr.EmrServerlessHook(*args, **kwargs)
- Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook
- Interact with Amazon EMR Serverless.
- Provide a thin wrapper around boto3.client("emr-serverless").
- Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.
 - cancel_running_jobs(application_id, waiter_config=None, wait_for_completion=True)
- Cancel jobs in an intermediate state, and return the number of cancelled jobs.
- If wait_for_completion is True, the method waits until all jobs are cancelled before returning.
- Note: if new jobs are triggered while this operation is ongoing, it will time out and return an error.
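One way to bound how long the call waits is to derive the waiter configuration from a time budget. The sketch below is an assumption-laden illustration: `waiter_config_for` and `cancel_jobs` are hypothetical helpers, and it assumes `waiter_config` uses the standard boto3 waiter keys `Delay` and `MaxAttempts`. The Airflow import is deferred so the arithmetic part runs on its own.

```python
import math


def waiter_config_for(timeout_seconds: int, delay_seconds: int = 10) -> dict[str, int]:
    """Hypothetical helper: turn a time budget into a boto3-style waiter config."""
    if timeout_seconds <= 0 or delay_seconds <= 0:
        raise ValueError("timeout_seconds and delay_seconds must be positive")
    return {
        "Delay": delay_seconds,
        # Round up so the total wait covers the requested budget.
        "MaxAttempts": math.ceil(timeout_seconds / delay_seconds),
    }


# A five-minute budget polled every 10 seconds.
waiter_config = waiter_config_for(timeout_seconds=300, delay_seconds=10)


def cancel_jobs(application_id: str) -> int:
    """Hypothetical wrapper; requires the Amazon provider and AWS credentials."""
    from airflow.providers.amazon.aws.hooks.emr import EmrServerlessHook

    hook = EmrServerlessHook(aws_conn_id="aws_default")
    # Returns the number of cancelled jobs, per the method description.
    return hook.cancel_running_jobs(
        application_id=application_id,
        waiter_config=waiter_config,
        wait_for_completion=True,
    )
```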
 
- class airflow.providers.amazon.aws.hooks.emr.EmrContainerHook(*args, virtual_cluster_id=None, **kwargs)
- Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook
- Interact with Amazon EMR Containers (Amazon EMR on EKS).
- Provide a thick wrapper around boto3.client("emr-containers").
- Parameters
- virtual_cluster_id (str | None) – Cluster ID of the EMR on EKS virtual cluster.
- Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.
 - create_emr_on_eks_cluster(virtual_cluster_name, eks_cluster_name, eks_namespace, tags=None)
 - submit_job(name, execution_role_arn, release_label, job_driver, configuration_overrides=None, client_request_token=None, tags=None)
- Submit a job to the EMR Containers API and return the job ID.
- A job run is a unit of work, such as a Spark jar, PySpark script, or SparkSQL query, that you submit to Amazon EMR on EKS.
- Parameters
- name (str) – The name of the job run. 
- execution_role_arn (str) – The IAM role ARN associated with the job run. 
- release_label (str) – The Amazon EMR release version to use for the job run. 
- job_driver (dict) – Job configuration details, e.g. the Spark job parameters. 
- configuration_overrides (dict | None) – The configuration overrides for the job run, specifically either application configuration or monitoring configuration. 
- client_request_token (str | None) – The client idempotency token of the job run request. Use this if you want to specify a unique ID to prevent two jobs from getting started. 
- tags (dict | None) – The tags assigned to job runs. 
 
- Returns
- The ID of the job run request. 
- Return type
 
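The parameters of submit_job can be assembled roughly as shown below. This is a sketch under stated assumptions: `build_submit_job_kwargs` and `submit` are hypothetical helpers, the role ARN, script location, log group, and tags are placeholders, and the `job_driver` and `configuration_overrides` shapes follow the EMR Containers `StartJobRun` request format. The Airflow import is deferred so the payload part runs on its own.

```python
import uuid
from typing import Any


def build_submit_job_kwargs(
    name: str,
    execution_role_arn: str,
    entry_point: str,
    release_label: str = "emr-6.15.0-latest",
) -> dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for submit_job."""
    return {
        "name": name,
        "execution_role_arn": execution_role_arn,
        "release_label": release_label,
        "job_driver": {
            "sparkSubmitJobDriver": {
                "entryPoint": entry_point,
                "sparkSubmitParameters": "--conf spark.executor.instances=2",
            }
        },
        "configuration_overrides": {
            "monitoringConfiguration": {
                "cloudWatchMonitoringConfiguration": {
                    "logGroupName": "/emr-on-eks/example",
                    "logStreamNamePrefix": name,
                }
            }
        },
        # A fresh token per logical request guards against duplicate job starts.
        "client_request_token": str(uuid.uuid4()),
        "tags": {"team": "data-eng"},
    }


# Placeholder ARN and script location, for illustration only.
job_kwargs = build_submit_job_kwargs(
    name="daily-report",
    execution_role_arn="arn:aws:iam::123456789012:role/example-job-role",
    entry_point="s3://example-bucket/jobs/report.py",
)


def submit(virtual_cluster_id: str, kwargs: dict[str, Any]) -> str:
    """Hypothetical wrapper; requires the Amazon provider and AWS credentials."""
    from airflow.providers.amazon.aws.hooks.emr import EmrContainerHook

    hook = EmrContainerHook(
        aws_conn_id="aws_default", virtual_cluster_id=virtual_cluster_id
    )
    return hook.submit_job(**kwargs)
```

To retry the same logical request, reuse the stored `client_request_token` rather than generating a new one.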
 - get_job_failure_reason(job_id)
- Fetch the reason for a job failure (e.g. an error message). Returns None or the reason string.
- Parameters
- job_id (str) – The ID of the job run request. 
 
 - check_query_status(job_id)
- Fetch the status of a submitted job run. Returns None or one of the valid query states.
- Parameters
- job_id (str) – The ID of the job run request. 
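The two status methods above can be combined into a simple polling loop. The sketch below is illustrative only: `wait_for_job` and `run_and_report` are hypothetical helpers, and the set of terminal states is an assumption based on the EMR on EKS job run states. The status function is injected so the loop can be exercised with a fake, and the Airflow import is deferred.

```python
import time
from typing import Callable, Optional

# Assumption: these are the terminal job run states for EMR on EKS.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_job(
    check_status: Callable[[str], Optional[str]],
    job_id: str,
    poll_interval: float = 10.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Hypothetical helper: poll until a terminal state or attempts run out.

    Returns the terminal state, or None if it was never reached. A status of
    None (no state available yet) is treated like any non-terminal state.
    """
    for _ in range(max_attempts):
        state = check_status(job_id)
        if state in TERMINAL_STATES:
            return state
        sleep(poll_interval)
    return None


# Exercise the loop with a fake status source and a no-op sleep.
_fake_states = iter([None, "PENDING", "RUNNING", "COMPLETED"])
final_state = wait_for_job(
    lambda job_id: next(_fake_states), "example-job-id", sleep=lambda seconds: None
)


def run_and_report(virtual_cluster_id: str, job_id: str) -> None:
    """Hypothetical wrapper; requires the Amazon provider and AWS credentials."""
    from airflow.providers.amazon.aws.hooks.emr import EmrContainerHook

    hook = EmrContainerHook(
        aws_conn_id="aws_default", virtual_cluster_id=virtual_cluster_id
    )
    state = wait_for_job(hook.check_query_status, job_id)
    if state == "FAILED":
        # May be None if no reason is available.
        print("Job failed:", hook.get_job_failure_reason(job_id))
```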
 
 
