airflow.providers.amazon.aws.hooks.emr
Module Contents
Classes
| EmrHook | Interact with Amazon Elastic MapReduce Service. |
| EmrServerlessHook | Interact with EMR Serverless API. |
| EmrContainerHook | Interact with AWS EMR Virtual Cluster to run and poll jobs and return job status. |
- class airflow.providers.amazon.aws.hooks.emr.EmrHook(emr_conn_id=default_conn_name, *args, **kwargs)
- Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook
- Interact with Amazon Elastic MapReduce Service.
- Parameters
- emr_conn_id (str | None) – Amazon Elastic MapReduce Connection. This attribute is only necessary when using the create_job_flow() method.
 - Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.
 - get_cluster_id_by_name(emr_cluster_name, cluster_states)
- Fetch the ID of the EMR cluster with the given name and (optional) states. An ID is returned only if exactly one matching cluster is found.
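The single-match rule can be illustrated with a small standalone sketch. This is not the hook's implementation: the function name `pick_single_cluster_id` is hypothetical, the records are assumed to be shaped like the cluster summaries returned by the boto3 `list_clusters` call, and returning `None` for zero or several matches is an assumption.

```python
from typing import Optional


def pick_single_cluster_id(clusters: list, name: str, states: list) -> Optional[str]:
    """Return the ID of the only cluster matching name (and states), else None.

    Hypothetical helper; it only illustrates the documented single-match rule.
    """
    matches = [
        cluster["Id"]
        for cluster in clusters
        # An empty `states` list is treated as "no state filter" (assumption).
        if cluster["Name"] == name
        and (not states or cluster["Status"]["State"] in states)
    ]
    # An ID is produced only when exactly one cluster matched.
    return matches[0] if len(matches) == 1 else None


# Sample records with made-up IDs.
clusters = [
    {"Id": "j-AAA", "Name": "etl", "Status": {"State": "WAITING"}},
    {"Id": "j-BBB", "Name": "etl", "Status": {"State": "TERMINATED"}},
    {"Id": "j-CCC", "Name": "adhoc", "Status": {"State": "RUNNING"}},
]

active_etl = pick_single_cluster_id(clusters, "etl", ["WAITING", "RUNNING"])
ambiguous = pick_single_cluster_id(clusters, "etl", [])
```

Filtering by state is what makes the name unique here: without it, two clusters named `etl` match and no ID is returned.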
 - create_job_flow(job_flow_overrides)
- Create and start running a new cluster (job flow).
- This method uses EmrHook.emr_conn_id to receive the initial Amazon EMR cluster configuration. If EmrHook.emr_conn_id is empty or the connection does not exist, then an empty initial configuration is used.
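One way to picture how an initial configuration and `job_flow_overrides` combine is a plain dictionary merge. The sketch below is an assumption about the merge semantics (shallow, with overrides winning on key conflicts), not the provider's code; `build_job_flow_config` is a hypothetical name and the configuration values are placeholders.

```python
def build_job_flow_config(initial_config, job_flow_overrides):
    """Merge an initial cluster configuration with per-run overrides.

    Hypothetical helper: a shallow merge in which overrides replace
    top-level keys of the initial configuration.
    """
    # Fall back to an empty initial configuration when none is available,
    # matching the documented behaviour for a missing connection.
    config = dict(initial_config or {})
    config.update(job_flow_overrides)
    return config


# Placeholder values; the keys follow the boto3 run_job_flow request.
initial = {
    "Name": "default-cluster",
    "ReleaseLabel": "emr-6.15.0",
    "Instances": {"KeepJobFlowAliveWhenNoSteps": False},
}
overrides = {"Name": "nightly-etl"}

config = build_job_flow_config(initial, overrides)
```

Because the merge in this sketch is shallow, overriding a nested key such as `Instances` would replace the whole nested dictionary rather than patch it.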
 - add_job_flow_steps(job_flow_id, steps=None, wait_for_completion=False)
- Add new steps to a running cluster. 
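The page does not show what an entry in `steps` looks like. As a rough sketch, assuming each entry follows the step structure of the boto3 `add_job_flow_steps` request, a step could be built as follows; `spark_step` is a hypothetical helper and the S3 path is a placeholder.

```python
def spark_step(name, script_uri, action_on_failure="CONTINUE"):
    """Build one step definition that runs spark-submit via command-runner.jar.

    Hypothetical helper; the dictionary keys follow the boto3 step structure.
    """
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
        },
    }


# A list like this would be passed as the `steps` argument.
steps = [
    spark_step("load", "s3://example-bucket/jobs/load.py"),
    spark_step("aggregate", "s3://example-bucket/jobs/aggregate.py", "CANCEL_AND_WAIT"),
]
```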
 - test_connection()
- Return a failed state when testing the Amazon Elastic MapReduce Connection, because it is untestable.
- This method is overridden because this hook is based on AwsGenericHook; without the override, the test would try to connect to AWS STS using the default boto3 credential strategy.
 
- class airflow.providers.amazon.aws.hooks.emr.EmrServerlessHook(*args, **kwargs)
- Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook
- Interact with EMR Serverless API.
- Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.
 - waiter(get_state_callable, get_state_args, parse_response, desired_state, failure_states, object_type, action, countdown=25 * 60, check_interval_seconds=60)
- Poll get_state_callable until the desired state is reached, a failure state is seen, or the countdown expires.
- Parameters
- get_state_callable (Callable) – A callable to run until it returns True 
- get_state_args (dict) – Arguments to pass to get_state_callable 
- parse_response (list) – Dictionary keys to extract state from response of get_state_callable 
- desired_state (set) – Wait until the getter returns this value 
- failure_states (set) – A set of states which indicate failure and should throw an exception if any are reached before the desired_state 
- object_type (str) – Used for the reporting string. What are you waiting for? (application, job, etc) 
- action (str) – Used for the reporting string. What action are you waiting for? (created, deleted, etc) 
- countdown (int) – Total amount of time the waiter should wait for the desired state before timing out (in seconds). Defaults to 25 * 60 seconds. 
- check_interval_seconds (int) – Number of seconds waiter should wait before attempting to retry get_state_callable. Defaults to 60 seconds. 
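The parameters above describe a polling loop. The following standalone sketch shows one plausible reading of that loop; it is not the provider's implementation. The function name `wait_for_state`, the exception types, and the extra `sleep` argument (added so the loop can be exercised without real delays) are all assumptions.

```python
import time


def wait_for_state(get_state_callable, get_state_args, parse_response,
                   desired_state, failure_states, object_type, action,
                   countdown=25 * 60, check_interval_seconds=60,
                   sleep=time.sleep):
    """Poll until the parsed state is in desired_state (illustrative sketch)."""
    remaining = countdown
    while True:
        response = get_state_callable(**get_state_args)
        # Walk the nested response using the keys listed in parse_response.
        state = response
        for key in parse_response:
            state = state[key]
        if state in desired_state:
            return state
        if state in failure_states:
            raise RuntimeError(f"{object_type} reached failure state {state}")
        if remaining < check_interval_seconds:
            raise TimeoutError(f"{object_type} was not {action} within {countdown} seconds")
        sleep(check_interval_seconds)
        remaining -= check_interval_seconds


# Fake state getter standing in for a real API call.
states = iter(["CREATING", "CREATING", "CREATED"])


def fake_get_application(applicationId):
    return {"application": {"state": next(states)}}


naps = []  # records requested sleep durations instead of sleeping
result = wait_for_state(
    fake_get_application,
    {"applicationId": "app-1"},
    ["application", "state"],
    {"CREATED"},
    {"TERMINATED"},
    "application",
    "created",
    sleep=naps.append,
)
```

In this run the fake getter reports `CREATING` twice before `CREATED`, so the loop waits for two intervals and then returns.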
 
 
 
- class airflow.providers.amazon.aws.hooks.emr.EmrContainerHook(*args, virtual_cluster_id=None, **kwargs)
- Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook
- Interact with AWS EMR Virtual Cluster to run and poll jobs and return job status.
- Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.
- Parameters
- virtual_cluster_id (str | None) – Cluster ID of the EMR on EKS virtual cluster 
 - create_emr_on_eks_cluster(virtual_cluster_name, eks_cluster_name, eks_namespace, tags=None)
 - submit_job(name, execution_role_arn, release_label, job_driver, configuration_overrides=None, client_request_token=None, tags=None)
- Submit a job to the EMR Containers API and return the job ID. A job run is a unit of work, such as a Spark jar, PySpark script, or SparkSQL query, that you submit to Amazon EMR on EKS.
- See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr-containers.html#EMRContainers.Client.start_job_run
- Parameters
- name (str) – The name of the job run. 
- execution_role_arn (str) – The IAM role ARN associated with the job run. 
- release_label (str) – The Amazon EMR release version to use for the job run. 
- job_driver (dict) – Job configuration details, e.g. the Spark job parameters. 
- configuration_overrides (dict | None) – The configuration overrides for the job run, specifically either application configuration or monitoring configuration. 
- client_request_token (str | None) – The client idempotency token of the job run request. Use this if you want to specify a unique ID to prevent two jobs from getting started. 
- tags (dict | None) – The tags assigned to job runs. 
 
- Returns
- Job ID 
- Return type
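To make the parameters concrete, the sketch below assembles a request in the shape accepted by the boto3 `start_job_run` call linked above. It is a hedged illustration, not the hook's code: `build_start_job_run_request` is a hypothetical helper, sending the token only when one is supplied is an assumption, and the ARN, bucket, cluster ID, and release label are placeholders.

```python
def build_start_job_run_request(virtual_cluster_id, name, execution_role_arn,
                                release_label, job_driver,
                                configuration_overrides=None,
                                client_request_token=None, tags=None):
    """Map the documented arguments onto start_job_run request keys (sketch)."""
    request = {
        "name": name,
        "virtualClusterId": virtual_cluster_id,
        "executionRoleArn": execution_role_arn,
        "releaseLabel": release_label,
        "jobDriver": job_driver,
        "configurationOverrides": configuration_overrides or {},
        "tags": tags or {},
    }
    # Include the idempotency token only when the caller supplied one (assumption).
    if client_request_token:
        request["clientRequestToken"] = client_request_token
    return request


# Placeholder job driver for a PySpark script.
job_driver = {
    "sparkSubmitJobDriver": {
        "entryPoint": "s3://example-bucket/jobs/pi.py",
        "sparkSubmitParameters": "--conf spark.executor.instances=2",
    }
}

request = build_start_job_run_request(
    virtual_cluster_id="example-virtual-cluster-id",
    name="pi-estimate",
    execution_role_arn="arn:aws:iam::123456789012:role/example-emr-eks-role",
    release_label="emr-6.15.0-latest",
    job_driver=job_driver,
    client_request_token="pi-estimate-run-1",
)
```

Reusing the same `client_request_token` for a retried submission is what lets the service recognize the request as a duplicate instead of starting a second job.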
 
 - get_job_failure_reason(job_id)
- Fetch the reason for a job failure (e.g. error message). Returns None or reason string. 
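As an illustration of the "None or reason string" contract, the sketch below pulls a reason out of a response shaped like the boto3 `describe_job_run` result. Joining `failureReason` and `stateDetails` into one string is an assumption made for this example, and `extract_failure_reason` is a hypothetical name.

```python
def extract_failure_reason(response):
    """Return a human-readable failure reason, or None if none is present."""
    job_run = response.get("jobRun", {})
    # Keep only the fields that are actually set in the response.
    parts = [job_run.get("failureReason"), job_run.get("stateDetails")]
    parts = [part for part in parts if part]
    return " - ".join(parts) if parts else None


# Sample responses with made-up details.
failed = {
    "jobRun": {
        "state": "FAILED",
        "failureReason": "USER_ERROR",
        "stateDetails": "Driver exited with a non-zero code",
    }
}
running = {"jobRun": {"state": "RUNNING"}}

reason = extract_failure_reason(failed)
no_reason = extract_failure_reason(running)
```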
 - check_query_status(job_id)
- Fetch the status of a submitted job run. Returns None or one of the valid query states.
- See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr-containers.html#EMRContainers.Client.describe_job_run
- Parameters
- job_id – ID of the submitted job run 
- Returns
- str 
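A caller typically needs to decide whether a returned state means the job run is finished. The sketch below assumes the response shape of the boto3 `describe_job_run` call linked above and treats `COMPLETED`, `FAILED`, and `CANCELLED` as terminal; the helper names and the `TERMINAL_STATES` constant are hypothetical.

```python
# States after which a job run no longer changes (assumed grouping).
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def extract_state(response):
    """Return the job run state from a describe_job_run-style response, or None."""
    return response.get("jobRun", {}).get("state")


def is_finished(state):
    """True when the state is terminal; None (unknown) counts as not finished."""
    return state in TERMINAL_STATES


running_state = extract_state({"jobRun": {"state": "RUNNING"}})
done_state = extract_state({"jobRun": {"state": "COMPLETED"}})
missing_state = extract_state({})
```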
 
