`airflow.providers.amazon.aws.hooks.emr`

Module Contents

Classes

`EmrHook`	Interact with AWS EMR. emr_conn_id is only necessary for using the create_job_flow method.
`EmrContainerHook`	Interact with an AWS EMR virtual cluster to run jobs, poll them, and return job status.

class airflow.providers.amazon.aws.hooks.emr.EmrHook(emr_conn_id=default_conn_name, *args, **kwargs)

Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook

Interact with AWS EMR. emr_conn_id is only necessary for using the create_job_flow method.

Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.

See also

AwsBaseHook

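Parameters: virtual_cluster_id (Optional[str]) -- Cluster ID of the EMR on EKS virtual cluster. This parameter, along with the state constants and methods listed below, applies to EmrContainerHook (the EMR on EKS hook), not to EmrHook.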

INTERMEDIATE_STATES = ['PENDING', 'SUBMITTED', 'RUNNING']

FAILURE_STATES = ['FAILED', 'CANCELLED', 'CANCEL_PENDING']

SUCCESS_STATES = ['COMPLETED']

TERMINAL_STATES = ['COMPLETED', 'FAILED', 'CANCELLED', 'CANCEL_PENDING']
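To show how these lists relate to each other, here is a minimal sketch. The lists are copied from the attributes above; `classify_state` is a hypothetical helper for illustration and is not part of the hook.

```python
# State lists copied from the class attributes documented above.
INTERMEDIATE_STATES = ["PENDING", "SUBMITTED", "RUNNING"]
FAILURE_STATES = ["FAILED", "CANCELLED", "CANCEL_PENDING"]
SUCCESS_STATES = ["COMPLETED"]
TERMINAL_STATES = ["COMPLETED", "FAILED", "CANCELLED", "CANCEL_PENDING"]


def classify_state(state):
    """Hypothetical helper: map a job run state to a coarse outcome label."""
    if state in SUCCESS_STATES:
        return "success"
    if state in FAILURE_STATES:
        return "failure"
    if state in INTERMEDIATE_STATES:
        return "in_progress"
    # None or any state not listed above.
    return "unknown"
```

Note that TERMINAL_STATES is the union of SUCCESS_STATES and FAILURE_STATES, so a job in CANCEL_PENDING is treated as both terminal and failed.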

submit_job(self, name, execution_role_arn, release_label, job_driver, configuration_overrides=None, client_request_token=None)

Submit a job to the EMR Containers API and return the job ID. A job run is a unit of work, such as a Spark jar, PySpark script, or SparkSQL query, that you submit to Amazon EMR on EKS. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr-containers.html#EMRContainers.Client.start_job_run

Parameters

name (str) -- The name of the job run.
execution_role_arn (str) -- The IAM role ARN associated with the job run.
release_label (str) -- The Amazon EMR release version to use for the job run.
job_driver (dict) -- Job configuration details, e.g. the Spark job parameters.
configuration_overrides (Optional[dict]) -- The configuration overrides for the job run, specifically either application configuration or monitoring configuration.
client_request_token (Optional[str]) -- The client idempotency token of the job run request. Use this if you want to specify a unique ID to prevent two jobs from getting started.

Returns

Job ID

Return type

str
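The arguments above can be assembled roughly as follows. This is a sketch, not the hook's own code: `build_submit_job_kwargs` is a hypothetical helper, the job name, Spark parameters, and sample values are placeholders, and the nested dictionary keys follow the request shape of the boto3 `start_job_run` call linked above.

```python
import uuid


def build_submit_job_kwargs(entry_point, role_arn, release_label, log_uri=None):
    """Hypothetical helper: assemble keyword arguments for submit_job."""
    # Job driver in the shape expected by the EMR Containers start_job_run API.
    job_driver = {
        "sparkSubmitJobDriver": {
            "entryPoint": entry_point,
            "sparkSubmitParameters": "--conf spark.executor.instances=2",  # placeholder
        }
    }
    kwargs = {
        "name": "example-job",  # placeholder job name
        "execution_role_arn": role_arn,
        "release_label": release_label,
        "job_driver": job_driver,
        # A fresh token per request, used as the idempotency token.
        "client_request_token": str(uuid.uuid4()),
    }
    if log_uri is not None:
        # Optional monitoring configuration that sends logs to S3.
        kwargs["configuration_overrides"] = {
            "monitoringConfiguration": {
                "s3MonitoringConfiguration": {"logUri": log_uri}
            }
        }
    return kwargs
```

Assuming `hook` is an EmrContainerHook instance, the result could then be passed as `hook.submit_job(**kwargs)`. To make a retry of the same request idempotent, reuse the same token rather than generating a new one.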

get_job_failure_reason(self, job_id)

Fetch the reason for a job failure (e.g. an error message). Returns the reason string, or None.

Parameters: job_id (str) -- ID of the submitted job run
Returns: The failure reason, or None
Return type: Optional[str]

check_query_status(self, job_id)

Fetch the status of a submitted job run. Returns None or one of the valid query states. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr-containers.html#EMRContainers.Client.describe_job_run

Parameters: job_id (str) -- ID of the submitted job run
Returns: The job run state, or None
Return type: Optional[str]
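For orientation, the boto3 `describe_job_run` call linked above returns a dictionary with a `jobRun` entry that carries, among other fields, `state` and (for failed runs) `failureReason`. The sketch below reads those two fields from such a response; `summarize_job_run` is a hypothetical helper and does not claim to reproduce how the hook itself parses the response.

```python
def summarize_job_run(response):
    """Hypothetical helper: extract (state, failure_reason) from a
    describe_job_run-style response dictionary.

    Either value is None when the corresponding field is missing.
    """
    job_run = response.get("jobRun", {})
    return job_run.get("state"), job_run.get("failureReason")
```

Returning None for missing fields mirrors the "None or a state" contract described above.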

poll_query_status(self, job_id, max_tries=None, poll_interval=30)

Poll the status of a submitted job run until the query state reaches a final state. Returns one of the final states.

Parameters

job_id (str) -- ID of the submitted job run
max_tries (Optional[int]) -- Number of times to poll for the query state before the function exits
poll_interval (int) -- Time (in seconds) to wait between calls to check the query status on EMR

Returns

Query state when polling stops

Return type

Optional[str]
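The polling behaviour described above can be sketched as a plain loop. This is an illustration under stated assumptions, not the hook's implementation: `poll_until_terminal` and its `get_state` and `sleep` parameters are hypothetical, and returning the last observed state when `max_tries` is exhausted is a choice made for this sketch.

```python
import time

# Copied from the TERMINAL_STATES attribute documented above.
TERMINAL_STATES = ["COMPLETED", "FAILED", "CANCELLED", "CANCEL_PENDING"]


def poll_until_terminal(get_state, max_tries=None, poll_interval=30, sleep=time.sleep):
    """Illustrative polling loop.

    get_state is any zero-argument callable returning a state string or None.
    sleep is injectable so the loop can be exercised without real waiting.
    """
    try_number = 1
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        # Give up once the attempt budget is spent (assumption of this sketch:
        # hand back whatever state was seen last, which may be None).
        if max_tries is not None and try_number >= max_tries:
            return state
        try_number += 1
        sleep(poll_interval)
```

With `max_tries=None` the loop runs until a terminal state is seen, so callers that need a bounded wait should always pass a limit.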

stop_query(self, job_id)

Cancel the submitted job run.

Parameters: job_id (str) -- ID of the submitted job run
Returns: dict
Return type: Dict
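One way to combine polling and cancellation is to cancel a job run that has not reached a terminal state within the polling budget. The sketch below assumes only the `poll_query_status` and `stop_query` signatures documented above; `cancel_if_unfinished` is a hypothetical helper, and `hook` can be any object exposing those two methods.

```python
# Copied from the TERMINAL_STATES attribute documented above.
TERMINAL_STATES = ["COMPLETED", "FAILED", "CANCELLED", "CANCEL_PENDING"]


def cancel_if_unfinished(hook, job_id, max_tries, poll_interval=30):
    """Hypothetical helper: poll a job run and cancel it if it is still unfinished.

    Returns a (state, cancelled) tuple, where cancelled is True when
    stop_query was called.
    """
    state = hook.poll_query_status(job_id, max_tries=max_tries, poll_interval=poll_interval)
    if state not in TERMINAL_STATES:
        # Polling ended without a terminal state, so request cancellation.
        hook.stop_query(job_id)
        return state, True
    return state, False
```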
