`airflow.providers.amazon.aws.operators.emr`¶

Module Contents¶

Classes¶

`EmrAddStepsOperator`	An operator that adds steps to an existing EMR job_flow.
`EmrContainerOperator`	An operator that submits jobs to EMR on EKS virtual clusters.
`EmrCreateJobFlowOperator`	Creates an EMR JobFlow, reading the config from the EMR connection.
`EmrModifyClusterOperator`	An operator that modifies an existing EMR cluster.
`EmrTerminateJobFlowOperator`	Operator to terminate EMR JobFlows.
`EmrServerlessCreateApplicationOperator`	Operator to create Serverless EMR Application
`EmrServerlessStartJobOperator`	Operator to start EMR Serverless job.
`EmrServerlessDeleteApplicationOperator`	Operator to delete EMR Serverless application

class airflow.providers.amazon.aws.operators.emr.EmrAddStepsOperator(*, job_flow_id=None, job_flow_name=None, cluster_states=None, aws_conn_id='aws_default', steps=None, **kwargs)[source]¶

Bases: airflow.models.BaseOperator

An operator that adds steps to an existing EMR job_flow.

See also

For more information on how to use this operator, take a look at the guide: Add Steps to an EMR job flow

Parameters

job_flow_id (Optional[str]) – id of the JobFlow to add steps to. (templated)
job_flow_name (Optional[str]) – name of the JobFlow to add steps to. Use as an alternative to passing job_flow_id. will search for id of JobFlow with matching name in one of the states in param cluster_states. Exactly one cluster like this should exist or will fail. (templated)
cluster_states (Optional[List[str]]) – Acceptable cluster states when searching for JobFlow id by job_flow_name. (templated)
aws_conn_id (str) – aws connection to uses
steps (Optional[Union[List[dict], str]]) – boto3 style steps or reference to a steps file (must be ‘.json’) to be added to the jobflow. (templated)
do_xcom_push – if True, job_flow_id is pushed to XCom with key job_flow_id.

template_fields :Sequence[str] = ['job_flow_id', 'job_flow_name', 'cluster_states', 'steps'][source]¶

template_ext :Sequence[str] = ['.json'][source]¶

template_fields_renderers[source]¶

ui_color = #f9c915[source]¶

operator_extra_links[source]¶

execute(context)[source]¶

This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.amazon.aws.operators.emr.EmrContainerOperator(*, name, virtual_cluster_id, execution_role_arn, release_label, job_driver, configuration_overrides=None, client_request_token=None, aws_conn_id='aws_default', wait_for_completion=True, poll_interval=30, max_tries=None, tags=None, **kwargs)[source]¶

Bases: airflow.models.BaseOperator

An operator that submits jobs to EMR on EKS virtual clusters.

See also

For more information on how to use this operator, take a look at the guide: Submit a job to an Amazon EMR virtual cluster

Parameters

name (str) – The name of the job run.
virtual_cluster_id (str) – The EMR on EKS virtual cluster ID
execution_role_arn (str) – The IAM role ARN associated with the job run.
release_label (str) – The Amazon EMR release version to use for the job run.
job_driver (dict) – Job configuration details, e.g. the Spark job parameters.
configuration_overrides (Optional[dict]) – The configuration overrides for the job run, specifically either application configuration or monitoring configuration.
client_request_token (Optional[str]) – The client idempotency token of the job run request. Use this if you want to specify a unique ID to prevent two jobs from getting started. If no token is provided, a UUIDv4 token will be generated for you.
aws_conn_id (str) – The Airflow connection used for AWS credentials.
wait_for_completion (bool) – Whether or not to wait in the operator for the job to complete.
poll_interval (int) – Time (in seconds) to wait between two consecutive calls to check query status on EMR
max_tries (Optional[int]) – Maximum number of times to wait for the job run to finish. Defaults to None, which will poll until the job is not in a pending, submitted, or running state.
tags (Optional[dict]) – The tags assigned to job runs. Defaults to None

template_fields :Sequence[str] = ['name', 'virtual_cluster_id', 'execution_role_arn', 'release_label', 'job_driver'][source]¶

ui_color = #f9c915[source]¶

hook()[source]¶

Create and return an EmrContainerHook.

execute(context)[source]¶

Run job on EMR Containers

on_kill()[source]¶

Cancel the submitted job run

class airflow.providers.amazon.aws.operators.emr.EmrCreateJobFlowOperator(*, aws_conn_id='aws_default', emr_conn_id='emr_default', job_flow_overrides=None, region_name=None, **kwargs)[source]¶

Bases: airflow.models.BaseOperator

Creates an EMR JobFlow, reading the config from the EMR connection. A dictionary of JobFlow overrides can be passed that override the config from the connection.

See also

For more information on how to use this operator, take a look at the guide: Create an EMR job flow

Parameters

aws_conn_id (str) – The Airflow connection used for AWS credentials. If this is None or empty then the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, then default boto3 configuration would be used (and must be maintained on each worker node)
emr_conn_id (str) – emr connection to use for run_job_flow request body. This will be overridden by the job_flow_overrides param
job_flow_overrides (Optional[Union[str, Dict[str, Any]]]) – boto3 style arguments or reference to an arguments file (must be ‘.json’) to override emr_connection extra. (templated)
region_name (Optional[str]) – Region named passed to EmrHook

template_fields :Sequence[str] = ['job_flow_overrides'][source]¶

template_ext :Sequence[str] = ['.json'][source]¶

template_fields_renderers[source]¶

ui_color = #f9c915[source]¶

operator_extra_links[source]¶

execute(context)[source]¶

This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.amazon.aws.operators.emr.EmrModifyClusterOperator(*, cluster_id, step_concurrency_level, aws_conn_id='aws_default', **kwargs)[source]¶

Bases: airflow.models.BaseOperator

An operator that modifies an existing EMR cluster.

See also

For more information on how to use this operator, take a look at the guide: Modify Amazon EMR container

Parameters

cluster_id (str) – cluster identifier
step_concurrency_level (int) – Concurrency of the cluster
aws_conn_id (str) – aws connection to uses
do_xcom_push – if True, cluster_id is pushed to XCom with key cluster_id.

template_fields :Sequence[str] = ['cluster_id', 'step_concurrency_level'][source]¶

template_ext :Sequence[str] = [][source]¶

ui_color = #f9c915[source]¶

operator_extra_links[source]¶

execute(context)[source]¶

This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.amazon.aws.operators.emr.EmrTerminateJobFlowOperator(*, job_flow_id, aws_conn_id='aws_default', **kwargs)[source]¶

Bases: airflow.models.BaseOperator

Operator to terminate EMR JobFlows.

See also

For more information on how to use this operator, take a look at the guide: Terminate an EMR job flow

Parameters

job_flow_id (str) – id of the JobFlow to terminate. (templated)
aws_conn_id (str) – aws connection to uses

template_fields :Sequence[str] = ['job_flow_id'][source]¶

template_ext :Sequence[str] = [][source]¶

ui_color = #f9c915[source]¶

operator_extra_links[source]¶

execute(context)[source]¶

This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.amazon.aws.operators.emr.EmrServerlessCreateApplicationOperator(release_label, job_type, client_request_token='', config=None, wait_for_completion=True, aws_conn_id='aws_default', **kwargs)[source]¶

Bases: airflow.models.BaseOperator

Operator to create Serverless EMR Application

See also

For more information on how to use this operator, take a look at the guide: Create an EMR Serverless Application

Parameters

release_label (str) – The EMR release version associated with the application.
job_type (str) – The type of application you want to start, such as Spark or Hive.
wait_for_completion (bool) – If true, wait for the Application to start before returning. Default to True
client_request_token (str) – The client idempotency token of the application to create. Its value must be unique for each request.
config (Optional[dict]) – Optional dictionary for arbitrary parameters to the boto API create_application call.
aws_conn_id (str) – AWS connection to use

hook()[source]¶

Create and return an EmrServerlessHook.

execute(context)[source]¶

This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.amazon.aws.operators.emr.EmrServerlessStartJobOperator(application_id, execution_role_arn, job_driver, configuration_overrides, client_request_token='', config=None, wait_for_completion=True, aws_conn_id='aws_default', **kwargs)[source]¶

Bases: airflow.models.BaseOperator

Operator to start EMR Serverless job.

See also

For more information on how to use this operator, take a look at the guide: Start an EMR Serverless Job

Parameters

application_id (str) – ID of the EMR Serverless application to start.
execution_role_arn (str) – ARN of role to perform action.
job_driver (dict) – Driver that the job runs on.
configuration_overrides (Optional[dict]) – Configuration specifications to override existing configurations.
client_request_token (str) – The client idempotency token of the application to create. Its value must be unique for each request.
config (Optional[dict]) – Optional dictionary for arbitrary parameters to the boto API start_job_run call.
wait_for_completion (bool) – If true, waits for the job to start before returning. Defaults to True.
aws_conn_id (str) – AWS connection to use

template_fields :Sequence[str] = ['application_id', 'execution_role_arn', 'job_driver', 'configuration_overrides'][source]¶

hook()[source]¶

Create and return an EmrServerlessHook.

execute(context)[source]¶

This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.amazon.aws.operators.emr.EmrServerlessDeleteApplicationOperator(application_id, wait_for_completion=True, aws_conn_id='aws_default', **kwargs)[source]¶

Bases: airflow.models.BaseOperator

Operator to delete EMR Serverless application

See also

For more information on how to use this operator, take a look at the guide: Delete an EMR Serverless Application

Parameters

application_id (str) – ID of the EMR Serverless application to delete.
wait_for_completion (bool) – If true, wait for the Application to start before returning. Default to True
aws_conn_id (str) – AWS connection to use

template_fields :Sequence[str] = ['application_id'][source]¶

hook()[source]¶

Create and return an EmrServerlessHook.

execute(context)[source]¶

This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

airflow.providers.amazon.aws.operators.emr¶

Module Contents¶

Classes¶

`airflow.providers.amazon.aws.operators.emr`¶