airflow.providers.amazon.aws.operators.emr

Module Contents

Classes

EmrAddStepsOperator

An operator that adds steps to an existing EMR job_flow.

EmrContainerOperator

An operator that submits jobs to EMR on EKS virtual clusters.

EmrCreateJobFlowOperator

Creates an EMR JobFlow, reading the config from the EMR connection.

EmrModifyClusterOperator

An operator that modifies an existing EMR cluster.

EmrTerminateJobFlowOperator

Operator to terminate EMR JobFlows.

EmrServerlessCreateApplicationOperator

Operator to create Serverless EMR Application

EmrServerlessStartJobOperator

Operator to start EMR Serverless job.

EmrServerlessDeleteApplicationOperator

Operator to delete EMR Serverless application

class airflow.providers.amazon.aws.operators.emr.EmrAddStepsOperator(*, job_flow_id=None, job_flow_name=None, cluster_states=None, aws_conn_id='aws_default', steps=None, **kwargs)[source]

Bases: airflow.models.BaseOperator

An operator that adds steps to an existing EMR job_flow.

See also

For more information on how to use this operator, take a look at the guide: Add Steps to an EMR job flow

Parameters
  • job_flow_id (Optional[str]) – id of the JobFlow to add steps to. (templated)

  • job_flow_name (Optional[str]) – name of the JobFlow to add steps to. Use as an alternative to passing job_flow_id. will search for id of JobFlow with matching name in one of the states in param cluster_states. Exactly one cluster like this should exist or will fail. (templated)

  • cluster_states (Optional[List[str]]) – Acceptable cluster states when searching for JobFlow id by job_flow_name. (templated)

  • aws_conn_id (str) – aws connection to uses

  • steps (Optional[Union[List[dict], str]]) – boto3 style steps or reference to a steps file (must be ‘.json’) to be added to the jobflow. (templated)

  • do_xcom_push – if True, job_flow_id is pushed to XCom with key job_flow_id.

template_fields :Sequence[str] = ['job_flow_id', 'job_flow_name', 'cluster_states', 'steps'][source]
template_ext :Sequence[str] = ['.json'][source]
template_fields_renderers[source]
ui_color = #f9c915[source]
execute(context)[source]

This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.amazon.aws.operators.emr.EmrContainerOperator(*, name, virtual_cluster_id, execution_role_arn, release_label, job_driver, configuration_overrides=None, client_request_token=None, aws_conn_id='aws_default', wait_for_completion=True, poll_interval=30, max_tries=None, tags=None, **kwargs)[source]

Bases: airflow.models.BaseOperator

An operator that submits jobs to EMR on EKS virtual clusters.

See also

For more information on how to use this operator, take a look at the guide: Submit a job to an Amazon EMR virtual cluster

Parameters
  • name (str) – The name of the job run.

  • virtual_cluster_id (str) – The EMR on EKS virtual cluster ID

  • execution_role_arn (str) – The IAM role ARN associated with the job run.

  • release_label (str) – The Amazon EMR release version to use for the job run.

  • job_driver (dict) – Job configuration details, e.g. the Spark job parameters.

  • configuration_overrides (Optional[dict]) – The configuration overrides for the job run, specifically either application configuration or monitoring configuration.

  • client_request_token (Optional[str]) – The client idempotency token of the job run request. Use this if you want to specify a unique ID to prevent two jobs from getting started. If no token is provided, a UUIDv4 token will be generated for you.

  • aws_conn_id (str) – The Airflow connection used for AWS credentials.

  • wait_for_completion (bool) – Whether or not to wait in the operator for the job to complete.

  • poll_interval (int) – Time (in seconds) to wait between two consecutive calls to check query status on EMR

  • max_tries (Optional[int]) – Maximum number of times to wait for the job run to finish. Defaults to None, which will poll until the job is not in a pending, submitted, or running state.

  • tags (Optional[dict]) – The tags assigned to job runs. Defaults to None

template_fields :Sequence[str] = ['name', 'virtual_cluster_id', 'execution_role_arn', 'release_label', 'job_driver'][source]
ui_color = #f9c915[source]
hook()[source]

Create and return an EmrContainerHook.

execute(context)[source]

Run job on EMR Containers

on_kill()[source]

Cancel the submitted job run

class airflow.providers.amazon.aws.operators.emr.EmrCreateJobFlowOperator(*, aws_conn_id='aws_default', emr_conn_id='emr_default', job_flow_overrides=None, region_name=None, **kwargs)[source]

Bases: airflow.models.BaseOperator

Creates an EMR JobFlow, reading the config from the EMR connection. A dictionary of JobFlow overrides can be passed that override the config from the connection.

See also

For more information on how to use this operator, take a look at the guide: Create an EMR job flow

Parameters
  • aws_conn_id (str) – The Airflow connection used for AWS credentials. If this is None or empty then the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, then default boto3 configuration would be used (and must be maintained on each worker node)

  • emr_conn_id (str) – emr connection to use for run_job_flow request body. This will be overridden by the job_flow_overrides param

  • job_flow_overrides (Optional[Union[str, Dict[str, Any]]]) – boto3 style arguments or reference to an arguments file (must be ‘.json’) to override emr_connection extra. (templated)

  • region_name (Optional[str]) – Region named passed to EmrHook

template_fields :Sequence[str] = ['job_flow_overrides'][source]
template_ext :Sequence[str] = ['.json'][source]
template_fields_renderers[source]
ui_color = #f9c915[source]
execute(context)[source]

This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.amazon.aws.operators.emr.EmrModifyClusterOperator(*, cluster_id, step_concurrency_level, aws_conn_id='aws_default', **kwargs)[source]

Bases: airflow.models.BaseOperator

An operator that modifies an existing EMR cluster.

See also

For more information on how to use this operator, take a look at the guide: Modify Amazon EMR container

Parameters
  • cluster_id (str) – cluster identifier

  • step_concurrency_level (int) – Concurrency of the cluster

  • aws_conn_id (str) – aws connection to uses

  • do_xcom_push – if True, cluster_id is pushed to XCom with key cluster_id.

template_fields :Sequence[str] = ['cluster_id', 'step_concurrency_level'][source]
template_ext :Sequence[str] = [][source]
ui_color = #f9c915[source]
execute(context)[source]

This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.amazon.aws.operators.emr.EmrTerminateJobFlowOperator(*, job_flow_id, aws_conn_id='aws_default', **kwargs)[source]

Bases: airflow.models.BaseOperator

Operator to terminate EMR JobFlows.

See also

For more information on how to use this operator, take a look at the guide: Terminate an EMR job flow

Parameters
  • job_flow_id (str) – id of the JobFlow to terminate. (templated)

  • aws_conn_id (str) – aws connection to uses

template_fields :Sequence[str] = ['job_flow_id'][source]
template_ext :Sequence[str] = [][source]
ui_color = #f9c915[source]
execute(context)[source]

This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.amazon.aws.operators.emr.EmrServerlessCreateApplicationOperator(release_label, job_type, client_request_token='', config=None, wait_for_completion=True, aws_conn_id='aws_default', **kwargs)[source]

Bases: airflow.models.BaseOperator

Operator to create Serverless EMR Application

See also

For more information on how to use this operator, take a look at the guide: Create an EMR Serverless Application

Parameters
  • release_label (str) – The EMR release version associated with the application.

  • job_type (str) – The type of application you want to start, such as Spark or Hive.

  • wait_for_completion (bool) – If true, wait for the Application to start before returning. Default to True

  • client_request_token (str) – The client idempotency token of the application to create. Its value must be unique for each request.

  • config (Optional[dict]) – Optional dictionary for arbitrary parameters to the boto API create_application call.

  • aws_conn_id (str) – AWS connection to use

hook()[source]

Create and return an EmrServerlessHook.

execute(context)[source]

This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.amazon.aws.operators.emr.EmrServerlessStartJobOperator(application_id, execution_role_arn, job_driver, configuration_overrides, client_request_token='', config=None, wait_for_completion=True, aws_conn_id='aws_default', **kwargs)[source]

Bases: airflow.models.BaseOperator

Operator to start EMR Serverless job.

See also

For more information on how to use this operator, take a look at the guide: Start an EMR Serverless Job

Parameters
  • application_id (str) – ID of the EMR Serverless application to start.

  • execution_role_arn (str) – ARN of role to perform action.

  • job_driver (dict) – Driver that the job runs on.

  • configuration_overrides (Optional[dict]) – Configuration specifications to override existing configurations.

  • client_request_token (str) – The client idempotency token of the application to create. Its value must be unique for each request.

  • config (Optional[dict]) – Optional dictionary for arbitrary parameters to the boto API start_job_run call.

  • wait_for_completion (bool) – If true, waits for the job to start before returning. Defaults to True.

  • aws_conn_id (str) – AWS connection to use

template_fields :Sequence[str] = ['application_id', 'execution_role_arn', 'job_driver', 'configuration_overrides'][source]
hook()[source]

Create and return an EmrServerlessHook.

execute(context)[source]

This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.amazon.aws.operators.emr.EmrServerlessDeleteApplicationOperator(application_id, wait_for_completion=True, aws_conn_id='aws_default', **kwargs)[source]

Bases: airflow.models.BaseOperator

Operator to delete EMR Serverless application

See also

For more information on how to use this operator, take a look at the guide: Delete an EMR Serverless Application

Parameters
  • application_id (str) – ID of the EMR Serverless application to delete.

  • wait_for_completion (bool) – If true, wait for the Application to start before returning. Default to True

  • aws_conn_id (str) – AWS connection to use

template_fields :Sequence[str] = ['application_id'][source]
hook()[source]

Create and return an EmrServerlessHook.

execute(context)[source]

This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

Was this entry helpful?