airflow.providers.amazon.aws.operators.batch

AWS Batch services.

Module Contents

Classes

BatchOperator

Execute a job on AWS Batch.

BatchCreateComputeEnvironmentOperator

Create an AWS Batch compute environment.

class airflow.providers.amazon.aws.operators.batch.BatchOperator(*, job_name, job_definition, job_queue, overrides=None, container_overrides=None, array_properties=None, node_overrides=None, share_identifier=None, scheduling_priority_override=None, parameters=None, retry_strategy=None, job_id=None, waiters=None, max_retries=4200, status_retries=None, aws_conn_id=None, region_name=None, tags=None, wait_for_completion=True, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), poll_interval=30, awslogs_enabled=False, awslogs_fetch_interval=timedelta(seconds=30), **kwargs)[source]

Bases: airflow.models.BaseOperator

Execute a job on AWS Batch.

See also

For more information on how to use this operator, take a look at the guide: Submit a new AWS Batch job

Parameters
  • job_name (str) – the name for the job that will run on AWS Batch (templated)

  • job_definition (str) – the job definition name on AWS Batch

  • job_queue (str) – the queue name on AWS Batch

  • overrides (dict | None) – DEPRECATED, use container_overrides instead with the same value.

  • container_overrides (dict | None) – the containerOverrides parameter for boto3 (templated)

  • node_overrides (dict | None) – the nodeOverrides parameter for boto3 (templated)

  • share_identifier (str | None) – The share identifier for the job. Don’t specify this parameter if the job queue doesn’t have a scheduling policy.

  • scheduling_priority_override (int | None) – The scheduling priority for the job. Jobs with a higher scheduling priority are scheduled before jobs with a lower scheduling priority. This overrides any scheduling priority in the job definition

  • array_properties (dict | None) – the arrayProperties parameter for boto3

  • parameters (dict | None) – the parameters for boto3 (templated)

  • job_id (str | None) – the job ID, usually unknown (None) until the submit_job operation gets the jobId defined by AWS Batch

  • waiters (Any | None) – an BatchWaiters object (see note below); if None, polling is used with max_retries and status_retries.

  • max_retries (int) – exponential back-off retries, 4200 = 48 hours; polling is only used when waiters is None

  • status_retries (int | None) – number of HTTP retries to get job status, 10; polling is only used when waiters is None

  • aws_conn_id (str | None) – connection id of AWS credentials / region name. If None, credential boto3 strategy will be used.

  • region_name (str | None) – region name to use in AWS Hook. Override the region_name in connection (if provided)

  • tags (dict | None) – collection of tags to apply to the AWS Batch job submission if None, no tags are submitted

  • deferrable (bool) – Run operator in the deferrable mode.

  • awslogs_enabled (bool) – Specifies whether logs from CloudWatch should be printed or not, False. If it is an array job, only the logs of the first task will be printed.

  • awslogs_fetch_interval (datetime.timedelta) – The interval with which cloudwatch logs are to be fetched, 30 sec.

  • poll_interval (int) – (Deferrable mode only) Time in seconds to wait between polling.

Note

Any custom waiters must return a waiter for these calls: .. code-block:: python

waiter = waiters.get_waiter(“JobExists”) waiter = waiters.get_waiter(“JobRunning”) waiter = waiters.get_waiter(“JobComplete”)

ui_color = '#c3dae0'[source]
arn: str | None[source]
template_fields: Sequence[str] = ('job_id', 'job_name', 'job_definition', 'job_queue', 'container_overrides', 'array_properties',...[source]
template_fields_renderers[source]
hook()[source]
execute(context)[source]

Submit and monitor an AWS Batch job.

Raises

AirflowException

execute_complete(context, event=None)[source]
on_kill()[source]

Override this method to clean up subprocesses when a task instance gets killed.

Any use of the threading, subprocess or multiprocessing module within an operator needs to be cleaned up, or it will leave ghost processes behind.

submit_job(context)[source]

Submit an AWS Batch job.

Raises

AirflowException

monitor_job(context)[source]

Monitor an AWS Batch job.

This can raise an exception or an AirflowTaskTimeout if the task was created with execution_timeout.

class airflow.providers.amazon.aws.operators.batch.BatchCreateComputeEnvironmentOperator(compute_environment_name, environment_type, state, compute_resources, unmanaged_v_cpus=None, service_role=None, tags=None, poll_interval=30, max_retries=None, aws_conn_id=None, region_name=None, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), status_retries=NOTSET, **kwargs)[source]

Bases: airflow.models.BaseOperator

Create an AWS Batch compute environment.

See also

For more information on how to use this operator, take a look at the guide: Create an AWS Batch compute environment

Parameters
  • compute_environment_name (str) – Name of the AWS batch compute environment (templated).

  • environment_type (str) – Type of the compute-environment.

  • state (str) – State of the compute-environment.

  • compute_resources (dict) – Details about the resources managed by the compute-environment (templated). More details: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/batch.html#Batch.Client.create_compute_environment

  • unmanaged_v_cpus (int | None) – Maximum number of vCPU for an unmanaged compute environment. This parameter is only supported when the type parameter is set to UNMANAGED.

  • service_role (str | None) – IAM role that allows Batch to make calls to other AWS services on your behalf (templated).

  • tags (dict | None) – Tags that you apply to the compute-environment to help you categorize and organize your resources.

  • poll_interval (int) – How long to wait in seconds between 2 polls at the environment status. Only useful when deferrable is True.

  • max_retries (int | None) – How many times to poll for the environment status. Only useful when deferrable is True.

  • aws_conn_id (str | None) – Connection ID of AWS credentials / region name. If None, credential boto3 strategy will be used.

  • region_name (str | None) – Region name to use in AWS Hook. Overrides the region_name in connection if provided.

  • deferrable (bool) – If True, the operator will wait asynchronously for the environment to be created. This mode requires aiobotocore module to be installed. (default: False)

template_fields: Sequence[str] = ('compute_environment_name', 'compute_resources', 'service_role')[source]
template_fields_renderers[source]
hook()[source]

Create and return a BatchClientHook.

execute(context)[source]

Create an AWS batch compute environment.

execute_complete(context, event=None)[source]

Was this entry helpful?