airflow.providers.docker.operators.docker_swarm

Run ephemeral Docker Swarm services.

Module Contents

Classes

DockerSwarmOperator

Execute a command as an ephemeral docker swarm service.

class airflow.providers.docker.operators.docker_swarm.DockerSwarmOperator(*, image, args=None, enable_logging=True, configs=None, secrets=None, mode=None, networks=None, placement=None, container_resources=None, **kwargs)

Bases: airflow.providers.docker.operators.docker.DockerOperator

Execute a command as an ephemeral docker swarm service.

Example use case: using Docker Swarm orchestration to make one-time scripts highly available.

A temporary directory is created on the host and mounted into a container to allow storing files that together exceed the default disk size of 10GB in a container. The path to the mounted directory can be accessed via the environment variable AIRFLOW_TMP_DIR.
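As a minimal sketch of how a script running inside the container might use that directory, the snippet below reads AIRFLOW_TMP_DIR and falls back to /tmp when the variable is absent. The function names and the /tmp fallback are assumptions made for illustration; they are not part of the operator.

```python
import os
from pathlib import Path


def scratch_dir(environ=os.environ) -> Path:
    """Return the directory to use for large temporary files.

    AIRFLOW_TMP_DIR is expected to be set as described above. The "/tmp"
    fallback is an assumption so the script also runs outside Airflow.
    """
    return Path(environ.get("AIRFLOW_TMP_DIR", "/tmp"))


def write_scratch_file(name: str, data: bytes, environ=os.environ) -> Path:
    """Write data to a file in the scratch directory and return its path."""
    target = scratch_dir(environ) / name
    target.write_bytes(data)
    return target
```

Passing the environment as a parameter keeps the helper easy to test without modifying the real process environment.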

If a login to a private registry is required before pulling the image, configure a Docker connection in Airflow and pass its connection ID via the docker_conn_id parameter.

Parameters
  • image (str) – Docker image from which to create the container. If image tag is omitted, “latest” will be used.

  • api_version – Remote API version. Set to auto to automatically detect the server’s version.

  • auto_remove – Auto-removal of the container on daemon side when the container’s process exits. The default is False.

  • command – Command to be run in the container. (templated)

  • args (str | list[str] | None) – Arguments to the command.

  • docker_url – URL of the host running the docker daemon. Default is the value of the DOCKER_HOST environment variable or unix://var/run/docker.sock if it is unset.

  • environment – Environment variables to set in the container. (templated)

  • force_pull – Pull the docker image on every run. Default is False.

  • mem_limit – Maximum amount of memory the container can use. Either a float value, which represents the limit in bytes, or a string like 128m or 1g.

  • tls_ca_cert – Path to a PEM-encoded certificate authority to secure the docker connection.

  • tls_client_cert – Path to the PEM-encoded certificate used to authenticate docker client.

  • tls_client_key – Path to the PEM-encoded key used to authenticate docker client.

  • tls_hostname – Hostname to match against the docker server certificate or False to disable the check.

  • tls_ssl_version – Version of SSL to use when communicating with docker daemon.

  • tmp_dir – Mount point inside the container to a temporary directory created on the host by the operator. The path is also made available via the environment variable AIRFLOW_TMP_DIR inside the container.

  • user – Default user inside the docker container.

  • docker_conn_id – The Docker connection ID.

  • tty – Allocate a pseudo-TTY to the container of this service. This needs to be set to see logs of the Docker container / service.

  • enable_logging (bool) – Show the application’s logs in the operator’s logs. Supported only if the Docker engine is using the json-file or journald logging driver. The tty parameter should be set to use this with Python applications.

  • configs (list[docker.types.ConfigReference] | None) – List of docker configs to be exposed to the containers of the swarm service. The configs are ConfigReference objects as per the docker api [https://docker-py.readthedocs.io/en/stable/services.html#docker.models.services.ServiceCollection.create].

  • secrets (list[docker.types.SecretReference] | None) – List of docker secrets to be exposed to the containers of the swarm service. The secrets are SecretReference objects as per the docker create_service api [https://docker-py.readthedocs.io/en/stable/services.html#docker.models.services.ServiceCollection.create].

  • mode (docker.types.ServiceMode | None) – Indicate whether the service should be deployed as a replicated or global service, along with the associated parameters.

  • networks (list[str | docker.types.NetworkAttachmentConfig] | None) – List of network names or IDs or NetworkAttachmentConfig to attach the service to.

  • placement (docker.types.Placement | list[docker.types.Placement] | None) – Placement instructions for the scheduler. If a list is passed instead, it is assumed to be a list of constraints as part of a Placement object.

  • container_resources (docker.types.Resources | None) – Resources for the launched container. The resources are Resources objects as per the docker api [https://docker-py.readthedocs.io/en/stable/api.html#docker.types.Resources]. This parameter takes precedence over the mem_limit parameter.
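A minimal sketch of how a few of the parameters above might be combined is shown here. The task id, image, command and environment values are illustrative assumptions, as are the names SWARM_TASK_KWARGS and make_swarm_task; task_id is the usual Airflow operator argument accepted through **kwargs. The import is deferred into the function so the keyword arguments can be inspected without the Docker provider installed.

```python
# Illustrative values only: the task id, image, command and environment
# below are assumptions chosen for this sketch, not operator defaults.
SWARM_TASK_KWARGS = {
    "task_id": "sleep_with_swarm",
    "image": "alpine:3.19",
    "command": "/bin/sleep 10",
    "api_version": "auto",
    "docker_url": "unix://var/run/docker.sock",
    "environment": {"GREETING": "hello"},
    "tty": True,  # set so that service logs can be shown
    "enable_logging": True,
}


def make_swarm_task(dag=None):
    """Build a DockerSwarmOperator from the keyword arguments above.

    Requires the apache-airflow-providers-docker package and, at run
    time, a Docker daemon that is part of a swarm.
    """
    from airflow.providers.docker.operators.docker_swarm import (
        DockerSwarmOperator,
    )

    return DockerSwarmOperator(dag=dag, **SWARM_TASK_KWARGS)
```

In a DAG file the operator would more commonly be instantiated directly inside a `with DAG(...)` block; the helper function is used here only to keep the sketch importable on its own.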

execute(context)

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

static format_args(args)

Retrieve args.

The args string is parsed to a list.

Parameters

args (list[str] | str | None) – args to the docker service

Returns

the args as list

Return type

list[str] | None
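The behaviour described above can be sketched as follows, assuming the string is split using shell-style quoting rules (here via the standard library's shlex.split). The function name format_args_sketch is hypothetical, and the provider's actual implementation may differ in detail.

```python
import shlex


def format_args_sketch(args):
    """Normalise service args to a list.

    A string is split with shell-style quoting rules (an assumption);
    a list or None is returned unchanged.
    """
    if isinstance(args, str):
        return shlex.split(args)
    return args


print(format_args_sketch("--name 'hello world' -v"))
# → ['--name', 'hello world', '-v']
```

Under this assumption, quoted substrings stay together as a single argument, which is the main reason to prefer it over a plain `str.split()`.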

on_kill()

Override this method to clean up subprocesses when a task instance gets killed.

Any use of the threading, subprocess or multiprocessing module within an operator needs to be cleaned up, or it will leave ghost processes behind.
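To illustrate the general cleanup pattern described here, the sketch below uses a plain Python class that starts a child process and stops it in on_kill. ChildProcessTask and its start method are hypothetical stand-ins; this is a generic illustration, not how DockerSwarmOperator itself cleans up.

```python
import subprocess
import sys


class ChildProcessTask:
    """Hypothetical stand-in for an operator that starts a child process."""

    def __init__(self):
        self._proc = None

    def start(self):
        # Launch a long-running child; a real task would do useful work.
        self._proc = subprocess.Popen(
            [sys.executable, "-c", "import time; time.sleep(60)"]
        )
        return self._proc.pid

    def on_kill(self):
        """Stop the child so that no ghost process is left behind."""
        proc = self._proc
        if proc is None or proc.poll() is not None:
            return  # nothing started, or already exited
        proc.terminate()  # ask politely first
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()  # force the issue if it does not exit in time
            proc.wait()
```

Calling on_kill a second time is a no-op, which is useful because cleanup hooks may be invoked more than once.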
