Running Airflow in Docker

This quick-start guide will allow you to quickly start Airflow with CeleryExecutor in Docker. This is the fastest way to start Airflow.

Before you begin

Follow these steps to install the necessary tools.

  1. Install Docker Community Edition (CE) on your workstation.

  2. Install Docker Compose v1.27.0 and newer on your workstation.

Older versions of docker-compose do not support all features required by docker-compose.yaml file, so double check that it meets the minimum version requirements.

docker-compose.yaml

To deploy Airflow on Docker Compose, you should fetch docker-compose.yaml.

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.0.1/docker-compose.yaml'

This file contains several service definitions:

  • airflow-scheduler - The scheduler monitors all tasks and DAGs, then triggers the task instances once their dependencies are complete.

  • airflow-webserver - The webserver available at http://localhost:8080.

  • airflow-worker - The worker that executes the tasks given by the scheduler.

  • airflow-init - The initialization service.

  • flower - The flower app for monitoring the environment. It is available at http://localhost:8080.

  • postgres - The database.

  • redis - The redis - broker that forwards messages from scheduler to worker.

All these services allow you to run Airflow with CeleryExecutor. For more information, see Basic Airflow architecture.

Some directories in the container are mounted, which means that their contents are synchronized between your computer and the container.

  • ./dags - you can put your DAG files here.

  • ./logs - contains logs from task execution and scheduler.

  • ./plugins - you can put your custom plugins here.

Initializing Environment

Before starting Airflow for the first time, You need to prepare your environment, i.e. create the necessary files, directories and initialize the database.

On Linux, the mounted volumes in container use the native Linux filesystem user/group permissions, so you have to make sure the container and host computer have matching file permissions.

mkdir ./dags ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env

On all operating system, you need to run database migrations and create the first user account. To do it, run.

docker-compose up airflow-init

After initialization is complete, you should see a message like below.

airflow-init_1       | Upgrades done
airflow-init_1       | Admin user airflow created
airflow-init_1       | 2.0.1
start_airflow-init_1 exited with code 0

The account created has the login airflow and the password airflow.

Running Airflow

Now you can start all services:

docker-compose up

In the second terminal you can check the condition of the containers and make sure that no containers are in unhealthy condition:

$ docker ps
CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS                    PORTS                              NAMES
247ebe6cf87a   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   3 minutes ago    Up 3 minutes              8080/tcp                           compose_airflow-worker_1
ed9b09fc84b1   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   3 minutes ago    Up 3 minutes              8080/tcp                           compose_airflow-scheduler_1
65ac1da2c219   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   3 minutes ago    Up 3 minutes (healthy)    0.0.0.0:5555->5555/tcp, 8080/tcp   compose_flower_1
7cb1fb603a98   apache/airflow:2.0.1   "/usr/bin/dumb-init …"   3 minutes ago    Up 3 minutes (healthy)    0.0.0.0:8080->8080/tcp             compose_airflow-webserver_1
74f3bbe506eb   postgres:13            "docker-entrypoint.s…"   18 minutes ago   Up 17 minutes (healthy)   5432/tcp                           compose_postgres_1
0bd6576d23cb   redis:latest           "docker-entrypoint.s…"   10 hours ago     Up 17 minutes (healthy)   0.0.0.0:6379->6379/tcp             compose_redis_1

Accessing the environment

After starting Airflow, you can interact with it in 3 ways;

Running the CLI commands

You can also run CLI commands, but you have to do it in one of the defined airflow-* services. For example, to run airflow info, run the following command:

docker-compose run airflow-worker airflow info

If you have Linux or Mac OS, you can make your work easier and download a optional wrapper scripts that will allow you to run commands with a simpler command.

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.0.1/airflow.sh'
chmod +x airflow.sh

Now you can run commands easier.

./airflow.sh info

You can also use bash as parameter to enter interactive bash shell in the container or python to enter python container.

./airflow.sh bash
./airflow.sh python

Accessing the web interface

Once the cluster has started up, you can log in to the web interface and try to run some tasks.

The webserver available at: http://localhost:8080. The default account has the login airflow and the password airflow.

Sending requests to the REST API

Basic username password authentication is currently supported for the REST API, which means you can use common tools to send requests to the API.

The webserver available at: http://localhost:8080. The default account has the login airflow and the password airflow.

Here is a sample curl command, which sends a request to retrieve a pool list:

ENDPOINT_URL="http://localhost:8080/"
curl -X GET  \
    --user "airflow:airflow" \
    "${ENDPOINT_URL}/api/v1/pools"

Cleaning up

To stop and delete containers, delete volumes with database data and download images, run:

docker-compose down --volumes --rmi all

Notes

By default, the Docker Compose file uses the latest Airflow image (apache/airflow). If you need, you can customize and extend it.

What’s Next?

From this point, you can head to the Tutorial section for further examples or the How-to Guides section if you’re ready to get your hands dirty.

Was this entry helpful?