If you are using the default entrypoint of the production image, there are a few actions that are automatically performed when the container starts. In some cases, you can pass environment variables to the image to trigger some of that behaviour.
The variables that control the "execution" behaviour start with
_AIRFLOW to distinguish them
from the variables used to build the image starting with
Allowing arbitrary user to run the container¶
Airflow image is Open-Shift compatible, which means that you can start it with random user ID and the
root). If you want to run the image with user different than Airflow, you MUST set
GID of the user to
0. In case you try to use different group, the entrypoint exits with error.
OpenShift randomly assigns UID when it starts the container, but you can utilise this flexible UID
also in case of running the image manually. This might be useful for example in case you want to
logs folders from host system on Linux, in which case the UID should be set
the same ID as your host user.
This can be achieved in various ways - you can change USER when you extend or customize the image or
you can dynamically pass the user to
docker run command, by adding
--user flag in one of
those formats (See Docker Run reference for details):
[ user | user:group | uid | uid:gid | user:gid | uid:group ]
In case of Docker Compose environment it can be changed via
user: entry in the
See Docker compose reference
for details. In our Quickstart Guide using Docker-Compose, the UID and GID can be passed via
AIRFLOW_GID variables as described in
Initializing docker compose environment.
GID is set to
0, the user can be any UID, but in case UID is different than the default
airflow (UID=50000), the user will be automatically created when entering the container.
In order to accommodate a number of external libraries and projects, Airflow will automatically create
such an arbitrary user in (/etc/passwd) and make it's home directory point to
Many of 3rd-party libraries and packages require home directory of the user to be present, because they
need to write some cache information there, so such a dynamic creation of a user is necessary.
Such arbitrary user has to be able to write to certain directories that needs write access, and since
it is not advised to allow write access to "other" for security reasons, the OpenShift
guidelines introduced the concept of making all such folders have the
root) group id (GID).
All the directories that need write access in the Airflow production image have GID set to 0 (and
they are writable for the group). We are following that concept and all the directories that need
write access follow that.
The GID=0 is set as default for the
airflow user, so any directories it creates have GID set to 0
by default. The entrypoint sets
umask to be
0002 - this means that any directories created by
the user have also "group write" access for group
0 - they will be writable by other users with
root group. Also whenever any "arbitrary" user creates a folder (for example in a mounted volume), that
folder will have a "group write" access and
GID=0, so that execution with another, arbitrary user
will still continue to work, even if such directory is mounted by another arbitrary user later.
umask setting however only works for runtime of the container - it is not used during building of
the image. If you would like to extend the image and add your own packages, you should remember to add
umask 0002 in front of your docker command - this way the directories created by any installation
that need group access will also be writable for the group. This can be done for example this way:
RUN umask 0002; \ do_something; \ do_otherthing;
You can read more about it in the "Support arbitrary user ids" chapter in the Openshift best practices.
Waits for Airflow DB connection¶
The entrypoint is waiting for a connection to the database independent of the database engine. This allows us to increase the stability of the environment.
Waiting for connection involves executing
airflow db check command, which means that a
select 1 as is_alive; statement
is executed. Then it loops until the the command will be successful.
CONNECTION_CHECK_MAX_COUNT times and sleeps
CONNECTION_CHECK_SLEEP_TIME between checks
To disable check, set
Waits for celery broker connection¶
In case CeleryExecutor is used, and one of the
commands are used the entrypoint will wait until the celery broker DB connection is available.
The script detects backend type depending on the URL schema and assigns default port numbers if not specified
in the URL. Then it loops until connection to the host/port specified can be established
CONNECTION_CHECK_MAX_COUNT times and sleeps
CONNECTION_CHECK_SLEEP_TIME between checks.
To disable check, set
amqp(s)://(rabbitmq) - default port 5672
redis://- default port 6379
postgres://- default port 5432
mysql://- default port 3306
Waiting for connection involves checking if a matching port is open. The host information is derived from the Airflow configuration.
If first argument equals to "bash" - you are dropped to a bash shell or you can executes bash command if you specify extra arguments. For example:
docker run -it apache/airflow:2.1.0-python3.6 bash -c "ls -la" total 16 drwxr-xr-x 4 airflow root 4096 Jun 5 18:12 . drwxr-xr-x 1 root root 4096 Jun 5 18:12 .. drwxr-xr-x 2 airflow root 4096 Jun 5 18:12 dags drwxr-xr-x 2 airflow root 4096 Jun 5 18:12 logs
If first argument is equal to
python - you are dropped in python shell or python commands are executed if
you pass extra parameters. For example:
> docker run -it apache/airflow:2.1.0-python3.6 python -c "print('test')" test
If first argument equals to "airflow" - the rest of the arguments is treated as an airflow command to execute. Example:
docker run -it apache/airflow:2.1.0-python3.6 airflow webserver
If there are any other arguments - they are simply passed to the "airflow" command
> docker run -it apache/airflow:2.1.0-python3.6 version 2.1.0
dumb-init to run as "init" in the entrypoint. This is in order to propagate
signals and reap child processes properly. This means that the process that you run does not have
to install signal handlers to work properly and be killed when the container is gracefully terminated.
The behaviour of signal propagation is configured by
DUMB_INIT_SETSID variable which is set to
1 by default - meaning that the signals will be propagated to the whole process group, but you can
set it to
0 to enable
single-child behaviour of
dumb-init which only propagates the
signals to only single child process.
The table below summarizes
DUMB_INIT_SETSID possible values and their use cases.
Propagates signals to all processes in the process group of the main process running in the container.
If you run your processes via
Propagates signals to the main process only.
This is useful if your main process handles signals gracefully.
A good example is warm shutdown of Celery workers. The
For Airflow's Celery worker, you should set the variable to 0
and either use
Additional quick test options¶
The options below are mostly used for quick testing the image - for example with quick-start docker-compose or when you want to perform a local test with new packages added. They are not supposed to be run in the production environment as they add additional overhead for execution of additional commands. Those options in production should be realized either as maintenance operations on the database or should be embedded in the custom image used (when you want to add new packages).
Upgrading Airflow DB¶
If you set
_AIRFLOW_DB_UPGRADE variable to a non-empty value, the entrypoint will run
airflow db upgrade command right after verifying the connection. You can also use this
when you are running airflow with internal SQLite database (default) to upgrade the db and create
admin users at entrypoint, so that you can start the webserver immediately. Note - using SQLite is
intended only for testing purpose, never use SQLite in production as it has severe limitations when it
comes to concurrency.
Creating admin user¶
The entrypoint can also create webserver user automatically when you enter it. you need to set
_AIRFLOW_WWW_USER_CREATE to a non-empty value in order to do that. This is not intended for
production, it is only useful if you would like to run a quick test with the production image.
You need to pass at least password to create such user via
_AIRFLOW_WWW_USER_PASSWORD_CMD similarly like for other
*_CMD variables, the content of
*_CMD will be evaluated as shell command and it's output will be set as password.
User creation will fail if none of the
PASSWORD variables are set - there is no default for
password for security reasons.
In case the password is specified, the user will be attempted to be created, but the entrypoint will not fail if the attempt fails (this accounts for the case that the user is already created).
You can, for example start the webserver in the production image with initializing the internal SQLite
database and creating an
admin/admin Admin user with the following command:
docker run -it -p 8080:8080 \ --env "_AIRFLOW_DB_UPGRADE=true" \ --env "_AIRFLOW_WWW_USER_CREATE=true" \ --env "_AIRFLOW_WWW_USER_PASSWORD=admin" \ apache/airflow:main-python3.8 webserver
docker run -it -p 8080:8080 \ --env "_AIRFLOW_DB_UPGRADE=true" \ --env "_AIRFLOW_WWW_USER_CREATE=true" \ --env "_AIRFLOW_WWW_USER_PASSWORD_CMD=echo admin" \ apache/airflow:main-python3.8 webserver
The commands above perform initialization of the SQLite database, create admin user with admin password
and Admin role. They also forward local port
8080 to the webserver port and finally start the webserver.
Installing additional requirements¶
Installing additional requirements can be done by specifying
The variable should contain a list of requirements that should be installed additionally when entering
the containers. Note that this option slows down starting of Airflow as every time any container starts
it must install new packages. Therefore this option should only be used for testing. When testing is
finished, you should create your custom image with dependencies baked in.
docker run -it -p 8080:8080 \ --env "_PIP_ADDITIONAL_REQUIREMENTS=lxml==4.6.3 charset-normalizer==1.4.1" \ --env "_AIRFLOW_DB_UPGRADE=true" \ --env "_AIRFLOW_WWW_USER_CREATE=true" \ --env "_AIRFLOW_WWW_USER_PASSWORD_CMD=echo admin" \ apache/airflow:master-python3.8 webserver
This method is only available starting from Docker image of Airflow 2.1.1 and above.