Installation

Getting Airflow

Airflow is published as the apache-airflow package on PyPI. Installing it can sometimes be tricky, however, because Airflow is both a library and an application. Libraries usually keep their dependencies open and applications usually pin them, but Airflow has to do neither and both at the same time. We decided to keep our dependencies as open as possible (in setup.py) so that users can install different versions of libraries if needed. This means that from time to time a plain pip install apache-airflow will not work or will produce an unusable Airflow installation.

To make installations repeatable, starting from Airflow 1.10.10 (and updated in Airflow 1.10.12) we also keep a set of “known-to-be-working” constraint files in the constraints-master and constraints-1-10 orphan branches. These “known-to-be-working” constraints are specific to each major/minor Python version. You can use them as constraint files when installing Airflow from PyPI. Note that you have to specify the correct Airflow and Python versions in the URL.

Prerequisites

On Debian-based Linux distributions:

sudo apt-get update
sudo apt-get install build-essential

1. Installing just Airflow:

pip install \
 apache-airflow==1.10.12 \
 --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-1.10.12/constraints-3.7.txt"
2. Installing with extras (for example, postgres and gcp):

pip install \
 "apache-airflow[postgres,gcp]==1.10.12" \
 --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-1.10.12/constraints-3.7.txt"
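Both commands above hard-code Airflow 1.10.12 and Python 3.7 in the constraint URL. A minimal sketch that derives the Python part from the interpreter instead, assuming python3 is the interpreter you will install Airflow into. AIRFLOW_VERSION, PYTHON_VERSION and CONSTRAINT_URL are illustrative shell variables chosen here, not settings that Airflow or pip read:

```shell
#!/usr/bin/env bash
# Airflow release to install (assumption: override by exporting AIRFLOW_VERSION).
AIRFLOW_VERSION="${AIRFLOW_VERSION:-1.10.12}"

# Turn "Python 3.7.9" into "3.7": constraint files are named per major.minor version.
PYTHON_VERSION="$(python3 --version 2>&1 | cut -d " " -f 2 | cut -d "." -f 1-2)"

CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
echo "Constraint file: ${CONSTRAINT_URL}"

# Print the install command instead of running it, so the URL can be checked first.
echo "pip install \"apache-airflow==${AIRFLOW_VERSION}\" --constraint \"${CONSTRAINT_URL}\""
```

Because constraint files are published per Python version, a 404 on the printed URL most likely means there is no constraint file for your interpreter's version.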

You need certain system-level packages in order to install Airflow. These are the requirements known to be needed on a Linux system (tested on Debian Buster):

sudo apt-get install -y --no-install-recommends \
        freetds-bin \
        krb5-user \
        ldap-utils \
        libffi6 \
        libsasl2-2 \
        libsasl2-modules \
        libssl1.1 \
        locales  \
        lsb-release \
        sasl2-bin \
        sqlite3 \
        unixodbc
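To see which of these packages are still missing before running apt-get, here is a small sketch that queries dpkg for each one. It assumes a Debian or Ubuntu host where dpkg-query is available; on other systems every package is simply reported as missing. The package list is copied from the command above:

```shell
#!/usr/bin/env bash
# Package list copied from the apt-get command above.
REQUIRED_PKGS=(freetds-bin krb5-user ldap-utils libffi6 libsasl2-2 libsasl2-modules
               libssl1.1 locales lsb-release sasl2-bin sqlite3 unixodbc)

# True if dpkg reports the package as fully installed.
# Returns false if the package is unknown or dpkg-query does not exist.
is_installed() {
  dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q "install ok installed"
}

missing=()
for pkg in "${REQUIRED_PKGS[@]}"; do
  is_installed "$pkg" || missing+=("$pkg")
done

if ((${#missing[@]})); then
  echo "Missing: ${missing[*]}"
  echo "Install with: sudo apt-get install -y --no-install-recommends ${missing[*]}"
else
  echo "All listed system packages are installed."
fi
```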

You also need database client packages (Postgres or MySQL) if you want to use those databases.

If the airflow command is not recognized (this can happen on Windows when using WSL), make sure that ~/.local/bin is in your PATH environment variable, and add it if necessary:

export PATH="$PATH:$HOME/.local/bin"
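Setting PATH this way only lasts for the current shell session, and repeating it appends duplicate entries. A slightly more defensive sketch, assuming a bash-compatible shell and that pip placed the airflow script in ~/.local/bin (the usual location for pip install --user on Linux):

```shell
#!/usr/bin/env bash
# Append ~/.local/bin to PATH only if it is not already present.
case ":${PATH}:" in
  *":${HOME}/.local/bin:"*) ;;  # already on PATH, nothing to do
  *) export PATH="${PATH}:${HOME}/.local/bin" ;;
esac

# Report whether the airflow entry point can now be found.
if command -v airflow >/dev/null 2>&1; then
  echo "airflow found at $(command -v airflow)"
else
  echo "airflow still not on PATH; check where pip installed it" >&2
fi

# To keep the change for new shells (assumes bash reads ~/.bashrc), append once:
# echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc
```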

Extra Packages

The basic apache-airflow package on PyPI only installs what’s needed to get started. Additional subpackages (extras) can be installed depending on what is useful in your environment. For instance, if you don’t need connectivity with Postgres, you won’t have to go through the trouble of installing the postgres-devel yum package, or whatever the equivalent is on the distribution you are using.

Behind the scenes, Airflow does conditional imports of operators that require these extra dependencies.

Here’s the list of the subpackages and what they enable:

Subpackage         Install command                                  Enables
=================  ===============================================  =============================================
all                pip install 'apache-airflow[all]'                All Airflow features known to man
all_dbs            pip install 'apache-airflow[all_dbs]'            All database integrations
async              pip install 'apache-airflow[async]'              Async worker classes for Gunicorn
aws                pip install 'apache-airflow[aws]'                Amazon Web Services
azure              pip install 'apache-airflow[azure]'              Microsoft Azure
celery             pip install 'apache-airflow[celery]'             CeleryExecutor
cloudant           pip install 'apache-airflow[cloudant]'           Cloudant hook
crypto             pip install 'apache-airflow[crypto]'             Encryption of connection passwords in the metadata database
devel              pip install 'apache-airflow[devel]'              Minimum development tool requirements
devel_hadoop       pip install 'apache-airflow[devel_hadoop]'       Airflow + dependencies on the Hadoop stack
druid              pip install 'apache-airflow[druid]'              Druid-related operators and hooks
gcp                pip install 'apache-airflow[gcp]'                Google Cloud Platform
github_enterprise  pip install 'apache-airflow[github_enterprise]'  GitHub Enterprise auth backend
google_auth        pip install 'apache-airflow[google_auth]'        Google auth backend
hashicorp          pip install 'apache-airflow[hashicorp]'          HashiCorp services (Vault)
hdfs               pip install 'apache-airflow[hdfs]'               HDFS hooks and operators
hive               pip install 'apache-airflow[hive]'               All Hive-related operators
jdbc               pip install 'apache-airflow[jdbc]'               JDBC hooks and operators
kerberos           pip install 'apache-airflow[kerberos]'           Kerberos integration for Kerberized Hadoop
kubernetes         pip install 'apache-airflow[kubernetes]'         Kubernetes Executor and operator
ldap               pip install 'apache-airflow[ldap]'               LDAP authentication for users
mssql              pip install 'apache-airflow[mssql]'              Microsoft SQL Server operators and hook, support as an Airflow backend
mysql              pip install 'apache-airflow[mysql]'              MySQL operators and hook, support as an Airflow backend (see the note below)
oracle             pip install 'apache-airflow[oracle]'             Oracle hooks and operators
password           pip install 'apache-airflow[password]'           Password authentication for users
postgres           pip install 'apache-airflow[postgres]'           PostgreSQL operators and hook, support as an Airflow backend
presto             pip install 'apache-airflow[presto]'             All Presto-related operators and hooks
qds                pip install 'apache-airflow[qds]'                QDS (Qubole Data Service) support
rabbitmq           pip install 'apache-airflow[rabbitmq]'           RabbitMQ support as a Celery backend
redis              pip install 'apache-airflow[redis]'              Redis hooks and sensors
samba              pip install 'apache-airflow[samba]'              airflow.operators.hive_to_samba_operator.Hive2SambaOperator
slack              pip install 'apache-airflow[slack]'              airflow.operators.slack_operator.SlackAPIOperator
ssh                pip install 'apache-airflow[ssh]'                SSH hooks and operators
vertica            pip install 'apache-airflow[vertica]'            Vertica hook support
=================  ===============================================  =============================================

Note on mysql: the MySQL server version has to be 5.6.4 or later. The exact upper bound depends on the version of the mysqlclient package; for example, mysqlclient 1.3.12 can only be used with MySQL server 5.6.4 through 5.7.
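Because operators that need these extras are imported conditionally, a missing extra typically only shows up as an ImportError once a DAG uses the corresponding operator. One rough way to check in advance is to test whether the Python module behind an extra can be imported. The sketch below assumes bash 4+ (for associative arrays) and that python3 is the interpreter Airflow runs under; check_extra is a hypothetical helper, and the extra-to-module mapping covers only a few extras as an illustrative assumption, not something Airflow exposes:

```shell
#!/usr/bin/env bash
# Hypothetical mapping from extra name to the Python module it installs.
EXTRA_MODULE=()
declare -A EXTRA_MODULE=(
  [postgres]=psycopg2
  [mysql]=MySQLdb
  [celery]=celery
  [redis]=redis
)

# Returns 0 if importable, 1 if missing, 2 if the extra has no mapping here.
check_extra() {
  local extra="$1" module="${EXTRA_MODULE[$1]:-}"
  if [[ -z "$module" ]]; then
    echo "$extra: unknown (no module mapping)"
    return 2
  fi
  if python3 -c "import ${module}" 2>/dev/null; then
    echo "$extra: available (${module} importable)"
  else
    echo "$extra: missing, try: pip install 'apache-airflow[${extra}]'"
    return 1
  fi
}

for extra in "${!EXTRA_MODULE[@]}"; do
  check_extra "$extra" || true  # keep going when an extra is missing
done
```

An importable module does not guarantee a compatible version; installing with the constraint file shown earlier remains the reliable route.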

Initializing Airflow Database

Airflow requires a database to be initialized before you can run tasks. If you’re just experimenting and learning Airflow, you can stick with the default SQLite option. If you don’t want to use SQLite, take a look at Initializing a Database Backend to set up a different database.

After configuration, you’ll need to initialize the database before you can run tasks:

airflow initdb
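A minimal sketch of running the initialization for a chosen location. It assumes the default AIRFLOW_HOME of ~/airflow, where the default SQLite database (airflow.db) is created, and shows, commented out, how the [core] sql_alchemy_conn setting could be overridden through Airflow's AIRFLOW__{SECTION}__{KEY} environment variables. The Postgres connection string is a placeholder, not a recommended value:

```shell
#!/usr/bin/env bash
# Use ~/airflow unless AIRFLOW_HOME is already set.
export AIRFLOW_HOME="${AIRFLOW_HOME:-$HOME/airflow}"

# Optional (assumption about your setup): use Postgres instead of SQLite.
# Replace the placeholder user, password, host and database name with your own.
# export AIRFLOW__CORE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow:airflow@localhost:5432/airflow"

if command -v airflow >/dev/null 2>&1; then
  airflow initdb
else
  echo "airflow not found on PATH; install it first" >&2
fi
```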
