Installation
Getting Airflow
Airflow is published as the apache-airflow package on PyPI. Installing it, however, can sometimes be tricky,
because Airflow is a bit of both a library and an application. Libraries usually keep their dependencies open, and
applications usually pin them, but we should do neither and both at the same time. We decided to keep
our dependencies as open as possible (in setup.py) so that users can install different versions of libraries
if needed. This means that from time to time a plain pip install apache-airflow will not work, or will
produce an unusable Airflow installation.
To have repeatable installations, starting from Airflow 1.10.10 (and updated in
Airflow 1.10.12) we also keep a set of “known-to-be-working” constraint files in the
constraints-master and constraints-1-10 orphan branches.
These “known-to-be-working” constraints are defined per major.minor Python version. You can use them as constraint
files when installing Airflow from PyPI. Note that you have to specify the correct Airflow version
and Python version in the URL.
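The constraint file depends on both the Airflow version and the Python major.minor version. Because of that, it can help to derive the URL from the interpreter you will install into.

The sketch below makes three assumptions:
- It uses the URL pattern from the install examples on this page.
- The function name constraint_url is hypothetical.
- A python3 executable is on your PATH, and it is used when no Python version is passed.

```shell
# Hypothetical helper: print the constraint-file URL for an Airflow version and a
# Python major.minor version, following the URL pattern used in the install examples.
constraint_url() {
    airflow_version="$1"
    # If no Python version is given, use the major.minor version of python3 on PATH.
    python_version="${2:-$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')}"
    echo "https://raw.githubusercontent.com/apache/airflow/constraints-${airflow_version}/constraints-${python_version}.txt"
}

# Prints the same URL that is used in the install examples for Airflow 1.10.12 on Python 3.7.
constraint_url 1.10.12 3.7
```

You could then run python3 -m pip install "apache-airflow==1.10.12" --constraint "$(constraint_url 1.10.12)". Using python3 -m pip rather than a bare pip is a deliberate choice here. It keeps the interpreter that installs the packages the same as the one whose version selected the constraint file.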
Prerequisites
On Debian-based Linux distributions:
sudo apt-get update
sudo apt-get install build-essential
Installing just Airflow:
pip install \
apache-airflow==1.10.12 \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-1.10.12/constraints-3.7.txt"
Installing with extras (for example, postgres and gcp):
pip install \
"apache-airflow[postgres,gcp]==1.10.12" \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-1.10.12/constraints-3.7.txt"
You need certain system-level packages in order to install Airflow. These are the packages known to be needed on a Linux system (tested on Debian Buster):
sudo apt-get install -y --no-install-recommends \
freetds-bin \
krb5-user \
ldap-utils \
libffi6 \
libsasl2-2 \
libsasl2-modules \
libssl1.1 \
locales \
lsb-release \
sasl2-bin \
sqlite3 \
unixodbc
You also need database client packages (Postgres or MySQL) if you want to use those databases.
If the airflow command is not recognized (this can happen on Windows when using WSL),
ensure that ~/.local/bin is in your PATH environment variable, and add it if necessary:
export PATH="$PATH:$HOME/.local/bin"
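To add the directory only when it is missing, you can check PATH first. This keeps PATH from growing each time the snippet is re-run.

The following is a minimal POSIX-shell sketch. It assumes, as stated above, that the airflow script was installed into ~/.local/bin.

```shell
# Append $HOME/.local/bin to PATH only if it is not already one of its entries.
# Wrapping both sides in colons makes the pattern match a whole entry, not a substring.
case ":$PATH:" in
    *":$HOME/.local/bin:"*) ;;                  # already present: nothing to do
    *) export PATH="$PATH:$HOME/.local/bin" ;;  # missing: append it
esac
```

Afterwards, command -v airflow should print the path of the airflow script if it was installed there. To keep the change across sessions, you could place the same lines in your shell's startup file, for example ~/.bashrc for bash.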
Extra Packages
The basic apache-airflow package on PyPI only installs what’s needed to get started.
Subpackages (extras) can be installed depending on what will be useful in your
environment. For instance, if you don’t need connectivity with Postgres,
you won’t have to go through the trouble of installing the postgres-devel
yum package, or whatever equivalent applies on the distribution you are using.
Behind the scenes, Airflow conditionally imports the operators that require these extra dependencies.
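Operators whose extra dependencies are missing generally fail to import. One quick way to see whether an extra's dependency is present is therefore to try importing the underlying Python module with the same interpreter.

This sketch rests on a few assumptions:
- The helper name has_module is hypothetical.
- python3 is assumed to be the interpreter Airflow is installed into.
- The mapping from the postgres extra to the psycopg2 module is an assumption. Verify it against setup.py for your Airflow version.

```shell
# Hypothetical helper: report whether python3 can import the given module.
has_module() {
    if python3 -c "import $1" >/dev/null 2>&1; then
        echo "$1: available"
    else
        echo "$1: missing"
    fi
}

has_module sys        # standard library module, always importable
has_module psycopg2   # driver assumed to be pulled in by the postgres extra
```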
Here’s the list of the subpackages and what they enable:
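| subpackage | install command | enables |
|---|---|---|
| all | pip install 'apache-airflow[all]' | All Airflow features known to man |
| all_dbs | pip install 'apache-airflow[all_dbs]' | All database integrations |
| async | pip install 'apache-airflow[async]' | Async worker classes for Gunicorn |
| aws | pip install 'apache-airflow[aws]' | Amazon Web Services |
| azure | pip install 'apache-airflow[azure]' | Microsoft Azure |
| celery | pip install 'apache-airflow[celery]' | CeleryExecutor |
| cloudant | pip install 'apache-airflow[cloudant]' | Cloudant hook |
| crypto | pip install 'apache-airflow[crypto]' | Encrypt connection passwords in metadata db |
| devel | pip install 'apache-airflow[devel]' | Minimum dev tools requirements |
| devel_hadoop | pip install 'apache-airflow[devel_hadoop]' | Airflow + dependencies on the Hadoop stack |
| druid | pip install 'apache-airflow[druid]' | Druid related operators & hooks |
| gcp | pip install 'apache-airflow[gcp]' | Google Cloud Platform |
| github_enterprise | pip install 'apache-airflow[github_enterprise]' | GitHub Enterprise auth backend |
| google_auth | pip install 'apache-airflow[google_auth]' | Google auth backend |
| hashicorp | pip install 'apache-airflow[hashicorp]' | HashiCorp Services (Vault) |
| hdfs | pip install 'apache-airflow[hdfs]' | HDFS hooks and operators |
| hive | pip install 'apache-airflow[hive]' | All Hive related operators |
| jdbc | pip install 'apache-airflow[jdbc]' | JDBC hooks and operators |
| kerberos | pip install 'apache-airflow[kerberos]' | Kerberos integration for Kerberized Hadoop |
| kubernetes | pip install 'apache-airflow[kubernetes]' | Kubernetes Executor and operator |
| ldap | pip install 'apache-airflow[ldap]' | LDAP authentication for users |
| mssql | pip install 'apache-airflow[mssql]' | Microsoft SQL Server operators and hook, support as an Airflow backend |
| mysql | pip install 'apache-airflow[mysql]' | MySQL operators and hook, support as an Airflow backend. The version of MySQL server has to be 5.6.4+. The exact version upper bound depends on the version of … |
| oracle | pip install 'apache-airflow[oracle]' | Oracle hooks and operators |
| password | pip install 'apache-airflow[password]' | Password authentication for users |
| postgres | pip install 'apache-airflow[postgres]' | PostgreSQL operators and hook, support as an Airflow backend |
| presto | pip install 'apache-airflow[presto]' | All Presto related operators & hooks |
| qds | pip install 'apache-airflow[qds]' | Enable QDS (Qubole Data Service) support |
| rabbitmq | pip install 'apache-airflow[rabbitmq]' | RabbitMQ support as a Celery backend |
| redis | pip install 'apache-airflow[redis]' | Redis hooks and sensors |
| samba | pip install 'apache-airflow[samba]' | |
| slack | pip install 'apache-airflow[slack]' | |
| ssh | pip install 'apache-airflow[ssh]' | SSH hooks and operator |
| vertica | pip install 'apache-airflow[vertica]' | Vertica hook support as an Airflow backend |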
Initializing Airflow Database
Airflow requires a database to be initialized before you can run tasks. If you’re just experimenting and learning Airflow, you can stick with the default SQLite option. If you don’t want to use SQLite, see Initializing a Database Backend to set up a different database.
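As a minimal sketch of pointing Airflow at a different database before initializing it, you can override configuration options with environment variables named AIRFLOW__{SECTION}__{KEY}.

Note the following about this example:
- The connection string targets a hypothetical local PostgreSQL database.
- The user, password, host, port and database name are placeholders.
- It assumes the postgres extra is installed.

```shell
# Override the [core] sql_alchemy_conn option using Airflow's AIRFLOW__{SECTION}__{KEY} convention.
# All credentials and names below are placeholders for illustration only.
export AIRFLOW__CORE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow_db"
echo "$AIRFLOW__CORE__SQL_ALCHEMY_CONN"
```

With the variable exported in the same shell, initializing the database would target that PostgreSQL database instead of SQLite.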
After configuration, you’ll need to initialize the database before you can run tasks:
airflow initdb
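With the default SQLite option, the database file lives under AIRFLOW_HOME, which defaults to ~/airflow when the variable is not set. The sketch below prints where airflow initdb should have created that file and whether it exists. It assumes you have not changed sql_alchemy_conn from its default.

```shell
# AIRFLOW_HOME falls back to ~/airflow when it is not set.
airflow_home="${AIRFLOW_HOME:-$HOME/airflow}"
# Default SQLite location, matching the default sql_alchemy_conn = sqlite:///{AIRFLOW_HOME}/airflow.db
db_file="$airflow_home/airflow.db"
echo "$db_file"
if [ -f "$db_file" ]; then
    echo "SQLite metadata database found"
else
    echo "No SQLite database yet; run airflow initdb first"
fi
```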