Initializing a Database Backend
If you want to take a real test drive of Airflow, you should consider setting up a real database backend and switching to the LocalExecutor.
Airflow was built to interact with its metadata using SQLAlchemy, with MySQL, Postgres and SQLite as supported backends (SQLite is used primarily for development purposes).
See also
Scheduler HA Database Requirements if you plan on running more than one scheduler
Note
We rely on stricter ANSI SQL settings for MySQL in order to have sane defaults. Make sure explicit_defaults_for_timestamp=1 is set under [mysqld] in your my.cnf.
Note
If you decide to use MySQL, we recommend using the mysqlclient driver and specifying it in your SQLAlchemy connection string, i.e.
mysql+mysqldb://<user>:<password>@<host>[:<port>]/<dbname>
We also support the mysql-connector-python driver, i.e.
mysql+mysqlconnector://<user>:<password>@<host>[:<port>]/<dbname>
which lets you connect through SSL without providing any cert options. If you want to use another driver, see the SQLAlchemy docs for more information on downloading the driver and setting up the SQLAlchemy connection.
Note
If you decide to use Postgres, we recommend using the psycopg2 driver and specifying it in your SQLAlchemy connection string, i.e.
postgresql+psycopg2://<user>:<password>@<host>/<db>
Also note that since SQLAlchemy does not expose a way to target a specific schema in the Postgres connection URI, you may want to set a default schema for your role with a command similar to
ALTER ROLE username SET search_path = airflow, foobar;
Set up your database to host Airflow
Create a database called airflow and a database user that Airflow will use to access this database.
Example, for MySQL:
CREATE DATABASE airflow CHARACTER SET utf8 COLLATE utf8_unicode_ci;
CREATE USER 'airflow' IDENTIFIED BY 'airflow';
GRANT ALL PRIVILEGES ON airflow.* TO 'airflow';
Example, for Postgres:
CREATE DATABASE airflow;
CREATE USER airflow WITH PASSWORD 'airflow';
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
You may need to update your Postgres pg_hba.conf to add the airflow user to the database access control list, and then reload the database configuration so that your change takes effect. See "The pg_hba.conf File" in the Postgres documentation to learn more.
Configure Airflow's database connection string
Once you have set up your database to host Airflow, you'll need to alter the SQLAlchemy connection string in the sql_alchemy_conn option of the [core] section of your configuration file, $AIRFLOW_HOME/airflow.cfg.
You can also define the connection URI using the AIRFLOW__CORE__SQL_ALCHEMY_CONN environment variable.
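As a minimal sketch of the environment variable form, assuming a Postgres server on localhost and the placeholder airflow/airflow user, password and database name from the example above (the host name is an assumption, and the credentials should be replaced in any real deployment):

```shell
# Assumed values: host "localhost"; user, password and database "airflow" (placeholders).
export AIRFLOW__CORE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow:airflow@localhost/airflow"

# Print the URI so you can confirm what Airflow will see in this shell session.
echo "$AIRFLOW__CORE__SQL_ALCHEMY_CONN"
```

When both are present, the environment variable takes precedence over the value in airflow.cfg, so this form is convenient for containers or for trying a backend without editing the file.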
Configure a worker that supports parallelism
You should then also change the executor option in the [core] section to LocalExecutor, an executor that can parallelize task instances locally.
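The executor can also be supplied through the environment instead of the configuration file. A minimal sketch, assuming Airflow's AIRFLOW__{SECTION}__{KEY} naming convention for configuration overrides (the variable name below is derived from that convention rather than quoted from this page):

```shell
# Override the executor option of the [core] section for this shell session.
export AIRFLOW__CORE__EXECUTOR="LocalExecutor"

# Print the value to confirm the override is in place.
echo "$AIRFLOW__CORE__EXECUTOR"
```

This is only useful once the connection string points at a real backend such as MySQL or Postgres, as described above.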
Initialize the database
# initialize the database
airflow db init