Scheduler
The Airflow scheduler monitors all tasks and all DAGs and triggers the task instances whose dependencies have been met. Behind the scenes, it spins up a subprocess that monitors and stays in sync with the folder containing all DAG files, and periodically (every minute or so) collects DAG parsing results and inspects active tasks to see whether they can be triggered.
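The polling cadence is configurable in airflow.cfg. As an illustration, the scheduler section supports settings along these lines (option names and defaults vary by Airflow version, so check the configuration reference for your release):

```ini
[scheduler]
# How often (in seconds) to scan the DAGs folder for new files.
dag_dir_list_interval = 300
# Minimum interval (in seconds) between re-parsing the same DAG file.
min_file_process_interval = 30
```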
The Airflow scheduler is designed to run as a persistent service in an Airflow production environment. To kick it off, all you need to do is execute the airflow scheduler command. It uses the configuration specified in airflow.cfg.
The scheduler uses the configured Executor to run tasks that are ready.
To start a scheduler, simply run the command:
airflow scheduler
Your DAGs will start executing once the scheduler is running successfully.
Note

The first DAG Run is created based on the minimum start_date for the tasks in your DAG. Subsequent DAG Runs are created by the scheduler process, based on your DAG's schedule_interval, sequentially.
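To make this concrete, here is a minimal sketch in plain Python (not the Airflow API; the task names and dates are purely illustrative) of how the first run's execution_date is derived:

```python
from datetime import datetime

# Hypothetical start dates for the tasks in a DAG (illustrative names).
task_start_dates = {
    "extract": datetime(2019, 11, 21),
    "transform": datetime(2019, 11, 23),
}

# The first DAG Run's execution_date is the minimum start_date across tasks.
first_execution_date = min(task_start_dates.values())
print(first_execution_date)  # 2019-11-21 00:00:00
```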
The scheduler won't trigger your tasks until the period it covers has ended. For example, a job with schedule_interval set to @daily runs after the day has ended. This technique makes sure that whatever data is required for that period is fully available before the DAG is executed. In the UI, it appears as if Airflow is running your tasks a day late.
Note

If you run a DAG on a schedule_interval of one day, the run with execution_date 2019-11-21 triggers soon after 2019-11-21T23:59.
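The timing rule in the note above can be sketched with plain datetime arithmetic (this illustrates the rule, not Airflow's internal scheduling code):

```python
from datetime import datetime, timedelta

schedule_interval = timedelta(days=1)  # what @daily amounts to
execution_date = datetime(2019, 11, 21)

# The run is triggered once the period it covers has ended,
# i.e. one schedule_interval after its execution_date.
trigger_time = execution_date + schedule_interval
print(trigger_time)  # 2019-11-22 00:00:00
```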
Let's repeat that: the scheduler runs your job one schedule_interval AFTER the start date, at the END of the period.
You should refer to DAG Runs for details on scheduling a DAG.