Scheduler

The Airflow scheduler monitors all tasks and DAGs and triggers the task instances whose dependencies have been met. Behind the scenes, it spins up a subprocess that monitors a folder and stays in sync with all the DAG objects it may contain. Periodically (every minute or so), it collects DAG parsing results and inspects active tasks to see whether they can be triggered.

The Airflow scheduler is designed to run as a persistent service in an Airflow production environment. To kick it off, all you need to do is execute the airflow scheduler command. It uses the configuration specified in airflow.cfg.

The scheduler uses the configured Executor to run tasks that are ready.

To start a scheduler, simply run the command:

airflow scheduler

Your DAGs will start executing once the scheduler is running successfully.

Note

The first DAG Run is created based on the minimum start_date for the tasks in your DAG. Subsequent DAG Runs are created by the scheduler process, based on your DAG’s schedule_interval, sequentially.
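As a rough illustration of the rule above, the sketch below derives the first few execution dates from the earliest task start_date. It is a simplified model, not Airflow's implementation: the helper name first_execution_dates is hypothetical, and it assumes a fixed timedelta interval and start dates already aligned to that interval.

```python
from datetime import datetime, timedelta


def first_execution_dates(task_start_dates, schedule_interval, count):
    """Return the first `count` execution dates (hypothetical helper).

    Assumption: the first run starts at the minimum task start_date and
    later runs follow one fixed schedule_interval apart, in order.
    """
    first = min(task_start_dates)
    return [first + i * schedule_interval for i in range(count)]


# Two tasks with different start dates; the earlier one sets the first run.
dates = first_execution_dates(
    [datetime(2019, 11, 23), datetime(2019, 11, 21)],
    timedelta(days=1),
    3,
)
for d in dates:
    print(d.isoformat())
```

Under these assumptions the first execution date is 2019-11-21, followed by 2019-11-22 and 2019-11-23.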

The scheduler won’t trigger your tasks until the period it covers has ended. For example, a job with schedule_interval set to @daily runs after the day has ended. This ensures that whatever data is required for that period is fully available before the DAG is executed. In the UI, it appears as if Airflow is running your tasks a day late.

Note

If you run a DAG on a schedule_interval of one day, the run with execution_date 2019-11-21 triggers soon after 2019-11-21T23:59.

Let’s repeat that: the scheduler runs your job one schedule_interval AFTER the start date, at the END of the period.
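The arithmetic behind this rule can be sketched in plain Python. This is only an illustration under the assumption of a fixed one-day interval; earliest_trigger_time is a hypothetical helper, not part of Airflow.

```python
from datetime import datetime, timedelta


def earliest_trigger_time(execution_date, schedule_interval):
    """Return the end of the period that starts at execution_date.

    Assumption: the run becomes eligible once one full schedule_interval
    has elapsed after its execution_date.
    """
    return execution_date + schedule_interval


execution_date = datetime(2019, 11, 21)
trigger = earliest_trigger_time(execution_date, timedelta(days=1))
print(trigger.isoformat())  # 2019-11-22T00:00:00
```

The run labelled 2019-11-21 therefore becomes eligible at midnight on 2019-11-22, which is why it looks a day late in the UI.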

Refer to DAG Runs for details on scheduling a DAG.
