Timetables

A DAG’s scheduling strategy is determined by its internal “timetable”. This timetable can be created by specifying the DAG’s schedule_interval argument, as described in DAG Run, or by passing a timetable argument directly. The timetable also dictates the data interval and the logical time of each run created for the DAG.

Cron expressions and timedeltas are still supported (using CronDataIntervalTimetable and DeltaDataIntervalTimetable under the hood, respectively). However, there are situations where they cannot properly express the schedule. Some examples are:

  • Data intervals with “holes” between them (instead of the continuous intervals that both cron expression and timedelta schedules represent).

  • Run tasks at different times each day. For example, an astronomer may find it useful to run a task at dawn to process data collected from the previous night-time period.

  • Schedules not following the Gregorian calendar. For example, create a run for each month in the Traditional Chinese Calendar. This is conceptually similar to the dawn case above, but on a different time scale.

  • Rolling windows, or overlapping data intervals. For example, one may want to have a run each day, but make each run cover the period of the previous seven days. It is possible to “hack” this with a cron expression, but a custom data interval would be a more natural representation.

As such, Airflow allows for custom timetables to be written in plugins and used by DAGs. An example demonstrating a custom timetable can be found in the Customizing DAG Scheduling with Timetables how-to guide.
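The rolling-window case from the list above can be sketched in plain Python to show what "overlapping data intervals" means. This is only an illustration of the arithmetic, not Airflow's timetable API; the function name rolling_window_interval and the choice of midnight UTC as the window boundary are assumptions made for the example.

```python
from __future__ import annotations

from datetime import date, datetime, time, timedelta, timezone


def rolling_window_interval(run_day: date, days: int = 7) -> tuple[datetime, datetime]:
    """Return the (start, end) data interval for the run on ``run_day``.

    The interval ends at midnight UTC at the start of ``run_day`` and
    reaches back ``days`` full days, so consecutive daily runs overlap.
    """
    end = datetime.combine(run_day, time.min, tzinfo=timezone.utc)
    return end - timedelta(days=days), end


# Hypothetical usage: two consecutive daily runs.
first = rolling_window_interval(date(2022, 4, 8))
second = rolling_window_interval(date(2022, 4, 9))

# The end of the first window minus the start of the second is the shared span.
overlap = first[1] - second[0]
print(overlap)
```

With a seven-day window and one run per day, each run shares six days of data with the run before it, which is exactly what a cron expression or timedelta schedule cannot describe directly.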

Built-in Timetables

Airflow comes with several built-in timetables that cover the most common use cases. Additional timetables may be available in plugins.

CronDataIntervalTimetable

Set schedule based on a cron expression. Can be selected by providing a string that is a valid cron expression to the schedule_interval parameter of a DAG as described in the DAGs documentation.

from airflow.decorators import dag


@dag(schedule_interval="0 1 * * 3")  # At 01:00 on Wednesday.
def example_dag():
    pass

DeltaDataIntervalTimetable

Schedules data intervals with a time delta. Can be selected by providing a datetime.timedelta or dateutil.relativedelta.relativedelta to the schedule_interval parameter of a DAG.

import datetime
from airflow.decorators import dag

@dag(schedule_interval=datetime.timedelta(minutes=30))  # One run per 30-minute interval.
def example_dag():
    pass

EventsTimetable

Simply pass a list of datetimes for the DAG to run after. Useful for timing based on sporting events, planned communication campaigns, and other schedules that are arbitrary and irregular but predictable.

The list of events must be finite and of reasonable size, as it must be loaded every time the DAG is parsed. Optionally, the restrict_to_events flag can be used to force manual runs of the DAG to use the time of the most recent event (or the very first event, if none has occurred yet) for the data interval. Otherwise, manual runs will run with a data_interval_start and data_interval_end equal to the time at which the manual run was begun. You can also name the set of events using the description parameter; this name will be displayed in the Airflow UI.

import pendulum

from airflow.decorators import dag
from airflow.timetables.events import EventsTimetable


@dag(
    timetable=EventsTimetable(
        event_dates=[
            pendulum.datetime(2022, 4, 5, 8, 27, tz="America/Chicago"),
            pendulum.datetime(2022, 4, 17, 8, 27, tz="America/Chicago"),
            pendulum.datetime(2022, 4, 22, 20, 50, tz="America/Chicago"),
        ],
        description="My Team's Baseball Games",
        restrict_to_events=False,
    ),
)
def example_dag():
    pass
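The effect of restrict_to_events on manual runs, as described above, can be sketched in plain Python. This is a minimal illustration of the selection rule, not the EventsTimetable implementation; manual_run_event is a hypothetical helper, and it assumes a non-empty list of timezone-aware datetimes.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def manual_run_event(event_dates, run_after, restrict_to_events):
    """Pick the point in time a manual run would use for its data interval.

    Assumes ``event_dates`` is non-empty (hypothetical helper, not Airflow API).
    """
    if not restrict_to_events:
        # Unrestricted: the interval collapses to the trigger time itself.
        return run_after
    events = sorted(event_dates)
    # Number of events at or before the trigger time.
    index = bisect_right(events, run_after)
    # Before the first event there is no "most recent" one, so use the first.
    return events[index - 1] if index else events[0]


utc = timezone.utc
events = [
    datetime(2022, 4, 5, 8, 27, tzinfo=utc),
    datetime(2022, 4, 17, 8, 27, tzinfo=utc),
    datetime(2022, 4, 22, 20, 50, tzinfo=utc),
]

between = manual_run_event(events, datetime(2022, 4, 18, tzinfo=utc), True)
before_all = manual_run_event(events, datetime(2022, 4, 1, tzinfo=utc), True)
unrestricted = manual_run_event(events, datetime(2022, 4, 18, tzinfo=utc), False)
```

Here a manual run triggered on 18 April resolves to the 17 April event, one triggered before any event resolves to the first event, and an unrestricted run simply uses its own trigger time.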
