Example DAG for demonstrating the behavior of the Datasets feature in Airflow, including conditional and dataset expression-based scheduling.

Notes on usage:

Turn on all the DAGs.

dataset_produces_1 is scheduled to run daily. Once it completes, it triggers several DAGs due to its dataset being updated. dataset_consumes_1 is triggered immediately, as it depends solely on the dataset produced by dataset_produces_1. consume_1_or_2_with_dataset_expressions will also be triggered, as its condition of either dataset_produces_1 or dataset_produces_2 being updated is satisfied with dataset_produces_1.

dataset_consumes_1_and_2 will not be triggered after dataset_produces_1 runs because it requires the dataset from dataset_produces_2, which has no schedule and must be manually triggered.

After manually triggering dataset_produces_2, several DAGs will be affected. dataset_consumes_1_and_2 should run because both its dataset dependencies are now met. consume_1_and_2_with_dataset_expressions will be triggered, as it requires both dataset_produces_1 and dataset_produces_2 datasets to be updated. consume_1_or_2_with_dataset_expressions will be triggered again, since it’s conditionally set to run when either dataset is updated.

consume_1_or_both_2_and_3_with_dataset_expressions demonstrates complex dataset dependency logic. This DAG triggers if dataset_produces_1 is updated or if both dataset_produces_2 and dag3_dataset are updated. This example highlights the capability to combine updates from multiple datasets with logical expressions for advanced scheduling.

conditional_dataset_and_time_based_timetable illustrates the integration of time-based scheduling with dataset dependencies. This DAG is configured to execute either when both dataset_produces_1 and dataset_produces_2 datasets have been updated or according to a specific cron schedule, showcasing Airflow’s versatility in handling mixed triggers for dataset and time-based scheduling.

The DAGs dataset_consumes_1_never_scheduled and dataset_consumes_unknown_never_scheduled will not run automatically as they depend on datasets that do not get updated or are not produced by any scheduled tasks.

Module Contents


Was this entry helpful?