airflow.models.dataset
Module Contents
Classes
DatasetModel | A table to store datasets.
DagScheduleDatasetReference | References from a DAG to a dataset of which it is a consumer.
TaskOutletDatasetReference | References from a task to a dataset that it updates / produces.
DatasetDagRunQueue | Model for storing dataset events that need processing.
DatasetEvent | A table to store dataset events.
- class airflow.models.dataset.DatasetModel(uri, **kwargs)
Bases: airflow.models.base.Base
A table to store datasets.
- Parameters
uri (str) – a string that uniquely identifies the dataset
extra – JSON field for arbitrary extra info
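To illustrate what "uniquely identifies" can mean at the storage level, here is a minimal sketch using Python's standard-library sqlite3 module rather than Airflow itself. The table name, column names, the UNIQUE constraint, and the add_dataset helper are all assumptions made for the example and are not taken from the Airflow schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for a dataset table: a unique URI plus a JSON "extra" blob.
conn.execute(
    "CREATE TABLE dataset ("
    "id INTEGER PRIMARY KEY, uri TEXT NOT NULL UNIQUE, extra TEXT NOT NULL)"
)


def add_dataset(uri, extra=None):
    """Insert a dataset row; return True if stored, False if the URI already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO dataset (uri, extra) VALUES (?, ?)",
                (uri, json.dumps(extra or {})),
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected a second row with the same URI.
        return False


first = add_dataset("s3://bucket/key.csv", {"owner": "data-team"})
duplicate = add_dataset("s3://bucket/key.csv")
row_count = conn.execute("SELECT COUNT(*) FROM dataset").fetchone()[0]
```

In this sketch the second insert is rejected, so the URI alone is enough to look up a single dataset row.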
- class airflow.models.dataset.DagScheduleDatasetReference
Bases: airflow.models.base.Base
References from a DAG to a dataset of which it is a consumer.
- class airflow.models.dataset.TaskOutletDatasetReference
Bases: airflow.models.base.Base
References from a task to a dataset that it updates / produces.
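The two reference models describe the consumer side (DAG to dataset) and the producer side (task to dataset) of the same dataset. The sketch below shows one way such reference tables could be joined to find which DAGs consume what a given task produces. It uses the standard-library sqlite3 module; the table names, columns, sample rows, and the downstream_dags helper are hypothetical and do not reproduce the Airflow schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dataset (id INTEGER PRIMARY KEY, uri TEXT UNIQUE);
    -- Consumer side: a DAG scheduled on a dataset (hypothetical columns).
    CREATE TABLE dag_schedule_ref (dataset_id INTEGER, dag_id TEXT);
    -- Producer side: a task that updates the dataset (hypothetical columns).
    CREATE TABLE task_outlet_ref (dataset_id INTEGER, dag_id TEXT, task_id TEXT);

    INSERT INTO dataset VALUES (1, 's3://bucket/raw.csv');
    INSERT INTO task_outlet_ref VALUES (1, 'ingest', 'write_raw');
    INSERT INTO dag_schedule_ref VALUES (1, 'transform');
    INSERT INTO dag_schedule_ref VALUES (1, 'report');
    """
)


def downstream_dags(dag_id, task_id):
    """Return the DAG ids that consume any dataset produced by the given task."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.dag_id
        FROM task_outlet_ref AS p
        JOIN dag_schedule_ref AS c ON c.dataset_id = p.dataset_id
        WHERE p.dag_id = ? AND p.task_id = ?
        ORDER BY c.dag_id
        """,
        (dag_id, task_id),
    ).fetchall()
    return [row[0] for row in rows]


consumers = downstream_dags("ingest", "write_raw")
```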
- class airflow.models.dataset.DatasetDagRunQueue
Bases: airflow.models.base.Base
Model for storing dataset events that need processing.
- class airflow.models.dataset.DatasetEvent
Bases: airflow.models.base.Base
A table to store dataset events.
- Parameters
dataset_id – reference to DatasetModel record
extra – JSON field for arbitrary extra info
source_task_id – the task_id of the TI which updated the dataset
source_dag_id – the dag_id of the TI which updated the dataset
source_run_id – the run_id of the TI which updated the dataset
source_map_index – the map_index of the TI which updated the dataset
timestamp – the time the event was logged
We use relationships instead of foreign keys so that dataset events are not deleted even if the object they reference is deleted.
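The effect of leaving out a foreign key constraint can be sketched with the standard-library sqlite3 module. This is an illustration under assumed table and column names (loosely mirroring the parameters listed above), not the Airflow schema: because dataset_id is a plain integer column, deleting the dataset row neither cascades to nor blocks on the event row.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dataset (id INTEGER PRIMARY KEY, uri TEXT UNIQUE);
    -- dataset_id is a plain integer: no FOREIGN KEY constraint, so no cascade.
    CREATE TABLE dataset_event (
        id INTEGER PRIMARY KEY,
        dataset_id INTEGER NOT NULL,
        source_dag_id TEXT,
        source_task_id TEXT,
        source_run_id TEXT,
        source_map_index INTEGER,
        timestamp TEXT NOT NULL
    );
    """
)

with conn:
    conn.execute("INSERT INTO dataset (id, uri) VALUES (1, 's3://bucket/raw.csv')")
    conn.execute(
        "INSERT INTO dataset_event "
        "(dataset_id, source_dag_id, source_task_id, source_run_id, "
        "source_map_index, timestamp) VALUES (?, ?, ?, ?, ?, ?)",
        (1, "ingest", "write_raw", "manual__1", -1,
         datetime.now(timezone.utc).isoformat()),
    )

# Delete the dataset; the event row is left untouched.
with conn:
    conn.execute("DELETE FROM dataset WHERE id = 1")

remaining_events = conn.execute("SELECT COUNT(*) FROM dataset_event").fetchone()[0]

# The event still carries dataset_id, but the join finds no matching dataset row.
orphan = conn.execute(
    "SELECT e.dataset_id, d.uri FROM dataset_event AS e "
    "LEFT JOIN dataset AS d ON d.id = e.dataset_id"
).fetchone()
```

The event history survives as an audit trail, at the cost that readers of the event table must tolerate a dataset_id that no longer resolves.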