airflow.models.dataset

Module Contents

Classes

DatasetModel

A table to store datasets.

DagScheduleDatasetReference

References from a DAG to a dataset of which it is a consumer.

TaskOutletDatasetReference

References from a task to a dataset that it updates / produces.

DatasetDagRunQueue

Model for storing dataset events that need processing.

DatasetEvent

A table to store datasets events.

Attributes

association_table

class airflow.models.dataset.DatasetModel(uri, **kwargs)[source]

Bases: airflow.models.base.Base

A table to store datasets.

Parameters
  • uri (str) – a string that uniquely identifies the dataset

  • extra – JSON field for arbitrary extra info

id[source]
uri[source]
extra[source]
created_at[source]
updated_at[source]
is_orphaned[source]
consuming_dags[source]
producing_tasks[source]
__tablename__ = 'dataset'[source]
__table_args__ = ()[source]
classmethod from_public(obj)[source]
__eq__(other)[source]
__hash__()[source]
__repr__()[source]
class airflow.models.dataset.DagScheduleDatasetReference[source]

Bases: airflow.models.base.Base

References from a DAG to a dataset of which it is a consumer.

dataset_id[source]
dag_id[source]
created_at[source]
updated_at[source]
dataset[source]
queue_records[source]
__tablename__ = 'dag_schedule_dataset_reference'[source]
__table_args__ = ()[source]
__eq__(other)[source]
__hash__()[source]
__repr__()[source]
class airflow.models.dataset.TaskOutletDatasetReference[source]

Bases: airflow.models.base.Base

References from a task to a dataset that it updates / produces.

dataset_id[source]
dag_id[source]
task_id[source]
created_at[source]
updated_at[source]
dataset[source]
__tablename__ = 'task_outlet_dataset_reference'[source]
__table_args__ = ()[source]
__eq__(other)[source]
__hash__()[source]
__repr__()[source]
class airflow.models.dataset.DatasetDagRunQueue[source]

Bases: airflow.models.base.Base

Model for storing dataset events that need processing.

dataset_id[source]
target_dag_id[source]
created_at[source]
__tablename__ = 'dataset_dag_run_queue'[source]
__table_args__ = ()[source]
__eq__(other)[source]
__hash__()[source]
__repr__()[source]
airflow.models.dataset.association_table[source]
class airflow.models.dataset.DatasetEvent[source]

Bases: airflow.models.base.Base

A table to store datasets events.

Parameters
  • dataset_id – reference to DatasetModel record

  • extra – JSON field for arbitrary extra info

  • source_task_id – the task_id of the TI which updated the dataset

  • source_dag_id – the dag_id of the TI which updated the dataset

  • source_run_id – the run_id of the TI which updated the dataset

  • source_map_index – the map_index of the TI which updated the dataset

  • timestamp – the time the event was logged

We use relationships instead of foreign keys so that dataset events are not deleted even if the foreign key object is.

property uri[source]
id[source]
dataset_id[source]
extra[source]
source_task_id[source]
source_dag_id[source]
source_run_id[source]
source_map_index[source]
timestamp[source]
__tablename__ = 'dataset_event'[source]
__table_args__ = ()[source]
created_dagruns[source]
source_task_instance[source]
source_dag_run[source]
dataset[source]
__repr__()[source]

Was this entry helpful?