Using OpenLineage integration

OpenLineage is an open framework for data lineage collection and analysis. At its core is an extensible specification that systems can use to interoperate with lineage metadata. Check out OpenLineage docs.

No change to user DAG files is required to use OpenLineage. Basic configuration is needed so that OpenLineage knows where to send events.



The OpenLineage Provider offers a diverse range of transport options (HTTP, Kafka, file, etc.), including the flexibility to implement a custom transport. Configuration can be managed through several approaches, and there is an extensive array of settings available for users to fine-tune and enhance their use of OpenLineage. For a comprehensive explanation of these features, please refer to the subsequent sections of this document.

This example is a basic demonstration of OpenLineage setup.

  1. Install the provider package, or add it to your requirements.txt file.

    pip install apache-airflow-providers-openlineage
  2. Provide a Transport configuration so that OpenLineage knows where to send the events, within the airflow.cfg file

    transport = {"type": "http", "url": "", "endpoint": "api/v1/lineage"}

    or with AIRFLOW__OPENLINEAGE__TRANSPORT environment variable

    AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "http", "url": "", "endpoint": "api/v1/lineage"}'
  3. That’s it! OpenLineage events should be sent to the configured backend when DAGs are run.


When enabled and configured, the integration requires no further action from the user. It will automatically:

  • Collect task input / output metadata (source, schema, etc.).

  • Collect task run-level metadata (execution time, state, parameters, etc.)

  • Collect task job-level metadata (owners, type, description, etc.)

  • Collect task-specific metadata (bigquery job id, python source code, etc.) - depending on the Operator

All this data will be sent as OpenLineage events to the configured backend as described in Job Hierarchy.
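The events carrying this metadata follow the OpenLineage specification. As a rough illustration of the event shape, a minimal run event could be built by hand like this (the namespace, job name, and producer URI below are made-up example values, not anything emitted by this provider):

```python
import json
import uuid
from datetime import datetime, timezone

# A minimal OpenLineage run event, assembled by hand for illustration.
# Field names follow the OpenLineage spec; the concrete values
# (namespace, job name, producer) are invented for this sketch.
event = {
    "eventType": "COMPLETE",
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "run": {"runId": str(uuid.uuid4())},
    "job": {"namespace": "my-team-airflow-instance", "name": "example_dag.example_task"},
    "producer": "https://example.com/my-producer",
    "inputs": [],
    "outputs": [],
}

payload = json.dumps(event)
print(payload)
```

A real event emitted by the provider carries far more detail (facets for schema, source code, run state, and so on), but the top-level structure is the same.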

Transport setup

The primary and recommended method of configuring the OpenLineage Airflow Provider is Airflow configuration (the airflow.cfg file). All possible configuration options, with example values, can be found in the configuration section.

At minimum, one thing must be set up in every case: the Transport - that is, where you wish your events to end up - for example, Marquez.

Transport as JSON string

The transport option in Airflow configuration is used for that purpose.

transport = {"type": "http", "url": "", "endpoint": "api/v1/lineage"}

AIRFLOW__OPENLINEAGE__TRANSPORT environment variable is an equivalent.

AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "http", "url": "", "endpoint": "api/v1/lineage"}'
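Since the transport value must be a valid JSON string, it can be worth sanity-checking it before deployment. A minimal sketch (the helper name and the required-key check are assumptions based on the http example above, not part of the provider's API):

```python
import json
import os

# Hypothetical helper: parse and sanity-check the transport JSON string
# before handing it to Airflow. The "url required for http" rule is
# inferred from the http example in this document, not a full schema.
def check_transport(raw: str) -> dict:
    config = json.loads(raw)  # raises ValueError on malformed JSON
    if config.get("type") == "http" and not config.get("url"):
        raise ValueError("http transport requires a non-empty 'url'")
    return config

raw = os.environ.get("AIRFLOW__OPENLINEAGE__TRANSPORT", '{"type": "console"}')
print(check_transport(raw))
```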

If you want to look at OpenLineage events without sending them anywhere, you can set up ConsoleTransport - the events will end up in task logs.

transport = {"type": "console"}


For a full list of built-in transport types, each transport’s options, or instructions on how to implement a custom transport, refer to the Python client documentation.

Transport as config file

You can also configure the OpenLineage Transport using a YAML file (e.g. openlineage.yml). Provide the path to the YAML file as the config_path option in Airflow configuration.

config_path = '/path/to/openlineage.yml'

AIRFLOW__OPENLINEAGE__CONFIG_PATH environment variable is an equivalent.


Example content of config YAML file:

  transport:
    type: http
    url: https://backend:5000
    endpoint: events/receive
    auth:
      type: api_key
      apiKey: f048521b-dfe8-47cd-9c65-0cb07d57591e


Detailed description of that configuration method, together with example config files, can be found in Python client documentation.

Configuration precedence

As there are multiple possible ways of configuring OpenLineage, it’s important to keep in mind the precedence of different configurations. OpenLineage Airflow Provider looks for the configuration in the following order:

  1. Check config_path in airflow.cfg under openlineage section (or AIRFLOW__OPENLINEAGE__CONFIG_PATH environment variable)

  2. Check transport in airflow.cfg under openlineage section (or AIRFLOW__OPENLINEAGE__TRANSPORT environment variable)

  3. If all the above options are missing, the OpenLineage Python client used underneath looks for configuration in the order described in this documentation. Please note that using Airflow configuration is encouraged and is the only future-proof solution.
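The lookup order above can be sketched as a simple resolution function (a hypothetical helper summarizing the stated precedence, not the provider's actual code; only the environment-variable forms are shown):

```python
# Hypothetical sketch of the lookup order described above: config_path
# is checked first, then transport; if neither is set, resolution falls
# back to the OpenLineage Python client's own defaults.
def resolve_config_source(env: dict) -> str:
    if env.get("AIRFLOW__OPENLINEAGE__CONFIG_PATH"):
        return "config_path"
    if env.get("AIRFLOW__OPENLINEAGE__TRANSPORT"):
        return "transport"
    return "python-client-defaults"

print(resolve_config_source({"AIRFLOW__OPENLINEAGE__CONFIG_PATH": "/path/to/openlineage.yml"}))
```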

Backwards compatibility


The variables below should not be used and may be removed in the future. Consider using Airflow configuration (described above) for a future-proof solution.

For backwards compatibility with openlineage-airflow package, some environment variables are still available:






  • OPENLINEAGE_URL can be used to set up a simple HTTP transport. This method has some limitations and may require using other environment variables to achieve the desired output. See docs.

Additional Options


It’s very useful to set up an OpenLineage namespace for this particular instance. That way, if you use multiple OpenLineage producers, events coming from them will be logically separated. If not set, the default namespace is used. Provide the name of the namespace as the namespace option in Airflow configuration.

transport = {"type": "http", "url": "", "endpoint": "api/v1/lineage"}
namespace = 'my-team-airflow-instance'

AIRFLOW__OPENLINEAGE__NAMESPACE environment variable is an equivalent.



You can disable sending OpenLineage events without uninstalling the OpenLineage provider by setting the disabled option to true in Airflow configuration.

transport = {"type": "http", "url": "", "endpoint": "api/v1/lineage"}
disabled = true

AIRFLOW__OPENLINEAGE__DISABLED environment variable is an equivalent.


Disable source code

Several Operators (e.g. Python, Bash) will by default include their source code in their OpenLineage events. To prevent that, set the disable_source_code option to true in Airflow configuration.

transport = {"type": "http", "url": "", "endpoint": "api/v1/lineage"}
disable_source_code = true

AIRFLOW__OPENLINEAGE__DISABLE_SOURCE_CODE environment variable is an equivalent.


Disabled for Operators

You can exclude some Operators from emitting OpenLineage events by passing a semicolon-separated string of the full import paths of the Airflow Operators to disable as the disabled_for_operators option in Airflow configuration.

transport = {"type": "http", "url": "", "endpoint": "api/v1/lineage"}
disabled_for_operators = 'airflow.operators.bash.BashOperator;airflow.operators.python.PythonOperator'

AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS environment variable is an equivalent.
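The semicolon-separated format can be illustrated with a small parsing sketch (a hypothetical helper showing how such a value might be split and checked, not the provider's internal code):

```python
# Hypothetical sketch: split the semicolon-separated option value into
# a set of fully qualified operator paths, then test membership.
def parse_disabled_operators(value: str) -> set[str]:
    return {path.strip() for path in value.split(";") if path.strip()}

disabled = parse_disabled_operators(
    "airflow.operators.bash.BashOperator;airflow.operators.python.PythonOperator"
)
print("airflow.operators.bash.BashOperator" in disabled)
```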


Custom Extractors

If you use the custom Extractors feature, register the extractors by passing a semicolon-separated string of the Extractor classes’ full import paths to the extractors option in Airflow configuration.

transport = {"type": "http", "url": "", "endpoint": "api/v1/lineage"}
extractors = full.path.to.ExtractorClass;full.path.to.AnotherExtractorClass

AIRFLOW__OPENLINEAGE__EXTRACTORS environment variable is an equivalent.
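As a rough sketch of what an Extractor class looks like, the snippet below uses stand-in base classes so it is self-contained; in real code you would import BaseExtractor and OperatorLineage from the provider's extractors module, and the exact method signatures should be checked against the provider documentation:

```python
from dataclasses import dataclass, field

# Stand-ins for the provider's classes, so this sketch is self-contained.
# In real code these would come from the OpenLineage provider instead.
@dataclass
class OperatorLineage:
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

class BaseExtractor:
    def __init__(self, operator):
        self.operator = operator

class MyOperatorExtractor(BaseExtractor):
    @classmethod
    def get_operator_classnames(cls) -> list[str]:
        # Operator class names this extractor handles (assumed API shape).
        return ["MyOperator"]

    def extract(self) -> OperatorLineage:
        # Derive lineage from the operator's own attributes.
        return OperatorLineage(inputs=[self.operator.source_table])

class MyOperator:  # dummy operator for the usage example
    source_table = "db.schema.table"

lineage = MyOperatorExtractor(MyOperator()).extract()
print(lineage.inputs)
```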


Enabling OpenLineage on DAG/task level

One can selectively enable OpenLineage for specific DAGs and tasks by using the selective_enable policy. To enable this policy, set the selective_enable option to True in the [openlineage] section of your Airflow configuration file:

selective_enable = True

While selective_enable enables selective control, the disabled option still has precedence. If you set disabled to True in the configuration, OpenLineage will be disabled for all DAGs and tasks regardless of the selective_enable setting.
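The interaction between these options can be summarized as a small truth-table helper (a hypothetical sketch of the stated precedence, not provider code):

```python
# Hypothetical sketch of the precedence described above: disabled always
# wins; with selective_enable on, only explicitly enabled DAGs/tasks
# emit events; otherwise lineage is on for everything.
def lineage_enabled(disabled: bool, selective_enable: bool, entity_enabled: bool) -> bool:
    if disabled:
        return False
    if selective_enable:
        return entity_enabled
    return True

print(lineage_enabled(disabled=False, selective_enable=True, entity_enabled=True))
```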

Once the selective_enable policy is enabled, you can choose to enable OpenLineage for individual DAGs and tasks using the enable_lineage and disable_lineage functions.

  1. Enabling Lineage on a DAG:

from airflow.providers.openlineage.utils.selective_enable import disable_lineage, enable_lineage

with enable_lineage(DAG(...)):
    # Tasks within this DAG will have lineage tracking enabled
    ...

  2. Enabling Lineage on a Task:

While enabling lineage on a DAG implicitly enables it for all tasks within that DAG, you can still selectively disable it for specific tasks:

from airflow.providers.openlineage.utils.selective_enable import disable_lineage, enable_lineage

with DAG(...) as dag:
    t1 = MyOperator(...)
    t2 = AnotherOperator(...)

# Enable lineage for the entire DAG
enable_lineage(dag)

# Disable lineage for task t1
disable_lineage(t1)

Enabling lineage on the DAG level automatically enables it for all tasks within that DAG unless explicitly disabled per task.

Enabling lineage on the task level implicitly enables lineage on its DAG. This is because each emitting task sends a ParentRunFacet, which requires the DAG-level lineage to be enabled in some OpenLineage backend systems. Disabling DAG-level lineage while enabling task-level lineage might cause errors or inconsistencies.


See Local troubleshooting for details on how to troubleshoot OpenLineage locally.

Adding support for custom Operators

If you want to add OpenLineage coverage for a particular Operator, take a look at Implementing OpenLineage in Operators.

Where can I learn more?


You can reach out to us on slack and leave us feedback!

How to contribute

We welcome your contributions! OpenLineage is an Open Source project under active development, and we’d love your help!

Sounds fun? Check out our new contributor guide to get started.
