Using OpenLineage integration

Usage

No changes to user DAG files are required to use OpenLineage; it only needs to be configured. The primary and recommended way to configure the OpenLineage Airflow provider is through Airflow configuration.

At minimum, every setup needs a transport, which determines where your events end up, for example Marquez. The transport option in the configuration serves that purpose.

[openlineage]
transport = '{"type": "http", "url": "http://example.com:5000"}'
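The transport value is a JSON object serialized as a string, so a stray quote or comma is easy to miss. A minimal Python sketch for generating and checking the value before pasting it into the configuration; the helper name build_transport_value is hypothetical and not part of the provider:

```python
import json


def build_transport_value(transport_type: str, **options: str) -> str:
    """Serialize a transport definition to a JSON string (hypothetical helper)."""
    return json.dumps({"type": transport_type, **options})


value = build_transport_value("http", url="http://example.com:5000")

# Round-trip the string to confirm it is valid JSON with the expected fields.
parsed = json.loads(value)
print(value)
```

Generating the string this way guarantees double-quoted keys and values, which JSON requires.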

If you want to look at OpenLineage events without sending them anywhere, you can set up ConsoleTransport; the events will then end up in task logs.

[openlineage]
transport = '{"type": "console"}'

You can also configure the OpenLineage transport using an openlineage.yml file. That configuration method is described in detail in the OpenLineage Python docs. To use it, set the path to the file in the Airflow config, or point the OPENLINEAGE_CONFIG environment variable to it:

[openlineage]
config_path = '/path/to/openlineage.yml'
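As an illustration of the file-based method, the sketch below writes a small openlineage.yml and points OPENLINEAGE_CONFIG at it. The YAML layout (a top-level transport key with type and url) follows the OpenLineage Python client format as understood here, and the temporary directory is only a placeholder location; check the OpenLineage Python docs for the full set of options.

```python
import os
import tempfile
from pathlib import Path

# Contents of openlineage.yml: an http transport expressed in YAML form.
CONFIG_TEXT = """\
transport:
  type: http
  url: http://example.com:5000
"""

# A temporary directory stands in for wherever you keep configuration files.
config_path = Path(tempfile.mkdtemp()) / "openlineage.yml"
config_path.write_text(CONFIG_TEXT)

# Alternative to config_path in the Airflow config: point OPENLINEAGE_CONFIG
# at the file. The variable must be visible to the Airflow processes that
# emit events, so in practice it is set in their environment, not here.
os.environ["OPENLINEAGE_CONFIG"] = str(config_path)
```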

Lastly, you can set up the http transport using the OPENLINEAGE_URL environment variable, passing it the URL of the OpenLineage consumer.

It is also very useful to set the OpenLineage namespace for this particular Airflow instance. That way, if you use multiple OpenLineage producers, events coming from them will be logically separated. If the namespace is not set, the default namespace is used.

[openlineage]
transport = '{"type": "http", "url": "http://example.com:5000"}'
namespace = 'my-team-airflow-instance'

Additional Options

You can disable sending OpenLineage events without uninstalling the OpenLineage provider by setting disabled to true, or by setting the OPENLINEAGE_DISABLED environment variable to True.

[openlineage]
transport = '{"type": "http", "url": "http://example.com:5000"}'
disabled = true

Several operators, for example Python and Bash, include their source code in their OpenLineage events by default. To prevent that, set disable_source_code to true.

[openlineage]
transport = '{"type": "http", "url": "http://example.com:5000"}'
disable_source_code = true

If you used OpenLineage previously and rely on the Custom Extractors feature, you can also use your extractors with the OpenLineage provider. Register them using the extractors config option.

[openlineage]
transport = '{"type": "http", "url": "http://example.com:5000"}'
extractors = full.path.to.ExtractorClass;full.path.to.AnotherExtractorClass
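The extractors value is a semicolon-separated list of fully qualified class paths. The sketch below only illustrates that format by splitting such a value into module and class names; split_extractors is a hypothetical helper, not the provider's own parsing code.

```python
def split_extractors(value: str) -> list[str]:
    """Split a semicolon-separated extractors value into class paths."""
    return [part.strip() for part in value.split(";") if part.strip()]


paths = split_extractors(
    "full.path.to.ExtractorClass;full.path.to.AnotherExtractorClass"
)

# Each entry is a dotted path: everything before the last dot is the module,
# the final segment is the class name.
for path in paths:
    module_name, _, class_name = path.rpartition(".")
    print(f"module={module_name} class={class_name}")
```

A quick check like this can catch a comma used in place of a semicolon, which would leave the whole value as a single, unimportable path.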

Other

If you want to add OpenLineage coverage for a particular operator, take a look at Implementing OpenLineage in Operators.

For more explanation, visit the OpenLineage docs.
