airflow.providers.informatica.lineage.validation

Shared lineage validation logic for the Informatica provider.

This module provides functions that resolve inlet/outlet URIs and table references against Informatica EDC. The resolution logic is intentionally separated from the listener so it can also be used as an operator pre_execute hook — where a raised exception does fail the task.

Usage as pre_execute:

from airflow.providers.informatica.lineage.validation import validate_informatica_lineage

SQLExecuteQueryOperator(
    task_id="my_task",
    conn_id="my_conn",
    sql="INSERT INTO dst SELECT * FROM src",
    pre_execute=validate_informatica_lineage,
)

When passed as pre_execute, any InformaticaLineageResolutionError propagates through the task runner and fails the task before execute() is called.

The listener calls the same functions but wraps them in a try/except so resolution errors are logged as warnings instead of failing the task (listener exceptions are swallowed by the Airflow task runner).

Attributes

log

Exceptions

InformaticaLineageResolutionError

Raised when an EDC object cannot be resolved for a lineage URI.

Functions

pop_pre_execute_result(key)

Remove and return a cached pre-execute result, or None if absent.

resolve_uri_to_object_id(hook, uri)

Resolve an EDC lineage URI to an Informatica catalog object ID.

resolve_uris(hook, items, role, task_id)

Resolve URI items to (uri, edc_object_id) tuples.

resolve_table_refs(hook, refs, task_id)

Resolve TableRef objects to (table_label, edc_object_id) tuples.

resolve_informatica_lineage(task, task_id[, hook])

Resolve all inlet/outlet URIs or auto-detected tables for task.

validate_informatica_lineage(context)

Pre-execute hook that validates Informatica lineage before task execution.

Module Contents

airflow.providers.informatica.lineage.validation.log[source]
airflow.providers.informatica.lineage.validation.pop_pre_execute_result(key)[source]

Remove and return a cached pre-execute result, or None if absent.

exception airflow.providers.informatica.lineage.validation.InformaticaLineageResolutionError[source]

Bases: RuntimeError

Raised when an EDC object cannot be resolved for a lineage URI.

airflow.providers.informatica.lineage.validation.resolve_uri_to_object_id(hook, uri)[source]

Resolve an EDC lineage URI to an Informatica catalog object ID.

Manual lineage entries are treated as concrete object identifiers/uris. They are validated directly via get_object instead of being reparsed and looked up again with find_object_id.

Raises:

InformaticaLineageResolutionError – When the URI cannot be resolved.

airflow.providers.informatica.lineage.validation.resolve_uris(hook, items, role, task_id)[source]

Resolve URI items to (uri, edc_object_id) tuples.

Raises:

InformaticaLineageResolutionError – On the first URI that cannot be resolved.

airflow.providers.informatica.lineage.validation.resolve_table_refs(hook, refs, task_id)[source]

Resolve TableRef objects to (table_label, edc_object_id) tuples.

Calls find_object_id which searches EDC by table name and narrows by schema/database when multiple results are returned.

Raises:

InformaticaLineageResolutionError – On the first unresolvable table.

airflow.providers.informatica.lineage.validation.resolve_informatica_lineage(task, task_id, hook=None)[source]

Resolve all inlet/outlet URIs or auto-detected tables for task.

Returns:

(valid_inlets, valid_outlets) — each a list of (uri_or_label, edc_object_id) tuples.

Raises:

InformaticaLineageResolutionError – When any URI or table cannot be resolved in the Informatica catalog.

Return type:

tuple[list[tuple[str, str]], list[tuple[str, str]]]

airflow.providers.informatica.lineage.validation.validate_informatica_lineage(context)[source]

Pre-execute hook that validates Informatica lineage before task execution.

Pass this function as pre_execute on any operator to fail the task when inlet/outlet URIs cannot be resolved in the Informatica catalog:

SQLExecuteQueryOperator(
    task_id="my_task",
    conn_id="my_conn",
    sql="INSERT INTO dst SELECT * FROM src",
    pre_execute=validate_informatica_lineage,
)

Resolved pairs are cached so the listener’s on_task_instance_success can create lineage links without making a second round of EDC calls.

Raises:

InformaticaLineageResolutionError – When any URI or table cannot be resolved.

Was this entry helpful?