airflow.providers.informatica.lineage.validation¶
Shared lineage validation logic for the Informatica provider.
This module provides functions that resolve inlet/outlet URIs and table
references against Informatica EDC. The resolution logic is intentionally
separated from the listener so it can also be used as an operator
pre_execute hook — where a raised exception does fail the task.
Usage as pre_execute:
from airflow.providers.informatica.lineage.validation import validate_informatica_lineage
SQLExecuteQueryOperator(
task_id="my_task",
conn_id="my_conn",
sql="INSERT INTO dst SELECT * FROM src",
pre_execute=validate_informatica_lineage,
)
When passed as pre_execute, any
InformaticaLineageResolutionError propagates through the task
runner and fails the task before execute() is called.
The listener calls the same functions but wraps them in a try/except
so resolution errors are logged as warnings instead of failing the task
(listener exceptions are swallowed by the Airflow task runner).
Attributes¶
Exceptions¶
Raised when an EDC object cannot be resolved for a lineage URI. |
Functions¶
Remove and return a cached pre-execute result, or |
|
|
Resolve an EDC lineage URI to an Informatica catalog object ID. |
|
Resolve URI items to |
|
Resolve TableRef objects to |
|
Resolve all inlet/outlet URIs or auto-detected tables for task. |
|
Pre-execute hook that validates Informatica lineage before task execution. |
Module Contents¶
- airflow.providers.informatica.lineage.validation.pop_pre_execute_result(key)[source]¶
Remove and return a cached pre-execute result, or
Noneif absent.
- exception airflow.providers.informatica.lineage.validation.InformaticaLineageResolutionError[source]¶
Bases:
RuntimeErrorRaised when an EDC object cannot be resolved for a lineage URI.
- airflow.providers.informatica.lineage.validation.resolve_uri_to_object_id(hook, uri)[source]¶
Resolve an EDC lineage URI to an Informatica catalog object ID.
Manual lineage entries are treated as concrete object identifiers/uris. They are validated directly via
get_objectinstead of being reparsed and looked up again withfind_object_id.- Raises:
InformaticaLineageResolutionError – When the URI cannot be resolved.
- airflow.providers.informatica.lineage.validation.resolve_uris(hook, items, role, task_id)[source]¶
Resolve URI items to
(uri, edc_object_id)tuples.- Raises:
InformaticaLineageResolutionError – On the first URI that cannot be resolved.
- airflow.providers.informatica.lineage.validation.resolve_table_refs(hook, refs, task_id)[source]¶
Resolve TableRef objects to
(table_label, edc_object_id)tuples.Calls
find_object_idwhich searches EDC by table name and narrows by schema/database when multiple results are returned.- Raises:
InformaticaLineageResolutionError – On the first unresolvable table.
- airflow.providers.informatica.lineage.validation.resolve_informatica_lineage(task, task_id, hook=None)[source]¶
Resolve all inlet/outlet URIs or auto-detected tables for task.
- airflow.providers.informatica.lineage.validation.validate_informatica_lineage(context)[source]¶
Pre-execute hook that validates Informatica lineage before task execution.
Pass this function as
pre_executeon any operator to fail the task when inlet/outlet URIs cannot be resolved in the Informatica catalog:SQLExecuteQueryOperator( task_id="my_task", conn_id="my_conn", sql="INSERT INTO dst SELECT * FROM src", pre_execute=validate_informatica_lineage, )
Resolved pairs are cached so the listener’s
on_task_instance_successcan create lineage links without making a second round of EDC calls.- Raises:
InformaticaLineageResolutionError – When any URI or table cannot be resolved.