airflow.providers.google.cloud.openlineage.utils
¶
Module Contents¶
Classes¶
Facet that represents relevant statistics of bigquery run. |
|
Represents errors that can happen during execution of BigqueryExtractor. |
Functions¶
|
Get facets from BigQuery table object. |
|
Get column lineage facet. |
|
Get object from nested structure of objects, where it's not guaranteed that all keys in the nested structure exist. |
Attributes¶
- airflow.providers.google.cloud.openlineage.utils.get_facets_from_bq_table(table)[source]¶
Get facets from BigQuery table object.
- airflow.providers.google.cloud.openlineage.utils.get_identity_column_lineage_facet(field_names, input_datasets)[source]¶
Get column lineage facet.
Simple lineage will be created, where each source column corresponds to single destination column in each input dataset and there are no transformations made.
- class airflow.providers.google.cloud.openlineage.utils.BigQueryJobRunFacet[source]¶
Bases:
airflow.providers.common.compat.openlineage.facet.RunFacet
Facet that represents relevant statistics of bigquery run.
This facet is used to provide statistics about bigquery run.
- Parameters
cached – BigQuery caches query results. Rest of the statistics will not be provided for cached queries.
billedBytes – How many bytes BigQuery bills for.
properties – Full property tree of BigQUery run.
- class airflow.providers.google.cloud.openlineage.utils.BigQueryErrorRunFacet[source]¶
Bases:
airflow.providers.common.compat.openlineage.facet.RunFacet
Represents errors that can happen during execution of BigqueryExtractor.
- Parameters
clientError – represents errors originating in bigquery client
parserError – represents errors that happened during parsing SQL provided to bigquery
- airflow.providers.google.cloud.openlineage.utils.get_from_nullable_chain(source, chain)[source]¶
Get object from nested structure of objects, where it’s not guaranteed that all keys in the nested structure exist.
Intended to replace chain of dict.get() statements.
Example usage:
if ( not job._properties.get("statistics") or not job._properties.get("statistics").get("query") or not job._properties.get("statistics").get("query").get("referencedTables") ): return None result = job._properties.get("statistics").get("query").get("referencedTables")
becomes:
result = get_from_nullable_chain(properties, ["statistics", "query", "queryPlan"]) if not result: return None