airflow.providers.google.cloud.openlineage.utils

Module Contents

Classes

BigQueryJobRunFacet

Facet that represents relevant statistics of a BigQuery run.

BigQueryErrorRunFacet

Represents errors that can happen during execution of BigqueryExtractor.

Functions

get_facets_from_bq_table(table)

Get facets from BigQuery table object.

get_identity_column_lineage_facet(field_names, ...)

Get column lineage facet.

get_from_nullable_chain(source, chain)

Get object from nested structure of objects, where it's not guaranteed that all keys in the nested structure exist.

Attributes

BIGQUERY_NAMESPACE

BIGQUERY_URI

airflow.providers.google.cloud.openlineage.utils.BIGQUERY_NAMESPACE = 'bigquery'[source]
airflow.providers.google.cloud.openlineage.utils.BIGQUERY_URI = 'bigquery'[source]
airflow.providers.google.cloud.openlineage.utils.get_facets_from_bq_table(table)[source]

Get facets from BigQuery table object.

airflow.providers.google.cloud.openlineage.utils.get_identity_column_lineage_facet(field_names, input_datasets)[source]

Get column lineage facet.

A simple lineage is created in which each destination column corresponds to a single, same-named source column in each input dataset, and no transformations are applied.
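The identity mapping described above can be sketched with plain Python structures. This is only an illustration, not the provider's implementation: the function name `identity_column_lineage` is hypothetical, input datasets are assumed to be `(namespace, name)` tuples, and the result is a plain dict rather than an OpenLineage facet object.

```python
def identity_column_lineage(field_names, input_datasets):
    """Map every output field to the same-named field in each input dataset.

    Hypothetical stand-in: the real function works with OpenLineage dataset
    and facet classes, which are replaced here by tuples and dicts.
    """
    return {
        field: [
            {"namespace": namespace, "name": name, "field": field}
            for namespace, name in input_datasets
        ]
        for field in field_names
    }


# Assumed example values; any (namespace, name) pairs would work the same way.
lineage = identity_column_lineage(
    ["id", "amount"],
    [("bigquery", "project.dataset.orders")],
)
```

Each output field lists one input field per dataset, with the same name and no transformation recorded.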

class airflow.providers.google.cloud.openlineage.utils.BigQueryJobRunFacet[source]

Bases: airflow.providers.common.compat.openlineage.facet.RunFacet

Facet that represents relevant statistics of a BigQuery run.

Parameters
  • cached – Whether the query result was served from BigQuery's cache. The remaining statistics are not provided for cached queries.

  • billedBytes – How many bytes BigQuery bills for.

  • properties – Full property tree of the BigQuery run.

cached: bool[source]
billedBytes: int | None[source]
properties: str | None[source]
class airflow.providers.google.cloud.openlineage.utils.BigQueryErrorRunFacet[source]

Bases: airflow.providers.common.compat.openlineage.facet.RunFacet

Represents errors that can happen during execution of BigqueryExtractor.

Parameters
  • clientError – Errors originating in the BigQuery client.

  • parserError – Errors that occurred while parsing the SQL provided to BigQuery.

clientError: str | None[source]
parserError: str | None[source]
airflow.providers.google.cloud.openlineage.utils.get_from_nullable_chain(source, chain)[source]

Get object from nested structure of objects, where it’s not guaranteed that all keys in the nested structure exist.

Intended to replace chain of dict.get() statements.

Example usage:

if (
    not job._properties.get("statistics")
    or not job._properties.get("statistics").get("query")
    or not job._properties.get("statistics").get("query").get("referencedTables")
):
    return None
result = job._properties.get("statistics").get("query").get("referencedTables")

becomes:

result = get_from_nullable_chain(job._properties, ["statistics", "query", "referencedTables"])
if not result:
    return None
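To make the behaviour concrete, here is a minimal sketch of a null-safe chained lookup. It is not the provider's source: the name `nullable_chain_lookup` is hypothetical, and the handling of non-dict objects via `getattr` is an assumption made for illustration.

```python
from typing import Any


def nullable_chain_lookup(source: Any, chain: list[str]) -> Any:
    """Follow `chain` through nested dicts or objects.

    Returns None as soon as a step is missing instead of raising.
    Hypothetical helper written for illustration only.
    """
    current = source
    for key in chain:
        if current is None:
            return None
        if isinstance(current, dict):
            current = current.get(key)
        else:
            # Assumption: non-dict objects are traversed by attribute name.
            current = getattr(current, key, None)
    return current


# Assumed sample data shaped like a BigQuery job property tree.
properties = {"statistics": {"query": {"referencedTables": ["t1"]}}}
found = nullable_chain_lookup(properties, ["statistics", "query", "referencedTables"])
missing = nullable_chain_lookup(properties, ["statistics", "load", "outputRows"])
```

Here `found` holds the nested list, while `missing` is `None` because the `"load"` key does not exist.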
