airflow.providers.apache.hive.operators.hive_stats
Module Contents
Classes
HiveStatsCollectionOperator | Gather partition statistics and insert them into MySQL.
- class airflow.providers.apache.hive.operators.hive_stats.HiveStatsCollectionOperator(*, table, partition, extra_exprs=None, excluded_columns=None, assignment_func=None, metastore_conn_id='metastore_default', presto_conn_id='presto_default', mysql_conn_id='airflow_db', ds='{{ ds }}', dttm='{{ logical_date.isoformat() }}', **kwargs)
Bases: airflow.models.BaseOperator
Gather partition statistics and insert them into MySQL.
Statistics are gathered with a dynamically generated Presto query and inserted into a MySQL table with the format shown below. Rerunning the operator for the same date/partition overwrites the previously stored stats.
CREATE TABLE hive_stats (
    ds VARCHAR(16),
    table_name VARCHAR(500),
    metric VARCHAR(200),
    value BIGINT
);
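The overwrite-on-rerun behavior can be illustrated with a small, self-contained sketch. It is not the operator's actual code: an in-memory SQLite database stands in for MySQL, `write_stats` is a hypothetical helper, the table name `analytics.orders` and the metric names are made up, and the delete-then-insert approach is only one assumed way of making reruns replace earlier rows.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE hive_stats (
        ds VARCHAR(16),
        table_name VARCHAR(500),
        metric VARCHAR(200),
        value BIGINT
    )
    """
)


def write_stats(conn, ds, table_name, stats):
    """Hypothetical helper: replace all rows for one (ds, table_name) pair."""
    with conn:  # commit both statements together, roll back on error
        # Remove whatever an earlier run stored for this date and table.
        conn.execute(
            "DELETE FROM hive_stats WHERE ds = ? AND table_name = ?",
            (ds, table_name),
        )
        # Insert one row per metric.
        conn.executemany(
            "INSERT INTO hive_stats (ds, table_name, metric, value) "
            "VALUES (?, ?, ?, ?)",
            [(ds, table_name, metric, value) for metric, value in stats.items()],
        )


# First run, then a rerun for the same date and table with updated values.
write_stats(conn, "2024-01-01", "analytics.orders", {"count": 100, "price__sum": 5000})
write_stats(conn, "2024-01-01", "analytics.orders", {"count": 120, "price__sum": 6100})

rows = conn.execute(
    "SELECT metric, value FROM hive_stats ORDER BY metric"
).fetchall()
print(rows)
```

After the second call only the rerun's values remain, so the table holds one row per metric rather than duplicates.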
- Parameters
metastore_conn_id (str) – Reference to the Hive Metastore connection id.
table (str) – the source table, in the format database.table_name. (templated)
partition (Any) – the source partition. (templated)
extra_exprs (dict[str, Any] | None) – dict of expressions to run against the table, where keys are metric names and values are Presto-compatible expressions
excluded_columns (list[str] | None) – list of columns to exclude; consider excluding blobs, large JSON columns, …
assignment_func (Callable[[str, str], dict[Any, Any] | None] | None) – a function that receives a column name and a type, and returns a dict of metric names and Presto expressions. If None is returned, the global defaults are applied. If an empty dictionary is returned, no stats are computed for that column.
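As a rough illustration of the assignment_func contract described above, the sketch below returns a custom dict for numeric columns, an empty dict to skip binary columns, and None to fall back to the global defaults. The function name `numeric_only_stats`, the type names it checks, the `(column, metric)` tuple key shape, the task id, the table name, the excluded column names, and the dict form of `partition` are all assumptions made for illustration; only the import path and the parameter names come from the signature above.

```python
from __future__ import annotations

from typing import Any


def numeric_only_stats(col: str, col_type: str) -> dict[Any, Any] | None:
    """Hypothetical assignment_func: custom metrics for numeric columns only."""
    if col_type in {"binary"}:
        # Empty dict: no stats are computed for this column.
        return {}
    if col_type in {"int", "bigint", "float", "double"}:
        # Keys name the metric, values are Presto expressions.
        # The (column, metric) tuple key is an assumed key shape.
        return {
            (col, "min"): f"MIN({col})",
            (col, "max"): f"MAX({col})",
        }
    # None: let the operator apply its global defaults for this column.
    return None


def build_stats_task():
    """Sketch of wiring the function into the operator (requires Airflow)."""
    # Imported lazily so the helper above can be used without Airflow installed.
    from airflow.providers.apache.hive.operators.hive_stats import (
        HiveStatsCollectionOperator,
    )

    return HiveStatsCollectionOperator(
        task_id="collect_orders_stats",  # hypothetical task id
        table="analytics.orders",  # hypothetical database.table_name
        partition={"ds": "{{ ds }}"},  # assumed dict form; the type is Any
        excluded_columns=["raw_payload"],  # hypothetical column name
        assignment_func=numeric_only_stats,
    )


print(numeric_only_stats("price", "double"))
print(numeric_only_stats("thumbnail", "binary"))
print(numeric_only_stats("customer_name", "string"))
```

Keeping the function free of Airflow imports makes the three return cases easy to unit test before the task is added to a DAG.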