airflow.providers.apache.hive.operators.hive_stats
Module Contents
class airflow.providers.apache.hive.operators.hive_stats.HiveStatsCollectionOperator(*, table: str, partition: Any, extra_exprs: Optional[Dict[str, Any]] = None, excluded_columns: Optional[List[str]] = None, assignment_func: Optional[Callable[[str, str], Optional[Dict[Any, Any]]]] = None, metastore_conn_id: str = 'metastore_default', presto_conn_id: str = 'presto_default', mysql_conn_id: str = 'airflow_db', **kwargs)
Bases: airflow.models.BaseOperator
Gathers partition statistics using a dynamically generated Presto query and inserts the stats into a MySQL table. Rerunning the operator for the same date/partition overwrites the previously stored stats.
Parameters
table (str) – the source table, in the format database.table_name. (templated)
partition (dict of {col:value}) – the source partition. (templated)
extra_exprs (dict) – dict of expressions to run against the table, where keys are metric names and values are Presto-compatible expressions
excluded_columns (list) – list of columns to exclude; consider excluding blobs, large JSON columns, …
assignment_func (function) – a function that receives a column name and a type, and returns a dict of metric names and Presto expressions. If None is returned, the global defaults are applied. If an empty dictionary is returned, no stats are computed for that column.
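As an illustration of the three return cases described for assignment_func, the following is a minimal sketch rather than the operator's own logic. The function name, the set of numeric type names, the (column, metric) tuple keys, and the user_id column in the extra_exprs example are all assumptions made for this example; check the operator source for the key shape your version expects.

```python
from typing import Dict, Optional, Tuple

# Assumed set of Hive type names treated as numeric in this sketch.
NUMERIC_TYPES = {"int", "bigint", "float", "double"}


def example_assignment_func(
    col: str, col_type: str
) -> Optional[Dict[Tuple[str, str], str]]:
    """Hypothetical assignment_func: map a column to Presto expressions."""
    if col_type in NUMERIC_TYPES:
        # Custom metrics for numeric columns.
        # Keying by (column, metric) is an assumption of this sketch.
        return {
            (col, "min"): f"MIN({col})",
            (col, "max"): f"MAX({col})",
        }
    if col_type == "binary":
        # Empty dict: no stats are computed for this column.
        return {}
    # None: fall back to the operator's global defaults.
    return None


# Hypothetical extra_exprs value: metric name -> Presto-compatible expression.
# "user_id" is an illustrative column name, not one from the documentation.
example_extra_exprs = {"distinct_users": "COUNT(DISTINCT user_id)"}
```

Under these assumptions, the function could be passed as assignment_func=example_assignment_func, and the dict as extra_exprs=example_extra_exprs, when constructing HiveStatsCollectionOperator.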