airflow.providers.apache.hive.operators.hive_stats
Module Contents
Classes
| HiveStatsCollectionOperator | Gathers partition statistics using a dynamically generated Presto query. |
- class airflow.providers.apache.hive.operators.hive_stats.HiveStatsCollectionOperator(*, table: str, partition: Any, extra_exprs: Optional[Dict[str, Any]] = None, excluded_columns: Optional[List[str]] = None, assignment_func: Optional[Callable[[str, str], Optional[Dict[Any, Any]]]] = None, metastore_conn_id: str = 'metastore_default', presto_conn_id: str = 'presto_default', mysql_conn_id: str = 'airflow_db', **kwargs: Any)
- Bases: airflow.models.BaseOperator

  Gathers partition statistics using a dynamically generated Presto query and inserts the stats into a MySQL table with the format shown below. Rerunning the operator for the same date/partition overwrites the existing stats.

      CREATE TABLE hive_stats (
          ds VARCHAR(16),
          table_name VARCHAR(500),
          metric VARCHAR(200),
          value BIGINT
      );

- Parameters
- metastore_conn_id (str) -- Reference to the Hive Metastore connection id. 
- table (str) -- the source table, in the format database.table_name. (templated)
- partition (dict of {col:value}) -- the source partition. (templated) 
- extra_exprs (dict) -- dict of expressions to run against the table, where keys are metric names and values are Presto-compatible expressions
- excluded_columns (list) -- list of columns to exclude. Consider excluding blobs, large JSON columns, ...
- assignment_func (function) -- a function that receives a column name and a type, and returns a dict mapping metric names to Presto expressions. If None is returned, the global defaults are applied. If an empty dictionary is returned, no stats are computed for that column.
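
The three return values that assignment_func may produce can be illustrated with a minimal sketch. The column name `raw_payload`, the set of numeric types, and the function name are hypothetical and chosen only for illustration. The keys are plain metric-name strings, following the parameter description above; the provider's implementation may expect a different key shape, so check the operator source before relying on this.

```python
from typing import Dict, Optional

# Hypothetical values, used only for illustration.
NUMERIC_TYPES = {"int", "bigint", "double", "float"}
SKIPPED_COLUMNS = {"raw_payload"}


def example_assignment_func(col: str, col_type: str) -> Optional[Dict[str, str]]:
    """Return Presto expressions for one column, keyed by metric name.

    Assumption: keys are metric-name strings, as the parameter
    description states; verify the expected key shape in the source.
    """
    if col in SKIPPED_COLUMNS:
        # Empty dict: no stats are computed for this column.
        return {}
    if col_type.lower() in NUMERIC_TYPES:
        # Custom metrics for numeric columns.
        return {
            "min": f"MIN({col})",
            "max": f"MAX({col})",
        }
    # None: the operator falls back to its global defaults.
    return None
```

A function like this would be passed as `assignment_func=example_assignment_func` when constructing `HiveStatsCollectionOperator`.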