airflow.operators.hive_stats_operator
Module Contents
class airflow.operators.hive_stats_operator.HiveStatsCollectionOperator(table, partition, extra_exprs=None, excluded_columns=None, assignment_func=None, metastore_conn_id='metastore_default', presto_conn_id='presto_default', mysql_conn_id='airflow_db', *args, **kwargs)
Bases: airflow.models.BaseOperator

Gathers partition statistics using a dynamically generated Presto query and inserts the stats into a MySQL table with the following format. Stats are overwritten if you rerun the same date/partition.

    CREATE TABLE hive_stats (
        ds VARCHAR(16),
        table_name VARCHAR(500),
        metric VARCHAR(200),
        value BIGINT
    );

Parameters
- table (str) – the source table, in the format database.table_name. (templated)
- partition (dict of {col:value}) – the source partition. (templated) 
- extra_exprs (dict) – dict of expressions to run against the table, where keys are metric names and values are Presto-compatible expressions
- excluded_columns (list) – list of columns to exclude; consider excluding blobs, large JSON columns, …
- assignment_func (function) – a function that receives a column name and a type, and returns a dict mapping metric names to Presto expressions. If None is returned, the global defaults are applied. If an empty dictionary is returned, no stats are computed for that column.
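
The parameters above could be put together roughly as follows. This is a minimal sketch, not the operator's own code: the function name assign_stats, the table and column names, the type names in NUMERIC_TYPES, and the metric names are all hypothetical. The keys are plain strings because the parameter description calls them metric names; the exact key shape the operator expects is not stated here and should be checked against its source before use.

```python
# Hypothetical set of Hive/Presto type names treated as numeric (assumption).
NUMERIC_TYPES = {"int", "bigint", "float", "double"}


def assign_stats(col, col_type):
    """Return a dict of metric name -> Presto expression for one column.

    Follows the three documented outcomes of assignment_func:
    a populated dict, an empty dict, or None.
    """
    if col.endswith("_json"):
        # Empty dict: no stats are computed for this column.
        return {}
    if col_type in NUMERIC_TYPES:
        # Custom metrics for numeric columns (metric names are illustrative).
        return {
            col + "_sum": "SUM({})".format(col),
            col + "_max": "MAX({})".format(col),
        }
    # None: the operator falls back to its global defaults.
    return None


# Keyword arguments matching the documented constructor signature.
# task_id is the usual BaseOperator argument; all values are illustrative.
operator_kwargs = dict(
    task_id="collect_sales_stats",
    table="analytics.sales",            # database.table_name
    partition={"ds": "{{ ds }}"},       # templated partition spec
    extra_exprs={"distinct_customers": "COUNT(DISTINCT customer_id)"},
    excluded_columns=["raw_payload"],
    assignment_func=assign_stats,
)
```

Inside a DAG definition, these arguments would be passed as HiveStatsCollectionOperator(**operator_kwargs). The connection IDs are left at their documented defaults.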