Module Contents



class airflow.providers.apache.hive.operators.hive_stats.HiveStatsCollectionOperator(*, table, partition, extra_exprs=None, excluded_columns=None, assignment_func=None, metastore_conn_id='metastore_default', presto_conn_id='presto_default', mysql_conn_id='airflow_db', ds='{{ ds }}', dttm='{{ logical_date.isoformat() }}', **kwargs)

Bases: airflow.models.BaseOperator

Gather partition statistics and insert them into MySQL.

Statistics are gathered with a dynamically generated Presto query and inserted with this format. Stats overwrite themselves if you rerun the same date/partition.

CREATE TABLE hive_stats (
    ds VARCHAR(16),
    table_name VARCHAR(500),
    metric VARCHAR(200),
    value BIGINT
);

Parameters:

  • metastore_conn_id (str) – Reference to the Hive Metastore connection id.

  • table (str) – the source table, in the format database.table_name. (templated)

  • partition (Any) – the source partition. (templated)

  • extra_exprs (dict[str, Any] | None) – dict of expressions to run against the table, where keys are metric names and values are Presto-compatible expressions

  • excluded_columns (list[str] | None) – list of columns to exclude; consider excluding blobs, large JSON columns, …

  • assignment_func (Callable[[str, str], dict[Any, Any] | None] | None) – a function that receives a column name and a type and returns a dict mapping metric names to Presto expressions. If it returns None, the global defaults are applied; if it returns an empty dictionary, no stats are computed for that column.
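The fallback rules for assignment_func (None means "use the defaults", an empty dict means "skip this column") can be sketched in plain Python. This is an illustration under stated assumptions, not the operator's code: default_exprs, resolve_column_exprs, and skip_binary are hypothetical names, the default expressions are invented for the example, and the (column, metric) tuple keys are an assumed key shape.

```python
from typing import Callable, Optional

# Assumed shape for illustration: keys are (column, metric) pairs,
# values are Presto SQL expressions.
ExprMap = dict[tuple[str, str], str]
AssignFunc = Callable[[str, str], Optional[ExprMap]]


def default_exprs(col: str, col_type: str) -> ExprMap:
    """Hypothetical stand-in for the operator's global defaults."""
    exprs = {(col, "non_null"): f"COUNT({col})"}
    if col_type in ("int", "bigint", "double"):
        exprs[(col, "sum")] = f"SUM({col})"
    return exprs


def resolve_column_exprs(
    columns: dict[str, str], assignment_func: Optional[AssignFunc]
) -> ExprMap:
    """Apply the documented rules: None -> defaults, {} -> no stats."""
    resolved: ExprMap = {}
    for col, col_type in columns.items():
        exprs = assignment_func(col, col_type) if assignment_func else None
        if exprs is None:
            # No callable, or the callable deferred: fall back to defaults.
            exprs = default_exprs(col, col_type)
        # An empty dict adds nothing, so the column gets no stats.
        resolved.update(exprs)
    return resolved


def skip_binary(col: str, col_type: str) -> Optional[ExprMap]:
    """Hypothetical assignment_func: no stats for binary columns."""
    if col_type == "binary":
        return {}
    return None  # defer to the defaults for every other column


columns = {"id": "bigint", "payload": "binary"}
print(resolve_column_exprs(columns, skip_binary))
```

Here payload gets no stats because skip_binary returns an empty dict, while id falls back to the defaults because skip_binary returns None.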

template_fields: Sequence[str] = ('table', 'partition', 'ds', 'dttm')
ui_color = '#aff7a6'
get_default_exprs(col, col_type)

Get default expressions.
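The class description says statistics are gathered with a dynamically generated Presto query. A minimal sketch of folding an expression map into a single aggregate SELECT over one partition is shown below. build_stats_query, the column__metric alias scheme, and passing the partition as a dict of key/value pairs are assumptions made for illustration, not the provider's API.

```python
def build_stats_query(
    table: str,
    partition: dict[str, str],
    exprs: dict[tuple[str, str], str],
) -> str:
    """Build one Presto aggregate query that computes every metric in a single scan."""
    if not partition:
        raise ValueError("partition must contain at least one key/value pair")
    # Alias each expression as <column>__<metric> so result columns map back to metrics.
    select_list = ",\n    ".join(
        f"{expr} AS {col}__{metric}" for (col, metric), expr in exprs.items()
    )
    # Restrict the scan to the requested partition.
    # Values are inlined without escaping: only suitable for trusted input.
    where_clause = " AND ".join(f"{key} = '{value}'" for key, value in partition.items())
    return f"SELECT\n    {select_list}\nFROM {table}\nWHERE {where_clause}"


print(
    build_stats_query(
        "db.events",
        {"ds": "2024-01-01"},
        {("id", "non_null"): "COUNT(id)", ("id", "sum"): "SUM(id)"},
    )
)
```

Computing all metrics in one SELECT means the partition is scanned once, regardless of how many columns and metrics are requested.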

execute(context)

Derive (override) this method when creating an operator; it is the method that runs the task.

The context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.
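The class description states that stats overwrite themselves when the same date/partition is rerun. One common way to get that behaviour is delete-then-insert inside a single transaction. The sketch below uses the standard-library sqlite3 module as a stand-in for MySQL and the hive_stats schema shown above; write_stats is a hypothetical helper, and keying the delete on (ds, table_name) is an assumption, since the documented schema has no partition column.

```python
import sqlite3


def write_stats(
    conn: sqlite3.Connection, ds: str, table_name: str, stats: dict[str, int]
) -> None:
    """Replace any existing rows for (ds, table_name), then insert the new metrics."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "DELETE FROM hive_stats WHERE ds = ? AND table_name = ?",
            (ds, table_name),
        )
        conn.executemany(
            "INSERT INTO hive_stats (ds, table_name, metric, value) VALUES (?, ?, ?, ?)",
            [(ds, table_name, metric, value) for metric, value in stats.items()],
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE hive_stats (
        ds VARCHAR(16),
        table_name VARCHAR(500),
        metric VARCHAR(200),
        value BIGINT
    )"""
)
write_stats(conn, "2024-01-01", "db.events", {"count": 10})
write_stats(conn, "2024-01-01", "db.events", {"count": 12})  # rerun replaces the first run
rows = conn.execute("SELECT metric, value FROM hive_stats").fetchall()
print(rows)
```

Running both statements in one transaction means a failed rerun leaves the previous stats in place rather than an empty or half-written set.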
