airflow.operators.hive_to_druid

Module Contents
class airflow.operators.hive_to_druid.HiveToDruidTransfer(sql, druid_datasource, ts_dim, metric_spec=None, hive_cli_conn_id='hive_cli_default', druid_ingest_conn_id='druid_ingest_default', metastore_conn_id='metastore_default', hadoop_dependency_coordinates=None, intervals=None, num_shards=-1, target_partition_size=-1, query_granularity='NONE', segment_granularity='DAY', hive_tblproperties=None, job_properties=None, *args, **kwargs)

Bases: airflow.models.BaseOperator
Moves data from Hive to Druid (see the usage sketch after the parameter list).
- Parameters
sql (str) – SQL query to execute against the Hive database; its result set is ingested into Druid. (templated)
druid_datasource (str) – the Druid datasource to ingest the data into
ts_dim (str) – the timestamp dimension
metric_spec (list) – the metrics you want to define for your data
hive_cli_conn_id (str) – the Hive CLI connection id
druid_ingest_conn_id (str) – the Druid ingest connection id
metastore_conn_id (str) – the Hive metastore connection id
hadoop_dependency_coordinates (list[str]) – list of coordinates to squeeze into the ingest JSON
intervals (list) – list of time intervals that define the segments; passed as-is to the JSON object. (templated)
hive_tblproperties (dict) – additional TBLPROPERTIES to set on the Hive staging table
job_properties (dict) – additional properties for the ingestion job
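A minimal usage sketch, assuming a Hive table named events partitioned by ds with an event_time timestamp column and a revenue column; the table, datasource, and metric names are illustrative, while the connection ids shown are the operator defaults.

from datetime import datetime

from airflow import DAG
from airflow.operators.hive_to_druid import HiveToDruidTransfer

with DAG(
    dag_id="hive_to_druid_example",          # hypothetical DAG name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
) as dag:
    load_events = HiveToDruidTransfer(
        task_id="load_events_to_druid",
        # sql is templated: select the rows for the current execution date
        sql="SELECT * FROM events WHERE ds = '{{ ds }}'",
        druid_datasource="events",            # hypothetical Druid datasource
        ts_dim="event_time",                  # column used as the Druid timestamp
        metric_spec=[
            {"type": "count", "name": "count"},
            {"type": "doubleSum", "name": "revenue", "fieldName": "revenue"},
        ],
        # intervals is templated and passed as-is to the ingestion spec
        intervals=["{{ ds }}/{{ tomorrow_ds }}"],
        hive_cli_conn_id="hive_cli_default",
        druid_ingest_conn_id="druid_ingest_default",
        metastore_conn_id="metastore_default",
    )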