airflow.operators.hive_to_druid¶
Module Contents¶
- class airflow.operators.hive_to_druid.HiveToDruidTransfer(sql, druid_datasource, ts_dim, metric_spec=None, hive_cli_conn_id='hive_cli_default', druid_ingest_conn_id='druid_ingest_default', metastore_conn_id='metastore_default', hadoop_dependency_coordinates=None, intervals=None, num_shards=-1, target_partition_size=-1, query_granularity='NONE', segment_granularity='DAY', hive_tblproperties=None, job_properties=None, *args, **kwargs)[source]¶
  - Bases: airflow.models.BaseOperator

  Moves data from Hive to Druid. Note that for now the data is loaded into memory before being pushed to Druid, so this operator should be used for smallish amounts of data.

  - Parameters
- sql (str) – SQL query to execute against the Hive database. (templated)
- druid_datasource (str) – the Druid datasource you want to ingest into
- ts_dim (str) – the timestamp dimension 
- metric_spec (list) – the metrics you want to define for your data 
- hive_cli_conn_id (str) – the hive connection id 
- druid_ingest_conn_id (str) – the druid ingest connection id 
- metastore_conn_id (str) – the metastore connection id 
- hadoop_dependency_coordinates (list[str]) – list of coordinates to squeeze into the ingest JSON
- intervals (list) – list of time intervals that define segments; this is passed as-is to the JSON object. (templated)
- hive_tblproperties (dict) – additional properties set via TBLPROPERTIES on the Hive staging table
- job_properties (dict) – additional properties for the ingestion job
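
Below is a minimal sketch of how this operator might be wired into a DAG. The DAG id, task id, table, datasource, and column names (hive_to_druid_example, load_clicks_into_druid, staging.clicks, my_druid_datasource, event_ts, clicks) are hypothetical placeholders, not part of the module; the connection ids shown are simply the operator's defaults.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.hive_to_druid import HiveToDruidTransfer

with DAG(
    dag_id="hive_to_druid_example",          # hypothetical DAG id
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
) as dag:
    load_clicks = HiveToDruidTransfer(
        task_id="load_clicks_into_druid",
        # Hive query producing the rows to ingest; (templated) fields may use Jinja
        sql="SELECT * FROM staging.clicks WHERE ds = '{{ ds }}'",
        druid_datasource="my_druid_datasource",
        ts_dim="event_ts",                   # timestamp dimension column
        metric_spec=[
            {"type": "count", "name": "count"},
            {"type": "doubleSum", "name": "clicks", "fieldName": "clicks"},
        ],
        intervals=["{{ ds }}/{{ tomorrow_ds }}"],  # templated segment interval
        hive_cli_conn_id="hive_cli_default",
        druid_ingest_conn_id="druid_ingest_default",
        metastore_conn_id="metastore_default",
    )
```

Because sql and intervals are templated, the query and the segment interval can be parameterized per execution date, so each daily run stages and ingests only that day's partition.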