Apache Druid Operators


To use DruidOperator, you must configure a Druid Connection first.
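One way to supply such a connection is through an environment variable in Airflow's connection-URI form. This is a sketch, not the only option (the UI or `airflow connections add` work equally well); the connection ID, host, port, and the URL-encoded `endpoint` extra below are placeholder values you would adapt to your own Druid overlord:

```shell
# Hypothetical connection named "druid_ingest_default"; the "endpoint" extra
# (here druid/indexer/v1/task, URL-encoded) is the path index jobs are posted to.
export AIRFLOW_CONN_DRUID_INGEST_DEFAULT='druid://druid-overlord:8081/?endpoint=druid%2Findexer%2Fv1%2Ftask'
```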


To submit a task directly to Druid, you need to provide the filepath to the Druid index specification in json_index_file, and the connection ID of the Druid overlord, druid_ingest_conn_id, which accepts index jobs and is configured in Airflow Connections. In addition, you can set ingestion_type to choose between batch ingestion and SQL-based ingestion.
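The ingestion type effectively selects which overlord endpoint the specification is posted to. The following is an illustrative sketch of that dispatch, not the provider's actual code; the enum and function are made up here, and the two paths (`druid/indexer/v1/task` for batch, `druid/v2/sql/task` for SQL-based ingestion) are assumptions based on the Druid HTTP API:

```python
from enum import Enum


class IngestionType(Enum):
    # Illustrative enum mirroring the two supported ingestion modes.
    BATCH = 1
    MSQ = 2  # SQL-based ingestion (multi-stage query engine)


def task_endpoint(ingestion_type: IngestionType) -> str:
    """Return the overlord path a spec would be POSTed to (illustrative)."""
    if ingestion_type is IngestionType.MSQ:
        return "druid/v2/sql/task"
    return "druid/indexer/v1/task"


print(task_endpoint(IngestionType.BATCH))
```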

An example of the Druid ingestion specification is shown below.

For parameter definition take a look at DruidOperator.

Using the operator


from airflow.providers.apache.druid.operators.druid import DruidOperator

submit_job = DruidOperator(task_id="spark_submit_job", json_index_file="json_index.json")
# Example content of json_index.json:
{
    "type": "index_hadoop",
    "datasource": "datasource_prd",
    "spec": {
        "dataSchema": {
            "granularitySpec": {
                "intervals": ["2021-09-01/2021-09-02"]
            }
        }
    }
}

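Since the operator reads the specification from a file at runtime, it can help to check that the file is well-formed JSON before the DAG ever runs. A stdlib-only sketch, where the dict simply echoes the truncated example above:

```python
import json

# Spec mirroring the truncated example above (illustrative values).
spec = {
    "type": "index_hadoop",
    "datasource": "datasource_prd",
    "spec": {
        "dataSchema": {
            "granularitySpec": {"intervals": ["2021-09-01/2021-09-02"]}
        }
    },
}

# Round-trip through json to confirm the content that would be written to
# json_index.json parses back cleanly.
text = json.dumps(spec, indent=4)
parsed = json.loads(text)
print(parsed["spec"]["dataSchema"]["granularitySpec"]["intervals"][0])
# → 2021-09-01/2021-09-02
```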

For more information, please refer to the Apache Druid Ingestion spec reference.
