
airflow.providers.apache.druid.hooks.druid

Classes

IngestionType

Druid ingestion type. Can be native batch ingestion or SQL-based ingestion.

DruidHook

Connection to Druid overlord for ingestion.

DruidDbApiHook

Interact with Druid broker.

Module Contents

class airflow.providers.apache.druid.hooks.druid.IngestionType[source]

Bases: enum.Enum

Druid ingestion type. Can be native batch ingestion or SQL-based ingestion.

https://druid.apache.org/docs/latest/ingestion/index.html

BATCH = 1[source]
MSQ = 2[source]
class airflow.providers.apache.druid.hooks.druid.DruidHook(druid_ingest_conn_id='druid_ingest_default', timeout=1, max_ingestion_time=None, verify_ssl=True)[source]

Bases: airflow.providers.apache.druid.version_compat.BaseHook

Connection to Druid overlord for ingestion.

To connect to a Druid cluster that is secured with the druid-basic-security extension, add the username and password to the druid ingestion connection.

Parameters:
  • druid_ingest_conn_id (str) – The connection id to the Druid overlord machine which accepts index jobs

  • timeout (int) – The interval, in seconds, between polls for the status of the ingestion job. Must be greater than or equal to 1

  • max_ingestion_time (int | None) – The maximum ingestion time before assuming the job failed

  • verify_ssl (bool) – Whether to verify the SSL certificate when submitting the indexing job. If set to False, the connection information is checked for the path to a CA bundle to use instead. Defaults to True
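
A minimal sketch of constructing the hook with the parameters above; the connection id and the tuning values are illustrative choices, not required defaults:

from airflow.providers.apache.druid.hooks.druid import DruidHook

hook = DruidHook(
    druid_ingest_conn_id="druid_ingest_default",  # connection to the Druid overlord
    timeout=5,                # poll the task status every 5 seconds
    max_ingestion_time=3600,  # give up and fail the job after one hour
    verify_ssl=True,          # verify the overlord's SSL certificate
)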

druid_ingest_conn_id = 'druid_ingest_default'[source]
timeout = 1[source]
max_ingestion_time = None[source]
header[source]
verify_ssl = True[source]
status_endpoint = 'druid/indexer/v1/task'[source]
property conn: airflow.models.Connection[source]
property get_connection_type: str[source]
get_conn_url(ingestion_type=IngestionType.BATCH)[source]

Get Druid connection url.
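
For illustration, a hedged sketch that resolves the submission URL for each ingestion type; it assumes a configured "druid_ingest_default" connection, and the exact path depends on the endpoint set in that connection:

from airflow.providers.apache.druid.hooks.druid import DruidHook, IngestionType

hook = DruidHook()  # uses the default "druid_ingest_default" connection

# Native batch and SQL-based (MSQ) specs are posted to different endpoints;
# the concrete path comes from the connection configuration.
batch_url = hook.get_conn_url(IngestionType.BATCH)
msq_url = hook.get_conn_url(IngestionType.MSQ)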

get_status_url(ingestion_type)[source]

Return Druid status url.

get_auth()[source]

Return the connection's username and password as a requests.auth.HTTPBasicAuth object.

If these details have not been set, return None.

get_verify()[source]
submit_indexing_job(json_index_spec, ingestion_type=IngestionType.BATCH)[source]

Submit Druid ingestion job.
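
A minimal, hedged sketch of submitting a native batch job; the ingestion spec below is a truncated illustration, not a complete working spec, and the connection id and datasource name are assumptions:

from airflow.providers.apache.druid.hooks.druid import DruidHook, IngestionType

# Heavily truncated native batch spec for illustration only; a real spec needs
# a full dataSchema, ioConfig and tuningConfig (see the Druid ingestion docs).
index_spec = {
    "type": "index_parallel",
    "spec": {
        "dataSchema": {"dataSource": "example_datasource"},
        "ioConfig": {"type": "index_parallel"},
    },
}

hook = DruidHook(druid_ingest_conn_id="druid_ingest_default", max_ingestion_time=3600)
hook.submit_indexing_job(index_spec, ingestion_type=IngestionType.BATCH)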

class airflow.providers.apache.druid.hooks.druid.DruidDbApiHook(context=None, *args, **kwargs)[source]

Bases: airflow.providers.common.sql.hooks.sql.DbApiHook

Interact with Druid broker.

This hook is purely for querying the Druid broker. For ingestion, please use DruidHook.

Parameters:
  • context (dict | None) – Optional query context parameters to pass to the SQL endpoint. Example: {"sqlFinalizeOuterSketches": True}. See https://druid.apache.org/docs/latest/querying/sql-query-context/
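
A hedged sketch of querying the broker; the connection id, the SQL statement, and the context value are illustrative assumptions, and get_records is inherited from the common SQL DbApiHook:

from airflow.providers.apache.druid.hooks.druid import DruidDbApiHook

hook = DruidDbApiHook(
    druid_broker_conn_id="druid_broker_default",
    context={"sqlFinalizeOuterSketches": True},  # optional SQL query context
)

# Run an illustrative query against the broker's SQL endpoint.
rows = hook.get_records("SELECT __time, channel FROM wikipedia LIMIT 10")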

conn_name_attr = 'druid_broker_conn_id'[source]
default_conn_name = 'druid_broker_default'[source]
conn_type = 'druid'[source]
hook_name = 'Druid'[source]
supports_autocommit = False[source]
context[source]
get_conn()[source]

Establish a connection to the Druid broker.

get_uri()[source]

Get the connection URI for the Druid broker.

e.g. druid://localhost:8082/druid/v2/sql/
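
For illustration, a small sketch that reads the broker URI; the printed value follows the documented form and depends on the configured connection:

from airflow.providers.apache.druid.hooks.druid import DruidDbApiHook

hook = DruidDbApiHook()  # uses the default "druid_broker_default" connection
print(hook.get_uri())    # e.g. druid://localhost:8082/druid/v2/sql/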

abstract set_autocommit(conn, autocommit)[source]

Set the autocommit flag on the connection.

abstract insert_rows(table, rows, target_fields=None, commit_every=1000, replace=False, **kwargs)[source]

Insert a collection of tuples into a table.

Rows are inserted in chunks, each chunk (of size commit_every) is done in a new transaction.

Parameters:
  • table (str) – Name of the target table

  • rows (collections.abc.Iterable[tuple[str]]) – The rows to insert into the table

  • target_fields (collections.abc.Iterable[str] | None) – The names of the columns to fill in the table

  • commit_every (int) – The maximum number of rows to insert in one transaction. Set to 0 to insert all rows in one transaction.

  • replace (bool) – Whether to replace instead of insert

  • executemany – If True, all rows are inserted at once in chunks defined by the commit_every parameter. This only works if all rows have the same number of columns, but it leads to better performance.

  • fast_executemany – If True, the fast_executemany parameter will be set on the cursor used by executemany which leads to better performance, if supported by driver.

  • autocommit – What to set the connection’s autocommit setting to before executing the query.
