SQLExecuteQueryOperator to connect to Apache Pinot

Use the SQLExecuteQueryOperator to execute SQL queries against an Apache Pinot cluster.

Note

There is no dedicated operator for Apache Pinot. Please use the SQLExecuteQueryOperator instead.

Note

Make sure you have installed the necessary provider package (e.g. apache-airflow-providers-apache-pinot) to enable Apache Pinot support.

Using the Operator

Use the conn_id argument to connect to your Apache Pinot instance where the connection metadata is structured as follows:

Pinot Airflow Connection Metadata

Parameter

Input

Host: string

Pinot broker hostname or IP address

Port: int

Pinot broker port (default: 8000)

Schema: string

(Not used)

Extra: JSON

Optional fields such as: {"endpoint": "query/sql"}

An example usage of the SQLExecuteQueryOperator to connect to Apache Pinot is as follows:

tests/system/apache/pinot/example_pinot.py


    # Task: Simple query to test connection and query engine
    select_1_task = SQLExecuteQueryOperator(
        task_id="select_1",
        sql="SELECT 1",
    )

    # Task: Count total records in airlineStats (sample table)
    count_airline_stats = SQLExecuteQueryOperator(
        task_id="count_airline_stats",
        sql="SELECT COUNT(*) FROM airlineStats",
    )

    # Task: Group by Carrier and count flights
    group_by_carrier = SQLExecuteQueryOperator(
        task_id="group_by_carrier",
        sql=dedent("""
            SELECT Carrier, COUNT(*) AS flight_count
            FROM airlineStats
            GROUP BY Carrier
            ORDER BY flight_count DESC
            LIMIT 5
        """).strip(),
    )

Reference

For further information, see:

Note

Parameters provided directly via SQLExecuteQueryOperator() take precedence over those specified in the Airflow connection metadata.

Was this entry helpful?