SQLExecuteQueryOperator to connect to Apache Pinot
Use the SQLExecuteQueryOperator to execute SQL queries against an Apache Pinot cluster.
Note
There is no dedicated operator for Apache Pinot. Use the SQLExecuteQueryOperator instead.
Note
Make sure you have installed the necessary provider package (e.g. apache-airflow-providers-apache-pinot) to enable Apache Pinot support.
Using the Operator
Use the conn_id argument to connect to your Apache Pinot instance, where the connection metadata is structured as follows:
Parameter | Input |
---|---|
Host: string | Pinot broker hostname or IP address |
Port: int | Pinot broker port (default: 8000) |
Schema: string | (Not used) |
Extra: JSON | Optional fields such as: |
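One way to supply these fields without using the UI is an `AIRFLOW_CONN_<CONN_ID>` environment variable holding a JSON document, a format Airflow has accepted since version 2.3. The following is a minimal sketch, not the provider's prescribed setup: the connection id `my_pinot_conn` and the hostname are hypothetical, and the `conn_type` value `pinot` is an assumption that should be checked against the installed provider. No Extra fields are set.

```python
import json
import os

# Hypothetical connection id; tasks would refer to it via conn_id="my_pinot_conn".
CONN_ID = "my_pinot_conn"


def pinot_connection_json(host: str, port: int = 8000) -> str:
    """Serialize the Host and Port fields from the table above as JSON.

    The conn_type value "pinot" is an assumption about the provider's
    broker connection type, not something stated in this document.
    """
    return json.dumps({"conn_type": "pinot", "host": host, "port": port})


# Airflow looks up connections in variables named AIRFLOW_CONN_<CONN_ID in upper case>.
# The hostname below is a placeholder.
os.environ[f"AIRFLOW_CONN_{CONN_ID.upper()}"] = pinot_connection_json(
    "pinot-broker.example.com"
)
```

Setting `os.environ` only affects the current Python process and its children. In a real deployment the variable would normally be exported in the environment of the scheduler and workers instead.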
An example usage of the SQLExecuteQueryOperator to connect to Apache Pinot is as follows:
tests/system/apache/pinot/example_pinot.py
# Task: Simple query to test connection and query engine
select_1_task = SQLExecuteQueryOperator(
    task_id="select_1",
    sql="SELECT 1",
)

# Task: Count total records in airlineStats (sample table)
count_airline_stats = SQLExecuteQueryOperator(
    task_id="count_airline_stats",
    sql="SELECT COUNT(*) FROM airlineStats",
)

# Task: Group by Carrier and count flights
group_by_carrier = SQLExecuteQueryOperator(
    task_id="group_by_carrier",
    sql=dedent("""
        SELECT Carrier, COUNT(*) AS flight_count
        FROM airlineStats
        GROUP BY Carrier
        ORDER BY flight_count DESC
        LIMIT 5
    """).strip(),
)
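The `group_by_carrier` task wraps its query in `dedent(...).strip()`. As a quick check of what string the operator actually receives, here is a standalone sketch using only the standard library. The four-space indentation inside the triple-quoted string is this sketch's own choice, standing in for however the query is indented in the DAG file.

```python
from textwrap import dedent

# dedent() removes the whitespace common to every line, and strip() drops the
# leading and trailing newlines left over from the triple-quoted string.
sql = dedent("""
    SELECT Carrier, COUNT(*) AS flight_count
    FROM airlineStats
    GROUP BY Carrier
    ORDER BY flight_count DESC
    LIMIT 5
""").strip()

print(sql.splitlines()[0])  # prints: SELECT Carrier, COUNT(*) AS flight_count
```

The query can therefore be indented to match the surrounding Python code without sending the extra whitespace to Pinot.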
Reference
For further information, see:
Note
Parameters provided directly via SQLExecuteQueryOperator() take precedence over those specified in the Airflow connection metadata.