airflow.providers.apache.pinot.hooks.pinot¶

Module Contents¶

Classes¶

PinotAdminHook — This hook is a wrapper around the pinot-admin.sh script.
PinotDbApiHook — Interact with Pinot Broker Query API.
- class airflow.providers.apache.pinot.hooks.pinot.PinotAdminHook(conn_id='pinot_admin_default', cmd_path='pinot-admin.sh', pinot_admin_system_exit=False)[source]¶
Bases:
airflow.hooks.base.BaseHook
This hook is a wrapper around the pinot-admin.sh script.
For now, only small subset of its subcommands are implemented, which are required to ingest offline data into Apache Pinot (i.e., AddSchema, AddTable, CreateSegment, and UploadSegment). Their command options are based on Pinot v0.1.0.
Unfortunately, as of v0.1.0, pinot-admin.sh always exits with status code 0. To work around this behavior, users can use the pinot_admin_system_exit flag. If its value is set to false, this hook evaluates the result based on the output message instead of the status code. This behavior is expected to be improved in the next Pinot release, which will include the following PR: https://github.com/apache/incubator-pinot/pull/4110
- Parameters
conn_id (str) – The name of the connection to use.
cmd_path (str) – Do not modify this parameter. It used to be the filepath to the pinot-admin.sh executable, but as of version 4.0.0 of the apache-pinot provider its value must remain the default, pinot-admin.sh. The parameter is kept only so that pinot_admin_system_exit is not accidentally overridden when the hook is initialized with positional arguments.
pinot_admin_system_exit (bool) – If true, the result is evaluated based on the status code. Otherwise, the result is evaluated as a failure if “Error” or “Exception” is in the output message.
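The success check described above can be sketched in isolation. The function below is an illustrative stand-in for the hook's internal logic, not the hook's actual code:

```python
def pinot_run_succeeded(stdout: str, pinot_admin_system_exit: bool, returncode: int = 0) -> bool:
    """Illustrative sketch of the success check described in the docs.

    With pinot_admin_system_exit=True the status code is trusted;
    otherwise the output is scanned for "Error" or "Exception",
    because pinot-admin.sh (v0.1.0) always exits with status code 0.
    """
    if pinot_admin_system_exit:
        return returncode == 0
    return "Error" not in stdout and "Exception" not in stdout
```

Note that with pinot_admin_system_exit=False, a run whose output contains "Exception" is treated as a failure even though the script's exit code was 0.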
- create_segment(generator_config_file=None, data_dir=None, segment_format=None, out_dir=None, overwrite=None, table_name=None, segment_name=None, time_column_name=None, schema_file=None, reader_config_file=None, enable_star_tree_index=None, star_tree_index_spec_file=None, hll_size=None, hll_columns=None, hll_suffix=None, num_threads=None, post_creation_verification=None, retry=None)[source]¶
Create a Pinot segment by running the CreateSegment command.
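Under the hood, the hook shells out to pinot-admin.sh with the CreateSegment subcommand. The sketch below shows how keyword arguments like those above could map to CLI flags; the flag-name mapping and assembly logic are illustrative, not the hook's actual implementation:

```python
def build_create_segment_args(**options):
    """Map create_segment()-style keyword arguments to pinot-admin.sh CLI flags.

    Illustrative only: None values are skipped, True booleans become bare flags.
    """
    flag_names = {
        "data_dir": "-dataDir",
        "segment_format": "-format",
        "out_dir": "-outDir",
        "table_name": "-tableName",
        "segment_name": "-segmentName",
        "schema_file": "-schemaFile",
        "overwrite": "-overwrite",
    }
    args = ["CreateSegment"]
    for key, value in options.items():
        if value is None or key not in flag_names:
            continue  # unset options are simply omitted from the command line
        if isinstance(value, bool):
            if value:
                args.append(flag_names[key])
        else:
            args.extend([flag_names[key], str(value)])
    return args
```

For example, build_create_segment_args(data_dir="/data", table_name="t", overwrite=True) yields a CreateSegment invocation with only those three flags set.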
- class airflow.providers.apache.pinot.hooks.pinot.PinotDbApiHook(*args, schema=None, log_sql=True, **kwargs)[source]¶
Bases:
airflow.providers.common.sql.hooks.sql.DbApiHook
Interact with Pinot Broker Query API.
This hook uses the standard-SQL endpoint, since the PQL endpoint will soon be deprecated. https://docs.pinot.apache.org/users/api/querying-pinot-using-standard-sql
- get_uri()[source]¶
Get the connection URI for the Pinot broker, e.g. http://localhost:9000/query/sql
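The URI is composed from the connection's host, port, and endpoint. A minimal sketch, assuming a standard Airflow connection layout (this helper is hypothetical, not the hook's exact code):

```python
def build_pinot_broker_uri(host: str, port: int, endpoint: str = "query/sql",
                           scheme: str = "http") -> str:
    """Illustrative sketch: compose the broker query URI from connection parts."""
    # Strip any leading slash so the endpoint joins cleanly after the port.
    return f"{scheme}://{host}:{port}/{endpoint.lstrip('/')}"
```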
- abstract insert_rows(table, rows, target_fields=None, commit_every=1000, replace=False, **kwargs)[source]¶
Insert a collection of tuples into a table.
Rows are inserted in chunks; each chunk (of size commit_every) is done in a new transaction.
- Parameters
table (str) – Name of the target table
rows (Iterable[tuple]) – The rows to insert into the table
target_fields (Iterable[str] | None) – The names of the columns to fill in the table
commit_every (int) – The maximum number of rows to insert in one transaction. Set to 0 to insert all rows in one transaction.
replace (bool) – Whether to replace instead of insert
executemany – If True, all rows are inserted at once in chunks defined by the commit_every parameter. This only works if all rows have the same number of columns, but it leads to better performance.
fast_executemany – If True, the fast_executemany attribute will be set on the cursor used by executemany, which leads to better performance if supported by the driver.
autocommit – What to set the connection’s autocommit setting to before executing the query.
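The commit_every chunking semantics described above can be sketched generically. This is a stand-alone illustration of the batching behavior, not DbApiHook code:

```python
def chunk_rows(rows, commit_every):
    """Split rows into commit-sized chunks.

    commit_every=0 (or negative) means all rows go into a single chunk,
    i.e. one transaction, mirroring the documented behavior.
    """
    rows = list(rows)
    if commit_every <= 0:
        return [rows] if rows else []
    return [rows[i:i + commit_every] for i in range(0, len(rows), commit_every)]
```

With commit_every=2, five rows produce three chunks (two full chunks and one remainder), each of which would be committed in its own transaction.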