airflow.providers.apache.pinot.hooks.pinot

Module Contents

Classes

PinotAdminHook

This hook is a wrapper around the pinot-admin.sh script.

PinotDbApiHook

Interact with Pinot Broker Query API.

class airflow.providers.apache.pinot.hooks.pinot.PinotAdminHook(conn_id='pinot_admin_default', cmd_path='pinot-admin.sh', pinot_admin_system_exit=False)[source]

Bases: airflow.hooks.base.BaseHook

This hook is a wrapper around the pinot-admin.sh script.

For now, only a small subset of its subcommands is implemented: those required to ingest offline data into Apache Pinot (i.e., AddSchema, AddTable, CreateSegment, and UploadSegment). Their command options are based on Pinot v0.1.0.

Unfortunately, as of v0.1.0, pinot-admin.sh always exits with status code 0. To work around this behavior, users can set the pinot_admin_system_exit flag. If its value is false, this hook evaluates the result based on the output message instead of the status code. This behavior of Pinot is expected to be improved in the next release, which will include the following PR: https://github.com/apache/incubator-pinot/pull/4110

Parameters
  • conn_id (str) – The name of the connection to use.

  • cmd_path (str) – Do not modify this parameter. It used to be the filepath to the pinot-admin.sh executable, but as of version 4.0.0 of the apache-pinot provider its value must remain the default, pinot-admin.sh. The parameter is kept only so that positional arguments do not accidentally override pinot_admin_system_exit when the hook is initialized positionally.

  • pinot_admin_system_exit (bool) – If true, the result is evaluated based on the status code. Otherwise, the result is treated as a failure if “Error” or “Exception” appears in the output message.
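For example, the exit-code workaround described above can be enabled at construction time. A minimal sketch, using the provider's default connection name:

    from airflow.providers.apache.pinot.hooks.pinot import PinotAdminHook

    hook = PinotAdminHook(
        conn_id="pinot_admin_default",
        # Evaluate success from the output message rather than the exit
        # code, since pinot-admin.sh (as of v0.1.0) always exits with 0.
        pinot_admin_system_exit=False,
    )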

conn_name_attr = 'conn_id'[source]
default_conn_name = 'pinot_admin_default'[source]
conn_type = 'pinot_admin'[source]
hook_name = 'Pinot Admin'[source]
get_conn()[source]

Return the connection for the hook.

add_schema(schema_file, with_exec=True)[source]

Add a Pinot schema by running the AddSchema command.

Parameters
  • schema_file (str) – Path to the Pinot schema file

  • with_exec (bool) – If true, the -exec option is passed so the command is executed immediately
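A usage sketch, assuming the hook instance constructed above; the schema file path is hypothetical:

    # Add the schema and execute the command immediately.
    hook.add_schema(schema_file="/opt/pinot/schemas/my_table_schema.json", with_exec=True)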

add_table(file_path, with_exec=True)[source]

Add a Pinot table by running the AddTable command.

Parameters
  • file_path (str) – Path to the Pinot table configuration file

  • with_exec (bool) – If true, the -exec option is passed so the command is executed immediately
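A usage sketch along the same lines; the table configuration path is hypothetical:

    # Register the table defined by the configuration file.
    hook.add_table(file_path="/opt/pinot/tables/my_table_config.json", with_exec=True)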

create_segment(generator_config_file=None, data_dir=None, segment_format=None, out_dir=None, overwrite=None, table_name=None, segment_name=None, time_column_name=None, schema_file=None, reader_config_file=None, enable_star_tree_index=None, star_tree_index_spec_file=None, hll_size=None, hll_columns=None, hll_suffix=None, num_threads=None, post_creation_verification=None, retry=None)[source]

Create a Pinot segment by running the CreateSegment command.
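The keyword arguments map to options of the CreateSegment command. A sketch showing a few of them, with hypothetical paths and names:

    # Build a segment from CSV input; most optional arguments are omitted.
    hook.create_segment(
        data_dir="/data/input",
        segment_format="CSV",
        out_dir="/data/segments",
        table_name="my_table",
        segment_name="my_table_segment_0",
        time_column_name="ts",
        schema_file="/opt/pinot/schemas/my_table_schema.json",
    )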

upload_segment(segment_dir, table_name=None)[source]

Upload a segment by running the UploadSegment command.

Parameters
  • segment_dir (str) – Directory containing the segment to upload

  • table_name (str | None) – Name of the table to which the segment belongs

Return type

Any
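A usage sketch with hypothetical paths, following on from the segment created above:

    # Upload the segment(s) found under the output directory.
    hook.upload_segment(segment_dir="/data/segments", table_name="my_table")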

run_cli(cmd, verbose=True)[source]

Run command with pinot-admin.sh.

Parameters
  • cmd (list[str]) – The command to be run by the pinot-admin.sh script, given as a list of arguments

  • verbose (bool) – If true, the command and its output are logged
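For example, the AddSchema call issued by add_schema() could also be run directly; a sketch with a hypothetical schema path:

    # The first list element is the pinot-admin.sh subcommand, followed
    # by its options.
    output = hook.run_cli(
        ["AddSchema", "-schemaFile", "/opt/pinot/schemas/my_table_schema.json", "-exec"],
        verbose=True,
    )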

class airflow.providers.apache.pinot.hooks.pinot.PinotDbApiHook(*args, schema=None, log_sql=True, **kwargs)[source]

Bases: airflow.providers.common.sql.hooks.sql.DbApiHook

Interact with Pinot Broker Query API.

This hook uses the standard-SQL endpoint, since the PQL endpoint is soon to be deprecated: https://docs.pinot.apache.org/users/api/querying-pinot-using-standard-sql
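A minimal construction sketch, assuming a broker connection with the provider's default name exists:

    from airflow.providers.apache.pinot.hooks.pinot import PinotDbApiHook

    # The connection id keyword matches conn_name_attr below.
    broker_hook = PinotDbApiHook(pinot_broker_conn_id="pinot_broker_default")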

conn_name_attr = 'pinot_broker_conn_id'[source]
default_conn_name = 'pinot_broker_default'[source]
conn_type = 'pinot'[source]
hook_name = 'Pinot Broker'[source]
supports_autocommit = False[source]
get_conn()[source]

Establish a connection to the Pinot broker through the Pinot DB API.

get_uri()[source]

Get the connection URI for the Pinot broker.

e.g.: http://localhost:9000/query/sql

get_records(sql, parameters=None, **kwargs)[source]

Execute the sql and return a set of records.

Parameters
  • sql (str | list[str]) – the sql statement to be executed (str) or a list of sql statements to execute

  • parameters (Iterable | Mapping[str, Any] | None) – The parameters to render the SQL query with.
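A usage sketch, reusing the broker hook constructed above; the table name is hypothetical:

    # Runs against the broker's standard-SQL endpoint.
    records = broker_hook.get_records("SELECT COUNT(*) FROM my_table")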

get_first(sql, parameters=None)[source]

Execute the sql and return the first resulting row.

Parameters
  • sql (str | list[str]) – the sql statement to be executed (str) or a list of sql statements to execute

  • parameters (Iterable | Mapping[str, Any] | None) – The parameters to render the SQL query with.
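The same query shape works here, but only the first row is returned; the table name is again hypothetical:

    row = broker_hook.get_first("SELECT MAX(ts) FROM my_table")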

abstract set_autocommit(conn, autocommit)[source]

Set the autocommit flag on the connection.

abstract insert_rows(table, rows, target_fields=None, commit_every=1000, replace=False, **kwargs)[source]

Insert a collection of tuples into a table.

Rows are inserted in chunks, each chunk (of size commit_every) is done in a new transaction.

Parameters
  • table (str) – Name of the target table

  • rows (Iterable[tuple]) – The rows to insert into the table

  • target_fields (Iterable[str] | None) – The names of the columns to fill in the table

  • commit_every (int) – The maximum number of rows to insert in one transaction. Set to 0 to insert all rows in one transaction.

  • replace (bool) – Whether to replace instead of insert

  • executemany – If True, all rows are inserted at once in chunks defined by the commit_every parameter. This only works if all rows have the same number of columns, but it leads to better performance.

  • fast_executemany – If True, the fast_executemany parameter is set on the cursor used by executemany, which leads to better performance if supported by the driver.

  • autocommit – What to set the connection’s autocommit setting to before executing the query.
