airflow.providers.pinecone.hooks.pinecone
Hook for Pinecone.
Module Contents
Classes
PineconeHook – Interact with Pinecone. This hook uses the Pinecone conn_id.
- class airflow.providers.pinecone.hooks.pinecone.PineconeHook(conn_id=default_conn_name, environment=None, region=None)
Bases: airflow.hooks.base.BaseHook
Interact with Pinecone. This hook uses the Pinecone conn_id.
- Parameters
conn_id (str) – The connection ID to use when connecting to Pinecone. Optional; defaults to pinecone_default.
- classmethod get_connection_form_widgets()
Return connection widgets to add to connection form.
- upsert(index_name, vectors, namespace='', batch_size=None, show_progress=True, **kwargs)
Write vectors into a namespace.
If a new value is upserted for an existing vector id, it will overwrite the previous value.
To upsert in parallel follow
- Parameters
index_name (str) – The name of the index to write to.
vectors (list[pinecone.Vector] | list[tuple] | list[dict]) – A list of vectors to upsert.
namespace (str) – The namespace to write to. If not specified, the default namespace ("") is used.
batch_size (int | None) – The number of vectors to upsert in each batch.
show_progress (bool) – Whether to show a progress bar using tqdm. Applied only if batch_size is provided.
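The upsert semantics described above (a new value for an existing ID overwrites the old one; batch_size splits the work into chunks) can be modeled with a small in-memory stand-in. This is a hypothetical, stdlib-only sketch: the store layout and the `batches`/`upsert` helpers are assumptions for illustration, not PineconeHook's implementation. The records use the `id`/`values`/`metadata` keys of Pinecone's dict vector form.

```python
from __future__ import annotations

from typing import Any


def batches(items: list, batch_size: int | None) -> list[list]:
    """Split items into consecutive chunks of at most batch_size (one chunk if None)."""
    if batch_size is None:
        return [items]
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]


def upsert(
    store: dict[str, dict[str, Any]],
    vectors: list[dict[str, Any]],
    namespace: str = "",
    batch_size: int | None = None,
) -> int:
    """Hypothetical in-memory upsert: namespace -> {id: (values, metadata)}."""
    ns = store.setdefault(namespace, {})
    written = 0
    for chunk in batches(vectors, batch_size):
        for v in chunk:
            # Same id: the new values and metadata replace the previous ones.
            ns[v["id"]] = (v["values"], v.get("metadata", {}))
            written += 1
    return written
```

Upserting `{"id": "a", ...}` twice leaves a single entry for `"a"` holding the second value, which mirrors the overwrite rule stated for the real method.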
- get_pod_spec_obj(*, replicas=None, shards=None, pods=None, pod_type='p1.x1', metadata_config=None, source_collection=None, environment=None)
Get a PodSpec object.
- Parameters
replicas (int | None) – The number of replicas.
shards (int | None) – The number of shards.
pods (int | None) – The number of pods.
pod_type (str | None) – The type of pod.
metadata_config (dict | None) – The metadata configuration.
source_collection (str | None) – The name of the collection to create the index from.
environment (str | None) – The environment to use when creating the index.
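The replicas, shards and pods arguments are related: Pinecone's documentation describes the pod count of a pod-based index as shards × replicas. A hypothetical helper that computes that product can serve as a sanity check before choosing an explicit pods value; treating an omitted value as 1 is an assumption about defaults, so verify against the client version in use.

```python
from __future__ import annotations


def expected_pod_count(replicas: int | None = None, shards: int | None = None) -> int:
    """Hypothetical helper: total pods implied by replicas and shards.

    Assumes pods = shards * replicas and treats an omitted value as 1.
    """
    r = 1 if replicas is None else replicas
    s = 1 if shards is None else shards
    if r < 1 or s < 1:
        raise ValueError("replicas and shards must be >= 1")
    return r * s
```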
- create_index(index_name, dimension, spec, metric='cosine', timeout=None)
Create a new index.
- Parameters
index_name (str) – The name of the index.
dimension (int) – The dimension of the vectors to be indexed.
spec (pinecone.ServerlessSpec | pinecone.PodSpec) – Pass a ServerlessSpec object to create a serverless index or a PodSpec object to create a pod index. get_serverless_spec_obj and get_pod_spec_obj can be used to create the Spec objects.
metric (str | None) – The metric to use. Defaults to cosine.
timeout (int | None) – The timeout to use.
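For reference, the default cosine metric scores two vectors by the cosine of the angle between them, dot(a, b) / (|a| · |b|), giving a value in [-1, 1] where 1 means same direction. A minimal stdlib sketch of that formula follows; the dimension check reflects the `dimension` argument, since every vector in an index must have that length. Pinecone also accepts other metric names (for example euclidean and dotproduct), which this sketch does not cover.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```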
- describe_index(index_name)
Retrieve information about a specific index.
- Parameters
index_name (str) – The name of the index to describe.
- configure_index(index_name, replicas=None, pod_type='')
Change the current configuration of the index.
- create_collection(collection_name, index_name)
Create a new collection from a specified index.
- delete_collection(collection_name)
Delete a specific collection.
- Parameters
collection_name (str) – The name of the collection to delete.
- describe_collection(collection_name)
Retrieve information about a specific collection.
- Parameters
collection_name (str) – The name of the collection to describe.
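Pinecone describes a collection as a static copy of an index. The hypothetical in-memory sketch below illustrates that behavior: the snapshot taken by `create_collection` does not change when the source index is written to later. The `collections` registry and both helpers are illustrative assumptions, not the hook's code.

```python
from __future__ import annotations

import copy
from typing import Any


def create_collection(collections: dict[str, Any], collection_name: str, index_data: dict[str, Any]) -> None:
    """Hypothetical: snapshot index_data under collection_name."""
    if collection_name in collections:
        raise ValueError(f"collection {collection_name!r} already exists")
    # deepcopy so later writes to the index do not leak into the snapshot.
    collections[collection_name] = copy.deepcopy(index_data)


def delete_collection(collections: dict[str, Any], collection_name: str) -> None:
    """Hypothetical: remove a collection; raises KeyError if it does not exist."""
    del collections[collection_name]
```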
- query_vector(index_name, vector, query_id=None, top_k=10, namespace=None, query_filter=None, include_values=None, include_metadata=None, sparse_vector=None)
Search a namespace using a query vector.
It retrieves the IDs of the most similar items in a namespace, along with their similarity scores. API reference: https://docs.pinecone.io/reference/query
- Parameters
index_name (str) – The name of the index to query.
vector (list[Any]) – The query vector.
query_id (str | None) – The unique ID of the vector to be used as a query vector.
top_k (int) – The number of results to return.
namespace (str | None) – The namespace to query. If not specified, the default namespace is used.
query_filter (dict[str, str | float | int | bool | list[Any] | dict[Any, Any]] | None) – The filter to apply. See https://www.pinecone.io/docs/metadata-filtering/
include_values (bool | None) – Whether to include the vector values in the result.
include_metadata (bool | None) – Whether to include metadata in the response, in addition to the IDs.
sparse_vector (pinecone.core.client.model.sparse_values.SparseValues | dict[str, list[float] | list[int]] | None) – Sparse values of the query vector. Expected to be either a SparseValues object or a dict of the form {'indices': list[int], 'values': list[float]}, where both lists have the same length.
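What a top_k query returns can be illustrated with an exact, brute-force stand-in: score every vector in a namespace against the query vector with cosine similarity and keep the top_k highest. This is a hypothetical stdlib sketch, not how Pinecone executes queries (the service uses approximate nearest-neighbor search and also supports filters, which are not modeled here). The `check_sparse_vector` helper mirrors the documented dict form of sparse_vector.

```python
from __future__ import annotations

import heapq
import math


def _cosine(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / norm


def query(namespace_data: dict[str, list[float]], vector: list[float], top_k: int = 10) -> list[tuple[str, float]]:
    """Hypothetical exact search: (id, score) pairs, highest score first."""
    scored = ((vid, _cosine(vector, values)) for vid, values in namespace_data.items())
    return heapq.nlargest(top_k, scored, key=lambda item: item[1])


def check_sparse_vector(sparse: dict[str, list]) -> None:
    """Validate the {'indices': [...], 'values': [...]} dict form."""
    if set(sparse) != {"indices", "values"}:
        raise ValueError("expected exactly the keys 'indices' and 'values'")
    if len(sparse["indices"]) != len(sparse["values"]):
        raise ValueError("'indices' and 'values' must have the same length")
```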
- upsert_data_async(index_name, data, async_req=False, pool_threads=None)
Upsert (insert or update) data into the Pinecone index.
- Parameters
index_name (str) – Name of the index.
data (list[tuple[Any]]) – List of tuples to be upserted. Each tuple has the form (id, vector, metadata); metadata is optional.
async_req (bool) – If True, upsert operations will be asynchronous.
pool_threads (int | None) – Number of threads for parallel upserting. If async_req is True, this must be provided.
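The data, async_req and pool_threads parameters can be sketched with the standard library: tuples with or without metadata are normalized, split into chunks, and written either sequentially or through a thread pool of pool_threads workers. Everything here is hypothetical, including the in-memory `store`, the `chunk_size` parameter and the helper names; in a real client each chunk would be a network request, whereas here a lock simply protects the shared dict.

```python
from __future__ import annotations

import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any


def normalize(record: tuple) -> tuple[str, list[float], dict[str, Any]]:
    """Accept (id, vector) or (id, vector, metadata); metadata defaults to {}."""
    if len(record) == 2:
        vid, vector = record
        return vid, list(vector), {}
    if len(record) == 3:
        vid, vector, metadata = record
        return vid, list(vector), dict(metadata)
    raise ValueError(f"expected 2 or 3 fields, got {len(record)}")


def upsert_data(store: dict, data: list[tuple], async_req: bool = False,
                pool_threads: int | None = None, chunk_size: int = 100) -> int:
    """Hypothetical parallel upsert into an in-memory dict; returns records written."""
    if async_req and pool_threads is None:
        raise ValueError("pool_threads must be provided when async_req is True")
    records = [normalize(r) for r in data]
    chunks = [records[i : i + chunk_size] for i in range(0, len(records), chunk_size)]
    lock = threading.Lock()

    def write(chunk: list) -> int:
        with lock:  # protects the shared dict; stands in for one request
            for vid, vector, metadata in chunk:
                store[vid] = (vector, metadata)
        return len(chunk)

    if not async_req:
        return sum(write(c) for c in chunks)
    with ThreadPoolExecutor(max_workers=pool_threads) as pool:
        futures = [pool.submit(write, c) for c in chunks]
        return sum(f.result() for f in futures)
```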
- describe_index_stats(index_name, stats_filter=None, **kwargs)
Describe the index statistics.
Returns statistics about the index’s contents. For example: The vector count per namespace and the number of dimensions. API reference: https://docs.pinecone.io/reference/describe_index_stats_post
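The statistics mentioned above (vector count per namespace and the number of dimensions) can be derived from a simple in-memory layout. This hypothetical sketch uses keys that loosely follow Pinecone's stats response (dimension, namespaces with vector_count, total_vector_count); treat the exact response shape as an assumption and check the API reference linked above.

```python
from __future__ import annotations


def describe_stats(store: dict[str, dict[str, list[float]]]) -> dict:
    """Hypothetical: summarize a namespace -> {id: values} store."""
    dimensions = {len(values) for ns in store.values() for values in ns.values()}
    if len(dimensions) > 1:
        raise ValueError("all vectors in an index share one dimension")
    namespaces = {name: {"vector_count": len(vectors)} for name, vectors in store.items()}
    return {
        # 0 for an empty store is an arbitrary choice made for this sketch.
        "dimension": dimensions.pop() if dimensions else 0,
        "namespaces": namespaces,
        "total_vector_count": sum(ns["vector_count"] for ns in namespaces.values()),
    }
```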