`airflow.providers.pinecone.hooks.pinecone`

index_name (str) – The name of the index to upsert into.
vectors (list[pinecone.Vector] | list[tuple] | list[dict]) – A list of vectors to upsert.
namespace (str) – The namespace to write to. If not specified, the default namespace ("") is used.
batch_size (int | None) – The number of vectors to upsert in each batch.
show_progress (bool) – Whether to show a progress bar using tqdm. Applied only if batch_size is provided.
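
The parameters above can be exercised with a small helper. This is a minimal sketch, assuming they belong to the hook's `upsert` method (its signature is not shown in this section) and that `hook` is an already-constructed hook instance. `upsert_embeddings` is a hypothetical name used only for illustration.

```python
from typing import Any, Dict, List, Optional


def upsert_embeddings(
    hook: Any,
    index_name: str,
    embeddings: Dict[str, List[float]],
    namespace: str = "",
    batch_size: Optional[int] = 100,
) -> Any:
    """Upsert an id -> vector mapping through a hook-like object."""
    # list[tuple] is one of the documented input shapes; here each tuple
    # is (id, values).
    vectors = [(vector_id, values) for vector_id, values in embeddings.items()]
    # Assumption: the hook exposes `upsert` with the keyword parameters
    # documented above; the method name does not appear in this section.
    return hook.upsert(
        index_name=index_name,
        vectors=vectors,
        namespace=namespace,
        batch_size=batch_size,
        # show_progress only applies when batch_size is provided.
        show_progress=False,
    )
```

Leaving `namespace` at its default writes to the default namespace (""); passing `batch_size=None` sends the vectors without batching.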

get_pod_spec_obj(*, replicas=None, shards=None, pods=None, pod_type='p1.x1', metadata_config=None, source_collection=None, environment=None)

Get a PodSpec object.

Parameters

replicas (int | None) – The number of replicas.
shards (int | None) – The number of shards.
pods (int | None) – The number of pods.
pod_type (str | None) – The type of pod.
metadata_config (dict | None) – The metadata configuration.
source_collection (str | None) – The source collection.
environment (str | None) – The environment to use when creating the index.

get_serverless_spec_obj(*, cloud, region=None)

Get a ServerlessSpec object.

Parameters

cloud (str) – The cloud provider.
region (str | None) – The region to use when creating the index.

create_index(index_name, dimension, spec, metric='cosine', timeout=None)

Create a new index.

Parameters

index_name (str) – The name of the index.
dimension (int) – The dimension of the vectors to be indexed.
spec (pinecone.ServerlessSpec | pinecone.PodSpec) – Pass a ServerlessSpec object to create a serverless index or a PodSpec object to create a pod index. get_serverless_spec_obj and get_pod_spec_obj can be used to create the Spec objects.
metric (str | None) – The metric to use. Defaults to cosine.
timeout (int | None) – The timeout to use.
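
As a minimal sketch of how the spec helpers and `create_index` fit together, the function below builds a spec and passes it on. It assumes `hook` is an already-constructed hook instance. The function name `create_index_with_spec` and the cloud, region, and environment values are placeholders chosen for illustration, not defaults of the hook.

```python
from typing import Any, Optional


def create_index_with_spec(
    hook: Any,
    index_name: str,
    dimension: int,
    *,
    serverless: bool = True,
    timeout: Optional[int] = None,
) -> Any:
    """Build a spec with the hook's helpers, then create the index."""
    if serverless:
        # "aws" and "us-east-1" are placeholder values.
        spec = hook.get_serverless_spec_obj(cloud="aws", region="us-east-1")
    else:
        # Placeholder pod settings; pod_type repeats the documented default.
        spec = hook.get_pod_spec_obj(
            replicas=1,
            shards=1,
            pods=1,
            pod_type="p1.x1",
            environment="us-east1-gcp",
        )
    hook.create_index(
        index_name=index_name,
        dimension=dimension,
        spec=spec,
        metric="cosine",
        timeout=timeout,
    )
    return spec
```

Both spec helpers take keyword-only arguments, so every value has to be passed by name.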

describe_index(index_name)

Retrieve information about a specific index.

Parameters: index_name (str) – The name of the index to describe.

delete_index(index_name, timeout=None)

Delete a specific index.

Parameters

index_name (str) – The name of the index to delete.
timeout (int | None) – Timeout for waiting until the index is deleted.

configure_index(index_name, replicas=None, pod_type='')

Change the current configuration of the index.

Parameters

index_name (str) – The name of the index to configure.
replicas (int | None) – The new number of replicas.
pod_type (str | None) – The new pod_type for the index.
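
The call can be wrapped so that only the settings being changed are forwarded. This is a minimal sketch, assuming `hook` is an already-constructed hook instance. `reconfigure_index` is a hypothetical helper, and the "nothing to change" guard is a local convention rather than behaviour of the hook.

```python
from typing import Any, Dict, Optional


def reconfigure_index(
    hook: Any,
    index_name: str,
    replicas: Optional[int] = None,
    pod_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Forward only the settings that were actually supplied."""
    kwargs: Dict[str, Any] = {"index_name": index_name}
    if replicas is not None:
        kwargs["replicas"] = replicas
    if pod_type:
        kwargs["pod_type"] = pod_type
    if len(kwargs) == 1:
        # Local guard: refuse a call that would change nothing.
        raise ValueError("Provide replicas and/or pod_type")
    hook.configure_index(**kwargs)
    return kwargs
```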

create_collection(collection_name, index_name)

Create a new collection from a specified index.

Parameters

collection_name (str) – The name of the collection to create.
index_name (str) – The name of the source index.

delete_collection(collection_name)

Delete a specific collection.

Parameters: collection_name (str) – The name of the collection to delete.

describe_collection(collection_name)

Retrieve information about a specific collection.

Parameters: collection_name (str) – The name of the collection to describe.

list_collections()

Retrieve a list of all collections in the current project.
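
The four collection methods can be combined into a simple snapshot workflow. The sketch below assumes `hook` is an already-constructed hook instance. `snapshot_index` and `drop_snapshot` are hypothetical helper names, and the return values are passed through unchanged because their types are not documented here.

```python
from typing import Any


def snapshot_index(hook: Any, index_name: str, collection_name: str) -> Any:
    """Create a collection from an index and return its description."""
    hook.create_collection(collection_name=collection_name, index_name=index_name)
    return hook.describe_collection(collection_name)


def drop_snapshot(hook: Any, collection_name: str) -> Any:
    """Delete a collection and return the remaining collections."""
    hook.delete_collection(collection_name)
    return hook.list_collections()
```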

query_vector(index_name, vector, query_id=None, top_k=10, namespace=None, query_filter=None, include_values=None, include_metadata=None, sparse_vector=None)

Search a namespace using a query vector.

It retrieves the ids of the most similar items in a namespace, along with their similarity scores. API reference: https://docs.pinecone.io/reference/query

Parameters

index_name (str) – The name of the index to query.
vector (list[Any]) – The query vector.
query_id (str | None) – The unique ID of the vector to be used as a query vector.
top_k (int) – The number of results to return.
namespace (str | None) – The namespace to fetch vectors from. If not specified, the default namespace is used.
query_filter (dict[str, str | float | int | bool | list[Any] | dict[Any, Any]] | None) – The filter to apply. See https://www.pinecone.io/docs/metadata-filtering/
include_values (bool | None) – Whether to include the vector values in the result.
include_metadata (bool | None) – Indicates whether metadata is included in the response as well as the ids.
sparse_vector (pinecone.core.client.model.sparse_values.SparseValues | dict[str, list[float] | list[int]] | None) – Sparse values of the query vector. Expected to be either a SparseValues object or a dict of the form {'indices': List[int], 'values': List[float]}, where both lists have the same length.
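
To illustrate the query parameters, the sketch below builds a sparse vector in the dict form described above and forwards a query. It assumes `hook` is an already-constructed hook instance. `build_sparse_vector` and `find_similar` are hypothetical helpers, and the length check is done locally before anything is sent.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_sparse_vector(indices: Sequence[int], values: Sequence[float]) -> Dict[str, list]:
    """Return {'indices': [...], 'values': [...]} after checking lengths."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    return {"indices": list(indices), "values": list(values)}


def find_similar(
    hook: Any,
    index_name: str,
    vector: List[float],
    *,
    top_k: int = 5,
    namespace: Optional[str] = None,
    query_filter: Optional[Dict[str, Any]] = None,
    sparse_vector: Optional[Dict[str, list]] = None,
) -> Any:
    """Query an index, asking for metadata but not the vector values."""
    return hook.query_vector(
        index_name=index_name,
        vector=vector,
        top_k=top_k,
        namespace=namespace,
        query_filter=query_filter,
        include_values=False,
        include_metadata=True,
        sparse_vector=sparse_vector,
    )
```

A filter such as `{"genre": {"$eq": "news"}}` could be passed as `query_filter`; the `genre` field is a made-up metadata key.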

upsert_data_async(index_name, data, async_req=False, pool_threads=None)

Upsert (insert or update) data into a Pinecone index.

Parameters

index_name (str) – Name of the index.
data (list[tuple[Any]]) – List of tuples to be upserted. Each tuple is of the form (id, vector, metadata). Metadata is optional.
async_req (bool) – If True, upsert operations will be asynchronous.
pool_threads (int | None) – Number of threads for parallel upserting. If async_req is True, this must be provided.
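
The tuple format and the `async_req`/`pool_threads` pairing can be sketched as follows, assuming `hook` is an already-constructed hook instance. `to_upsert_tuples` and `upsert_records` are hypothetical helpers, and the result of the hook call is returned as-is because its type is not documented here.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, List[float], Optional[Dict[str, Any]]]


def to_upsert_tuples(records: Iterable[Record]) -> List[tuple]:
    """Build (id, vector) or (id, vector, metadata) tuples."""
    data: List[tuple] = []
    for record_id, vector, metadata in records:
        if metadata is None:
            # Metadata is optional, so it is simply left out.
            data.append((record_id, vector))
        else:
            data.append((record_id, vector, metadata))
    return data


def upsert_records(
    hook: Any,
    index_name: str,
    records: Iterable[Record],
    pool_threads: Optional[int] = None,
) -> Any:
    """Upsert synchronously, or asynchronously when pool_threads is given."""
    data = to_upsert_tuples(records)
    if pool_threads is None:
        return hook.upsert_data_async(index_name=index_name, data=data)
    # async_req=True requires pool_threads, so the two are set together.
    return hook.upsert_data_async(
        index_name=index_name,
        data=data,
        async_req=True,
        pool_threads=pool_threads,
    )
```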

describe_index_stats(index_name, stats_filter=None, **kwargs)

Describe the index statistics.

Returns statistics about the index's contents, for example the vector count per namespace and the number of dimensions. API reference: https://docs.pinecone.io/reference/describe_index_stats_post

Parameters

index_name (str) – Name of the index.
stats_filter (dict[str, str | float | int | bool | list[Any] | dict[Any, Any]] | None) – If this parameter is present, the operation only returns statistics for vectors that satisfy the filter. See https://www.pinecone.io/docs/metadata-filtering/
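
A short sketch of a filtered statistics call, assuming `hook` is an already-constructed hook instance. `stats_for_field` is a hypothetical helper, `$eq` is Pinecone's equality operator for metadata filters, and the field name is chosen by the caller.

```python
from typing import Any


def stats_for_field(hook: Any, index_name: str, field: str, value: Any) -> Any:
    """Request statistics only for vectors whose metadata field equals value."""
    stats_filter = {field: {"$eq": value}}
    return hook.describe_index_stats(index_name=index_name, stats_filter=stats_filter)
```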
