airflow.providers.pinecone.operators.pinecone

Classes

PineconeIngestOperator

Ingest vector embeddings into Pinecone.

CreatePodIndexOperator

Create a pod-based index in Pinecone.

CreateServerlessIndexOperator

Create a serverless index in Pinecone.

Module Contents

class airflow.providers.pinecone.operators.pinecone.PineconeIngestOperator(*, conn_id=PineconeHook.default_conn_name, index_name, input_vectors, namespace='', batch_size=None, upsert_kwargs=None, **kwargs)[source]

Bases: airflow.models.BaseOperator

Ingest vector embeddings into Pinecone.

See also

For more information on how to use this operator, take a look at the guide: Ingest data into a Pinecone index

Parameters:
  • conn_id (str) – The connection id to use when connecting to Pinecone.

  • index_name (str) – Name of the Pinecone index.

  • input_vectors (list[pinecone.Vector] | list[tuple] | list[dict]) – Data to be ingested, in the form of a list of vectors, list of tuples, or list of dictionaries.

  • namespace (str) – The namespace to write to. If not specified, the default namespace is used.

  • batch_size (int | None) – The number of vectors to upsert in each batch.

  • upsert_kwargs (dict | None) – Optional keyword arguments passed through to the underlying Pinecone upsert call.

template_fields: collections.abc.Sequence[str] = ('index_name', 'input_vectors', 'namespace')[source]
upsert_kwargs[source]
conn_id = 'pinecone_default'[source]
index_name[source]
namespace = ''[source]
batch_size = None[source]
input_vectors[source]
property hook: airflow.providers.pinecone.hooks.pinecone.PineconeHook[source]

Return an instance of the PineconeHook.

execute(context)[source]

Ingest data into Pinecone using the PineconeHook.
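
A minimal usage sketch (not part of this reference): wiring PineconeIngestOperator into a DAG to upsert a few toy vectors. The DAG id, index name "example-index", namespace, and the vector values are illustrative assumptions; the index is assumed to already exist.

# Minimal ingest example; names and values below are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.pinecone.operators.pinecone import PineconeIngestOperator

with DAG(
    dag_id="pinecone_ingest_example",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # input_vectors accepts a list of pinecone.Vector objects, tuples, or dicts;
    # here plain (id, values) tuples are used.
    ingest = PineconeIngestOperator(
        task_id="ingest_vectors",
        conn_id="pinecone_default",
        index_name="example-index",        # assumed to already exist
        namespace="example-namespace",
        input_vectors=[
            ("vec-1", [0.1, 0.2, 0.3]),
            ("vec-2", [0.4, 0.5, 0.6]),
        ],
        batch_size=100,
    )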

class airflow.providers.pinecone.operators.pinecone.CreatePodIndexOperator(*, conn_id=PineconeHook.default_conn_name, index_name, dimension, environment=None, replicas=None, shards=None, pods=None, pod_type='p1.x1', metadata_config=None, source_collection=None, metric='cosine', timeout=None, **kwargs)[source]

Bases: airflow.models.BaseOperator

Create a pod-based index in Pinecone.

See also

For more information on how to use this operator, take a look at the guide: Create a Pod based Index

Parameters:
  • conn_id (str) – The connection id to use when connecting to Pinecone.

  • index_name (str) – Name of the Pinecone index.

  • dimension (int) – The dimension of the vectors to be indexed.

  • environment (str | None) – The environment to use when creating the index.

  • replicas (int | None) – The number of replicas to use.

  • shards (int | None) – The number of shards to use.

  • pods (int | None) – The number of pods to use.

  • pod_type (str) – The type of pod to use. Defaults to p1.x1

  • metadata_config (dict | None) – The metadata configuration to use.

  • source_collection (str | None) – The source collection to use.

  • metric (str) – The metric to use. Defaults to cosine.

  • timeout (int | None) – The timeout to use.

conn_id = 'pinecone_default'[source]
index_name[source]
dimension[source]
environment = None[source]
replicas = None[source]
shards = None[source]
pods = None[source]
pod_type = 'p1.x1'[source]
metadata_config = None[source]
source_collection = None[source]
metric = 'cosine'[source]
timeout = None[source]
property hook: airflow.providers.pinecone.hooks.pinecone.PineconeHook[source]

Return an instance of the PineconeHook.

execute(context)[source]

Create the pod-based index in Pinecone using the PineconeHook.
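
A minimal usage sketch (not part of this reference): creating a hypothetical pod-based index from a DAG. The DAG id, index name, dimension of 128, and the environment value "us-east1-gcp" are illustrative assumptions; the environment depends on your Pinecone project.

# Minimal pod-based index creation example; names and values are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.pinecone.operators.pinecone import CreatePodIndexOperator

with DAG(
    dag_id="pinecone_create_pod_index_example",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    create_index = CreatePodIndexOperator(
        task_id="create_pod_index",
        conn_id="pinecone_default",
        index_name="example-pod-index",    # hypothetical index name
        dimension=128,                     # must match your embedding size
        environment="us-east1-gcp",        # assumed environment for your project
        replicas=1,
        pod_type="p1.x1",
        metric="cosine",
    )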

class airflow.providers.pinecone.operators.pinecone.CreateServerlessIndexOperator(*, conn_id=PineconeHook.default_conn_name, index_name, dimension, cloud, region=None, metric=None, timeout=None, **kwargs)[source]

Bases: airflow.models.BaseOperator

Create a serverless index in Pinecone.

See also

For more information on how to use this operator, take a look at the guide: Create a Serverless Index

Parameters:
  • conn_id (str) – The connection id to use when connecting to Pinecone.

  • index_name (str) – Name of the Pinecone index.

  • dimension (int) – The dimension of the vectors to be indexed.

  • cloud (str) – The cloud to use when creating the index.

  • region (str | None) – The region to use when creating the index.

  • metric (str | None) – The metric to use.

  • timeout (int | None) – The timeout to use.

conn_id = 'pinecone_default'[source]
index_name[source]
dimension[source]
cloud[source]
region = None[source]
metric = None[source]
timeout = None[source]
property hook: airflow.providers.pinecone.hooks.pinecone.PineconeHook[source]

Return an instance of the PineconeHook.

execute(context)[source]

Create the serverless index in Pinecone using the PineconeHook.
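
A minimal usage sketch (not part of this reference): creating a hypothetical serverless index on AWS from a DAG. The DAG id, index name, dimension of 128, and the cloud/region pairing "aws"/"us-east-1" are illustrative assumptions.

# Minimal serverless index creation example; names and values are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.pinecone.operators.pinecone import CreateServerlessIndexOperator

with DAG(
    dag_id="pinecone_create_serverless_index_example",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    create_index = CreateServerlessIndexOperator(
        task_id="create_serverless_index",
        conn_id="pinecone_default",
        index_name="example-serverless-index",  # hypothetical index name
        dimension=128,                           # must match your embedding size
        cloud="aws",                             # assumed cloud/region pairing
        region="us-east-1",
        metric="cosine",
    )

In practice, a task like this would typically run upstream of a PineconeIngestOperator task (for example, create_index >> ingest) so the index exists before vectors are upserted.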
