airflow.providers.weaviate.operators.weaviate

Module Contents

Classes

WeaviateIngestOperator

Operator that store vector in the Weaviate class.

WeaviateDocumentIngestOperator

Create or replace objects belonging to documents.

class airflow.providers.weaviate.operators.weaviate.WeaviateIngestOperator(conn_id, collection_name, input_data=None, vector_col='Vector', uuid_column='id', tenant=None, hook_params=None, input_json=None, **kwargs)[source]

Bases: airflow.models.BaseOperator

Operator that store vector in the Weaviate class.

See also

For more information on how to use this operator, take a look at the guide: WeaviateIngestOperator

Operator that accepts input json or pandas dataframe to generate embeddings on or accepting provided custom vectors and store them in the Weaviate class.

Parameters
  • conn_id (str) – The Weaviate connection.

  • collection – The Weaviate collection to be used for storing the data objects into.

  • input_data (list[dict[str, Any]] | pandas.DataFrame | None) – The list of dicts or pandas dataframe representing Weaviate data objects to generate embeddings on (or provides custom vectors) and store them in the Weaviate class.

  • vector_col (str) – key/column name in which the vectors are stored.

  • hook_params (dict | None) – Optional config params to be passed to the underlying hook. Should match the desired hook constructor params.

  • input_json (list[dict[str, Any]] | pandas.DataFrame | None) – (Deprecated) The JSON representing Weaviate data objects to generate embeddings on (or provides custom vectors) and store them in the Weaviate class.

template_fields: Sequence[str] = ('input_json', 'input_data')[source]
hook()[source]

Return an instance of the WeaviateHook.

execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.weaviate.operators.weaviate.WeaviateDocumentIngestOperator(conn_id, input_data, collection_name, document_column, existing='skip', uuid_column='id', vector_col='Vector', tenant=None, verbose=False, hook_params=None, **kwargs)[source]

Bases: airflow.models.BaseOperator

Create or replace objects belonging to documents.

In real-world scenarios, information sources like Airflow docs, Stack Overflow, or other issues are considered ‘documents’ here. It’s crucial to keep the database objects in sync with these sources. If any changes occur in these documents, this function aims to reflect those changes in the database.

Note

This function assumes responsibility for identifying changes in documents, dropping relevant database objects, and recreating them based on updated information. It’s crucial to handle this process with care, ensuring backups and validation are in place to prevent data loss or inconsistencies.

Provides users with multiple ways of dealing with existing values. replace: replace the existing objects with new objects. This option requires to identify the objects belonging to a document. which by default is done by using document_column field. skip: skip the existing objects and only add the missing objects of a document. error: raise an error if an object belonging to a existing document is tried to be created.

Parameters
  • data – A single pandas DataFrame or a list of dicts to be ingested.

  • collection_name (str) – Name of the collection in Weaviate schema where data is to be ingested.

  • existing (str) – Strategy for handling existing data: ‘skip’, or ‘replace’. Default is ‘skip’.

  • document_column (str) – Column in DataFrame that identifying source document.

  • uuid_column (str) – Column with pre-generated UUIDs. If not provided, UUIDs will be generated.

  • vector_column – Column with embedding vectors for pre-embedded data.

  • tenant (str | None) – The tenant to which the object will be added.

  • verbose (bool) – Flag to enable verbose output during the ingestion process.

  • hook_params (dict | None) – Optional config params to be passed to the underlying hook. Should match the desired hook constructor params.

template_fields: Sequence[str] = ('input_data',)[source]
hook()[source]

Return an instance of the WeaviateHook.

execute(context)[source]

Create or replace objects belonging to documents.

Returns

List of UUID which failed to create

Return type

Sequence[dict[str, weaviate.types.UUID | str] | None]

Was this entry helpful?