airflow.providers.weaviate.operators.weaviate
¶
Module Contents¶
Classes¶
Operator that store vector in the Weaviate class. |
|
Create or replace objects belonging to documents. |
- class airflow.providers.weaviate.operators.weaviate.WeaviateIngestOperator(conn_id, collection_name, input_data=None, vector_col='Vector', uuid_column='id', tenant=None, hook_params=None, input_json=None, **kwargs)[source]¶
Bases:
airflow.models.BaseOperator
Operator that store vector in the Weaviate class.
See also
For more information on how to use this operator, take a look at the guide: WeaviateIngestOperator
Operator that accepts input json or pandas dataframe to generate embeddings on or accepting provided custom vectors and store them in the Weaviate class.
- Parameters
conn_id (str) – The Weaviate connection.
collection – The Weaviate collection to be used for storing the data objects into.
input_data (list[dict[str, Any]] | pandas.DataFrame | None) – The list of dicts or pandas dataframe representing Weaviate data objects to generate embeddings on (or provides custom vectors) and store them in the Weaviate class.
vector_col (str) – key/column name in which the vectors are stored.
hook_params (dict | None) – Optional config params to be passed to the underlying hook. Should match the desired hook constructor params.
input_json (list[dict[str, Any]] | pandas.DataFrame | None) – (Deprecated) The JSON representing Weaviate data objects to generate embeddings on (or provides custom vectors) and store them in the Weaviate class.
- class airflow.providers.weaviate.operators.weaviate.WeaviateDocumentIngestOperator(conn_id, input_data, collection_name, document_column, existing='skip', uuid_column='id', vector_col='Vector', tenant=None, verbose=False, hook_params=None, **kwargs)[source]¶
Bases:
airflow.models.BaseOperator
Create or replace objects belonging to documents.
In real-world scenarios, information sources like Airflow docs, Stack Overflow, or other issues are considered ‘documents’ here. It’s crucial to keep the database objects in sync with these sources. If any changes occur in these documents, this function aims to reflect those changes in the database.
Note
This function assumes responsibility for identifying changes in documents, dropping relevant database objects, and recreating them based on updated information. It’s crucial to handle this process with care, ensuring backups and validation are in place to prevent data loss or inconsistencies.
Provides users with multiple ways of dealing with existing values. replace: replace the existing objects with new objects. This option requires to identify the objects belonging to a document. which by default is done by using document_column field. skip: skip the existing objects and only add the missing objects of a document. error: raise an error if an object belonging to a existing document is tried to be created.
- Parameters
data – A single pandas DataFrame or a list of dicts to be ingested.
collection_name (str) – Name of the collection in Weaviate schema where data is to be ingested.
existing (str) – Strategy for handling existing data: ‘skip’, or ‘replace’. Default is ‘skip’.
document_column (str) – Column in DataFrame that identifying source document.
uuid_column (str) – Column with pre-generated UUIDs. If not provided, UUIDs will be generated.
vector_column – Column with embedding vectors for pre-embedded data.
tenant (str | None) – The tenant to which the object will be added.
verbose (bool) – Flag to enable verbose output during the ingestion process.
hook_params (dict | None) – Optional config params to be passed to the underlying hook. Should match the desired hook constructor params.