airflow.providers.common.ai.example_dags.example_llamaindex_rag
Example DAGs demonstrating RAG pipelines with LlamaIndex operators.
Three patterns:
Full RAG pipeline – load -> embed -> retrieve -> answer in one DAG.
Separate index/query DAGs – production-shaped split (scheduled indexing job + on-demand query DAG).
Multi-source RAG – combine multiple loaders with source metadata.
The LLMOperator synthesis step uses a pydanticai_default connection because LLMOperator is pydantic-ai-backed; the LlamaIndex operators use llamaindex_default.
The two connection types are intentional: they back different frameworks.
Functions
example_llamaindex_rag_pipeline() | End-to-end RAG pipeline in a single DAG.
example_llamaindex_index_pdf() | Weekly indexing DAG – keep the vector index fresh as PDFs arrive.
example_llamaindex_query() | On-demand query DAG – retrieve from a pre-built index and synthesize.
example_llamaindex_multi_source() | Combine multiple loaders with source-tagging metadata.
Module Contents
- airflow.providers.common.ai.example_dags.example_llamaindex_rag.example_llamaindex_rag_pipeline()
End-to-end RAG pipeline in a single DAG.
Parse local text files into document dicts.
Chunk and embed the documents, persisting the index to disk.
Retrieve relevant chunks for a user question.
Synthesize an answer using the retrieved context.
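The four steps above can be sketched in plain Python to show the data flow between them. This is a toy illustration, not the provider's operators: the function names, the JSON index file, and the word-count "embedding" are all assumptions made for the example, and a real pipeline would call an embedding model and an LLM instead.

```python
import json
import math
import re
import tempfile
from collections import Counter
from pathlib import Path


def load_documents(folder: Path) -> list[dict]:
    """Step 1: parse local text files into document dicts."""
    return [
        {"text": path.read_text(encoding="utf-8"), "metadata": {"file_name": path.name}}
        for path in sorted(folder.glob("*.txt"))
    ]


def embed(text: str) -> dict[str, float]:
    """Toy stand-in for an embedding model: L2-normalised word counts."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def build_index(documents: list[dict], persist_path: Path, chunk_size: int = 8) -> None:
    """Step 2: chunk and embed the documents, persisting the index to disk."""
    chunks = []
    for doc in documents:
        words = doc["text"].split()
        for start in range(0, len(words), chunk_size):
            text = " ".join(words[start:start + chunk_size])
            chunks.append({"text": text, "metadata": doc["metadata"], "vector": embed(text)})
    persist_path.write_text(json.dumps(chunks), encoding="utf-8")


def retrieve(question: str, persist_path: Path, top_k: int = 2) -> list[dict]:
    """Step 3: load the persisted index and return the most similar chunks."""
    chunks = json.loads(persist_path.read_text(encoding="utf-8"))
    query = embed(question)

    def score(chunk: dict) -> float:
        # Dot product of two normalised sparse vectors (cosine similarity).
        return sum(weight * chunk["vector"].get(word, 0.0) for word, weight in query.items())

    return sorted(chunks, key=score, reverse=True)[:top_k]


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Step 4 (input side): assemble the prompt an LLM would answer from."""
    context = "\n".join(f"- {chunk['text']}" for chunk in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


def run_demo() -> tuple[list[dict], str]:
    """Run all four steps on two throwaway files in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "airflow.txt").write_text(
            "Airflow schedules and monitors workflows defined as DAGs.", encoding="utf-8"
        )
        (folder / "cooking.txt").write_text(
            "Simmer the tomato sauce slowly before adding basil.", encoding="utf-8"
        )
        index_path = folder / "index.json"
        build_index(load_documents(folder), index_path)
        question = "What schedules workflows?"
        hits = retrieve(question, index_path, top_k=1)
        return hits, build_prompt(question, hits)
```

Persisting the index between steps 2 and 3 is what lets indexing and querying later be split into separate DAGs: the retrieval step only needs the path to the stored index, not the original documents.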
- airflow.providers.common.ai.example_dags.example_llamaindex_rag.example_llamaindex_index_pdf()
Weekly indexing DAG – keep the vector index fresh as PDFs arrive.
The companion query DAG (below) reads the persisted index on demand.
- airflow.providers.common.ai.example_dags.example_llamaindex_rag.example_llamaindex_query()
On-demand query DAG – retrieve from a pre-built index and synthesize.
Trigger manually or via API with a question parameter.
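As a rough sketch of the API route, the snippet below only builds the URL and JSON body for a trigger request; it does not send anything. Several details are assumptions rather than facts from this page: the host, the DAG id (taken here to match the function name), the `/api/v2` prefix (Airflow 2 deployments use `/api/v1`), and that the question is passed through the run's `conf`. Authentication is omitted, and `build_trigger_request` is a hypothetical helper.

```python
import json
from urllib.parse import quote


def build_trigger_request(
    base_url: str,
    dag_id: str,
    question: str,
    api_prefix: str = "/api/v2",  # assumption: Airflow 3 REST API prefix
) -> tuple[str, str]:
    """Return the (url, json_body) for a POST that would trigger one DAG run."""
    url = f"{base_url.rstrip('/')}{api_prefix}/dags/{quote(dag_id, safe='')}/dagRuns"
    body = {
        # Null lets the server pick the run's logical date (assumed behaviour).
        "logical_date": None,
        # Run-level conf overrides the DAG's declared params for this run.
        "conf": {"question": question},
    }
    return url, json.dumps(body)


url, body = build_trigger_request(
    "http://localhost:8080",        # hypothetical local deployment
    "example_llamaindex_query",     # assumed DAG id
    "What does the onboarding PDF say about VPN access?",
)
```

The resulting `url` and `body` can be handed to any HTTP client along with whatever credentials the deployment requires.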
- airflow.providers.common.ai.example_dags.example_llamaindex_rag.example_llamaindex_multi_source()
Combine multiple loaders with source-tagging metadata.
Shows how DocumentLoaderOperator handles different file formats and how metadata_fields tags documents by source for filtered retrieval downstream.
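The idea behind source tagging can be illustrated without any operators. The sketch below assumes documents are plain dicts with a `metadata` mapping and that `metadata_fields` behaves like a dict of extra key/value pairs merged into each document; the operator's real parameter type may differ, and `tag_documents` and `filter_by_metadata` are hypothetical helpers.

```python
def tag_documents(documents: list[dict], metadata_fields: dict[str, str]) -> list[dict]:
    """Return copies of the documents with the extra metadata merged in."""
    return [
        {**doc, "metadata": {**doc.get("metadata", {}), **metadata_fields}}
        for doc in documents
    ]


def filter_by_metadata(documents: list[dict], **required: str) -> list[dict]:
    """Keep only documents whose metadata matches every required key/value."""
    return [
        doc
        for doc in documents
        if all(doc["metadata"].get(key) == value for key, value in required.items())
    ]


# Two hypothetical loaders, each tagging its output with where it came from.
wiki_docs = tag_documents(
    [{"text": "VPN setup guide", "metadata": {"file_name": "vpn.md"}}],
    {"source": "wiki"},
)
policy_docs = tag_documents(
    [{"text": "Travel expense policy"}],
    {"source": "policies"},
)

# Downstream, the combined corpus can be narrowed to a single source.
corpus = wiki_docs + policy_docs
wiki_only = filter_by_metadata(corpus, source="wiki")
```

Tagging at load time keeps the provenance attached to every chunk, so a later retrieval step can restrict results to one source without re-reading the files.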