airflow.providers.common.ai.example_dags.example_llamaindex_rag

Example DAGs demonstrating RAG pipelines with LlamaIndex operators.

Three patterns:

1. Full RAG pipeline – load -> embed -> retrieve -> answer in one DAG.
2. Separate index/query DAGs – production-shaped split (scheduled indexing job + on-demand query DAG).
3. Multi-source RAG – combine multiple loaders with source metadata.

The LLMOperator synthesis step uses the pydanticai_default connection because LLMOperator is backed by pydantic-ai, while the LlamaIndex operators use llamaindex_default. The two connection types are intentional: each backs a different framework.

Functions

`example_llamaindex_rag_pipeline`()	End-to-end RAG pipeline in a single DAG.
`example_llamaindex_index_pdf`()	Weekly indexing DAG – keep the vector index fresh as PDFs arrive.
`example_llamaindex_query`()	On-demand query DAG – retrieve from a pre-built index and synthesize.
`example_llamaindex_multi_source`()	Combine multiple loaders with source-tagging metadata.

Module Contents

airflow.providers.common.ai.example_dags.example_llamaindex_rag.example_llamaindex_rag_pipeline()

End-to-end RAG pipeline in a single DAG.

1. Parse local text files into document dicts.
2. Chunk and embed the documents, persisting the index to disk.
3. Retrieve relevant chunks for a user question.
4. Synthesize an answer using the retrieved context.
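
The four steps above can be sketched without Airflow or LlamaIndex installed. The following is a minimal stand-in using only the standard library, not the provider's operators: the function names, the bag-of-words "embedding", the JSON index file, and the sample documents are all assumptions made for illustration, and the final step only builds the prompt an LLM would receive rather than calling a model.

```python
import json
import math
import re
import tempfile
from collections import Counter
from pathlib import Path


def load_documents(folder: Path) -> list[dict]:
    """Step 1: parse local text files into document dicts."""
    return [
        {"text": p.read_text(encoding="utf-8"), "metadata": {"file_name": p.name}}
        for p in sorted(folder.glob("*.txt"))
    ]


def embed(text: str) -> dict[str, int]:
    """Toy embedding: bag-of-words counts (a stand-in for a real embedding model)."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def build_index(docs: list[dict], index_path: Path, chunk_size: int = 8) -> None:
    """Step 2: chunk and embed the documents, persisting the index to disk."""
    entries = []
    for doc in docs:
        words = doc["text"].split()
        for start in range(0, len(words), chunk_size):
            chunk = " ".join(words[start:start + chunk_size])
            entries.append(
                {"text": chunk, "vector": embed(chunk), "metadata": doc["metadata"]}
            )
    index_path.write_text(json.dumps(entries), encoding="utf-8")


def cosine(a: dict[str, int], b: dict[str, int]) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(index_path: Path, question: str, top_k: int = 2) -> list[dict]:
    """Step 3: retrieve the chunks most similar to the question."""
    entries = json.loads(index_path.read_text(encoding="utf-8"))
    query_vector = embed(question)
    ranked = sorted(entries, key=lambda e: cosine(query_vector, e["vector"]), reverse=True)
    return ranked[:top_k]


def synthesize(question: str, chunks: list[dict]) -> str:
    """Step 4: assemble the grounded prompt (no model is called in this sketch)."""
    context = "\n".join(f"- {chunk['text']}" for chunk in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


def run_demo() -> tuple[list[dict], str]:
    """Run all four steps against two hypothetical sample files."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "airflow.txt").write_text("Airflow schedules DAGs of tasks.", encoding="utf-8")
        (root / "llama.txt").write_text(
            "LlamaIndex builds vector indexes over documents.", encoding="utf-8"
        )
        index_path = root / "index.json"
        build_index(load_documents(root), index_path)
        question = "What builds vector indexes?"
        hits = retrieve(index_path, question, top_k=1)
        return hits, synthesize(question, hits)
```

In the example DAG each of these functions would correspond to one task, so intermediate results pass between tasks rather than between local function calls.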

airflow.providers.common.ai.example_dags.example_llamaindex_rag.example_llamaindex_index_pdf()

Weekly indexing DAG – keep the vector index fresh as PDFs arrive.

The companion query DAG, example_llamaindex_query (below), reads the persisted index on demand.

airflow.providers.common.ai.example_dags.example_llamaindex_rag.example_llamaindex_query()

On-demand query DAG – retrieve from a pre-built index and synthesize.

Trigger it manually or through the API, passing a question parameter.
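
The split between the scheduled indexing DAG and the on-demand query DAG comes down to a shared persisted index: one side writes it, the other only reads it. The sketch below illustrates that contract with the standard library alone. It is not the provider's implementation: `index_job`, `query_job`, the `index.json` file name, the keyword-overlap ranking, and the use of `.txt` files in place of PDFs are all hypothetical simplifications.

```python
import json
import tempfile
from pathlib import Path

INDEX_FILE = "index.json"  # hypothetical name for the persisted index


def tokens(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    return {word.strip(".,?!").lower() for word in text.split()}


def index_job(source_dir: Path, persist_dir: Path) -> int:
    """Scheduled side: rebuild the index from whatever files are present."""
    entries = [
        {"file_name": p.name, "text": p.read_text(encoding="utf-8")}
        for p in sorted(source_dir.glob("*.txt"))
    ]
    persist_dir.mkdir(parents=True, exist_ok=True)
    (persist_dir / INDEX_FILE).write_text(json.dumps(entries), encoding="utf-8")
    return len(entries)


def query_job(persist_dir: Path, question: str, top_k: int = 1) -> list[str]:
    """On-demand side: read the persisted index and rank entries; never re-index."""
    index_path = persist_dir / INDEX_FILE
    if not index_path.exists():
        raise FileNotFoundError(f"No index at {index_path}; run the indexing job first")
    entries = json.loads(index_path.read_text(encoding="utf-8"))
    question_tokens = tokens(question)
    ranked = sorted(
        entries,
        key=lambda entry: len(question_tokens & tokens(entry["text"])),
        reverse=True,
    )
    return [entry["file_name"] for entry in ranked[:top_k]]


def run_demo() -> tuple[int, int, list[str]]:
    """Index once, let a new file arrive, re-index, then query."""
    with tempfile.TemporaryDirectory() as tmp:
        source, persist = Path(tmp) / "docs", Path(tmp) / "index"
        source.mkdir()
        (source / "q1.txt").write_text("Revenue grew in the first quarter.", encoding="utf-8")
        count_before = index_job(source, persist)
        # A new file arrives; the next scheduled run picks it up.
        (source / "q2.txt").write_text(
            "Headcount was flat in the second quarter.", encoding="utf-8"
        )
        count_after = index_job(source, persist)
        answer = query_job(persist, question="What happened to headcount?")
        return count_before, count_after, answer
```

Keeping the query side read-only means a query run never pays the cost of re-embedding, and a missing index fails fast with a clear message instead of silently returning nothing.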

airflow.providers.common.ai.example_dags.example_llamaindex_rag.example_llamaindex_multi_source()

Combine multiple loaders with source-tagging metadata.

Shows how DocumentLoaderOperator handles different file formats and how metadata_fields tags documents by source for filtered retrieval downstream.
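
To make the source-tagging idea concrete, here is a small sketch in plain Python. It does not use DocumentLoaderOperator, and it assumes that metadata_fields behaves like a mapping of constant fields merged into each document's metadata; the real parameter may be shaped differently. The helper names `tag_documents` and `filter_by_metadata` and the sample documents are hypothetical.

```python
def tag_documents(docs: list[dict], metadata_fields: dict[str, str]) -> list[dict]:
    """Return copies of docs with the given fields merged into each doc's metadata."""
    return [
        {**doc, "metadata": {**doc.get("metadata", {}), **metadata_fields}}
        for doc in docs
    ]


def filter_by_metadata(docs: list[dict], **required: str) -> list[dict]:
    """Keep only docs whose metadata matches every required key/value pair."""
    return [
        doc
        for doc in docs
        if all(doc["metadata"].get(key) == value for key, value in required.items())
    ]


def run_demo() -> tuple[list[dict], list[dict]]:
    """Tag two hypothetical loader outputs, combine them, then filter by source."""
    pdf_docs = [{"text": "Refund policy: 30 days.", "metadata": {"file_name": "policy.pdf"}}]
    md_docs = [{"text": "Refunds are handled by support.", "metadata": {"file_name": "faq.md"}}]
    corpus = tag_documents(pdf_docs, {"source": "policies"}) + tag_documents(
        md_docs, {"source": "faq"}
    )
    return corpus, filter_by_metadata(corpus, source="faq")
```

Because the tag travels with each document into the index, a downstream retrieval step can restrict results to one source without maintaining a separate index per loader.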