airflow.providers.common.ai.example_dags.example_llamaindex_rag

Example DAGs demonstrating RAG pipelines with LlamaIndex operators.

Three patterns:

  1. Full RAG pipeline – load -> embed -> retrieve -> answer in one DAG.

  2. Separate index/query DAGs – production-style split (a scheduled indexing job plus an on-demand query DAG).

  3. Multi-source RAG – combine multiple loaders with source metadata.

The LLMOperator synthesis step uses the pydanticai_default connection because LLMOperator is backed by pydantic-ai, while the LlamaIndex operators use llamaindex_default. Keeping the two connections separate is intentional: they back different frameworks.

Functions

example_llamaindex_rag_pipeline()

End-to-end RAG pipeline in a single DAG.

example_llamaindex_index_pdf()

Weekly indexing DAG – keep the vector index fresh as PDFs arrive.

example_llamaindex_query()

On-demand query DAG – retrieve from a pre-built index and synthesize.

example_llamaindex_multi_source()

Combine multiple loaders with source-tagging metadata.

Module Contents

airflow.providers.common.ai.example_dags.example_llamaindex_rag.example_llamaindex_rag_pipeline()

End-to-end RAG pipeline in a single DAG.

  1. Parse local text files into document dicts.

  2. Chunk and embed the documents, persisting the index to disk.

  3. Retrieve relevant chunks for a user question.

  4. Synthesize an answer using the retrieved context.
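The four steps above can be illustrated with a standard-library-only sketch. It does not use the LlamaIndex or Airflow operator APIs. The function names, the word-window chunker, and the bag-of-words "embedding" are hypothetical stand-ins chosen so the flow (parse, chunk, embed, retrieve, build the prompt) runs without external services. The index is kept in memory here rather than persisted.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for the four pipeline steps; NOT the LlamaIndex
# or Airflow operator APIs. The "embedding" is a toy term-frequency vector.


def parse_documents(files: dict[str, str]) -> list[dict]:
    """Step 1: turn {file_name: raw_text} into document dicts."""
    return [{"text": text, "metadata": {"file_name": name}} for name, text in files.items()]


def chunk(doc: dict, size: int = 40, overlap: int = 10) -> list[dict]:
    """Step 2a: split a document into overlapping windows of `size` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = doc["text"].split()
    step = size - overlap
    return [
        {"text": " ".join(words[start:start + size]), "metadata": dict(doc["metadata"])}
        for start in range(0, max(len(words) - overlap, 1), step)
    ]


def embed(text: str) -> Counter:
    """Step 2b: toy term-frequency vector standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs: list[dict]) -> list[tuple[dict, Counter]]:
    """Step 2c: pair every chunk with its vector (in memory, not persisted)."""
    return [(c, embed(c["text"])) for d in docs for c in chunk(d)]


def retrieve(index: list[tuple[dict, Counter]], question: str, top_k: int = 2) -> list[dict]:
    """Step 3: rank chunks by cosine similarity to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [c for c, _ in ranked[:top_k]]


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Step 4 (input only): assemble the context-grounded prompt an LLM would answer."""
    context = "\n\n".join(f"[{c['metadata']['file_name']}] {c['text']}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In the example DAG, the chunking, embedding, and persistence of step 2 are handled by the LlamaIndex operators, and the final LLM call by LLMOperator, rather than by hand-written helpers like these.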

airflow.providers.common.ai.example_dags.example_llamaindex_rag.example_llamaindex_index_pdf()

Weekly indexing DAG – keep the vector index fresh as PDFs arrive.

The companion query DAG, example_llamaindex_query, reads the persisted index on demand.
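One way to keep a persisted index fresh on a weekly schedule is to re-process only PDFs whose content changed since the last run, and to let the query side simply load the file. The sketch below assumes a hypothetical JSON layout (`{file_name: {"sha256": ..., "text": ...}}`) and omits chunking and embedding; it is not how the LlamaIndex operators store their index. The `_default_extract` placeholder only decodes bytes, so real PDFs would need a proper parser passed as `extract_text`.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical on-disk layout: {file_name: {"sha256": ..., "text": ...}}.
# A real indexing job would store chunk vectors, not raw text.


def _default_extract(path: Path) -> str:
    # Placeholder: real PDFs need a PDF parser; this just decodes the bytes.
    return path.read_bytes().decode("utf-8", errors="replace")


def refresh_index(
    source_dir: Path,
    index_path: Path,
    extract_text: Callable[[Path], str] = _default_extract,
) -> dict[str, int]:
    """Weekly job: re-process only new or changed PDFs and drop deleted ones."""
    old = json.loads(index_path.read_text()) if index_path.exists() else {}
    new: dict = {}
    stats = {"added": 0, "updated": 0, "unchanged": 0, "removed": 0}
    for path in sorted(source_dir.glob("*.pdf")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        entry = old.get(path.name)
        if entry and entry["sha256"] == digest:
            new[path.name] = entry  # content unchanged: reuse the stored entry
            stats["unchanged"] += 1
            continue
        stats["updated" if entry else "added"] += 1
        new[path.name] = {"sha256": digest, "text": extract_text(path)}
    stats["removed"] = len(old.keys() - new.keys())

    # Write to a temp file in the same directory, then rename over the old
    # index; os.replace is atomic on POSIX, so readers see old or new, never half.
    fd, tmp_name = tempfile.mkstemp(dir=index_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(new, fh)
    os.replace(tmp_name, index_path)
    return stats


def load_index(index_path: Path) -> dict:
    """Query side: read the persisted index without rebuilding anything."""
    return json.loads(index_path.read_text())
```

Hashing content rather than comparing modification times means a re-uploaded but identical PDF is not re-embedded.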

airflow.providers.common.ai.example_dags.example_llamaindex_rag.example_llamaindex_query()

On-demand query DAG – retrieve from a pre-built index and synthesize.

Trigger it manually or through the Airflow API, passing a question parameter.
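A minimal sketch of handling the question parameter on both sides of a trigger: validating it from a trigger payload such as `dag_run.conf`, and building the `airflow dags trigger ... --conf` command that supplies it. The helper names, the `"question"` key handling, the length limit, and the assumption that the DAG id equals the function name `example_llamaindex_query` are all illustrative, not taken from the example DAG's code.

```python
from __future__ import annotations

import json

# Hypothetical helpers; the "question" key, fallback and length limit are
# assumptions for illustration, not the example DAG's actual parameter code.


def resolve_question(conf: dict | None, default: str | None = None, max_chars: int = 2000) -> str:
    """Read the question from a trigger payload (e.g. dag_run.conf), with a fallback."""
    question = (conf or {}).get("question", default)
    if not isinstance(question, str) or not question.strip():
        raise ValueError("trigger the DAG with a non-empty 'question' in its conf")
    question = question.strip()
    if len(question) > max_chars:
        raise ValueError(f"question exceeds {max_chars} characters")
    return question


def trigger_command(dag_id: str, question: str) -> list[str]:
    """Build argv for `airflow dags trigger`, passing the question as JSON conf."""
    return ["airflow", "dags", "trigger", dag_id, "--conf", json.dumps({"question": question})]
```

Passing the argv list to `subprocess.run` (without `shell=True`) avoids shell-quoting problems with the JSON payload.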

airflow.providers.common.ai.example_dags.example_llamaindex_rag.example_llamaindex_multi_source()

Combine multiple loaders with source-tagging metadata.

Shows how DocumentLoaderOperator handles different file formats and how metadata_fields tags documents by source for filtered retrieval downstream.
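The idea of per-format loading plus source tagging can be sketched in plain Python. This does not use DocumentLoaderOperator or its metadata_fields option. The loader functions, the `"body"` field expected in JSON and CSV input, and the metadata keys (`source`, `file_name`, `file_type`) are hypothetical choices that show how tags attached at load time enable filtering at retrieval time.

```python
import csv
import io
import json
from pathlib import PurePath

# Hypothetical per-format loaders; each returns a list of text passages.


def load_txt(raw: str) -> list[str]:
    return [raw]


def load_json(raw: str) -> list[str]:
    # Assumes a JSON array of objects with a "body" field (hypothetical schema).
    return [item["body"] for item in json.loads(raw)]


def load_csv(raw: str) -> list[str]:
    # Assumes a header row containing a "body" column (hypothetical schema).
    return [row["body"] for row in csv.DictReader(io.StringIO(raw))]


LOADERS = {".txt": load_txt, ".json": load_json, ".csv": load_csv}


def load_and_tag(files: dict[str, str], source: str) -> list[dict]:
    """Dispatch on file extension and tag every passage with its origin."""
    docs = []
    for name, raw in files.items():
        suffix = PurePath(name).suffix.lower()
        for text in LOADERS[suffix](raw):
            docs.append({
                "text": text,
                "metadata": {"source": source, "file_name": name, "file_type": suffix.lstrip(".")},
            })
    return docs


def filter_by_metadata(docs: list[dict], **required: str) -> list[dict]:
    """Keep only documents whose metadata matches every required key/value."""
    return [d for d in docs if all(d["metadata"].get(k) == v for k, v in required.items())]
```

Filtering before similarity ranking narrows the candidate set, so a question about support tickets never retrieves wiki passages.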
