airflow.providers.common.ai.example_dags.example_llm_survey_analysis

Natural language analysis of a survey CSV – interactive and scheduled variants.

Both DAGs query the Airflow Community Survey 2025 CSV: LLMSQLQueryOperator generates SQL from a natural language prompt, and AnalyticsOperator executes it.

example_llm_survey_interactive (five tasks, manual trigger) adds human-in-the-loop review at both ends of the pipeline. Its tasks, in order, are HITLEntryOperator, LLMSQLQueryOperator, AnalyticsOperator, a @task extraction step, and ApprovalOperator.

example_llm_survey_scheduled (seven tasks, runs monthly) downloads the CSV, validates its schema, generates and executes SQL, then emails or logs the result. No human review steps – suitable for recurring reporting or dashboards.

Before running either DAG:

  1. Create an LLM connection named pydanticai_default (or the value of LLM_CONN_ID below) for your chosen model provider.

  2. Place the survey CSV at the path set by the SURVEY_CSV_PATH environment variable, or update SURVEY_CSV_PATH below. Use a cleaned copy of the 2025 survey CSV, with duplicate columns renamed and embedded newlines removed, because Apache DataFusion is strict about both.
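
The cleaning described in step 2 could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the procedure used to prepare the cleaned copy; the helper name clean_survey_csv and the "_2", "_3" suffix scheme for duplicate columns are assumptions made for this example.

```python
import csv
import io


def clean_survey_csv(raw_text: str) -> str:
    """Return CSV text with unique header names and no embedded newlines.

    Hypothetical helper: the renaming scheme is an assumption for this sketch.
    """
    rows = list(csv.reader(io.StringIO(raw_text, newline="")))
    if not rows:
        return ""

    # Rename repeated headers by appending a counter: "Other", "Other_2", ...
    # (A collision with an existing "Other_2" column is not handled here.)
    seen: dict[str, int] = {}
    header = []
    for name in rows[0]:
        name = " ".join(name.split())  # collapse whitespace runs, incl. newlines
        seen[name] = seen.get(name, 0) + 1
        header.append(name if seen[name] == 1 else f"{name}_{seen[name]}")

    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in rows[1:]:
        # Replace line breaks (and other whitespace runs) in cells with one space.
        writer.writerow([" ".join(cell.split()) for cell in row])
    return out.getvalue()
```

Note that collapsing whitespace also trims leading and trailing spaces in each cell, which may or may not be desirable for a given dataset.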

Attributes

LLM_CONN_ID

AIRFLOW_WEBSITE_CONN_ID

SURVEY_CSV_ENDPOINT

SURVEY_CSV_PATH

SURVEY_CSV_URI

REFERENCE_CSV_PATH

REFERENCE_CSV_URI

SMTP_CONN_ID

NOTIFY_EMAIL

INTERACTIVE_PROMPT

SCHEDULED_PROMPT

SURVEY_SCHEMA

survey_datasource

reference_datasource

Functions

example_llm_survey_interactive()

Ask a natural language question about the survey with human review at each end.

example_llm_survey_scheduled()

Download, validate, query, and report on the survey CSV on a schedule.

Module Contents

airflow.providers.common.ai.example_dags.example_llm_survey_analysis.LLM_CONN_ID = 'pydanticai_default'
airflow.providers.common.ai.example_dags.example_llm_survey_analysis.AIRFLOW_WEBSITE_CONN_ID = 'airflow_website'
airflow.providers.common.ai.example_dags.example_llm_survey_analysis.SURVEY_CSV_ENDPOINT = '/survey/airflow-user-survey-2025.csv'
airflow.providers.common.ai.example_dags.example_llm_survey_analysis.SURVEY_CSV_PATH
airflow.providers.common.ai.example_dags.example_llm_survey_analysis.SURVEY_CSV_URI
airflow.providers.common.ai.example_dags.example_llm_survey_analysis.REFERENCE_CSV_PATH
airflow.providers.common.ai.example_dags.example_llm_survey_analysis.REFERENCE_CSV_URI
airflow.providers.common.ai.example_dags.example_llm_survey_analysis.SMTP_CONN_ID
airflow.providers.common.ai.example_dags.example_llm_survey_analysis.NOTIFY_EMAIL
airflow.providers.common.ai.example_dags.example_llm_survey_analysis.INTERACTIVE_PROMPT = 'How does AI tool usage for writing Airflow code compare between Airflow 3 users and Airflow 2 users?'
airflow.providers.common.ai.example_dags.example_llm_survey_analysis.SCHEDULED_PROMPT = 'What is the breakdown of respondents by Airflow version currently in use?'
airflow.providers.common.ai.example_dags.example_llm_survey_analysis.SURVEY_SCHEMA = Multiline-String
"""
Table: survey
Key columns (quote all names in SQL):
  "How important is Airflow to your business?"                                                TEXT
  "Which version of Airflow do you currently use?"                                            TEXT
  "CeleryExecutor"                                                                            TEXT
  "KubernetesExecutor"                                                                        TEXT
  "LocalExecutor"                                                                             TEXT
  "How do you deploy Airflow?"                                                                TEXT
  "What best describes your current occupation?"                                              TEXT
  "What industry do you currently work in?"                                                   TEXT
  "What city do you currently reside in?"                                                     TEXT
  "How many years of experience do you have with Airflow?"                                    TEXT
  "Which of the following is your company's primary cloud provider for Airflow?"              TEXT
  "How many people work at your company?"                                                     TEXT
  "How many people at your company directly work on data?"                                    TEXT
  "How many people at your company use Airflow?"                                              TEXT
  "How likely are you to recommend Apache Airflow?"                                           TEXT
  "Are you using AI/LLM (ChatGPT/Cursor/Claude etc) to assist you in writing Airflow code?"  TEXT
"""
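
The instruction to quote all names matters because the column names contain spaces and punctuation. The sketch below illustrates the quoting with the standard library's sqlite3 module, which is swapped in here purely for a self-contained demonstration; the DAGs themselves run SQL through AnalyticsOperator, not SQLite. The helper names and the sample version strings are invented for this example and are not survey data.

```python
import sqlite3

VERSION_COL = "Which version of Airflow do you currently use?"


def quote_identifier(name: str) -> str:
    """Double-quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def version_breakdown(rows: list[str]) -> list[tuple[str, int]]:
    """Count rows per version in an in-memory table (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    col = quote_identifier(VERSION_COL)
    # The quoted identifier keeps the spaces and the "?" as part of the name.
    conn.execute(f"CREATE TABLE survey ({col} TEXT)")
    conn.executemany("INSERT INTO survey VALUES (?)", [(r,) for r in rows])
    query = (
        f"SELECT {col}, COUNT(*) AS n FROM survey "
        f"GROUP BY {col} ORDER BY n DESC, {col}"
    )
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

A query of this shape is roughly what a prompt such as SCHEDULED_PROMPT asks for, although the SQL the LLM actually generates may differ.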
airflow.providers.common.ai.example_dags.example_llm_survey_analysis.survey_datasource
airflow.providers.common.ai.example_dags.example_llm_survey_analysis.reference_datasource
airflow.providers.common.ai.example_dags.example_llm_survey_analysis.example_llm_survey_interactive()

Ask a natural language question about the survey with human review at each end.

Task graph:

prompt_confirmation (HITLEntryOperator)
    → generate_sql (LLMSQLQueryOperator)
    → run_query (AnalyticsOperator)
    → extract_data (@task)
    → result_confirmation (ApprovalOperator)

The first HITL step lets the analyst review and optionally reword the question before it reaches the LLM. The final HITL step presents the query result for approval or rejection.

airflow.providers.common.ai.example_dags.example_llm_survey_analysis.example_llm_survey_scheduled()

Download, validate, query, and report on the survey CSV on a schedule.

Task graph:

download_survey (HttpOperator)
    → prepare_csv (@task)
    → check_schema (LLMSchemaCompareOperator)
    → generate_sql (LLMSQLQueryOperator)
    → run_query (AnalyticsOperator)
    → extract_data (@task)
    → send_result (@task)

No human review steps, so it suits recurring reporting or dashboards. To adjust the run frequency, set schedule to any cron expression or Airflow timetable.

Prerequisites:

  • Create an HTTP connection named airflow_website (the value of AIRFLOW_WEBSITE_CONN_ID) that points at https://airflow.apache.org.

  • Set the SMTP_CONN_ID and NOTIFY_EMAIL environment variables to enable email delivery of results; otherwise results are written to the task log.
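
The email-or-log fallback in the last prerequisite can be pictured as a small decision function. The sketch below is an assumption about the logic, not the body of the send_result task: the name choose_delivery is hypothetical, and it assumes that both variables must be set to non-empty values before email is used.

```python
import os
from collections.abc import Mapping
from typing import Optional


def choose_delivery(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "email" when both variables are set and non-empty, else "log".

    Hypothetical helper for illustration; pass a mapping to override os.environ.
    """
    env = os.environ if env is None else env
    smtp_conn_id = env.get("SMTP_CONN_ID", "").strip()
    notify_email = env.get("NOTIFY_EMAIL", "").strip()
    # Assumption: email delivery needs both a connection ID and a recipient.
    return "email" if smtp_conn_id and notify_email else "log"
```

Accepting the environment as a parameter keeps the decision easy to check without changing real environment variables.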
