airflow.providers.common.ai.utils.sql_validation¶
SQL safety validation for LLM-generated queries.
Uses an allowlist approach: only explicitly permitted statement types pass. This is safer than a denylist because new/unexpected statement types (INSERT, UPDATE, MERGE, TRUNCATE, COPY, etc.) are blocked by default.
Attributes¶
Exceptions¶
Generated SQL failed safety validation. |
Classes¶
Result of |
Functions¶
|
Normalize a SQLAlchemy dialect name to a sqlglot dialect. |
|
Parse SQL into statements, enforcing the empty- and multi-statement guards only. |
|
Walk parsed statements and report every real table they reach, scope-correctly. |
|
Parse SQL and verify all statements are in the allowed types list. |
Module Contents¶
- airflow.providers.common.ai.utils.sql_validation.resolve_sqlglot_dialect(dialect_name)[source]¶
Normalize a SQLAlchemy dialect name to a sqlglot dialect.
Returns
None(dialect-agnostic parsing) for empty, non-string, or unknown inputs, so a bad dialect value never breaks SQL validation.
- airflow.providers.common.ai.utils.sql_validation.DEFAULT_ALLOWED_TYPES: tuple[type[sqlglot.exp.Expr], Ellipsis][source]¶
- airflow.providers.common.ai.utils.sql_validation.READ_ONLY_METADATA_TYPES: tuple[type[sqlglot.exp.Expr], Ellipsis][source]¶
- exception airflow.providers.common.ai.utils.sql_validation.SQLSafetyError[source]¶
Bases:
ExceptionGenerated SQL failed safety validation.
- airflow.providers.common.ai.utils.sql_validation.parse_sql(sql, *, dialect=None, allow_multiple_statements=False)[source]¶
Parse SQL into statements, enforcing the empty- and multi-statement guards only.
Shared by
validate_sql()(which then applies statement-type checks) and by callers that need the parsed AST for their own analysis – e.g. table-reference extraction forallowed_tablesenforcement – without the read-only allow-list.- Parameters:
- Returns:
List of parsed sqlglot Expression objects (never empty).
- Raises:
SQLSafetyError – If the SQL is empty, cannot be parsed, or contains multiple statements when not permitted.
- Return type:
list[sqlglot.exp.Expr]
- class airflow.providers.common.ai.utils.sql_validation.TableScan[source]¶
Bases:
NamedTupleResult of
collect_table_references().
- airflow.providers.common.ai.utils.sql_validation.collect_table_references(statements)[source]¶
Walk parsed statements and report every real table they reach, scope-correctly.
This is the AST half of
allowed_tablesenforcement: it returns the concrete base tables a query reaches (including those nested in subqueries, CTEs, JOINs, set operations,DESCRIBE, and DML) as(catalog, schema, table)so the caller can check each against its allow-list, plus a list of constructs that cannot be checked and must therefore be rejected while an allow-list is active.Handled carefully (each was a confirmed bypass before it was closed):
CTE references are excluded by lexical scope, not by name. A table is treated as a CTE only when a
WITHenclosing that reference defines the name (see_is_in_scope_cte()); a same-named CTE in a sibling/inner query no longer hides a real top-level table. A DML target is always a real table (you cannot write to a CTE, so a same-named CTE does not shadow it), but DML sources follow normal CTE scoping – a CTE used as an INSERT/UPDATE source is not flagged.Catalog-qualified references are reported with their catalog, so the caller rejects
otherdb.public.ordersinstead of matching it topublic.orders.Unverifiable constructs are listed, not silently dropped: nameless table-valued functions (
dblink),TABLE('name')row sources (exp.TableFromRows),SHOW, dynamic SQL (EXEC/Command), theTABLE <name>shorthand (which sqlglot parses incorrectly, leaking theTABLEkeyword as a column), a quoted identifier (case-sensitive on the engine but matched case-insensitively here, so"Orders"could otherwise reach a table distinct from the allow-listedorders), and any inline comment – comments are where parser-vs-engine differentials hide (MySQL executable/*! ... */,--not followed by whitespace,#).
- Parameters:
statements (list[sqlglot.exp.Expr]) – Parsed sqlglot statements (from
parse_sql()).- Returns:
A
TableScanof real table references and unverifiable constructs.- Return type:
- airflow.providers.common.ai.utils.sql_validation.validate_sql(sql, *, allowed_types=None, dialect=None, allow_multiple_statements=False, allow_read_only_metadata=False)[source]¶
Parse SQL and verify all statements are in the allowed types list.
By default, only a single SELECT-family statement is allowed. Multi-statement SQL (separated by semicolons) is rejected unless
allow_multiple_statements=True, because multi-statement inputs can hide dangerous operations after a benign SELECT.Returns parsed statements on success, raises
SQLSafetyErroron violation.- Parameters:
sql (str) – SQL string to validate.
allowed_types (tuple[type[sqlglot.exp.Expr], Ellipsis] | None) – Tuple of sqlglot expression types to permit. Defaults to
(Select, Union, Intersect, Except). When supplied, the caller takes full control of the allow-list andallow_read_only_metadatais ignored.dialect (str | None) – SQL dialect for parsing (
postgres,mysql, etc.).allow_multiple_statements (bool) – Whether to allow multiple semicolon-separated statements. Default
False.allow_read_only_metadata (bool) – Also permit read-only metadata statements (
DESCRIBE/SHOW) on top of the default read-only allow-list. Ignored whenallowed_typesis supplied. NoteSHOWonly parses to a metadata statement when adialectthat supports it is given. DefaultFalse.
- Returns:
List of parsed sqlglot Expression objects.
- Raises:
SQLSafetyError – If the SQL is empty, contains disallowed statement types, or has multiple statements when not permitted.
- Return type:
list[sqlglot.exp.Expr]