airflow.providers.common.ai.toolsets.sql
Curated SQL toolset wrapping DbApiHook for agentic database workflows.
Classes
SQLToolset: Curated toolset that gives an LLM agent safe access to a SQL database.
Module Contents
- class airflow.providers.common.ai.toolsets.sql.SQLToolset(db_conn_id, *, allowed_tables=None, schema=None, allow_writes=False, max_rows=50)
Bases: pydantic_ai.toolsets.abstract.AbstractToolset[Any]
Curated toolset that gives an LLM agent safe access to a SQL database.
Provides four tools, list_tables, get_schema, query, and check_query, inspired by LangChain’s SQLDatabaseToolkit pattern.
Uses a DbApiHook resolved lazily from the given db_conn_id.
When a tool fails, the database’s error message is returned to the agent as a retry (pydantic_ai.ModelRetry) so the model can correct its SQL within the run instead of failing the task. pydantic-ai bounds this by the tool’s max_retries, so an unrecoverable error, such as a bad connection or an auth failure, exhausts the retries and fails the task for Airflow to retry. The toolset does not inspect the error type or message.
- Parameters:
db_conn_id (str) – Airflow connection ID for the database.
allowed_tables (list[str] | None) – Restrict the agent to a fixed set of tables. None (default) exposes every table in schema. Entries may be schema-qualified ("SCHEMA.TABLE") to span multiple schemas in one database, which is common on warehouses such as Snowflake. list_tables introspects each referenced schema and returns the matching tables fully qualified, and get_schema routes to the table’s own schema. Unqualified entries use schema. Matching is case-insensitive, since databases reflect identifiers in their own case.
When set, the list is enforced on the query and check_query tools as well as on discovery: every table a query reaches (through subqueries, CTEs, JOINs, set operations, DESCRIBE, catalog views such as information_schema, or DML) must be on the list, resolved with its database/catalog, or the query is rejected before it runs. CTE references are excluded by lexical scope (a same-named CTE in another scope never hides a real table). Constructs the list cannot describe are rejected outright while it is active: table-valued functions (dblink), TABLE('name') row sources, the TABLE <name> shorthand, SHOW, dynamic SQL, and inline comments (where parser-vs-engine differences such as MySQL /*! ... */ executable comments hide).
Note: This is an application-level guardrail, enforced by parsing the SQL with sqlglot. It is strong defense-in-depth but not a substitute for database permissions: it cannot police data reached through a function whose argument is itself SQL or a path, such as pg_read_file('...') (a file), or query_to_xml('SELECT ... FROM other_table', ...) and dblink in scalar position (a table, read through a string the parser cannot inspect), and any query the engine parses differently from sqlglot is a residual gap. For a hard guarantee, also point db_conn_id at a least-privilege role whose SELECT grants are limited to the same tables.
schema (str | None) – Default schema/namespace for table listing and introspection, used for unqualified allowed_tables entries and unqualified get_schema calls. Schema-qualified allowed_tables entries override it per table.
allow_writes (bool) – Allow data-modifying SQL (INSERT, UPDATE, DELETE, etc.). Default False: only SELECT-family statements are permitted.
max_rows (int) – Maximum number of rows returned from the query tool. Default 50.
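The allowed_tables matching rules described above (schema-qualified entries, a default schema for unqualified ones, case-insensitive comparison) can be illustrated with a small standalone sketch. This is not the toolset's implementation, which parses SQL with sqlglot; the helpers qualify and is_allowed are hypothetical names introduced here, and the sketch ignores identifier quoting and database/catalog resolution.

```python
def qualify(name, default_schema=None):
    """Return a lower-cased (schema, table) pair for an identifier.

    Hypothetical helper: splits on the last dot only, so everything before
    it is treated as the schema. Quoting is not handled.
    """
    schema, _, table = name.rpartition(".")
    if not schema:
        # Unqualified names fall back to the default schema.
        schema = default_schema or ""
    return (schema.lower(), table.lower())


def is_allowed(table_name, allowed_tables, default_schema=None):
    """Check a table reference against an allow-list (None allows everything)."""
    if allowed_tables is None:
        return True
    allowed = {qualify(entry, default_schema) for entry in allowed_tables}
    return qualify(table_name, default_schema) in allowed


allowed = ["orders", "ANALYTICS.EVENTS"]
print(is_allowed("ORDERS", allowed, default_schema="public"))            # True: case differs only
print(is_allowed("analytics.events", allowed, default_schema="public"))  # True: qualified entry
print(is_allowed("analytics.orders", allowed, default_schema="public"))  # False: not on the list
```

Lower-casing both sides is one simple way to get case-insensitive matching; a real implementation would also need to respect quoted identifiers, which this sketch deliberately leaves out.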
- property id: str
An ID for the toolset that is unique among all toolsets registered with the same agent.
If you’re writing a concrete toolset that users can instantiate more than once, let them optionally pass a custom ID to the constructor and return it here.
A toolset needs to have an ID in order to be used in a durable execution environment like Temporal, in which case the ID will be used to identify the toolset’s activities within the workflow.
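The custom-ID guidance above can be sketched with a plain class. ExampleToolset is a hypothetical stand-in, not part of Airflow or pydantic-ai, and the default ID format is an assumption chosen only for illustration; the source does not say how SQLToolset derives its own ID.

```python
class ExampleToolset:
    """Hypothetical toolset-like class showing an optional custom ID."""

    def __init__(self, conn_id, *, id=None):
        self._conn_id = conn_id
        # Use the caller's ID when given; otherwise derive one from the
        # connection so two instances on different connections differ.
        self._id = id or f"example-{conn_id}"

    @property
    def id(self):
        return self._id


reports = ExampleToolset("pg_main", id="reports")
default = ExampleToolset("pg_main")
print(reports.id)  # reports
print(default.id)  # example-pg_main
```

Two instances built on the same connection would share the derived default, which is why letting the caller override it matters when both are registered with the same agent.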
- async call_tool(name, tool_args, ctx, tool)
Call a tool with the given arguments.
- Args:
name: The name of the tool to call.
tool_args: The arguments to pass to the tool.
ctx: The run context.
tool: The tool definition returned by get_tools (pydantic_ai.toolsets.AbstractToolset.get_tools) that was called.
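The class description says a failing tool hands the database's error message back to the agent as a retry, bounded by max_retries. The following stdlib-only sketch mimics that loop. It makes several assumptions: sqlite3 stands in for the DbApiHook connection, RetrySignal is a hypothetical substitute for pydantic_ai.ModelRetry, the list of SQL strings plays the role of the model's successive attempts, and max_retries is taken to mean extra attempts after the first (the exact counting in pydantic-ai is not stated in this page).

```python
import sqlite3


class RetrySignal(Exception):
    """Hypothetical stand-in for pydantic_ai.ModelRetry."""


def run_query(conn, sql, max_rows=50):
    """Run SQL and cap the result size; surface any database error as a retry."""
    try:
        return conn.execute(sql).fetchmany(max_rows)
    except sqlite3.Error as exc:
        # The error text is passed along unchanged; it is not inspected.
        raise RetrySignal(str(exc)) from exc


def run_with_retries(conn, attempts, max_retries=1):
    """Try each SQL string in turn, allowing max_retries extra attempts."""
    feedback = []
    for sql in attempts[: max_retries + 1]:
        try:
            return run_query(conn, sql), feedback
        except RetrySignal as signal:
            # In a real run this message would go back to the model.
            feedback.append(str(signal))
    last = feedback[-1] if feedback else "no attempts made"
    raise RuntimeError(f"retries exhausted: {last}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,)])

rows, feedback = run_with_retries(
    conn,
    ["SELECT id FROM order_items", "SELECT id FROM orders ORDER BY id"],
)
print(rows)      # rows from the corrected second attempt
print(feedback)  # error message produced by the first attempt
```

An error that no rewrite of the SQL can fix, such as a bad connection, would fail on every attempt and end in the RuntimeError branch, which corresponds to the task failing once the retries are used up.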