`airflow.providers.common.sql.hooks.sql`¶

Module Contents¶

Classes¶

`ConnectorProtocol`	A protocol where you can connect to a database.
`DbApiHook`	Abstract base class for sql hooks.

Functions¶

`return_single_query_results`(sql, return_last, ...)	Determines when results of single query only should be returned.
`fetch_all_handler`(cursor)	Handler for DbApiHook.run() to return results
`fetch_one_handler`(cursor)	Handler for DbApiHook.run() to return results

Attributes¶

BaseForDbApiHook

airflow.providers.common.sql.hooks.sql.return_single_query_results(sql, return_last, split_statements)[source]¶

Determines when results of single query only should be returned.

For compatibility reasons, the behaviour of the DBAPIHook is somewhat confusing. In cases, when multiple queries are run, the return values will be an iterable (list) of results - one for each query. However, in certain cases, when single query is run - the results will be just the results of that single query without wrapping the results in a list.

The cases when single query results are returned without wrapping them in a list are when:

sql is string and last_statement is True (regardless what split_statement value is)
sql is string and split_statement is False

In all other cases, the results are wrapped in a list, even if there is only one statement to process:

always when sql is an iterable of string statements (regardless what last_statement value is)
when sql is string, split_statement is True and last_statement is False

Parameters

sql (str | Iterable[str]) – sql to run (either string or list of strings)
return_last (bool) – whether last statement output should only be returned
split_statements (bool) – whether to split string statements.

Returns

True if the hook should return single query results

airflow.providers.common.sql.hooks.sql.fetch_all_handler(cursor)[source]¶

Handler for DbApiHook.run() to return results

airflow.providers.common.sql.hooks.sql.fetch_one_handler(cursor)[source]¶

Handler for DbApiHook.run() to return results

class airflow.providers.common.sql.hooks.sql.ConnectorProtocol[source]¶

Bases: typing_extensions.Protocol

A protocol where you can connect to a database.

connect(host, port, username, schema)[source]¶

Connect to a database.

Parameters

host (str) – The database host to connect to.
port (int) – The database port to connect to.
username (str) – The database username used for the authentication.
schema (str) – The database schema to connect to.

Returns

the authorized connection object.

Return type

Any

airflow.providers.common.sql.hooks.sql.BaseForDbApiHook :type[airflow.hooks.base.BaseHook][source]¶

class airflow.providers.common.sql.hooks.sql.DbApiHook(*args, schema=None, log_sql=True, **kwargs)[source]¶

Bases: airflow.hooks.dbapi.DbApiHook

Abstract base class for sql hooks.

Parameters

schema (str | None) – Optional DB schema that overrides the schema specified in the connection. Make sure that if you change the schema parameter value in the constructor of the derived Hook, such change should be done before calling the DBApiHook.__init__().
log_sql (bool) – Whether to log SQL query when it’s executed. Defaults to True.

property last_description: Sequence[Sequence] | None[source]¶

conn_name_attr :str[source]¶

default_conn_name = default_conn_id[source]¶

supports_autocommit = False[source]¶

connector :ConnectorProtocol | None[source]¶

placeholder :str = %s[source]¶

get_conn()[source]¶: Returns a connection object

get_uri()[source]¶

Extract the URI from the connection.

Returns: the extracted uri.
Return type: str

get_sqlalchemy_engine(engine_kwargs=None)[source]¶

Get an sqlalchemy_engine object.

Parameters: engine_kwargs – Kwargs used in create_engine().
Returns: the created engine.

get_pandas_df(sql, parameters=None, **kwargs)[source]¶

Executes the sql and returns a pandas dataframe

Parameters

sql – the sql statement to be executed (str) or a list of sql statements to execute
parameters – The parameters to render the SQL query with.
kwargs – (optional) passed into pandas.io.sql.read_sql method

get_pandas_df_by_chunks(sql, parameters=None, *, chunksize, **kwargs)[source]¶

Executes the sql and returns a generator

Parameters

sql – the sql statement to be executed (str) or a list of sql statements to execute
parameters – The parameters to render the SQL query with
chunksize – number of rows to include in each chunk
kwargs – (optional) passed into pandas.io.sql.read_sql method

get_records(sql, parameters=None)[source]¶

Executes the sql and returns a set of records.

Parameters

sql (str | list[str]) – the sql statement to be executed (str) or a list of sql statements to execute
parameters (Iterable | Mapping | None) – The parameters to render the SQL query with.

get_first(sql, parameters=None)[source]¶

Executes the sql and returns the first resulting row.

Parameters

sql (str | list[str]) – the sql statement to be executed (str) or a list of sql statements to execute
parameters (Iterable | Mapping | None) – The parameters to render the SQL query with.

static strip_sql_string(sql)[source]¶

static split_sql_string(sql)[source]¶

Splits string into multiple SQL expressions

Parameters: sql (str) – SQL string potentially consisting of multiple expressions
Returns: list of individual expressions
Return type: list[str]

run(sql, autocommit=False, parameters=None, handler=None, split_statements=False, return_last=True)[source]¶

Runs a command or a list of commands. Pass a list of sql statements to the sql parameter to get them to execute sequentially.

The method will return either single query results (typically list of rows) or list of those results where each element in the list are results of one of the queries (typically list of list of rows :D)

For compatibility reasons, the behaviour of the DBAPIHook is somewhat confusing. In cases, when multiple queries are run, the return values will be an iterable (list) of results - one for each query. However, in certain cases, when single query is run - the results will be just the results of that query without wrapping the results in a list.

The cases when single query results are returned without wrapping them in a list are when:

sql is string and last_statement is True (regardless what split_statement value is)
sql is string and split_statement is False

In all other cases, the results are wrapped in a list, even if there is only one statement to process:

always when sql is an iterable of string statements (regardless what last_statement value is)
when sql is string, split_statement is True and last_statement is False

In any of those cases, however you can access the following properties of the Hook after running it:

descriptions - has an array of cursor descriptions - each statement executed contain the list of descriptions executed. If return_last is used, this is always a one-element array

last_description - description of the last statement executed

Note that return value from the hook will ONLY be actually returned when handler is provided. Setting the handler to None, results in this method returning None.

Handler is a way to process the rows from cursor (Iterator) into a value that is suitable to be returned to XCom and generally fit in memory. As an optimization, handler is usually not executed by the SQLExecuteQuery operator if do_xcom_push is not specified.

You can use pre-defined handles (fetch_all_handler`, ‘’fetch_one_handler``) or implement your own handler.

Parameters

sql (str | Iterable[str]) – the sql statement to be executed (str) or a list of sql statements to execute
autocommit (bool) – What to set the connection’s autocommit setting to before executing the query.
parameters (Iterable | Mapping | None) – The parameters to render the SQL query with.
handler (Callable | None) – The result handler which is called with the result of each statement.
split_statements (bool) – Whether to split a single SQL string into statements and run separately
return_last (bool) – Whether to return result for only last statement or for all after split

Returns

return only result of the ALL SQL expressions if handler was provided.

Return type

Any | list[Any] | None

set_autocommit(conn, autocommit)[source]¶: Sets the autocommit flag on the connection

get_autocommit(conn)[source]¶

Get autocommit setting for the provided connection. Return True if conn.autocommit is set to True. Return False if conn.autocommit is not set or set to False or conn does not support autocommit.

Parameters: conn – Connection to get autocommit setting from.
Returns: connection autocommit setting.
Return type: bool

get_cursor()[source]¶: Returns a cursor

insert_rows(table, rows, target_fields=None, commit_every=1000, replace=False, **kwargs)[source]¶

A generic way to insert a set of tuples into a table, a new transaction is created every commit_every rows

Parameters

table – Name of the target table
rows – The rows to insert into the table
target_fields – The names of the columns to fill in the table
commit_every – The maximum number of rows to insert in one transaction. Set to 0 to insert all rows in one transaction.
replace – Whether to replace instead of insert

abstract bulk_dump(table, tmp_file)[source]¶

Dumps a database table into a tab-delimited file

Parameters

table – The name of the source table
tmp_file – The path of the target file

abstract bulk_load(table, tmp_file)[source]¶

Loads a tab-delimited file into a database table

Parameters

table – The name of the target table
tmp_file – The path of the file to load into the table

test_connection()[source]¶: Tests the connection using db-specific query

airflow.providers.common.sql.hooks.sql¶

Module Contents¶

Classes¶

Functions¶

Attributes¶

`airflow.providers.common.sql.hooks.sql`¶