airflow.providers.databricks.operators.databricks_sql

This module contains Databricks operators.

Attributes

COPY_INTO_APPROVED_FORMATS

Classes

`DatabricksSqlOperator`	Executes SQL code in a Databricks SQL endpoint or a Databricks cluster.
`DatabricksCopyIntoOperator`	Executes COPY INTO command in a Databricks SQL endpoint or a Databricks cluster.

Module Contents

class airflow.providers.databricks.operators.databricks_sql.DatabricksSqlOperator(*, databricks_conn_id=DatabricksSqlHook.default_conn_name, http_path=None, sql_endpoint_name=None, session_configuration=None, http_headers=None, catalog=None, schema=None, output_path=None, output_format='csv', csv_params=None, client_parameters=None, gcp_conn_id='google_cloud_default', gcs_impersonation_chain=None, **kwargs)

Bases: airflow.providers.common.sql.operators.sql.SQLExecuteQueryOperator

Executes SQL code in a Databricks SQL endpoint or a Databricks cluster.

See also

For more information on how to use this operator, take a look at the guide: DatabricksSqlOperator

Parameters:

databricks_conn_id (str) – Reference to Databricks connection id (templated)
http_path (str | None) – Optional string specifying the HTTP path of the Databricks SQL Endpoint or cluster. If not specified, it must either be set in the Databricks connection’s extra parameters, or sql_endpoint_name must be specified.
sql_endpoint_name (str | None) – Optional name of Databricks SQL Endpoint. If not specified, http_path must be provided as described above.
sql – The SQL code to execute: a single string, a list of strings (SQL statements), or a reference to a template file. (templated) Template references are recognized by strings ending in ‘.sql’.
parameters – (optional) the parameters to render the SQL query with.
session_configuration – An optional dictionary of Spark session parameters. Defaults to None. If not specified, it can be set in the Databricks connection’s extra parameters.
client_parameters (dict[str, Any] | None) – Additional parameters internal to the Databricks SQL Connector.
http_headers (list[tuple[str, str]] | None) – An optional list of (k, v) pairs that will be set as HTTP headers on every request. (templated)
catalog (str | None) – An optional initial catalog to use. Requires DBR version 9.0+ (templated)
schema (str | None) – An optional initial schema to use. Requires DBR version 9.0+ (templated)
output_path (str | None) – Optional string specifying the file to which the selected data is written. (templated) Supports local file paths and GCS URIs (e.g., gs://bucket/path/file.parquet). GCS URIs require the apache-airflow-providers-google package.
output_format (str) – format of output data if output_path is specified. Possible values are csv, json, jsonl, parquet, avro. Default is csv.
csv_params (dict[str, Any] | None) – parameters that will be passed to the csv.DictWriter class used to write CSV data.
gcp_conn_id (str) – The connection ID to use for connecting to Google Cloud when using GCS output path. Default is google_cloud_default.
gcs_impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials for GCS upload, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. (templated)
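
As a rough illustration of how the parameters above fit together, the sketch below collects keyword arguments for a query whose result is written to a local file. The task id, endpoint name, query, and output path are hypothetical placeholders. The operator import is deferred into a helper so that the arguments can be inspected without an Airflow installation; calling the helper assumes apache-airflow-providers-databricks is installed.

```python
# Keyword arguments built only from parameters documented above.
# Every value is a hypothetical placeholder, not a default or a recommendation.
select_to_file_kwargs = {
    "task_id": "select_into_file",            # standard operator argument, passed via **kwargs
    "databricks_conn_id": "databricks_default",
    "sql_endpoint_name": "my-endpoint",       # hypothetical endpoint name; http_path is the alternative
    "sql": "SELECT * FROM my_schema.my_table LIMIT 100",
    "output_path": "/tmp/my_table.jsonl",     # hypothetical local path
    "output_format": "jsonl",                 # one of: csv, json, jsonl, parquet, avro
}


def build_select_task():
    """Instantiate the operator from the arguments above.

    The import happens here, so this function only works where the
    Databricks provider package is installed.
    """
    from airflow.providers.databricks.operators.databricks_sql import (
        DatabricksSqlOperator,
    )

    return DatabricksSqlOperator(**select_to_file_kwargs)
```

In a DAG file, build_select_task() would typically be called inside the DAG context so that the task is attached to the DAG.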

template_fields: collections.abc.Sequence[str]

template_ext: collections.abc.Sequence[str] = ('.sql',)

template_fields_renderers: ClassVar[dict]

conn_id_field = 'databricks_conn_id'

databricks_conn_id = 'databricks_default'

http_path = None

sql_endpoint_name = None

session_configuration = None

client_parameters

hook_params

http_headers = None

catalog = None

schema = None

get_db_hook()

Get the database hook for the connection.

Returns: the database hook object.
Return type: airflow.providers.databricks.hooks.databricks_sql.DatabricksSqlHook

airflow.providers.databricks.operators.databricks_sql.COPY_INTO_APPROVED_FORMATS = ['CSV', 'JSON', 'AVRO', 'ORC', 'PARQUET', 'TEXT', 'BINARYFILE']
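
A list like this lends itself to a pre-flight check before a task is defined. The sketch below is only an illustration: normalize_file_format is a hypothetical helper, not part of the provider, and the decision to trim and upper-case the input is an assumption of this sketch. The list is copied locally so the snippet runs without Airflow.

```python
# Local copy of the documented list, so the sketch runs without Airflow.
COPY_INTO_APPROVED_FORMATS = ["CSV", "JSON", "AVRO", "ORC", "PARQUET", "TEXT", "BINARYFILE"]


def normalize_file_format(file_format: str) -> str:
    """Return the upper-cased format name, or raise ValueError if it is not approved.

    Hypothetical helper for illustration only. Trimming whitespace and
    upper-casing the input are choices made by this sketch.
    """
    candidate = file_format.strip().upper()
    if candidate not in COPY_INTO_APPROVED_FORMATS:
        approved = ", ".join(COPY_INTO_APPROVED_FORMATS)
        raise ValueError(f"Unsupported file format {file_format!r}; expected one of: {approved}")
    return candidate
```

Calling normalize_file_format("parquet") returns "PARQUET", while a value such as "xlsx" raises ValueError.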

class airflow.providers.databricks.operators.databricks_sql.DatabricksCopyIntoOperator(*, table_name, file_location, file_format, databricks_conn_id=DatabricksSqlHook.default_conn_name, http_path=None, sql_endpoint_name=None, session_configuration=None, http_headers=None, client_parameters=None, catalog=None, schema=None, files=None, pattern=None, expression_list=None, credential=None, storage_credential=None, encryption=None, format_options=None, force_copy=None, copy_options=None, validate=None, **kwargs)

Bases: airflow.providers.common.compat.sdk.BaseOperator

Executes COPY INTO command in a Databricks SQL endpoint or a Databricks cluster.

The COPY INTO command is constructed from individual pieces, which are described in the documentation.
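
To make the idea of assembling a statement from pieces concrete, here is a minimal sketch that joins a few of the constructor arguments into a COPY INTO statement. It is not the operator's implementation: build_copy_into is a hypothetical helper, it covers only a subset of the clauses, it does no escaping of values, and the SQL the operator actually generates may differ. The table name and location in the usage lines are placeholders.

```python
def build_copy_into(table_name, file_location, file_format,
                    format_options=None, copy_options=None):
    """Assemble a simplified COPY INTO statement from individual pieces.

    Hypothetical helper for illustration; values are inserted as-is,
    without any quoting or escaping beyond the surrounding single quotes.
    """

    def render_options(options):
        # Render {"header": "true"} as 'header' = 'true'
        return ", ".join(f"'{key}' = '{value}'" for key, value in options.items())

    # Required pieces first, in the order the SQL syntax expects them.
    parts = [
        f"COPY INTO {table_name}",
        f"FROM '{file_location}'",
        f"FILEFORMAT = {file_format}",
    ]
    # Optional clauses are appended only when provided.
    if format_options:
        parts.append(f"FORMAT_OPTIONS ({render_options(format_options)})")
    if copy_options:
        parts.append(f"COPY_OPTIONS ({render_options(copy_options)})")
    return "\n".join(parts)


# Placeholder table and location, for demonstration only.
statement = build_copy_into(
    "my_table",
    "s3://my-bucket/data/",
    "CSV",
    format_options={"header": "true"},
)
print(statement)
```

Keeping each clause as a separate list entry means optional pieces can be added or omitted without string surgery on the rest of the statement.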