airflow.providers.databricks.operators.databricks_sql
This module contains Databricks operators.
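Module Contents
Classes
| DatabricksSqlOperator | Executes SQL code in a Databricks SQL endpoint or a Databricks cluster. |
| DatabricksCopyIntoOperator | Executes COPY INTO command in a Databricks SQL endpoint or a Databricks cluster. |
Attributes
| COPY_INTO_APPROVED_FORMATS | File formats accepted by DatabricksCopyIntoOperator. |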
- class airflow.providers.databricks.operators.databricks_sql.DatabricksSqlOperator(*, databricks_conn_id=DatabricksSqlHook.default_conn_name, http_path=None, sql_endpoint_name=None, session_configuration=None, http_headers=None, catalog=None, schema=None, output_path=None, output_format='csv', csv_params=None, client_parameters=None, **kwargs)
- Bases: airflow.providers.common.sql.operators.sql.SQLExecuteQueryOperator
Executes SQL code in a Databricks SQL endpoint or a Databricks cluster.
See also: for more information on how to use this operator, take a look at the guide: DatabricksSqlOperator
- Parameters
- databricks_conn_id (str) – Reference to Databricks connection id (templated) 
- http_path (str | None) – Optional string specifying the HTTP path of the Databricks SQL endpoint or cluster. If not specified, it should either be set in the Databricks connection’s extra parameters, or `sql_endpoint_name` must be specified.
- sql_endpoint_name (str | None) – Optional name of the Databricks SQL endpoint. If not specified, `http_path` must be provided as described above.
- sql – the SQL code to be executed as a single string, a list of strings (SQL statements), or a reference to a template file. (templated) Template references are recognized by strings ending in ‘.sql’.
- parameters – (optional) the parameters to render the SQL query with. 
- session_configuration – An optional dictionary of Spark session parameters. Defaults to None. If not specified, it can be specified in the Databricks connection’s extra parameters. 
- client_parameters (dict[str, Any] | None) – Additional internal parameters passed to the Databricks SQL Connector.
- http_headers (list[tuple[str, str]] | None) – An optional list of (k, v) pairs that will be set as HTTP headers on every request. (templated) 
- catalog (str | None) – An optional initial catalog to use. Requires DBR version 9.0+ (templated) 
- schema (str | None) – An optional initial schema to use. Requires DBR version 9.0+ (templated) 
- output_path (str | None) – optional string specifying the file to which selected data is written. (templated)
- output_format (str) – format of output data if `output_path` is specified. Possible values are `csv`, `json`, `jsonl`. Default is `csv`.
- csv_params (dict[str, Any] | None) – parameters that will be passed to the `csv.DictWriter` class used to write CSV data.
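The `output_path`, `output_format` and `csv_params` parameters determine how selected rows are saved. The following is a minimal stdlib sketch of those three formats, not the operator's own code: the helper `write_results` is hypothetical, it assumes each row is already a dict, and it always writes a CSV header row.

```python
from __future__ import annotations

import csv
import json
from typing import Any


def write_results(
    rows: list[dict[str, Any]],
    output_path: str,
    output_format: str = "csv",
    csv_params: dict[str, Any] | None = None,
) -> None:
    """Hypothetical helper: save query rows as csv, json or jsonl."""
    if output_format == "csv":
        field_names = list(rows[0].keys()) if rows else []
        with open(output_path, "w", newline="") as f:
            # csv_params is forwarded to csv.DictWriter (e.g. delimiter, quoting).
            writer = csv.DictWriter(f, fieldnames=field_names, **(csv_params or {}))
            writer.writeheader()
            writer.writerows(rows)
    elif output_format == "json":
        with open(output_path, "w") as f:
            json.dump(rows, f)  # a single JSON array holding every row
    elif output_format == "jsonl":
        with open(output_path, "w") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")  # one JSON object per line
    else:
        raise ValueError(f"Unsupported output_format: {output_format}")
```

With `csv_params={"delimiter": ";"}`, the CSV output uses semicolons instead of commas, since the dictionary is passed straight through to `csv.DictWriter`.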
 
 - template_fields: Sequence[str] = ('sql', '_output_path', 'schema', 'catalog', 'http_headers', 'databricks_conn_id')
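Fields named in `template_fields` are rendered with Jinja before the task runs. For `sql` specifically, the parameter description says a string ending in ‘.sql’ is treated as a reference to a template file rather than as SQL text. This stdlib sketch applies only that classification rule to a single string or a list; `describe_sql_argument` is hypothetical, performs no rendering, and checking each list entry separately is an assumption.

```python
from __future__ import annotations


def describe_sql_argument(sql: str | list[str]) -> list[str]:
    """Hypothetical helper: label each entry of an operator's `sql` argument.

    Follows the rule from the `sql` parameter description: a string ending in
    '.sql' is a template file reference; anything else is literal SQL text.
    """
    statements = [sql] if isinstance(sql, str) else list(sql)
    return [
        "template file" if s.endswith(".sql") else "SQL statement"
        for s in statements
    ]
```

Because the rule is a plain suffix check, a literal statement that happens to end in ‘.sql’ would be classified as a file reference.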
 - get_db_hook()
- Get the database hook for the connection.
- Returns: the database hook object.
- Return type: airflow.providers.databricks.hooks.databricks_sql.DatabricksSqlHook
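The hook has to reach either a SQL endpoint or a cluster, and the `http_path` and `sql_endpoint_name` parameters describe two ways to locate it, with the connection’s extra parameters as a third. The sketch below models that choice with plain dicts. The helper `resolve_http_path`, the `endpoints` mapping, and the precedence order (explicit `http_path`, then `sql_endpoint_name`, then the connection extra) are assumptions for illustration, not the hook's documented API.

```python
from __future__ import annotations


def resolve_http_path(
    http_path: str | None,
    sql_endpoint_name: str | None,
    conn_extra: dict[str, str],
    endpoints: dict[str, str],
) -> str:
    """Hypothetical helper: choose the HTTP path for a SQL endpoint or cluster.

    `endpoints` stands in for a lookup of endpoint name -> HTTP path; a real
    lookup would query the Databricks workspace, which is not modelled here.
    """
    # 1. An explicit http_path always wins (assumed precedence).
    if http_path:
        return http_path
    # 2. Otherwise resolve the named SQL endpoint.
    if sql_endpoint_name:
        try:
            return endpoints[sql_endpoint_name]
        except KeyError:
            raise ValueError(f"Unknown SQL endpoint: {sql_endpoint_name!r}") from None
    # 3. Fall back to the connection's extra parameters.
    if conn_extra.get("http_path"):
        return conn_extra["http_path"]
    raise ValueError(
        "Provide http_path or sql_endpoint_name, "
        "or set http_path in the connection's extra parameters"
    )
```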
 
 
- airflow.providers.databricks.operators.databricks_sql.COPY_INTO_APPROVED_FORMATS = ['CSV', 'JSON', 'AVRO', 'ORC', 'PARQUET', 'TEXT', 'BINARYFILE']
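`COPY_INTO_APPROVED_FORMATS` lists the values accepted by the `file_format` parameter of `DatabricksCopyIntoOperator`. A minimal sketch of checking a value against it follows; `check_file_format` is hypothetical, and the exact, case-sensitive comparison is an assumption.

```python
# Copied from the attribute documented above.
COPY_INTO_APPROVED_FORMATS = ["CSV", "JSON", "AVRO", "ORC", "PARQUET", "TEXT", "BINARYFILE"]


def check_file_format(file_format: str) -> str:
    """Hypothetical helper: reject formats that COPY INTO does not support."""
    # Case-sensitive membership test: "csv" is rejected, "CSV" is accepted.
    if file_format not in COPY_INTO_APPROVED_FORMATS:
        raise ValueError(f"file_format '{file_format}' isn't supported")
    return file_format
```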
- class airflow.providers.databricks.operators.databricks_sql.DatabricksCopyIntoOperator(*, table_name, file_location, file_format, databricks_conn_id=DatabricksSqlHook.default_conn_name, http_path=None, sql_endpoint_name=None, session_configuration=None, http_headers=None, client_parameters=None, catalog=None, schema=None, files=None, pattern=None, expression_list=None, credential=None, storage_credential=None, encryption=None, format_options=None, force_copy=None, copy_options=None, validate=None, **kwargs)
- Bases: airflow.models.BaseOperator
Executes the COPY INTO command in a Databricks SQL endpoint or a Databricks cluster. The COPY INTO command is constructed from individual pieces, which are described in the Databricks documentation.
See also: for more information on how to use this operator, take a look at the guide: DatabricksCopyIntoOperator
- Parameters
- table_name (str) – Required name of the table. (templated) 
- file_location (str) – Required location of files to import. (templated) 
- file_format (str) – Required file format. Supported formats are `CSV`, `JSON`, `AVRO`, `ORC`, `PARQUET`, `TEXT`, `BINARYFILE`.
- databricks_conn_id (str) – Reference to Databricks connection id (templated) 
- http_path (str | None) – Optional string specifying the HTTP path of the Databricks SQL endpoint or cluster. If not specified, it should either be set in the Databricks connection’s extra parameters, or `sql_endpoint_name` must be specified.
- sql_endpoint_name (str | None) – Optional name of the Databricks SQL endpoint. If not specified, `http_path` must be provided as described above.
- session_configuration – An optional dictionary of Spark session parameters. Defaults to None. If not specified, it can be specified in the Databricks connection’s extra parameters. 
- http_headers (list[tuple[str, str]] | None) – An optional list of (k, v) pairs that will be set as HTTP headers on every request.
- catalog (str | None) – An optional initial catalog to use. Requires DBR version 9.0+ 
- schema (str | None) – An optional initial schema to use. Requires DBR version 9.0+ 
- client_parameters (dict[str, Any] | None) – Additional internal parameters passed to the Databricks SQL Connector.
- files (list[str] | None) – optional list of files to import. Can’t be specified together with `pattern`. (templated)
- pattern (str | None) – optional glob pattern to match file names to import. Can’t be specified together with `files`.
- expression_list (str | None) – optional string that will be used in the `SELECT` expression.
- credential (dict[str, str] | None) – optional credential configuration for authentication against a source location. 
- storage_credential (str | None) – optional Unity Catalog storage credential for destination. 
- encryption (dict[str, str] | None) – optional encryption configuration for a specified location. 
- format_options (dict[str, str] | None) – optional dictionary with options specific for a given file format. 
- force_copy (bool | None) – optional bool to control forcing of data import (can also be specified in `copy_options`).
- validate (bool | int | None) – optional configuration for schema & data validation. `True` forces validation of all rows; an integer N validates only the first N rows.
- copy_options (dict[str, str] | None) – optional dictionary of copy options. Currently only the `force` option is supported.
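The parameters above map onto clauses of the Databricks COPY INTO statement. A minimal sketch of that mapping follows, using a hypothetical helper `build_copy_into_sql` rather than the operator's actual code. It covers only the `FROM`/`SELECT`, `FILEFORMAT`, `VALIDATE`, `FILES`/`PATTERN`, `FORMAT_OPTIONS` and `COPY_OPTIONS` clauses; `credential`, `storage_credential` and `encryption` are left out, the clause order is assumed to follow the Databricks COPY INTO syntax, and values are interpolated without escaping, so it is not safe for untrusted input.

```python
from __future__ import annotations


def build_copy_into_sql(
    table_name: str,
    file_location: str,
    file_format: str,
    *,
    files: list[str] | None = None,
    pattern: str | None = None,
    expression_list: str | None = None,
    format_options: dict[str, str] | None = None,
    force_copy: bool | None = None,
    copy_options: dict[str, str] | None = None,
    validate: bool | int | None = None,
) -> str:
    """Hypothetical helper: assemble a COPY INTO statement from its parts."""
    if files is not None and pattern is not None:
        raise ValueError("Only one of 'files' or 'pattern' may be specified")

    # Read the location directly, or wrap it in a SELECT over expression_list.
    source = f"'{file_location}'"
    if expression_list is not None:
        source = f"(SELECT {expression_list} FROM {source})"

    lines = [f"COPY INTO {table_name}", f"FROM {source}", f"FILEFORMAT = {file_format}"]

    # True -> validate every row; an int N -> validate only the first N rows.
    # bool is checked first because True/False are also ints in Python.
    if isinstance(validate, bool):
        if validate:
            lines.append("VALIDATE ALL")
    elif isinstance(validate, int):
        lines.append(f"VALIDATE {validate} ROWS")

    if files is not None:
        lines.append("FILES = (" + ", ".join(f"'{f}'" for f in files) + ")")
    elif pattern is not None:
        lines.append(f"PATTERN = '{pattern}'")

    if format_options:
        opts = ", ".join(f"'{k}' = '{v}'" for k, v in format_options.items())
        lines.append(f"FORMAT_OPTIONS ({opts})")

    # force_copy sets the 'force' copy option; here it is assumed to override
    # any 'force' key already present in copy_options.
    options = dict(copy_options or {})
    if force_copy is not None:
        options["force"] = "true" if force_copy else "false"
    if options:
        opts = ", ".join(f"'{k}' = '{v}'" for k, v in options.items())
        lines.append(f"COPY_OPTIONS ({opts})")

    return "\n".join(lines)
```

For example, `build_copy_into_sql("main.default.t", "s3://bucket/data", "CSV", files=["a.csv", "b.csv"], force_copy=True)` yields a statement ending in `FILES = ('a.csv', 'b.csv')` followed by `COPY_OPTIONS ('force' = 'true')`; the table name and location here are placeholders.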
 
 
