airflow.providers.databricks.operators.databricks_sql¶
This module contains Databricks operators.
Module Contents¶
Classes¶
DatabricksSqlOperator | Executes SQL code in a Databricks SQL endpoint or a Databricks cluster. |
DatabricksCopyIntoOperator | Executes COPY INTO command in a Databricks SQL endpoint or a Databricks cluster. |
Attributes¶
- class airflow.providers.databricks.operators.databricks_sql.DatabricksSqlOperator(*, sql, databricks_conn_id=DatabricksSqlHook.default_conn_name, http_path=None, sql_endpoint_name=None, parameters=None, session_configuration=None, http_headers=None, catalog=None, schema=None, do_xcom_push=False, output_path=None, output_format='csv', csv_params=None, client_parameters=None, **kwargs)[source]¶
Bases: airflow.models.BaseOperator
Executes SQL code in a Databricks SQL endpoint or a Databricks cluster.
See also
For more information on how to use this operator, take a look at the guide: DatabricksSqlOperator
- Parameters
databricks_conn_id (str) – Reference to the Databricks connection ID. (templated)
http_path (Optional[str]) – Optional string specifying the HTTP path of the Databricks SQL endpoint or cluster. If not specified, it must either be set in the Databricks connection’s extra parameters, or sql_endpoint_name must be specified.
sql_endpoint_name (Optional[str]) – Optional name of the Databricks SQL endpoint. If not specified, http_path must be provided as described above.
sql (Union[str, Iterable[str]]) – The SQL code to execute, given as a single string, a list of strings (SQL statements), or a reference to a template file. Template references are recognized by a string ending in ‘.sql’. (templated)
parameters (Optional[Union[Iterable, Mapping]]) – Optional parameters to render the SQL query with.
session_configuration – An optional dictionary of Spark session parameters. Defaults to None. If not specified, it can be set in the Databricks connection’s extra parameters.
client_parameters (Optional[Dict[str, Any]]) – Additional parameters internal to the Databricks SQL Connector.
http_headers (Optional[List[Tuple[str, str]]]) – An optional list of (k, v) pairs that will be set as HTTP headers on every request. (templated)
catalog (Optional[str]) – An optional initial catalog to use. Requires DBR version 9.0+. (templated)
schema (Optional[str]) – An optional initial schema to use. Requires DBR version 9.0+. (templated)
output_path (Optional[str]) – Optional string specifying the file to which the selected data is written. (templated)
output_format (str) – Format of the output data if output_path is specified. Possible values are csv, json, jsonl. Default is csv.
csv_params (Optional[Dict[str, Any]]) – Parameters that will be passed to the csv.DictWriter class used to write CSV data.
do_xcom_push (bool) – If True, the result of the executed SQL is pushed to an XCom.
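The parameters above can be assembled and sanity-checked before the operator is created. The following is a minimal sketch, not the provider's own code: the helper names build_sql_operator_kwargs and make_task, the table name, the endpoint name, and the output path are all hypothetical. The only documented facts it relies on are the operator's keyword arguments and the csv, json, jsonl values listed for output_format.

```python
from typing import Any, Dict, Iterable, Optional, Union

# Values listed for output_format in the parameter description above.
ALLOWED_OUTPUT_FORMATS = ("csv", "json", "jsonl")


def build_sql_operator_kwargs(
    sql: Union[str, Iterable[str]],
    sql_endpoint_name: Optional[str] = None,
    http_path: Optional[str] = None,
    output_path: Optional[str] = None,
    output_format: str = "csv",
) -> Dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for DatabricksSqlOperator."""
    if output_format not in ALLOWED_OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    kwargs: Dict[str, Any] = {"sql": sql, "output_format": output_format}
    # Pass optional arguments only when they are set, so that the operator's
    # defaults (and the connection's extra parameters) apply otherwise.
    optional = {
        "sql_endpoint_name": sql_endpoint_name,
        "http_path": http_path,
        "output_path": output_path,
    }
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


def make_task(**kwargs: Any):
    """Create the operator; needs the Databricks provider package installed."""
    from airflow.providers.databricks.operators.databricks_sql import (
        DatabricksSqlOperator,
    )

    # task_id is a BaseOperator argument forwarded through **kwargs.
    return DatabricksSqlOperator(task_id="select_data", **kwargs)


example_kwargs = build_sql_operator_kwargs(
    sql="SELECT * FROM default.my_table LIMIT 10",  # hypothetical table
    sql_endpoint_name="my-endpoint",  # hypothetical endpoint name
    output_path="/tmp/my_table.jsonl",  # hypothetical output file
    output_format="jsonl",
)
```

Calling make_task(**example_kwargs) inside a DAG definition would then create the task. The format check here is a local pre-flight guard chosen for this sketch; it does not describe how the operator itself validates its input.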
- airflow.providers.databricks.operators.databricks_sql.COPY_INTO_APPROVED_FORMATS = ['CSV', 'JSON', 'AVRO', 'ORC', 'PARQUET', 'TEXT', 'BINARYFILE'][source]¶
- class airflow.providers.databricks.operators.databricks_sql.DatabricksCopyIntoOperator(*, table_name, file_location, file_format, databricks_conn_id=DatabricksSqlHook.default_conn_name, http_path=None, sql_endpoint_name=None, session_configuration=None, http_headers=None, client_parameters=None, catalog=None, schema=None, files=None, pattern=None, expression_list=None, credential=None, storage_credential=None, encryption=None, format_options=None, force_copy=None, copy_options=None, validate=None, **kwargs)[source]¶
Bases: airflow.models.BaseOperator
Executes COPY INTO command in a Databricks SQL endpoint or a Databricks cluster. The COPY INTO command is constructed from individual pieces that are described in the documentation.
See also
For more information on how to use this operator, take a look at the guide: DatabricksCopyIntoOperator
- Parameters
table_name (str) – Required name of the table. (templated)
file_location (str) – Required location of files to import. (templated)
file_format (str) – Required file format. Supported formats are CSV, JSON, AVRO, ORC, PARQUET, TEXT, BINARYFILE.
databricks_conn_id (str) – Reference to the Databricks connection ID. (templated)
http_path (Optional[str]) – Optional string specifying the HTTP path of the Databricks SQL endpoint or cluster. If not specified, it must either be set in the Databricks connection’s extra parameters, or sql_endpoint_name must be specified.
sql_endpoint_name (Optional[str]) – Optional name of the Databricks SQL endpoint. If not specified, http_path must be provided as described above.
session_configuration – An optional dictionary of Spark session parameters. Defaults to None. If not specified, it can be set in the Databricks connection’s extra parameters.
http_headers (Optional[List[Tuple[str, str]]]) – An optional list of (k, v) pairs that will be set as HTTP headers on every request.
catalog (Optional[str]) – An optional initial catalog to use. Requires DBR version 9.0+.
schema (Optional[str]) – An optional initial schema to use. Requires DBR version 9.0+.
client_parameters (Optional[Dict[str, Any]]) – Additional parameters internal to the Databricks SQL Connector.
files (Optional[List[str]]) – Optional list of files to import. Can’t be specified together with pattern. (templated)
pattern (Optional[str]) – Optional regex string to match file names to import. Can’t be specified together with files.
expression_list (Optional[str]) – Optional string that will be used in the SELECT expression.
credential (Optional[Dict[str, str]]) – Optional credential configuration for authentication against a source location.
storage_credential (Optional[str]) – Optional Unity Catalog storage credential for the destination.
encryption (Optional[Dict[str, str]]) – Optional encryption configuration for a specified location.
format_options (Optional[Dict[str, str]]) – Optional dictionary with options specific to a given file format.
force_copy (Optional[bool]) – Optional bool to control forcing of data import (can also be specified in copy_options).
validate (Optional[Union[bool, int]]) – Optional configuration for schema and data validation. True forces validation of all rows; an integer N validates only the first N rows.
copy_options (Optional[Dict[str, str]]) – Optional dictionary of copy options. Currently only the force option is supported.
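Two constraints stated above lend themselves to a check before the task is defined: file_format must be one of the supported formats, and files cannot be combined with pattern. The sketch below illustrates this under stated assumptions: build_copy_into_kwargs is a hypothetical helper, and the table name, file location, and format option are made up for illustration. It does not claim to reproduce the operator's internal validation.

```python
from typing import Any, Dict, List, Optional

# Mirrors the COPY_INTO_APPROVED_FORMATS attribute documented above.
APPROVED_FORMATS = ["CSV", "JSON", "AVRO", "ORC", "PARQUET", "TEXT", "BINARYFILE"]


def build_copy_into_kwargs(
    table_name: str,
    file_location: str,
    file_format: str,
    files: Optional[List[str]] = None,
    pattern: Optional[str] = None,
    format_options: Optional[Dict[str, str]] = None,
    force_copy: Optional[bool] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for DatabricksCopyIntoOperator."""
    if file_format not in APPROVED_FORMATS:
        raise ValueError(f"unsupported file_format: {file_format!r}")
    # The parameter list states that files and pattern are mutually exclusive.
    if files is not None and pattern is not None:
        raise ValueError("specify either files or pattern, not both")
    kwargs: Dict[str, Any] = {
        "table_name": table_name,
        "file_location": file_location,
        "file_format": file_format,
    }
    # Include optional arguments only when they were provided.
    optional = {
        "files": files,
        "pattern": pattern,
        "format_options": format_options,
        "force_copy": force_copy,
    }
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


copy_kwargs = build_copy_into_kwargs(
    table_name="default.my_table",  # hypothetical table
    file_location="s3://my-bucket/incoming/",  # hypothetical location
    file_format="CSV",
    pattern=r".*\.csv",
    format_options={"header": "true"},  # assumed CSV option, for illustration
    force_copy=True,
)
```

The resulting dictionary could be passed as DatabricksCopyIntoOperator(task_id="import_csv", **copy_kwargs) inside a DAG, where task_id is likewise an illustrative name.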