airflow.providers.databricks.operators.databricks_sql

This module contains Databricks operators.

Module Contents

Classes

DatabricksSqlOperator

Executes SQL code in a Databricks SQL endpoint or a Databricks cluster

DatabricksCopyIntoOperator

Executes COPY INTO command in a Databricks SQL endpoint or a Databricks cluster.

Attributes

COPY_INTO_APPROVED_FORMATS

class airflow.providers.databricks.operators.databricks_sql.DatabricksSqlOperator(*, sql, databricks_conn_id=DatabricksSqlHook.default_conn_name, http_path=None, sql_endpoint_name=None, parameters=None, session_configuration=None, do_xcom_push=False, output_path=None, output_format='csv', csv_params=None, **kwargs)[source]

Bases: airflow.models.BaseOperator

Executes SQL code in a Databricks SQL endpoint or a Databricks cluster

See also

For more information on how to use this operator, take a look at the guide: DatabricksSqlOperator

Parameters
  • databricks_conn_id (str) -- Reference to Databricks connection id

  • http_path (Optional[str]) -- Optional string specifying HTTP path of Databricks SQL Endpoint or cluster. If not specified, it should be either specified in the Databricks connection's extra parameters, or sql_endpoint_name must be specified.

  • sql_endpoint_name (Optional[str]) -- Optional name of Databricks SQL Endpoint. If not specified, http_path must be provided as described above.

  • sql (Union[str, List[str]]) -- the SQL code to be executed as a single string, or a list of str (sql statements), or a reference to a template file. (templated) Template references are recognized by str ending in '.sql'

  • parameters (Optional[Union[Mapping, Iterable]]) -- (optional) the parameters to render the SQL query with.

  • session_configuration -- An optional dictionary of Spark session parameters. Defaults to None. If not specified, it could be specified in the Databricks connection's extra parameters.

  • output_path (Optional[str]) -- optional string specifying the file to which write selected data. (templated)

  • output_format (str) -- format of output data if output_path` is specified. Possible values are ``csv, json, jsonl. Default is csv.

  • csv_params (Optional[Dict[str, Any]]) -- parameters that will be passed to the csv.DictWriter class used to write CSV data.

template_fields :Sequence[str] = ['sql', '_output_path'][source]
template_ext :Sequence[str] = ['.sql'][source]
template_fields_renderers[source]
execute(self, context)[source]

This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

airflow.providers.databricks.operators.databricks_sql.COPY_INTO_APPROVED_FORMATS = ['CSV', 'JSON', 'AVRO', 'ORC', 'PARQUET', 'TEXT', 'BINARYFILE'][source]
class airflow.providers.databricks.operators.databricks_sql.DatabricksCopyIntoOperator(*, table_name, file_location, file_format, databricks_conn_id=DatabricksSqlHook.default_conn_name, http_path=None, sql_endpoint_name=None, session_configuration=None, files=None, pattern=None, expression_list=None, credential=None, encryption=None, format_options=None, force_copy=None, copy_options=None, validate=None, **kwargs)[source]

Bases: airflow.models.BaseOperator

Executes COPY INTO command in a Databricks SQL endpoint or a Databricks cluster. COPY INTO command is constructed from individual pieces, that are described in documentation.

See also

For more information on how to use this operator, take a look at the guide: DatabricksCopyIntoOperator

Parameters
  • table_name (str) -- Required name of the table. (templated)

  • file_location (str) -- Required location of files to import. (templated)

  • file_format (str) -- Required file format. Supported formats are CSV, JSON, AVRO, ORC, PARQUET, TEXT, BINARYFILE.

  • databricks_conn_id (str) -- Reference to Databricks connection id

  • http_path (Optional[str]) -- Optional string specifying HTTP path of Databricks SQL Endpoint or cluster. If not specified, it should be either specified in the Databricks connection's extra parameters, or sql_endpoint_name must be specified.

  • sql_endpoint_name (Optional[str]) -- Optional name of Databricks SQL Endpoint. If not specified, http_path must be provided as described above.

  • files (Optional[List[str]]) -- optional list of files to import. Can't be specified together with pattern. (templated)

  • pattern (Optional[str]) -- optional regex string to match file names to import. Can't be specified together with files.

  • expression_list (Optional[str]) -- optional string that will be used in the SELECT expression.

  • credential (Optional[Dict[str, str]]) -- optional credential configuration for authentication against a specified location.

  • encryption (Optional[Dict[str, str]]) -- optional encryption configuration for a specified location.

  • format_options (Optional[Dict[str, str]]) -- optional dictionary with options specific for a given file format.

  • force_copy (Optional[bool]) -- optional bool to control forcing of data import (could be also specified in copy_options).

  • validate (Optional[Union[bool, int]]) -- optional configuration for schema & data validation. True forces validation of all rows, integer number - validate only N first rows

  • copy_options (Optional[Dict[str, str]]) -- optional dictionary of copy options. Right now only force option is supported.

template_fields :Sequence[str] = ['_file_location', '_files', '_table_name'][source]
execute(self, context)[source]

This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

Was this entry helpful?