DatabricksCopyIntoOperator

Use the DatabricksCopyIntoOperator to import data into a Databricks table using the COPY INTO command.

Using the Operator

The operator loads data from a specified location into a table using a configured endpoint.

Parameters

table_name: str

Required name of the table.

file_location: str

Required location of files to import.

file_format: str

Required file format. Supported formats are CSV, JSON, AVRO, ORC, PARQUET, TEXT, BINARYFILE.

sql_endpoint_name: str

Optional name of the Databricks SQL endpoint to use. If not specified, http_path should be provided.

http_path: str

Optional HTTP path of a Databricks SQL endpoint or Databricks cluster. If not specified, it should be provided in the Databricks connection, or the sql_endpoint_name parameter must be set.

session_configuration: dict[str,str]

Optional dict specifying Spark configuration parameters that will be set for the session.

files: Optional[List[str]]

Optional list of files to import. Can't be specified together with pattern.

pattern: Optional[str]

Optional regex string to match file names to import. Can't be specified together with files.

expression_list: Optional[str]

Optional string that will be used in the SELECT expression.

credential: Optional[Dict[str, str]]

Optional credential configuration for authentication against the specified location.

encryption: Optional[Dict[str, str]]

Optional encryption configuration for the specified location.

format_options: Optional[Dict[str, str]]

Optional dictionary with options specific to a given file format.

force_copy: Optional[bool]

Optional bool to control forcing of the data import (can also be specified in copy_options).

copy_options: Optional[Dict[str, str]]

Optional dictionary of copy options. Currently only the force option is supported.

validate: Optional[Union[bool, int]]

Optional validation configuration. If True, all rows are validated; if set to a positive number, only the first N rows are validated. (Requires the Preview channel.)
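
Either sql_endpoint_name or http_path identifies where the COPY INTO statement runs. A minimal sketch of the http_path variant, assuming a Databricks connection named 'databricks_default'; the HTTP path and storage location below are placeholders:

    # Hypothetical sketch: pointing the operator at an endpoint via http_path
    # instead of sql_endpoint_name. Connection id, HTTP path, and storage
    # location are placeholders, not values from the provider's examples.
    import_parquet = DatabricksCopyIntoOperator(
        task_id='import_parquet',
        databricks_conn_id='databricks_default',
        http_path='/sql/1.0/warehouses/abcdef1234567890',
        table_name="my_table",
        file_format="PARQUET",
        file_location="abfss://container@account.dfs.core.windows.net/my-data/parquet",
    )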

Examples

Importing CSV data

An example usage of the DatabricksCopyIntoOperator to import CSV data into a table is as follows:

airflow/providers/databricks/example_dags/example_databricks_sql.py[source]

    # Example of importing data using the COPY INTO SQL command
    import_csv = DatabricksCopyIntoOperator(
        task_id='import_csv',
        databricks_conn_id=connection_id,
        sql_endpoint_name=sql_endpoint_name,
        table_name="my_table",
        file_format="CSV",
        file_location="abfss://container@account.dfs.core.windows.net/my-data/csv",
        format_options={'header': 'true'},
        force_copy=True,
    )
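
Importing JSON data selectively

Several of the optional parameters can be combined. The following is a sketch, not taken from the provider's example DAGs; the connection id, endpoint name, location, and pattern are placeholders. It imports only JSON files whose names match a pattern and validates the first 10 rows:

    # Hypothetical sketch: selective import with a file name pattern and row validation.
    # connection_id, sql_endpoint_name, the file location, and the pattern are placeholders.
    import_json = DatabricksCopyIntoOperator(
        task_id='import_json',
        databricks_conn_id=connection_id,
        sql_endpoint_name=sql_endpoint_name,
        table_name="my_table",
        file_format="JSON",
        file_location="abfss://container@account.dfs.core.windows.net/my-data/json",
        pattern=".*[.]json",
        validate=10,  # validate only the first 10 rows (requires the Preview channel)
    )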
