DatabricksCopyIntoOperator¶
Use the DatabricksCopyIntoOperator to import data into a Databricks table using the COPY INTO command.
Using the Operator¶
The operator loads data from a specified location into a table using a configured endpoint.
| Parameter | Input |
|---|---|
| table_name: str | Required name of the table. |
| file_location: str | Required location of files to import. |
| file_format: str | Required file format. Supported formats are CSV, JSON, AVRO, ORC, PARQUET, TEXT, BINARYFILE. |
| sql_endpoint_name: str | Optional name of the Databricks SQL endpoint to use. If not specified, http_path must be provided. |
| http_path: str | Optional HTTP path of the Databricks SQL endpoint or Databricks cluster. If not specified, it should be provided in the Databricks connection, or sql_endpoint_name must be specified. |
| session_configuration: dict[str,str] | Optional dict specifying Spark configuration parameters that will be set for the session. |
| files: Optional[List[str]] | Optional list of files to import. Can't be specified together with pattern. |
| pattern: Optional[str] | Optional regex string to match file names to import. Can't be specified together with files. |
| expression_list: Optional[str] | Optional string that will be used in the SELECT expression. |
| credential: Optional[Dict[str, str]] | Optional credential configuration for authentication against a specified location. |
| encryption: Optional[Dict[str, str]] | Optional encryption configuration for a specified location. |
| format_options: Optional[Dict[str, str]] | Optional dictionary with options specific to a given file format. |
| force_copy: Optional[bool] | Optional bool to control forcing of data import (can also be specified in copy_options). |
| copy_options: Optional[Dict[str, str]] | Optional dictionary of copy options. Right now only the force option is supported. |
| validate: Optional[Union[bool, int]] | Optional validation configuration: True validates all rows, an integer N validates only the first N rows. |
Examples¶
Importing CSV data¶
An example usage of the DatabricksCopyIntoOperator to import CSV data into a table is as follows:
from airflow.providers.databricks.operators.databricks_sql import DatabricksCopyIntoOperator

# Connection ID and SQL endpoint name are placeholders; set them for your environment.
connection_id = "databricks_default"
sql_endpoint_name = "my-sql-endpoint"

# Example of importing data using the COPY INTO SQL command
import_csv = DatabricksCopyIntoOperator(
    task_id="import_csv",
    databricks_conn_id=connection_id,
    sql_endpoint_name=sql_endpoint_name,
    table_name="my_table",
    file_format="CSV",
    file_location="abfss://container@account.dfs.core.windows.net/my-data/csv",
    format_options={"header": "true"},
    force_copy=True,
)
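Importing Parquet data with pattern matching and validation¶
The following is a hypothetical sketch, not taken from the provider's example DAGs: it reuses the placeholder connection_id and sql_endpoint_name from the example above, and the table name, file location, and regex are illustrative. It shows how the pattern, copy_options, and validate parameters from the table above might be combined:

# Hypothetical sketch combining pattern, copy_options, and validate.
# All names, paths, and the regex below are illustrative placeholders.
import_parquet = DatabricksCopyIntoOperator(
    task_id="import_parquet",
    databricks_conn_id=connection_id,
    sql_endpoint_name=sql_endpoint_name,
    table_name="my_table",
    file_format="PARQUET",
    file_location="abfss://container@account.dfs.core.windows.net/my-data/parquet",
    pattern=r"part-.*\.parquet",  # regex matched against file names; mutually exclusive with files
    copy_options={"force": "true"},  # equivalent to force_copy=True
    validate=10,  # validate only the first 10 rows
)

Because pattern and files are mutually exclusive, pass an explicit files=[...] list instead of a regex when the exact file names are known.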