DatabricksCopyIntoOperator

Use the DatabricksCopyIntoOperator to import data into a Databricks table using the COPY INTO command.

Using the Operator

The operator loads data from a specified location into a table using a configured endpoint.

Parameters

table_name: str

Required name of the table.

file_location: str

Required location of files to import.

file_format: str

Required file format. Supported formats are CSV, JSON, AVRO, ORC, PARQUET, TEXT, BINARYFILE.

sql_endpoint_name: str

Optional name of the Databricks SQL endpoint to use. If not specified, http_path should be provided.

http_path: str

Optional HTTP path of a Databricks SQL endpoint or Databricks cluster. If not specified, it should be provided in the Databricks connection, or the sql_endpoint_name parameter must be set.

session_configuration: dict[str,str]

Optional dict specifying Spark configuration parameters that will be set for the session.

files: Optional[List[str]]

Optional list of files to import. Can't be specified together with pattern.

pattern: Optional[str]

Optional regex string to match file names to import. Can't be specified together with files.

expression_list: Optional[str]

Optional string that will be used in the SELECT expression.

credential: Optional[Dict[str, str]]

Optional credential configuration for authentication against the specified location.

encryption: Optional[Dict[str, str]]

Optional encryption configuration for the specified location.

format_options: Optional[Dict[str, str]]

Optional dictionary with options specific to a given file format.

force_copy: Optional[bool]

Optional bool to control forcing of the data import (can also be specified in copy_options).

copy_options: Optional[Dict[str, str]]

Optional dictionary of copy options. Currently only the force option is supported.

validate: Optional[Union[bool, int]]

Optional validation configuration. If True, all rows are validated; if set to a positive number, only the first N rows are validated. (Requires the Preview channel.)
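
Either sql_endpoint_name or http_path identifies where the COPY INTO statement runs. A minimal sketch of the http_path variant, assuming a Databricks connection named 'databricks_default'; the HTTP path and storage location below are placeholders:

    # Hypothetical sketch: pointing the operator at an endpoint via http_path
    # instead of sql_endpoint_name. Connection id, HTTP path, and storage
    # location are placeholders, not values from the provider's examples.
    import_parquet = DatabricksCopyIntoOperator(
        task_id='import_parquet',
        databricks_conn_id='databricks_default',
        http_path='/sql/1.0/warehouses/abcdef1234567890',
        table_name="my_table",
        file_format="PARQUET",
        file_location="abfss://container@account.dfs.core.windows.net/my-data/parquet",
    )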

Examples

Importing CSV data

An example usage of the DatabricksCopyIntoOperator to import CSV data into a table is as follows:

airflow/providers/databricks/example_dags/example_databricks_sql.py[source]

    # Example of importing data using the COPY INTO SQL command
    import_csv = DatabricksCopyIntoOperator(
        task_id='import_csv',
        databricks_conn_id=connection_id,
        sql_endpoint_name=sql_endpoint_name,
        table_name="my_table",
        file_format="CSV",
        file_location="abfss://container@account.dfs.core.windows.net/my-data/csv",
        format_options={'header': 'true'},
        force_copy=True,
    )
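
Importing JSON data selectively

Several of the optional parameters can be combined. The following is a sketch, not taken from the provider's example DAGs; the connection id, endpoint name, location, and pattern are placeholders. It imports only JSON files whose names match a pattern and validates the first 10 rows:

    # Hypothetical sketch: selective import with a file name pattern and row validation.
    # connection_id, sql_endpoint_name, the file location, and the pattern are placeholders.
    import_json = DatabricksCopyIntoOperator(
        task_id='import_json',
        databricks_conn_id=connection_id,
        sql_endpoint_name=sql_endpoint_name,
        table_name="my_table",
        file_format="JSON",
        file_location="abfss://container@account.dfs.core.windows.net/my-data/json",
        pattern=".*[.]json",
        validate=10,  # validate only the first 10 rows (requires the Preview channel)
    )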
