DatabricksCopyIntoOperator¶
Use the DatabricksCopyIntoOperator to import data into a Databricks table using the COPY INTO command.
Using the Operator¶
The operator loads data from a specified location into a table using a configured endpoint. The only required parameters are:
- table_name - string with the table name
- file_location - string with the URI of data to load
- file_format - string specifying the file format of data to load. Supported formats are CSV, JSON, AVRO, ORC, PARQUET, TEXT, BINARYFILE.
- One of sql_endpoint_name (name of the Databricks SQL endpoint to use) or http_path (HTTP path for a Databricks SQL endpoint or Databricks cluster).
Other parameters are optional and can be found in the class documentation.
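For instance, a minimal sketch using only the required parameters, with http_path instead of sql_endpoint_name, might look like the following (the connection ID, HTTP path, table name, and file location are placeholder values, not taken from this page):

    from airflow.providers.databricks.operators.databricks_sql import DatabricksCopyIntoOperator

    # Minimal sketch with only the required parameters set; the HTTP path
    # below is a placeholder pointing at a Databricks SQL endpoint.
    import_json = DatabricksCopyIntoOperator(
        task_id="import_json",
        databricks_conn_id="databricks_default",
        http_path="/sql/1.0/endpoints/1234567890abcdef",
        table_name="my_table",
        file_format="JSON",
        file_location="s3://my-bucket/json-data/",
    )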
Examples¶
Importing CSV data¶
An example usage of the DatabricksCopyIntoOperator to import CSV data into a table is as follows:
    from airflow.providers.databricks.operators.databricks_sql import DatabricksCopyIntoOperator

    # Example of importing data using the COPY INTO SQL command;
    # connection_id and sql_endpoint_name are defined elsewhere in the DAG file.
    import_csv = DatabricksCopyIntoOperator(
        task_id="import_csv",
        databricks_conn_id=connection_id,
        sql_endpoint_name=sql_endpoint_name,
        table_name="my_table",
        file_format="CSV",
        file_location="abfss://container@account.dfs.core.windows.net/my-data/csv",
        # treat the first line of each CSV file as a header row
        format_options={"header": "true"},
        # re-load files even if they have already been loaded before
        force_copy=True,
    )
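The optional parameters mirror the clauses of the COPY INTO command. As an illustrative sketch only (the pattern and location values are placeholders; check the class documentation for the full parameter list), restricting a load to files matching a glob pattern might look like:

    # Sketch of a filtered import: pattern restricts which files under
    # file_location are loaded (values here are illustrative only).
    import_filtered = DatabricksCopyIntoOperator(
        task_id="import_filtered",
        databricks_conn_id=connection_id,
        sql_endpoint_name=sql_endpoint_name,
        table_name="my_table",
        file_format="CSV",
        file_location="abfss://container@account.dfs.core.windows.net/my-data/csv",
        pattern="*.csv",
        format_options={"header": "true"},
    )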