airflow.providers.microsoft.azure.transfers.local_to_adls

Module Contents

class airflow.providers.microsoft.azure.transfers.local_to_adls.LocalToAzureDataLakeStorageOperator(*, local_path: str, remote_path: str, overwrite: bool = True, nthreads: int = 64, buffersize: int = 4194304, blocksize: int = 4194304, extra_upload_options: Optional[Dict[str, Any]] = None, azure_data_lake_conn_id: str = 'azure_data_lake_default', **kwargs)[source]

Bases: airflow.models.BaseOperator

Upload file(s) to Azure Data Lake

See also

For more information on how to use this operator, take a look at the guide: LocalToAzureDataLakeStorageOperator

Parameters
  • local_path (str) -- Local path. Can be a single file, a directory (in which case, upload recursively), or a glob pattern. Recursive glob patterns using ** are not supported.

  • remote_path (str) -- Remote path to upload to; if multiple files, this is the directory root to write within

  • nthreads (int) -- Number of threads to use. If None, uses the number of cores.

  • overwrite (bool) -- Whether to forcibly overwrite existing files/directories. If False and the remote path is a directory, the operator will quit regardless of whether any files would be overwritten. If True, only matching filenames are actually overwritten.

  • buffersize (int) -- Number of bytes for the internal buffer (default 2**22). The buffer cannot be bigger than a chunk and cannot be smaller than a block.

  • blocksize (int) -- Number of bytes for a block (default 2**22). Within each chunk, a smaller block is written for each API call. This block cannot be bigger than a chunk.

  • extra_upload_options (dict) -- Extra upload options to add to the hook upload method

  • azure_data_lake_conn_id (str) -- Reference to the Azure Data Lake connection

template_fields = ['local_path', 'remote_path'][source]
ui_color = '#e4f0e8'[source]
execute(self, context: dict)[source]
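
Below is a minimal usage sketch. The DAG id, file paths, and schedule are illustrative; it assumes an Azure Data Lake connection registered under the default connection id 'azure_data_lake_default'.

from datetime import datetime

from airflow import DAG
from airflow.providers.microsoft.azure.transfers.local_to_adls import (
    LocalToAzureDataLakeStorageOperator,
)

with DAG(
    dag_id="example_local_to_adls",       # illustrative DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    upload_file = LocalToAzureDataLakeStorageOperator(
        task_id="upload_file_to_adls",
        local_path="/tmp/example.csv",     # a single file; a directory or glob pattern also works
        remote_path="landing/example.csv", # remote path (or directory root for multiple files)
        overwrite=True,
        nthreads=64,
        azure_data_lake_conn_id="azure_data_lake_default",
    )

When local_path matches multiple files (a directory or glob pattern), remote_path is treated as the directory root to write within rather than a single target file.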
