:mod:`airflow.contrib.hooks.azure_data_lake_hook`
=================================================

.. py:module:: airflow.contrib.hooks.azure_data_lake_hook


Module Contents
---------------

.. py:class:: AzureDataLakeHook(azure_data_lake_conn_id='azure_data_lake_default')

   Bases: :class:`airflow.hooks.base_hook.BaseHook`

   Interacts with Azure Data Lake.

   The client ID and client secret should be in the user and password parameters
   of the connection. The tenant and account name should be in the extra field as
   ``{"tenant": "<TENANT>", "account_name": "ACCOUNT_NAME"}``.

   :param azure_data_lake_conn_id: Reference to the Azure Data Lake connection.
   :type azure_data_lake_conn_id: str

   .. method:: get_conn(self)

      Return an AzureDLFileSystem object.

   .. method:: check_for_file(self, file_path)

      Check if a file exists on Azure Data Lake.

      :param file_path: Path and name of the file.
      :type file_path: str
      :return: True if the file exists, False otherwise.
      :rtype: bool

   .. method:: upload_file(self, local_path, remote_path, nthreads=64, overwrite=True, buffersize=4194304, blocksize=4194304)

      Upload a file to Azure Data Lake.

      :param local_path: Local path. Can be a single file, a directory (in which
          case it is uploaded recursively), or a glob pattern. Recursive glob
          patterns using ``**`` are not supported.
      :type local_path: str
      :param remote_path: Remote path to upload to. If uploading multiple files,
          this is the root directory to write within.
      :type remote_path: str
      :param nthreads: Number of threads to use. If None, uses the number of cores.
      :type nthreads: int
      :param overwrite: Whether to forcibly overwrite existing files/directories.
          If False and the remote path is a directory, the upload quits regardless
          of whether any files would be overwritten. If True, only files with
          matching names are actually overwritten.
      :type overwrite: bool
      :param buffersize: Number of bytes for the internal buffer (default ``2**22``).
          The buffer cannot be bigger than a chunk and cannot be smaller than a block.
      :type buffersize: int
      :param blocksize: Number of bytes for a block (default ``2**22``). Within each
          chunk, a smaller block is written for each API call. A block cannot be
          bigger than a chunk.
      :type blocksize: int

   .. method:: download_file(self, local_path, remote_path, nthreads=64, overwrite=True, buffersize=4194304, blocksize=4194304)

      Download a file from Azure Data Lake.

      :param local_path: Local path. If downloading a single file, the data is
          written to this specific file, unless it is an existing directory, in
          which case a file is created within it. If downloading multiple files,
          this is the root directory to write within. Directories are created as
          required.
      :type local_path: str
      :param remote_path: Remote path or glob string used to find remote files.
          Recursive glob patterns using ``**`` are not supported.
      :type remote_path: str
      :param nthreads: Number of threads to use. If None, uses the number of cores.
      :type nthreads: int
      :param overwrite: Whether to forcibly overwrite existing files/directories.
          If False and remote path is a directory, the download quits regardless
          of whether any files would be overwritten. If True, only files with
          matching names are actually overwritten.
      :type overwrite: bool
      :param buffersize: Number of bytes for the internal buffer (default ``2**22``).
          The buffer cannot be bigger than a chunk and cannot be smaller than a block.
      :type buffersize: int
      :param blocksize: Number of bytes for a block (default ``2**22``). Within each
          chunk, a smaller block is written for each API call. A block cannot be
          bigger than a chunk.
      :type blocksize: int
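
The documented methods can be combined into a guarded upload. The following is a minimal sketch, not part of the module: ``build_extra`` and ``upload_if_absent`` are hypothetical helper names, and the paths in the trailing comments are placeholders. The helper only relies on ``check_for_file`` and ``upload_file`` with the signatures listed above, so it accepts any object that exposes them.

```python
import json


def build_extra(tenant, account_name):
    """Serialize the connection's extra field in the documented shape.

    Hypothetical helper: the hook itself only reads this JSON from the
    Airflow connection; it does not provide a builder for it.
    """
    return json.dumps({"tenant": tenant, "account_name": account_name})


def upload_if_absent(hook, local_path, remote_path):
    """Upload local_path unless remote_path already exists.

    `hook` is any object exposing check_for_file and upload_file as
    documented for AzureDataLakeHook. Returns True if an upload was
    performed and False if it was skipped.
    """
    if hook.check_for_file(remote_path):
        return False
    # overwrite=False is a second guard in case the file appears
    # between the existence check and the upload.
    hook.upload_file(
        local_path=local_path,
        remote_path=remote_path,
        overwrite=False,
    )
    return True


# Assumed usage in an environment where Airflow and its Azure
# dependencies are installed (placeholder paths):
#
#   from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook
#   hook = AzureDataLakeHook(azure_data_lake_conn_id="azure_data_lake_default")
#   upload_if_absent(hook, "/tmp/report.csv", "reports/report.csv")
```

Passing the hook in as an argument keeps the helper easy to exercise with a stub object in unit tests, without a live Azure connection.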