:mod:`airflow.hooks.webhdfs_hook`
=================================

.. py:module:: airflow.hooks.webhdfs_hook







Module Contents
---------------






.. data:: _kerberos_security_mode
   

   









.. data:: log
   

   









.. py:exception:: AirflowWebHDFSHookException

   Bases: :class:`airflow.exceptions.AirflowException`

   









.. py:class:: WebHDFSHook(webhdfs_conn_id='webhdfs_default', proxy_user=None)

   Bases: :class:`airflow.hooks.base_hook.BaseHook`

   

   Interact with HDFS. This class is a wrapper around the hdfscli library.


   

   

   

   .. method:: get_conn(self)

      
      Returns an hdfscli ``InsecureClient`` object.
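
      A minimal sketch of how the returned client might be used, assuming it
      exposes hdfscli's ``list`` method. The helper name ``list_directory`` and
      the path ``/tmp`` are hypothetical and not part of this module.

      .. code-block:: python

         def list_directory(hook, hdfs_path):
             """Return the sorted entry names under ``hdfs_path``.

             ``hook`` is expected to be a WebHDFSHook, or any object whose
             ``get_conn()`` returns an hdfscli client.
             """
             client = hook.get_conn()
             # hdfscli's Client.list returns the names of the directory entries.
             return sorted(client.list(hdfs_path))


         # Possible use inside an Airflow task (needs a configured connection):
         #
         #     from airflow.hooks.webhdfs_hook import WebHDFSHook
         #     hook = WebHDFSHook(webhdfs_conn_id="webhdfs_default")
         #     print(list_directory(hook, "/tmp"))

      Passing the hook in as an argument keeps the helper easy to exercise with
      a stand-in object when no HDFS cluster is available.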

      



   

   .. method:: check_for_path(self, hdfs_path)

      
      Check for the existence of a path in HDFS by querying FileStatus.
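
      As an illustration, the check could be used to find which of several
      expected paths are absent. This sketch assumes ``check_for_path`` returns
      a truthy value when the path exists; ``missing_paths`` is a hypothetical
      helper, not part of this module.

      .. code-block:: python

         def missing_paths(hook, hdfs_paths):
             """Return the paths from ``hdfs_paths`` that do not exist in HDFS.

             Input order is preserved. ``hook`` is expected to be a WebHDFSHook.
             """
             return [path for path in hdfs_paths if not hook.check_for_path(path)]


         # Possible use inside an Airflow task (needs a configured connection):
         #
         #     from airflow.hooks.webhdfs_hook import WebHDFSHook
         #     hook = WebHDFSHook()
         #     print(missing_paths(hook, ["/data/in", "/data/out"]))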

      



   

   .. method:: load_file(self, source, destination, overwrite=True, parallelism=1, **kwargs)

      
      Uploads a file or folder to HDFS.

      :param source: Local path to file or folder. If a folder, all the files
        inside of it will be uploaded (note that this implies that folders empty
        of files will not be created remotely).
      :type source: str
      :param destination: Target HDFS path. If it already exists and is a
        directory, files will be uploaded inside.
      :type destination: str
      :param overwrite: Overwrite any existing file or directory.
      :type overwrite: bool
      :param parallelism: Number of threads to use for parallelization. A value of
        ``0`` (or negative) uses as many threads as there are files.
      :type parallelism: int
      :param \*\*kwargs: Keyword arguments forwarded to the hdfscli client's
        ``upload`` method.
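
      A minimal sketch of calling ``load_file`` with the parameters described
      above, guarded by ``check_for_path`` so that an existing destination is
      left alone. The helper ``upload_if_absent`` and the example paths are
      hypothetical. The sketch also assumes ``check_for_path`` returns a truthy
      value when the path exists.

      .. code-block:: python

         def upload_if_absent(hook, source, destination, parallelism=1):
             """Upload ``source`` to ``destination`` unless it already exists.

             Returns True if ``load_file`` was called, False if it was skipped.
             ``hook`` is expected to be a WebHDFSHook.
             """
             if hook.check_for_path(destination):
                 return False
             # overwrite=False asks the hook not to replace existing remote data.
             hook.load_file(
                 source, destination, overwrite=False, parallelism=parallelism
             )
             return True


         # Possible use inside an Airflow task (needs a configured connection):
         #
         #     from airflow.hooks.webhdfs_hook import WebHDFSHook
         #     hook = WebHDFSHook()
         #     upload_if_absent(hook, "/tmp/report.csv", "/data/report.csv")

      The existence check and the upload are separate requests, so another
      writer could create the destination between the two calls.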