airflow.providers.apache.hdfs.hooks.webhdfs¶
Hook for Web HDFS
Module Contents¶
Classes¶
| WebHDFSHook | Interact with HDFS. This class is a wrapper around the hdfscli library. |
- exception airflow.providers.apache.hdfs.hooks.webhdfs.AirflowWebHDFSHookException[source]¶
- Bases: airflow.exceptions.AirflowException
  Exception specific for WebHDFS hook.
- class airflow.providers.apache.hdfs.hooks.webhdfs.WebHDFSHook(webhdfs_conn_id='webhdfs_default', proxy_user=None)[source]¶
- Bases: airflow.hooks.base.BaseHook
  Interact with HDFS. This class is a wrapper around the hdfscli library.
  - Parameters
  - webhdfs_conn_id (str) -- The connection id for the webhdfs client to connect to. Defaults to 'webhdfs_default'.
  - proxy_user (str | None) -- The user used to authenticate.
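A minimal instantiation sketch, assuming an Airflow connection named webhdfs_default has been configured; the proxy_user value here is purely illustrative:

    from airflow.providers.apache.hdfs.hooks.webhdfs import WebHDFSHook

    # "webhdfs_default" is the hook's default connection id; "etl_user" is a
    # hypothetical proxy user and can be omitted (it defaults to None).
    hook = WebHDFSHook(webhdfs_conn_id="webhdfs_default", proxy_user="etl_user")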
 - get_conn(self)[source]¶
- Establishes a connection depending on the security mode set via config or environment variable.
  Returns: a hdfscli InsecureClient or KerberosClient object.
  Return type: hdfs.InsecureClient or hdfs.ext.kerberos.KerberosClient
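A hedged sketch of using the returned client directly. Because get_conn() hands back a plain hdfscli client, any hdfs.InsecureClient (or hdfs.ext.kerberos.KerberosClient) method is available on it; the path below is illustrative:

    from airflow.providers.apache.hdfs.hooks.webhdfs import WebHDFSHook

    hook = WebHDFSHook(webhdfs_conn_id="webhdfs_default")
    client = hook.get_conn()

    # hdfs.client.Client.status() returns the FileStatus dict for a path,
    # or None when strict=False and the path does not exist.
    print(client.status("/tmp", strict=False))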
 - check_for_path(self, hdfs_path)[source]¶
- Check for the existence of a path in HDFS by querying FileStatus. Returns True if the path exists, False otherwise.
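For example (the HDFS path is hypothetical), the boolean result can gate downstream logic directly:

    from airflow.providers.apache.hdfs.hooks.webhdfs import WebHDFSHook

    hook = WebHDFSHook(webhdfs_conn_id="webhdfs_default")
    if hook.check_for_path("/data/landing/2024-01-01"):
        print("partition already present, skipping upload")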
 - load_file(self, source, destination, overwrite=True, parallelism=1, **kwargs)[source]¶
- Uploads a file to HDFS. (A usage sketch follows the parameter list.)
  - Parameters
- source (str) -- Local path to file or folder. If it's a folder, all the files inside it will be uploaded. Note: this implies that folders empty of files will not be created remotely.
- destination (str) -- Target HDFS path. If it already exists and is a directory, files will be uploaded inside.
- overwrite (bool) -- Overwrite any existing file or directory. 
- parallelism (int) -- Number of threads to use for parallelization. A value of 0 (or negative) uses as many threads as there are files. 
- kwargs (Any) -- Keyword arguments forwarded to hdfs.client.Client.upload().
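A usage sketch with hypothetical local and HDFS paths; any extra keyword arguments are passed straight through to hdfs.client.Client.upload():

    from airflow.providers.apache.hdfs.hooks.webhdfs import WebHDFSHook

    hook = WebHDFSHook(webhdfs_conn_id="webhdfs_default")
    hook.load_file(
        source="/tmp/report.csv",      # a directory here would upload its files
        destination="/data/reports",   # existing directory: file lands inside it
        overwrite=True,
        parallelism=1,
    )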