airflow.providers.apache.hdfs.hooks.webhdfs¶

Hook for Web HDFS.

Module Contents¶
exception airflow.providers.apache.hdfs.hooks.webhdfs.AirflowWebHDFSHookException[source]¶

Bases: airflow.exceptions.AirflowException

Exception specific for WebHDFS hook.
class airflow.providers.apache.hdfs.hooks.webhdfs.WebHDFSHook(webhdfs_conn_id: str = 'webhdfs_default', proxy_user: Optional[str] = None)[source]¶

Bases: airflow.hooks.base.BaseHook

Interact with HDFS. This class is a wrapper around the hdfscli library.

Parameters

webhdfs_conn_id (str) -- The connection id for the webhdfs client to connect to.

proxy_user (Optional[str]) -- The user used to authenticate.
get_conn(self)[source]¶

Establishes a connection depending on the security mode set via config or environment variable.

Returns: a hdfscli InsecureClient or KerberosClient object.

Return type: hdfs.InsecureClient or hdfs.ext.kerberos.KerberosClient
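For illustration, a minimal sketch of working with the raw client returned by get_conn. The connection id below is the documented default; the helper name is hypothetical, and the Airflow import is deferred into the function so the sketch stays self-contained:

```python
def fetch_client(conn_id: str = "webhdfs_default"):
    # Deferred import: constructing the hook requires an Airflow
    # installation and a configured connection with the given id.
    from airflow.providers.apache.hdfs.hooks.webhdfs import WebHDFSHook

    # get_conn() returns the underlying hdfscli client (InsecureClient,
    # or KerberosClient when Kerberos security is enabled), so any
    # hdfscli method such as list(), status() or read() works on it.
    return WebHDFSHook(webhdfs_conn_id=conn_id).get_conn()
```

For example, `fetch_client().list("/tmp")` would list the contents of /tmp via WebHDFS.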
check_for_path(self, hdfs_path: str)[source]¶

Check for the existence of a path in HDFS by querying FileStatus.

Returns: True if the path exists and False if not.

Return type: bool
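A sketch of using check_for_path as an existence guard before heavier work; the function name, path, and error handling are illustrative assumptions:

```python
def require_hdfs_path(hdfs_path: str, conn_id: str = "webhdfs_default") -> None:
    # Deferred import so the sketch can be read without Airflow installed.
    from airflow.providers.apache.hdfs.hooks.webhdfs import WebHDFSHook

    hook = WebHDFSHook(webhdfs_conn_id=conn_id)
    # check_for_path() issues a single FileStatus query and returns a
    # boolean, which makes it a cheap pre-flight check.
    if not hook.check_for_path(hdfs_path):
        raise FileNotFoundError(f"{hdfs_path} does not exist in HDFS")
```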
load_file(self, source: str, destination: str, overwrite: bool = True, parallelism: int = 1, **kwargs)[source]¶

Uploads a file to HDFS.

Parameters

source (str) -- Local path to file or folder. If it's a folder, all the files inside it will be uploaded. Note: this implies that folders empty of files will not be created remotely.

destination (str) -- Target HDFS path. If it already exists and is a directory, files will be uploaded inside.

overwrite (bool) -- Overwrite any existing file or directory.

parallelism (int) -- Number of threads to use for parallelization. A value of 0 (or negative) uses as many threads as there are files.

kwargs -- Keyword arguments forwarded to hdfs.client.Client.upload().
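Putting load_file together: a sketch that uploads a local file into an HDFS directory. The paths, connection id, and helper names are hypothetical. HDFS paths are POSIX-style on every OS, hence posixpath rather than os.path:

```python
import posixpath


def hdfs_destination(base_dir: str, filename: str) -> str:
    # Build the remote path with posixpath so the sketch behaves the
    # same on Windows and Unix local hosts.
    return posixpath.join(base_dir, filename)


def upload(local_path: str, base_dir: str, conn_id: str = "webhdfs_default") -> None:
    # Deferred import so the sketch can be read without Airflow installed.
    from airflow.providers.apache.hdfs.hooks.webhdfs import WebHDFSHook

    hook = WebHDFSHook(webhdfs_conn_id=conn_id)
    # overwrite=True replaces any existing file at the destination;
    # parallelism=0 lets the hook use one thread per file when
    # local_path is a folder.
    hook.load_file(
        source=local_path,
        destination=hdfs_destination(base_dir, posixpath.basename(local_path)),
        overwrite=True,
        parallelism=0,
    )
```

For example, `upload("reports/daily.csv", "/warehouse/reports")` would place the file at /warehouse/reports/daily.csv.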