airflow.providers.apache.hdfs.hooks.webhdfs¶
Hook for Web HDFS.
Module Contents¶
Classes¶
WebHDFSHook – Interact with HDFS. This class is a wrapper around the hdfscli library.
Attributes¶
- exception airflow.providers.apache.hdfs.hooks.webhdfs.AirflowWebHDFSHookException[source]¶
Bases:
airflow.exceptions.AirflowException
Exception specific for WebHDFS hook.
- class airflow.providers.apache.hdfs.hooks.webhdfs.WebHDFSHook(webhdfs_conn_id=default_conn_name, proxy_user=None)[source]¶
Bases:
airflow.hooks.base.BaseHook
Interact with HDFS. This class is a wrapper around the hdfscli library.
- Parameters
webhdfs_conn_id (str) – The connection id for the webhdfs client to connect to.
proxy_user (str | None) – The user used to authenticate.
- get_conn()[source]¶
Establish a connection depending on the security mode set via config or environment variable.
- Returns
an hdfscli InsecureClient or KerberosClient object.
- Return type
Any
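The choice between the two client classes can be sketched as follows. This is an illustrative simplification, not the hook's actual code: the real hook reads the security mode from Airflow's configuration, whereas here the mode is passed in directly.

```python
def client_class_for(security_mode: str) -> str:
    """Return the name of the hdfscli client class the hook would build.

    Hypothetical sketch: mirrors the documented behavior of get_conn(),
    which selects a Kerberos-enabled client only in Kerberos security mode.
    """
    if security_mode == "kerberos":
        return "KerberosClient"  # hdfs.ext.kerberos.KerberosClient
    return "InsecureClient"      # hdfs.InsecureClient


print(client_class_for("kerberos"))
print(client_class_for(""))
```

In a real deployment the hook also resolves the host and port from the Airflow connection identified by `webhdfs_conn_id` before constructing the client.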
- check_for_path(hdfs_path)[source]¶
Check for the existence of a path in HDFS by querying FileStatus.
- Parameters
hdfs_path (str) – The path to check.
- Returns
True if the path exists and False if not.
- Return type
bool
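The FileStatus query behind this method can be sketched with a stand-in client. `FakeHdfsClient` below is hypothetical and exists only for illustration; it mimics hdfscli's `status()` semantics, where `strict=False` returns `None` for a missing path instead of raising.

```python
class FakeHdfsClient:
    """Illustrative stand-in for an hdfscli client (not part of the hook)."""

    def __init__(self, existing_paths):
        self._paths = set(existing_paths)

    def status(self, hdfs_path, strict=True):
        # hdfscli semantics: with strict=False, a missing path yields None.
        if hdfs_path in self._paths:
            return {"type": "FILE", "length": 0}
        if strict:
            raise FileNotFoundError(hdfs_path)
        return None


def check_for_path(client, hdfs_path):
    """Sketch of the check: truthiness of a non-strict FileStatus query."""
    return bool(client.status(hdfs_path, strict=False))


client = FakeHdfsClient(["/data/input.csv"])
print(check_for_path(client, "/data/input.csv"))  # True
print(check_for_path(client, "/data/missing"))    # False
```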
- load_file(source, destination, overwrite=True, parallelism=1, **kwargs)[source]¶
Upload a file to HDFS.
- Parameters
source (str) – Local path to file or folder. If it’s a folder, all the files inside it will be uploaded. Note: this implies that folders empty of files will not be created remotely.
destination (str) – Target HDFS path. If it already exists and is a directory, files will be uploaded inside.
overwrite (bool) – Overwrite any existing file or directory.
parallelism (int) – Number of threads to use for parallelization. A value of 0 (or negative) uses as many threads as there are files.
kwargs (Any) – Keyword arguments forwarded to hdfs.client.Client.upload().
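The documented semantics of the parallelism parameter (a non-positive value means one thread per file) can be illustrated with a small helper. This is an explanatory sketch, not code from the hook:

```python
def effective_threads(parallelism: int, file_count: int) -> int:
    """Thread count implied by the documented load_file() semantics:
    a value of 0 (or negative) uses as many threads as there are files."""
    return file_count if parallelism <= 0 else parallelism


print(effective_threads(1, 10))  # the default: a single upload thread
print(effective_threads(0, 10))  # one thread per file
```

With the default `parallelism=1`, files are uploaded sequentially; raising the value trades memory and connection pressure for throughput when uploading a folder with many files.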