airflow.hooks.webhdfs_hook
Module Contents
-
exception airflow.hooks.webhdfs_hook.AirflowWebHDFSHookException
Bases: airflow.exceptions.AirflowException
-
class airflow.hooks.webhdfs_hook.WebHDFSHook(webhdfs_conn_id='webhdfs_default', proxy_user=None)
Bases: airflow.hooks.base_hook.BaseHook
Interact with HDFS. This class is a wrapper around the hdfscli library.
-
check_for_path(self, hdfs_path)
Check for the existence of a path in HDFS by querying FileStatus.
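The FileStatus lookup described above can be sketched roughly as follows. This is a minimal illustration, not the hook's actual source: `path_exists` is a hypothetical helper, and it assumes `client` is an hdfscli-style client whose `status(path, strict=False)` returns `None` for a missing path instead of raising.

```python
def path_exists(client, hdfs_path):
    """Return True if hdfs_path exists, based on a FileStatus lookup.

    Assumption: `client` behaves like an hdfscli client, where
    status(path, strict=False) returns None when the path is missing
    and a dict of FileStatus fields otherwise.
    """
    status = client.status(hdfs_path, strict=False)
    # A missing path yields None; any returned FileStatus dict counts as present.
    return bool(status)
```

With `strict=False` a missing path is reported as a falsy value rather than an exception, so the check reduces to a boolean conversion.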
-
load_file(self, source, destination, overwrite=True, parallelism=1, **kwargs)
Uploads a file to HDFS.
- Parameters
source (str) – Local path to file or folder. If a folder, all the files inside of it will be uploaded (note that this implies that folders empty of files will not be created remotely).
destination (str) – Target HDFS path. If it already exists and is a directory, files will be uploaded inside.
overwrite (bool) – Overwrite any existing file or directory.
parallelism (int) – Number of threads to use for parallelization. A value of 0 (or negative) uses as many threads as there are files.
**kwargs – Keyword arguments forwarded to upload().
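As a usage sketch, the two methods can be combined to upload only when the target is not already present. The helper name `upload_if_absent` and the example paths are hypothetical; the function only assumes that `hook` exposes `check_for_path` and `load_file` with the signatures documented here, as a `WebHDFSHook` instance would.

```python
def upload_if_absent(hook, source, destination, parallelism=1, **kwargs):
    """Upload `source` to `destination` unless the HDFS path already exists.

    `hook` is assumed to provide check_for_path() and load_file() as
    documented for WebHDFSHook. Returns True if an upload was started,
    False if the destination was already present.
    """
    if hook.check_for_path(destination):
        return False
    # overwrite=False so an unexpected existing file is not silently replaced.
    hook.load_file(
        source,
        destination,
        overwrite=False,
        parallelism=parallelism,
        **kwargs
    )
    return True


# Possible usage inside an Airflow task (requires a configured connection;
# the paths below are placeholders):
#
# from airflow.hooks.webhdfs_hook import WebHDFSHook
# hook = WebHDFSHook(webhdfs_conn_id="webhdfs_default")
# upload_if_absent(hook, "/tmp/report.csv", "/data/report.csv")
```

Note that the existence check and the upload are two separate calls, so this pattern is not atomic if other writers target the same path.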
-