airflow.providers.apache.hdfs.hooks.webhdfs

Hook for Web HDFS

Module Contents

airflow.providers.apache.hdfs.hooks.webhdfs.log[source]
airflow.providers.apache.hdfs.hooks.webhdfs._kerberos_security_mode[source]
exception airflow.providers.apache.hdfs.hooks.webhdfs.AirflowWebHDFSHookException[source]

Bases: airflow.exceptions.AirflowException

Exception specific to the WebHDFS hook

class airflow.providers.apache.hdfs.hooks.webhdfs.WebHDFSHook(webhdfs_conn_id: str = 'webhdfs_default', proxy_user: Optional[str] = None)[source]

Bases: airflow.hooks.base.BaseHook

Interact with HDFS. This class is a wrapper around the hdfscli library.

Parameters
  • webhdfs_conn_id (str) -- The connection id for the webhdfs client to connect to.

  • proxy_user (str) -- The user used to authenticate.

get_conn(self)[source]

Establishes a connection depending on the security mode set via config or environment variable.

Returns

a hdfscli InsecureClient or KerberosClient object.

Return type

hdfs.InsecureClient or hdfs.ext.kerberos.KerberosClient
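
A minimal sketch of obtaining a client, assuming a "webhdfs_default" connection is configured and reachable; the listed path is only illustrative:

    from airflow.providers.apache.hdfs.hooks.webhdfs import WebHDFSHook

    # Instantiate the hook against a configured Airflow connection.
    hook = WebHDFSHook(webhdfs_conn_id="webhdfs_default", proxy_user=None)

    # get_conn() returns an hdfs.InsecureClient, or a
    # hdfs.ext.kerberos.KerberosClient when Kerberos security is enabled.
    client = hook.get_conn()

    # The returned object is a regular hdfscli client, so its own API
    # can be used directly (illustrative path).
    print(client.list("/"))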

_find_valid_server(self)[source]
_get_client(self, connection: Connection)[source]
check_for_path(self, hdfs_path: str)[source]

Check for the existence of a path in HDFS by querying FileStatus.

Parameters

hdfs_path (str) -- The path to check.

Returns

True if the path exists and False if not.

Return type

bool
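
A brief usage sketch; the HDFS path below is hypothetical and assumes the "webhdfs_default" connection is set up:

    from airflow.providers.apache.hdfs.hooks.webhdfs import WebHDFSHook

    hook = WebHDFSHook(webhdfs_conn_id="webhdfs_default")

    # check_for_path() queries FileStatus and returns a plain bool.
    if hook.check_for_path("/data/landing/events"):  # hypothetical path
        print("Path exists")
    else:
        print("Path does not exist")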

load_file(self, source: str, destination: str, overwrite: bool = True, parallelism: int = 1, **kwargs)[source]

Uploads a file to HDFS.

Parameters
  • source (str) -- Local path to a file or folder. If it is a folder, all the files inside it will be uploaded. Note: this implies that folders containing no files will not be created remotely.

  • destination (str) -- Target HDFS path. If it already exists and is a directory, files will be uploaded inside it.

  • overwrite (bool) -- Overwrite any existing file or directory.

  • parallelism (int) -- Number of threads to use for parallelization. A value of 0 (or negative) uses as many threads as there are files.

  • kwargs -- Keyword arguments forwarded to hdfs.client.Client.upload().
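
A brief upload sketch; the local and HDFS paths are hypothetical, and any extra keyword arguments are passed straight through to hdfs.client.Client.upload():

    from airflow.providers.apache.hdfs.hooks.webhdfs import WebHDFSHook

    hook = WebHDFSHook(webhdfs_conn_id="webhdfs_default")

    # Upload a single local file, overwriting the remote copy if it exists.
    hook.load_file(
        source="/tmp/report.csv",       # hypothetical local path
        destination="/data/reports",    # hypothetical HDFS target
        overwrite=True,
        parallelism=1,
    )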
