Apache HDFS

apache-airflow-providers-apache-hdfs

Hadoop Distributed File System (HDFS) and WebHDFS.

Works with Airflow 2.11+
Install:
pip install apache-airflow-providers-apache-hdfs==4.11.3

Airflow

2.11+

Python

>=3.10

Dependencies (7)

Show all Hide apache-airflow>=2.11.0 apache-airflow-providers-common-compat>=1.12.0 hdfs[avro,dataframe,kerberos]>=2.5.4;python_version<"3.12" hdfs[avro,dataframe,kerberos]>=2.7.3;python_version>="3.12" fastavro>=1.10.0;python_version>="3.13" pandas>=2.1.2; python_version <"3.13" pandas>=2.2.3; python_version >="3.13"

Connections (1)

Modules

H

WebHDFSHook

Interact with HDFS. This class is a wrapper around the hdfscli library.

airflow.providers.apache.hdfs.hooks.webhdfs.WebHDFSHook
S

MultipleFilesWebHdfsSensor

Waits for multiple files in a folder to land in HDFS.

airflow.providers.apache.hdfs.sensors.web_hdfs.MultipleFilesWebHdfsSensor
S

WebHdfsSensor

Waits for a file or folder to land in HDFS.

airflow.providers.apache.hdfs.sensors.web_hdfs.WebHdfsSensor