Apache Hadoop HDFS Operators

Apache Hadoop HDFS is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS is now an Apache Hadoop sub project.

Prerequisite

To use operators, you must configure a HDFS Connection.

HdfsFolderSensor

Waits for a non-empty directory

The HdfsFolderSensor operator is used to check for a non-empty directory in HDFS.

Use the filepath parameter to poke until the provided file is found.

HdfsRegexSensor

Waits for matching files by matching on regex

The HdfsRegexSensor operator is used to check for matching files by matching on regex in HDFS.

Use the filepath parameter to mention the keyspace and table for the record. Use dot notation to target a specific keyspace.

Waits for a file or folder to land in HDFS

The HdfsSensor operator is used to check for a file or folder to land in HDFS.

Use the filepath parameter to poke until the provided file is found.

Reference

For further information, look at HDFS Architecture Guide.

Was this entry helpful?