airflow.providers.apache.hdfs.sensors.hdfs

Module Contents

airflow.providers.apache.hdfs.sensors.hdfs.log[source]
class airflow.providers.apache.hdfs.sensors.hdfs.HdfsSensor(*, filepath: str, hdfs_conn_id: str = 'hdfs_default', ignored_ext: Optional[List[str]] = None, ignore_copying: bool = True, file_size: Optional[int] = None, hook: Type[HDFSHook] = HDFSHook, **kwargs)[source]

Bases: airflow.sensors.base.BaseSensorOperator

Waits for a file or folder to land in HDFS

Parameters
  • filepath (str) -- The path of the file or folder to wait for in HDFS.

  • hdfs_conn_id (str) -- The Airflow connection used for HDFS credentials.

  • ignored_ext (Optional[List[str]]) -- File extensions to ignore when checking for matches.

  • ignore_copying (Optional[bool]) -- Whether to ignore files that are still being copied.

  • file_size (Optional[int]) -- Minimum size, in MB, a file must reach before the sensor succeeds.

See also

For more information on how to use this operator, take a look at the guide: Waits for a file or folder to land in HDFS

template_fields = ['filepath'][source]
ui_color[source]
static filter_for_filesize(result: List[Dict[Any, Any]], size: Optional[int] = None)[source]

Test whether the files in the filepath result are at least self.file_size

Parameters
  • result -- a list of dicts returned by Snakebite ls

  • size -- the minimum file size, in MB, required to return True

Returns

(bool) True if a file meeting the size criterion is found, False otherwise
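The size check can be sketched in plain Python. The dict keys (`path`, `type`, `length` in bytes) mirror a typical Snakebite `ls` listing, and the megabyte conversion is an assumption for illustration; this is a sketch of the filtering behaviour, not the provider's exact implementation:

```python
from typing import Any, Dict, List, Optional

MEGABYTE = 1024 * 1024  # assumed MB-to-bytes conversion


def filter_for_filesize(result: List[Dict[Any, Any]],
                        size: Optional[int] = None) -> bool:
    """Return True if at least one listed file meets the size threshold."""
    if size is not None:
        min_bytes = size * MEGABYTE
        # keep only plain files ('f') whose byte length meets the threshold
        result = [x for x in result
                  if x.get('type') == 'f' and x.get('length', 0) >= min_bytes]
    return bool(result)


listing = [
    {'path': '/data/part-0000', 'type': 'f', 'length': 5 * MEGABYTE},
    {'path': '/data/part-0001', 'type': 'f', 'length': 100},
]
print(filter_for_filesize(listing, size=1))  # True: one file is >= 1 MB
```

With `size=None` the check degenerates to "does anything exist at the path", which is the default sensor behaviour.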

static filter_for_ignored_ext(result: List[Dict[Any, Any]], ignored_ext: List[str], ignore_copying: bool)[source]

Filter the result, if instructed to do so, removing entries that match the ignored extensions

Parameters
  • result (list[dict]) -- list of dicts returned by Snakebite ls

  • ignored_ext (list) -- list of ignored extensions

  • ignore_copying (bool) -- whether to ignore files that are still being copied

Returns

list of dicts that were not filtered out

Return type

list[dict]
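The extension filter can be sketched as a regular-expression match against each entry's path. The `path` key and the exact regex shape are assumptions based on a typical Snakebite listing; `_COPYING_` is the suffix HDFS clients give to files that are still being written, which is what `ignore_copying` targets:

```python
import re
from typing import Any, Dict, List


def filter_for_ignored_ext(result: List[Dict[Any, Any]],
                           ignored_ext: List[str],
                           ignore_copying: bool) -> List[Dict[Any, Any]]:
    """Drop entries whose path ends in one of the ignored extensions."""
    if ignore_copying:
        # e.g. ignored_ext=['_COPYING_'] matches '/data/x._COPYING_'
        pattern = re.compile(r'^.*\.(%s)$' % '|'.join(ignored_ext))
        result = [x for x in result if not pattern.match(x.get('path', ''))]
    return result


listing = [
    {'path': '/data/part-0000'},
    {'path': '/data/part-0001._COPYING_'},
]
print(filter_for_ignored_ext(listing, ['_COPYING_'], True))
# [{'path': '/data/part-0000'}]
```

Note that with `ignore_copying=False` the listing is returned untouched, so in-flight files can trigger the sensor prematurely.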

poke(self, context: Dict[Any, Any])[source]

Get a snakebite client connection and check for file.

class airflow.providers.apache.hdfs.sensors.hdfs.HdfsRegexSensor(regex: Pattern[str], *args, **kwargs)[source]

Bases: airflow.providers.apache.hdfs.sensors.hdfs.HdfsSensor

Waits for files whose names match a regular expression

See also

For more information on how to use this operator, take a look at the guide: HdfsRegexSensor

poke(self, context: Dict[Any, Any])[source]

Poke for files in the directory whose names match self.regex

Returns

(bool) True if a matching file is found, False otherwise
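The regex poke can be sketched as matching the compiled pattern against each entry's basename. The listing shape and the basename extraction are assumptions for illustration; the real sensor also applies the size and extension filters before deciding:

```python
import re
from typing import Any, Dict, List, Pattern


def poke_regex(listing: List[Dict[Any, Any]], regex: Pattern[str]) -> bool:
    """Sketch of the regex poke: True when any basename matches."""
    matched = [x for x in listing
               if regex.match(x.get('path', '').rsplit('/', 1)[-1])]
    return bool(matched)


listing = [{'path': '/logs/app-2024-01-01.log'}, {'path': '/logs/README'}]
print(poke_regex(listing, re.compile(r'app-\d{4}-\d{2}-\d{2}\.log')))  # True
```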

class airflow.providers.apache.hdfs.sensors.hdfs.HdfsFolderSensor(be_empty: bool = False, *args, **kwargs)[source]

Bases: airflow.providers.apache.hdfs.sensors.hdfs.HdfsSensor

Waits for a non-empty directory, or for an empty one when be_empty is True

See also

For more information on how to use this operator, take a look at the guide: HdfsFolderSensor

poke(self, context: Dict[str, Any])[source]

Poke for a non-empty directory

Returns

(bool) True if the directory satisfies the emptiness criterion, False otherwise
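The folder poke reduces to a check on the directory's contents, inverted by the `be_empty` flag. This is a simplified sketch that assumes the listing contains only the directory's children, not the directory entry itself:

```python
from typing import Any, Dict, List


def poke_folder(contents: List[Dict[Any, Any]], be_empty: bool) -> bool:
    """Sketch of the folder poke: succeed on an empty or non-empty dir."""
    if be_empty:
        # wait until the directory has no children
        return len(contents) == 0
    # default: wait until at least one entry appears
    return len(contents) > 0


print(poke_folder([{'path': '/data/part-0000'}], be_empty=False))  # True
```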
