airflow.providers.apache.hdfs.sensors.hdfs

Module Contents

Classes

HdfsSensor

Waits for a file or folder to land in HDFS

HdfsRegexSensor

Waits for matching files by matching on regex

HdfsFolderSensor

Waits for a non-empty directory

Attributes

log

airflow.providers.apache.hdfs.sensors.hdfs.log[source]
class airflow.providers.apache.hdfs.sensors.hdfs.HdfsSensor(*, filepath, hdfs_conn_id='hdfs_default', ignored_ext=None, ignore_copying=True, file_size=None, hook=HDFSHook, **kwargs)[source]

Bases: airflow.sensors.base.BaseSensorOperator

Waits for a file or folder to land in HDFS

Parameters
  • filepath (str) – The route to a stored file.

  • hdfs_conn_id (str) – The Airflow connection used for HDFS credentials.

  • ignored_ext (list[str] | None) – This is the list of ignored extensions.

  • ignore_copying (bool) – Shall we ignore?

  • file_size (int | None) – This is the size of the file.

See also

For more information on how to use this operator, take a look at the guide: Waits for a file or folder to land in HDFS

template_fields: Sequence[str] = ('filepath',)[source]
ui_color[source]
static filter_for_filesize(result, size=None)[source]

Will test the filepath result and test if its size is at least self.filesize

Parameters
  • result (list[dict[Any, Any]]) – a list of dicts returned by Snakebite ls

  • size (int | None) – the file size in MB a file should be at least to trigger True

Returns

(bool) depending on the matching criteria

Return type

list[dict[Any, Any]]

static filter_for_ignored_ext(result, ignored_ext, ignore_copying)[source]

Will filter if instructed to do so the result to remove matching criteria

Parameters
  • result (list[dict[Any, Any]]) – list of dicts returned by Snakebite ls

  • ignored_ext (list[str]) – list of ignored extensions

  • ignore_copying (bool) – shall we ignore ?

Returns

list of dicts which were not removed

Return type

list[dict[Any, Any]]

poke(context)[source]

Get a snakebite client connection and check for file.

class airflow.providers.apache.hdfs.sensors.hdfs.HdfsRegexSensor(regex, *args, **kwargs)[source]

Bases: HdfsSensor

Waits for matching files by matching on regex

See also

For more information on how to use this operator, take a look at the guide: HdfsRegexSensor

poke(context)[source]

Poke matching files in a directory with self.regex

Returns

Bool depending on the search criteria

Return type

bool

class airflow.providers.apache.hdfs.sensors.hdfs.HdfsFolderSensor(be_empty=False, *args, **kwargs)[source]

Bases: HdfsSensor

Waits for a non-empty directory

See also

For more information on how to use this operator, take a look at the guide: HdfsFolderSensor

poke(context)[source]

Poke for a non empty directory

Returns

Bool depending on the search criteria

Return type

bool

Was this entry helpful?