airflow.providers.google.cloud.triggers.gcs

Module Contents

Classes

GCSBlobTrigger

A trigger that fires when it finds the requested file or folder present in the given bucket.

GCSCheckBlobUpdateTimeTrigger

A trigger that makes an async call to GCS to check whether the object is updated in a bucket.

GCSPrefixBlobTrigger

Looks for objects in bucket matching a prefix.

GCSUploadSessionTrigger

Return Trigger Event if the inactivity period has passed with no increase in the number of objects.

class airflow.providers.google.cloud.triggers.gcs.GCSBlobTrigger(bucket, object_name, use_glob, poke_interval, google_cloud_conn_id, hook_params)[source]

Bases: airflow.triggers.base.BaseTrigger

A trigger that fires when it finds the requested file or folder present in the given bucket.

Parameters
  • bucket (str) – the Google Cloud Storage bucket where the objects reside.

  • object_name (str) – the file or folder present in the bucket

  • use_glob (bool) – if true, object_name is interpreted as a glob pattern

  • google_cloud_conn_id (str) – reference to the Google Connection

  • poke_interval (float) – polling period in seconds to check for file/folder

  • hook_params (dict[str, Any]) – Extra config params to be passed to the underlying hook. Should match the desired hook constructor params.

serialize()[source]

Serializes GCSBlobTrigger arguments and classpath.
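As an illustration of what "arguments and classpath" means, the sketch below uses a hypothetical stand-in class, SketchBlobTrigger, rather than the provider class itself. It assumes only the general BaseTrigger contract that serialize() returns a (classpath, kwargs) pair from which the trigger can be rebuilt; the exact keys that GCSBlobTrigger emits are not shown in this page and are assumed here to mirror the constructor arguments.

```python
from typing import Any, Dict, Tuple


class SketchBlobTrigger:
    """Hypothetical stand-in that mirrors the documented constructor arguments."""

    def __init__(
        self,
        bucket: str,
        object_name: str,
        use_glob: bool,
        poke_interval: float,
        google_cloud_conn_id: str,
        hook_params: Dict[str, Any],
    ) -> None:
        self.bucket = bucket
        self.object_name = object_name
        self.use_glob = use_glob
        self.poke_interval = poke_interval
        self.google_cloud_conn_id = google_cloud_conn_id
        self.hook_params = hook_params

    def serialize(self) -> Tuple[str, Dict[str, Any]]:
        # The classpath tells the triggerer which class to import,
        # and the kwargs are passed back to __init__ to rebuild the instance.
        classpath = f"{type(self).__module__}.{type(self).__qualname__}"
        kwargs = {
            "bucket": self.bucket,
            "object_name": self.object_name,
            "use_glob": self.use_glob,
            "poke_interval": self.poke_interval,
            "google_cloud_conn_id": self.google_cloud_conn_id,
            "hook_params": self.hook_params,
        }
        return classpath, kwargs


# Example values are placeholders, not defaults of the real trigger.
trigger = SketchBlobTrigger(
    bucket="example-bucket",
    object_name="data/file.csv",
    use_glob=False,
    poke_interval=5.0,
    google_cloud_conn_id="google_cloud_default",
    hook_params={},
)
classpath, kwargs = trigger.serialize()
rebuilt = SketchBlobTrigger(**kwargs)  # round trip: kwargs are enough to rebuild
```

Every value in kwargs has to be serializable, which is why the trigger carries plain connection IDs and parameter dicts rather than live hook objects.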

async run()[source]

Loop until the relevant file/folder is found.
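The polling pattern described here can be sketched with plain asyncio. This is a minimal illustration, not the provider's implementation: poll_until_found and make_fake_check are hypothetical names, the existence check is faked instead of calling GCS, and the returned dict only stands in for the event payload, whose exact shape is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def poll_until_found(
    exists: Callable[[], Awaitable[bool]],
    poke_interval: float,
) -> Dict[str, str]:
    """Call `exists` every `poke_interval` seconds until it returns True."""
    while True:
        if await exists():
            # A real trigger would yield a TriggerEvent here; a dict stands in.
            return {"status": "success", "message": "success"}
        # Sleeping with asyncio hands control back to the event loop,
        # so other triggers can run while this one waits.
        await asyncio.sleep(poke_interval)


def make_fake_check(found_on_call: int) -> Callable[[], Awaitable[bool]]:
    """Hypothetical stand-in for a GCS lookup: True from the Nth call onward."""
    calls = {"n": 0}

    async def check() -> bool:
        calls["n"] += 1
        return calls["n"] >= found_on_call

    return check


result = asyncio.run(poll_until_found(make_fake_check(3), poke_interval=0.01))
```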

class airflow.providers.google.cloud.triggers.gcs.GCSCheckBlobUpdateTimeTrigger(bucket, object_name, target_date, poke_interval, google_cloud_conn_id, hook_params)[source]

Bases: airflow.triggers.base.BaseTrigger

A trigger that makes an async call to GCS to check whether the object is updated in a bucket.

Parameters
  • bucket (str) – name of the Google Cloud Storage bucket where the objects reside.

  • object_name (str) – the file or folder present in the bucket

  • target_date (datetime.datetime) – context datetime to compare with blob object updated time

  • poke_interval (float) – polling period in seconds to check for file/folder

  • google_cloud_conn_id (str) – reference to the Google Connection

  • hook_params (dict[str, Any]) – dict object that has delegate_to and impersonation_chain

serialize()[source]

Serializes GCSCheckBlobUpdateTimeTrigger arguments and classpath.

async run()[source]

Loop until the object updated time is greater than target datetime.
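The comparison at the heart of this loop can be sketched as follows. The helper is_updated_after is hypothetical, and the sketch assumes that the blob's updated time arrives as an RFC 3339 string with a trailing "Z" and that a naive target_date should be read as UTC; the provider's own parsing may differ.

```python
from datetime import datetime, timezone


def is_updated_after(blob_updated: str, target_date: datetime) -> bool:
    """Return True if the blob's updated time is strictly later than target_date."""
    # Assumed input format: "2024-01-02T00:00:00.000Z".
    updated = datetime.fromisoformat(blob_updated.replace("Z", "+00:00"))
    if target_date.tzinfo is None:
        # Assumption: a naive target is treated as UTC so both sides are comparable.
        target_date = target_date.replace(tzinfo=timezone.utc)
    return updated > target_date


target = datetime(2024, 1, 1, tzinfo=timezone.utc)
newer = is_updated_after("2024-01-02T00:00:00.000Z", target)
older = is_updated_after("2023-12-31T00:00:00.000Z", target)
```

Because the check is "greater than", an object whose updated time equals target_date exactly does not satisfy it.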

class airflow.providers.google.cloud.triggers.gcs.GCSPrefixBlobTrigger(bucket, prefix, poke_interval, google_cloud_conn_id, hook_params)[source]

Bases: GCSBlobTrigger

Looks for objects in bucket matching a prefix.

If none found, sleep for interval and check again. Otherwise, return matches.

Parameters
  • bucket (str) – the Google Cloud Storage bucket where the objects reside.

  • prefix (str) – the prefix of the blob names to match in the Google Cloud Storage bucket

  • google_cloud_conn_id (str) – reference to the Google Connection

  • poke_interval (float) – polling period in seconds to check

  • hook_params (dict[str, Any]) – Extra config params to be passed to the underlying hook. Should match the desired hook constructor params.

serialize()[source]

Serializes GCSPrefixBlobTrigger arguments and classpath.

async run()[source]

Loop until the matches are found for the given prefix on the bucket.
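A minimal sketch of this behaviour, under stated assumptions: wait_for_prefix and make_fake_listing are hypothetical names, the bucket listing is faked with in-memory snapshots, and filtering is done client-side with str.startswith, whereas a real GCS listing call would typically apply the prefix on the server.

```python
import asyncio
from typing import Awaitable, Callable, List


async def wait_for_prefix(
    list_names: Callable[[], Awaitable[List[str]]],
    prefix: str,
    poke_interval: float,
) -> List[str]:
    """Poll the listing until at least one name starts with `prefix`."""
    while True:
        matches = [name for name in await list_names() if name.startswith(prefix)]
        if matches:
            return matches  # otherwise sleep for the interval and check again
        await asyncio.sleep(poke_interval)


def make_fake_listing(snapshots: List[List[str]]) -> Callable[[], Awaitable[List[str]]]:
    """Hypothetical bucket: each call returns the next snapshot, then repeats the last."""
    state = {"i": 0}

    async def list_names() -> List[str]:
        index = min(state["i"], len(snapshots) - 1)
        state["i"] += 1
        return snapshots[index]

    return list_names


listing = make_fake_listing([
    [],
    ["logs/x.txt"],
    ["logs/x.txt", "data/a.csv", "data/b.csv"],
])
matches = asyncio.run(wait_for_prefix(listing, prefix="data/", poke_interval=0.01))
```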

class airflow.providers.google.cloud.triggers.gcs.GCSUploadSessionTrigger(bucket, prefix, poke_interval, google_cloud_conn_id, hook_params, inactivity_period=60 * 60, min_objects=1, previous_objects=None, allow_delete=True)[source]

Bases: GCSPrefixBlobTrigger

Return Trigger Event if the inactivity period has passed with no increase in the number of objects.

Parameters
  • bucket (str) – The Google Cloud Storage bucket where the objects are expected.

  • prefix (str) – The name of the prefix to check in the Google Cloud Storage bucket.

  • poke_interval (float) – polling period in seconds to check

  • inactivity_period (float) – The total seconds of inactivity after which an upload session is considered over. Note that this mechanism is not real time: the trigger may not return until one polling interval after this period has passed with no additional objects sensed.

  • min_objects (int) – The minimum number of objects needed for upload session to be considered valid.

  • previous_objects (set[str] | None) – The set of object ids found during the last poke.

  • allow_delete (bool) – Whether objects being deleted between intervals counts as valid behavior. If true, a warning message is logged when this happens. If false, an error is raised.

  • google_cloud_conn_id (str) – The connection ID to use when connecting to Google Cloud Storage.

serialize()[source]

Serializes GCSUploadSessionTrigger arguments and classpath.

async run()[source]

Loop until the blob listing shows no new or deleted files for the inactivity_period.
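How inactivity_period, min_objects, previous_objects and allow_delete interact can be sketched as a small state machine. UploadSessionTracker is a hypothetical helper written for this illustration, not the provider class, and the three string results stand in for "keep polling", a success event and an error event. Time is passed in explicitly so the example is deterministic.

```python
from typing import Optional, Set


class UploadSessionTracker:
    """Hypothetical helper sketching the inactivity check for one prefix."""

    def __init__(self, inactivity_period: float, min_objects: int = 1,
                 allow_delete: bool = True) -> None:
        self.inactivity_period = inactivity_period
        self.min_objects = min_objects
        self.allow_delete = allow_delete
        self.previous_objects: Set[str] = set()
        self.last_activity: Optional[float] = None

    def observe(self, current_objects: Set[str], now: float) -> str:
        """Return 'pending', 'success' or 'error' for one poll at time `now`."""
        if self.last_activity is None:
            self.last_activity = now
        if len(current_objects) > len(self.previous_objects):
            # More objects than last poll: the upload is still active.
            self.previous_objects = set(current_objects)
            self.last_activity = now
            return "pending"
        if self.previous_objects - current_objects:
            # Something seen earlier has disappeared.
            if not self.allow_delete:
                return "error"
            self.previous_objects = set(current_objects)
            self.last_activity = now
            return "pending"
        if now - self.last_activity >= self.inactivity_period:
            # Quiet for long enough: valid only if enough objects arrived.
            if len(current_objects) >= self.min_objects:
                return "success"
            return "error"
        return "pending"


tracker = UploadSessionTracker(inactivity_period=10, min_objects=2)
states = [
    tracker.observe({"a"}, now=0),        # first object arrives
    tracker.observe({"a", "b"}, now=5),   # second object arrives, clock restarts
    tracker.observe({"a", "b"}, now=10),  # quiet for 5 seconds
    tracker.observe({"a", "b"}, now=15),  # quiet for 10 seconds
]
```

In this sketch the clock restarts on every increase or tolerated deletion, so the session is reported complete only after a full inactivity_period with an unchanged listing.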
