airflow.providers.amazon.aws.sensors.s3_key

Module Contents

class airflow.providers.amazon.aws.sensors.s3_key.S3KeySensor(*, bucket_key: str, bucket_name: Optional[str] = None, wildcard_match: bool = False, aws_conn_id: str = 'aws_default', verify: Optional[Union[str, bool]] = None, **kwargs)[source]

Bases: airflow.sensors.base.BaseSensorOperator

Waits for a key (a file-like instance on S3) to be present in an S3 bucket. S3 is a key/value store and does not support folders; the path is just a key that identifies a resource.

Parameters
  • bucket_key (str) – The key being waited on. Supports a full s3:// style URL or a path relative to the bucket root. When it’s specified as a full s3:// URL, leave bucket_name as None.

  • bucket_name (str) – Name of the S3 bucket. Only needed when bucket_key is not provided as a full s3:// URL.

  • wildcard_match (bool) – Whether the bucket_key should be interpreted as a Unix wildcard pattern.

  • aws_conn_id (str) – A reference to the S3 connection.

  • verify (bool or str) –

    Whether or not to verify SSL certificates for the S3 connection. By default SSL certificates are verified. You can provide the following values:

    • False: do not validate SSL certificates. SSL will still be used (unless use_ssl is False), but SSL certificates will not be verified.

    • path/to/cert/bundle.pem: the filename of the CA cert bundle to use. You can specify this argument if you want to use a different CA cert bundle than the one used by botocore.
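The relationship between bucket_key and bucket_name described above can be illustrated with a small standalone sketch. It does not import Airflow: the helper name split_bucket_key is hypothetical, and parsing with urllib.parse is an assumption about how a full s3:// URL maps to a bucket and key, not the provider's actual implementation. The fnmatch call only illustrates what "Unix wildcard pattern" means for wildcard_match.

```python
from fnmatch import fnmatch
from typing import Optional, Tuple
from urllib.parse import urlparse


def split_bucket_key(bucket_key: str, bucket_name: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical helper: resolve (bucket, key) from the two sensor parameters."""
    if bucket_name is None:
        # bucket_key is expected to be a full s3:// URL in this case.
        parsed = urlparse(bucket_key)
        if parsed.scheme != "s3" or not parsed.netloc:
            raise ValueError("bucket_key must be a full s3:// URL when bucket_name is None")
        return parsed.netloc, parsed.path.lstrip("/")
    # Otherwise bucket_key is a path relative to the bucket root.
    return bucket_name, bucket_key


print(split_bucket_key("s3://my-bucket/data/2021/file.csv"))
print(split_bucket_key("data/2021/file.csv", bucket_name="my-bucket"))

# Unix-style wildcard matching of a key against a pattern.
print(fnmatch("data/2021/file.csv", "data/*/file.csv"))
```

Both calls resolve to the same bucket and key, which is why bucket_name should be left as None when the full URL form is used.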

template_fields = ['bucket_key', 'bucket_name'][source]
poke(self, context)[source]
get_hook(self)[source]

Create and return an S3Hook

class airflow.providers.amazon.aws.sensors.s3_key.S3KeySizeSensor(*, check_fn: Optional[Callable[..., bool]] = None, **kwargs)[source]

Bases: airflow.providers.amazon.aws.sensors.s3_key.S3KeySensor

Waits for a key (a file-like instance on S3) to be present in an S3 bucket and to be larger than a given size. S3 is a key/value store and does not support folders; the path is just a key that identifies a resource.

Parameters
  • bucket_key (str) – The key being waited on. Supports a full s3:// style URL or a path relative to the bucket root. When it’s specified as a full s3:// URL, leave bucket_name as None.

  • bucket_name (str) – Name of the S3 bucket. Only needed when bucket_key is not provided as a full s3:// URL.

  • wildcard_match (bool) – Whether the bucket_key should be interpreted as a Unix wildcard pattern.

  • aws_conn_id (str) – A reference to the S3 connection.

  • verify (bool or str) –

    Whether or not to verify SSL certificates for the S3 connection. By default SSL certificates are verified. You can provide the following values:

    • False: do not validate SSL certificates. SSL will still be used (unless use_ssl is False), but SSL certificates will not be verified.

    • path/to/cert/bundle.pem: the filename of the CA cert bundle to use. You can specify this argument if you want to use a different CA cert bundle than the one used by botocore.

  • check_fn (Optional[Callable[..., bool]]) –

    Function that receives the list of S3 objects and returns a boolean:

    • True: the criterion is met

    • False: the criterion is not met

    Example: wait for any S3 object larger than 1 megabyte

    from typing import List

    def check_fn(self, data: List) -> bool:
        return any(f.get('Size', 0) > 1048576 for f in data if isinstance(f, dict))
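To see how such a predicate behaves, the sketch below applies the same test to made-up listings. It is written as a plain function with the hypothetical name any_object_over_1mb; this reference does not state exactly which arguments the sensor passes to check_fn, so treat the calling convention as an assumption. The 'Key' entries are illustrative only; the predicate inspects only 'Size' (in bytes).

```python
from typing import Any, Dict, List

ONE_MEGABYTE = 1048576  # bytes, the same threshold as the example above


def any_object_over_1mb(data: List[Dict[str, Any]]) -> bool:
    """Hypothetical standalone version of the example predicate."""
    return any(f.get("Size", 0) > ONE_MEGABYTE for f in data if isinstance(f, dict))


# Made-up listings; only the 'Size' field is inspected.
small_only = [{"Key": "logs/a.txt", "Size": 512}]
mixed = [{"Key": "logs/a.txt", "Size": 512}, {"Key": "dumps/b.bin", "Size": 5_000_000}]

print(any_object_over_1mb(small_only))  # False
print(any_object_over_1mb(mixed))       # True
print(any_object_over_1mb([]))          # False: an empty listing never satisfies any()
```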
    

poke(self, context)[source]
get_files(self, s3_hook: S3Hook, delimiter: Optional[str] = '/')[source]

Gets a list of files in the bucket

check_fn(self, data: List, object_min_size: Optional[Union[int, float]] = 0)[source]

Default function that checks that S3 objects are larger than object_min_size (0 by default)

Parameters
  • data (list) – List of the objects in S3 bucket.

  • object_min_size (int or float) – Checks whether the object sizes are greater than this value.
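A minimal sketch of a size check with a configurable threshold follows. This reference does not say whether every object or only some object must exceed object_min_size; the sketch assumes every object must, and the function name objects_exceed_min_size is hypothetical rather than part of the provider.

```python
from typing import Any, Dict, List, Union


def objects_exceed_min_size(
    data: List[Dict[str, Any]], object_min_size: Union[int, float] = 0
) -> bool:
    """Hypothetical sketch: every listed object must be larger than object_min_size."""
    return all(f.get("Size", 0) > object_min_size for f in data if isinstance(f, dict))


listing = [{"Size": 10}, {"Size": 2048}]  # made-up sizes in bytes

print(objects_exceed_min_size(listing))                        # True: both are > 0
print(objects_exceed_min_size(listing, object_min_size=1024))  # False: 10 is not > 1024
print(objects_exceed_min_size([{"Size": 0}]))                  # False: 0 is not > 0
```

Note that all() returns True for an empty list, so under this assumption an empty listing would have to be handled separately by the caller.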
