airflow.providers.amazon.aws.sensors.s3
Module Contents
Classes
| Class | Description |
|---|---|
| S3KeySensor | Waits for one or multiple keys (a file-like instance on S3) to be present in an S3 bucket. |
| S3KeysUnchangedSensor | Return True if inactivity_period has passed with no increase in the number of objects matching prefix. |
- class airflow.providers.amazon.aws.sensors.s3.S3KeySensor(*, bucket_key, bucket_name=None, wildcard_match=False, check_fn=None, aws_conn_id='aws_default', verify=None, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), use_regex=False, metadata_keys=None, **kwargs)
- Bases: airflow.sensors.base.BaseSensorOperator

Waits for one or multiple keys (a file-like instance on S3) to be present in an S3 bucket.

The path is just a key/value pointer to a resource for the given S3 path. Note: S3 does not support folders directly and only provides key/value pairs.

See also: for more information on how to use this sensor, take a look at the guide: Wait on an Amazon S3 key

Parameters
- bucket_key (str | list[str]) – The key(s) being waited on. Supports a full s3:// style URL or a relative path from the root level. When it is specified as a full s3:// URL, leave bucket_name as None.
- bucket_name (str | None) – Name of the S3 bucket. Only needed when bucket_key is not provided as a full s3:// URL. When specified, all the keys passed to bucket_key refer to this bucket.
- wildcard_match (bool) – whether the bucket_key should be interpreted as a Unix wildcard pattern 
- check_fn (Callable[..., bool] | None) – Function that receives the list of the S3 objects with the context values and returns a boolean:
  - True: the criteria are met
  - False: the criteria are not met

  Example: wait for any S3 object larger than 1 MiB (1048576 bytes):

      def check_fn(files: list, **kwargs) -> bool:
          return any(f.get('Size', 0) > 1048576 for f in files)
- aws_conn_id (str | None) – a reference to the s3 connection 
- verify (str | bool | None) – Whether to verify SSL certificates for the S3 connection. By default, SSL certificates are verified. You can provide the following values:
  - False: do not validate SSL certificates. SSL will still be used (unless use_ssl is False), but SSL certificates will not be verified.
  - path/to/cert/bundle.pem: a filename of the CA cert bundle to use. You can specify this argument if you want to use a different CA cert bundle than the one used by botocore.
- deferrable (bool) – Run the sensor in deferrable mode
- use_regex (bool) – Whether to interpret bucket_key as a regular expression matched against the keys in the bucket
- metadata_keys (list[str] | None) – List of head_object attributes to gather and send to check_fn. Acceptable values: any top-level attribute returned by s3.head_object. Specify * to return all available attributes. Default value: "Size". If a requested attribute is not found, the key is still included and its value is None.
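As a further, hypothetical illustration of check_fn combined with metadata_keys, the function below (the name `all_non_empty_csv` is made up) could be passed as check_fn together with metadata_keys=["Size", "ContentType"]. It assumes each element of files is a dict holding just the requested attributes, with None for any attribute that was not found, as described for metadata_keys; the sample dicts are invented for the demonstration.

```python
from __future__ import annotations

from typing import Any


def all_non_empty_csv(files: list[dict[str, Any]], **context: Any) -> bool:
    """Criteria met only if at least one object matched and every one is a non-empty CSV."""
    if not files:
        return False
    # A requested attribute that was not found is None, so fall back to 0 for Size.
    return all(
        (f.get("Size") or 0) > 0 and f.get("ContentType") == "text/csv"
        for f in files
    )


sample = [
    {"Size": 2048, "ContentType": "text/csv"},
    {"Size": 0, "ContentType": "text/csv"},  # an empty object
]
print(all_non_empty_csv(sample))      # False: the second object is empty
print(all_non_empty_csv(sample[:1]))  # True
```

Returning False for an empty list keeps the sensor waiting rather than succeeding vacuously, since all() over an empty sequence would otherwise return True.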
 
 
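The interplay of bucket_key, bucket_name, wildcard_match and use_regex can be illustrated with a pure-Python sketch that works on an in-memory list of keys instead of calling S3. The helpers `resolve` and `matching_keys` are hypothetical, not Airflow APIs; the sketch assumes wildcard patterns follow Python's fnmatch semantics and that regex patterns are applied with re.match to each key, and it only recognizes the s3:// scheme.

```python
from __future__ import annotations

import fnmatch
import re

S3_SCHEME = "s3://"


def resolve(bucket_key: str, bucket_name: str | None) -> tuple[str, str]:
    """Return (bucket, key) following the bucket_key / bucket_name rules."""
    if bucket_name is None:
        # bucket_key must then be a full URL: s3://bucket/path/to/key.
        # Split by hand rather than with urlsplit so that '?' in a wildcard
        # pattern is not parsed as a query string.
        if not bucket_key.startswith(S3_SCHEME):
            raise ValueError(f"expected a full s3:// URL, got {bucket_key!r}")
        bucket, _, key = bucket_key[len(S3_SCHEME):].partition("/")
        if not bucket or not key:
            raise ValueError(f"missing bucket or key in {bucket_key!r}")
        return bucket, key
    if "://" in bucket_key:
        # With bucket_name given, bucket_key must be relative to the bucket root.
        raise ValueError("leave bucket_name as None when bucket_key is a full s3:// URL")
    return bucket_name, bucket_key


def matching_keys(
    pattern: str,
    existing_keys: list[str],
    wildcard_match: bool = False,
    use_regex: bool = False,
) -> list[str]:
    """Return the keys in existing_keys that satisfy pattern under the chosen mode."""
    if wildcard_match:
        # Unix shell-style wildcards, case-sensitive like S3 keys.
        return [k for k in existing_keys if fnmatch.fnmatchcase(k, pattern)]
    if use_regex:
        # re.match anchors at the start of the key only, not at the end.
        return [k for k in existing_keys if re.match(pattern, k)]
    # Default: the key must exist exactly as written.
    return [k for k in existing_keys if k == pattern]


bucket, key = resolve("s3://my-bucket/data/*.csv", None)
keys = ["data/a.csv", "data/b.json", "logs/a.csv"]
print(bucket, matching_keys(key, keys, wildcard_match=True))  # my-bucket ['data/a.csv']
```

Because re.match only anchors at the start, a regex such as data/a also matches data/a.csv; append $ to require a full-key match.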
- class airflow.providers.amazon.aws.sensors.s3.S3KeysUnchangedSensor(*, bucket_name, prefix, aws_conn_id='aws_default', verify=None, inactivity_period=60 * 60, min_objects=1, previous_objects=None, allow_delete=True, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)
- Bases: airflow.sensors.base.BaseSensorOperator

Return True if inactivity_period has passed with no increase in the number of objects matching prefix.

Note: this sensor will not behave correctly in reschedule mode, as the state of the listed objects in the S3 bucket will be lost between rescheduled invocations.

See also: for more information on how to use this sensor, take a look at the guide: Wait on Amazon S3 prefix changes

Parameters
- bucket_name (str) – Name of the S3 bucket 
- prefix (str) – The prefix being waited on. Relative path from bucket root level. 
- aws_conn_id (str | None) – a reference to the s3 connection 
- verify (bool | str | None) – Whether to verify SSL certificates for the S3 connection. By default, SSL certificates are verified. You can provide the following values:
  - False: do not validate SSL certificates. SSL will still be used (unless use_ssl is False), but SSL certificates will not be verified.
  - path/to/cert/bundle.pem: a filename of the CA cert bundle to use. You can specify this argument if you want to use a different CA cert bundle than the one used by botocore.
- inactivity_period (float) – The total seconds of inactivity needed to designate the keys as unchanged. Note that this mechanism is not real time: the sensor may not return until one poke_interval after this period has passed with no additional objects sensed.
- min_objects (int) – The minimum number of objects needed for the keys-unchanged sensor to be considered valid.
- previous_objects (set[str] | None) – The set of object ids found during the last poke. 
- allow_delete (bool) – Whether this sensor should consider objects being deleted between pokes as valid behavior. If True, a warning message is logged when this happens. If False, an error is raised.
- deferrable (bool) – Run sensor in the deferrable mode 
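The note on inactivity_period can be made concrete with a little arithmetic. The sketch below assumes an idealized schedule in which pokes happen exactly at multiples of poke_interval starting at t = 0, an upload is noticed at the first poke at or after it, and success requires at least inactivity_period seconds since that poke; `sensor_return_time` is a hypothetical helper, not part of Airflow.

```python
import math


def sensor_return_time(last_upload: float, inactivity_period: float, poke_interval: float) -> float:
    """Earliest idealized poke time at which the keys are reported unchanged."""
    # Round up to the next poke: that is when the final upload is first observed.
    first_seen = math.ceil(last_upload / poke_interval) * poke_interval
    # Round up again: success needs a poke at least inactivity_period seconds later.
    return math.ceil((first_seen + inactivity_period) / poke_interval) * poke_interval


# Last upload at t=610 s, pokes every 300 s, one hour of required inactivity.
print(sensor_return_time(610, 3600, 300))  # 4500
```

In this example the sensor returns 3890 s after the last upload: the full inactivity_period plus 290 s spent waiting for the next poke to observe the upload. When inactivity_period is a multiple of poke_interval, the extra delay in this model stays below one poke_interval.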
 
 - is_keys_unchanged(current_objects)
- Check for new objects after the inactivity_period and update the sensor state accordingly.
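The per-poke decision described for is_keys_unchanged can be sketched as a small state machine. This is a simplified, hypothetical model (the `KeysUnchangedState` class and its `poke` method are not Airflow APIs): it uses plain float timestamps instead of wall-clock time, treats any newly seen object as activity, treats deletions as activity when allow_delete is True and as an error otherwise, and raises RuntimeError where the real sensor's error handling may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KeysUnchangedState:
    """Hypothetical model of the state carried between pokes."""

    inactivity_period: float
    min_objects: int = 1
    allow_delete: bool = True
    previous_objects: set[str] = field(default_factory=set)
    last_activity_time: float | None = None

    def poke(self, current_objects: set[str], now: float) -> bool:
        """Return True once the object set has not changed for inactivity_period seconds."""
        if current_objects - self.previous_objects:
            # New objects appeared: remember them and restart the inactivity clock.
            self.previous_objects = set(current_objects)
            self.last_activity_time = now
            return False
        deleted = self.previous_objects - current_objects
        if deleted:
            if not self.allow_delete:
                raise RuntimeError(f"objects deleted between pokes: {sorted(deleted)}")
            # Deletions are tolerated but also restart the inactivity clock.
            self.previous_objects = set(current_objects)
            self.last_activity_time = now
            return False
        if self.last_activity_time is None:
            # First poke with nothing new: start the clock now.
            self.last_activity_time = now
            return False
        if now - self.last_activity_time < self.inactivity_period:
            return False
        if len(current_objects) < self.min_objects:
            raise RuntimeError("inactivity period passed but too few objects were found")
        return True


state = KeysUnchangedState(inactivity_period=600, min_objects=2)
for now, listed in [(0, {"a"}), (300, {"a", "b"}), (600, {"a", "b"}), (900, {"a", "b"})]:
    print(now, state.poke(listed, now))  # only the poke at 900 prints True
```

Keeping previous_objects and last_activity_time on the object is what makes the result depend on earlier pokes, which is why losing that state between invocations (as in reschedule mode) breaks the sensor.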