airflow.providers.amazon.aws.hooks.s3
Interact with AWS S3, using the boto3 library.
Attributes
Classes
| S3Hook | Interact with Amazon Simple Storage Service (S3). |
Functions
| provide_bucket_name(func) | Provide a bucket name taken from the connection if no bucket name has been passed to the function. |
| unify_bucket_name_and_key(func) | Unify the bucket name and key when no bucket name is passed but at least a key is. |
Module Contents
- airflow.providers.amazon.aws.hooks.s3.provide_bucket_name(func)
- Provide a bucket name taken from the connection if no bucket name has been passed to the function.
- airflow.providers.amazon.aws.hooks.s3.unify_bucket_name_and_key(func)
- Unify the bucket name and key when no bucket name is passed but at least a key is.
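The idea behind a decorator such as provide_bucket_name can be sketched in plain Python. This is a simplified illustration, not the provider's implementation: DEFAULT_BUCKET, with_default_bucket, and describe are hypothetical names, and the constant stands in for whatever bucket name the connection would supply.

```python
import functools
import inspect

# Hypothetical stand-in for a bucket name stored on the Airflow connection.
DEFAULT_BUCKET = "bucket-from-connection"


def with_default_bucket(func):
    """Fill in ``bucket_name`` when the caller did not pass one."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Map the call onto the function's parameters by name.
        bound = signature.bind(*args, **kwargs)
        if bound.arguments.get("bucket_name") is None:
            bound.arguments["bucket_name"] = DEFAULT_BUCKET
        return func(*bound.args, **bound.kwargs)

    return wrapper


@with_default_bucket
def describe(key, bucket_name=None):
    """Hypothetical function that needs a bucket name."""
    return f"s3://{bucket_name}/{key}"
```

An explicit bucket_name always wins; the default is used only when the argument is missing or None.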
- class airflow.providers.amazon.aws.hooks.s3.S3Hook(aws_conn_id=AwsBaseHook.default_conn_name, transfer_config_args=None, extra_args=None, *args, **kwargs)
- Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook
- Interact with Amazon Simple Storage Service (S3).
- Provide a thick wrapper around boto3.client("s3") and boto3.resource("s3").
- Parameters:
 - See also
- For allowed upload extra arguments see boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS.
- For allowed download extra arguments see boto3.s3.transfer.S3Transfer.ALLOWED_DOWNLOAD_ARGS.
 - Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.
 - static parse_s3_url(s3url)
- Parse the S3 URL into a bucket name and key.
- See https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-bucket-intro.html for valid URL formats.
 - static get_s3_bucket_key(bucket, key, bucket_param_name, key_param_name)
- Get the S3 bucket name and key from either:
- bucket name and key: the values are returned as they are, after checking that the key is a relative path.
- key only: it must be a full s3:// URL.
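The two calling conventions can be sketched with the standard library alone. This is a minimal sketch, assuming only the s3:// URL style; resolve_bucket_and_key is a hypothetical helper, not the hook's own code, and it ignores the other URL formats described on the AWS page linked above.

```python
from urllib.parse import urlsplit


def resolve_bucket_and_key(bucket, key):
    """Return ``(bucket, key)`` from either calling convention."""
    parts = urlsplit(key)
    if bucket is None:
        # Only a key was given, so it must be a full s3:// URL.
        if parts.scheme != "s3" or not parts.netloc:
            raise ValueError("Without a bucket, key must be a full s3:// URL")
        # The URL path starts with "/", which is not part of the object key.
        return parts.netloc, parts.path.lstrip("/")
    # A bucket was given, so the key must be relative (no scheme, no host).
    if parts.scheme or parts.netloc:
        raise ValueError("With a bucket, key must be a relative path")
    return bucket, key
```

Note that urlsplit treats "?" and "#" as query and fragment separators, so keys containing those characters would need different handling than this sketch provides.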
 - get_bucket(bucket_name=None)
- Return an S3.Bucket object.
- Parameters:
- bucket_name (str | None) – the name of the bucket 
- Returns:
- the bucket object to the bucket name. 
- Return type:
- mypy_boto3_s3.service_resource.Bucket 
 
 - check_for_prefix(prefix, delimiter, bucket_name=None)
- Check that a prefix exists in a bucket.
 - list_prefixes(bucket_name=None, prefix=None, delimiter=None, page_size=None, max_items=None)
- List prefixes in a bucket under prefix.
- Parameters:
- Returns:
- a list of matched prefixes 
- Return type:
 
 - async list_prefixes_async(client, bucket_name=None, prefix=None, delimiter=None, page_size=None, max_items=None)
- List prefixes in a bucket under prefix.
- Parameters:
- Returns:
- a list of matched prefixes 
- Return type:
- list[Any] 
 
 - async get_file_metadata_async(client, bucket_name, key=None)
- Asynchronously get the list of files in a bucket whose keys match a wildcard expression.
 - async check_key_async(client, bucket, bucket_keys, wildcard_match, use_regex=False)
- Check asynchronously whether a key exists in a bucket and return a boolean.
- If wildcard_match is True, list the files in the bucket and check whether any key matches the wildcard expression. If wildcard_match is False, get the head object from the bucket instead.
 - async check_for_prefix_async(client, prefix, delimiter, bucket_name=None)
- Check that a prefix exists in a bucket.
 - async get_files_async(client, bucket, bucket_keys, wildcard_match, delimiter='/')
- Get a list of files in the bucket.
 - async is_keys_unchanged_async(client, bucket_name, prefix, inactivity_period=60 * 60, min_objects=1, previous_objects=None, inactivity_seconds=0, allow_delete=True, last_activity_time=None)
- Check if new objects have been uploaded and the period has passed; update sensor state accordingly.
- Parameters:
- client (aiobotocore.client.AioBaseClient) – aiobotocore client 
- bucket_name (str) – the name of the bucket 
- prefix (str) – a key prefix 
- inactivity_period (float) – the total seconds of inactivity to designate keys unchanged. Note, this mechanism is not real time and this operator may not return until a poke_interval after this period has passed with no additional objects sensed. 
- min_objects (int) – the minimum number of objects needed for keys unchanged sensor to be considered valid. 
- previous_objects (set[str] | None) – the set of object ids found during the last poke. 
- inactivity_seconds (int) – number of inactive seconds 
- allow_delete (bool) – Whether this sensor should treat objects deleted between pokes as valid behavior. If true, a warning message is logged when this happens. If false, an error is raised. 
- last_activity_time (datetime.datetime | None) – last activity datetime. 
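How these parameters interact can be sketched as a small synchronous state check. This is a simplified sketch under stated assumptions, not the hook's implementation: PokeState and keys_unchanged are hypothetical names, the listing is passed in as a plain set instead of being fetched through a client, and any change to the listing is treated as activity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PokeState:
    """Hypothetical container for the values carried between pokes."""

    previous_objects: set = field(default_factory=set)
    last_activity_time: Optional[datetime] = None


def keys_unchanged(current, state, now, inactivity_period=3600.0,
                   min_objects=1, allow_delete=True):
    """Return True once the listing has been stable for ``inactivity_period`` seconds."""
    if current != state.previous_objects:
        deleted = state.previous_objects - current
        if deleted and not allow_delete:
            raise RuntimeError(f"Objects deleted between pokes: {sorted(deleted)}")
        # Any change counts as activity: remember the listing, restart the clock.
        state.previous_objects = set(current)
        state.last_activity_time = now
        return False
    if state.last_activity_time is None:
        state.last_activity_time = now
        return False
    idle = (now - state.last_activity_time).total_seconds()
    return idle >= inactivity_period and len(current) >= min_objects


# Two pokes, 61 seconds apart, with a 60 second inactivity period.
start = datetime(2024, 1, 1, 12, 0, 0)
state = PokeState()
first = keys_unchanged({"a"}, state, start, inactivity_period=60)
later = keys_unchanged({"a"}, state, start + timedelta(seconds=61), inactivity_period=60)
```

The first poke records the listing and reports that the keys are still changing; the later poke sees the same listing after the inactivity period and reports them as unchanged.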
 
 
 - list_keys(bucket_name=None, prefix=None, delimiter=None, page_size=None, max_items=None, start_after_key=None, from_datetime=None, to_datetime=None, object_filter=None, apply_wildcard=False)
- List keys in a bucket under prefix and not containing delimiter.
- Parameters:
- bucket_name (str | None) – the name of the bucket 
- prefix (str | None) – a key prefix 
- delimiter (str | None) – the delimiter marks key hierarchy. 
- page_size (int | None) – pagination size 
- max_items (int | None) – maximum items to return 
- start_after_key (str | None) – return only keys greater than this key 
- from_datetime (datetime.datetime | None) – return only keys with LastModified attr greater than or equal to from_datetime 
- to_datetime (datetime.datetime | None) – return only keys with LastModified attr less than to_datetime 
- object_filter (Callable[Ellipsis, list] | None) – Function that receives the list of S3 objects, from_datetime and to_datetime, and returns the list of matched keys. 
- apply_wildcard (bool) – whether to treat ‘*’ as a wildcard or a plain symbol in the prefix. 
 
 - Example: return the list of S3 objects whose LastModified attr falls between from_datetime and to_datetime:

    from datetime import datetime


    def object_filter(
        keys: list,
        from_datetime: datetime | None = None,
        to_datetime: datetime | None = None,
    ) -> list:
        def _is_in_period(input_date: datetime) -> bool:
            # Reject objects modified before the lower bound.
            if from_datetime is not None and input_date < from_datetime:
                return False
            # Reject objects modified after the upper bound.
            if to_datetime is not None and input_date > to_datetime:
                return False
            return True

        return [k["Key"] for k in keys if _is_in_period(k["LastModified"])]

- Returns:
- a list of matched keys 
- Return type:
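A custom object_filter can combine the date bounds with other criteria. The sketch below is illustrative only: csv_only_filter and the sample records are hypothetical, and the records merely imitate the Key and LastModified fields used by the filter contract described above.

```python
from datetime import datetime, timezone


def csv_only_filter(keys, from_datetime=None, to_datetime=None):
    """Keep ``.csv`` keys whose LastModified lies within the optional bounds."""
    matched = []
    for obj in keys:
        modified = obj["LastModified"]
        if from_datetime is not None and modified < from_datetime:
            continue
        if to_datetime is not None and modified > to_datetime:
            continue
        if obj["Key"].endswith(".csv"):
            matched.append(obj["Key"])
    return matched


# Hypothetical sample records with the two fields the filter reads.
sample = [
    {"Key": "in/a.csv", "LastModified": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"Key": "in/b.json", "LastModified": datetime(2024, 1, 2, tzinfo=timezone.utc)},
    {"Key": "in/c.csv", "LastModified": datetime(2024, 1, 3, tzinfo=timezone.utc)},
]
selected = csv_only_filter(
    sample, from_datetime=datetime(2024, 1, 2, tzinfo=timezone.utc)
)
```

A function with this shape could then be passed as the object_filter argument of list_keys.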
 
 - get_file_metadata(prefix, bucket_name=None, page_size=None, max_items=None)
- List metadata objects in a bucket under prefix.
 - select_key(key, bucket_name=None, expression=None, expression_type=None, input_serialization=None, output_serialization=None)
- Read a key with S3 Select.
- Parameters:
- key (str) – S3 key that will point to the file 
- bucket_name (str | None) – Name of the bucket in which the file is stored 
- expression (str | None) – S3 Select expression 
- expression_type (str | None) – S3 Select expression type 
- input_serialization (dict[str, Any] | None) – S3 Select input data serialization format 
- output_serialization (dict[str, Any] | None) – S3 Select output data serialization format 
 
- Returns:
- retrieved subset of original data by S3 Select 
- Return type:
 
 - check_for_wildcard_key(wildcard_key, bucket_name=None, delimiter='')
- Check that a key matching a wildcard expression exists in a bucket. 
 - get_wildcard_key(wildcard_key, bucket_name=None, delimiter='')
- Return a boto3.s3.Object object matching the wildcard expression. 
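Wildcard matching of this kind can be sketched with Python's fnmatch module. This is an assumption-laden illustration, not the hook's code: match_wildcard_keys is a hypothetical helper, it operates on an in-memory list of keys instead of a bucket listing, and it assumes glob-style semantics in which "*" also matches "/".

```python
import fnmatch
import re


def match_wildcard_keys(wildcard_key, keys):
    """Return the keys from ``keys`` that match ``wildcard_key``."""
    # Everything before the first wildcard character is a literal prefix,
    # which can narrow the candidates before pattern matching.
    prefix = re.split(r"[\[*?]", wildcard_key, maxsplit=1)[0]
    candidates = [k for k in keys if k.startswith(prefix)]
    return [k for k in candidates if fnmatch.fnmatchcase(k, wildcard_key)]


# Hypothetical key listing.
keys = ["logs/2024/app.log", "logs/2024/app.txt", "data/x.log"]
matches = match_wildcard_keys("logs/*.log", keys)
```

An existence check then reduces to testing whether the list of matches is non-empty.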
 - load_file(filename, key, bucket_name=None, replace=False, encrypt=False, gzip=False, acl_policy=None)
- Load a local file to S3.
- Parameters:
- filename (pathlib.Path | str) – path to the file to load. 
- key (str) – S3 key that will point to the file 
- bucket_name (str | None) – Name of the bucket in which to store the file 
- replace (bool) – A flag to decide whether or not to overwrite the key if it already exists. If replace is False and the key exists, an error will be raised. 
- encrypt (bool) – If True, the file will be encrypted on the server-side by S3 and will be stored in an encrypted form while at rest in S3. 
- gzip (bool) – If True, the file will be compressed locally 
- acl_policy (str | None) – String specifying the canned ACL policy for the file being uploaded to the S3 bucket. 
 
 
 - load_string(string_data, key, bucket_name=None, replace=False, encrypt=False, encoding=None, acl_policy=None, compression=None)
- Load a string to S3.
- This is provided as a convenience to drop a string in S3. It uses the boto infrastructure to ship a file to S3.
- Parameters:
- string_data (str) – str to set as content for the key. 
- key (str) – S3 key that will point to the file 
- bucket_name (str | None) – Name of the bucket in which to store the file 
- replace (bool) – A flag to decide whether or not to overwrite the key if it already exists 
- encrypt (bool) – If True, the file will be encrypted on the server-side by S3 and will be stored in an encrypted form while at rest in S3. 
- encoding (str | None) – The string to byte encoding 
- acl_policy (str | None) – The string to specify the canned ACL policy for the object to be uploaded 
- compression (str | None) – Type of compression to use, currently only gzip is supported. 
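The encoding and compression parameters describe a string-to-bytes conversion followed by optional gzip compression. A minimal sketch of that preparation step, assuming UTF-8 when no encoding is given (an assumption, not a documented default) and using a hypothetical helper name:

```python
import gzip
import io


def string_to_file_obj(string_data, encoding=None, compression=None):
    """Encode a string and optionally gzip it, returning an in-memory file."""
    # Assumed default: fall back to UTF-8 when no encoding is passed.
    data = string_data.encode(encoding or "utf-8")
    if compression == "gzip":
        data = gzip.compress(data)
    elif compression is not None:
        raise NotImplementedError(f"Unsupported compression: {compression}")
    return io.BytesIO(data)


payload = string_to_file_obj("hello, S3", compression="gzip")
```

The resulting file-like object is the kind of value that a file-object upload could then send to S3.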
 
 
 - load_bytes(bytes_data, key, bucket_name=None, replace=False, encrypt=False, acl_policy=None)
- Load bytes to S3.
- This is provided as a convenience to drop bytes data into S3. It uses the boto infrastructure to ship a file to S3.
- Parameters:
- bytes_data (bytes) – bytes to set as content for the key. 
- key (str) – S3 key that will point to the file 
- bucket_name (str | None) – Name of the bucket in which to store the file 
- replace (bool) – A flag to decide whether or not to overwrite the key if it already exists 
- encrypt (bool) – If True, the file will be encrypted on the server-side by S3 and will be stored in an encrypted form while at rest in S3. 
- acl_policy (str | None) – The string to specify the canned ACL policy for the object to be uploaded 
 
 
 - load_file_obj(file_obj, key, bucket_name=None, replace=False, encrypt=False, acl_policy=None)
- Load a file object to S3.
- Parameters:
- file_obj (io.BytesIO) – The file-like object to set as the content for the S3 key. 
- key (str) – S3 key that will point to the file 
- bucket_name (str | None) – Name of the bucket in which to store the file 
- replace (bool) – A flag that indicates whether to overwrite the key if it already exists. 
- encrypt (bool) – If True, S3 encrypts the file on the server, and the file is stored in encrypted form at rest in S3. 
- acl_policy (str | None) – The string to specify the canned ACL policy for the object to be uploaded 
 
 
 - copy_object(source_bucket_key, dest_bucket_key, source_bucket_name=None, dest_bucket_name=None, source_version_id=None, acl_policy=None, meta_data_directive=None, **kwargs)
- Create a copy of an object that is already stored in S3.
- Note: the S3 connection used here needs to have access to both source and destination bucket/key.
- Parameters:
- source_bucket_key (str) – The key of the source object. It can be either a full s3:// style URL or a relative path from root level. When it is specified as a full s3:// URL, omit source_bucket_name. 
- dest_bucket_key (str) – The key of the object to copy to. The convention for specifying dest_bucket_key is the same as for source_bucket_key. 
- source_bucket_name (str | None) – Name of the S3 bucket that holds the source object. Omit it when source_bucket_key is provided as a full s3:// URL. 
- dest_bucket_name (str | None) – Name of the S3 bucket to which the object is copied. Omit it when dest_bucket_key is provided as a full s3:// URL. 
- source_version_id (str | None) – Version ID of the source object (optional). 
- acl_policy (str | None) – The canned ACL policy for the copied object, which is private by default. 
- meta_data_directive (str | None) – Whether to COPY the metadata from the source object or REPLACE it with metadata that’s provided in the request. 
 
 
 - delete_bucket(bucket_name, force_delete=False, max_retries=5)
- Delete an S3 bucket. With force_delete, all objects in the bucket are deleted first and then the bucket itself.
- Parameters:
- bucket_name (str) – Bucket name 
- force_delete (bool) – Enable this to delete bucket even if not empty 
- max_retries (int) – A bucket must be empty to be deleted. If force_delete is true, then retries may help prevent a race condition between deleting objects in the bucket and trying to delete the bucket. 
 
- Returns:
- None 
- Return type:
- None 
 
 - download_file(key, bucket_name=None, local_path=None, preserve_file_name=False, use_autogenerated_subdir=True)
- Download a file from the S3 location to the local file system.
- Note: This function shadows the ‘download_file’ method of the S3 API, but it is not the same. To use the original method from the S3 API, call ‘S3Hook.get_conn().download_file()’.
- Parameters:
- key (str) – The key path in S3. 
- bucket_name (str | None) – The specific bucket to use. 
- local_path (str | None) – The local path to the downloaded file. If no path is provided it will use the system’s temporary directory. 
- preserve_file_name (bool) – If you want the downloaded file name to be the same name as it is in S3, set this parameter to True. When set to False, a random filename will be generated. Default: False. 
- use_autogenerated_subdir (bool) – Pairs with ‘preserve_file_name = True’ to download the file into a randomly generated folder inside ‘local_path’, which helps avoid collisions between tasks that download the same file name. Set it to ‘False’ if you want a predictable path instead. Default: True. 
 
- Returns:
- the file name. 
- Return type:
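How local_path, preserve_file_name, and use_autogenerated_subdir combine can be sketched as a path-resolution helper. This is a hypothetical illustration built only from the parameter descriptions above, not the hook's implementation; resolve_download_path and its use of the tempfile module are assumptions.

```python
import os
import tempfile
from pathlib import Path


def resolve_download_path(key, local_path=None, preserve_file_name=False,
                          use_autogenerated_subdir=True):
    """Pick the local file path for a downloaded S3 key."""
    base_dir = Path(local_path) if local_path else Path(tempfile.gettempdir())
    if not preserve_file_name:
        # A random file name avoids collisions but is not predictable.
        handle, name = tempfile.mkstemp(dir=str(base_dir))
        os.close(handle)
        return Path(name)
    if use_autogenerated_subdir:
        # Keep the S3 file name, but isolate it in a random subdirectory.
        base_dir = Path(tempfile.mkdtemp(dir=str(base_dir)))
    return base_dir / Path(key).name
```

Only the combination preserve_file_name=True with use_autogenerated_subdir=False yields a path that can be predicted from the key and local_path alone.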
 
 - generate_presigned_url(client_method, params=None, expires_in=3600, http_method=None)
- Generate a presigned URL given a client, its method, and arguments.
- Parameters:
- client_method (str) – The client method to presign for. 
- params (dict | None) – The parameters normally passed to ClientMethod. 
- expires_in (int) – The number of seconds the presigned url is valid for. By default it expires in an hour (3600 seconds). 
- http_method (str | None) – The http method to use on the generated url. By default, the http method is whatever is used in the method’s model. 
 
- Returns:
- The presigned url. 
- Return type:
- str | None 
 
 - put_bucket_tagging(tag_set=None, key=None, value=None, bucket_name=None)
- Overwrite the existing TagSet with provided tags; must provide a TagSet, a key/value pair, or both.
- Parameters:
- tag_set (dict[str, str] | list[dict[str, str]] | None) – A dictionary containing the key/value pairs for the tags, or a list already formatted for the API 
- key (str | None) – The Key for the new TagSet entry. 
- value (str | None) – The Value for the new TagSet entry. 
- bucket_name (str | None) – The name of the bucket. 
 
- Returns:
- None 
- Return type:
- None
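The accepted tag_set shapes can be normalized into the list of Key/Value dictionaries that the S3 tagging API expects. The helper below is a hypothetical sketch of that normalization, not the hook's code; in particular, ignoring a key passed without a value is an assumption.

```python
def build_tag_set(tag_set=None, key=None, value=None):
    """Return tags as a list of ``{"Key": ..., "Value": ...}`` dicts."""
    if tag_set is None and not (key and value):
        raise ValueError("Provide a tag_set, a key/value pair, or both")
    if isinstance(tag_set, dict):
        # Plain mapping: expand each item into the API's dict format.
        tags = [{"Key": k, "Value": v} for k, v in tag_set.items()]
    else:
        # Already a list in API format (or None when only key/value is given).
        tags = list(tag_set or [])
    if key and value:
        tags.append({"Key": key, "Value": value})
    return tags


tags = build_tag_set({"env": "prod"}, key="team", value="data")
```

Because the call overwrites the existing TagSet, any tags that should survive need to be included in the list that is sent.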