airflow.providers.amazon.aws.hooks.s3
Interact with AWS S3, using the boto3 library.
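Module Contents
Functions
provide_bucket_name(func) | Function decorator that provides a bucket name taken from the connection in case no bucket name has been passed to the function.
unify_bucket_name_and_key(func) | Function decorator that unifies bucket name and key taken from the key in case no bucket name and at least a key have been passed to the function.
Attributes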
- airflow.providers.amazon.aws.hooks.s3.provide_bucket_name(func)
Function decorator that provides a bucket name taken from the connection in case no bucket name has been passed to the function.
- airflow.providers.amazon.aws.hooks.s3.unify_bucket_name_and_key(func)
Function decorator that unifies bucket name and key taken from the key in case no bucket name and at least a key have been passed to the function.
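The decorator pattern described above can be sketched in plain Python. This is a simplified illustration, not the provider's implementation: provide_default_bucket, DEFAULT_BUCKET, and describe are hypothetical names, and the fallback bucket is a constant here, whereas the real decorator reads it from the Airflow connection.

```python
import functools
import inspect

# Hypothetical stand-in for the bucket name stored on the connection.
DEFAULT_BUCKET = "example-default-bucket"


def provide_default_bucket(func):
    """Fill in ``bucket_name`` when the caller did not pass one."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Bind the call so positional and keyword usage are treated alike.
        bound = signature.bind(*args, **kwargs)
        if bound.arguments.get("bucket_name") is None:
            bound.arguments["bucket_name"] = DEFAULT_BUCKET
        return func(*bound.args, **bound.kwargs)

    return wrapper


@provide_default_bucket
def describe(key, bucket_name=None):
    """Toy function standing in for a hook method."""
    return f"s3://{bucket_name}/{key}"


with_default = describe("reports/summary.csv")
with_explicit = describe("reports/summary.csv", bucket_name="other-bucket")
```

An explicitly passed bucket name always wins; the fallback applies only when the argument is missing or None.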
- class airflow.providers.amazon.aws.hooks.s3.S3Hook(aws_conn_id=AwsBaseHook.default_conn_name, transfer_config_args=None, extra_args=None, *args, **kwargs)
Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook
Interact with AWS S3, using the boto3 library.
- Parameters
See also
For allowed upload extra arguments see boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS.
For allowed download extra arguments see boto3.s3.transfer.S3Transfer.ALLOWED_DOWNLOAD_ARGS.
Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.
- static parse_s3_url(s3url)
Parses the S3 URL into a bucket name and key.
See https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-bucket-intro.html for valid URL formats.
- static get_s3_bucket_key(bucket, key, bucket_param_name, key_param_name)
Get the S3 bucket name and key from either:
- bucket name and key. The values are returned as they are, after checking that key is a relative path.
- key only. It must be a full s3:// URL.
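The two cases can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the hook's code: resolve_bucket_and_key is a hypothetical name, only the s3:// scheme is handled (the AWS page linked above lists further URL formats), and the exception types are chosen for the example.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def resolve_bucket_and_key(bucket: str | None, key: str) -> tuple[str, str]:
    """Return (bucket, key) from a bucket plus relative key, or from a full s3:// URL."""
    parsed = urlsplit(key)

    if bucket is None:
        # Case 2: only a key was given, so it must carry the bucket itself.
        if parsed.scheme != "s3" or not parsed.netloc:
            raise ValueError(f"Expected an s3://bucket/key URL, got: {key!r}")
        return parsed.netloc, parsed.path.lstrip("/")

    # Case 1: a bucket was given, so the key must be a relative path.
    if parsed.scheme or parsed.netloc:
        raise TypeError("When a bucket is provided, the key must be a relative path")
    return bucket, key


from_url = resolve_bucket_and_key(None, "s3://my-bucket/path/to/file.csv")
from_parts = resolve_bucket_and_key("my-bucket", "path/to/file.csv")
```

Both calls resolve to the same bucket and key pair, which is the point of accepting either form.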
- check_for_prefix(prefix, delimiter, bucket_name=None)
Checks that a prefix exists in a bucket
- list_prefixes(bucket_name=None, prefix=None, delimiter=None, page_size=None, max_items=None)
Lists prefixes in a bucket under prefix
- list_keys(bucket_name=None, prefix=None, delimiter=None, page_size=None, max_items=None, start_after_key=None, from_datetime=None, to_datetime=None, object_filter=None)
Lists keys in a bucket under prefix and not containing delimiter
- Parameters
bucket_name (str | None) – the name of the bucket
prefix (str | None) – a key prefix
delimiter (str | None) – the delimiter marks key hierarchy.
page_size (int | None) – pagination size
max_items (int | None) – maximum items to return
start_after_key (str | None) – should return only keys greater than this key
from_datetime (datetime | None) – should return only keys with LastModified attr greater than or equal to from_datetime
to_datetime (datetime | None) – should return only keys with LastModified attr less than to_datetime
object_filter (Callable[..., list] | None) – Function that receives the list of S3 objects, from_datetime and to_datetime, and returns the list of matched keys.
Example: Returns the list of S3 objects with a LastModified attr at or after from_datetime and at or before to_datetime:

    def object_filter(
        keys: list,
        from_datetime: datetime | None = None,
        to_datetime: datetime | None = None,
    ) -> list:
        def _is_in_period(input_date: datetime) -> bool:
            if from_datetime is not None and input_date < from_datetime:
                return False
            if to_datetime is not None and input_date > to_datetime:
                return False
            return True

        return [k["Key"] for k in keys if _is_in_period(k["LastModified"])]
- Returns
a list of matched keys
- Return type
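A custom object_filter can apply conditions beyond the time window. The sketch below keeps only .csv keys within the period; csv_only_filter and sample_objects are hypothetical names, and the sample dictionaries only mimic the "Key" and "LastModified" fields used in the example above.

```python
from __future__ import annotations

from datetime import datetime, timezone


def csv_only_filter(
    keys: list,
    from_datetime: datetime | None = None,
    to_datetime: datetime | None = None,
) -> list:
    """Return keys ending in .csv whose LastModified falls within the period."""
    matched = []
    for obj in keys:
        modified = obj["LastModified"]
        if from_datetime is not None and modified < from_datetime:
            continue
        if to_datetime is not None and modified > to_datetime:
            continue
        if obj["Key"].endswith(".csv"):
            matched.append(obj["Key"])
    return matched


# Hand-written sample data standing in for the objects S3 would return.
sample_objects = [
    {"Key": "exports/a.csv", "LastModified": datetime(2024, 1, 10, tzinfo=timezone.utc)},
    {"Key": "exports/b.json", "LastModified": datetime(2024, 1, 11, tzinfo=timezone.utc)},
    {"Key": "exports/c.csv", "LastModified": datetime(2024, 2, 1, tzinfo=timezone.utc)},
]

january_csvs = csv_only_filter(
    sample_objects,
    from_datetime=datetime(2024, 1, 1, tzinfo=timezone.utc),
    to_datetime=datetime(2024, 1, 31, tzinfo=timezone.utc),
)
```

Going by the list_keys signature, such a function would be passed as object_filter=csv_only_filter together with from_datetime and to_datetime.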
- get_file_metadata(prefix, bucket_name=None, page_size=None, max_items=None)
Lists metadata objects in a bucket under prefix
- get_key(key, bucket_name=None)
Returns a boto3.s3.Object
- Parameters
- Returns
the key object from the bucket
- Return type
- select_key(key, bucket_name=None, expression=None, expression_type=None, input_serialization=None, output_serialization=None)
Reads a key with S3 Select.
- Parameters
key (str) – S3 key that will point to the file
bucket_name (str | None) – Name of the bucket in which the file is stored
expression (str | None) – S3 Select expression
expression_type (str | None) – S3 Select expression type
input_serialization (dict[str, Any] | None) – S3 Select input data serialization format
output_serialization (dict[str, Any] | None) – S3 Select output data serialization format
- Returns
retrieved subset of original data by S3 Select
- Return type
See also
For more details about S3 Select parameters: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.select_object_content
- check_for_wildcard_key(wildcard_key, bucket_name=None, delimiter='')
Checks that a key matching a wildcard expression exists in a bucket
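Wildcard matching over key names can be illustrated with the standard fnmatch module. This is a sketch over an in-memory list, assuming shell-style patterns (*, ?, [...]); wildcard_prefix and match_wildcard are hypothetical helpers, and narrowing by the literal prefix before matching is an assumption about how a listing could be kept small.

```python
import fnmatch
import re


def wildcard_prefix(wildcard_key: str) -> str:
    """Return the literal part of the pattern before the first wildcard character."""
    return re.split(r"[\[*?]", wildcard_key, maxsplit=1)[0]


def match_wildcard(keys: list, wildcard_key: str) -> list:
    """Return the keys that match the shell-style pattern."""
    prefix = wildcard_prefix(wildcard_key)
    # Narrow by the literal prefix first, as a prefix-based listing would.
    candidates = [k for k in keys if k.startswith(prefix)]
    # fnmatchcase is case-sensitive on every platform, like S3 key names.
    return [k for k in candidates if fnmatch.fnmatchcase(k, wildcard_key)]


keys = [
    "logs/2024-01/app.log",
    "logs/2024-02/app.log",
    "logs/2023-12/app.log",
    "data/app.log",
]
matches = match_wildcard(keys, "logs/2024-*/app.log")
```

Note that in fnmatch a `*` also matches `/`, so a pattern can span several levels of the key hierarchy.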
- get_wildcard_key(wildcard_key, bucket_name=None, delimiter='')
Returns a boto3.s3.Object object matching the wildcard expression
- Parameters
- Returns
the key object from the bucket or None if none has been found.
- Return type
- load_file(filename, key, bucket_name=None, replace=False, encrypt=False, gzip=False, acl_policy=None)
Loads a local file to S3
- Parameters
filename (Path | str) – path to the file to load.
key (str) – S3 key that will point to the file
bucket_name (str | None) – Name of the bucket in which to store the file
replace (bool) – A flag to decide whether or not to overwrite the key if it already exists. If replace is False and the key exists, an error will be raised.
encrypt (bool) – If True, the file will be encrypted on the server-side by S3 and will be stored in an encrypted form while at rest in S3.
gzip (bool) – If True, the file will be compressed locally
acl_policy (str | None) – String specifying the canned ACL policy for the file being uploaded to the S3 bucket.
- load_string(string_data, key, bucket_name=None, replace=False, encrypt=False, encoding=None, acl_policy=None, compression=None)
Loads a string to S3
This is provided as a convenience to drop a string in S3. It uses the boto3 infrastructure to ship a file to S3.
- Parameters
string_data (str) – str to set as content for the key.
key (str) – S3 key that will point to the file
bucket_name (str | None) – Name of the bucket in which to store the file
replace (bool) – A flag to decide whether or not to overwrite the key if it already exists
encrypt (bool) – If True, the file will be encrypted on the server-side by S3 and will be stored in an encrypted form while at rest in S3.
encoding (str | None) – The string to byte encoding
acl_policy (str | None) – The string to specify the canned ACL policy for the object to be uploaded
compression (str | None) – Type of compression to use, currently only gzip is supported.
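The encoding and compression parameters describe a string-to-bytes step followed by optional gzip compression. The sketch below shows that preparation in isolation; string_to_payload is a hypothetical helper, and the UTF-8 fallback when encoding is None is an assumption made for the example.

```python
from __future__ import annotations

import gzip
import io


def string_to_payload(
    string_data: str,
    encoding: str | None = None,
    compression: str | None = None,
) -> bytes:
    """Encode a string to bytes and optionally gzip-compress the result."""
    # Assumption: fall back to UTF-8 when no encoding is given.
    payload = string_data.encode(encoding or "utf-8")
    if compression is None:
        return payload
    if compression != "gzip":
        raise NotImplementedError(f"Unsupported compression: {compression!r}")

    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        gz.write(payload)
    return buffer.getvalue()


plain = string_to_payload("héllo")
compressed = string_to_payload("héllo", compression="gzip")
round_trip = gzip.decompress(compressed).decode("utf-8")
```

Decompressing the payload and decoding it with the same encoding recovers the original string.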
- load_bytes(bytes_data, key, bucket_name=None, replace=False, encrypt=False, acl_policy=None)
Loads bytes to S3
This is provided as a convenience to drop bytes data into S3. It uses the boto3 infrastructure to ship a file to S3.
- Parameters
bytes_data (bytes) – bytes to set as content for the key.
key (str) – S3 key that will point to the file
bucket_name (str | None) – Name of the bucket in which to store the file
replace (bool) – A flag to decide whether or not to overwrite the key if it already exists
encrypt (bool) – If True, the file will be encrypted on the server-side by S3 and will be stored in an encrypted form while at rest in S3.
acl_policy (str | None) – The string to specify the canned ACL policy for the object to be uploaded
- load_file_obj(file_obj, key, bucket_name=None, replace=False, encrypt=False, acl_policy=None)
Loads a file object to S3
- Parameters
file_obj (io.BytesIO) – The file-like object to set as the content for the S3 key.
key (str) – S3 key that will point to the file
bucket_name (str | None) – Name of the bucket in which to store the file
replace (bool) – A flag that indicates whether to overwrite the key if it already exists.
encrypt (bool) – If True, S3 encrypts the file on the server, and the file is stored in encrypted form at rest in S3.
acl_policy (str | None) – The string to specify the canned ACL policy for the object to be uploaded
- copy_object(source_bucket_key, dest_bucket_key, source_bucket_name=None, dest_bucket_name=None, source_version_id=None, acl_policy=None)
Creates a copy of an object that is already stored in S3.
Note: the S3 connection used here needs to have access to both source and destination bucket/key.
- Parameters
source_bucket_key (str) –
The key of the source object.
It can be either full s3:// style url or relative path from root level.
When it’s specified as a full s3:// url, please omit source_bucket_name.
dest_bucket_key (str) –
The key of the object to copy to.
The convention to specify dest_bucket_key is the same as source_bucket_key.
source_bucket_name (str | None) –
Name of the S3 bucket that contains the source object.
It should be omitted when source_bucket_key is provided as a full s3:// url.
dest_bucket_name (str | None) –
Name of the S3 bucket to which the object is copied.
It should be omitted when dest_bucket_key is provided as a full s3:// url.
source_version_id (str | None) – Version ID of the source object (OPTIONAL)
acl_policy (str | None) – The string to specify the canned ACL policy for the object to be copied which is private by default.
- delete_bucket(bucket_name, force_delete=False)
Deletes an S3 bucket. With force_delete enabled, all objects in the bucket are deleted first, then the bucket itself is deleted.
- download_file(key, bucket_name=None, local_path=None, preserve_file_name=False, use_autogenerated_subdir=True)
Downloads a file from the S3 location to the local file system.
- Parameters
key (str) – The key path in S3.
bucket_name (str | None) – The specific bucket to use.
local_path (str | None) – The local path to the downloaded file. If no path is provided it will use the system’s temporary directory.
preserve_file_name (bool) – If you want the downloaded file name to be the same name as it is in S3, set this parameter to True. When set to False, a random filename will be generated. Default: False.
use_autogenerated_subdir (bool) – Used together with preserve_file_name=True to download the file into a randomly generated folder inside local_path, which avoids collisions between tasks that download the same file name. Set it to False if you want a predictable path. Default: True.
- Returns
the file name.
- Return type
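How local_path, preserve_file_name, and use_autogenerated_subdir combine is easier to see as a path computation. The sketch below only computes a destination path and downloads nothing; plan_download_path is a hypothetical helper, and the tmp_ and tmp_dir_ naming is invented for the illustration.

```python
from __future__ import annotations

import tempfile
import uuid
from pathlib import Path


def plan_download_path(
    key: str,
    local_path: str | None = None,
    preserve_file_name: bool = False,
    use_autogenerated_subdir: bool = True,
) -> Path:
    """Compute where a downloaded object would be written."""
    # Fall back to the system temporary directory when no path is given.
    base_dir = Path(local_path or tempfile.gettempdir())

    if not preserve_file_name:
        # A random file name directly inside the target directory.
        return base_dir / f"tmp_{uuid.uuid4().hex}"

    file_name = key.rsplit("/", 1)[-1]
    if use_autogenerated_subdir:
        # A random subdirectory keeps tasks that fetch the same name apart.
        return base_dir / f"tmp_dir_{uuid.uuid4().hex[:8]}" / file_name
    return base_dir / file_name


predictable = plan_download_path(
    "reports/2024/summary.csv",
    local_path="/data",
    preserve_file_name=True,
    use_autogenerated_subdir=False,
)
isolated = plan_download_path(
    "reports/2024/summary.csv", local_path="/data", preserve_file_name=True
)
```

Only the combination preserve_file_name=True with use_autogenerated_subdir=False yields a fully predictable path, which matches the parameter descriptions above.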
- generate_presigned_url(client_method, params=None, expires_in=3600, http_method=None)
Generate a presigned url given a client, its method, and arguments
- Parameters
client_method (str) – The client method to presign for.
params (dict | None) – The parameters normally passed to ClientMethod.
expires_in (int) – The number of seconds the presigned url is valid for. By default it expires in an hour (3600 seconds).
http_method (str | None) – The http method to use on the generated url. By default, the http method is whatever is used in the method’s model.
- Returns
The presigned url.
- Return type
str | None