airflow.providers.amazon.aws.hooks.s3

Interact with AWS S3, using the boto3 library.

Module Contents

airflow.providers.amazon.aws.hooks.s3.T[source]
airflow.providers.amazon.aws.hooks.s3.provide_bucket_name(func: T) → T[source]
Function decorator that provides a bucket name taken from the connection
in case no bucket name has been passed to the function.
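
The pattern can be illustrated with a small stand-alone decorator. This is a sketch of the idea, not Airflow's implementation: with_default_bucket and describe are hypothetical names, and the default is passed in directly instead of being read from a connection.

```python
import functools
import inspect


def with_default_bucket(default_bucket):
    """Hypothetical decorator factory: fill in bucket_name when the caller omits it."""

    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            # Only supply the default when no bucket name was passed at all.
            if "bucket_name" not in bound.arguments:
                bound.arguments["bucket_name"] = default_bucket
            return func(*bound.args, **bound.kwargs)

        return wrapper

    return decorator


@with_default_bucket("bucket-from-connection")
def describe(key, bucket_name=None):
    return f"{bucket_name}/{key}"
```

Calling describe("a.csv") uses the default bucket, while describe("a.csv", bucket_name="explicit") keeps the caller's value.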
airflow.providers.amazon.aws.hooks.s3.unify_bucket_name_and_key(func: T) → T[source]
Function decorator that extracts the bucket name and key from the key argument
when no bucket name and at least a key has been passed to the function.
class airflow.providers.amazon.aws.hooks.s3.S3Hook(*args, **kwargs)[source]

Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook

Interact with AWS S3, using the boto3 library.

Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.

See also

AwsBaseHook

conn_type = s3[source]
hook_name = S3[source]
static parse_s3_url(s3url: str)[source]

Parses the S3 URL into a bucket name and key.

Parameters

s3url (str) – The S3 URL to parse.

Returns

the parsed bucket name and key

Return type

tuple of str
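
The split can be sketched with the standard library alone. The helper below is hypothetical (split_s3_url is not part of the hook) and assumes the usual s3://bucket/key shape. It illustrates the kind of result to expect, not the hook's own code.

```python
from urllib.parse import urlparse


def split_s3_url(s3url):
    """Hypothetical helper: split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(s3url)
    if not parsed.netloc:
        raise ValueError(f"Please provide a bucket name in the URL: {s3url!r}")
    # The network location is the bucket; the path without its leading slash is the key.
    return parsed.netloc, parsed.path.lstrip("/")
```

With this sketch, "s3://my-bucket/data/2021/file.csv" splits into the bucket "my-bucket" and the key "data/2021/file.csv".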

check_for_bucket(self, bucket_name: Optional[str] = None)[source]

Check if bucket_name exists.

Parameters

bucket_name (str) – the name of the bucket

Returns

True if it exists and False if not.

Return type

bool

get_bucket(self, bucket_name: Optional[str] = None)[source]

Returns a boto3.S3.Bucket object

Parameters

bucket_name (str) – the name of the bucket

Returns

the bucket object for the given bucket name.

Return type

boto3.S3.Bucket

create_bucket(self, bucket_name: Optional[str] = None, region_name: Optional[str] = None)[source]

Creates an Amazon S3 bucket.

Parameters
  • bucket_name (str) – The name of the bucket

  • region_name (str) – The name of the aws region in which to create the bucket.
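
A common pattern is to create a bucket only when it is missing. The following is a minimal sketch, assuming hook is an S3Hook instance; ensure_bucket is a hypothetical helper name and the bucket name in the usage comment is a placeholder.

```python
def ensure_bucket(hook, bucket_name, region_name=None):
    """Create bucket_name if it is missing. Returns True when a bucket was created."""
    if hook.check_for_bucket(bucket_name):
        return False
    hook.create_bucket(bucket_name=bucket_name, region_name=region_name)
    return True


# Usage, assuming Airflow with the Amazon provider is installed and a
# connection named "aws_default" exists:
#   from airflow.providers.amazon.aws.hooks.s3 import S3Hook
#   ensure_bucket(S3Hook(aws_conn_id="aws_default"), "my-example-bucket")
```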

check_for_prefix(self, prefix: str, delimiter: str, bucket_name: Optional[str] = None)[source]

Checks that a prefix exists in a bucket

Parameters
  • bucket_name (str) – the name of the bucket

  • prefix (str) – a key prefix

  • delimiter (str) – the delimiter that marks key hierarchy.

Returns

False if the prefix does not exist in the bucket and True if it does.

Return type

bool

list_prefixes(self, bucket_name: Optional[str] = None, prefix: Optional[str] = None, delimiter: Optional[str] = None, page_size: Optional[int] = None, max_items: Optional[int] = None)[source]

Lists prefixes in a bucket under prefix

Parameters
  • bucket_name (str) – the name of the bucket

  • prefix (str) – a key prefix

  • delimiter (str) – the delimiter that marks key hierarchy.

  • page_size (int) – pagination size

  • max_items (int) – maximum items to return

Returns

a list of matched prefixes

Return type

list
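
How a delimiter groups keys into prefixes can be illustrated without touching S3. The helper below is hypothetical and only mimics the grouping S3 applies when listing with a prefix and delimiter: keys with a further delimiter after the prefix collapse into a common prefix, and the rest are returned as plain keys.

```python
def split_listing(keys, prefix="", delimiter="/"):
    """Hypothetical helper mimicking how S3 groups keys under a prefix."""
    direct_keys, common_prefixes = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to and including the first delimiter forms a common prefix.
            common = prefix + rest[: rest.index(delimiter) + len(delimiter)]
            if common not in common_prefixes:
                common_prefixes.append(common)
        else:
            direct_keys.append(key)
    return direct_keys, common_prefixes
```

For the keys logs/2021/a.txt, logs/2022/c.txt and logs/readme.txt with prefix "logs/" and delimiter "/", the common prefixes are logs/2021/ and logs/2022/, and the only direct key is logs/readme.txt.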

list_keys(self, bucket_name: Optional[str] = None, prefix: Optional[str] = None, delimiter: Optional[str] = None, page_size: Optional[int] = None, max_items: Optional[int] = None)[source]

Lists keys in a bucket under prefix and not containing delimiter

Parameters
  • bucket_name (str) – the name of the bucket

  • prefix (str) – a key prefix

  • delimiter (str) – the delimiter that marks key hierarchy.

  • page_size (int) – pagination size

  • max_items (int) – maximum items to return

Returns

a list of matched keys

Return type

list
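
The returned list can be filtered client-side. This is a minimal sketch, assuming hook is an S3Hook instance; keys_with_suffix is a hypothetical helper name.

```python
def keys_with_suffix(hook, bucket_name, prefix, suffix):
    """Return the keys under prefix whose names end with suffix."""
    # Treat a missing result as an empty listing.
    keys = hook.list_keys(bucket_name=bucket_name, prefix=prefix) or []
    return [key for key in keys if key.endswith(suffix)]
```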

check_for_key(self, key: str, bucket_name: Optional[str] = None)[source]

Checks if a key exists in a bucket

Parameters
  • key (str) – S3 key that will point to the file

  • bucket_name (str) – Name of the bucket in which the file is stored

Returns

True if the key exists and False if not.

Return type

bool

get_key(self, key: str, bucket_name: Optional[str] = None)[source]

Returns a boto3.s3.Object

Parameters
  • key (str) – the path to the key

  • bucket_name (str) – the name of the bucket

Returns

the key object from the bucket

Return type

boto3.s3.Object

read_key(self, key: str, bucket_name: Optional[str] = None)[source]

Reads a key from S3

Parameters
  • key (str) – S3 key that will point to the file

  • bucket_name (str) – Name of the bucket in which the file is stored

Returns

the content of the key

Return type

str
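
Because the content comes back as a string, it can be handed straight to a parser. The sketch below combines check_for_key and read_key, assuming hook is an S3Hook instance and the object holds JSON text; read_json_key is a hypothetical helper name.

```python
import json


def read_json_key(hook, key, bucket_name, default=None):
    """Return the parsed JSON stored under key, or default when the key is absent."""
    if not hook.check_for_key(key, bucket_name=bucket_name):
        return default
    return json.loads(hook.read_key(key, bucket_name=bucket_name))
```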

select_key(self, key: str, bucket_name: Optional[str] = None, expression: Optional[str] = None, expression_type: Optional[str] = None, input_serialization: Optional[Dict[str, Any]] = None, output_serialization: Optional[Dict[str, Any]] = None)[source]

Reads a key with S3 Select.

Parameters
  • key (str) – S3 key that will point to the file

  • bucket_name (str) – Name of the bucket in which the file is stored

  • expression (str) – S3 Select expression

  • expression_type (str) – S3 Select expression type

  • input_serialization (dict) – S3 Select input data serialization format

  • output_serialization (dict) – S3 Select output data serialization format

Returns

retrieved subset of original data by S3 Select

Return type

str
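
A call might look like the sketch below, which assumes hook is an S3Hook instance and that the object is a CSV file with a header row. The serialization dictionaries follow the shape boto3 expects for S3 Select; select_csv_rows and the where_clause argument are hypothetical.

```python
def select_csv_rows(hook, key, bucket_name, where_clause):
    """Return the CSV rows matching where_clause as one string."""
    # where_clause is interpolated as-is, so it must come from trusted input.
    expression = f"SELECT * FROM S3Object s WHERE {where_clause}"
    return hook.select_key(
        key=key,
        bucket_name=bucket_name,
        expression=expression,
        expression_type="SQL",
        # Use the first line of the file as column names.
        input_serialization={"CSV": {"FileHeaderInfo": "USE"}},
        output_serialization={"CSV": {}},
    )
```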

check_for_wildcard_key(self, wildcard_key: str, bucket_name: Optional[str] = None, delimiter: str = '')[source]

Checks that a key matching a wildcard expression exists in a bucket

Parameters
  • wildcard_key (str) – the key path, which may contain wildcards

  • bucket_name (str) – the name of the bucket

  • delimiter (str) – the delimiter that marks key hierarchy

Returns

True if a key exists and False if not.

Return type

bool

get_wildcard_key(self, wildcard_key: str, bucket_name: Optional[str] = None, delimiter: str = '')[source]

Returns a boto3.s3.Object object matching the wildcard expression

Parameters
  • wildcard_key (str) – the key path, which may contain wildcards

  • bucket_name (str) – the name of the bucket

  • delimiter (str) – the delimiter that marks key hierarchy

Returns

the key object from the bucket or None if none has been found.

Return type

boto3.s3.Object
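
This page does not spell out the wildcard syntax. The sketch below assumes Unix shell-style patterns as implemented by Python's fnmatch module and shows the matching step on a plain list of keys; first_wildcard_match is a hypothetical helper, not a hook method.

```python
import fnmatch


def first_wildcard_match(keys, wildcard_key):
    """Hypothetical helper: return the first key matching a shell-style pattern, or None."""
    for key in keys:
        # fnmatchcase is case-sensitive on every platform, like S3 keys themselves.
        if fnmatch.fnmatchcase(key, wildcard_key):
            return key
    return None
```

Under this assumption, "*" also matches the "/" character, so the pattern data/*.csv matches data/2021/a.csv.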

load_file(self, filename: Union[Path, str], key: str, bucket_name: Optional[str] = None, replace: bool = False, encrypt: bool = False, gzip: bool = False, acl_policy: Optional[str] = None)[source]

Loads a local file to S3

Parameters
  • filename (Union[Path, str]) – path to the file to load.

  • key (str) – S3 key that will point to the file

  • bucket_name (str) – Name of the bucket in which to store the file

  • replace (bool) – A flag to decide whether or not to overwrite the key if it already exists. If replace is False and the key exists, an error will be raised.

  • encrypt (bool) – If True, the file will be encrypted on the server-side by S3 and will be stored in an encrypted form while at rest in S3.

  • gzip (bool) – If True, the file will be compressed locally

  • acl_policy (str) – String specifying the canned ACL policy for the file being uploaded to the S3 bucket.
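
Uploading several local files under a common key prefix could be sketched as follows, assuming hook is an S3Hook instance; upload_files and key_prefix are hypothetical names.

```python
from pathlib import Path


def upload_files(hook, paths, bucket_name, key_prefix=""):
    """Upload each path under key_prefix and return the keys that were written."""
    uploaded = []
    for path in map(Path, paths):
        key = f"{key_prefix}{path.name}"
        # replace=True overwrites an existing key instead of raising an error.
        hook.load_file(filename=path, key=key, bucket_name=bucket_name, replace=True)
        uploaded.append(key)
    return uploaded
```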

load_string(self, string_data: str, key: str, bucket_name: Optional[str] = None, replace: bool = False, encrypt: bool = False, encoding: Optional[str] = None, acl_policy: Optional[str] = None, compression: Optional[str] = None)[source]

Loads a string to S3

This is provided as a convenience to drop a string in S3. It uses the boto infrastructure to ship a file to S3.

Parameters
  • string_data (str) – str to set as content for the key.

  • key (str) – S3 key that will point to the file

  • bucket_name (str) – Name of the bucket in which to store the file

  • replace (bool) – A flag to decide whether or not to overwrite the key if it already exists

  • encrypt (bool) – If True, the file will be encrypted on the server-side by S3 and will be stored in an encrypted form while at rest in S3.

  • encoding (str) – The string to byte encoding

  • acl_policy (str) – The string to specify the canned ACL policy for the object to be uploaded

  • compression (str) – Type of compression to use, currently only gzip is supported.
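
For example, a small JSON document can be written without creating a local file first. This is a minimal sketch, assuming hook is an S3Hook instance; save_json is a hypothetical helper name.

```python
import json


def save_json(hook, payload, key, bucket_name):
    """Serialize payload to JSON text and store it under key, overwriting any old value."""
    hook.load_string(
        string_data=json.dumps(payload, sort_keys=True),
        key=key,
        bucket_name=bucket_name,
        replace=True,
        encoding="utf-8",
    )
```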

load_bytes(self, bytes_data: bytes, key: str, bucket_name: Optional[str] = None, replace: bool = False, encrypt: bool = False, acl_policy: Optional[str] = None)[source]

Loads bytes to S3

This is provided as a convenience to drop bytes in S3. It uses the boto infrastructure to ship a file to S3.

Parameters
  • bytes_data (bytes) – bytes to set as content for the key.

  • key (str) – S3 key that will point to the file

  • bucket_name (str) – Name of the bucket in which to store the file

  • replace (bool) – A flag to decide whether or not to overwrite the key if it already exists

  • encrypt (bool) – If True, the file will be encrypted on the server-side by S3 and will be stored in an encrypted form while at rest in S3.

  • acl_policy (str) – The string to specify the canned ACL policy for the object to be uploaded

load_file_obj(self, file_obj: BytesIO, key: str, bucket_name: Optional[str] = None, replace: bool = False, encrypt: bool = False, acl_policy: Optional[str] = None)[source]

Loads a file object to S3

Parameters
  • file_obj (file-like object) – The file-like object to set as the content for the S3 key.

  • key (str) – S3 key that will point to the file

  • bucket_name (str) – Name of the bucket in which to store the file

  • replace (bool) – A flag that indicates whether to overwrite the key if it already exists.

  • encrypt (bool) – If True, S3 encrypts the file on the server, and the file is stored in encrypted form at rest in S3.

  • acl_policy (str) – The string to specify the canned ACL policy for the object to be uploaded
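
An in-memory buffer is a typical file-like object to pass here. The sketch below builds a CSV payload in memory and uploads it, assuming hook is an S3Hook instance; upload_rows_as_csv is a hypothetical helper name.

```python
import csv
import io


def upload_rows_as_csv(hook, rows, key, bucket_name):
    """Write rows as CSV into an in-memory buffer and upload the buffer."""
    text = io.StringIO()
    csv.writer(text, lineterminator="\n").writerows(rows)
    # The signature above takes a BytesIO, so encode the text before uploading.
    payload = io.BytesIO(text.getvalue().encode("utf-8"))
    hook.load_file_obj(file_obj=payload, key=key, bucket_name=bucket_name, replace=True)
```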

copy_object(self, source_bucket_key: str, dest_bucket_key: str, source_bucket_name: Optional[str] = None, dest_bucket_name: Optional[str] = None, source_version_id: Optional[str] = None, acl_policy: Optional[str] = None)[source]

Creates a copy of an object that is already stored in S3.

Note: the S3 connection used here needs to have access to both source and destination bucket/key.

Parameters
  • source_bucket_key (str) –

    The key of the source object.

    It can be either a full s3:// style URL or a relative path from root level.

    When it’s specified as a full s3:// url, please omit source_bucket_name.

  • dest_bucket_key (str) –

    The key of the object to copy to.

    The convention to specify dest_bucket_key is the same as source_bucket_key.

  • source_bucket_name (str) –

    Name of the S3 bucket that contains the source object.

    It should be omitted when source_bucket_key is provided as a full s3:// url.

  • dest_bucket_name (str) –

    Name of the S3 bucket to which the object is copied.

    It should be omitted when dest_bucket_key is provided as a full s3:// url.

  • source_version_id (str) – Version ID of the source object (OPTIONAL)

  • acl_policy (str) – The string to specify the canned ACL policy for the object to be copied which is private by default.
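
The two addressing conventions described above can be sketched side by side, assuming hook is an S3Hook instance. Both function names are hypothetical.

```python
def copy_between_buckets(hook, key, source_bucket, dest_bucket, dest_key=None):
    """Convention 1: relative keys plus explicit bucket names."""
    hook.copy_object(
        source_bucket_key=key,
        dest_bucket_key=dest_key or key,
        source_bucket_name=source_bucket,
        dest_bucket_name=dest_bucket,
    )


def copy_by_url(hook, source_url, dest_url):
    """Convention 2: full s3:// URLs, with both bucket names omitted."""
    hook.copy_object(source_bucket_key=source_url, dest_bucket_key=dest_url)
```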

delete_bucket(self, bucket_name: str, force_delete: bool = False)[source]

Deletes an S3 bucket. If force_delete is enabled, all objects in the bucket are deleted first and then the bucket itself.

Parameters
  • bucket_name (str) – Bucket name

  • force_delete (bool) – Enable this to delete the bucket even if it is not empty

Returns

None

Return type

None

delete_objects(self, bucket: str, keys: Union[str, list])[source]

Delete keys from the bucket.

Parameters
  • bucket (str) – Name of the bucket in which you are going to delete object(s)

  • keys (str or list) –

    The key(s) to delete from S3 bucket.

    When keys is a string, it’s supposed to be the key name of the single object to delete.

    When keys is a list, it’s supposed to be the list of the keys to delete.
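
A single S3 delete request accepts at most 1000 keys. Whether the hook batches larger lists internally is not stated here, so the sketch below batches on the caller's side, which is harmless either way. It assumes hook is an S3Hook instance; delete_keys and batch_size are hypothetical names.

```python
def delete_keys(hook, bucket, keys, batch_size=1000):
    """Delete one key or many keys in batches. Returns the number of keys submitted."""
    if isinstance(keys, str):
        # A single key name is accepted as well as a list.
        keys = [keys]
    keys = list(keys)
    for start in range(0, len(keys), batch_size):
        hook.delete_objects(bucket=bucket, keys=keys[start:start + batch_size])
    return len(keys)
```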

download_file(self, key: str, bucket_name: Optional[str] = None, local_path: Optional[str] = None)[source]

Downloads a file from the S3 location to the local file system.

Parameters
  • key (str) – The key path in S3.

  • bucket_name (Optional[str]) – The specific bucket to use.

  • local_path (Optional[str]) – The local path to the downloaded file. If no path is provided it will use the system’s temporary directory.

Returns

the file name.

Return type

str
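
Since the method returns the name of the downloaded file, reading it back takes one more step. This is a minimal sketch, assuming hook is an S3Hook instance and the object holds text; download_text is a hypothetical helper name.

```python
def download_text(hook, key, bucket_name, local_dir=None, encoding="utf-8"):
    """Download key and return its content decoded as text."""
    # With local_dir=None the hook falls back to the system's temporary directory.
    path = hook.download_file(key=key, bucket_name=bucket_name, local_path=local_dir)
    with open(path, encoding=encoding) as handle:
        return handle.read()
```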

generate_presigned_url(self, client_method: str, params: Optional[dict] = None, expires_in: int = 3600, http_method: Optional[str] = None)[source]

Generate a presigned url given a client, its method, and arguments

Parameters
  • client_method (str) – The client method to presign for.

  • params (dict) – The parameters normally passed to ClientMethod.

  • expires_in (int) – The number of seconds the presigned url is valid for. By default it expires in an hour (3600 seconds).

  • http_method (str) – The http method to use on the generated url. By default, the http method is whatever is used in the method’s model.

Returns

The presigned url.

Return type

str
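
For a time-limited download link, the client method is typically get_object, with Bucket and Key as its parameters. The sketch below assumes hook is an S3Hook instance; presigned_download_url and valid_for_seconds are hypothetical names.

```python
def presigned_download_url(hook, key, bucket_name, valid_for_seconds=900):
    """Return a URL that allows downloading key for a limited time."""
    return hook.generate_presigned_url(
        client_method="get_object",
        # Parameter names follow the boto3 get_object call.
        params={"Bucket": bucket_name, "Key": key},
        expires_in=valid_for_seconds,
    )
```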

get_bucket_tagging(self, bucket_name: Optional[str] = None)[source]

Gets a List of tags from a bucket.

Parameters

bucket_name (str) – The name of the bucket.

Returns

A List containing the key/value pairs for the tags

Return type

Optional[List[Dict[str, str]]]

put_bucket_tagging(self, tag_set: Optional[List[Dict[str, str]]] = None, key: Optional[str] = None, value: Optional[str] = None, bucket_name: Optional[str] = None)[source]

Overwrites the existing TagSet with provided tags. Must provide either a TagSet or a key/value pair.

Parameters
  • tag_set (List[Dict[str, str]]) – A List containing the key/value pairs for the tags.

  • key (str) – The Key for the new TagSet entry.

  • value (str) – The Value for the new TagSet entry.

  • bucket_name (str) – The name of the bucket.

Returns

None

Return type

None
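
Because the existing TagSet is overwritten, adding a tag while keeping the others requires reading the current tags first. This is a minimal sketch, assuming hook is an S3Hook instance and that tags use the Key/Value dictionary shape of the S3 API; merge_bucket_tags is a hypothetical helper name.

```python
def merge_bucket_tags(hook, bucket_name, new_tags):
    """Merge new_tags (a plain dict) into the bucket's tags and return the final TagSet."""
    # get_bucket_tagging may return None when the bucket has no tags.
    current = hook.get_bucket_tagging(bucket_name=bucket_name) or []
    merged = {tag["Key"]: tag["Value"] for tag in current}
    merged.update(new_tags)
    tag_set = [{"Key": key, "Value": value} for key, value in merged.items()]
    hook.put_bucket_tagging(tag_set=tag_set, bucket_name=bucket_name)
    return tag_set
```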

delete_bucket_tagging(self, bucket_name: Optional[str] = None)[source]

Deletes all tags from a bucket.

Parameters

bucket_name (str) – The name of the bucket.

Returns

None

Return type

None
