airflow.providers.amazon.aws.hooks.s3

Interact with AWS S3, using the boto3 library.

Module Contents

airflow.providers.amazon.aws.hooks.s3.T
airflow.providers.amazon.aws.hooks.s3.provide_bucket_name(func: T) -> T
Function decorator that provides a bucket name taken from the connection
when no bucket name has been passed to the function.
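The decorator's behaviour can be sketched with inspect.signature: bind the call and, if bucket_name is missing, fill it in before calling the wrapped method. This is a minimal sketch, not Airflow's code. DemoHook and its default_bucket attribute are hypothetical stand-ins for the hook and the bucket stored on its Airflow connection, and treating an explicit None like a missing argument is a small deviation chosen for the illustration.

```python
import functools
import inspect
from typing import Callable, Optional, TypeVar

T = TypeVar("T", bound=Callable)


def provide_bucket_name(func: T) -> T:
    """Fill in bucket_name from the hook when the caller omitted it (sketch)."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        # Fill bucket_name when it was not passed (or was passed as None).
        if bound.arguments.get("bucket_name") is None:
            hook = bound.arguments["self"]
            # Hypothetical: default_bucket replaces the real connection lookup.
            bound.arguments["bucket_name"] = hook.default_bucket
        return func(*bound.args, **bound.kwargs)

    return wrapper  # type: ignore[return-value]


class DemoHook:
    """Hypothetical hook used only to exercise the decorator."""

    def __init__(self, default_bucket: Optional[str]) -> None:
        self.default_bucket = default_bucket

    @provide_bucket_name
    def describe(self, key: str, bucket_name: Optional[str] = None) -> str:
        return f"{bucket_name}/{key}"
```

An explicitly passed bucket name always wins; only the missing case falls back to the hook.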
airflow.providers.amazon.aws.hooks.s3.unify_bucket_name_and_key(func: T) -> T
Function decorator that unifies the bucket name and key: when no bucket name is passed
but a key is, both the bucket name and the key are extracted from that key (a full s3:// URL).
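A minimal sketch of the same idea for unify_bucket_name_and_key, assuming the key is a full s3:// URL whenever bucket_name is omitted. It handles only a parameter named key (the real decorator is assumed to also cover wildcard_key) and does not validate the URL; locate is a hypothetical function used to show what the decorator passes through.

```python
import functools
import inspect
from typing import Callable, Optional, Tuple, TypeVar
from urllib.parse import urlparse

T = TypeVar("T", bound=Callable)


def unify_bucket_name_and_key(func: T) -> T:
    """If bucket_name is missing, split it out of a full s3:// key (sketch)."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        if bound.arguments.get("bucket_name") is None:
            parsed = urlparse(bound.arguments["key"])
            # "s3://bucket/path/file" -> bucket "bucket", key "path/file"
            bound.arguments["bucket_name"] = parsed.netloc
            bound.arguments["key"] = parsed.path.lstrip("/")
        return func(*bound.args, **bound.kwargs)

    return wrapper  # type: ignore[return-value]


@unify_bucket_name_and_key
def locate(key: str, bucket_name: Optional[str] = None) -> Tuple[Optional[str], str]:
    # Hypothetical function: returns exactly what the decorator passed in.
    return bucket_name, key
```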
class airflow.providers.amazon.aws.hooks.s3.S3Hook(*args, **kwargs)

Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook

Interact with AWS S3, using the boto3 library.

Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.

See also

AwsBaseHook

conn_type = s3
hook_name = S3
static parse_s3_url(s3url: str)

Parses an S3 URL into a bucket name and key.

Parameters

s3url (str) -- The S3 URL to parse.

Returns

the parsed bucket name and key

Return type

tuple of str
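A sketch of the parsing using urllib.parse.urlparse from the standard library: the URL's network location is the bucket, and the path without its leading slash is the key. The ValueError is an assumption; the hook may raise a different exception type for a URL that has no bucket component.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_s3_url(s3url: str) -> Tuple[str, str]:
    """Split an s3:// URL into (bucket_name, key) (sketch)."""
    parsed = urlparse(s3url)
    if not parsed.netloc:
        # No bucket component, e.g. a bare relative key such as "path/file.csv".
        raise ValueError(f"Please provide a bucket name instead of {s3url!r}")
    # urlparse keeps the leading "/" on the path; S3 keys do not include it.
    return parsed.netloc, parsed.path.lstrip("/")
```

For example, "s3://my-bucket/path/to/file.csv" yields ("my-bucket", "path/to/file.csv").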

check_for_bucket(self, bucket_name: Optional[str] = None)

Check if bucket_name exists.

Parameters

bucket_name (str) -- the name of the bucket

Returns

True if it exists and False if not.

Return type

bool

get_bucket(self, bucket_name: Optional[str] = None)

Returns a boto3.S3.Bucket object

Parameters

bucket_name (str) -- the name of the bucket

Returns

the bucket object for the given bucket name.

Return type

boto3.S3.Bucket

create_bucket(self, bucket_name: Optional[str] = None, region_name: Optional[str] = None)

Creates an Amazon S3 bucket.

Parameters
  • bucket_name (str) -- The name of the bucket

  • region_name (str) -- The name of the AWS region in which to create the bucket.
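One detail worth knowing when calling the underlying boto3 create_bucket(): S3 expects a CreateBucketConfiguration with a LocationConstraint for every region except us-east-1, where the configuration must be left out. The sketch below only builds the keyword arguments; create_bucket_kwargs is a hypothetical helper, and how the hook chooses a region when region_name is None is not modeled (the sketch simply omits the configuration).

```python
from typing import Any, Dict, Optional


def create_bucket_kwargs(bucket_name: str, region_name: Optional[str]) -> Dict[str, Any]:
    """Build keyword arguments for boto3's S3 client create_bucket() (sketch)."""
    kwargs: Dict[str, Any] = {"Bucket": bucket_name}
    # S3 rejects a LocationConstraint of "us-east-1", so it is omitted there.
    # Hypothetical simplification: no region given means no configuration.
    if region_name and region_name != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region_name}
    return kwargs
```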

check_for_prefix(self, prefix: str, delimiter: str, bucket_name: Optional[str] = None)

Checks that a prefix exists in a bucket

Parameters
  • bucket_name (str) -- the name of the bucket

  • prefix (str) -- a key prefix

  • delimiter (str) -- the delimiter marks key hierarchy.

Returns

False if the prefix does not exist in the bucket and True if it does.

Return type

bool
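Because S3 has no real directories, a prefix "exists" when it shows up as a common prefix while its parent level is listed with the delimiter. The helpers below sketch that comparison, assuming the list of prefixes at the parent level has already been fetched (for example with list_prefixes); parent_prefix and prefix_exists are hypothetical names, not part of the hook.

```python
from typing import Iterable


def parent_prefix(prefix: str, delimiter: str) -> str:
    """Return the level above prefix, e.g. "data/2021/" -> "data/" (sketch)."""
    if not delimiter:
        raise ValueError("delimiter must be non-empty")
    trimmed = prefix[: -len(delimiter)] if prefix.endswith(delimiter) else prefix
    cut = trimmed.rfind(delimiter)
    # No delimiter left means the prefix sits at the bucket root.
    return "" if cut == -1 else trimmed[: cut + len(delimiter)]


def prefix_exists(prefix: str, delimiter: str, prefixes_at_parent: Iterable[str]) -> bool:
    """True if prefix appears among the common prefixes listed one level up."""
    # Normalize "data/2021" to "data/2021/" so it compares equal to S3 common prefixes.
    if not prefix.endswith(delimiter):
        prefix += delimiter
    return prefix in set(prefixes_at_parent)
```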

list_prefixes(self, bucket_name: Optional[str] = None, prefix: Optional[str] = None, delimiter: Optional[str] = None, page_size: Optional[int] = None, max_items: Optional[int] = None)

Lists prefixes in a bucket under prefix

Parameters
  • bucket_name (str) -- the name of the bucket

  • prefix (str) -- a key prefix

  • delimiter (str) -- the delimiter marks key hierarchy.

  • page_size (int) -- pagination size

  • max_items (int) -- maximum items to return

Returns

a list of matched prefixes

Return type

list
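To make the delimiter semantics concrete, this sketch emulates the CommonPrefixes part of an S3 listing over an in-memory list of keys: each key under prefix is cut after the first delimiter that follows prefix, and duplicates collapse into one entry. It is an illustration, not a call to S3; common_prefixes is a hypothetical helper, and pagination is ignored.

```python
from typing import Iterable, List, Set


def common_prefixes(keys: Iterable[str], prefix: str, delimiter: str) -> List[str]:
    """Emulate the CommonPrefixes part of an S3 delimiter listing (sketch)."""
    if not delimiter:
        return []  # without a delimiter, nothing is rolled up into common prefixes
    found: Set[str] = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut != -1:
            # Everything up to and including the first delimiter after prefix
            # collapses into a single common prefix.
            found.add(prefix + rest[: cut + len(delimiter)])
    return sorted(found)
```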

list_keys(self, bucket_name: Optional[str] = None, prefix: Optional[str] = None, delimiter: Optional[str] = None, page_size: Optional[int] = None, max_items: Optional[int] = None)

Lists keys in a bucket under prefix and not containing delimiter

Parameters
  • bucket_name (str) -- the name of the bucket

  • prefix (str) -- a key prefix

  • delimiter (str) -- the delimiter marks key hierarchy.

  • page_size (int) -- pagination size

  • max_items (int) -- maximum items to return

Returns

a list of matched keys

Return type

list
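The complementary half of a delimiter listing is the set of keys returned directly. The sketch below emulates it in memory: a key under prefix is kept only if no delimiter follows the prefix, and max_items caps the total. direct_keys is hypothetical and stands in for the paginated S3 request; page_size only affects how many requests are made, so it is not modeled.

```python
from typing import Iterable, List, Optional


def direct_keys(
    keys: Iterable[str], prefix: str, delimiter: str, max_items: Optional[int] = None
) -> List[str]:
    """Emulate the Contents part of an S3 delimiter listing (sketch)."""
    matched = [
        key
        for key in sorted(keys)
        if key.startswith(prefix)
        # With a delimiter, keys nested deeper than prefix are rolled up into
        # common prefixes instead of being returned as keys.
        and not (delimiter and delimiter in key[len(prefix):])
    ]
    # max_items caps the total number of results across all pages.
    return matched if max_items is None else matched[:max_items]
```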

check_for_key(self, key: str, bucket_name: Optional[str] = None)

Checks if a key exists in a bucket

Parameters
  • key (str) -- S3 key that will point to the file

  • bucket_name (str) -- Name of the bucket in which the file is stored

Returns

True if the key exists and False if not.

Return type

bool

get_key(self, key: str, bucket_name: Optional[str] = None)

Returns a boto3.s3.Object

Parameters
  • key (str) -- the path to the key

  • bucket_name (str) -- the name of the bucket

Returns

the key object from the bucket

Return type

boto3.s3.Object

read_key(self, key: str, bucket_name: Optional[str] = None)

Reads a key from S3

Parameters
  • key (str) -- S3 key that will point to the file

  • bucket_name (str) -- Name of the bucket in which the file is stored

Returns

the content of the key

Return type

str

select_key(self, key: str, bucket_name: Optional[str] = None, expression: Optional[str] = None, expression_type: Optional[str] = None, input_serialization: Optional[Dict[str, Any]] = None, output_serialization: Optional[Dict[str, Any]] = None)

Reads a key with S3 Select.

Parameters
  • key (str) -- S3 key that will point to the file

  • bucket_name (str) -- Name of the bucket in which the file is stored

  • expression (str) -- S3 Select expression

  • expression_type (str) -- S3 Select expression type

  • input_serialization (dict) -- S3 Select input data serialization format

  • output_serialization (dict) -- S3 Select output data serialization format

Returns

retrieved subset of original data by S3 Select

Return type

str
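S3 Select is invoked through the boto3 client's select_object_content(), whose response payload is an event stream; only events carrying a Records entry hold result bytes. The sketch below builds the request keyword arguments and joins the record payloads into a string. The defaults shown (SELECT * FROM S3Object, SQL, CSV in and out) are assumptions about what the hook uses when the arguments are omitted, and build_select_request and join_records are hypothetical helpers.

```python
from typing import Any, Dict, Iterable, Optional


def build_select_request(
    bucket_name: str,
    key: str,
    expression: Optional[str] = None,
    expression_type: Optional[str] = None,
    input_serialization: Optional[Dict[str, Any]] = None,
    output_serialization: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Keyword arguments for boto3's select_object_content() (sketch)."""
    return {
        "Bucket": bucket_name,
        "Key": key,
        # Assumed defaults: select every record, SQL expressions, CSV in and out.
        "Expression": expression or "SELECT * FROM S3Object",
        "ExpressionType": expression_type or "SQL",
        "InputSerialization": input_serialization or {"CSV": {}},
        "OutputSerialization": output_serialization or {"CSV": {}},
    }


def join_records(event_stream: Iterable[Dict[str, Any]]) -> str:
    """Concatenate the Records payloads of a Select event stream into text."""
    # Stats, Progress and End events carry no data and are skipped.
    return b"".join(
        event["Records"]["Payload"] for event in event_stream if "Records" in event
    ).decode("utf-8")
```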

check_for_wildcard_key(self, wildcard_key: str, bucket_name: Optional[str] = None, delimiter: str = '')

Checks that a key matching a wildcard expression exists in a bucket

Parameters
  • wildcard_key (str) -- the path to the key

  • bucket_name (str) -- the name of the bucket

  • delimiter (str) -- the delimiter marks key hierarchy

Returns

True if a key exists and False if not.

Return type

bool

get_wildcard_key(self, wildcard_key: str, bucket_name: Optional[str] = None, delimiter: str = '')

Returns a boto3.s3.Object matching the wildcard expression

Parameters
  • wildcard_key (str) -- the path to the key

  • bucket_name (str) -- the name of the bucket

  • delimiter (str) -- the delimiter marks key hierarchy

Returns

the key object from the bucket or None if none has been found.

Return type

boto3.s3.Object
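Wildcard lookups can be understood as two steps: list keys under the literal part of the pattern (everything before the first *, ? or [), then filter them with shell-style matching. The sketch works on an in-memory key list rather than calling S3; literal_prefix and first_match are hypothetical helpers, and fnmatch.fnmatchcase is used for case-sensitive matching, which may differ from the exact matching function the hook uses. Note that * also matches the delimiter in fnmatch-style patterns.

```python
import fnmatch
import re
from typing import Iterable, Optional


def literal_prefix(wildcard_key: str) -> str:
    """Longest leading part of the pattern with no wildcard characters."""
    # S3 can only filter a listing by a literal prefix, so the pattern is cut
    # at the first "*", "?" or "[".
    return re.split(r"[\[*?]", wildcard_key, maxsplit=1)[0]


def first_match(keys: Iterable[str], wildcard_key: str) -> Optional[str]:
    """Return the first key matching the shell-style pattern, or None."""
    prefix = literal_prefix(wildcard_key)
    for key in keys:
        # Case-sensitive, like S3 key names.
        if key.startswith(prefix) and fnmatch.fnmatchcase(key, wildcard_key):
            return key
    return None
```

A check_for_wildcard_key-style test is then simply first_match(...) is not None.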

load_file(self, filename: str, key: str, bucket_name: Optional[str] = None, replace: bool = False, encrypt: bool = False, gzip: bool = False, acl_policy: Optional[str] = None)

Loads a local file to S3

Parameters
  • filename (str) -- name of the file to load.

  • key (str) -- S3 key that will point to the file

  • bucket_name (str) -- Name of the bucket in which to store the file

  • replace (bool) -- A flag to decide whether or not to overwrite the key if it already exists. If replace is False and the key exists, an error will be raised.

  • encrypt (bool) -- If True, the file will be encrypted on the server-side by S3 and will be stored in an encrypted form while at rest in S3.

  • gzip (bool) -- If True, the file will be compressed locally

  • acl_policy (str) -- String specifying the canned ACL policy for the file being uploaded to the S3 bucket.
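The encrypt, gzip and acl_policy flags map naturally onto local preprocessing plus the ExtraArgs accepted by boto3's upload_file(). The sketch shows one plausible mapping, assuming SSE-S3 (ServerSideEncryption set to AES256) for encrypt and a .gz copy written next to the original for gzip. prepare_upload is hypothetical, and its flag is named compress so that it does not shadow the gzip module.

```python
import gzip
import shutil
from typing import Dict, Optional, Tuple


def prepare_upload(
    filename: str,
    encrypt: bool = False,
    compress: bool = False,
    acl_policy: Optional[str] = None,
) -> Tuple[str, Dict[str, str]]:
    """Return the path to upload and the ExtraArgs for boto3's upload_file() (sketch)."""
    extra_args: Dict[str, str] = {}
    if encrypt:
        # Assumed SSE-S3: S3 encrypts the object at rest with S3-managed keys.
        extra_args["ServerSideEncryption"] = "AES256"
    if acl_policy:
        extra_args["ACL"] = acl_policy  # canned ACL, e.g. "private"
    if compress:
        # Write a gzip-compressed copy next to the original and upload that instead.
        gz_path = filename + ".gz"
        with open(filename, "rb") as f_in, gzip.open(gz_path, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
        filename = gz_path
    return filename, extra_args
```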

load_string(self, string_data: str, key: str, bucket_name: Optional[str] = None, replace: bool = False, encrypt: bool = False, encoding: Optional[str] = None, acl_policy: Optional[str] = None, compression: Optional[str] = None)

Loads a string to S3

This is provided as a convenience to drop a string in S3. It uses the boto infrastructure to ship a file to S3.

Parameters
  • string_data (str) -- str to set as content for the key.

  • key (str) -- S3 key that will point to the file

  • bucket_name (str) -- Name of the bucket in which to store the file

  • replace (bool) -- A flag to decide whether or not to overwrite the key if it already exists

  • encrypt (bool) -- If True, the file will be encrypted on the server-side by S3 and will be stored in an encrypted form while at rest in S3.

  • encoding (str) -- The encoding used to convert the string to bytes.

  • acl_policy (str) -- The string to specify the canned ACL policy for the object to be uploaded

  • compression (str) -- Type of compression to use, currently only gzip is supported.
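Before upload, the string has to become bytes in a file-like object. This sketch encodes it (UTF-8 is assumed when encoding is None), optionally gzip-compresses it, and rejects any other compression value. string_to_file_obj is a hypothetical helper, and raising NotImplementedError is an assumption about how unsupported values are treated.

```python
import gzip
import io
from typing import Optional


def string_to_file_obj(
    string_data: str, encoding: Optional[str] = None, compression: Optional[str] = None
) -> io.BytesIO:
    """Encode a string (and optionally gzip it) into an in-memory file object (sketch)."""
    data = string_data.encode(encoding or "utf-8")  # assumed default encoding
    if compression == "gzip":
        data = gzip.compress(data)
    elif compression is not None:
        # Only gzip is supported, as documented above.
        raise NotImplementedError(f"Unsupported compression: {compression!r}")
    return io.BytesIO(data)
```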

load_bytes(self, bytes_data: bytes, key: str, bucket_name: Optional[str] = None, replace: bool = False, encrypt: bool = False, acl_policy: Optional[str] = None)

Loads bytes to S3

This is provided as a convenience to drop bytes in S3. It uses the boto infrastructure to ship a file to S3.

Parameters
  • bytes_data (bytes) -- bytes to set as content for the key.

  • key (str) -- S3 key that will point to the file

  • bucket_name (str) -- Name of the bucket in which to store the file

  • replace (bool) -- A flag to decide whether or not to overwrite the key if it already exists

  • encrypt (bool) -- If True, the file will be encrypted on the server-side by S3 and will be stored in an encrypted form while at rest in S3.

  • acl_policy (str) -- The string to specify the canned ACL policy for the object to be uploaded

load_file_obj(self, file_obj: BytesIO, key: str, bucket_name: Optional[str] = None, replace: bool = False, encrypt: bool = False, acl_policy: Optional[str] = None)

Loads a file object to S3

Parameters
  • file_obj (file-like object) -- The file-like object to set as the content for the S3 key.

  • key (str) -- S3 key that will point to the file

  • bucket_name (str) -- Name of the bucket in which to store the file

  • replace (bool) -- A flag that indicates whether to overwrite the key if it already exists.

  • encrypt (bool) -- If True, S3 encrypts the file on the server, and the file is stored in encrypted form at rest in S3.

  • acl_policy (str) -- The string to specify the canned ACL policy for the object to be uploaded
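The replace flag can be illustrated without S3 at all. InMemoryBucket below is a hypothetical stand-in that refuses to overwrite an existing key unless replace=True; the ValueError mirrors what the hook is assumed to raise, and encryption and ACLs are left out of the sketch.

```python
from io import BytesIO
from typing import BinaryIO, Dict


class InMemoryBucket:
    """Hypothetical stand-in for a bucket, used to show the replace flag."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def load_file_obj(self, file_obj: BinaryIO, key: str, replace: bool = False) -> None:
        # Refuse to overwrite an existing key unless replace=True (assumed behaviour).
        if not replace and key in self.objects:
            raise ValueError(f"The key {key} already exists.")
        # Reads from the file object's current position to the end.
        self.objects[key] = file_obj.read()
```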

_upload_file_obj(self, file_obj: BytesIO, key: str, bucket_name: Optional[str] = None, replace: bool = False, encrypt: bool = False, acl_policy: Optional[str] = None)
copy_object(self, source_bucket_key: str, dest_bucket_key: str, source_bucket_name: Optional[str] = None, dest_bucket_name: Optional[str] = None, source_version_id: Optional[str] = None, acl_policy: Optional[str] = None)

Creates a copy of an object that is already stored in S3.

Note: the S3 connection used here needs to have access to both source and destination bucket/key.

Parameters
  • source_bucket_key (str) --

    The key of the source object.

    It can be either a full s3:// style URL or a relative path from the root level.

    When it's specified as a full s3:// URL, please omit source_bucket_name.

  • dest_bucket_key (str) --

    The key of the object to copy to.

    The convention to specify dest_bucket_key is the same as source_bucket_key.

  • source_bucket_name (str) --

    Name of the S3 bucket that contains the source object.

    It should be omitted when source_bucket_key is provided as a full s3:// URL.

  • dest_bucket_name (str) --

    Name of the S3 bucket to which the object is copied.

    It should be omitted when dest_bucket_key is provided as a full s3:// URL.

  • source_version_id (str) -- Version ID of the source object (OPTIONAL)

  • acl_policy (str) -- The string to specify the canned ACL policy for the object to be copied, which is private by default.
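The key and bucket conventions above can be captured in a small resolver: a key must be a full s3:// URL when its bucket name is omitted, and a relative path when it is given. The sketch then assembles keyword arguments for the boto3 client's copy_object(). resolve_bucket_and_key and build_copy_request are hypothetical, and ValueError stands in for whatever exception the hook actually raises.

```python
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlparse


def resolve_bucket_and_key(key: str, bucket_name: Optional[str]) -> Tuple[str, str]:
    """Full s3:// URL without a bucket name, or bucket name plus relative key."""
    parsed = urlparse(key)
    if bucket_name is None:
        if not parsed.netloc:
            raise ValueError(f"Expected a full s3:// URL, got {key!r}")
        return parsed.netloc, parsed.path.lstrip("/")
    if parsed.scheme or parsed.netloc:
        raise ValueError("Pass a relative key when a bucket name is given")
    return bucket_name, key


def build_copy_request(
    source_bucket_key: str,
    dest_bucket_key: str,
    source_bucket_name: Optional[str] = None,
    dest_bucket_name: Optional[str] = None,
    source_version_id: Optional[str] = None,
    acl_policy: str = "private",  # documented default ACL
) -> Dict[str, Any]:
    """Keyword arguments for boto3's S3 client copy_object() (sketch)."""
    src_bucket, src_key = resolve_bucket_and_key(source_bucket_key, source_bucket_name)
    dst_bucket, dst_key = resolve_bucket_and_key(dest_bucket_key, dest_bucket_name)
    copy_source: Dict[str, str] = {"Bucket": src_bucket, "Key": src_key}
    if source_version_id:
        copy_source["VersionId"] = source_version_id
    return {"Bucket": dst_bucket, "Key": dst_key, "CopySource": copy_source, "ACL": acl_policy}
```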

delete_bucket(self, bucket_name: str, force_delete: bool = False)

Deletes an S3 bucket. If force_delete is True, all objects in the bucket are deleted first, and then the bucket itself.

Parameters
  • bucket_name (str) -- Bucket name

  • force_delete (bool) -- Enable this to delete the bucket even if it is not empty.

Returns

None

Return type

None

delete_objects(self, bucket: str, keys: Union[str, list])

Delete keys from the bucket.

Parameters
  • bucket (str) -- Name of the bucket in which you are going to delete object(s)

  • keys (str or list) --

    The key(s) to delete from S3 bucket.

    When keys is a string, it's supposed to be the key name of the single object to delete.

    When keys is a list, it's supposed to be the list of the keys to delete.
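The S3 DeleteObjects API accepts at most 1,000 keys per request, so a long key list has to be split into batches. The sketch normalizes a single key into a list and yields Delete= payloads for the boto3 client's delete_objects(); delete_payloads is a hypothetical helper, and whether the hook batches exactly this way is an assumption.

```python
from typing import Dict, Iterator, List, Union

# The S3 DeleteObjects API accepts at most 1,000 keys per request.
MAX_KEYS_PER_REQUEST = 1000


def delete_payloads(keys: Union[str, List[str]]) -> Iterator[Dict[str, List[Dict[str, str]]]]:
    """Yield Delete= payloads for boto3's delete_objects(), 1,000 keys at a time (sketch)."""
    if isinstance(keys, str):
        keys = [keys]  # a single key name is treated as a one-element list
    for start in range(0, len(keys), MAX_KEYS_PER_REQUEST):
        chunk = keys[start : start + MAX_KEYS_PER_REQUEST]
        yield {"Objects": [{"Key": key} for key in chunk]}
```

Each payload would be passed as Delete=payload together with Bucket=bucket in a separate request.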

download_file(self, key: str, bucket_name: Optional[str] = None, local_path: Optional[str] = None)

Downloads a file from the S3 location to the local file system.

Parameters
  • key (str) -- The key path in S3.

  • bucket_name (Optional[str]) -- The specific bucket to use.

  • local_path (Optional[str]) -- The local directory in which to store the downloaded file. If no path is provided, the system's temporary directory is used.

Returns

the file name.

Return type

str
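The download step can be sketched with tempfile.NamedTemporaryFile: the object's body is streamed into a new file created in local_path (or in the system temporary directory when local_path is None), and the file's name is returned. save_stream is hypothetical, body stands in for the downloaded object stream, and the airflow_tmp_ prefix is an assumption about how the file is named.

```python
import shutil
from tempfile import NamedTemporaryFile
from typing import BinaryIO, Optional


def save_stream(body: BinaryIO, local_path: Optional[str] = None) -> str:
    """Copy a downloaded object body into a new temp file and return its name (sketch)."""
    # dir=None falls back to the system temporary directory; delete=False keeps
    # the file after it is closed so the caller can use it.
    with NamedTemporaryFile(dir=local_path, prefix="airflow_tmp_", delete=False) as tmp:
        shutil.copyfileobj(body, tmp)
    return tmp.name
```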

generate_presigned_url(self, client_method: str, params: Optional[dict] = None, expires_in: int = 3600, http_method: Optional[str] = None)

Generates a presigned URL for a client method and its arguments.

Parameters
  • client_method (str) -- The client method to presign for.

  • params (dict) -- The parameters normally passed to ClientMethod.

  • expires_in (int) -- The number of seconds the presigned URL is valid for. By default it expires in an hour (3600 seconds).

  • http_method (str) -- The HTTP method to use on the generated URL. By default, the HTTP method is whatever is used in the method's model.

Returns

The presigned URL.

Return type

str
