Interact with Google Cloud Storage. This hook uses the Google Cloud Platform connection.
Returns a Google Cloud Storage service object.
copy(self, source_bucket, source_object, destination_bucket=None, destination_object=None)
Copies an object from one bucket to another, renaming it if requested.
Either destination_bucket or destination_object can be omitted, in which case the source bucket or object name is used, but they cannot both be omitted.
source_bucket (str) – The bucket of the object to copy from.
source_object (str) – The object to copy.
destination_bucket (str) – The bucket to copy the object to. Can be omitted; then the same bucket is used.
destination_object (str) – The (renamed) path of the object if given. Can be omitted; then the same name is used.
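The defaulting rule for copy can be sketched as follows. This is a minimal illustration, not the hook's actual code: resolve_copy_destination is a hypothetical helper, and raising ValueError when the destination equals the source is an assumption drawn from the "cannot both be omitted" rule above.

```python
def resolve_copy_destination(source_bucket, source_object,
                             destination_bucket=None, destination_object=None):
    """Return the (bucket, object) pair a copy would write to.

    Hypothetical helper: an omitted destination falls back to the source value.
    """
    bucket = destination_bucket or source_bucket
    obj = destination_object or source_object
    # Assumption: copying an object onto itself is treated as an error.
    if bucket == source_bucket and obj == source_object:
        raise ValueError(
            "Either destination_bucket or destination_object must differ "
            "from the source"
        )
    return bucket, obj
```

For example, omitting destination_object keeps the name and changes only the bucket, while omitting destination_bucket renames the object within the same bucket.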
rewrite(self, source_bucket, source_object, destination_bucket, destination_object=None)
Has the same functionality as copy, except that it also works on files over 5 TB and when copying between locations and/or storage classes.
destination_object can be omitted, in which case source_object is used.
download(self, bucket, object, filename=None)
Get a file from Google Cloud Storage.
upload(self, bucket, object, filename, mime_type='application/octet-stream', gzip=False, multipart=False, num_retries=0)
Uploads a local file to Google Cloud Storage.
bucket (str) – The bucket to upload to.
object (str) – The object name to set when uploading the local file.
filename (str) – The local file path to the file to be uploaded.
mime_type (str) – The MIME type to set when uploading the file.
gzip (bool) – Whether to compress the file before upload.
multipart (bool or int) – If True, the upload will be split into multiple HTTP requests. The default size is 256MiB per request. Pass a number instead of True to specify the request size, which must be a multiple of 262144 (256KiB).
num_retries (int) – The number of times to attempt to re-upload the file (or individual chunks, in the case of multipart uploads). Retries are attempted with exponential backoff.
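The multipart rule above (True selects the 256 MiB default; a number must be a multiple of 262144) can be checked before calling upload. The sketch below is illustrative only: resolve_chunk_size and its constants are hypothetical names, and returning None for a non-multipart upload is an assumption.

```python
DEFAULT_CHUNK_SIZE = 256 * 1024 * 1024  # 256 MiB default request size
CHUNK_GRANULARITY = 262144              # 256 KiB; sizes must be a multiple


def resolve_chunk_size(multipart):
    """Translate the multipart argument into a request size in bytes.

    Hypothetical helper; returns None when the upload is not multipart.
    """
    # Check booleans first: bool is a subclass of int in Python.
    if multipart is True:
        return DEFAULT_CHUNK_SIZE
    if multipart is False or multipart is None:
        return None
    if (isinstance(multipart, int) and multipart > 0
            and multipart % CHUNK_GRANULARITY == 0):
        return multipart
    raise ValueError(
        "multipart must be a bool or a positive multiple of %d"
        % CHUNK_GRANULARITY
    )
```

Validating the size up front surfaces a bad value as a local error rather than a failed request.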
exists(self, bucket, object)
Checks for the existence of a file in Google Cloud Storage.
is_updated_after(self, bucket, object, ts)
Checks whether an object in Google Cloud Storage was updated after the given timestamp (ts).
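The comparison behind such a check can be sketched with the standard library. This is a rough illustration under stated assumptions, not the hook's implementation: it assumes the object's update time is available as an RFC 3339 string with fractional seconds and a trailing Z, and that a naive ts is meant as UTC. The function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_rfc3339_utc(value):
    """Parse a string such as '2019-03-01T12:00:00.000Z' into an aware datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def updated_after(updated, ts):
    """Return True if the object's update time is strictly later than ts."""
    # Assumption: a naive timestamp is interpreted as UTC.
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return parse_rfc3339_utc(updated) > ts
```

Normalizing both sides to timezone-aware values avoids the TypeError Python raises when comparing naive and aware datetimes.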
delete(self, bucket, object, generation=None)
Delete an object if versioning is not enabled for the bucket, or if generation parameter is used.
list(self, bucket, versions=None, maxResults=None, prefix=None, delimiter=None)
Lists all objects in the bucket whose names start with the given string prefix.
bucket (str) – bucket name
versions (bool) – if true, list all versions of the objects
maxResults (int) – max count of items to return in a single page of responses
prefix (str) – prefix string which filters objects whose names begin with this prefix
delimiter (str) – filters objects based on the delimiter (e.g. ‘.csv’)
Returns a stream of object names matching the filtering criteria.
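Because maxResults limits a single page of responses, a complete listing is assembled from several pages. The generator below sketches that loop in isolation. fetch_page is a hypothetical callable (not part of the hook) that takes a page token and returns a pair of (names, next_token), with next_token set to None on the last page.

```python
def iter_object_names(fetch_page):
    """Yield object names page by page until no continuation token remains.

    fetch_page is a stand-in for whatever issues one list request.
    """
    page_token = None
    while True:
        names, page_token = fetch_page(page_token)
        for name in names:
            yield name
        if page_token is None:
            break
```

Yielding names as each page arrives keeps memory use proportional to one page instead of the whole bucket.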
get_size(self, bucket, object)
Gets the size of a file in Google Cloud Storage.
get_crc32c(self, bucket, object)
Gets the CRC32c checksum of an object in Google Cloud Storage.
get_md5hash(self, bucket, object)
Gets the MD5 hash of an object in Google Cloud Storage.
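One use of the MD5 hash is verifying a local file against the stored object. The sketch below computes a local digest for that comparison. It assumes the value is the base64-encoded MD5 digest, which is how the Cloud Storage JSON API reports md5Hash; the helper names are hypothetical.

```python
import base64
import hashlib


def md5_base64(chunks):
    """Return the base64-encoded MD5 digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def file_md5_base64(path, chunk_size=1024 * 1024):
    """Hash a local file in fixed-size chunks so it never loads whole into memory."""
    with open(path, "rb") as handle:
        return md5_base64(iter(lambda: handle.read(chunk_size), b""))
```

A caller could then compare file_md5_base64(local_path) with the string returned by get_md5hash for the uploaded object.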
create_bucket(self, bucket_name, resource=None, storage_class='MULTI_REGIONAL', location='US', project_id=None, labels=None)
Creates a new bucket. Google Cloud Storage uses a flat namespace, so you can’t create a bucket with a name that is already in use.
For more information, see Bucket Naming Guidelines: https://cloud.google.com/storage/docs/bucketnaming.html#requirements
bucket_name (str) – The name of the bucket.
resource (dict) – An optional dict with parameters for creating the bucket. For information on available parameters, see Cloud Storage API doc: https://cloud.google.com/storage/docs/json_api/v1/buckets/insert
storage_class (str) – This defines how objects in the bucket are stored and determines the SLA and the cost of storage. If this value is not specified when the bucket is created, it will default to STANDARD.
location (str) – The location of the bucket. Object data for objects in the bucket resides in physical storage within this region. Defaults to US.
project_id (str) – The ID of the GCP Project.
labels (dict) – User-provided labels, in key/value pairs.
If successful, it returns the id of the bucket.
insert_bucket_acl(self, bucket, entity, role, user_project)
Creates a new ACL entry on the specified bucket. See: https://cloud.google.com/storage/docs/json_api/v1/bucketAccessControls/insert
bucket (str) – Name of a bucket.
entity (str) – The entity holding the permission, in one of the following forms: user-userId, user-email, group-groupId, group-email, domain-domain, project-team-projectId, allUsers, allAuthenticatedUsers. See: https://cloud.google.com/storage/docs/access-control/lists#scopes
role (str) – The access permission for the entity. Acceptable values are: “OWNER”, “READER”, “WRITER”.
user_project (str) – (Optional) The project to be billed for this request. Required for Requester Pays buckets.
insert_object_acl(self, bucket, object_name, entity, role, generation, user_project)
Creates a new ACL entry on the specified object. See: https://cloud.google.com/storage/docs/json_api/v1/objectAccessControls/insert
bucket (str) – Name of a bucket.
object_name (str) – Name of the object. For information about how to URL encode object names to be path safe, see: https://cloud.google.com/storage/docs/json_api/#encoding
entity (str) – The entity holding the permission, in one of the following forms: user-userId, user-email, group-groupId, group-email, domain-domain, project-team-projectId, allUsers, allAuthenticatedUsers. See: https://cloud.google.com/storage/docs/access-control/lists#scopes
role (str) – The access permission for the entity. Acceptable values are: “OWNER”, “READER”.
generation (str) – (Optional) If present, selects a specific revision of this object (as opposed to the latest version, the default).
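The role values accepted by insert_bucket_acl and insert_object_acl differ (WRITER is listed only for buckets), so a small pre-check can catch mistakes before a request is made. The sketch below is illustrative; check_acl_role and user_entity are hypothetical helpers, and raising ValueError is an assumption.

```python
BUCKET_ACL_ROLES = ("OWNER", "READER", "WRITER")
OBJECT_ACL_ROLES = ("OWNER", "READER")


def check_acl_role(role, for_object=False):
    """Return role unchanged if it is acceptable for the target, else raise."""
    allowed = OBJECT_ACL_ROLES if for_object else BUCKET_ACL_ROLES
    if role not in allowed:
        raise ValueError(
            "role must be one of %s, got %r" % (", ".join(allowed), role)
        )
    return role


def user_entity(email):
    """Build an entity string in the user-email form listed above."""
    return "user-" + email
```

For example, check_acl_role("WRITER") passes for a bucket, while check_acl_role("WRITER", for_object=True) raises.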
compose(self, bucket, source_objects, destination_object, num_retries=5)
Composes a list of existing objects into a new object in the same storage bucket.
Currently, at most 32 objects can be concatenated in a single operation.
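When more than 32 source objects need to be joined, one workaround is to compose in rounds, using intermediate objects as inputs to the next round. The sketch below shows that idea. It is not part of the hook: compose_many and tmp_prefix are hypothetical, hook stands for any object exposing the compose method with the signature above, and the intermediate objects are left in the bucket for the caller to delete.

```python
MAX_COMPOSE_SOURCES = 32  # per-operation limit stated above


def compose_many(hook, bucket, source_objects, destination_object,
                 tmp_prefix="tmp-compose"):
    """Compose any number of objects by batching them 32 at a time."""
    sources = list(source_objects)
    if not sources:
        raise ValueError("source_objects must not be empty")
    round_no = 0
    # Reduce the list until one final compose call can take all of it.
    while len(sources) > MAX_COMPOSE_SOURCES:
        next_sources = []
        for start in range(0, len(sources), MAX_COMPOSE_SOURCES):
            batch = sources[start:start + MAX_COMPOSE_SOURCES]
            tmp_name = "{}/{}-{}".format(
                tmp_prefix, round_no, start // MAX_COMPOSE_SOURCES
            )
            hook.compose(bucket, batch, tmp_name)
            next_sources.append(tmp_name)
        sources = next_sources
        round_no += 1
    hook.compose(bucket, sources, destination_object)
```

Batches are taken in consecutive order, so the final object preserves the order of the original list.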
Given a Google Cloud Storage URL (gs://<bucket>/<blob>), returns a tuple containing the corresponding bucket and blob.
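The URL splitting described above can be sketched with the standard library. This is a minimal illustration, not the module's own function: parse_gcs_url is a hypothetical name, and rejecting a non-gs scheme or a missing bucket with ValueError is an assumption.

```python
from urllib.parse import urlparse


def parse_gcs_url(gsurl):
    """Split gs://<bucket>/<blob> into a (bucket, blob) tuple."""
    parsed = urlparse(gsurl)
    # Assumption: anything other than a gs:// URL with a bucket is an error.
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError("Expected a URL of the form gs://<bucket>/<blob>")
    bucket = parsed.netloc
    blob = parsed.path.lstrip("/")
    return bucket, blob
```

With this sketch, a URL that names only a bucket yields an empty blob string.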