airflow.contrib.hooks.gcs_hook
Module Contents
- 
class airflow.contrib.hooks.gcs_hook.GoogleCloudStorageHook(google_cloud_storage_conn_id='google_cloud_default', delegate_to=None)
- Bases: airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook
- Interact with Google Cloud Storage. This hook uses the Google Cloud Platform connection.
 - 
copy(self, source_bucket, source_object, destination_bucket=None, destination_object=None)
- Copies an object from one bucket to another, renaming it if requested.
- Either destination_bucket or destination_object can be omitted, in which case the source bucket or object name is used, but they cannot both be omitted.
- Parameters
- source_bucket (str) – The bucket of the object to copy from. 
- source_object (str) – The object to copy. 
- destination_bucket (str) – The bucket to copy the object to. Can be omitted; then the same bucket is used. 
- destination_object (str) – The (renamed) path of the object if given. Can be omitted; then the same name is used. 
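The two ways of omitting a destination argument can be sketched as below. This is a minimal illustration, assuming `hook` is an already-constructed GoogleCloudStorageHook; the helper names `copy_with_new_name` and `copy_to_other_bucket` are hypothetical and not part of the module.

```python
def copy_with_new_name(hook, bucket, source_object, new_name):
    """Copy an object to a new name inside the same bucket.

    destination_bucket is omitted, so the source bucket is reused.
    """
    hook.copy(
        source_bucket=bucket,
        source_object=source_object,
        destination_object=new_name,
    )


def copy_to_other_bucket(hook, source_bucket, source_object, destination_bucket):
    """Copy an object to another bucket under the same name.

    destination_object is omitted, so the source object name is reused.
    """
    hook.copy(
        source_bucket=source_bucket,
        source_object=source_object,
        destination_bucket=destination_bucket,
    )
```

Neither helper deletes the source object; both only call copy with one destination argument left out.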
 
 
 - 
rewrite(self, source_bucket, source_object, destination_bucket, destination_object=None)
- Has the same functionality as copy, except that it also works on files over 5 TB and when copying between locations and/or storage classes.
- destination_object can be omitted, in which case source_object is used. 
 - 
download(self, bucket, object, filename=None)
- Downloads a file from Google Cloud Storage.
- When no filename is supplied, the method loads the file into memory and returns its content. When a filename is supplied, it writes the file to the specified location and returns that location. For files larger than the available memory, writing to a file is recommended. 
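The choice between the in-memory and on-disk modes could be wrapped as follows. This is a sketch only: `fetch_object` is a hypothetical helper, and `hook` is assumed to be an existing GoogleCloudStorageHook instance.

```python
def fetch_object(hook, bucket, object_name, local_path=None):
    """Download an object either into memory or to a local file.

    With local_path=None the hook returns the object's content.
    Otherwise the hook writes to local_path and, per the description
    above, returns that location.
    """
    if local_path is None:
        return hook.download(bucket=bucket, object=object_name)
    return hook.download(bucket=bucket, object=object_name, filename=local_path)
```

Passing a local path is the safer choice for large objects, since the content is then never held in memory by the caller.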
 - 
upload(self, bucket, object, filename, mime_type='application/octet-stream', gzip=False, multipart=None, num_retries=None)
- Uploads a local file to Google Cloud Storage. 
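As an illustration of the mime_type argument, the sketch below guesses a type from the file name with the standard library and falls back to the signature's default. `upload_with_guessed_type` is a hypothetical helper, `hook` is assumed to be a GoogleCloudStorageHook, and the gzip flag is simply passed through without assuming anything about its behaviour.

```python
import mimetypes


def upload_with_guessed_type(hook, bucket, object_name, filename, gzip=False):
    """Upload a local file, guessing mime_type from the file name.

    Falls back to 'application/octet-stream' (the default in the
    signature above) when no type can be guessed.
    """
    guessed, _encoding = mimetypes.guess_type(filename)
    mime_type = guessed or "application/octet-stream"
    hook.upload(
        bucket=bucket,
        object=object_name,
        filename=filename,
        mime_type=mime_type,
        gzip=gzip,
    )
    return mime_type
```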
 - 
is_updated_after(self, bucket, object, ts)
- Checks whether an object in Google Cloud Storage has been updated after the given timestamp.
- Parameters
- bucket (str) – The Google Cloud Storage bucket where the object is. 
- object (str) – The name of the object to check in the Google Cloud Storage bucket. 
- ts (datetime.datetime) – The timestamp to check against. 
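A common use of this check is "has the object changed within the last N hours". The sketch below builds the timestamp and delegates to the hook. `changed_within` is a hypothetical helper, `hook` is assumed to be a GoogleCloudStorageHook, and the use of a timezone-aware UTC timestamp is an assumption rather than a documented requirement.

```python
from datetime import datetime, timedelta, timezone


def changed_within(hook, bucket, object_name, window, now=None):
    """Ask the hook whether the object was updated after (now - window).

    `now` defaults to the current UTC time; pass it explicitly to make
    the check reproducible.
    """
    now = now or datetime.now(timezone.utc)
    return hook.is_updated_after(bucket=bucket, object=object_name, ts=now - window)
```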
 
 
 - 
list(self, bucket, versions=None, maxResults=None, prefix=None, delimiter=None)
- Lists all objects in the bucket whose names begin with the given string prefix.
- Parameters
- bucket (str) – bucket name 
- versions (bool) – if true, list all versions of the objects 
- maxResults (int) – max count of items to return in a single page of responses 
- prefix (str) – prefix string that filters for objects whose names begin with it 
- delimiter (str) – filters objects based on the delimiter (e.g. ‘.csv’) 
 
- Returns
- a stream of object names matching the filtering criteria 
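The prefix and delimiter filters can be combined, for example to collect the CSV objects under one folder-like prefix. This is a sketch under assumptions: `list_csv_exports` is a hypothetical helper, `hook` is assumed to be a GoogleCloudStorageHook, and the delimiter is used exactly as in the '.csv' example above.

```python
def list_csv_exports(hook, bucket, folder):
    """Return the sorted object names under `folder/` that match '.csv'."""
    # Normalise the folder so the prefix always ends with a single slash.
    prefix = folder.rstrip("/") + "/"
    names = hook.list(bucket=bucket, prefix=prefix, delimiter=".csv")
    # The hook returns a stream of names; sorted() materialises it as a list.
    return sorted(names)
```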
 
 - 
get_crc32c(self, bucket, object)
- Gets the CRC32c checksum of an object in Google Cloud Storage. 
 - 
create_bucket(self, bucket_name, resource=None, storage_class='MULTI_REGIONAL', location='US', project_id=None, labels=None)
- Creates a new bucket. Google Cloud Storage uses a flat namespace, so you can’t create a bucket with a name that is already in use.
- See also: for more information, see Bucket Naming Guidelines: https://cloud.google.com/storage/docs/bucketnaming.html#requirements
- Parameters
- bucket_name (str) – The name of the bucket. 
- resource (dict) – An optional dict with parameters for creating the bucket. For information on available parameters, see Cloud Storage API doc: https://cloud.google.com/storage/docs/json_api/v1/buckets/insert 
- storage_class (str) – Defines how objects in the bucket are stored and determines the SLA and the cost of storage. Values include MULTI_REGIONAL, REGIONAL, STANDARD, NEARLINE, and COLDLINE. If no storage class is specified when a bucket is created, it will default to STANDARD; note that this method's signature defaults the argument to MULTI_REGIONAL. 
- location (str) – The location of the bucket. Object data for objects in the bucket resides in physical storage within this region. Defaults to US. 
- project_id (str) – The ID of the GCP Project. 
- labels (dict) – User-provided labels, in key/value pairs. 
 
- Returns
- If successful, it returns the id of the bucket.
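A call that sets the storage class and labels explicitly might look like the sketch below. `create_labelled_bucket` and `DOCUMENTED_STORAGE_CLASSES` are hypothetical names; the tuple simply mirrors the values listed above, and `hook` is assumed to be a GoogleCloudStorageHook.

```python
# Storage classes listed in the parameter description above.
DOCUMENTED_STORAGE_CLASSES = (
    "MULTI_REGIONAL",
    "REGIONAL",
    "STANDARD",
    "NEARLINE",
    "COLDLINE",
)


def create_labelled_bucket(hook, bucket_name, storage_class, location, project_id, labels):
    """Create a bucket after checking the storage class locally.

    Returns whatever the hook returns (the bucket id on success).
    """
    if storage_class not in DOCUMENTED_STORAGE_CLASSES:
        raise ValueError("Unsupported storage class: {!r}".format(storage_class))
    return hook.create_bucket(
        bucket_name=bucket_name,
        storage_class=storage_class,
        location=location,
        project_id=project_id,
        labels=labels,
    )
```

Checking the value before the call gives an early, local error instead of a failed API request.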
 
 - 
insert_bucket_acl(self, bucket, entity, role, user_project=None)
- Creates a new ACL entry on the specified bucket. See: https://cloud.google.com/storage/docs/json_api/v1/bucketAccessControls/insert
- Parameters
- bucket (str) – Name of a bucket. 
- entity (str) – The entity holding the permission, in one of the following forms: user-userId, user-email, group-groupId, group-email, domain-domain, project-team-projectId, allUsers, allAuthenticatedUsers. See: https://cloud.google.com/storage/docs/access-control/lists#scopes 
- role (str) – The access permission for the entity. Acceptable values are: “OWNER”, “READER”, “WRITER”. 
- user_project (str) – (Optional) The project to be billed for this request. Required for Requester Pays buckets. 
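The entity string is built by joining a scope keyword and an identifier with a hyphen, as in the user-email form above. The sketch below grants a role to a user by email. `grant_bucket_role_to_user` and `BUCKET_ACL_ROLES` are hypothetical names, and `hook` is assumed to be a GoogleCloudStorageHook.

```python
# Roles listed as acceptable for bucket ACLs in the description above.
BUCKET_ACL_ROLES = ("OWNER", "READER", "WRITER")


def grant_bucket_role_to_user(hook, bucket, email, role, user_project=None):
    """Add a bucket ACL entry for one user, identified by email address."""
    if role not in BUCKET_ACL_ROLES:
        raise ValueError("Unsupported bucket ACL role: {!r}".format(role))
    # 'user-email' form: the literal prefix 'user-' followed by the address.
    entity = "user-" + email
    hook.insert_bucket_acl(
        bucket=bucket,
        entity=entity,
        role=role,
        user_project=user_project,
    )
    return entity
```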
 
 
 - 
insert_object_acl(self, bucket, object_name, entity, role, generation=None, user_project=None)
- Creates a new ACL entry on the specified object. See: https://cloud.google.com/storage/docs/json_api/v1/objectAccessControls/insert
- Parameters
- bucket (str) – Name of a bucket. 
- object_name (str) – Name of the object. For information about how to URL encode object names to be path safe, see: https://cloud.google.com/storage/docs/json_api/#encoding 
- entity (str) – The entity holding the permission, in one of the following forms: user-userId, user-email, group-groupId, group-email, domain-domain, project-team-projectId, allUsers, allAuthenticatedUsers. See: https://cloud.google.com/storage/docs/access-control/lists#scopes 
- role (str) – The access permission for the entity. Acceptable values are: “OWNER”, “READER”. 
- user_project (str) – (Optional) The project to be billed for this request. Required for Requester Pays buckets. 
 
 
 - 
compose(self, bucket, source_objects, destination_object, num_retries=None)
- Composes a list of existing objects into a new object in the same storage bucket.
- Currently, at most 32 objects can be concatenated in a single operation. See: https://cloud.google.com/storage/docs/json_api/v1/objects/compose
- Parameters
 
 
-
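Because compose accepts at most 32 source objects per operation, callers may want to check or split the list before calling it. The sketch below is one way to do that. `MAX_COMPOSE_SOURCES`, `split_into_compose_batches` and `compose_checked` are hypothetical names, and `hook` is assumed to be a GoogleCloudStorageHook.

```python
# Limit stated in the compose description above.
MAX_COMPOSE_SOURCES = 32


def split_into_compose_batches(source_objects, batch_size=MAX_COMPOSE_SOURCES):
    """Split a list of object names into consecutive batches of at most batch_size."""
    return [
        source_objects[start:start + batch_size]
        for start in range(0, len(source_objects), batch_size)
    ]


def compose_checked(hook, bucket, source_objects, destination_object):
    """Call compose once, failing early if the source list is empty or too long."""
    if not source_objects:
        raise ValueError("source_objects must not be empty")
    if len(source_objects) > MAX_COMPOSE_SOURCES:
        raise ValueError(
            "compose supports at most {} source objects".format(MAX_COMPOSE_SOURCES)
        )
    hook.compose(
        bucket=bucket,
        source_objects=source_objects,
        destination_object=destination_object,
    )
```

Composing more than 32 objects would need intermediate objects, one per batch, followed by a final compose of those intermediates; that step is left out of this sketch.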