:mod:`airflow.contrib.hooks.gcs_hook` ===================================== .. py:module:: airflow.contrib.hooks.gcs_hook Module Contents --------------- .. py:class:: GoogleCloudStorageHook(google_cloud_storage_conn_id='google_cloud_default', delegate_to=None) Bases: :class:`airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook` Interact with Google Cloud Storage. This hook uses the Google Cloud Platform connection. .. attribute:: _conn .. method:: get_conn(self) Returns a Google Cloud Storage service object. .. method:: copy(self, source_bucket, source_object, destination_bucket=None, destination_object=None) Copies an object from a bucket to another, with renaming if requested. destination_bucket or destination_object can be omitted, in which case source bucket/object is used, but not both. :param source_bucket: The bucket of the object to copy from. :type source_bucket: str :param source_object: The object to copy. :type source_object: str :param destination_bucket: The destination of the object to copied to. Can be omitted; then the same bucket is used. :type destination_bucket: str :param destination_object: The (renamed) path of the object if given. Can be omitted; then the same name is used. :type destination_object: str .. method:: rewrite(self, source_bucket, source_object, destination_bucket, destination_object=None) Has the same functionality as copy, except that will work on files over 5 TB, as well as when copying between locations and/or storage classes. destination_object can be omitted, in which case source_object is used. :param source_bucket: The bucket of the object to copy from. :type source_bucket: str :param source_object: The object to copy. :type source_object: str :param destination_bucket: The destination of the object to copied to. :type destination_bucket: str :param destination_object: The (renamed) path of the object if given. Can be omitted; then the same name is used. .. method:: download(self, bucket, object, filename=None) Get a file from Google Cloud Storage. :param bucket: The bucket to fetch from. :type bucket: str :param object: The object to fetch. :type object: str :param filename: If set, a local file path where the file should be written to. :type filename: str .. method:: upload(self, bucket, object, filename, mime_type='application/octet-stream', gzip=False, multipart=None, num_retries=None) Uploads a local file to Google Cloud Storage. :param bucket: The bucket to upload to. :type bucket: str :param object: The object name to set when uploading the local file. :type object: str :param filename: The local file path to the file to be uploaded. :type filename: str :param mime_type: The MIME type to set when uploading the file. :type mime_type: str :param gzip: Option to compress file for upload :type gzip: bool .. method:: exists(self, bucket, object) Checks for the existence of a file in Google Cloud Storage. :param bucket: The Google cloud storage bucket where the object is. :type bucket: str :param object: The name of the object to check in the Google cloud storage bucket. :type object: str .. method:: is_updated_after(self, bucket, object, ts) Checks if an object is updated in Google Cloud Storage. :param bucket: The Google cloud storage bucket where the object is. :type bucket: str :param object: The name of the object to check in the Google cloud storage bucket. :type object: str :param ts: The timestamp to check against. :type ts: datetime.datetime .. method:: delete(self, bucket, object, generation=None) Deletes an object from the bucket. :param bucket: name of the bucket, where the object resides :type bucket: str :param object: name of the object to delete :type object: str .. method:: list(self, bucket, versions=None, maxResults=None, prefix=None, delimiter=None) List all objects from the bucket with the give string prefix in name :param bucket: bucket name :type bucket: str :param versions: if true, list all versions of the objects :type versions: bool :param maxResults: max count of items to return in a single page of responses :type maxResults: int :param prefix: prefix string which filters objects whose name begin with this prefix :type prefix: str :param delimiter: filters objects based on the delimiter (for e.g '.csv') :type delimiter: str :return: a stream of object names matching the filtering criteria .. method:: get_size(self, bucket, object) Gets the size of a file in Google Cloud Storage in bytes. :param bucket: The Google cloud storage bucket where the object is. :type bucket: str :param object: The name of the object to check in the Google cloud storage bucket. :type object: str .. method:: get_crc32c(self, bucket, object) Gets the CRC32c checksum of an object in Google Cloud Storage. :param bucket: The Google cloud storage bucket where the object is. :type bucket: str :param object: The name of the object to check in the Google cloud storage bucket. :type object: str .. method:: get_md5hash(self, bucket, object) Gets the MD5 hash of an object in Google Cloud Storage. :param bucket: The Google cloud storage bucket where the object is. :type bucket: str :param object: The name of the object to check in the Google cloud storage bucket. :type object: str .. method:: create_bucket(self, bucket_name, resource=None, storage_class='MULTI_REGIONAL', location='US', project_id=None, labels=None) Creates a new bucket. Google Cloud Storage uses a flat namespace, so you can't create a bucket with a name that is already in use. .. seealso:: For more information, see Bucket Naming Guidelines: https://cloud.google.com/storage/docs/bucketnaming.html#requirements :param bucket_name: The name of the bucket. :type bucket_name: str :param resource: An optional dict with parameters for creating the bucket. For information on available parameters, see Cloud Storage API doc: https://cloud.google.com/storage/docs/json_api/v1/buckets/insert :type resource: dict :param storage_class: This defines how objects in the bucket are stored and determines the SLA and the cost of storage. Values include - ``MULTI_REGIONAL`` - ``REGIONAL`` - ``STANDARD`` - ``NEARLINE`` - ``COLDLINE``. If this value is not specified when the bucket is created, it will default to STANDARD. :type storage_class: str :param location: The location of the bucket. Object data for objects in the bucket resides in physical storage within this region. Defaults to US. .. seealso:: https://developers.google.com/storage/docs/bucket-locations :type location: str :param project_id: The ID of the GCP Project. :type project_id: str :param labels: User-provided labels, in key/value pairs. :type labels: dict :return: If successful, it returns the ``id`` of the bucket. .. method:: insert_bucket_acl(self, bucket, entity, role, user_project=None) Creates a new ACL entry on the specified bucket. See: https://cloud.google.com/storage/docs/json_api/v1/bucketAccessControls/insert :param bucket: Name of a bucket. :type bucket: str :param entity: The entity holding the permission, in one of the following forms: user-userId, user-email, group-groupId, group-email, domain-domain, project-team-projectId, allUsers, allAuthenticatedUsers. See: https://cloud.google.com/storage/docs/access-control/lists#scopes :type entity: str :param role: The access permission for the entity. Acceptable values are: "OWNER", "READER", "WRITER". :type role: str :param user_project: (Optional) The project to be billed for this request. Required for Requester Pays buckets. :type user_project: str .. method:: insert_object_acl(self, bucket, object_name, entity, role, generation=None, user_project=None) Creates a new ACL entry on the specified object. See: https://cloud.google.com/storage/docs/json_api/v1/objectAccessControls/insert :param bucket: Name of a bucket. :type bucket: str :param object_name: Name of the object. For information about how to URL encode object names to be path safe, see: https://cloud.google.com/storage/docs/json_api/#encoding :type object_name: str :param entity: The entity holding the permission, in one of the following forms: user-userId, user-email, group-groupId, group-email, domain-domain, project-team-projectId, allUsers, allAuthenticatedUsers See: https://cloud.google.com/storage/docs/access-control/lists#scopes :type entity: str :param role: The access permission for the entity. Acceptable values are: "OWNER", "READER". :type role: str :param user_project: (Optional) The project to be billed for this request. Required for Requester Pays buckets. :type user_project: str .. method:: compose(self, bucket, source_objects, destination_object, num_retries=None) Composes a list of existing object into a new object in the same storage bucket Currently it only supports up to 32 objects that can be concatenated in a single operation https://cloud.google.com/storage/docs/json_api/v1/objects/compose :param bucket: The name of the bucket containing the source objects. This is also the same bucket to store the composed destination object. :type bucket: str :param source_objects: The list of source objects that will be composed into a single object. :type source_objects: list :param destination_object: The path of the object if given. :type destination_object: str .. function:: _parse_gcs_url(gsurl) Given a Google Cloud Storage URL (gs:///), returns a tuple containing the corresponding bucket and blob.