airflow.contrib.operators.gcp_vision_operator

Module Contents

class airflow.contrib.operators.gcp_vision_operator.CloudVisionProductSetCreateOperator(product_set, location, project_id=None, product_set_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Creates a new ProductSet resource.

See also

For more information on how to use this operator, take a look at the guide: CloudVisionProductSetCreateOperator

Parameters
  • product_set (dict or google.cloud.vision_v1.types.ProductSet) – (Required) The ProductSet to create. If a dict is provided, it must be of the same form as the protobuf message ProductSet.

  • location (str) – (Required) The region where the ProductSet should be created. Valid regions (as of 2019-02-05) are: us-east1, us-west1, europe-west1, asia-east1

  • project_id (str) – (Optional) The project in which the ProductSet should be created. If set to None or missing, the default project_id from the GCP connection is used.

  • product_set_id (str) – (Optional) A user-supplied resource id for this ProductSet. If set, the server will attempt to use this value as the resource id. If it is already in use, an error is returned with code ALREADY_EXISTS. Must be at most 128 characters long. It cannot contain the character /.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

template_fields = ['location', 'project_id', 'product_set_id', 'gcp_conn_id'][source]
execute(self, context)[source]
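
The limits documented for CloudVisionProductSetCreateOperator (the listed regions, and a product_set_id of at most 128 characters with no /) can be checked before a task is defined. The following is a minimal sketch only: check_resource_id, VALID_REGIONS, and every value in create_kwargs are illustrative assumptions and are not part of this module.

```python
# Regions this reference lists as valid on 2019-02-05; the service may accept others today.
VALID_REGIONS = ("us-east1", "us-west1", "europe-west1", "asia-east1")


def check_resource_id(resource_id):
    """Return the documented rules a user-supplied resource id breaks (hypothetical helper)."""
    problems = []
    if len(resource_id) > 128:
        problems.append("longer than 128 characters")
    if "/" in resource_id:
        problems.append("contains the character /")
    return problems


# Keyword arguments as they might be passed to CloudVisionProductSetCreateOperator.
create_kwargs = {
    "task_id": "product_set_create",  # BaseOperator argument, illustrative value
    "location": "europe-west1",
    "product_set": {"display_name": "My Product Set"},  # dict form of ProductSet
    "product_set_id": "my-product-set",
}

assert create_kwargs["location"] in VALID_REGIONS
assert check_resource_id(create_kwargs["product_set_id"]) == []
```

With Airflow installed, the dictionary could be unpacked into the operator as CloudVisionProductSetCreateOperator(**create_kwargs). The server remains the authority on whether an id is accepted, for example when it returns ALREADY_EXISTS.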
class airflow.contrib.operators.gcp_vision_operator.CloudVisionProductSetGetOperator(location, product_set_id, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Gets information associated with a ProductSet.

See also

For more information on how to use this operator, take a look at the guide: CloudVisionProductSetGetOperator

Parameters
  • location (str) – (Required) The region where the ProductSet is located. Valid regions (as of 2019-02-05) are: us-east1, us-west1, europe-west1, asia-east1

  • product_set_id (str) – (Required) The resource id of this ProductSet.

  • project_id (str) – (Optional) The project in which the ProductSet is located. If set to None or missing, the default project_id from the GCP connection is used.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

template_fields = ['location', 'project_id', 'product_set_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_vision_operator.CloudVisionProductSetUpdateOperator(product_set, location=None, product_set_id=None, project_id=None, update_mask=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Makes changes to a ProductSet resource. Only display_name can be updated currently.

Note

To locate the ProductSet resource, its name in the form projects/PROJECT_ID/locations/LOC_ID/productSets/PRODUCT_SET_ID is necessary.

You can provide the name directly as an attribute of the product_set object. However, you can leave it blank and provide location and product_set_id instead (and optionally project_id - if not present, the connection default will be used) and the name will be created by the operator itself.

This mechanism exists for your convenience, to allow leaving the project_id empty and having Airflow use the connection default project_id.
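
The two ways of identifying the ProductSet can be compared side by side. This is a sketch under assumptions: product_set_name is a hypothetical helper that formats the documented name pattern, and the project, region, and id values are placeholders. How the operator assembles the name internally is not shown in this reference.

```python
def product_set_name(project_id, location, product_set_id):
    """Format a ProductSet resource name in the documented pattern (hypothetical helper)."""
    return "projects/{}/locations/{}/productSets/{}".format(
        project_id, location, product_set_id
    )


# Option 1: put the full resource name on the product_set object.
by_name = {
    "product_set": {
        "name": product_set_name("my-project", "europe-west1", "my-product-set"),
        "display_name": "My Product Set 2",
    },
}

# Option 2: leave the name blank and pass its parts as operator arguments.
by_parts = {
    "product_set": {"display_name": "My Product Set 2"},
    "location": "europe-west1",
    "product_set_id": "my-product-set",
    # project_id is omitted here, so the connection default would be used.
}

print(by_name["product_set"]["name"])
```

Option 2 is the form that allows project_id to be left out.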

See also

For more information on how to use this operator, take a look at the guide: CloudVisionProductSetUpdateOperator

Parameters
  • product_set (dict or google.cloud.vision_v1.types.ProductSet) – (Required) The ProductSet resource which replaces the one on the server. If a dict is provided, it must be of the same form as the protobuf message ProductSet.

  • location (str) – (Optional) The region where the ProductSet is located. Valid regions (as of 2019-02-05) are: us-east1, us-west1, europe-west1, asia-east1

  • product_set_id (str) – (Optional) The resource id of this ProductSet.

  • project_id (str) – (Optional) The project in which the ProductSet is located. If set to None or missing, the default project_id from the GCP connection is used.

  • update_mask (dict or google.cloud.vision_v1.types.FieldMask) – (Optional) The FieldMask that specifies which fields to update. If update_mask isn’t specified, all mutable fields are to be updated. Valid mask path is display_name. If a dict is provided, it must be of the same form as the protobuf message FieldMask.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

template_fields = ['location', 'project_id', 'product_set_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_vision_operator.CloudVisionProductSetDeleteOperator(location, product_set_id, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Permanently deletes a ProductSet. Products and ReferenceImages in the ProductSet are not deleted. The actual image files are not deleted from Google Cloud Storage.

See also

For more information on how to use this operator, take a look at the guide: CloudVisionProductSetDeleteOperator

Parameters
  • location (str) – (Required) The region where the ProductSet is located. Valid regions (as of 2019-02-05) are: us-east1, us-west1, europe-west1, asia-east1

  • product_set_id (str) – (Required) The resource id of this ProductSet.

  • project_id (str) – (Optional) The project in which the ProductSet is located. If set to None or missing, the default project_id from the GCP connection is used.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

template_fields = ['location', 'project_id', 'product_set_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_vision_operator.CloudVisionProductCreateOperator(location, product, project_id=None, product_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Creates and returns a new product resource.

Possible errors regarding the Product object provided:

  • Returns INVALID_ARGUMENT if display_name is missing or longer than 4096 characters.

  • Returns INVALID_ARGUMENT if description is longer than 4096 characters.

  • Returns INVALID_ARGUMENT if product_category is missing or invalid.
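
The error conditions listed above can be mirrored in a client-side check that runs before the request is sent. This is a rough sketch: product_problems is a hypothetical helper, the product is assumed to be in dict form, and the sample values are illustrative. Whether a given product_category is valid can only be decided by the service, so the sketch checks only that one is present.

```python
MAX_TEXT_LENGTH = 4096  # limit documented for display_name and description


def product_problems(product):
    """List the documented INVALID_ARGUMENT conditions a Product dict would trigger."""
    problems = []
    display_name = product.get("display_name")
    if not display_name:
        problems.append("display_name is missing")
    elif len(display_name) > MAX_TEXT_LENGTH:
        problems.append("display_name is longer than 4096 characters")
    if len(product.get("description", "")) > MAX_TEXT_LENGTH:
        problems.append("description is longer than 4096 characters")
    if not product.get("product_category"):
        problems.append("product_category is missing")
    return problems


product = {"display_name": "My Product 1", "product_category": "toys"}
print(product_problems(product))
print(product_problems({"description": "no name and no category"}))
```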

See also

For more information on how to use this operator, take a look at the guide: CloudVisionProductCreateOperator

Parameters
  • location (str) – (Required) The region where the Product should be created. Valid regions (as of 2019-02-05) are: us-east1, us-west1, europe-west1, asia-east1

  • product (dict or google.cloud.vision_v1.types.Product) – (Required) The product to create. If a dict is provided, it must be of the same form as the protobuf message Product.

  • project_id (str) – (Optional) The project in which the Product should be created. If set to None or missing, the default project_id from the GCP connection is used.

  • product_id (str) – (Optional) A user-supplied resource id for this Product. If set, the server will attempt to use this value as the resource id. If it is already in use, an error is returned with code ALREADY_EXISTS. Must be at most 128 characters long. It cannot contain the character /.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

template_fields = ['location', 'project_id', 'product_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_vision_operator.CloudVisionProductGetOperator(location, product_id, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Gets information associated with a Product.

Possible errors:

  • Returns NOT_FOUND if the Product does not exist.

See also

For more information on how to use this operator, take a look at the guide: CloudVisionProductGetOperator

Parameters
  • location (str) – (Required) The region where the Product is located. Valid regions (as of 2019-02-05) are: us-east1, us-west1, europe-west1, asia-east1

  • product_id (str) – (Required) The resource id of this Product.

  • project_id (str) – (Optional) The project in which the Product is located. If set to None or missing, the default project_id from the GCP connection is used.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

template_fields = ['location', 'project_id', 'product_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_vision_operator.CloudVisionProductUpdateOperator(product, location=None, product_id=None, project_id=None, update_mask=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Makes changes to a Product resource. Only the display_name, description, and labels fields can be updated right now.

If labels are updated, the change will not be reflected in queries until the next index time.

Note

To locate the Product resource, its name in the form projects/PROJECT_ID/locations/LOC_ID/products/PRODUCT_ID is necessary.

You can provide the name directly as an attribute of the product object. However, you can leave it blank and provide location and product_id instead (and optionally project_id - if not present, the connection default will be used) and the name will be created by the operator itself.

This mechanism exists for your convenience, to allow leaving the project_id empty and having Airflow use the connection default project_id.

Possible errors related to the provided Product:

  • Returns NOT_FOUND if the Product does not exist.

  • Returns INVALID_ARGUMENT if display_name is present in update_mask but is missing from the request or longer than 4096 characters.

  • Returns INVALID_ARGUMENT if description is present in update_mask but is longer than 4096 characters.

  • Returns INVALID_ARGUMENT if product_category is present in update_mask.

See also

For more information on how to use this operator, take a look at the guide: CloudVisionProductUpdateOperator

Parameters
  • product (dict or google.cloud.vision_v1.types.Product) – (Required) The Product resource which replaces the one on the server. product.name is immutable. If a dict is provided, it must be of the same form as the protobuf message Product.

  • location (str) – (Optional) The region where the Product is located. Valid regions (as of 2019-02-05) are: us-east1, us-west1, europe-west1, asia-east1

  • product_id (str) – (Optional) The resource id of this Product.

  • project_id (str) – (Optional) The project in which the Product is located. If set to None or missing, the default project_id from the GCP connection is used.

  • update_mask (dict or google.cloud.vision_v1.types.FieldMask) – (Optional) The FieldMask that specifies which fields to update. If update_mask isn’t specified, all mutable fields are to be updated. Valid mask paths include product_labels, display_name, and description. If a dict is provided, it must be of the same form as the protobuf message FieldMask.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

template_fields = ['location', 'project_id', 'product_id', 'gcp_conn_id'][source]
execute(self, context)[source]
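
For CloudVisionProductUpdateOperator, the documented mask paths can be used to screen an update_mask before it is sent. This is a sketch under assumptions: rejected_mask_paths is a hypothetical helper, and update_mask is assumed to be in dict form with a paths list, matching the FieldMask message.

```python
# Mask paths this reference lists as valid for a Product update.
UPDATABLE_PATHS = ("product_labels", "display_name", "description")


def rejected_mask_paths(update_mask):
    """Return the mask paths that this reference does not list as updatable."""
    return [
        path for path in update_mask.get("paths", []) if path not in UPDATABLE_PATHS
    ]


update_mask = {"paths": ["display_name", "description"]}
print(rejected_mask_paths(update_mask))

# product_category is documented to produce INVALID_ARGUMENT when present in the mask.
print(rejected_mask_paths({"paths": ["display_name", "product_category"]}))
```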
class airflow.contrib.operators.gcp_vision_operator.CloudVisionProductDeleteOperator(location, product_id, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Permanently deletes a product and its reference images.

Metadata of the product and all its images will be deleted right away, but search queries against ProductSets containing the product may still work until all related caches are refreshed.

Possible errors:

  • Returns NOT_FOUND if the product does not exist.

See also

For more information on how to use this operator, take a look at the guide: CloudVisionProductDeleteOperator

Parameters
  • location (str) – (Required) The region where the Product is located. Valid regions (as of 2019-02-05) are: us-east1, us-west1, europe-west1, asia-east1

  • product_id (str) – (Required) The resource id of this Product.

  • project_id (str) – (Optional) The project in which the Product is located. If set to None or missing, the default project_id from the GCP connection is used.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

template_fields = ['location', 'project_id', 'product_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_vision_operator.CloudVisionAnnotateImageOperator(request, retry=None, timeout=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Runs image detection and annotation for an image or a batch of images.

See also

For more information on how to use this operator, take a look at the guide: CloudVisionAnnotateImageOperator

Parameters
  • request (dict or google.cloud.vision_v1.types.AnnotateImageRequest for a single image, or a list of these for a batch) – (Required) The annotation request for an image or a batch of images. If a dict is provided, it must be of the same form as the protobuf message google.cloud.vision_v1.types.AnnotateImageRequest.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

template_fields = ['request', 'gcp_conn_id'][source]
execute(self, context)[source]
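
The request parameter of CloudVisionAnnotateImageOperator takes either one request or a list of requests. The sketch below shows both shapes in dict form and a hypothetical helper, as_batch, that normalizes them. The image URIs are placeholders, and writing the feature type by its enum name is an assumption; the client library's enum constants are the safer choice.

```python
def as_batch(request):
    """Return the request as a list, wrapping a single request (hypothetical helper)."""
    return request if isinstance(request, list) else [request]


# A single request in dict form, following the AnnotateImageRequest message.
single_request = {
    "image": {"source": {"image_uri": "gs://my-bucket/image-1.jpg"}},  # placeholder URI
    "features": [{"type": "LABEL_DETECTION"}],  # enum given by name, an assumption
}

# A batch is a list of such requests.
batch_request = [
    single_request,
    {
        "image": {"source": {"image_uri": "gs://my-bucket/image-2.jpg"}},
        "features": [{"type": "LABEL_DETECTION"}],
    },
]

print(len(as_batch(single_request)), len(as_batch(batch_request)))
```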
class airflow.contrib.operators.gcp_vision_operator.CloudVisionReferenceImageCreateOperator(location, reference_image, product_id, reference_image_id=None, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Creates and returns a new ReferenceImage resource.

See also

For more information on how to use this operator, take a look at the guide: CloudVisionReferenceImageCreateOperator

Parameters
  • location (str) – (Required) The region where the Product is located. Valid regions (as of 2019-02-05) are: us-east1, us-west1, europe-west1, asia-east1

  • reference_image (dict or google.cloud.vision_v1.types.ReferenceImage) – (Required) The reference image to create. If an image ID is specified, it is ignored. If a dict is provided, it must be of the same form as the protobuf message google.cloud.vision_v1.types.ReferenceImage

  • reference_image_id (str) – (Optional) A user-supplied resource id for the ReferenceImage to be added. If set, the server will attempt to use this value as the resource id. If it is already in use, an error is returned with code ALREADY_EXISTS. Must be at most 128 characters long. It cannot contain the character /.

  • product_id (str) – (Required) The resource id of this Product.

  • project_id (str) – (Optional) The project in which the Product is located. If set to None or missing, the default project_id from the GCP connection is used.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

template_fields = ['location', 'reference_image', 'product_id', 'reference_image_id', 'project_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_vision_operator.CloudVisionAddProductToProductSetOperator(product_set_id, product_id, location, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Adds a Product to the specified ProductSet. If the Product is already present, no change is made.

One Product can be added to at most 100 ProductSets.

Possible errors:

  • Returns NOT_FOUND if the Product or the ProductSet doesn’t exist.

See also

For more information on how to use this operator, take a look at the guide: CloudVisionAddProductToProductSetOperator

Parameters
  • product_set_id (str) – (Required) The resource id for the ProductSet to modify.

  • product_id (str) – (Required) The resource id of this Product.

  • location (str) – (Required) The region where the ProductSet is located. Valid regions (as of 2019-02-05) are: us-east1, us-west1, europe-west1, asia-east1

  • project_id (str) – (Optional) The project in which the Product is located. If set to None or missing, the default project_id from the GCP connection is used.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

template_fields = ['location', 'product_set_id', 'product_id', 'project_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_vision_operator.CloudVisionRemoveProductFromProductSetOperator(product_set_id, product_id, location, project_id=None, retry=None, timeout=None, metadata=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Removes a Product from the specified ProductSet.

See also

For more information on how to use this operator, take a look at the guide: CloudVisionRemoveProductFromProductSetOperator

Parameters
  • product_set_id (str) – (Required) The resource id for the ProductSet to modify.

  • product_id (str) – (Required) The resource id of this Product.

  • location (str) – (Required) The region where the ProductSet is located. Valid regions (as of 2019-02-05) are: us-east1, us-west1, europe-west1, asia-east1

  • project_id (str) – (Optional) The project in which the Product is located. If set to None or missing, the default project_id from the GCP connection is used.

  • retry (google.api_core.retry.Retry) – (Optional) A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float) – (Optional) The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (sequence[tuple[str, str]]) – (Optional) Additional metadata that is provided to the method.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud Platform.

template_fields = ['location', 'product_set_id', 'product_id', 'project_id', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_vision_operator.CloudVisionDetectTextOperator(image, max_results=None, retry=None, timeout=None, language_hints=None, web_detection_params=None, additional_properties=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Detects text in the image.

See also

For more information on how to use this operator, take a look at the guide: CloudVisionDetectTextOperator

Parameters
template_fields = ['image', 'max_results', 'timeout', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_vision_operator.CloudVisionDetectDocumentTextOperator(image, max_results=None, retry=None, timeout=None, language_hints=None, web_detection_params=None, additional_properties=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Detects document text in the image.

See also

For more information on how to use this operator, take a look at the guide: CloudVisionDetectDocumentTextOperator

Parameters
template_fields = ['image', 'max_results', 'timeout', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_vision_operator.CloudVisionDetectImageLabelsOperator(image, max_results=None, retry=None, timeout=None, additional_properties=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Detects labels in the image.

See also

For more information on how to use this operator, take a look at the guide: CloudVisionDetectImageLabelsOperator

Parameters
template_fields = ['image', 'max_results', 'timeout', 'gcp_conn_id'][source]
execute(self, context)[source]
class airflow.contrib.operators.gcp_vision_operator.CloudVisionDetectImageSafeSearchOperator(image, max_results=None, retry=None, timeout=None, additional_properties=None, gcp_conn_id='google_cloud_default', *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Detects explicit content in the image (SafeSearch).

See also

For more information on how to use this operator, take a look at the guide: CloudVisionDetectImageSafeSearchOperator

Parameters
template_fields = ['image', 'max_results', 'timeout', 'gcp_conn_id'][source]
execute(self, context)[source]
airflow.contrib.operators.gcp_vision_operator.prepare_additional_parameters(additional_properties, language_hints, web_detection_params)[source]
Creates the additional_properties parameter based on the language_hints, web_detection_params and additional_properties parameters specified by the user.
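
The function's source is not reproduced in this reference, so the following is only one plausible reading of that description, not the module's implementation. It assumes that both language_hints and web_detection_params belong under an image_context key and that values already present in additional_properties take precedence. merge_image_context is a hypothetical name.

```python
from copy import deepcopy


def merge_image_context(additional_properties, language_hints, web_detection_params):
    """Fold the two shortcut parameters into a copy of additional_properties (sketch)."""
    # Copy so the caller's dictionary is never modified.
    merged = deepcopy(additional_properties) if additional_properties else {}
    image_context = merged.setdefault("image_context", {})
    # Assumption: a value the user already set explicitly wins over the shortcut.
    if language_hints is not None:
        image_context.setdefault("language_hints", language_hints)
    if web_detection_params is not None:
        image_context.setdefault("web_detection_params", web_detection_params)
    return merged


print(merge_image_context(None, ["en"], None))
print(merge_image_context({"image_context": {"language_hints": ["pl"]}}, ["en"], None))
```

Letting explicit values win is a design choice of this sketch; the real function may resolve conflicts differently.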
