airflow.providers.google.cloud.hooks.translate

This module contains a Google Cloud Translate Hook.

Module Contents

Classes

CloudTranslateHook

Hook for Google Cloud translate APIs.

TranslateHook

Hook for Google Cloud translation (Advanced) using client version V3.

exception airflow.providers.google.cloud.hooks.translate.WaitOperationNotDoneYetError

Bases: Exception

Raised while waiting for a long-running operation that is not done yet.

class airflow.providers.google.cloud.hooks.translate.CloudTranslateHook(gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)

Bases: airflow.providers.google.common.hooks.base_google.GoogleBaseHook

Hook for Google Cloud translate APIs.

All methods in the hook that use project_id must be called with keyword arguments rather than positional arguments.

get_conn()

Retrieve connection to Cloud Translate.

Returns

Google Cloud Translate client object.

Return type

google.cloud.translate_v2.Client

translate(values, target_language, format_=None, source_language=None, model=None)

Translate a string or list of strings.

See https://cloud.google.com/translate/docs/translating-text

Parameters
  • values (str | list[str]) – String or list of strings to translate.

  • target_language (str) – The language to translate results into. This is required by the API and defaults to the target language of the current instance.

  • format_ – (Optional) One of text or html, specifying whether the input text is plain text or HTML.

  • source_language (str | None) – (Optional) The language of the text to be translated.

  • model (str | list[str] | None) – (Optional) The model used to translate the text, such as 'base' or 'NMT'.

Returns

A list of dictionaries, one for each queried value. Each dictionary typically contains the following keys (not all are present in every case):

  • detectedSourceLanguage: The detected language (as an ISO 639-1 language code) of the text.

  • translatedText: The translation of the text into the target language.

  • input: The corresponding input value.

  • model: The model used to translate the text.

If only a single value is passed, then only a single dictionary will be returned.
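Because translate returns a single dictionary for one input value and a list for several, code that accepts either shape may want to normalize it first. The following is a minimal sketch under that assumption; the helper names are hypothetical and not part of CloudTranslateHook, and the sample dictionaries are hand-written stand-ins rather than real API output.

```python
from __future__ import annotations

from typing import Any


def as_translation_list(result: dict[str, Any] | list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return translate() output as a list: wrap a lone dict, copy a list."""
    if isinstance(result, dict):
        return [result]
    return list(result)


def translated_texts(result: dict[str, Any] | list[dict[str, Any]]) -> list[str]:
    """Pull the translatedText field out of every result dictionary."""
    return [item["translatedText"] for item in as_translation_list(result)]


# Hand-written stand-ins shaped like the documented result, not real API output.
single = {"input": "Hello", "translatedText": "Hola", "detectedSourceLanguage": "en"}
batch = [
    {"input": "Hello", "translatedText": "Hola"},
    {"input": "Goodbye", "translatedText": "Adiós"},
]
print(translated_texts(single))  # ['Hola']
print(translated_texts(batch))   # ['Hola', 'Adiós']
```

In practice, result would be the return value of CloudTranslateHook().translate(values=..., target_language=...).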

Raises

ValueError if the number of values and translations differ.

Return type

dict

class airflow.providers.google.cloud.hooks.translate.TranslateHook(gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)

Bases: airflow.providers.google.common.hooks.base_google.GoogleBaseHook

Hook for Google Cloud translation (Advanced) using client version V3.

See the related documentation at https://cloud.google.com/translate/docs/editions#advanced.

get_client()

Retrieve TranslationService client.

Returns

Google Cloud Translation Service client object.

Return type

google.cloud.translate_v3.TranslationServiceClient

static wait_for_operation_done(*, operation, timeout=None, initial=3, multiplier=2, maximum=3600)

Wait for long-running operation to be done.

Calls operation.done() until it succeeds or the timeout is exhausted, following a back-off retry strategy (see google.api_core.retry.Retry). It is intended for Operation instances whose result is empty by design (google.protobuf.empty_pb2.Empty). Calling operation.result() on such an operation raises GoogleAPICallError("Unexpected state: Long-running operation had neither response nor error set.") even though nothing is wrong with the operation itself.
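The back-off polling described above can be sketched with the standard library alone. This is an approximation for illustration, assuming done is any zero-argument callable such as a bound operation.done; the names wait_until_done and OperationNotDoneYet are hypothetical, and the sketch does not try to reproduce the exact delay schedule of google.api_core.retry.Retry.

```python
from __future__ import annotations

import time
from typing import Callable


class OperationNotDoneYet(Exception):
    """Hypothetical stand-in for WaitOperationNotDoneYetError."""


def wait_until_done(
    done: Callable[[], bool],
    timeout: float | None = None,
    initial: float = 3,
    multiplier: float = 2,
    maximum: float = 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll done() with exponential back-off until it returns True or time runs out."""
    deadline = None if timeout is None else time.monotonic() + timeout
    delay = initial
    while not done():
        # Give up if the next sleep would overshoot the overall timeout.
        if deadline is not None and time.monotonic() + delay > deadline:
            raise OperationNotDoneYet("operation still running when the timeout expired")
        sleep(delay)
        # Grow the delay geometrically, but never beyond `maximum` seconds.
        delay = min(delay * multiplier, maximum)
```

Injecting sleep keeps the sketch testable: passing a list's append method records the delay schedule (3, 6, 12, ... seconds with the defaults) without actually waiting.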

static wait_for_operation_result(operation, timeout=None)

Wait for a long-running operation to complete and return its result.

static extract_object_id(obj)

Return the unique ID of the object.
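Google Cloud resource names are hierarchical paths such as projects/{project}/locations/{location}/datasets/{dataset-id}, so the ID is their last segment. A minimal sketch of that idea, assuming the object is a dict carrying its full resource name under the "name" key; it illustrates the concept and is not presented as the hook's exact implementation.

```python
def extract_object_id(obj: dict) -> str:
    """Return the text after the last "/" in obj["name"] (the whole name if there is none)."""
    return obj["name"].rpartition("/")[-1]


dataset = {"name": "projects/my-project/locations/us-central1/datasets/my-dataset-id"}
print(extract_object_id(dataset))  # my-dataset-id
```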

translate_text(*, project_id=PROVIDE_PROJECT_ID, contents, target_language_code, source_language_code=None, mime_type=None, location=None, model=None, transliteration_config=None, glossary_config=None, labels=None, timeout=DEFAULT, metadata=(), retry=DEFAULT)

Translate the provided text content.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • contents (collections.abc.Sequence[str]) – Required. The content of the input in string format. At most 1024 items; a total of under 30,000 codepoints is recommended.

  • mime_type (str | None) – Optional. The format of the source text, for example “text/html” or “text/plain”. If left blank, the MIME type defaults to “text/html”.

  • source_language_code (str | None) – Optional. The ISO-639 language code of the input text if known. If the source language isn’t specified, the API attempts to identify the source language automatically and returns the source language within the response.

  • target_language_code (str) – Required. The ISO-639 language code to use for translation of the input text.

  • location (str | None) – Optional. The location in which to make the call; it must belong to the caller’s project. If not specified, ‘global’ is used. A non-global location is required for requests using AutoML models or custom glossaries. Models and glossaries must be in the same region (have the same location-id).

  • model (str | None) –

    Optional. The model type requested for this translation. If not provided, the default Google model (NMT) will be used. The format depends on model type:

    • AutoML Translation models: projects/{project-number-or-id}/locations/{location-id}/models/{model-id}

    • General (built-in) models: projects/{project-number-or-id}/locations/{location-id}/models/general/nmt

    • Translation LLM models: projects/{project-number-or-id}/locations/{location-id}/models/general/translation-llm

    For global (no region) requests, use location-id global. For example, projects/{project-number-or-id}/locations/global/models/general/nmt.

  • glossary_config (google.cloud.translate_v3.types.TranslateTextGlossaryConfig | None) – Optional. Glossary to be applied. The glossary must be within the same region (have the same location-id) as the model.

  • transliteration_config (google.cloud.translate_v3.types.TransliterationConfig | None) – Optional. Transliteration to be applied.

  • labels (str | None) – Optional. The labels with user-defined metadata for the request. See https://cloud.google.com/translate/docs/advanced/labels for more information.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault | None) – Designation of what errors, if any, should be retried.

  • timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

Returns

Translate text result from the API response.

Return type

dict
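The model path formats and content limits above translate directly into small helpers that check arguments before calling translate_text. A sketch under stated assumptions: the helper names are hypothetical, "my-automl-model-id" is a placeholder, and the checks cover only the limits documented here (1024 items, a recommended total under 30,000 codepoints, and a non-global location for AutoML models).

```python
from __future__ import annotations

import warnings

GENERAL_MODELS = ("nmt", "translation-llm")
MAX_CONTENT_ITEMS = 1024
RECOMMENDED_MAX_CODEPOINTS = 30_000


def model_resource_name(project: str, location: str = "global", model: str = "nmt") -> str:
    """Build a value for translate_text's `model` argument.

    `model` is either a general model ("nmt", "translation-llm") or an AutoML model ID.
    """
    base = f"projects/{project}/locations/{location}/models"
    if model in GENERAL_MODELS:
        return f"{base}/general/{model}"
    if location == "global":
        # AutoML models require a regional (non-global) location.
        raise ValueError("AutoML models need a non-global location")
    return f"{base}/{model}"


def check_contents(contents: list[str]) -> None:
    """Reject more than 1024 items; warn if the total exceeds the recommended size."""
    if len(contents) > MAX_CONTENT_ITEMS:
        raise ValueError(f"{len(contents)} items given; at most {MAX_CONTENT_ITEMS} allowed")
    total = sum(len(text) for text in contents)  # len() counts Unicode codepoints
    if total > RECOMMENDED_MAX_CODEPOINTS:
        warnings.warn(f"{total} codepoints; under {RECOMMENDED_MAX_CODEPOINTS} is recommended")
```

Pass the resulting path as model= together with the same location=, for example hook.translate_text(project_id="my-project", contents=texts, target_language_code="de", location="us-central1", model=model_resource_name("my-project", "us-central1", "my-automl-model-id")).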

batch_translate_text(*, project_id=PROVIDE_PROJECT_ID, location, source_language_code, target_language_codes, input_configs, output_config, models=None, glossaries=None, labels=None, timeout=DEFAULT, metadata=(), retry=DEFAULT)

Translate large volumes of text data.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • location (str) – Required. The location in which to make the call; it must belong to the caller’s project and must be non-global.

  • source_language_code (str) – Required. Source language code.

  • target_language_codes (collections.abc.MutableSequence[str]) – Required. Specify up to 10 language codes here.

  • models (str | None) –

    Optional. The models to use for translation, as a map from target language code to model name. A value can be a built-in general model or an AutoML Translation model. The value format depends on model type:

    • AutoML Translation models: projects/{project-number-or-id}/locations/{location-id}/models/{model-id}

    • General (built-in) models: projects/{project-number-or-id}/locations/{location-id}/models/general/nmt

    If the map is empty or a specific model is not requested for a language pair, then the default Google model (NMT) is used.

  • input_configs (collections.abc.MutableSequence[google.cloud.translate_v3.types.InputConfig | dict]) – Required. Input configurations. The total number of files matched should be <= 100. The total content size should be <= 100M Unicode codepoints. The files must use UTF-8 encoding.

  • output_config (google.cloud.translate_v3.types.OutputConfig | dict) – Required. Output configuration.

  • glossaries (collections.abc.MutableMapping[str, google.cloud.translate_v3.types.TranslateTextGlossaryConfig] | None) – Optional. Glossaries to be applied for translation, keyed by target language code.

  • labels (collections.abc.MutableMapping[str, str] | None) – Optional. The labels with user-defined metadata for the request. See https://cloud.google.com/translate/docs/advanced/labels for more information.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault | None) – Designation of what errors, if any, should be retried.

  • timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

Returns

Operation object for the batch text translation; results are returned in batches as they become ready.

Return type

google.api_core.operation.Operation
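For batch_translate_text, input_configs and output_config may be given as dicts. The sketch below assembles a keyword-argument dict for a single Cloud Storage input file, assuming the snake_case field names of the v3 InputConfig (gcs_source.input_uri, mime_type) and OutputConfig (gcs_destination.output_uri_prefix) messages; the helper name, bucket, and paths are hypothetical.

```python
from __future__ import annotations

from typing import Any


def batch_text_kwargs(
    project_id: str,
    location: str,
    source_language_code: str,
    target_language_codes: list[str],
    input_uri: str,
    output_uri_prefix: str,
    mime_type: str = "text/plain",
) -> dict[str, Any]:
    """Assemble keyword arguments for TranslateHook.batch_translate_text."""
    if location == "global":
        raise ValueError("batch_translate_text requires a non-global location")
    if not 1 <= len(target_language_codes) <= 10:
        raise ValueError("specify between 1 and 10 target language codes")
    return {
        "project_id": project_id,
        "location": location,
        "source_language_code": source_language_code,
        "target_language_codes": target_language_codes,
        "input_configs": [{"gcs_source": {"input_uri": input_uri}, "mime_type": mime_type}],
        "output_config": {"gcs_destination": {"output_uri_prefix": output_uri_prefix}},
    }


kwargs = batch_text_kwargs(
    "my-project", "us-central1", "en", ["es", "fr"],
    "gs://my-bucket/input/docs.txt", "gs://my-bucket/output/",
)
print(kwargs["output_config"])  # {'gcs_destination': {'output_uri_prefix': 'gs://my-bucket/output/'}}
```

The dict could then be unpacked into TranslateHook().batch_translate_text(**kwargs), and the returned operation awaited with TranslateHook.wait_for_operation_result(operation).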

create_dataset(*, project_id=PROVIDE_PROJECT_ID, location, dataset, timeout=DEFAULT, metadata=(), retry=DEFAULT)

Create the translation dataset.

Parameters
  • dataset (dict | google.cloud.translate_v3.types.automl_translation.Dataset) – The dataset to create. If a dict is provided, it must correspond to the automl_translation.Dataset type.

  • project_id (str) – ID of the Google Cloud project where the dataset is located. If not provided, the default project_id is used.

  • location (str) – The location of the project.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault | None) – A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Additional metadata that is provided to the method.

Returns

Operation object for the dataset to be created.

Return type

google.api_core.operation.Operation
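Since the dataset argument may be a dict shaped like automl_translation.Dataset, a minimal sketch of building one follows. It assumes the message's snake_case field names display_name, source_language_code, and target_language_code; the helper name and display name are illustrative only.

```python
from __future__ import annotations


def dataset_spec(display_name: str, source_language_code: str, target_language_code: str) -> dict[str, str]:
    """Return a dict mirroring the automl_translation.Dataset fields typically set on creation."""
    return {
        "display_name": display_name,
        "source_language_code": source_language_code,
        "target_language_code": target_language_code,
    }


spec = dataset_spec("en_es_tickets", "en", "es")
print(spec["target_language_code"])  # es
```

One plausible flow is operation = hook.create_dataset(dataset=spec, location="us-central1", project_id="my-project"), then result = TranslateHook.wait_for_operation_result(operation), after which hook.extract_object_id(type(result).to_dict(result)) should yield the new dataset ID.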

get_dataset(dataset_id, project_id, location, retry=DEFAULT, timeout=DEFAULT, metadata=())

Retrieve the dataset for the given dataset_id.

Parameters
  • dataset_id (str) – ID of translation dataset to be retrieved.

  • project_id (str) – ID of the Google Cloud project where the dataset is located. If not provided, the default project_id is used.

  • location (str) – The location of the project.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Additional metadata that is provided to the method.

Returns

automl_translation.Dataset instance.

Return type

google.cloud.translate_v3.types.automl_translation.Dataset

import_dataset_data(dataset_id, location, input_config, project_id=PROVIDE_PROJECT_ID, retry=DEFAULT, timeout=None, metadata=())

Import data into the translation dataset.

Parameters
  • dataset_id (str) – ID of the translation dataset.

  • input_config (dict | google.cloud.translate_v3.types.DatasetInputConfig) – The desired input location and its domain-specific semantics, if any. If a dict is provided, it must be of the same form as the protobuf message DatasetInputConfig.

  • project_id (str) – ID of the Google Cloud project where the dataset is located. If None, the default project_id is used.

  • location (str) – The location of the project.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float | None) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Additional metadata that is provided to the method.

Returns

Operation object for the data import.

Return type

google.api_core.operation.Operation
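input_config may likewise be a dict. This sketch builds one that imports a list of Cloud Storage files, assuming the DatasetInputConfig field layout input_files[].gcs_source.input_uri; the helper name and URIs are hypothetical, and the gs:// prefix check is only a local sanity check.

```python
from __future__ import annotations


def dataset_input_config(uris: list[str]) -> dict[str, list[dict]]:
    """Build a DatasetInputConfig-shaped dict with one input file per Cloud Storage URI."""
    for uri in uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"expected a gs:// URI, got {uri!r}")
    return {"input_files": [{"gcs_source": {"input_uri": uri}} for uri in uris]}


config = dataset_input_config(["gs://my-bucket/train/pairs.tsv"])
print(config)  # {'input_files': [{'gcs_source': {'input_uri': 'gs://my-bucket/train/pairs.tsv'}}]}
```

The result would be passed as hook.import_dataset_data(dataset_id="my-dataset-id", location="us-central1", input_config=config, project_id="my-project").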

list_datasets(project_id, location, retry=DEFAULT, timeout=DEFAULT, metadata=())

List translation datasets in a project.

Parameters
  • project_id (str) – ID of the Google Cloud project where the dataset is located. If not provided, the default project_id is used.

  • location (str) – The location of the project.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Additional metadata that is provided to the method.

Returns

pagers.ListDatasetsPager instance, iterable object to retrieve the datasets list.

Return type

google.cloud.translate_v3.services.translation_service.pagers.ListDatasetsPager

delete_dataset(dataset_id, project_id, location, retry=DEFAULT, timeout=None, metadata=())

Delete the translation dataset and all of its contents.

Parameters
  • dataset_id (str) – ID of dataset to be deleted.

  • project_id (str) – ID of the Google Cloud project where the dataset is located. If not provided, the default project_id is used.

  • location (str) – The location of the project.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float | None) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Additional metadata that is provided to the method.

Returns

Operation object with dataset deletion results, when finished.

Return type

google.api_core.operation.Operation

create_model(dataset_id, display_name, project_id, location, retry=DEFAULT, timeout=None, metadata=())

Create a native model by training on the provided translation dataset.

Parameters
  • dataset_id (str) – ID of dataset to be used for model training.

  • display_name (str) – Display name of the trained model. Allowed characters: A-Z and a-z, underscores (_), and ASCII digits 0-9.

  • project_id (str) – ID of the Google Cloud project where the dataset is located. If not provided, the default project_id is used.

  • location (str) – The location of the project.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float | None) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Additional metadata that is provided to the method.

Returns

Operation object with the model creation results, when finished.

Return type

google.api_core.operation.Operation
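The display_name rule for create_model can be checked locally before starting a long training operation. A minimal sketch, assuming the character set listed above is the whole rule; any length limit the API imposes is not documented here and is not checked. The function name is hypothetical.

```python
import re

# ASCII letters, ASCII digits, and underscores only; [0-9] avoids \d matching non-ASCII digits.
_DISPLAY_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_model_display_name(name: str) -> bool:
    """Check a create_model display_name against the documented character set."""
    return _DISPLAY_NAME.fullmatch(name) is not None


print(is_valid_model_display_name("en_es_v2"))  # True
print(is_valid_model_display_name("en-es v2"))  # False
```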

get_model(model_id, project_id, location, retry=DEFAULT, timeout=DEFAULT, metadata=())

Retrieve the model for the given model_id.

Parameters
  • model_id (str) – ID of translation model to be retrieved.

  • project_id (str) – ID of the Google Cloud project where the model is located. If not provided, the default project_id is used.

  • location (str) – The location of the project.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Additional metadata that is provided to the method.

Returns

automl_translation.Model instance.

Return type

google.cloud.translate_v3.types.automl_translation.Model

list_models(project_id, location, filter_str=None, page_size=None, retry=DEFAULT, timeout=DEFAULT, metadata=())

List translation models in a project.

Parameters
  • project_id (str) – ID of the Google Cloud project where the models are located. If not provided, the default project_id is used.

  • location (str) – The location of the project.

  • filter_str (str | None) – An optional expression for filtering the models that will be returned. Supported filter: dataset_id=${dataset_id}.

  • page_size (int | None) – Optional custom page size value. The server can return fewer results than requested.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Additional metadata that is provided to the method.

Returns

pagers.ListModelsPager instance, an iterable object to retrieve the models list.

Return type

google.cloud.translate_v3.services.translation_service.pagers.ListModelsPager
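Two details above lend themselves to a short example: the dataset_id filter syntax and consuming the returned pager, which can be iterated like a list while it fetches further pages as needed. This sketch assumes each yielded Model exposes its full resource name in a name attribute; SimpleNamespace objects stand in for real Model messages, and the helper names are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable
from types import SimpleNamespace


def dataset_filter(dataset_id: str) -> str:
    """Build the filter expression list_models documents: dataset_id=<id>."""
    return f"dataset_id={dataset_id}"


def model_ids(models: Iterable) -> list[str]:
    """Collect the trailing ID from each model's resource name while iterating."""
    return [model.name.rpartition("/")[-1] for model in models]


# Stand-ins for Model messages; a real ListModelsPager is iterated the same way.
fake_page = [
    SimpleNamespace(name="projects/p/locations/us-central1/models/model-a"),
    SimpleNamespace(name="projects/p/locations/us-central1/models/model-b"),
]
print(dataset_filter("my-dataset-id"))  # dataset_id=my-dataset-id
print(model_ids(fake_page))             # ['model-a', 'model-b']
```

Against the hook this would read model_ids(hook.list_models(project_id="my-project", location="us-central1", filter_str=dataset_filter("my-dataset-id"))).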

delete_model(model_id, project_id, location, retry=DEFAULT, timeout=None, metadata=())

Delete the translation model and all of its contents.

Parameters
  • model_id (str) – ID of model to be deleted.

  • project_id (str) – ID of the Google Cloud project where the model is located. If not provided, the default project_id is used.

  • location (str) – The location of the project.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – A retry object used to retry requests. If None is specified, requests will not be retried.

  • timeout (float | None) – The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Additional metadata that is provided to the method.

Returns

Operation object with model deletion results, when finished.

Return type

google.api_core.operation.Operation

translate_document(*, project_id=PROVIDE_PROJECT_ID, source_language_code=None, target_language_code, location=None, document_input_config, document_output_config, customized_attribution=None, is_translate_native_pdf_only=False, enable_shadow_removal_native_pdf=False, enable_rotation_correction=False, model=None, glossary_config=None, labels=None, timeout=DEFAULT, metadata=(), retry=DEFAULT)

Translate the provided document.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • source_language_code (str | None) – Optional. The ISO-639 language code of the input document text if known. If the source language isn’t specified, the API attempts to identify the source language automatically and returns the source language within the response.

  • target_language_code (str) – Required. The ISO-639 language code to use for translation of the input document text.

  • location (str | None) – Optional. The location in which to make the call; it must belong to the caller’s project. If not specified, ‘global’ is used. A non-global location is required for requests using AutoML models or custom glossaries. Models and glossaries must be in the same region (have the same location-id).

  • document_input_config (google.cloud.translate_v3.types.DocumentInputConfig | dict) – A document translation request input config.

  • document_output_config (google.cloud.translate_v3.types.DocumentOutputConfig | dict | None) – Optional. A document translation request output config. If not provided, the translated file is returned only as a byte stream, and its output MIME type is the same as the input file’s MIME type.

  • customized_attribution (str | None) – Optional. Supports user-customized attribution. If not provided, the default is “Machine Translated by Google”. Customized attribution should follow the rules in https://cloud.google.com/translate/attribution#attribution_and_logos

  • is_translate_native_pdf_only (bool) – Optional. Param for external customers. If true, the page limit of online native PDF translation is 300 and only native PDF pages will be translated.

  • enable_shadow_removal_native_pdf (bool) – Optional. If true, use the text removal server to remove the shadow text on the background image for native PDF translation. Shadow removal can only be enabled when both is_translate_native_pdf_only and pdf_native_only are False.

  • enable_rotation_correction (bool) – Optional. If true, enable auto rotation correction in DVS.

  • model (str | None) –

    Optional. The model type requested for this translation. If not provided, the default Google model (NMT) will be used. The format depends on model type:

    • AutoML Translation models: projects/{project-number-or-id}/locations/{location-id}/models/{model-id}

    • General (built-in) models: projects/{project-number-or-id}/locations/{location-id}/models/general/nmt

  • glossary_config (google.cloud.translate_v3.types.TranslateTextGlossaryConfig | None) – Optional. Glossary to be applied. The glossary must be within the same region (have the same location-id) as the model.

  • labels (str | None) – Optional. The labels with user-defined metadata for the request. See https://cloud.google.com/translate/docs/advanced/labels for more information.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault | None) – Designation of what errors, if any, should be retried.

  • timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

Returns

Translate document result from the API response.

Return type

google.cloud.translate_v3.types.TranslateDocumentResponse
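When document_output_config is None, the translated file comes back in the response as a byte stream. The sketch below assembles arguments for such an inline call and enforces the shadow-removal constraint documented above, assuming the DocumentInputConfig fields content and mime_type; the helper name and sample bytes are hypothetical.

```python
from __future__ import annotations

from typing import Any


def document_kwargs(
    project_id: str,
    target_language_code: str,
    content: bytes,
    mime_type: str,
    *,
    is_translate_native_pdf_only: bool = False,
    enable_shadow_removal_native_pdf: bool = False,
) -> dict[str, Any]:
    """Assemble keyword arguments for an inline (byte-stream) translate_document call."""
    if enable_shadow_removal_native_pdf and is_translate_native_pdf_only:
        # Documented constraint: shadow removal needs native-PDF-only mode turned off.
        raise ValueError("shadow removal requires is_translate_native_pdf_only=False")
    return {
        "project_id": project_id,
        "target_language_code": target_language_code,
        "document_input_config": {"content": content, "mime_type": mime_type},
        # None: the translation is returned as a byte stream in the input's MIME type.
        "document_output_config": None,
        "is_translate_native_pdf_only": is_translate_native_pdf_only,
        "enable_shadow_removal_native_pdf": enable_shadow_removal_native_pdf,
    }


kwargs = document_kwargs("my-project", "fr", b"%PDF-1.7 placeholder", "application/pdf")
print(kwargs["document_input_config"]["mime_type"])  # application/pdf
```

The translated bytes would then presumably be read from the response's document_translation.byte_stream_outputs field; check the TranslateDocumentResponse reference before relying on that field name.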

batch_translate_document(*, project_id=PROVIDE_PROJECT_ID, source_language_code, target_language_codes=None, location=None, input_configs, output_config, customized_attribution=None, format_conversions=None, enable_shadow_removal_native_pdf=False, enable_rotation_correction=False, models=None, glossaries=None, timeout=DEFAULT, metadata=(), retry=DEFAULT)

Translate a batch of documents using the provided configurations.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • source_language_code (str) – Optional. The ISO-639 language code of the input text if known. If the source language isn’t specified, the API attempts to identify the source language automatically and returns the source language within the response.

  • target_language_codes (collections.abc.MutableSequence[str] | None) – Required. The ISO-639 language code to use for translation of the input document. Specify up to 10 language codes here.

  • location (str | None) – Optional. The location in which to make the call; it must belong to the caller’s project. If not specified, ‘global’ is used. A non-global location is required for requests using AutoML models or custom glossaries. Models and glossaries must be in the same region (have the same location-id).

  • input_configs (collections.abc.MutableSequence[google.cloud.translate_v3.types.BatchDocumentInputConfig | dict]) – Input configurations. The total number of files matched should be <= 100. The total content size to translate should be <= 100M Unicode codepoints. The files must use UTF-8 encoding.

  • output_config (google.cloud.translate_v3.types.BatchDocumentOutputConfig | dict) – Output configuration. If two input configs match the same file (that is, the same input path), no output is generated for the duplicate inputs.

  • format_conversions (collections.abc.MutableMapping[str, str] | None) –

    Optional. The file format conversion map that is applied to all input files. The map key is the original MIME type and the map value is the target MIME type of the translated documents. Supported file format conversions include:

    • application/pdf to application/vnd.openxmlformats-officedocument.wordprocessingml.document

    If nothing is specified, output files are in the same format as the original files.

  • customized_attribution (str | None) – Optional. Supports user-customized attribution. If not provided, the default is “Machine Translated by Google”. Customized attribution should follow the rules in https://cloud.google.com/translate/attribution#attribution_and_logos

  • enable_shadow_removal_native_pdf (bool) – Optional. If true, use the text removal server to remove the shadow text on the background image for native PDF translation. Shadow removal can only be enabled when both is_translate_native_pdf_only and pdf_native_only are False.

  • enable_rotation_correction (bool) – Optional. If true, enable auto rotation correction in DVS.

  • models (collections.abc.MutableMapping[str, str] | None) –

    Optional. The models to use for translation, as a map from target language code to model name. A value can be a built-in general model or an AutoML Translation model. The value format depends on model type:

    • AutoML Translation models: projects/{project-number-or-id}/locations/{location-id}/models/{model-id}

    • General (built-in) models: projects/{project-number-or-id}/locations/{location-id}/models/general/nmt

    If the map is empty or a specific model is not requested for a language pair, then the default Google model (NMT) is used.

  • glossaries (collections.abc.MutableMapping[str, google.cloud.translate_v3.types.TranslateTextGlossaryConfig] | None) – Glossaries to be applied, keyed by target language code.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault | None) – Designation of what errors, if any, should be retried.

  • timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

Returns

Batch translate document result from the API response.

Return type

google.api_core.operation.Operation
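A sketch of batch_translate_document arguments that also requests the PDF-to-DOCX conversion listed above. It assumes the BatchDocumentInputConfig and BatchDocumentOutputConfig dict layouts gcs_source.input_uri and gcs_destination.output_uri_prefix; the helper name, bucket, and location are placeholders.

```python
from __future__ import annotations

from typing import Any

PDF = "application/pdf"
DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def batch_document_kwargs(
    project_id: str,
    location: str,
    source_language_code: str,
    target_language_codes: list[str],
    input_uri: str,
    output_uri_prefix: str,
    *,
    pdf_to_docx: bool = False,
) -> dict[str, Any]:
    """Assemble keyword arguments for TranslateHook.batch_translate_document."""
    if not 1 <= len(target_language_codes) <= 10:
        raise ValueError("specify between 1 and 10 target language codes")
    kwargs: dict[str, Any] = {
        "project_id": project_id,
        "location": location,
        "source_language_code": source_language_code,
        "target_language_codes": target_language_codes,
        "input_configs": [{"gcs_source": {"input_uri": input_uri}}],
        "output_config": {"gcs_destination": {"output_uri_prefix": output_uri_prefix}},
    }
    if pdf_to_docx:
        # The one conversion listed above: PDF in, DOCX out.
        kwargs["format_conversions"] = {PDF: DOCX}
    return kwargs


kwargs = batch_document_kwargs(
    "my-project", "us-central1", "en", ["de"],
    "gs://my-bucket/docs/report.pdf", "gs://my-bucket/translated/", pdf_to_docx=True,
)
```

As with the other long-running methods here, the returned Operation can be awaited with TranslateHook.wait_for_operation_result.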
