airflow.providers.google.cloud.operators.translate

This module contains Google Translate operators.

Module Contents

Classes

CloudTranslateTextOperator

Translate a string or list of strings.

TranslateTextOperator

Translate a moderate amount of text content; for larger volumes of text, use the TranslateTextBatchOperator.

TranslateTextBatchOperator

Translate large volumes of text content from the provided inputs.

TranslateCreateDatasetOperator

Create a Google Cloud Translate dataset.

TranslateDatasetsListOperator

Get a list of native Google Cloud Translation datasets in a project.

TranslateImportDataOperator

Import data to the translation dataset.

TranslateDeleteDatasetOperator

Delete translation dataset and all of its contents.

class airflow.providers.google.cloud.operators.translate.CloudTranslateTextOperator(*, values, target_language, format_, source_language, model, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Translate a string or list of strings.

See also

For more information on how to use this operator, take a look at the guide: CloudTranslateTextOperator

See https://cloud.google.com/translate/docs/translating-text

Execute method returns str or list.

This is a list of dictionaries for each queried value. Each dictionary typically contains the following keys (though not all will be present in all cases):

  • detectedSourceLanguage: The detected language (as an ISO 639-1 language code) of the text.

  • translatedText: The translation of the text into the target language.

  • input: The corresponding input value.

  • model: The model used to translate the text.

If only a single value is passed, then only a single dictionary is set as the XCom return value.
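Because the XCom value can be either a single dictionary or a list of dictionaries, downstream code may want to normalize it first. The following is a minimal sketch, assuming the result has the shape described above; `extract_translations` is a hypothetical helper and not part of the provider.

```python
from __future__ import annotations

from typing import Any


def extract_translations(result: dict[str, Any] | list[dict[str, Any]]) -> list[str]:
    """Return the translated strings from an operator result.

    Accepts a single dictionary (one input value) or a list of
    dictionaries (several input values).
    """
    items = [result] if isinstance(result, dict) else list(result)
    # Not every key is guaranteed to be present, so fall back to "".
    return [item.get("translatedText", "") for item in items]


# Illustrative values only; real results come from the Translation API.
single = {"translatedText": "Hallo", "input": "Hello", "detectedSourceLanguage": "en"}
many = [single, {"translatedText": "Welt", "input": "World"}]
```

In a DAG, the input to such a helper would typically be pulled from XCom in a downstream task.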

Parameters
  • values (list[str] | str) – String or list of strings to translate.

  • target_language (str) – The language to translate results into. This is required by the API.

  • format_ – (Optional) One of text or html, specifying whether the input text is plain text or HTML.

  • source_language (str | None) – (Optional) The language of the text to be translated.

  • model (str) – (Optional) The model used to translate the text, such as 'base' or 'nmt'.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
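The grant rules for impersonation_chain can be easier to follow when spelled out as pairs. The sketch below only restates the rule described above in code; `required_grants` and the account names are hypothetical, and nothing here calls Google Cloud.

```python
from __future__ import annotations

from collections.abc import Sequence


def required_grants(
    originating: str, impersonation_chain: str | Sequence[str]
) -> list[tuple[str, str]]:
    """List (grantor, grantee) pairs for the Service Account Token Creator role.

    Each account in the chain must grant the role to the identity directly
    preceding it; the first account grants it to the originating account.
    The last account in the chain is the one that ends up being impersonated.
    """
    if isinstance(impersonation_chain, str):
        chain = [impersonation_chain]
    else:
        chain = list(impersonation_chain)

    grants = []
    previous = originating
    for account in chain:
        grants.append((account, previous))
        previous = account
    return grants


# Hypothetical account names, for illustration only.
pairs = required_grants("origin@example.iam", ["first@example.iam", "last@example.iam"])
```

Here `pairs` says that `first@example.iam` must grant the role to `origin@example.iam`, and `last@example.iam` must grant it to `first@example.iam`.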

template_fields: collections.abc.Sequence[str] = ('values', 'target_language', 'format_', 'source_language', 'model', 'gcp_conn_id',...[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.translate.TranslateTextOperator(*, contents, source_language_code=None, target_language_code, mime_type=None, location=None, project_id=PROVIDE_PROJECT_ID, model=None, transliteration_config=None, glossary_config=None, labels=None, timeout=DEFAULT, retry=DEFAULT, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Translate a moderate amount of text content; for larger volumes of text, use the TranslateTextBatchOperator.

Wraps the Google Cloud Translate Text (Advanced) functionality. See https://cloud.google.com/translate/docs/advanced/translating-text-v3

For more information on how to use this operator, take a look at the guide: TranslateTextOperator.

Parameters
  • project_id (str) – Optional. The ID of the Google Cloud project that the service belongs to.

  • location (str | None) – Optional. The ID of the Google Cloud location that the service belongs to. If not specified, ‘global’ is used. A non-global location is required for requests using AutoML models or custom glossaries.

  • contents (collections.abc.Sequence[str]) – Required. The sequence of content strings to be translated. Limited to 1024 items with 30_000 codepoints total recommended.

  • mime_type (str | None) – Optional. The format of the source text. If left blank, the MIME type defaults to “text/html”.

  • source_language_code (str | None) – Optional. The ISO-639 language code of the input text, if known. If not specified, the service attempts to detect the language automatically.

  • target_language_code (str) – Required. The ISO-639 language code to use for translation of the input text.

  • model (str | None) –

    Optional. The model type requested for this translation. If not provided, the default Google model (NMT) will be used. The format depends on model type:

    • AutoML Translation models: projects/{project-number-or-id}/locations/{location-id}/models/{model-id}

    • General (built-in) models: projects/{project-number-or-id}/locations/{location-id}/models/general/nmt

    • Translation LLM models: projects/{project-number-or-id}/locations/{location-id}/models/general/translation-llm

    For global (non-region) requests, use ‘global’ location-id.

  • glossary_config (google.cloud.translate_v3.types.TranslateTextGlossaryConfig | None) – Optional. Glossary to be applied.

  • transliteration_config (google.cloud.translate_v3.types.TransliterationConfig | None) – Optional. Transliteration to be applied.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault | None) – Designation of what errors, if any, should be retried.

  • timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
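The model path formats and the limits on contents listed above can be checked before a task is created. This is a minimal sketch under stated assumptions: `general_model_path`, `contents_within_limits`, `make_translate_task`, the project ID `my-project`, and the sample strings are all hypothetical; only the operator name and its parameters come from this page. The Airflow import is deferred into a function so the helpers can be used without Airflow installed.

```python
from __future__ import annotations

MAX_ITEMS = 1024         # documented limit on the number of content strings
MAX_CODEPOINTS = 30_000  # documented recommended total size


def general_model_path(project_id: str, location: str = "global", model: str = "nmt") -> str:
    """Build the resource name of a general (built-in) model."""
    return f"projects/{project_id}/locations/{location}/models/general/{model}"


def contents_within_limits(contents: list[str]) -> bool:
    """Check the item count and total code points against the documented limits."""
    # len() on a Python str counts Unicode code points.
    total = sum(len(text) for text in contents)
    return len(contents) <= MAX_ITEMS and total <= MAX_CODEPOINTS


translate_kwargs = {
    "contents": ["The quick brown fox", "jumps over the lazy dog"],
    "source_language_code": "en",
    "target_language_code": "uk",
    "mime_type": "text/plain",
    "location": "global",
    "model": general_model_path("my-project"),
}


def make_translate_task():
    """Create the task; requires the Google provider to be installed."""
    from airflow.providers.google.cloud.operators.translate import TranslateTextOperator

    return TranslateTextOperator(
        task_id="translate_text", project_id="my-project", **translate_kwargs
    )
```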

template_fields: collections.abc.Sequence[str] = ('contents', 'target_language_code', 'mime_type', 'source_language_code', 'model',...[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.translate.TranslateTextBatchOperator(*, project_id=PROVIDE_PROJECT_ID, location, target_language_codes, source_language_code, input_configs, output_config, models=None, glossaries=None, labels=None, metadata=(), timeout=DEFAULT, retry=DEFAULT, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Translate large volumes of text content from the provided inputs.

Wraps the Google Cloud Translate Text (Advanced) functionality. See https://cloud.google.com/translate/docs/advanced/batch-translation

For more information on how to use this operator, take a look at the guide: TranslateTextBatchOperator.

Parameters
  • project_id (str) – Optional. The ID of the Google Cloud project that the service belongs to. If not specified the hook project_id will be used.

  • location (str) – Required. The ID of the (non-global) Google Cloud location that the service belongs to.

  • source_language_code (str) – Required. Source language code.

  • target_language_codes (collections.abc.MutableSequence[str]) – Required. Up to 10 language codes allowed here.

  • input_configs (collections.abc.MutableSequence[google.cloud.translate_v3.types.InputConfig | dict]) – Required. Input configurations. The total number of files matched should be <=100. The total content size should be <= 100M Unicode codepoints. The files must use UTF-8 encoding.

  • models (str | None) –

    Optional. The models to use for translation. Map’s key is target language code. Map’s value is model name. Value can be a built-in general model, or an AutoML Translation model. The value format depends on model type:

    • AutoML Translation models: projects/{project-number-or-id}/locations/{location-id}/models/{model-id}

    • General (built-in) models: projects/{project-number-or-id}/locations/{location-id}/models/general/nmt

    If the map is empty or a specific model is not requested for a language pair, then the default Google model (NMT) is used.

  • output_config (google.cloud.translate_v3.types.OutputConfig | dict) – Required. Output configuration.

  • glossaries (collections.abc.MutableMapping[str, google.cloud.translate_v3.types.TranslateTextGlossaryConfig] | None) – Optional. Glossaries to be applied for translation. It’s keyed by target language code.

  • labels (collections.abc.MutableMapping[str, str] | None) – Optional. The labels with user-defined metadata. See https://cloud.google.com/translate/docs/advanced/labels for more information.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault | None) – Designation of what errors, if any, should be retried.

  • timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
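The dict forms of input_configs and output_config may be easier to picture with a concrete shape. The sketch below assumes Cloud Storage input and output, with field names following the translate_v3 InputConfig and OutputConfig messages; `build_batch_kwargs`, the bucket URIs, and the location are hypothetical. It also checks two constraints stated above: a non-global location and at most 10 target language codes.

```python
from __future__ import annotations


def build_batch_kwargs(
    location: str,
    source_language_code: str,
    target_language_codes: list[str],
    input_uri: str,
    output_uri_prefix: str,
) -> dict:
    """Assemble keyword arguments for a batch translation task."""
    if location == "global":
        raise ValueError("Batch translation requires a non-global location.")
    if not 1 <= len(target_language_codes) <= 10:
        raise ValueError("Between 1 and 10 target language codes are allowed.")
    return {
        "location": location,
        "source_language_code": source_language_code,
        "target_language_codes": list(target_language_codes),
        # One entry per input; files must be UTF-8 encoded.
        "input_configs": [
            {"gcs_source": {"input_uri": input_uri}, "mime_type": "text/plain"}
        ],
        "output_config": {"gcs_destination": {"output_uri_prefix": output_uri_prefix}},
    }


# Placeholder bucket paths, for illustration only.
batch_kwargs = build_batch_kwargs(
    location="us-central1",
    source_language_code="en",
    target_language_codes=["es", "de"],
    input_uri="gs://my-bucket/input/*.txt",
    output_uri_prefix="gs://my-bucket/output/",
)


def make_batch_task():
    """Create the task; requires the Google provider to be installed."""
    from airflow.providers.google.cloud.operators.translate import TranslateTextBatchOperator

    return TranslateTextBatchOperator(task_id="translate_batch", **batch_kwargs)
```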

template_fields: collections.abc.Sequence[str] = ('input_configs', 'target_language_codes', 'source_language_code', 'models', 'glossaries',...[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.translate.TranslateCreateDatasetOperator(*, project_id=PROVIDE_PROJECT_ID, location, dataset, metadata=(), timeout=DEFAULT, retry=DEFAULT, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Create a Google Cloud Translate dataset.

Creates a native translation dataset, using API V3. For more information on how to use this operator, take a look at the guide: TranslateCreateDatasetOperator.

Parameters
  • dataset (dict | google.cloud.translate_v3.types.automl_translation.Dataset) – The dataset to create. If a dict is provided, it must correspond to the automl_translation.Dataset type.

  • project_id (str) – ID of the Google Cloud project where the dataset is located. If not provided, the default project_id is used.

  • location (str) – The location of the project.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault | None) – Designation of what errors, if any, should be retried.

  • timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
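When dataset is passed as a dict, its keys mirror the fields of the automl_translation.Dataset message. The following sketch assumes a minimal dataset with a display name and a language pair; the values, `missing_dataset_fields`, and `make_create_dataset_task` are hypothetical, and the set of fields the helper checks is this sketch's own choice rather than a documented requirement.

```python
from __future__ import annotations

# Keys follow the automl_translation.Dataset message; values are placeholders.
DATASET = {
    "display_name": "my_translation_dataset",
    "source_language_code": "es",
    "target_language_code": "en",
}

# Fields this sketch expects to be filled in (an assumption, not an API rule).
EXPECTED_FIELDS = ("display_name", "source_language_code", "target_language_code")


def missing_dataset_fields(dataset: dict) -> list[str]:
    """Return the expected fields that are absent or empty."""
    return [name for name in EXPECTED_FIELDS if not dataset.get(name)]


def make_create_dataset_task(project_id: str, location: str):
    """Create the task; requires the Google provider to be installed."""
    from airflow.providers.google.cloud.operators.translate import (
        TranslateCreateDatasetOperator,
    )

    return TranslateCreateDatasetOperator(
        task_id="create_dataset",
        project_id=project_id,
        location=location,
        dataset=DATASET,
    )
```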

template_fields: collections.abc.Sequence[str] = ('dataset', 'location', 'project_id', 'gcp_conn_id', 'impersonation_chain')[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.translate.TranslateDatasetsListOperator(*, project_id=PROVIDE_PROJECT_ID, location, metadata=(), timeout=DEFAULT, retry=DEFAULT, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Get a list of native Google Cloud Translation datasets in a project.

Get project’s list of native translation datasets, using API V3. For more information on how to use this operator, take a look at the guide: TranslateDatasetsListOperator.

Parameters
  • project_id (str) – ID of the Google Cloud project where the dataset is located. If not provided, the default project_id is used.

  • location (str) – The location of the project.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).

template_fields: collections.abc.Sequence[str] = ('location', 'project_id', 'gcp_conn_id', 'impersonation_chain')[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.translate.TranslateImportDataOperator(*, dataset_id, location, input_config, project_id=PROVIDE_PROJECT_ID, metadata=(), timeout=None, retry=DEFAULT, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Import data to the translation dataset.

Loads data to the translation dataset, using API V3. For more information on how to use this operator, take a look at the guide: TranslateImportDataOperator.

Parameters
  • dataset_id (str) – The dataset_id of the target native dataset to import data to.

  • input_config (dict | google.cloud.translate_v3.types.DatasetInputConfig) – The desired input location of the translation language pairs file. If a dict is provided, it must have the same form as the DatasetInputConfig protobuf message.

  • project_id (str) – ID of the Google Cloud project where the dataset is located. If not provided, the default project_id is used.

  • location (str) – The location of the project.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
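A dict passed as input_config follows the DatasetInputConfig message: a list of input files, each with a usage and a Cloud Storage source. This is a minimal sketch assuming a single file in Cloud Storage; `build_import_config`, `make_import_task`, and the bucket URI are hypothetical. The usage value UNASSIGNED is assumed here to leave the train/validation/test split to the service.

```python
from __future__ import annotations

# Usage values assumed from the Translation API's InputFile message.
ALLOWED_USAGE = {"TRAIN", "VALIDATION", "TEST", "UNASSIGNED"}


def build_import_config(input_uri: str, usage: str = "UNASSIGNED") -> dict:
    """Build a dict shaped like DatasetInputConfig for one input file."""
    if usage not in ALLOWED_USAGE:
        raise ValueError(f"Unsupported usage: {usage!r}")
    return {
        "input_files": [
            {"usage": usage, "gcs_source": {"input_uri": input_uri}},
        ]
    }


# Placeholder bucket path, for illustration only.
import_config = build_import_config("gs://my-bucket/pairs/en-es.tsv")


def make_import_task(dataset_id: str, location: str):
    """Create the task; requires the Google provider to be installed."""
    from airflow.providers.google.cloud.operators.translate import (
        TranslateImportDataOperator,
    )

    return TranslateImportDataOperator(
        task_id="import_data",
        dataset_id=dataset_id,
        location=location,
        input_config=import_config,
    )
```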

template_fields: collections.abc.Sequence[str] = ('dataset_id', 'input_config', 'location', 'project_id', 'gcp_conn_id', 'impersonation_chain')[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.translate.TranslateDeleteDatasetOperator(*, dataset_id, location, project_id=PROVIDE_PROJECT_ID, metadata=(), timeout=None, retry=DEFAULT, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Delete translation dataset and all of its contents.

Deletes the translation dataset and its data, using API V3. For more information on how to use this operator, take a look at the guide: TranslateDeleteDatasetOperator.

Parameters
  • dataset_id (str) – The dataset_id of the target native dataset to be deleted.

  • location (str) – The location of the project.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).

template_fields: collections.abc.Sequence[str] = ('dataset_id', 'location', 'project_id', 'gcp_conn_id', 'impersonation_chain')[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
