airflow.providers.google.cloud.operators.dataproc_metastore

This module contains Google Dataproc Metastore operators.

Module Contents

Classes

DataprocMetastoreLink

Helper class for constructing Dataproc Metastore resource link.

DataprocMetastoreDetailedLink

Helper class for constructing Dataproc Metastore detailed resource link.

DataprocMetastoreCreateBackupOperator

Create a new backup in a given project and location.

DataprocMetastoreCreateMetadataImportOperator

Create a new MetadataImport in a given project and location.

DataprocMetastoreCreateServiceOperator

Create a metastore service in a project and location.

DataprocMetastoreDeleteBackupOperator

Delete a single backup.

DataprocMetastoreDeleteServiceOperator

Delete a single service.

DataprocMetastoreExportMetadataOperator

Export metadata from a service.

DataprocMetastoreGetServiceOperator

Get the details of a single service.

DataprocMetastoreListBackupsOperator

List backups in a service.

DataprocMetastoreRestoreServiceOperator

Restore a service from a backup.

DataprocMetastoreUpdateServiceOperator

Update the parameters of a single service.

Attributes

BASE_LINK

METASTORE_BASE_LINK

METASTORE_BACKUP_LINK

METASTORE_BACKUPS_LINK

METASTORE_EXPORT_LINK

METASTORE_IMPORT_LINK

METASTORE_SERVICE_LINK

class airflow.providers.google.cloud.operators.dataproc_metastore.DataprocMetastoreLink[source]

Bases: airflow.models.BaseOperatorLink

Helper class for constructing Dataproc Metastore resource link.

name = 'Dataproc Metastore'[source]
key = 'conf'[source]
static persist(context, task_instance, url)[source]

Link to external system.

Note: The old signature of this function was (self, operator, dttm: datetime). That is still supported at runtime but is deprecated.

Parameters
Returns

link to external system

Return type

str

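class airflow.providers.google.cloud.operators.dataproc_metastore.DataprocMetastoreDetailedLink[source]

Bases: airflow.models.BaseOperatorLink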

Helper class for constructing Dataproc Metastore detailed resource link.

name = 'Dataproc Metastore resource'[source]
key = 'config'[source]
static persist(context, task_instance, url, resource)[source]

Link to external system.

Note: The old signature of this function was (self, operator, dttm: datetime). That is still supported at runtime but is deprecated.

Parameters
Returns

link to external system

Return type

str

class airflow.providers.google.cloud.operators.dataproc_metastore.DataprocMetastoreCreateBackupOperator(*, project_id, region, service_id, backup, backup_id, request_id=None, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Create a new backup in a given project and location.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • service_id (str) –

    Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    This corresponds to the service_id field on the request instance; if request is provided, this should not be set.

  • backup (dict | google.cloud.metastore_v1.types.Backup) –

    Required. The backup to create. The name field is ignored. The ID of the created backup must be provided in the request’s backup_id field.

    This corresponds to the backup field on the request instance; if request is provided, this should not be set.

  • backup_id (str) –

    Required. The ID of the backup, which is used as the final component of the backup’s name. This value must be between 1 and 64 characters long, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    This corresponds to the backup_id field on the request instance; if request is provided, this should not be set.

  • request_id (str | None) – Optional. A unique id used to identify the request.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Optional. Designation of what errors, if any, should be retried.

  • timeout (float | None) – Optional. The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
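
The ID constraints quoted above can be checked locally before a task is scheduled. The following is a minimal sketch and not part of the provider: the function names and regular expressions are hypothetical, transcribed directly from the service_id and backup_id descriptions, and the service itself may apply stricter rules.

```python
import re

# Patterns transcribed from the parameter descriptions above.
# Assumption: the API may enforce additional rules (for example on letter case).
SERVICE_ID_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,61}[A-Za-z0-9]")
BACKUP_ID_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,62}[A-Za-z0-9])?")


def is_valid_service_id(value: str) -> bool:
    """2 to 63 characters, starts with a letter, ends with a letter or digit."""
    return SERVICE_ID_RE.fullmatch(value) is not None


def is_valid_backup_id(value: str) -> bool:
    """1 to 64 characters, starts with a letter, ends with a letter or digit."""
    return BACKUP_ID_RE.fullmatch(value) is not None
```

Running such a check while the DAG file is parsed surfaces a malformed ID earlier than waiting for the API call to reject it.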

template_fields: Sequence[str] = ('project_id', 'backup', 'impersonation_chain')[source]
template_fields_renderers[source]
execute(context)[source]

This is the main method to derive when creating an operator.

The context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.dataproc_metastore.DataprocMetastoreCreateMetadataImportOperator(*, project_id, region, service_id, metadata_import, metadata_import_id, request_id=None, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Create a new MetadataImport in a given project and location.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • service_id (str) –

    Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    This corresponds to the service_id field on the request instance; if request is provided, this should not be set.

  • metadata_import (dict | google.cloud.metastore_v1.types.MetadataImport) –

    Required. The metadata import to create. The name field is ignored. The ID of the created metadata import must be provided in the request’s metadata_import_id field.

    This corresponds to the metadata_import field on the request instance; if request is provided, this should not be set.

  • metadata_import_id (str) –

    Required. The ID of the metadata import, which is used as the final component of the metadata import’s name. This value must be between 1 and 64 characters long, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    This corresponds to the metadata_import_id field on the request instance; if request is provided, this should not be set.

  • request_id (str | None) – Optional. A unique id used to identify the request.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Optional. Designation of what errors, if any, should be retried.

  • timeout (float | None) – Optional. The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
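
As an illustration of how these parameters fit together, the sketch below collects them into a keyword-argument dictionary. It rests on several assumptions: the helper name, task_id, and description are hypothetical; the database_dump, gcs_uri, and database_type keys are assumed from google.cloud.metastore_v1.types.MetadataImport; passing the enum by name ("MYSQL") and using a UUID for request_id are also assumptions to verify against the client library.

```python
import uuid


def build_metadata_import_kwargs(project_id, region, service_id, gcs_uri, import_id):
    """Collect keyword arguments for DataprocMetastoreCreateMetadataImportOperator."""
    return {
        "task_id": "create_metadata_import",  # hypothetical task name
        "project_id": project_id,
        "region": region,
        "service_id": service_id,
        # Key names assumed from metastore_v1.types.MetadataImport; the "name"
        # field is omitted because the description above says it is ignored.
        "metadata_import": {
            "description": "Example import",
            "database_dump": {"gcs_uri": gcs_uri, "database_type": "MYSQL"},
        },
        "metadata_import_id": import_id,
        # Assumption: a random UUID is an acceptable unique request id.
        "request_id": str(uuid.uuid4()),
        "timeout": 1200.0,
    }
```

The resulting dictionary would be unpacked into the operator inside a DAG, for example `DataprocMetastoreCreateMetadataImportOperator(**kwargs)`.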

template_fields: Sequence[str] = ('project_id', 'metadata_import', 'impersonation_chain')[source]
template_fields_renderers[source]
execute(context)[source]

This is the main method to derive when creating an operator.

The context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.dataproc_metastore.DataprocMetastoreCreateServiceOperator(*, region, project_id, service, service_id, request_id=None, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Create a metastore service in a project and location.

Parameters
  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • service (dict | google.cloud.metastore_v1.types.Service) –

    Required. The Metastore service to create. The name field is ignored. The ID of the created metastore service must be provided in the request’s service_id field.

    This corresponds to the service field on the request instance; if request is provided, this should not be set.

  • service_id (str) –

    Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    This corresponds to the service_id field on the request instance; if request is provided, this should not be set.

  • request_id (str | None) – Optional. A unique id used to identify the request.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
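
A possible way to wire these parameters into a task is sketched below. The helper names, task_id, and label values are hypothetical, and the tier and labels keys are assumed from google.cloud.metastore_v1.types.Service. The operator import is deferred so that the dictionary helper can be exercised without Airflow installed.

```python
def build_create_service_kwargs(project_id: str, region: str, service_id: str) -> dict:
    """Collect keyword arguments for DataprocMetastoreCreateServiceOperator."""
    return {
        "task_id": "create_service",  # hypothetical task name
        "project_id": project_id,
        "region": region,
        "service_id": service_id,
        # No "name" key: the description above says it is ignored on creation.
        # "tier" and "labels" are assumed Service fields.
        "service": {"tier": "DEVELOPER", "labels": {"team": "data-platform"}},
        "timeout": 1200.0,
    }


def make_create_service_task(**kwargs):
    """Instantiate the operator; requires the Google provider to be installed."""
    from airflow.providers.google.cloud.operators.dataproc_metastore import (
        DataprocMetastoreCreateServiceOperator,
    )

    return DataprocMetastoreCreateServiceOperator(**kwargs)
```

Inside a DAG definition this would be used as `make_create_service_task(**build_create_service_kwargs("my-project", "europe-west1", "my-service"))`, where the three values are placeholders.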

template_fields: Sequence[str] = ('project_id', 'service', 'impersonation_chain')[source]
template_fields_renderers[source]
execute(context)[source]

This is the main method to derive when creating an operator.

The context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.dataproc_metastore.DataprocMetastoreDeleteBackupOperator(*, project_id, region, service_id, backup_id, request_id=None, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Delete a single backup.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the backup belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the backup belongs to.

  • service_id (str) –

    Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    This corresponds to the service_id field on the request instance; if request is provided, this should not be set.

  • backup_id (str) –

    Required. The ID of the backup, which is used as the final component of the backup’s name. This value must be between 1 and 64 characters long, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    This corresponds to the backup_id field on the request instance; if request is provided, this should not be set.

  • request_id (str | None) – Optional. A unique id used to identify the request.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Optional. Designation of what errors, if any, should be retried.

  • timeout (float | None) – Optional. The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
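
The delegation rule described for impersonation_chain can be made concrete with a small helper that lists which account must grant the Service Account Token Creator role to which. This is an illustrative sketch only: the function name and the account strings are hypothetical, and nothing here calls Google Cloud.

```python
from typing import List, Optional, Sequence, Tuple, Union


def required_token_creator_grants(
    originating_account: str,
    impersonation_chain: Optional[Union[str, Sequence[str]]],
) -> List[Tuple[str, str]]:
    """Return (granting_account, grantee) pairs implied by the chain.

    Each granting_account must give grantee the Service Account Token
    Creator role, following the parameter description above.
    """
    if impersonation_chain is None:
        return []
    if isinstance(impersonation_chain, str):
        chain = [impersonation_chain]
    else:
        chain = list(impersonation_chain)

    grants = []
    previous = originating_account
    for account in chain:
        # Every account grants the role to the identity directly before it.
        grants.append((account, previous))
        previous = account
    return grants
```

For a two-element chain, the first account grants the role to the originating account and the second grants it to the first; the last account in the chain is the one impersonated in the request.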

template_fields: Sequence[str] = ('project_id', 'impersonation_chain')[source]
execute(context)[source]

This is the main method to derive when creating an operator.

The context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.dataproc_metastore.DataprocMetastoreDeleteServiceOperator(*, region, project_id, service_id, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Delete a single service.

Parameters
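  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • service_id (str) – Required. The ID of the metastore service to delete.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).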

template_fields: Sequence[str] = ('project_id', 'impersonation_chain')[source]
execute(context)[source]

This is the main method to derive when creating an operator.

The context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.dataproc_metastore.DataprocMetastoreExportMetadataOperator(*, destination_gcs_folder, project_id, region, service_id, request_id=None, database_dump_type=None, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Export metadata from a service.

Parameters
  • destination_gcs_folder (str) – A Cloud Storage URI of a folder, in the format gs://<bucket_name>/<path_inside_bucket>. A sub-folder <export_folder> containing exported files will be created below it.

  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • service_id (str) – Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens. This corresponds to the service_id field on the request instance; if request is provided, this should not be set.

  • request_id (str | None) – Optional. A unique id used to identify the request.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Optional. Designation of what errors, if any, should be retried.

  • timeout (float | None) – Optional. The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
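
The gs://<bucket_name>/<path_inside_bucket> format expected by destination_gcs_folder can be sanity-checked with the standard library before the export is requested. The helper below is a hypothetical sketch, not something the operator provides.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_gcs_folder(uri: str) -> Tuple[str, str]:
    """Split a gs:// folder URI into (bucket_name, path_inside_bucket)."""
    parts = urlsplit(uri)
    if parts.scheme != "gs" or not parts.netloc:
        raise ValueError(f"Expected gs://<bucket_name>/<path_inside_bucket>, got {uri!r}")
    # Drop the leading and trailing slashes so the path is relative to the bucket.
    return parts.netloc, parts.path.strip("/")
```

For example, `split_gcs_folder("gs://my-bucket/exports/hive")` separates the bucket from the folder path, while a URI with another scheme raises ValueError.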

template_fields: Sequence[str] = ('project_id', 'impersonation_chain')[source]
execute(context)[source]

This is the main method to derive when creating an operator.

The context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.dataproc_metastore.DataprocMetastoreGetServiceOperator(*, region, project_id, service_id, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Get the details of a single service.

Parameters
  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • service_id (str) –

    Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    This corresponds to the service_id field on the request instance; if request is provided, this should not be set.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields: Sequence[str] = ('project_id', 'impersonation_chain')[source]
execute(context)[source]

This is the main method to derive when creating an operator.

The context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.dataproc_metastore.DataprocMetastoreListBackupsOperator(*, project_id, region, service_id, page_size=None, page_token=None, filter=None, order_by=None, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

List backups in a service.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the backup belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the backup belongs to.

  • service_id (str) –

    Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    This corresponds to the service_id field on the request instance; if request is provided, this should not be set.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Optional. Designation of what errors, if any, should be retried.

  • timeout (float | None) – Optional. The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields: Sequence[str] = ('project_id', 'impersonation_chain')[source]
execute(context)[source]

This is the main method to derive when creating an operator.

The context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.dataproc_metastore.DataprocMetastoreRestoreServiceOperator(*, project_id, region, service_id, backup_project_id, backup_region, backup_service_id, backup_id, restore_type=None, request_id=None, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Restore a service from a backup.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • service_id (str) –

    Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    This corresponds to the service_id field on the request instance; if request is provided, this should not be set.

  • backup_project_id (str) – Required. The ID of the Google Cloud project that contains the metastore service backup to restore from.

  • backup_region (str) – Required. The ID of the Google Cloud region that contains the metastore service backup to restore from.

  • backup_service_id (str) – Required. The ID of the metastore service that holds the backup to restore from, which is used as the final component of that metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

  • backup_id (str) – Required. The ID of the metastore service backup to restore from.

  • restore_type (google.cloud.metastore_v1.types.metastore.Restore | None) – Optional. The type of restore. If unspecified, defaults to METADATA_ONLY.

  • request_id (str | None) – Optional. A unique id used to identify the request.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Optional. Designation of what errors, if any, should be retried.

  • timeout (float | None) – Optional. The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
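
The four backup_* parameters together identify one backup resource. The sketch below shows how they might combine into a fully qualified name; the function is hypothetical, and the projects/…/locations/…/services/…/backups/… layout is an assumption based on the usual Google Cloud resource-name convention, not something this page states.

```python
def backup_resource_name(
    backup_project_id: str,
    backup_region: str,
    backup_service_id: str,
    backup_id: str,
) -> str:
    """Build the assumed resource name of the backup to restore from."""
    return (
        f"projects/{backup_project_id}"
        f"/locations/{backup_region}"
        f"/services/{backup_service_id}"
        f"/backups/{backup_id}"
    )
```

Because the backup location is given separately from project_id, region, and service_id, the backup does not have to live under the service being restored.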

template_fields: Sequence[str] = ('project_id', 'impersonation_chain')[source]
execute(context)[source]

This is the main method to derive when creating an operator.

The context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.google.cloud.operators.dataproc_metastore.DataprocMetastoreUpdateServiceOperator(*, project_id, region, service_id, service, update_mask, request_id=None, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Update the parameters of a single service.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • service_id (str) –

    Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    This corresponds to the service_id field on the request instance; if request is provided, this should not be set.

  • service (dict | google.cloud.metastore_v1.types.Service) –

    Required. The metastore service to update. The server only merges fields in the service if they are specified in update_mask.

    The metastore service’s name field is used to identify the metastore service to be updated.

    This corresponds to the service field on the request instance; if request is provided, this should not be set.

  • update_mask (google.protobuf.field_mask_pb2.FieldMask) –

    Required. A field mask used to specify the fields to be overwritten in the metastore service resource by the update. Fields specified in the update_mask are relative to the resource (not to the full request). A field is overwritten if it is in the mask.

    This corresponds to the update_mask field on the request instance; if request is provided, this should not be set.

  • request_id (str | None) – Optional. A unique id used to identify the request.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Optional. Designation of what errors, if any, should be retried.

  • timeout (float | None) – Optional. The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Optional. Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
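
The interaction between service and update_mask is easier to see on plain dictionaries. The following is a local sketch of the merge rule described above, not the server's implementation: the function name is hypothetical, and clearing a masked field that is absent from the patch is an assumption based on common field-mask semantics.

```python
import copy
from typing import List


def apply_update_mask(current: dict, patch: dict, paths: List[str]) -> dict:
    """Return a copy of current with only the masked paths taken from patch."""
    result = copy.deepcopy(current)
    for path in paths:
        keys = path.split(".")
        source, target = patch, result
        # Walk down to the parent of the masked field on both sides.
        for key in keys[:-1]:
            source = source.get(key, {})
            target = target.setdefault(key, {})
        leaf = keys[-1]
        if leaf in source:
            target[leaf] = copy.deepcopy(source[leaf])
        else:
            # Assumption: a masked field missing from the patch is cleared.
            target.pop(leaf, None)
    return result
```

With the real operator the mask would be passed as `FieldMask(paths=["labels"])` from google.protobuf.field_mask_pb2; fields present in service but not named in the mask are left untouched.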

template_fields: Sequence[str] = ('project_id', 'impersonation_chain')[source]
execute(context)[source]

This is the main method to derive when creating an operator.

The context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.
