airflow.providers.google.cloud.operators.vertex_ai.dataset

This module contains Google Vertex AI operators.

Classes

CreateDatasetOperator

Creates a Dataset.

GetDatasetOperator

Gets a Dataset.

DeleteDatasetOperator

Deletes a Dataset.

ExportDataOperator

Exports data from a Dataset.

DatasetImportDataResultsCheckHelper

Helper utils to verify import dataset data results.

ImportDataOperator

Imports data into a Dataset.

ListDatasetsOperator

Lists Datasets in a Location.

UpdateDatasetOperator

Updates a Dataset.

Module Contents

class airflow.providers.google.cloud.operators.vertex_ai.dataset.CreateDatasetOperator(*, region, project_id, dataset, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Creates a Dataset.

Parameters:
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • dataset (google.cloud.aiplatform_v1.types.Dataset | dict) – Required. The Dataset to create. This corresponds to the dataset field on the request instance; if request is provided, this should not be set.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields = ('region', 'project_id', 'impersonation_chain')[source]
region[source]
project_id[source]
dataset[source]
retry[source]
timeout = None[source]
metadata = ()[source]
gcp_conn_id = 'google_cloud_default'[source]
impersonation_chain = None[source]

Override this method to include parameters for link formatting in extra links.

For example, most of the links on the Google provider require project_id and location in the link. To avoid repetition, you can override this function and return something like the following:

{
    "project_id": self.project_id,
    "location": self.location,
}

execute(context)[source]

Derive when creating an operator.

The main method to execute the task. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
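A minimal sketch of how a CreateDatasetOperator task might be declared, assuming an image dataset. The project ID, region, display name, and task ID are placeholders, and the metadata_schema_uri value is an assumption about Google's published image schema location; check it against the Vertex AI documentation before use.

```python
# Placeholder values (assumptions, not taken from the reference above).
PROJECT_ID = "my-project"
REGION = "us-central1"

# Dataset passed as a dict; the keys mirror fields of aiplatform_v1.types.Dataset.
IMAGE_DATASET = {
    "display_name": "example-image-dataset",
    # Assumed schema URI for image datasets.
    "metadata_schema_uri": "gs://google-cloud-aiplatform/schema/dataset/metadata/image_1.0.0.yaml",
}


def make_create_dataset_task():
    """Build the task. The import is deferred so the dict above can be
    inspected without Airflow installed."""
    from airflow.providers.google.cloud.operators.vertex_ai.dataset import (
        CreateDatasetOperator,
    )

    return CreateDatasetOperator(
        task_id="create_dataset",
        project_id=PROJECT_ID,
        region=REGION,
        dataset=IMAGE_DATASET,
    )
```

Calling make_create_dataset_task() inside a DAG definition attaches the task to that DAG. Because region and project_id are template fields, both may also be given as Jinja expressions.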

class airflow.providers.google.cloud.operators.vertex_ai.dataset.GetDatasetOperator(*, region, project_id, dataset_id, read_mask=None, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Gets a Dataset.

Parameters:
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • dataset_id (str) – Required. The ID of the Dataset to get.

  • read_mask (str | None) – Mask specifying which fields to read.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields = ('region', 'dataset_id', 'project_id', 'impersonation_chain')[source]
region[source]
project_id[source]
dataset_id[source]
read_mask = None[source]
retry[source]
timeout = None[source]
metadata = ()[source]
gcp_conn_id = 'google_cloud_default'[source]
impersonation_chain = None[source]

Override this method to include parameters for link formatting in extra links.

For example, most of the links on the Google provider require project_id and location in the link. To avoid repetition, you can override this function and return something like the following:

{
    "project_id": self.project_id,
    "location": self.location,
}

execute(context)[source]

Derive when creating an operator.

The main method to execute the task. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
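Since dataset_id appears in template_fields, it can be rendered at run time rather than hard-coded. The sketch below assumes an upstream task named create_dataset that pushed the new dataset's ID to XCom under the key dataset_id; both the task ID and the XCom key are assumptions to adapt to your DAG.

```python
# Jinja template resolved when the task runs. The upstream task_id and the
# XCom key are hypothetical and must match whatever your DAG actually uses.
DATASET_ID_TEMPLATE = (
    "{{ task_instance.xcom_pull(task_ids='create_dataset', key='dataset_id') }}"
)


def make_get_dataset_task():
    """Build the task. The import is deferred so this module loads without Airflow."""
    from airflow.providers.google.cloud.operators.vertex_ai.dataset import (
        GetDatasetOperator,
    )

    return GetDatasetOperator(
        task_id="get_dataset",
        project_id="my-project",   # placeholder
        region="us-central1",      # placeholder
        dataset_id=DATASET_ID_TEMPLATE,
    )
```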

class airflow.providers.google.cloud.operators.vertex_ai.dataset.DeleteDatasetOperator(*, region, project_id, dataset_id, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Deletes a Dataset.

Parameters:
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • dataset_id (str) – Required. The ID of the Dataset to delete.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields = ('region', 'dataset_id', 'project_id', 'impersonation_chain')[source]
region[source]
project_id[source]
dataset_id[source]
retry[source]
timeout = None[source]
metadata = ()[source]
gcp_conn_id = 'google_cloud_default'[source]
impersonation_chain = None[source]
execute(context)[source]

Derive when creating an operator.

The main method to execute the task. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
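The following sketch shows a cleanup task that passes impersonation_chain as a sequence. The service account names are hypothetical. The trigger_rule argument is a standard BaseOperator option forwarded through **kwargs, used here on the assumption that the dataset should be deleted even when upstream tasks fail.

```python
# Hypothetical service accounts (for illustration only). Per the parameter
# description above, each identity must grant the Service Account Token
# Creator role to the one directly before it, and the last account in the
# list is the one impersonated in the request.
IMPERSONATION_CHAIN = [
    "intermediate-sa@my-project.iam.gserviceaccount.com",
    "vertex-admin-sa@my-project.iam.gserviceaccount.com",
]


def make_delete_dataset_task(dataset_id: str):
    """Build the task. The import is deferred so this module loads without Airflow."""
    from airflow.providers.google.cloud.operators.vertex_ai.dataset import (
        DeleteDatasetOperator,
    )

    return DeleteDatasetOperator(
        task_id="delete_dataset",
        project_id="my-project",   # placeholder
        region="us-central1",      # placeholder
        dataset_id=dataset_id,
        impersonation_chain=IMPERSONATION_CHAIN,
        trigger_rule="all_done",   # run once upstream tasks finish, whatever their state
    )
```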

class airflow.providers.google.cloud.operators.vertex_ai.dataset.ExportDataOperator(*, region, project_id, dataset_id, export_config, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Exports data from a Dataset.

Parameters:
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • dataset_id (str) – Required. The ID of the Dataset to export data from.

  • export_config (google.cloud.aiplatform_v1.types.ExportDataConfig | dict) – Required. The desired output location.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields = ('region', 'dataset_id', 'project_id', 'impersonation_chain')[source]
region[source]
project_id[source]
dataset_id[source]
export_config[source]
retry[source]
timeout = None[source]
metadata = ()[source]
gcp_conn_id = 'google_cloud_default'[source]
impersonation_chain = None[source]
execute(context)[source]

Derive when creating an operator.

The main method to execute the task. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
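The export_config argument accepts a plain dict in place of an ExportDataConfig message. The sketch below builds one that targets a Cloud Storage prefix; build_export_config is a hypothetical helper, and the bucket and prefix are placeholders.

```python
def build_export_config(bucket: str, prefix: str) -> dict:
    """Return a dict shaped like aiplatform_v1.types.ExportDataConfig.

    Hypothetical helper: it normalises the prefix and wraps it in the
    gcs_destination field.
    """
    clean_prefix = prefix.strip("/")
    return {"gcs_destination": {"output_uri_prefix": f"gs://{bucket}/{clean_prefix}"}}


def make_export_data_task(dataset_id: str):
    """Build the task. The import is deferred so this module loads without Airflow."""
    from airflow.providers.google.cloud.operators.vertex_ai.dataset import (
        ExportDataOperator,
    )

    return ExportDataOperator(
        task_id="export_data",
        project_id="my-project",   # placeholder
        region="us-central1",      # placeholder
        dataset_id=dataset_id,
        export_config=build_export_config("my-bucket", "exports"),
    )
```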

class airflow.providers.google.cloud.operators.vertex_ai.dataset.DatasetImportDataResultsCheckHelper[source]

Helper utils to verify import dataset data results.

class airflow.providers.google.cloud.operators.vertex_ai.dataset.ImportDataOperator(*, region, project_id, dataset_id, import_configs, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, raise_for_empty_result=False, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator, DatasetImportDataResultsCheckHelper

Imports data into a Dataset.

Parameters:
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • dataset_id (str) – Required. The ID of the Dataset to import data into.

  • import_configs (collections.abc.Sequence[google.cloud.aiplatform_v1.types.ImportDataConfig] | list) – Required. The desired input locations. The contents of all input locations will be imported in one batch.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

  • raise_for_empty_result (bool) – Raise an error if no additional data has been populated after the import.

template_fields = ('region', 'dataset_id', 'project_id', 'impersonation_chain')[source]
region[source]
project_id[source]
dataset_id[source]
import_configs[source]
retry[source]
timeout = None[source]
metadata = ()[source]
gcp_conn_id = 'google_cloud_default'[source]
impersonation_chain = None[source]
raise_for_empty_result = False[source]
execute(context)[source]

Derive when creating an operator.

The main method to execute the task. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
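As a sketch of the import_configs shape, the example below builds a single ImportDataConfig-style dict covering several Cloud Storage files and enables raise_for_empty_result. The helper name, bucket paths, and schema URI are assumptions; the schema URI in particular should be checked against the Vertex AI documentation for your data type.

```python
# Assumed import schema for single-label image classification.
IMPORT_SCHEMA_URI = (
    "gs://google-cloud-aiplatform/schema/dataset/ioformat/"
    "image_classification_single_label_io_format_1.0.0.yaml"
)


def build_import_configs(source_uris: list[str], schema_uri: str) -> list[dict]:
    """Return a one-element list shaped like Sequence[ImportDataConfig].

    Hypothetical helper: all source files share one import schema.
    """
    return [{"gcs_source": {"uris": list(source_uris)}, "import_schema_uri": schema_uri}]


def make_import_data_task(dataset_id: str):
    """Build the task. The import is deferred so this module loads without Airflow."""
    from airflow.providers.google.cloud.operators.vertex_ai.dataset import (
        ImportDataOperator,
    )

    return ImportDataOperator(
        task_id="import_data",
        project_id="my-project",   # placeholder
        region="us-central1",      # placeholder
        dataset_id=dataset_id,
        import_configs=build_import_configs(
            ["gs://my-bucket/data/flowers.csv"], IMPORT_SCHEMA_URI
        ),
        raise_for_empty_result=True,  # fail the task if the import added no data
    )
```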

class airflow.providers.google.cloud.operators.vertex_ai.dataset.ListDatasetsOperator(*, region, project_id, filter=None, page_size=None, page_token=None, read_mask=None, order_by=None, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Lists Datasets in a Location.

Parameters:
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • filter (str | None) – The standard list filter.

  • page_size (int | None) – The standard list page size.

  • page_token (str | None) – The standard list page token.

  • read_mask (str | None) – Mask specifying which fields to read.

  • order_by (str | None) – A comma-separated list of fields to order by, sorted in ascending order. Use “desc” after a field name for descending.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields = ('region', 'project_id', 'impersonation_chain')[source]
region[source]
project_id[source]
filter = None[source]
page_size = None[source]
page_token = None[source]
read_mask = None[source]
order_by = None[source]
retry[source]
timeout = None[source]
metadata = ()[source]
gcp_conn_id = 'google_cloud_default'[source]
impersonation_chain = None[source]

Override this method to include parameters for link formatting in extra links.

For example, most of the links on the Google provider require project_id and location in the link. To avoid repetition, you can override this function and return something like the following:

{
    "project_id": self.project_id,
    "location": self.location,
}

execute(context)[source]

Derive when creating an operator.

The main method to execute the task. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
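The optional listing arguments can be collected in a dict and unpacked into the operator, as in this sketch. The filter expression and the create_time ordering field are assumptions about what the Vertex AI list API accepts; this reference only states that filter is "the standard list filter" and that "desc" after a field name reverses the order.

```python
# Optional listing arguments; every key matches a parameter documented above.
LIST_KWARGS = {
    "filter": 'display_name="example-image-dataset"',  # assumed filter syntax
    "order_by": "create_time desc",                    # assumed sortable field
    "page_size": 50,
}


def make_list_datasets_task():
    """Build the task. The import is deferred so this module loads without Airflow."""
    from airflow.providers.google.cloud.operators.vertex_ai.dataset import (
        ListDatasetsOperator,
    )

    return ListDatasetsOperator(
        task_id="list_datasets",
        project_id="my-project",   # placeholder
        region="us-central1",      # placeholder
        **LIST_KWARGS,
    )
```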

class airflow.providers.google.cloud.operators.vertex_ai.dataset.UpdateDatasetOperator(*, project_id, region, dataset_id, dataset, update_mask, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Updates a Dataset.

Parameters:
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • dataset_id (str) – Required. The ID of the Dataset to update.

  • dataset (google.cloud.aiplatform_v1.types.Dataset | dict) – Required. The Dataset which replaces the resource on the server.

  • update_mask (google.protobuf.field_mask_pb2.FieldMask | dict) – Required. The update mask applies to the resource.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields = ('region', 'dataset_id', 'project_id', 'impersonation_chain')[source]
project_id[source]
region[source]
dataset_id[source]
dataset[source]
update_mask[source]
retry[source]
timeout = None[source]
metadata = ()[source]
gcp_conn_id = 'google_cloud_default'[source]
impersonation_chain = None[source]
execute(context)[source]

Derive when creating an operator.

The main method to execute the task. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
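An update needs both the new field values (dataset) and a mask naming the fields to change (update_mask). The sketch below derives the mask from the keys of the changes dict so the two stay in step. build_update_args is a hypothetical helper, and it assumes the service accepts snake_case paths for top-level fields; this reference does not say which spelling is expected, so verify against the Vertex AI API documentation.

```python
def build_update_args(changes: dict) -> dict:
    """Return dataset/update_mask keyword arguments for UpdateDatasetOperator.

    Hypothetical helper: the mask lists exactly the top-level keys in `changes`.
    """
    return {
        "dataset": dict(changes),
        "update_mask": {"paths": sorted(changes)},
    }


def make_update_dataset_task(dataset_id: str):
    """Build the task. The import is deferred so this module loads without Airflow."""
    from airflow.providers.google.cloud.operators.vertex_ai.dataset import (
        UpdateDatasetOperator,
    )

    return UpdateDatasetOperator(
        task_id="update_dataset",
        project_id="my-project",   # placeholder
        region="us-central1",      # placeholder
        dataset_id=dataset_id,
        **build_update_args({"display_name": "renamed-dataset"}),
    )
```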