airflow.providers.google.cloud.hooks.vertex_ai.dataset

This module contains a Google Cloud Vertex AI hook.

Module Contents

Classes

DatasetHook

Hook for Google Cloud Vertex AI Dataset APIs.

class airflow.providers.google.cloud.hooks.vertex_ai.dataset.DatasetHook(**kwargs)[source]

Bases: airflow.providers.google.common.hooks.base_google.GoogleBaseHook

Hook for Google Cloud Vertex AI Dataset APIs.

get_dataset_service_client(region=None)[source]

Return DatasetServiceClient.

wait_for_operation(operation, timeout=None)[source]

Wait for a long-running operation to complete.

static extract_dataset_id(obj)[source]

Return the unique ID of the dataset.
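The resource-name handling behind this helper can be sketched without the provider installed. The sketch assumes that `obj` is a mapping whose `name` entry follows the Vertex AI pattern `projects/<project>/locations/<region>/datasets/<id>` and that the ID is the final path segment. `dataset_id_from_name` is a hypothetical stand-in, not the hook's own code.

```python
def dataset_id_from_name(obj: dict) -> str:
    """Return the trailing segment of a Dataset resource name."""
    # rpartition splits on the last "/" and keeps whatever follows it.
    return obj["name"].rpartition("/")[-1]


# Hypothetical resource, shaped like a created Dataset converted to a dict.
created = {"name": "projects/123/locations/us-central1/datasets/456"}
dataset_id = dataset_id_from_name(created)
```

The resulting ID is the value the other methods on this page expect for their `dataset` or `dataset_id` argument.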

create_dataset(project_id, region, dataset, retry=DEFAULT, timeout=None, metadata=())[source]

Create a Dataset.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • dataset (google.cloud.aiplatform_v1.types.Dataset | dict) – Required. The Dataset to create.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
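A minimal sketch of a create call, assuming the `dataset` argument is passed as a dict and that `wait_for_operation` returns the finished operation's result (neither is stated above). `create_dataset_and_wait` is a hypothetical helper, and the image schema URI is only one of the schemas Google publishes; the `gcp_conn_id` keyword in the commented usage is likewise an assumption about what `**kwargs` accepts.

```python
from typing import Any, Optional

# Metadata schema for image datasets; replace it with the schema that
# matches the data type actually in use.
IMAGE_SCHEMA_URI = (
    "gs://google-cloud-aiplatform/schema/dataset/metadata/image_1.0.0.yaml"
)


def create_dataset_and_wait(
    hook: Any,
    project_id: str,
    region: str,
    display_name: str,
    wait_timeout: Optional[float] = None,
) -> Any:
    """Start dataset creation and block until the operation finishes."""
    dataset = {
        "display_name": display_name,
        "metadata_schema_uri": IMAGE_SCHEMA_URI,
    }
    operation = hook.create_dataset(
        project_id=project_id, region=region, dataset=dataset
    )
    # Assumption: the value returned here is the created Dataset.
    return hook.wait_for_operation(operation=operation, timeout=wait_timeout)


# Usage with a real hook (requires Airflow and the Google provider):
# from airflow.providers.google.cloud.hooks.vertex_ai.dataset import DatasetHook
# hook = DatasetHook(gcp_conn_id="google_cloud_default")
# created = create_dataset_and_wait(hook, "my-project", "us-central1", "demo")
```

Passing the hook in as an argument keeps the helper easy to exercise with a stub in unit tests.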

delete_dataset(project_id, region, dataset, retry=DEFAULT, timeout=None, metadata=())[source]

Delete a Dataset.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • dataset (str) – Required. The ID of the Dataset to delete.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

export_data(project_id, region, dataset, export_config, retry=DEFAULT, timeout=None, metadata=())[source]

Export data from a Dataset.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • dataset (str) – Required. The ID of the Dataset to export.

  • export_config (google.cloud.aiplatform_v1.types.ExportDataConfig | dict) – Required. The desired output location.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
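As an illustration of the `export_config` argument, the sketch below builds the dict form of `ExportDataConfig`, assuming the output location is a Cloud Storage prefix carried in `gcs_destination.output_uri_prefix`. `build_export_config` and the bucket name are hypothetical.

```python
def build_export_config(output_uri_prefix: str) -> dict:
    """Return an ExportDataConfig-shaped dict for a Cloud Storage prefix."""
    if not output_uri_prefix.startswith("gs://"):
        raise ValueError("output_uri_prefix must be a gs:// URI")
    return {"gcs_destination": {"output_uri_prefix": output_uri_prefix}}


# Hypothetical bucket and path.
export_config = build_export_config("gs://my-bucket/exports/")

# With a real hook:
# operation = hook.export_data(
#     project_id="my-project",
#     region="us-central1",
#     dataset="456",
#     export_config=export_config,
# )
```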

get_annotation_spec(project_id, region, dataset, annotation_spec, read_mask=None, retry=DEFAULT, timeout=None, metadata=())[source]

Get an AnnotationSpec.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • dataset (str) – Required. The ID of the Dataset.

  • annotation_spec (str) – Required. The ID of the AnnotationSpec resource.

  • read_mask (str | None) – Optional. Mask specifying which fields to read.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

get_dataset(project_id, region, dataset, read_mask=None, retry=DEFAULT, timeout=None, metadata=())[source]

Get a Dataset.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • dataset (str) – Required. The ID of the Dataset to get.

  • read_mask (str | None) – Optional. Mask specifying which fields to read.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

import_data(project_id, region, dataset, import_configs, retry=DEFAULT, timeout=None, metadata=())[source]

Import data into a Dataset.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • dataset (str) – Required. The ID of the Dataset to import data into.

  • import_configs (Sequence[google.cloud.aiplatform_v1.types.ImportDataConfig]) – Required. The desired input locations. The contents of all input locations will be imported in one batch.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
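The `import_configs` argument can be sketched as a list of `ImportDataConfig`-shaped dicts. This assumes dicts are accepted in place of the typed messages and that each entry carries a `gcs_source` with its `uris` plus an `import_schema_uri`. `build_import_configs`, the bucket path, and the choice of schema are all illustrative.

```python
from typing import Any, Dict, List, Sequence


def build_import_configs(
    gcs_uris: Sequence[str], import_schema_uri: str
) -> List[Dict[str, Any]]:
    """Return a one-element list describing a single import batch."""
    if not gcs_uris:
        raise ValueError("at least one input URI is required")
    return [
        {
            "gcs_source": {"uris": list(gcs_uris)},
            "import_schema_uri": import_schema_uri,
        }
    ]


# Hypothetical input file; the schema below is the single-label image
# classification format, used here purely as an example.
import_configs = build_import_configs(
    ["gs://my-bucket/data/labels.csv"],
    "gs://google-cloud-aiplatform/schema/dataset/ioformat/"
    "image_classification_single_label_io_format_1.0.0.yaml",
)

# With a real hook:
# operation = hook.import_data(
#     project_id="my-project",
#     region="us-central1",
#     dataset="456",
#     import_configs=import_configs,
# )
```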

list_annotations(project_id, region, dataset, data_item, filter=None, page_size=None, page_token=None, read_mask=None, order_by=None, retry=DEFAULT, timeout=None, metadata=())[source]

List Annotations belonging to a data item.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • dataset (str) – Required. The ID of the Dataset.

  • data_item (str) – Required. The ID of the DataItem to list Annotations from.

  • filter (str | None) – The standard list filter.

  • page_size (int | None) – The standard list page size.

  • page_token (str | None) – The standard list page token.

  • read_mask (str | None) – Mask specifying which fields to read.

  • order_by (str | None) – A comma-separated list of fields to order by, sorted in ascending order. Use “desc” after a field name for descending.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

list_data_items(project_id, region, dataset, filter=None, page_size=None, page_token=None, read_mask=None, order_by=None, retry=DEFAULT, timeout=None, metadata=())[source]

List DataItems in a Dataset.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • dataset (str) – Required. The ID of the Dataset.

  • filter (str | None) – The standard list filter.

  • page_size (int | None) – The standard list page size.

  • page_token (str | None) – The standard list page token.

  • read_mask (str | None) – Mask specifying which fields to read.

  • order_by (str | None) – A comma-separated list of fields to order by, sorted in ascending order. Use “desc” after a field name for descending.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
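The `order_by` format described above (comma-separated fields, with "desc" after a field name for descending order) can be assembled with a small helper. `build_order_by` is hypothetical, and the field names are examples only; which fields the service can sort on is not stated here.

```python
from typing import Iterable, Tuple


def build_order_by(fields: Iterable[Tuple[str, bool]]) -> str:
    """Join (field_name, descending) pairs into an order_by string."""
    parts = []
    for name, descending in fields:
        # Ascending is the default, so only descending fields get a suffix.
        parts.append(f"{name} desc" if descending else name)
    return ",".join(parts)


# Newest first, then alphabetical by display name.
order_by = build_order_by([("create_time", True), ("display_name", False)])
```

The same string can be passed to any of the list methods on this page that accept `order_by`.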

list_datasets(project_id, region, filter=None, page_size=None, page_token=None, read_mask=None, order_by=None, retry=DEFAULT, timeout=None, metadata=())[source]

List Datasets in a Location.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • filter (str | None) – The standard list filter.

  • page_size (int | None) – The standard list page size.

  • page_token (str | None) – The standard list page token.

  • read_mask (str | None) – Mask specifying which fields to read.

  • order_by (str | None) – A comma-separated list of fields to order by, sorted in ascending order. Use “desc” after a field name for descending.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
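A sketch of consuming the result, assuming `list_datasets` returns an iterable of Dataset objects that fetches further pages on demand and that each item exposes a `name` attribute. The filter expression `display_name="..."` is an assumption about the list filter grammar, and `list_dataset_names` is a hypothetical helper.

```python
from typing import Any, List, Optional


def list_dataset_names(
    hook: Any,
    project_id: str,
    region: str,
    display_name: Optional[str] = None,
) -> List[str]:
    """Collect resource names, optionally restricted to one display name."""
    # Assumed filter grammar: field="value". None lists every dataset.
    list_filter = f'display_name="{display_name}"' if display_name else None
    results = hook.list_datasets(
        project_id=project_id,
        region=region,
        filter=list_filter,
        order_by="create_time desc",
    )
    # Iterating the result is assumed to walk all pages transparently.
    return [dataset.name for dataset in results]


# With a real hook:
# names = list_dataset_names(hook, "my-project", "us-central1", "demo")
```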

update_dataset(project_id, region, dataset_id, dataset, update_mask, retry=DEFAULT, timeout=None, metadata=())[source]

Update a Dataset.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • dataset_id (str) – Required. The ID of the Dataset.

  • dataset (google.cloud.aiplatform_v1.types.Dataset | dict) – Required. The Dataset which replaces the resource on the server.

  • update_mask (google.protobuf.field_mask_pb2.FieldMask | dict) – Required. The update mask that applies to the resource.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
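The `dataset` and `update_mask` arguments have to agree on which fields change. A minimal sketch, assuming both are passed as dicts and that the mask's dict form is `{"paths": [...]}` as in `FieldMask`; `build_update` is a hypothetical helper, and the field names are examples.

```python
from typing import Any, Dict, Tuple


def build_update(changes: Dict[str, Any]) -> Tuple[dict, dict]:
    """Return (dataset, update_mask) dicts covering exactly the changed fields."""
    if not changes:
        raise ValueError("no fields to update")
    dataset = dict(changes)
    # List every changed top-level field so the mask matches the payload.
    update_mask = {"paths": sorted(changes)}
    return dataset, update_mask


dataset, update_mask = build_update(
    {"display_name": "renamed-dataset", "description": "Updated description"}
)

# With a real hook:
# updated = hook.update_dataset(
#     project_id="my-project",
#     region="us-central1",
#     dataset_id="456",
#     dataset=dataset,
#     update_mask=update_mask,
# )
```

Deriving the mask from the payload avoids a mask that names fields the payload does not set.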
