airflow.providers.google.cloud.operators.vertex_ai.dataset
This module contains Google Vertex AI operators.
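Classes
CreateDatasetOperator – Creates a Dataset.
GetDatasetOperator – Gets a Dataset.
DeleteDatasetOperator – Deletes a Dataset.
ExportDataOperator – Exports data from a Dataset.
DatasetImportDataResultsCheckHelper – Helper utils to verify import dataset data results.
ImportDataOperator – Imports data into a Dataset.
ListDatasetsOperator – Lists Datasets in a Location.
UpdateDatasetOperator – Updates a Dataset.
Module Contents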
- class airflow.providers.google.cloud.operators.vertex_ai.dataset.CreateDatasetOperator(*, region, project_id, dataset, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Creates a Dataset.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
region (str) – Required. The ID of the Google Cloud region that the service belongs to.
dataset (google.cloud.aiplatform_v1.types.Dataset | dict) – Required. The Dataset to create. This corresponds to the dataset field on the request instance; if request is provided, this should not be set.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
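As a minimal sketch of how the parameters above might be passed, the following builds the keyword arguments for CreateDatasetOperator. The project ID, region, display name, and task_id are placeholders. The schema URI is assumed to be the publicly hosted image metadata schema, and other Dataset fields (such as metadata) may be needed depending on the dataset type. The operator import is deferred so the snippet can be read and run without the provider installed.

```python
# Placeholder values; replace with your own project and region.
PROJECT_ID = "my-project"
REGION = "us-central1"

# Dict form of google.cloud.aiplatform_v1.types.Dataset.
# The schema URI is an assumption; pick the one matching your data type.
IMAGE_DATASET = {
    "display_name": "example-image-dataset",
    "metadata_schema_uri": (
        "gs://google-cloud-aiplatform/schema/dataset/metadata/image_1.0.0.yaml"
    ),
}

create_dataset_kwargs = {
    "task_id": "create_image_dataset",  # standard Airflow task identifier
    "project_id": PROJECT_ID,
    "region": REGION,
    "dataset": IMAGE_DATASET,
}


def build_create_dataset_task():
    """Instantiate the operator; intended to be called inside a DAG definition."""
    # Imported lazily so this module loads even without the provider installed.
    from airflow.providers.google.cloud.operators.vertex_ai.dataset import (
        CreateDatasetOperator,
    )

    return CreateDatasetOperator(**create_dataset_kwargs)
```

Keeping the Dataset as a plain dict avoids importing the aiplatform types in the DAG file; a google.cloud.aiplatform_v1.types.Dataset instance is accepted as well.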
- property extra_links_params: dict[str, Any]
Override this property to include parameters for link formatting in extra links.
For example, most of the links in the Google provider require project_id and location in the link. To avoid repeating them, you can override this property and return something like the following:
{
    "project_id": self.project_id,
    "location": self.location,
}
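The override described above can be sketched with a small stand-in class. LocationAwareOperator is hypothetical and is not part of the provider; it only shows the shape of a property that returns the values used when formatting extra links.

```python
from typing import Any


class LocationAwareOperator:
    """Hypothetical stand-in for an operator; not part of the provider."""

    def __init__(self, project_id: str, location: str) -> None:
        self.project_id = project_id
        self.location = location

    @property
    def extra_links_params(self) -> dict[str, Any]:
        # Return the values that link templates need, so they are
        # defined once instead of being repeated for every link.
        return {
            "project_id": self.project_id,
            "location": self.location,
        }


op = LocationAwareOperator(project_id="my-project", location="us-central1")
params = op.extra_links_params
```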
- class airflow.providers.google.cloud.operators.vertex_ai.dataset.GetDatasetOperator(*, region, project_id, dataset_id, read_mask=None, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Gets a Dataset.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
region (str) – Required. The ID of the Google Cloud region that the service belongs to.
dataset_id (str) – Required. The ID of the Dataset to get.
read_mask (str | None) – Optional. Mask specifying which fields to read.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- property extra_links_params: dict[str, Any]
Override this property to include parameters for link formatting in extra links.
For example, most of the links in the Google provider require project_id and location in the link. To avoid repeating them, you can override this property and return something like the following:
{
    "project_id": self.project_id,
    "location": self.location,
}
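A minimal sketch of fetching a single Dataset, assuming the provider package is installed where the DAG runs. The dataset ID, project, region, and task_id are placeholders, and the read_mask value is only an illustrative field name. In a real DAG the dataset ID would usually come from an upstream task rather than a literal.

```python
get_dataset_kwargs = {
    "task_id": "get_dataset",
    "project_id": "my-project",    # placeholder
    "region": "us-central1",       # placeholder
    "dataset_id": "1234567890",    # placeholder numeric ID, passed as a string
    "read_mask": "display_name",   # illustrative: limit the fields returned
}


def build_get_dataset_task():
    """Instantiate the operator; intended to be called inside a DAG definition."""
    # Imported lazily so this module loads even without the provider installed.
    from airflow.providers.google.cloud.operators.vertex_ai.dataset import (
        GetDatasetOperator,
    )

    return GetDatasetOperator(**get_dataset_kwargs)
```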
- class airflow.providers.google.cloud.operators.vertex_ai.dataset.DeleteDatasetOperator(*, region, project_id, dataset_id, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Deletes a Dataset.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
region (str) – Required. The ID of the Google Cloud region that the service belongs to.
dataset_id (str) – Required. The ID of the Dataset to delete.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
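The sketch below shows one way the optional parameters might be combined when deleting a Dataset: a request timeout and an impersonation_chain given as a sequence. All IDs and service account names are placeholders. As described above, the last account in the sequence is the one impersonated in the request.

```python
delete_dataset_kwargs = {
    "task_id": "delete_dataset",
    "project_id": "my-project",   # placeholder
    "region": "us-central1",      # placeholder
    "dataset_id": "1234567890",   # placeholder
    "timeout": 300.0,             # request timeout, in seconds
    # Each identity must grant the Service Account Token Creator role to
    # the one directly preceding it; the last entry is impersonated.
    "impersonation_chain": [
        "intermediate-sa@my-project.iam.gserviceaccount.com",
        "target-sa@my-project.iam.gserviceaccount.com",
    ],
}


def build_delete_dataset_task():
    """Instantiate the operator; intended to be called inside a DAG definition."""
    # Imported lazily so this module loads even without the provider installed.
    from airflow.providers.google.cloud.operators.vertex_ai.dataset import (
        DeleteDatasetOperator,
    )

    return DeleteDatasetOperator(**delete_dataset_kwargs)
```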
- class airflow.providers.google.cloud.operators.vertex_ai.dataset.ExportDataOperator(*, region, project_id, dataset_id, export_config, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Exports data from a Dataset.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
region (str) – Required. The ID of the Google Cloud region that the service belongs to.
dataset_id (str) – Required. The ID of the Dataset to export data from.
export_config (google.cloud.aiplatform_v1.types.ExportDataConfig | dict) – Required. The desired output location.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
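A minimal sketch of an export, assuming the dict form of ExportDataConfig with a Cloud Storage destination. The bucket name, IDs, and task_id are placeholders.

```python
# Dict form of google.cloud.aiplatform_v1.types.ExportDataConfig.
# The bucket is a placeholder; exported files are written under this prefix.
EXPORT_CONFIG = {
    "gcs_destination": {"output_uri_prefix": "gs://my-bucket/exports"},
}

export_data_kwargs = {
    "task_id": "export_dataset",
    "project_id": "my-project",   # placeholder
    "region": "us-central1",      # placeholder
    "dataset_id": "1234567890",   # placeholder
    "export_config": EXPORT_CONFIG,
}


def build_export_data_task():
    """Instantiate the operator; intended to be called inside a DAG definition."""
    # Imported lazily so this module loads even without the provider installed.
    from airflow.providers.google.cloud.operators.vertex_ai.dataset import (
        ExportDataOperator,
    )

    return ExportDataOperator(**export_data_kwargs)
```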
- class airflow.providers.google.cloud.operators.vertex_ai.dataset.DatasetImportDataResultsCheckHelper
Helper utils to verify import dataset data results.
- class airflow.providers.google.cloud.operators.vertex_ai.dataset.ImportDataOperator(*, region, project_id, dataset_id, import_configs, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, raise_for_empty_result=False, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator, DatasetImportDataResultsCheckHelper
Imports data into a Dataset.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
region (str) – Required. The ID of the Google Cloud region that the service belongs to.
dataset_id (str) – Required. The ID of the Dataset to import data into.
import_configs (collections.abc.Sequence[google.cloud.aiplatform_v1.types.ImportDataConfig] | list) – Required. The desired input locations. The contents of all input locations will be imported in one batch.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
raise_for_empty_result (bool) – Raise an error if no additional data has been populated after the import.
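A minimal sketch of an import, assuming the dict form of ImportDataConfig. The bucket path, label, IDs, and task_id are placeholders, and the import schema URI is assumed to be the publicly hosted single-label image classification format. Setting raise_for_empty_result=True makes the task fail when the import adds no data.

```python
# List of dict-form google.cloud.aiplatform_v1.types.ImportDataConfig entries.
# All input locations in the list are imported in one batch.
IMPORT_CONFIGS = [
    {
        "gcs_source": {"uris": ["gs://my-bucket/data/image-dataset.csv"]},
        # Assumed schema URI; choose the one matching your data and task.
        "import_schema_uri": (
            "gs://google-cloud-aiplatform/schema/dataset/ioformat/"
            "image_classification_single_label_io_format_1.0.0.yaml"
        ),
        "data_item_labels": {"source": "example-import"},  # placeholder label
    },
]

import_data_kwargs = {
    "task_id": "import_data",
    "project_id": "my-project",   # placeholder
    "region": "us-central1",      # placeholder
    "dataset_id": "1234567890",   # placeholder
    "import_configs": IMPORT_CONFIGS,
    "raise_for_empty_result": True,
}


def build_import_data_task():
    """Instantiate the operator; intended to be called inside a DAG definition."""
    # Imported lazily so this module loads even without the provider installed.
    from airflow.providers.google.cloud.operators.vertex_ai.dataset import (
        ImportDataOperator,
    )

    return ImportDataOperator(**import_data_kwargs)
```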
- class airflow.providers.google.cloud.operators.vertex_ai.dataset.ListDatasetsOperator(*, region, project_id, filter=None, page_size=None, page_token=None, read_mask=None, order_by=None, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Lists Datasets in a Location.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
region (str) – Required. The ID of the Google Cloud region that the service belongs to.
filter (str | None) – The standard list filter.
page_size (int | None) – The standard list page size.
page_token (str | None) – The standard list page token.
read_mask (str | None) – Mask specifying which fields to read.
order_by (str | None) – A comma-separated list of fields to order by, sorted in ascending order. Use “desc” after a field name for descending.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- property extra_links_params: dict[str, Any]
Override this property to include parameters for link formatting in extra links.
For example, most of the links in the Google provider require project_id and location in the link. To avoid repeating them, you can override this property and return something like the following:
{
    "project_id": self.project_id,
    "location": self.location,
}
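A minimal sketch of listing Datasets with the optional list parameters. The filter expression and ordering field are illustrative assumptions about what the Vertex AI API accepts; the project, region, and task_id are placeholders.

```python
list_datasets_kwargs = {
    "task_id": "list_datasets",
    "project_id": "my-project",   # placeholder
    "region": "us-central1",      # placeholder
    # Illustrative filter; the exact syntax is defined by the Vertex AI API.
    "filter": 'display_name="example-image-dataset"',
    "page_size": 10,
    # Ascending by default; "desc" after the field name reverses the order.
    "order_by": "create_time desc",
}


def build_list_datasets_task():
    """Instantiate the operator; intended to be called inside a DAG definition."""
    # Imported lazily so this module loads even without the provider installed.
    from airflow.providers.google.cloud.operators.vertex_ai.dataset import (
        ListDatasetsOperator,
    )

    return ListDatasetsOperator(**list_datasets_kwargs)
```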
- class airflow.providers.google.cloud.operators.vertex_ai.dataset.UpdateDatasetOperator(*, project_id, region, dataset_id, dataset, update_mask, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Updates a Dataset.
- Parameters:
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
region (str) – Required. The ID of the Google Cloud region that the service belongs to.
dataset_id (str) – Required. The ID of the Dataset to update.
dataset (google.cloud.aiplatform_v1.types.Dataset | dict) – Required. The Dataset which replaces the resource on the server.
update_mask (google.protobuf.field_mask_pb2.FieldMask | dict) – Required. The update mask applies to the resource.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
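A minimal sketch of renaming a Dataset, assuming dict forms of both the Dataset and the FieldMask. Only the fields named in update_mask are expected to change, so the Dataset dict carries just the new display name. The IDs, new name, and task_id are placeholders.

```python
# Dict form of google.cloud.aiplatform_v1.types.Dataset with the new value.
DATASET_UPDATE = {"display_name": "renamed-dataset"}

# Dict form of google.protobuf.field_mask_pb2.FieldMask: the fields to update.
UPDATE_MASK = {"paths": ["display_name"]}

update_dataset_kwargs = {
    "task_id": "update_dataset",
    "project_id": "my-project",   # placeholder
    "region": "us-central1",      # placeholder
    "dataset_id": "1234567890",   # placeholder
    "dataset": DATASET_UPDATE,
    "update_mask": UPDATE_MASK,
}


def build_update_dataset_task():
    """Instantiate the operator; intended to be called inside a DAG definition."""
    # Imported lazily so this module loads even without the provider installed.
    from airflow.providers.google.cloud.operators.vertex_ai.dataset import (
        UpdateDatasetOperator,
    )

    return UpdateDatasetOperator(**update_dataset_kwargs)
```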