airflow.providers.google.cloud.operators.vertex_ai.pipeline_job

This module contains Google Vertex AI operators.

Module Contents

Classes

RunPipelineJobOperator

Create and run a Pipeline job.

GetPipelineJobOperator

Get a Pipeline job.

ListPipelineJobOperator

Lists PipelineJobs in a Location.

DeletePipelineJobOperator

Delete a Pipeline job.

class airflow.providers.google.cloud.operators.vertex_ai.pipeline_job.RunPipelineJobOperator(*, project_id, region, display_name, template_path, job_id=None, pipeline_root=None, parameter_values=None, input_artifacts=None, enable_caching=None, encryption_spec_key_name=None, labels=None, failure_policy=None, service_account=None, network=None, create_request_timeout=None, experiment=None, gcp_conn_id='google_cloud_default', impersonation_chain=None, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), poll_interval=5 * 60, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Create and run a Pipeline job.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • display_name (str) – Required. The user-defined name of this Pipeline.

  • template_path (str) – Required. The path of PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI (e.g. “gs://project.name”), an Artifact Registry URI (e.g. “https://us-central1-kfp.pkg.dev/proj/repo/pack/latest”), or an HTTPS URI.

  • job_id (str | None) – Optional. The unique ID of the job run. If not specified, pipeline name + timestamp will be used.

  • pipeline_root (str | None) – Optional. The root of the pipeline outputs. If not set, the staging bucket set in aiplatform.init will be used. If that is not set either, a pipeline-specific artifacts bucket will be used.

  • parameter_values (dict[str, Any] | None) – Optional. The mapping from runtime parameter names to their values, which control the pipeline run.

  • input_artifacts (dict[str, str] | None) – Optional. The mapping from the runtime parameter name for this artifact to its resource id. For example: “vertex_model”:”456”. Note: full resource name (“projects/123/locations/us-central1/metadataStores/default/artifacts/456”) cannot be used.

  • enable_caching (bool | None) – Optional. Whether to turn on caching for the run. If this is not set, defaults to the compile time settings, which are True for all tasks by default, while users may specify different caching options for individual tasks. If this is set, the setting applies to all tasks in the pipeline. Overrides the compile time settings.

  • encryption_spec_key_name (str | None) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the job. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If this is set, then all resources created by the PipelineJob will be encrypted with the provided encryption key. Overrides encryption_spec_key_name set in aiplatform.init.

  • labels (dict[str, str] | None) – Optional. The user defined metadata to organize PipelineJob.

  • failure_policy (str | None) – Optional. The failure policy - “slow” or “fast”. Currently, the default of a pipeline is that the pipeline will continue to run until no more tasks can be executed, also known as PIPELINE_FAILURE_POLICY_FAIL_SLOW (corresponds to “slow”). However, if a pipeline is set to PIPELINE_FAILURE_POLICY_FAIL_FAST (corresponds to “fast”), it will stop scheduling any new tasks when a task has failed. Any scheduled tasks will continue to completion.

  • service_account (str | None) – Optional. Specifies the service account to use as the workload run-as account. Users submitting jobs must have act-as permission on this run-as account.

  • network (str | None) – Optional. The full name of the Compute Engine network to which the job should be peered. For example, projects/12345/global/networks/myVPC. Private services access must already be configured for the network. If left unspecified, the network set in aiplatform.init will be used. Otherwise, the job is not peered with any network.

  • create_request_timeout (float | None) – Optional. The timeout for the create request in seconds.

  • experiment (str | google.cloud.aiplatform.metadata.experiment_resources.Experiment | None) – Optional. The Vertex AI experiment name or instance to associate to this PipelineJob. Metrics produced by the PipelineJob as system.Metric Artifacts will be associated as metrics to the current Experiment Run. Pipeline parameters will be associated as parameters to the current Experiment Run.

  • gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

  • deferrable (bool) – If True, run the task in deferrable mode. Note that this requires calling the operator with the sync=False parameter.

  • poll_interval (int) – Time (seconds) to wait between two consecutive calls to check the job. The default is 300 seconds.

template_fields = ['region', 'project_id', 'input_artifacts', 'impersonation_chain'][source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

execute_complete(context, event)[source]
on_kill()[source]

Act as a callback called when the operator is killed; cancel any running job.

hook()[source]
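
Example

A minimal usage sketch of RunPipelineJobOperator inside a DAG. The project ID, template path, and parameter values below are hypothetical placeholders; substitute the URI of your own compiled PipelineJob or PipelineSpec file.

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.vertex_ai.pipeline_job import (
    RunPipelineJobOperator,
)

with DAG(
    dag_id="vertex_ai_run_pipeline_example",
    start_date=datetime(2024, 1, 1),
    schedule=None,
) as dag:
    run_pipeline = RunPipelineJobOperator(
        task_id="run_pipeline_job",
        project_id="my-project",  # hypothetical project ID
        region="us-central1",
        display_name="my-pipeline",
        # Hypothetical GCS URI of a compiled PipelineJob/PipelineSpec file.
        template_path="gs://my-bucket/pipeline.json",
        parameter_values={"learning_rate": 0.01},  # hypothetical parameter
        enable_caching=True,
    )
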
class airflow.providers.google.cloud.operators.vertex_ai.pipeline_job.GetPipelineJobOperator(*, project_id, region, pipeline_job_id, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Get a Pipeline job.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • pipeline_job_id (str) – Required. The ID of the PipelineJob resource.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields = ['region', 'pipeline_job_id', 'project_id', 'impersonation_chain'][source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
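
Example

A minimal usage sketch of GetPipelineJobOperator; the project and job IDs are hypothetical placeholders. As with any Airflow operator, the value returned by execute is available to downstream tasks via XCom.

from airflow.providers.google.cloud.operators.vertex_ai.pipeline_job import (
    GetPipelineJobOperator,
)

get_pipeline_job = GetPipelineJobOperator(
    task_id="get_pipeline_job",
    project_id="my-project",  # hypothetical project ID
    region="us-central1",
    pipeline_job_id="my-pipeline-job-id",  # hypothetical PipelineJob resource ID
)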

class airflow.providers.google.cloud.operators.vertex_ai.pipeline_job.ListPipelineJobOperator(*, region, project_id, page_size=None, page_token=None, filter=None, order_by=None, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Lists PipelineJobs in a Location.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • filter (str | None) –

    Optional. Lists the PipelineJobs that match the filter expression. The following fields are supported:

    • pipeline_name: Supports = and != comparisons.

    • display_name: Supports =, != comparisons, and : wildcard.

    • pipeline_job_user_id: Supports =, != comparisons, and : wildcard. For example, to check whether a pipeline’s display_name contains “step”, use display_name:"*step*".

    • create_time: Supports =, !=, <, >, <=, and >= comparisons. Values must be in RFC 3339 format.

    • update_time: Supports =, !=, <, >, <=, and >= comparisons. Values must be in RFC 3339 format.

    • end_time: Supports =, !=, <, >, <=, and >= comparisons. Values must be in RFC 3339 format.

    • labels: Supports key-value equality and key presence.

    Filter expressions can be combined using the logical operators AND and OR. For example: pipeline_name="test" AND create_time>"2020-05-18T13:30:00Z".

    The filter expression syntax is based on https://google.aip.dev/160.

  • page_size (int | None) – Optional. The standard list page size.

  • page_token (str | None) – Optional. The standard list page token. Typically obtained via the ListPipelineJobsResponse.next_page_token of a previous PipelineService.ListPipelineJobs call.

  • order_by (str | None) –

    Optional. A comma-separated list of fields to order by. The default sort order is ascending; add “desc” after a field name for descending order. Multiple order_by fields can be provided, e.g. “create_time desc, end_time”. In that example, results are ordered by create time in descending order, and jobs with the same create time are ordered by end time in ascending order. If order_by is not specified, results are ordered by create time in descending order. Supported fields:

    • create_time

    • update_time

    • end_time

    • start_time

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields = ['region', 'project_id', 'impersonation_chain'][source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
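
Example

A minimal usage sketch of ListPipelineJobOperator that combines a filter expression with an order_by clause; the project ID and filter values are hypothetical placeholders.

from airflow.providers.google.cloud.operators.vertex_ai.pipeline_job import (
    ListPipelineJobOperator,
)

list_pipeline_jobs = ListPipelineJobOperator(
    task_id="list_pipeline_jobs",
    project_id="my-project",  # hypothetical project ID
    region="us-central1",
    # Jobs whose display name contains "training", created after the given time.
    filter='display_name:"*training*" AND create_time>"2024-01-01T00:00:00Z"',
    order_by="create_time desc",
)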

class airflow.providers.google.cloud.operators.vertex_ai.pipeline_job.DeletePipelineJobOperator(*, project_id, region, pipeline_job_id, retry=DEFAULT, timeout=None, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Delete a Pipeline job.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • pipeline_job_id (str) – Required. The ID of the PipelineJob resource to be deleted.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.

  • impersonation_chain (str | Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields = ['region', 'project_id', 'pipeline_job_id', 'impersonation_chain'][source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
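
Example

A minimal usage sketch of DeletePipelineJobOperator used as a cleanup step; the IDs are hypothetical placeholders. Setting trigger_rule=TriggerRule.ALL_DONE makes the cleanup run even when an upstream task has failed.

from airflow.providers.google.cloud.operators.vertex_ai.pipeline_job import (
    DeletePipelineJobOperator,
)
from airflow.utils.trigger_rule import TriggerRule

delete_pipeline_job = DeletePipelineJobOperator(
    task_id="delete_pipeline_job",
    project_id="my-project",  # hypothetical project ID
    region="us-central1",
    pipeline_job_id="my-pipeline-job-id",  # hypothetical PipelineJob resource ID
    trigger_rule=TriggerRule.ALL_DONE,  # run even if upstream tasks fail
)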
