airflow.providers.google.cloud.hooks.vertex_ai.pipeline_job

This module contains Google Cloud Vertex AI hooks for Pipeline Job APIs.

Module Contents

Classes

PipelineJobHook

Hook for Google Cloud Vertex AI Pipeline Job APIs.

PipelineJobAsyncHook

Asynchronous hook for Google Cloud Vertex AI Pipeline Job APIs.

class airflow.providers.google.cloud.hooks.vertex_ai.pipeline_job.PipelineJobHook(gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.common.hooks.base_google.GoogleBaseHook

Hook for Google Cloud Vertex AI Pipeline Job APIs.

get_pipeline_service_client(region=None)[source]

Return a PipelineServiceClient object.

get_pipeline_job_object(display_name, template_path, job_id=None, pipeline_root=None, parameter_values=None, input_artifacts=None, enable_caching=None, encryption_spec_key_name=None, labels=None, project=None, location=None, failure_policy=None)[source]

Return a PipelineJob object.

wait_for_operation(operation, timeout=None)[source]

Wait for a long-running operation to complete.

cancel_pipeline_job()[source]

Cancel PipelineJob.

create_pipeline_job(project_id, region, pipeline_job, pipeline_job_id, retry=DEFAULT, timeout=None, metadata=())[source]

Create a PipelineJob. A PipelineJob will run immediately when created.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • pipeline_job (google.cloud.aiplatform.PipelineJob) – Required. The PipelineJob to create.

  • pipeline_job_id (str) –

    The ID to use for the PipelineJob, which will become the final component of the PipelineJob name. If not provided, an ID will be automatically generated.

    This value should be less than 128 characters, and valid characters are /[a-z][0-9]-/.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
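
A minimal sketch of a call to this method, assuming the GAPIC PipelineJob message from google.cloud.aiplatform_v1; the project, bucket, template, and job IDs are illustrative, and a real pipeline spec would normally be compiled (e.g. with the KFP SDK) rather than hand-written:

    from google.cloud.aiplatform_v1 import types

    from airflow.providers.google.cloud.hooks.vertex_ai.pipeline_job import PipelineJobHook

    hook = PipelineJobHook(gcp_conn_id="google_cloud_default")

    # A minimal PipelineJob message pointing at an already-compiled template.
    pipeline_job = types.PipelineJob(
        display_name="training-pipeline",
        template_uri="gs://my-bucket/compiled/pipeline.json",
    )

    result = hook.create_pipeline_job(
        project_id="my-project",
        region="us-central1",
        pipeline_job=pipeline_job,
        pipeline_job_id="training-run-20240101",  # becomes the final component of the resource name
    )
    print(result.name)  # projects/.../locations/us-central1/pipelineJobs/training-run-20240101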

run_pipeline_job(project_id, region, display_name, template_path, job_id=None, pipeline_root=None, parameter_values=None, input_artifacts=None, enable_caching=None, encryption_spec_key_name=None, labels=None, failure_policy=None, service_account=None, network=None, create_request_timeout=None, experiment=None)[source]

Create a PipelineJob and run it to completion.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • display_name (str) – Required. The user-defined name of this Pipeline.

  • template_path (str) – Required. The path of a PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI (e.g. “gs://project.name”), an Artifact Registry URI (e.g. “https://us-central1-kfp.pkg.dev/proj/repo/pack/latest”), or an HTTPS URI.

  • job_id (str | None) – Optional. The unique ID of the job run. If not specified, pipeline name + timestamp will be used.

  • pipeline_root (str | None) – Optional. The root of the pipeline outputs. If not set, the staging bucket set in aiplatform.init will be used. If that’s not set, a pipeline-specific artifacts bucket will be used.

  • parameter_values (dict[str, Any] | None) – Optional. The mapping from runtime parameter names to the values that control the pipeline run.

  • input_artifacts (dict[str, str] | None) – Optional. The mapping from the runtime parameter name for this artifact to its resource id. For example: “vertex_model”:”456”. Note: full resource name (“projects/123/locations/us-central1/metadataStores/default/artifacts/456”) cannot be used.

  • enable_caching (bool | None) – Optional. Whether to turn on caching for the run. If this is not set, defaults to the compile time settings, which are True for all tasks by default, while users may specify different caching options for individual tasks. If this is set, the setting applies to all tasks in the pipeline. Overrides the compile time settings.

  • encryption_spec_key_name (str | None) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the job. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If this is set, then all resources created by the PipelineJob will be encrypted with the provided encryption key. Overrides encryption_spec_key_name set in aiplatform.init.

  • labels (dict[str, str] | None) – Optional. The user-defined metadata used to organize the PipelineJob.

  • failure_policy (str | None) – Optional. The failure policy - “slow” or “fast”. Currently, the default of a pipeline is that the pipeline will continue to run until no more tasks can be executed, also known as PIPELINE_FAILURE_POLICY_FAIL_SLOW (corresponds to “slow”). However, if a pipeline is set to PIPELINE_FAILURE_POLICY_FAIL_FAST (corresponds to “fast”), it will stop scheduling any new tasks when a task has failed. Any scheduled tasks will continue to completion.

  • service_account (str | None) – Optional. Specifies the service account for workload run-as account. Users submitting jobs must have act-as permission on this run-as account.

  • network (str | None) – Optional. The full name of the Compute Engine network to which the job should be peered. For example, projects/12345/global/networks/myVPC. Private services access must already be configured for the network. If left unspecified, the network set in aiplatform.init will be used. Otherwise, the job is not peered with any network.

  • create_request_timeout (float | None) – Optional. The timeout for the create request in seconds.

  • experiment (str | google.cloud.aiplatform.metadata.experiment_resources.Experiment | None) – Optional. The Vertex AI experiment name or instance to associate to this PipelineJob. Metrics produced by the PipelineJob as system.Metric Artifacts will be associated as metrics to the current Experiment Run. Pipeline parameters will be associated as parameters to the current Experiment Run.
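
A minimal sketch of a blocking run; all names, paths, and parameter values are illustrative and must match the compiled pipeline spec:

    from airflow.providers.google.cloud.hooks.vertex_ai.pipeline_job import PipelineJobHook

    hook = PipelineJobHook(gcp_conn_id="google_cloud_default")

    job = hook.run_pipeline_job(
        project_id="my-project",
        region="us-central1",
        display_name="training-pipeline",
        template_path="gs://my-bucket/compiled/pipeline.json",
        parameter_values={"learning_rate": 0.01},  # must exist in the compiled spec
        enable_caching=True,
        service_account="pipelines@my-project.iam.gserviceaccount.com",
    )
    # The call blocks until the job reaches a terminal state.
    print(job.state)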

submit_pipeline_job(project_id, region, display_name, template_path, job_id=None, pipeline_root=None, parameter_values=None, input_artifacts=None, enable_caching=None, encryption_spec_key_name=None, labels=None, failure_policy=None, service_account=None, network=None, create_request_timeout=None, experiment=None)[source]

Create and start a PipelineJob run.

For more information about the underlying client method, see: https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob#google_cloud_aiplatform_PipelineJob_submit

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • display_name (str) – Required. The user-defined name of this Pipeline.

  • template_path (str) – Required. The path of a PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI (e.g. “gs://project.name”), an Artifact Registry URI (e.g. “https://us-central1-kfp.pkg.dev/proj/repo/pack/latest”), or an HTTPS URI.

  • job_id (str | None) – Optional. The unique ID of the job run. If not specified, pipeline name + timestamp will be used.

  • pipeline_root (str | None) – Optional. The root of the pipeline outputs. If not set, the staging bucket set in aiplatform.init will be used. If that’s not set, a pipeline-specific artifacts bucket will be used.

  • parameter_values (dict[str, Any] | None) – Optional. The mapping from runtime parameter names to the values that control the pipeline run.

  • input_artifacts (dict[str, str] | None) – Optional. The mapping from the runtime parameter name for this artifact to its resource id. For example: “vertex_model”:”456”. Note: full resource name (“projects/123/locations/us-central1/metadataStores/default/artifacts/456”) cannot be used.

  • enable_caching (bool | None) – Optional. Whether to turn on caching for the run. If this is not set, defaults to the compile time settings, which are True for all tasks by default, while users may specify different caching options for individual tasks. If this is set, the setting applies to all tasks in the pipeline. Overrides the compile time settings.

  • encryption_spec_key_name (str | None) – Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the job. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If this is set, then all resources created by the PipelineJob will be encrypted with the provided encryption key. Overrides encryption_spec_key_name set in aiplatform.init.

  • labels (dict[str, str] | None) – Optional. The user-defined metadata used to organize the PipelineJob.

  • failure_policy (str | None) – Optional. The failure policy - “slow” or “fast”. Currently, the default of a pipeline is that the pipeline will continue to run until no more tasks can be executed, also known as PIPELINE_FAILURE_POLICY_FAIL_SLOW (corresponds to “slow”). However, if a pipeline is set to PIPELINE_FAILURE_POLICY_FAIL_FAST (corresponds to “fast”), it will stop scheduling any new tasks when a task has failed. Any scheduled tasks will continue to completion.

  • service_account (str | None) – Optional. Specifies the service account for workload run-as account. Users submitting jobs must have act-as permission on this run-as account.

  • network (str | None) – Optional. The full name of the Compute Engine network to which the job should be peered. For example, projects/12345/global/networks/myVPC. Private services access must already be configured for the network. If left unspecified, the network set in aiplatform.init will be used. Otherwise, the job is not peered with any network.

  • create_request_timeout (float | None) – Optional. The timeout for the create request in seconds.

  • experiment (str | google.cloud.aiplatform.metadata.experiment_resources.Experiment | None) – Optional. The Vertex AI experiment name or instance to associate to this PipelineJob. Metrics produced by the PipelineJob as system.Metric Artifacts will be associated as metrics to the current Experiment Run. Pipeline parameters will be associated as parameters to the current Experiment Run.
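
A minimal sketch of this non-blocking variant, which returns as soon as the run has started and is the pattern that deferrable operators typically build on; all identifiers are illustrative:

    from airflow.providers.google.cloud.hooks.vertex_ai.pipeline_job import PipelineJobHook

    hook = PipelineJobHook(gcp_conn_id="google_cloud_default")

    job = hook.submit_pipeline_job(
        project_id="my-project",
        region="us-central1",
        display_name="training-pipeline",
        template_path="https://us-central1-kfp.pkg.dev/my-project/my-repo/my-pipeline/latest",
        job_id="training-run-20240101",
    )
    # The run is already executing server-side; poll (or defer) from here
    # instead of blocking the worker.
    print(job.resource_name)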

get_pipeline_job(project_id, region, pipeline_job_id, retry=DEFAULT, timeout=None, metadata=())[source]

Get a PipelineJob.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • pipeline_job_id (str) – Required. The ID of the PipelineJob resource.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
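
A minimal sketch with illustrative IDs; the returned object is the PipelineJob proto message:

    from airflow.providers.google.cloud.hooks.vertex_ai.pipeline_job import PipelineJobHook

    hook = PipelineJobHook()

    job = hook.get_pipeline_job(
        project_id="my-project",
        region="us-central1",
        pipeline_job_id="training-run-20240101",
    )
    print(job.state, job.create_time)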

list_pipeline_jobs(project_id, region, page_size=None, page_token=None, filter=None, order_by=None, retry=DEFAULT, timeout=None, metadata=())[source]

List PipelineJobs in a Location.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • filter (str | None) –

    Optional. Lists the PipelineJobs that match the filter expression. The following fields are supported:

    • pipeline_name: Supports = and != comparisons.

    • display_name: Supports =, != comparisons, and : wildcard.

    • pipeline_job_user_id: Supports =, != comparisons, and : wildcard. For example, you can check whether a pipeline’s display_name contains *step* by filtering with display_name:"*step*".

    • create_time: Supports =, !=, <, >, <=, and >= comparisons. Values must be in RFC 3339 format.

    • update_time: Supports =, !=, <, >, <=, and >= comparisons. Values must be in RFC 3339 format.

    • end_time: Supports =, !=, <, >, <=, and >= comparisons. Values must be in RFC 3339 format.

    • labels: Supports key-value equality and key presence.

    Filter expressions can be combined together using logical operators (AND & OR). For example: pipeline_name="test" AND create_time>"2020-05-18T13:30:00Z".

    The filter expression syntax is based on https://google.aip.dev/160.

  • page_size (int | None) – Optional. The standard list page size.

  • page_token (str | None) – Optional. The standard list page token. Typically obtained via ListPipelineJobsResponse.next_page_token of the previous PipelineService.ListPipelineJobs call.

  • order_by (str | None) –

    Optional. A comma-separated list of fields to order by. The default sort order is ascending; append “desc” to a field name for descending order. Multiple fields may be provided, e.g. “create_time desc, end_time”: results are ordered by create time in descending order, and jobs with the same create time are then ordered by end time in ascending order. If order_by is not specified, results are ordered by create time in descending order. Supported fields:

    • create_time

    • update_time

    • end_time

    • start_time

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
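
A minimal sketch combining a filter with ordering; the values are illustrative, and the returned pager fetches subsequent pages transparently as it is iterated:

    from airflow.providers.google.cloud.hooks.vertex_ai.pipeline_job import PipelineJobHook

    hook = PipelineJobHook()

    pager = hook.list_pipeline_jobs(
        project_id="my-project",
        region="us-central1",
        filter='display_name:"*training*" AND create_time>"2024-01-01T00:00:00Z"',
        order_by="create_time desc",
        page_size=50,
    )
    for job in pager:  # subsequent pages are fetched as needed
        print(job.name, job.state)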

delete_pipeline_job(project_id, region, pipeline_job_id, retry=DEFAULT, timeout=None, metadata=())[source]

Delete a PipelineJob.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • pipeline_job_id (str) – Required. The ID of the PipelineJob resource to be deleted.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
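
A minimal sketch: the method returns a long-running Operation, which pairs naturally with the hook's wait_for_operation helper (IDs are illustrative):

    from airflow.providers.google.cloud.hooks.vertex_ai.pipeline_job import PipelineJobHook

    hook = PipelineJobHook()

    operation = hook.delete_pipeline_job(
        project_id="my-project",
        region="us-central1",
        pipeline_job_id="training-run-20240101",
    )
    hook.wait_for_operation(operation, timeout=300)  # blocks until deletion finishes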

static extract_pipeline_job_id(obj)[source]

Return the unique ID of a pipeline job, extracted from its resource name.
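
A minimal sketch, assuming the job is passed as a dict containing its full resource name (e.g. as produced by PipelineJob.to_dict()):

    from airflow.providers.google.cloud.hooks.vertex_ai.pipeline_job import PipelineJobHook

    job_id = PipelineJobHook.extract_pipeline_job_id(
        {"name": "projects/123/locations/us-central1/pipelineJobs/training-run-20240101"}
    )
    print(job_id)  # training-run-20240101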

class airflow.providers.google.cloud.hooks.vertex_ai.pipeline_job.PipelineJobAsyncHook(gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.common.hooks.base_google.GoogleBaseAsyncHook

Asynchronous hook for Google Cloud Vertex AI Pipeline Job APIs.

sync_hook_class[source]
PIPELINE_COMPLETE_STATES = ()[source]
async get_credentials()[source]
async get_project_id()[source]
async get_location()[source]
async get_pipeline_service_client(region=None)[source]
async get_pipeline_job(project_id, location, job_id, retry=DEFAULT, timeout=DEFAULT, metadata=())[source]

Get a PipelineJob proto message from PipelineServiceAsyncClient.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • location (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • job_id (str) – Required. The ID of the PipelineJob resource.

  • retry (google.api_core.retry.AsyncRetry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | google.api_core.gapic_v1.method._MethodDefault | None) – The timeout for this request.

  • metadata (Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

async wait_for_pipeline_job(project_id, location, job_id, retry=DEFAULT, timeout=None, metadata=(), poll_interval=10)[source]

Wait until the pipeline job is in a complete state and return it.
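
A minimal sketch of polling from an async context, such as a custom trigger; the project, location, and job ID are illustrative:

    import asyncio

    from airflow.providers.google.cloud.hooks.vertex_ai.pipeline_job import PipelineJobAsyncHook


    async def wait_for_run() -> None:
        hook = PipelineJobAsyncHook(gcp_conn_id="google_cloud_default")
        job = await hook.wait_for_pipeline_job(
            project_id="my-project",
            location="us-central1",
            job_id="training-run-20240101",
            poll_interval=30,  # seconds between successive get_pipeline_job calls
        )
        # Returns once the job state is one of PIPELINE_COMPLETE_STATES.
        print(job.state)


    asyncio.run(wait_for_run())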
