airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook

This module contains the Amazon SageMaker Unified Studio Notebook Run hook.

Attributes

TWELVE_HOURS_IN_MINUTES

MIN_BOTOCORE_VERSION

NOTEBOOK_RUN_SUCCESS_STATES

NOTEBOOK_RUN_IN_PROGRESS_STATES

NOTEBOOK_RUN_FAILURE_STATES

NOTEBOOK_OUTPUT_KEY_PREFIX

Classes

SageMakerUnifiedStudioNotebookHook

Interact with SageMaker Unified Studio Workflows for asynchronous notebook execution.

Module Contents

airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.TWELVE_HOURS_IN_MINUTES = 720
airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.MIN_BOTOCORE_VERSION = '1.43.1'
airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.NOTEBOOK_RUN_SUCCESS_STATES
airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.NOTEBOOK_RUN_IN_PROGRESS_STATES
airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.NOTEBOOK_RUN_FAILURE_STATES
airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.NOTEBOOK_OUTPUT_KEY_PREFIX = 'NOTEBOOK_OUTPUT'
class airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.SageMakerUnifiedStudioNotebookHook(*args, **kwargs)

Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook

Interact with SageMaker Unified Studio Workflows for asynchronous notebook execution.

This hook wraps the DataZone StartNotebookRun and GetNotebookRun APIs.

Examples:
from airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook import (
    SageMakerUnifiedStudioNotebookHook,
)

hook = SageMakerUnifiedStudioNotebookHook(aws_conn_id="my_aws_conn")

Additional arguments (such as aws_conn_id or region_name) may be specified and are passed down to the underlying AwsBaseHook.

property conn

Get the underlying boto3 DataZone client, optionally with a custom endpoint URL.

start_notebook_run(notebook_identifier, domain_identifier, owning_project_identifier, client_token=None, notebook_parameters=None, compute_configuration=None, timeout_configuration=None, workflow_name=None)

Start an asynchronous notebook run via the DataZone StartNotebookRun API.

Parameters:
  • notebook_identifier (str) – The ID of the notebook to execute.

  • domain_identifier (str) – The ID of the DataZone domain containing the notebook.

  • owning_project_identifier (str) – The ID of the DataZone project containing the notebook.

  • client_token (str | None) – Idempotency token. Auto-generated if not provided.

  • notebook_parameters (dict | None) – Parameters to pass to the notebook.

  • compute_configuration (dict | None) – Compute config (e.g. instanceType).

  • timeout_configuration (dict | None) – Timeout settings (runTimeoutInMinutes).

  • workflow_name (str | None) – Name of the workflow (DAG) that triggered this run.

Returns:

The StartNotebookRun API response dict.

Return type:

dict
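As a minimal sketch of how the keyword arguments above fit together, the snippet below starts a run with a timeout. FakeNotebookHook, launch_notebook, the parameter values, and the {"id": ...} response shape are all hypothetical, included only so the example runs without Airflow or AWS credentials. In a real DAG the hook argument would be a SageMakerUnifiedStudioNotebookHook instance.

```python
from typing import Any


class FakeNotebookHook:
    """Hypothetical stand-in for SageMakerUnifiedStudioNotebookHook (offline use only)."""

    def __init__(self) -> None:
        self.calls = []

    def start_notebook_run(self, **kwargs: Any) -> dict:
        # Record the keyword arguments and return a made-up response.
        self.calls.append(kwargs)
        return {"id": "run-123"}  # assumed response shape, not the real API's


def launch_notebook(
    hook: Any,
    notebook_id: str,
    domain_id: str,
    project_id: str,
    run_timeout_minutes: int = 60,
) -> dict:
    """Start a notebook run using the keyword names from the documented signature."""
    return hook.start_notebook_run(
        notebook_identifier=notebook_id,
        domain_identifier=domain_id,
        owning_project_identifier=project_id,
        notebook_parameters={"run_date": "2024-01-01"},  # illustrative values
        timeout_configuration={"runTimeoutInMinutes": run_timeout_minutes},
    )


fake_hook = FakeNotebookHook()
response = launch_notebook(fake_hook, "nb-1", "dzd_example", "proj-1")
```

client_token is left unset here, so per the parameter description the hook would generate an idempotency token itself.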

get_notebook_run(notebook_run_id, domain_identifier)

Get the status of a notebook run via the DataZone GetNotebookRun API.

Parameters:
  • notebook_run_id (str) – The ID of the notebook run.

  • domain_identifier (str) – The ID of the DataZone domain.

Returns:

The GetNotebookRun API response dict.

Return type:

dict

wait_for_notebook_run(notebook_run_id, domain_identifier, waiter_delay=10, timeout_configuration=None)

Poll GetNotebookRun until the run reaches a terminal state.

Parameters:
  • notebook_run_id (str) – The ID of the notebook run to monitor.

  • domain_identifier (str) – The ID of the DataZone domain.

  • waiter_delay (int) – Interval in seconds to poll the notebook run status.

  • timeout_configuration (dict | None) – Timeout settings for the notebook execution. When provided, the maximum number of poll attempts is derived from runTimeoutInMinutes * 60 / waiter_delay. When omitted, the timeout defaults to 12 hours (TWELVE_HOURS_IN_MINUTES).

Returns:

A dict with Status and NotebookRunId on success.

Raises:

RuntimeError – If the run fails or times out.

Return type:

dict
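The poll-attempt arithmetic described for timeout_configuration can be worked through in a few lines. This is a sketch of the documented formula only. The helper name max_poll_attempts is hypothetical, and truncating the result to an integer is an assumption, since the rounding behaviour is not stated here.

```python
from typing import Optional

TWELVE_HOURS_IN_MINUTES = 720  # mirrors the module-level constant


def max_poll_attempts(
    waiter_delay: int = 10,
    timeout_configuration: Optional[dict] = None,
) -> int:
    """Derive the poll budget from runTimeoutInMinutes * 60 / waiter_delay."""
    config = timeout_configuration or {}
    # Fall back to the 12-hour default when no timeout is configured.
    timeout_minutes = config.get("runTimeoutInMinutes", TWELVE_HOURS_IN_MINUTES)
    # Assumption: fractional attempts are truncated.
    return int(timeout_minutes * 60 / waiter_delay)


default_attempts = max_poll_attempts()
one_hour_attempts = max_poll_attempts(30, {"runTimeoutInMinutes": 60})
```

With the defaults (a 10-second delay and the 12-hour timeout) this gives 720 * 60 / 10 = 4320 attempts. A one-hour timeout polled every 30 seconds gives 120.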

get_project_s3_path(domain_identifier, project_id)

Look up the S3 location for a SageMaker Unified Studio project.

The bucket and key prefix are read from the s3BucketPath provisioned resource of the project’s default (“Tooling”) environment via the DataZone APIs. This mirrors how SageMaker Unified Studio resolves the project bucket and accommodates projects whose bucket name does not follow the amazon-sagemaker-{account_id}-{region}-{project_id} template (for example, BYOR-bucket projects).

Parameters:
  • domain_identifier (str) – The ID of the DataZone domain.

  • project_id (str) – The ID of the DataZone project.

Returns:

A (bucket, prefix) tuple. bucket is the S3 bucket name. prefix is the path component of the project’s s3BucketPath (with no leading or trailing /).

Raises:

RuntimeError – If the default tooling environment or the s3BucketPath provisioned resource cannot be found.

Return type:

tuple[str, str]
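The shape of the returned (bucket, prefix) tuple can be illustrated by splitting a path by hand. The sketch below assumes the s3BucketPath value is an s3:// URI, which this page does not confirm. The helper name split_s3_bucket_path and the sample path are hypothetical, and this is not the hook's own parsing code.

```python
from urllib.parse import urlparse


def split_s3_bucket_path(s3_bucket_path: str) -> tuple:
    """Split an s3:// URI into (bucket, prefix) with no leading or trailing slash."""
    parsed = urlparse(s3_bucket_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an s3:// URI: {s3_bucket_path!r}")
    # netloc is the bucket name; the path is the key prefix.
    return parsed.netloc, parsed.path.strip("/")


bucket, prefix = split_s3_bucket_path("s3://my-project-bucket/dzd_example/proj-1/")
```

Here bucket is "my-project-bucket" and prefix is "dzd_example/proj-1", matching the documented contract that the prefix carries no leading or trailing slash.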

get_notebook_outputs(notebook_identifier, notebook_run_id, domain_identifier, owning_project_identifier)

Read notebook output artifacts from the S3 project bucket.

After a notebook run completes, the SDK writes output variables as a JSON file to a well-known S3 location within the project bucket. This method reads that file and returns the parsed key-value pairs.

Parameters:
  • notebook_identifier (str) – The ID of the notebook that was executed.

  • notebook_run_id (str) – The ID of the completed notebook run.

  • domain_identifier (str) – The ID of the DataZone domain.

  • owning_project_identifier (str) – The ID of the DataZone project.

Returns:

A dict of notebook output key-value pairs. Returns an empty dict if no outputs were written or the file cannot be parsed.

Return type:

dict[str, Any]
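The "empty dict on missing or unparsable output" behaviour described above can be sketched as a small parsing step. The helper name parse_notebook_outputs is hypothetical, and treating a non-object JSON payload as empty is an assumption. Fetching the file is left out because the S3 key layout is not documented on this page.

```python
import json
from typing import Any, Dict, Optional


def parse_notebook_outputs(raw: Optional[bytes]) -> Dict[str, Any]:
    """Parse the output file body, falling back to an empty dict on any problem."""
    if not raw:
        # No file, or an empty file: nothing was written.
        return {}
    try:
        parsed = json.loads(raw)
    except ValueError:
        # Covers json.JSONDecodeError and UnicodeDecodeError.
        return {}
    # Assumption: only a JSON object counts as a set of outputs.
    return parsed if isinstance(parsed, dict) else {}


outputs = parse_notebook_outputs(b'{"row_count": 42, "status": "ok"}')
broken = parse_notebook_outputs(b"not json")
```

Returning an empty dict instead of raising lets downstream tasks treat "no outputs" and "unreadable outputs" the same way, as the return description states.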
