airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook

This module contains the Amazon SageMaker Unified Studio Notebook Run hook.

Attributes

TWELVE_HOURS_IN_MINUTES

MIN_BOTOCORE_VERSION

NOTEBOOK_RUN_SUCCESS_STATES

NOTEBOOK_RUN_IN_PROGRESS_STATES

NOTEBOOK_RUN_FAILURE_STATES

NOTEBOOK_OUTPUT_KEY_PREFIX

Classes

SageMakerUnifiedStudioNotebookHook

Interact with SageMaker Unified Studio Workflows for asynchronous notebook execution.

Module Contents

airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.TWELVE_HOURS_IN_MINUTES = 720
airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.MIN_BOTOCORE_VERSION = '1.43.1'
airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.NOTEBOOK_RUN_SUCCESS_STATES
airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.NOTEBOOK_RUN_IN_PROGRESS_STATES
airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.NOTEBOOK_RUN_FAILURE_STATES
airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.NOTEBOOK_OUTPUT_KEY_PREFIX = 'NOTEBOOK_OUTPUT'
class airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.SageMakerUnifiedStudioNotebookHook(*args, **kwargs)

Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook

Interact with SageMaker Unified Studio Workflows for asynchronous notebook execution.

This hook provides a wrapper around the DataZone StartNotebookRun / GetNotebookRun APIs.

Examples:
from airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook import (
    SageMakerUnifiedStudioNotebookHook,
)

hook = SageMakerUnifiedStudioNotebookHook(aws_conn_id="my_aws_conn")

Additional arguments (such as aws_conn_id or region_name) may be specified and are passed down to the underlying AwsBaseHook.

property conn

Get the underlying boto3 DataZone client, optionally with a custom endpoint URL.

start_notebook_run(notebook_identifier, domain_identifier, owning_project_identifier, client_token=None, notebook_parameters=None, compute_configuration=None, timeout_configuration=None, workflow_name=None)

Start an asynchronous notebook run via the DataZone StartNotebookRun API.

Parameters:
  • notebook_identifier (str) – The ID of the notebook to execute.

  • domain_identifier (str) – The ID of the DataZone domain containing the notebook.

  • owning_project_identifier (str) – The ID of the DataZone project containing the notebook.

  • client_token (str | None) – Idempotency token. Auto-generated if not provided.

  • notebook_parameters (dict | None) – Parameters to pass to the notebook.

  • compute_configuration (dict | None) – Compute config (e.g. instanceType).

  • timeout_configuration (dict | None) – Timeout settings (runTimeoutInMinutes).

  • workflow_name (str | None) – Name of the workflow (DAG) that triggered this run.

Returns:

The StartNotebookRun API response dict.

Return type:

dict
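A minimal sketch of how the arguments for start_notebook_run might be assembled. The helper name build_start_kwargs, the identifier strings, and the parameter values are hypothetical and chosen only for illustration; the keyword names and the runTimeoutInMinutes key come from the signature and parameter descriptions above.

```python
from __future__ import annotations

import uuid
from typing import Any


def build_start_kwargs(
    notebook_id: str,
    domain_id: str,
    project_id: str,
    run_timeout_minutes: int = 60,
    parameters: dict[str, Any] | None = None,
) -> dict[str, Any]:
    """Assemble keyword arguments for start_notebook_run (hypothetical helper)."""
    return {
        "notebook_identifier": notebook_id,
        "domain_identifier": domain_id,
        "owning_project_identifier": project_id,
        # Optional: the hook generates a token when none is given. Building it
        # once and reusing the same kwargs on a retry keeps the token stable.
        "client_token": str(uuid.uuid4()),
        "notebook_parameters": parameters or {},
        "timeout_configuration": {"runTimeoutInMinutes": run_timeout_minutes},
    }


# Placeholder identifiers; real values come from your DataZone domain and project.
kwargs = build_start_kwargs(
    "nb-example",
    "dzd-example",
    "prj-example",
    parameters={"run_date": "2024-01-01"},
)
# With a configured hook: response = hook.start_notebook_run(**kwargs)
```

Keeping the argument assembly separate from the API call makes it possible to inspect or unit-test the request without AWS credentials.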

get_notebook_run(notebook_run_id, domain_identifier)

Get the status of a notebook run via the DataZone GetNotebookRun API.

Parameters:
  • notebook_run_id (str) – The ID of the notebook run.

  • domain_identifier (str) – The ID of the DataZone domain.

Returns:

The GetNotebookRun API response dict.

Return type:

dict
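The module exposes NOTEBOOK_RUN_SUCCESS_STATES, NOTEBOOK_RUN_IN_PROGRESS_STATES, and NOTEBOOK_RUN_FAILURE_STATES, but this page does not list their members. The sketch below shows one way a status string could be bucketed against three such collections. The function classify_status and the state names in the demo are placeholders, not the module's actual values; in real code you would pass the module constants instead.

```python
from __future__ import annotations

from typing import Iterable


def classify_status(
    status: str,
    success_states: Iterable[str],
    in_progress_states: Iterable[str],
    failure_states: Iterable[str],
) -> str:
    """Bucket a run status into success / failure / in_progress / unknown."""
    if status in set(success_states):
        return "success"
    if status in set(failure_states):
        return "failure"
    if status in set(in_progress_states):
        return "in_progress"
    # A status outside all three collections is reported rather than guessed at.
    return "unknown"


# Placeholder state names for illustration only (assumption, not from the module).
SUCCESS = {"SUCCEEDED"}
IN_PROGRESS = {"RUNNING"}
FAILURE = {"FAILED"}

result = classify_status("RUNNING", SUCCESS, IN_PROGRESS, FAILURE)
```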

wait_for_notebook_run(notebook_run_id, domain_identifier, waiter_delay=10, timeout_configuration=None)

Poll GetNotebookRun until the run reaches a terminal state.

Parameters:
  • notebook_run_id (str) – The ID of the notebook run to monitor.

  • domain_identifier (str) – The ID of the DataZone domain.

  • waiter_delay (int) – Interval in seconds to poll the notebook run status.

  • timeout_configuration (dict | None) – Timeout settings for the notebook execution. When provided, the maximum number of poll attempts is derived as runTimeoutInMinutes * 60 / waiter_delay. When omitted, the timeout defaults to 12 hours (TWELVE_HOURS_IN_MINUTES = 720).

Returns:

A dict with Status and NotebookRunId on success.

Raises:

RuntimeError – If the run fails or times out.

Return type:

dict
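The poll-attempt formula described for timeout_configuration can be worked through as follows. This is a sketch of the arithmetic only, not the hook's implementation: the function name max_poll_attempts is hypothetical, and both the use of integer division and the fallback to the default when the key is missing are assumptions, since the documentation does not state either.

```python
from __future__ import annotations

TWELVE_HOURS_IN_MINUTES = 720  # same value as the module constant


def max_poll_attempts(
    waiter_delay: int = 10,
    timeout_configuration: dict | None = None,
) -> int:
    """Derive the poll budget as runTimeoutInMinutes * 60 / waiter_delay."""
    config = timeout_configuration or {}
    # Assumption: fall back to the 12-hour default when no timeout is supplied.
    run_timeout_minutes = config.get("runTimeoutInMinutes", TWELVE_HOURS_IN_MINUTES)
    # Assumption: round down; the documented formula does not specify rounding.
    return run_timeout_minutes * 60 // waiter_delay


default_attempts = max_poll_attempts()  # 720 * 60 // 10
custom_attempts = max_poll_attempts(
    waiter_delay=30,
    timeout_configuration={"runTimeoutInMinutes": 60},
)  # 60 * 60 // 30
```

With the defaults this gives 4320 attempts at 10-second intervals; a 60-minute timeout polled every 30 seconds gives 120.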

get_project_s3_path(project_id)

Construct the S3 path for a SageMaker Unified Studio project bucket.

Parameters:

project_id (str) – The ID of the DataZone project.

Returns:

The S3 bucket name for the project.

Return type:

str

get_notebook_outputs(notebook_identifier, notebook_run_id, owning_project_identifier)

Read notebook output artifacts from the S3 project bucket.

After a notebook run completes, the SDK writes output variables as a JSON file to a well-known S3 location within the project bucket. This method reads that file and returns the parsed key-value pairs.

Parameters:
  • notebook_identifier (str) – The ID of the notebook that was executed.

  • notebook_run_id (str) – The ID of the completed notebook run.

  • owning_project_identifier (str) – The ID of the DataZone project.

Returns:

A dict of notebook output key-value pairs. Returns an empty dict if no outputs were written or the file cannot be parsed.

Return type:

dict[str, Any]
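The documented fallback (an empty dict when no outputs were written or the file cannot be parsed) can be illustrated with a small parser. This is a sketch under assumptions, not the hook's code: the function name parse_notebook_outputs is hypothetical, it takes the raw file bytes rather than reading from S3, and treating a non-object JSON document as "cannot be parsed" is a choice made here.

```python
from __future__ import annotations

import json
from typing import Any


def parse_notebook_outputs(raw: bytes | None) -> dict[str, Any]:
    """Parse a JSON outputs file, returning {} when it is missing or invalid."""
    if not raw:
        # No file, or an empty file: nothing was written.
        return {}
    try:
        parsed = json.loads(raw)
    except ValueError:
        # Covers malformed JSON and undecodable bytes (both subclass ValueError).
        return {}
    # Assumption: only a JSON object counts as valid key-value output.
    return parsed if isinstance(parsed, dict) else {}


outputs = parse_notebook_outputs(b'{"row_count": 42, "status": "ok"}')
```

Returning an empty dict instead of raising lets downstream tasks treat "no outputs" and "unreadable outputs" the same way, at the cost of hiding parse errors from the caller.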
