airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook
This module contains the Amazon SageMaker Unified Studio Notebook Run hook.
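Attributes
- TWELVE_HOURS_IN_MINUTES
- MIN_BOTOCORE_VERSION
- NOTEBOOK_RUN_SUCCESS_STATES
- NOTEBOOK_RUN_IN_PROGRESS_STATES
- NOTEBOOK_RUN_FAILURE_STATES
- NOTEBOOK_OUTPUT_KEY_PREFIX
Classes
- SageMakerUnifiedStudioNotebookHook: Interact with SageMaker Unified Studio Workflows for asynchronous notebook execution.
Module Contents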
- airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.TWELVE_HOURS_IN_MINUTES = 720
- airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.MIN_BOTOCORE_VERSION = '1.43.1'
- airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.NOTEBOOK_RUN_SUCCESS_STATES
- airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.NOTEBOOK_RUN_IN_PROGRESS_STATES
- airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.NOTEBOOK_RUN_FAILURE_STATES
- airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.NOTEBOOK_OUTPUT_KEY_PREFIX = 'NOTEBOOK_OUTPUT'
- class airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook.SageMakerUnifiedStudioNotebookHook(*args, **kwargs)
Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook
Interact with SageMaker Unified Studio Workflows for asynchronous notebook execution.
This hook provides a wrapper around the DataZone StartNotebookRun / GetNotebookRun APIs.
- Examples:
from airflow.providers.amazon.aws.hooks.sagemaker_unified_studio_notebook import (
    SageMakerUnifiedStudioNotebookHook,
)

hook = SageMakerUnifiedStudioNotebookHook(aws_conn_id="my_aws_conn")
Additional arguments (such as aws_conn_id or region_name) may be specified and are passed down to the underlying AwsBaseHook.
- property conn
Get the underlying boto3 DataZone client, optionally with a custom endpoint URL.
- start_notebook_run(notebook_identifier, domain_identifier, owning_project_identifier, client_token=None, notebook_parameters=None, compute_configuration=None, timeout_configuration=None, workflow_name=None)
Start an asynchronous notebook run via the DataZone StartNotebookRun API.
- Parameters:
notebook_identifier (str) – The ID of the notebook to execute.
domain_identifier (str) – The ID of the DataZone domain containing the notebook.
owning_project_identifier (str) – The ID of the DataZone project containing the notebook.
client_token (str | None) – Idempotency token. Auto-generated if not provided.
notebook_parameters (dict | None) – Parameters to pass to the notebook.
compute_configuration (dict | None) – Compute config (e.g. instanceType).
timeout_configuration (dict | None) – Timeout settings (runTimeoutInMinutes).
workflow_name (str | None) – Name of the workflow (DAG) that triggered this run.
- Returns:
The StartNotebookRun API response dict.
- Return type:
dict
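The call shape described above can be sketched offline. This is a minimal illustration, not the hook's implementation: RecordingHook is a hypothetical stand-in that only echoes its arguments, start_run is an illustrative helper, and the identifiers and parameter values are placeholders. Only the keyword names and the runTimeoutInMinutes key come from the parameter list above.

```python
import uuid


class RecordingHook:
    """Hypothetical stand-in for SageMakerUnifiedStudioNotebookHook (makes no AWS calls)."""

    def start_notebook_run(self, **kwargs):
        # Echo the arguments back so the call shape can be inspected offline.
        return {"received": kwargs}


def start_run(hook, notebook_id, domain_id, project_id, parameters=None, timeout_minutes=60):
    """Start a run with an explicit idempotency token and a run timeout."""
    return hook.start_notebook_run(
        notebook_identifier=notebook_id,
        domain_identifier=domain_id,
        owning_project_identifier=project_id,
        client_token=str(uuid.uuid4()),  # explicit token instead of the auto-generated one
        notebook_parameters=parameters or {},
        timeout_configuration={"runTimeoutInMinutes": timeout_minutes},
    )


# Placeholder identifiers; real values come from the SageMaker Unified Studio domain.
response = start_run(
    RecordingHook(), "nb-example", "dzd-example", "prj-example", {"run_date": "2024-01-01"}
)
```

With the real hook, the same keyword arguments would be passed to hook.start_notebook_run; the shape of the returned response dict is defined by the StartNotebookRun API and is not shown here.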
- get_notebook_run(notebook_run_id, domain_identifier)
Get the status of a notebook run via the DataZone GetNotebookRun API.
- wait_for_notebook_run(notebook_run_id, domain_identifier, waiter_delay=10, timeout_configuration=None)
Poll GetNotebookRun until the run reaches a terminal state.
- Parameters:
notebook_run_id (str) – The ID of the notebook run to monitor.
domain_identifier (str) – The ID of the DataZone domain.
waiter_delay (int) – Interval in seconds to poll the notebook run status.
timeout_configuration (dict | None) – Timeout settings for the notebook execution. When provided, the maximum number of poll attempts is derived from runTimeoutInMinutes * 60 / waiter_delay. When omitted, the timeout defaults to 12 hours (TWELVE_HOURS_IN_MINUTES = 720).
- Returns:
A dict with Status and NotebookRunId on success.
- Raises:
RuntimeError – If the run fails or times out.
- Return type:
dict
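The polling budget described for timeout_configuration can be worked through as follows. This is a sketch under stated assumptions rather than the hook's code: max_poll_attempts and wait_until_terminal are hypothetical helpers, truncation to an integer is an assumed rounding rule, and the state names passed in are placeholders because the contents of the NOTEBOOK_RUN_*_STATES constants are not listed on this page.

```python
import time

TWELVE_HOURS_IN_MINUTES = 720  # mirrors the module-level constant documented above


def max_poll_attempts(timeout_configuration=None, waiter_delay=10):
    """Derive the attempt budget: runTimeoutInMinutes * 60 / waiter_delay."""
    config = timeout_configuration or {}
    minutes = config.get("runTimeoutInMinutes", TWELVE_HOURS_IN_MINUTES)
    return int(minutes * 60 / waiter_delay)  # assumption: truncate to a whole number


def wait_until_terminal(get_status, success_states, failure_states,
                        max_attempts, waiter_delay=10, sleep=time.sleep):
    """Poll get_status() until a terminal state is seen or the budget runs out."""
    for _ in range(max_attempts):
        status = get_status()
        if status in success_states:
            return status
        if status in failure_states:
            raise RuntimeError(f"Notebook run ended in state {status}")
        sleep(waiter_delay)
    raise RuntimeError("Notebook run did not finish within the polling budget")


# Default budget: 720 minutes at a 10-second delay.
default_attempts = max_poll_attempts()

# Placeholder state names, fed from an iterator instead of a real API call.
statuses = iter(["RUNNING", "RUNNING", "DONE"])
final_status = wait_until_terminal(
    lambda: next(statuses), {"DONE"}, {"ERROR"}, max_attempts=5, sleep=lambda _: None
)
```

Injecting get_status and sleep keeps the loop testable without AWS access. With the default 12-hour timeout and a 10-second delay, the budget works out to 4320 attempts.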
- get_project_s3_path(domain_identifier, project_id)
Look up the S3 location for a SageMaker Unified Studio project.
The bucket and key prefix are read from the s3BucketPath provisioned resource of the project's default ("Tooling") environment via the DataZone APIs. This mirrors how SageMaker Unified Studio resolves the project bucket and accommodates projects whose bucket name does not follow the amazon-sagemaker-{account_id}-{region}-{project_id} template (for example, BYOR-bucket projects).
- Parameters:
- Returns:
A (bucket, prefix) tuple. bucket is the S3 bucket name. prefix is the path component of the project's s3BucketPath (with no leading or trailing /).
- Raises:
RuntimeError – If the default tooling environment or the s3BucketPath provisioned resource cannot be found.
- Return type:
tuple[str, str]
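The (bucket, prefix) split can be illustrated with a small standalone helper. This sketch assumes the s3BucketPath value is an s3:// URI; split_s3_bucket_path is a hypothetical name, the sample paths are placeholders, and the DataZone lookup that produces the value is not reproduced here.

```python
from urllib.parse import urlparse


def split_s3_bucket_path(s3_bucket_path):
    """Split an s3:// URI into (bucket, prefix) with no leading or trailing '/'."""
    parsed = urlparse(s3_bucket_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {s3_bucket_path!r}")
    # netloc carries the bucket name; path carries the key prefix.
    return parsed.netloc, parsed.path.strip("/")


# Placeholder path, not a real project location.
bucket, prefix = split_s3_bucket_path("s3://example-bucket/dzd_example/prj_example/dev/")
```

A path with no key component, such as "s3://example-bucket", yields an empty prefix under this sketch.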
- get_notebook_outputs(notebook_identifier, notebook_run_id, domain_identifier, owning_project_identifier)
Read notebook output artifacts from the S3 project bucket.
After a notebook run completes, the SDK writes output variables as a JSON file to a well-known S3 location within the project bucket. This method reads that file and returns the parsed key-value pairs.
- Parameters:
- Returns:
A dict of notebook output key-value pairs. Returns an empty dict if no outputs were written or the file cannot be parsed.
- Return type:
dict
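The documented fallback (an empty dict when no outputs were written or the file cannot be parsed) can be sketched as below. parse_notebook_outputs is a hypothetical helper that operates on an already-downloaded payload. The S3 key layout and the download step are not shown because this page does not specify them, and treating a non-object JSON document as "no outputs" is an assumption.

```python
import json


def parse_notebook_outputs(raw):
    """Return the parsed key-value pairs, or {} if the payload is missing or invalid."""
    if not raw:
        return {}
    try:
        parsed = json.loads(raw)  # accepts str or bytes
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return {}
    # Assumption: anything other than a JSON object counts as "no outputs".
    return parsed if isinstance(parsed, dict) else {}


outputs = parse_notebook_outputs(b'{"row_count": 42, "status": "ok"}')
```

Returning an empty dict instead of raising lets a downstream task treat missing outputs as an ordinary case, which matches the behaviour described in the Returns entry.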