Amazon SageMaker Unified Studio¶
Amazon SageMaker Unified Studio is a unified development experience that brings together AWS data, analytics, artificial intelligence (AI), and machine learning (ML) services. It provides a place to build, deploy, execute, and monitor end-to-end workflows from a single interface. This helps drive collaboration across teams and facilitate agile development.
Airflow provides different operators for running artifacts in SageMaker Unified Studio. Read the descriptions below to understand which operator is best suited for your use case.
Prerequisite Tasks¶
To use these operators, you must do a few things:
Create a SageMaker Unified Studio domain and project, following the instructions in the AWS documentation.
If the domain is an IdC domain, navigate to the “Compute > Workflow environments” tab, and click “Create” to create a new MWAA environment.
Create a Jupyter notebook, querybook, Visual ETL job, or SageMaker Unified Studio notebook and save it to your project.
Operators¶
Run Jupyter notebooks, Querybooks, and Visual ETL jobs¶
Use SageMakerNotebookOperator
to execute Jupyter notebooks, querybooks, and visual ETL jobs. The operator relies on the sagemaker_studio
Python library to run these artifacts.
The artifact is identified by its relative file path within the project (e.g. test_notebook.ipynb).
# Run notebook using the legacy env-var-based resolution path (MWAA-style).
run_notebook = SageMakerNotebookOperator(
    task_id="run-notebook",
    input_config={"input_path": notebook_path, "input_params": {}},
    output_config={"output_formats": ["NOTEBOOK"]},  # optional
    compute={
        "instance_type": "ml.m5.large",
        "volume_size_in_gb": 30,
    },  # optional
    termination_condition={"max_runtime_in_seconds": 600},  # optional
    tags={},  # optional
    wait_for_completion=True,  # optional
    waiter_delay=5,  # optional
    deferrable=False,  # optional
    executor_config={  # optional
        "overrides": {
            "containerOverrides": [
                {
                    "environment": [
                        {"name": key, "value": value}
                        for key, value in mock_mwaa_environment_params.items()
                    ],
                    "name": "ECSExecutorContainer",  # Necessary parameter
                }
            ]
        }
    },
)
Run SageMaker Unified Studio notebooks¶
Use SageMakerUnifiedStudioNotebookOperator
to execute SageMaker Unified Studio notebooks through the DataZone StartNotebookRun API.
The notebook is identified by its notebook ID (e.g. nb-1234567890), along with the domain ID and project ID
where the notebook resides.
import time

client_token = f"idempotency-token-{int(time.time())}"

run_notebook = SageMakerUnifiedStudioNotebookOperator(
    task_id="notebook-task",
    # The notebook asset identifier from within the SageMaker Unified Studio domain
    notebook_identifier=notebook_id,
    domain_identifier=domain_id,
    owning_project_identifier=project_id,
    client_token=client_token,  # optional
    notebook_parameters={
        "param1": "value1",
        "param2": "value2",
    },  # optional
    compute_configuration={"instanceType": "sc.m5.large"},  # optional
    timeout_configuration={"runTimeoutInMinutes": 1440},  # optional
    wait_for_completion=True,  # optional
    waiter_delay=30,  # optional
    deferrable=False,  # optional
)
The following example passes the domain ID, project ID, and domain region explicitly as operator parameters.
# Run notebook with domain_id/project_id/domain_region passed explicitly as operator parameters.
# No environment variables needed: the SDK resolves the S3 path and region from these params.
# Requires sagemaker-studio>=1.0.25.
# NOTE: this task intentionally runs BEFORE any env vars are set, to show that explicit
# params work without any MWAA-style environment variables present.
run_notebook_explicit_params = SageMakerNotebookOperator(
    task_id="run-notebook-explicit",
    domain_id=domain_id,
    project_id=project_id,
    domain_region=region_name,
    input_config={"input_path": notebook_path, "input_params": {}},
    output_config={"output_formats": ["NOTEBOOK"]},  # optional
    compute={
        "instance_type": "ml.m5.large",
        "volume_size_in_gb": 30,
    },  # optional
    termination_condition={"max_runtime_in_seconds": 600},  # optional
    tags={},  # optional
    wait_for_completion=True,  # optional
    waiter_delay=5,  # optional
    deferrable=False,  # optional
)
Notebooks can produce output variables that are automatically pushed to XCom when the run completes.
Downstream tasks can consume these outputs via Jinja templating in notebook_parameters.
In this example, Notebook A produces outputs (e.g., name and age). Notebook B receives
those values as parameters using Jinja templates like
{{ task_instance.xcom_pull(task_ids='notebook-a-task', key='name') }}.
# Notebook A produces outputs (e.g., name, age) that are pushed to XCom.
# Notebook B consumes those outputs via Jinja templating in notebook_parameters.
run_notebook_a = SageMakerUnifiedStudioNotebookOperator(
    task_id="notebook-a-task",
    notebook_identifier=notebook_id,
    domain_identifier=domain_id,
    owning_project_identifier=project_id,
    wait_for_completion=True,
)

run_notebook_b = SageMakerUnifiedStudioNotebookOperator(
    task_id="notebook-b-task",
    notebook_identifier=notebook_b_id,
    domain_identifier=domain_id,
    owning_project_identifier=project_id,
    notebook_parameters={
        "employee_name": "{{ task_instance.xcom_pull(task_ids='notebook-a-task', key='NOTEBOOK_OUTPUT.name') }}",
        "employee_age": "{{ task_instance.xcom_pull(task_ids='notebook-a-task', key='NOTEBOOK_OUTPUT.age') }}",
    },
    wait_for_completion=True,
)

# Notebook B must run after Notebook A so the XCom values exist when its templates render.
run_notebook_a >> run_notebook_b
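The NOTEBOOK_OUTPUT.&lt;name&gt; key format is easy to mistype in a long Jinja string. A small helper can build the expression; this function is hypothetical (not part of the provider or the sagemaker_studio library) and only assumes the key format shown above:

```python
def notebook_output_template(task_id: str, output_name: str) -> str:
    """Build a Jinja expression that pulls one notebook output from XCom.

    Assumes outputs are pushed under keys of the form 'NOTEBOOK_OUTPUT.<name>',
    as in the chaining example above.
    """
    return (
        "{{ task_instance.xcom_pull(task_ids='%s', key='NOTEBOOK_OUTPUT.%s') }}"
        % (task_id, output_name)
    )


# Usage: the resulting strings can be passed directly as notebook_parameters.
params = {
    "employee_name": notebook_output_template("notebook-a-task", "name"),
    "employee_age": notebook_output_template("notebook-a-task", "age"),
}
```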