Amazon SageMaker Unified Studio¶
Amazon SageMaker Unified Studio is a unified development experience that brings together AWS data, analytics, artificial intelligence (AI), and machine learning (ML) services. It provides a place to build, deploy, execute, and monitor end-to-end workflows from a single interface. This helps drive collaboration across teams and facilitate agile development.
Airflow provides different operators for running artifacts in SageMaker Unified Studio. Read the descriptions below to understand which operator is best suited for your use case.
Prerequisite Tasks¶
To use these operators, you must do a few things:
Create a SageMaker Unified Studio domain and project, following the instructions in the AWS documentation.
If the domain is an IdC domain, navigate to the “Compute > Workflow environments” tab, and click “Create” to create a new MWAA environment.
Create a Jupyter notebook, querybook, Visual ETL job, or SageMaker Unified Studio notebook and save it to your project.
Operators¶
Run Jupyter notebooks, Querybooks, and Visual ETL jobs¶
Use SageMakerNotebookOperator
to execute Jupyter notebooks, querybooks, and visual ETL jobs. The operator relies on the sagemaker_studio
Python library to run these artifacts.
The artifact is identified by its relative file path within the project (e.g. test_notebook.ipynb).
# Run notebook using the legacy env-var-based resolution path (MWAA-style).
run_notebook = SageMakerNotebookOperator(
    task_id="run-notebook",
    input_config={"input_path": notebook_path, "input_params": {}},
    output_config={"output_formats": ["NOTEBOOK"]},  # optional
    compute={
        "instance_type": "ml.m5.large",
        "volume_size_in_gb": 30,
    },  # optional
    termination_condition={"max_runtime_in_seconds": 600},  # optional
    tags={},  # optional
    wait_for_completion=True,  # optional
    waiter_delay=5,  # optional
    deferrable=False,  # optional
    executor_config={  # optional
        "overrides": {
            "containerOverrides": [
                {
                    "environment": [
                        {"name": key, "value": value}
                        for key, value in mock_mwaa_environment_params.items()
                    ],
                    "name": "ECSExecutorContainer",  # Necessary parameter
                }
            ]
        }
    },
)
Run SageMaker Unified Studio notebooks¶
Use SageMakerUnifiedStudioNotebookOperator
to execute SageMaker Unified Studio notebooks through the DataZone StartNotebookRun API.
The notebook is identified by its notebook ID (e.g. nb-1234567890), along with the domain ID and project ID
where the notebook resides.
import time

client_token = f"idempotency-token-{int(time.time())}"

run_notebook = SageMakerUnifiedStudioNotebookOperator(
    task_id="notebook-task",
    # The notebook asset identifier from within the SageMaker Unified Studio domain
    notebook_identifier=notebook_id,
    domain_identifier=domain_id,
    owning_project_identifier=project_id,
    client_token=client_token,  # optional
    notebook_parameters={
        "param1": "value1",
        "param2": "value2",
    },  # optional
    compute_configuration={"instanceType": "sc.m5.large"},  # optional
    timeout_configuration={"runTimeoutInMinutes": 1440},  # optional
    wait_for_completion=True,  # optional
    waiter_delay=30,  # optional
    deferrable=False,  # optional
)
The following example passes the domain ID, project ID, and domain region explicitly as operator parameters.
# Run notebook with domain_id/project_id/domain_region passed explicitly as operator parameters.
# No environment variables needed: the SDK resolves the S3 path and region from these params.
# Requires sagemaker-studio>=1.0.25.
# NOTE: this task intentionally runs BEFORE any env vars are set, to show that explicit
# params work without any MWAA-style environment variables present.
run_notebook_explicit_params = SageMakerNotebookOperator(
    task_id="run-notebook-explicit",
    domain_id=domain_id,
    project_id=project_id,
    domain_region=region_name,
    input_config={"input_path": notebook_path, "input_params": {}},
    output_config={"output_formats": ["NOTEBOOK"]},  # optional
    compute={
        "instance_type": "ml.m5.large",
        "volume_size_in_gb": 30,
    },  # optional
    termination_condition={"max_runtime_in_seconds": 600},  # optional
    tags={},  # optional
    wait_for_completion=True,  # optional
    waiter_delay=5,  # optional
    deferrable=False,  # optional
)
Notebooks can produce output variables that are automatically pushed to XCom when the run completes.
Downstream tasks can consume these outputs via Jinja templating in notebook_parameters.
In this example, Notebook A produces outputs (e.g., name and age). Notebook B receives
those values as parameters using Jinja templates like
{{ task_instance.xcom_pull(task_ids='notebook-a-task', key='name') }}.
# Notebook A produces outputs (e.g., name, age) that are pushed to XCom.
# Notebook B consumes those outputs via Jinja templating in notebook_parameters.
run_notebook_a = SageMakerUnifiedStudioNotebookOperator(
    task_id="notebook-a-task",
    notebook_identifier=notebook_id,
    domain_identifier=domain_id,
    owning_project_identifier=project_id,
    wait_for_completion=True,
)

run_notebook_b = SageMakerUnifiedStudioNotebookOperator(
    task_id="notebook-b-task",
    notebook_identifier=notebook_b_id,
    domain_identifier=domain_id,
    owning_project_identifier=project_id,
    notebook_parameters={
        "employee_name": "{{ task_instance.xcom_pull(task_ids='notebook-a-task', key='NOTEBOOK_OUTPUT.name') }}",
        "employee_age": "{{ task_instance.xcom_pull(task_ids='notebook-a-task', key='NOTEBOOK_OUTPUT.age') }}",
    },
    wait_for_completion=True,
)

# Notebook B must run after Notebook A so the XCom values exist when its templates render.
run_notebook_a >> run_notebook_b
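The NOTEBOOK_OUTPUT.&lt;name&gt; key format is easy to mistype in a long Jinja string. A small helper can build the expression; this function is hypothetical (not part of the provider or the sagemaker_studio library) and only assumes the key format shown above:

```python
def notebook_output_template(task_id: str, output_name: str) -> str:
    """Build a Jinja expression that pulls one notebook output from XCom.

    Assumes outputs are pushed under keys of the form 'NOTEBOOK_OUTPUT.<name>',
    as in the chaining example above.
    """
    return (
        "{{ task_instance.xcom_pull(task_ids='%s', key='NOTEBOOK_OUTPUT.%s') }}"
        % (task_id, output_name)
    )


# Usage: the resulting strings can be passed directly as notebook_parameters.
params = {
    "employee_name": notebook_output_template("notebook-a-task", "name"),
    "employee_age": notebook_output_template("notebook-a-task", "age"),
}
```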