airflow.providers.amazon.aws.hooks.sagemaker_unified_studio

This module contains the Amazon SageMaker Unified Studio Notebook hook.

Classes

SageMakerNotebookHook

Interact with SageMaker Unified Studio Workflows for executing Jupyter notebooks, querybooks, and visual ETL jobs.

Module Contents

class airflow.providers.amazon.aws.hooks.sagemaker_unified_studio.SageMakerNotebookHook(execution_name, input_config=None, domain_id=None, project_id=None, output_config=None, domain_region=None, compute=None, termination_condition=None, tags=None, waiter_delay=10, waiter_max_attempts=1440, *args, **kwargs)

Bases: airflow.providers.common.compat.sdk.BaseHook

Interact with SageMaker Unified Studio Workflows for executing Jupyter notebooks, querybooks, and visual ETL jobs.

This hook provides a wrapper around the SageMaker Workflows Notebook Execution API.

Examples:
from airflow.providers.amazon.aws.hooks.sagemaker_unified_studio import SageMakerNotebookHook

notebook_hook = SageMakerNotebookHook(
    execution_name="notebook_execution",
    domain_id="dzd-example123456",
    project_id="example123456",
    input_config={"input_path": "path/to/notebook.ipynb", "input_params": {"param1": "value1"}},
    output_config={"output_uri": "folder/output/location/prefix", "output_formats": ["NOTEBOOK"]},
    domain_region="us-east-1",
    waiter_delay=10,
    waiter_max_attempts=1440,
)
Parameters:
  • execution_name (str) – The name of the notebook job to be executed; this is the same as task_id.

  • domain_id (str | None) – The domain ID for Amazon SageMaker Unified Studio. Optional; if not provided, the SDK attempts to resolve it from the environment.

  • project_id (str | None) – The project ID for Amazon SageMaker Unified Studio. Optional; if not provided, the SDK attempts to resolve it from the environment.

  • input_config (dict | None) – Configuration for the input file. Example: {'input_path': 'folder/input/notebook.ipynb', 'input_params': {'param1': 'value1'}}

  • output_config (dict | None) – Configuration for the output. It should include an output_formats key that specifies the output formats. Example: {'output_formats': ['NOTEBOOK']}

  • domain_region (str | None) – The AWS region for the domain. If not provided, the default AWS region will be used.

  • compute (dict | None) –

    Compute configuration to use for the notebook execution. Required if the execution runs on remote compute. Example:

    {
        "instance_type": "ml.c5.xlarge",
        "image_details": {
            "image_name": "sagemaker-distribution-prod",
            "image_version": "3",
            "ecr_uri": "123456123456.dkr.ecr.us-west-2.amazonaws.com/ImageName:latest",
        },
    }
    

  • termination_condition (dict | None) – Conditions that, when met, terminate the remote execution. Example: {"MaxRuntimeInSeconds": 3600}

  • tags (dict | None) – Tags to associate with the remote execution runs. Example: {"md_analytics": "logs"}

  • waiter_delay (int) – Interval, in seconds, between checks of the task execution status.

  • waiter_max_attempts (int) – Maximum number of status checks before the execution is reported as FAILED.
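
The two waiter parameters together bound how long the hook waits: with the defaults, 10 seconds × 1440 attempts comes to 14,400 seconds, or 4 hours. The sketch below illustrates that relationship with a generic fixed-interval polling loop. It is not the hook's actual implementation; max_wait_seconds, poll_until_done, and get_status are hypothetical names, and the status strings other than FAILED are assumed for illustration.

```python
import time
from typing import Callable


def max_wait_seconds(waiter_delay: int = 10, waiter_max_attempts: int = 1440) -> int:
    """Upper bound on total sleep time for a fixed-interval poller."""
    return waiter_delay * waiter_max_attempts


def poll_until_done(
    get_status: Callable[[], str],
    waiter_delay: float,
    waiter_max_attempts: int,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Check get_status every waiter_delay seconds; report FAILED once attempts run out."""
    for _ in range(waiter_max_attempts):
        sleep(waiter_delay)
        status = get_status()
        if status in ("COMPLETED", "FAILED"):  # assumed terminal states
            return status
    return "FAILED"  # attempts exhausted without reaching a terminal state


# Simulated execution that finishes on the third status check (no real sleeping).
statuses = iter(["IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])
final = poll_until_done(
    lambda: next(statuses),
    waiter_delay=0,
    waiter_max_attempts=5,
    sleep=lambda _: None,
)
print(max_wait_seconds())  # 14400
print(final)  # COMPLETED
```

For long-running notebooks, raising waiter_delay reduces the number of status calls, while waiter_max_attempts has to grow or shrink accordingly to keep the same overall time budget.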

execution_name
domain_id = None
project_id = None
domain_region = None
input_config
output_config
compute = None
termination_condition
tags
waiter_delay = 10
waiter_max_attempts = 1440
start_notebook_execution()
wait_for_execution_completion(execution_id, context)
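
This page does not document the return values of the two methods, so the following is only a sketch of how they might be chained. It assumes that start_notebook_execution() returns a mapping containing an "execution_id" key and that wait_for_execution_completion() returns a small result mapping; both shapes are assumptions. FakeNotebookHook and run_notebook are hypothetical helpers that let the snippet run without AWS access.

```python
from typing import Any


class FakeNotebookHook:
    """Hypothetical stand-in that mimics the two documented method names."""

    def start_notebook_execution(self) -> dict[str, Any]:
        # Assumption: the real hook returns a mapping that includes the execution ID.
        return {"execution_id": "exec-0001"}

    def wait_for_execution_completion(self, execution_id: str, context: dict) -> dict[str, Any]:
        # Assumption: the real hook polls until a terminal state; this fake succeeds at once.
        return {"Status": "COMPLETED", "ExecutionId": execution_id}


def run_notebook(hook: Any, context: dict) -> dict[str, Any]:
    """Start an execution, then block until the hook reports a result."""
    started = hook.start_notebook_execution()
    execution_id = started["execution_id"]  # assumed key, not documented on this page
    return hook.wait_for_execution_completion(execution_id, context)


result = run_notebook(FakeNotebookHook(), context={})
print(result)
```

In a real task, the hook argument would be a SageMakerNotebookHook instance and context would be the Airflow task context.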
