airflow.providers.amazon.aws.hooks.sagemaker
¶
Module Contents¶
Classes¶
Enum-style class holding all possible states of CloudWatch log streams. |
|
Interact with Amazon SageMaker. |
Functions¶
|
Return the index, i, in arr that minimizes f(arr[i]) |
|
Returns true if training job's secondary status message has changed. |
|
Returns a string contains start time and the secondary training job status message. |
Attributes¶
- class airflow.providers.amazon.aws.hooks.sagemaker.LogState[source]¶
Enum-style class holding all possible states of CloudWatch log streams. https://sagemaker.readthedocs.io/en/stable/session.html#sagemaker.session.LogState
- airflow.providers.amazon.aws.hooks.sagemaker.argmin(arr, f)[source]¶
Return the index, i, in arr that minimizes f(arr[i])
- airflow.providers.amazon.aws.hooks.sagemaker.secondary_training_status_changed(current_job_description, prev_job_description)[source]¶
Returns true if training job's secondary status message has changed.
- Parameters
- Returns
Whether the secondary status message of a training job changed or not.
- Return type
- airflow.providers.amazon.aws.hooks.sagemaker.secondary_training_status_message(job_description, prev_description)[source]¶
Returns a string contains start time and the secondary training job status message.
- class airflow.providers.amazon.aws.hooks.sagemaker.SageMakerHook(*args, **kwargs)[source]¶
Bases:
airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook
Interact with Amazon SageMaker.
Additional arguments (such as
aws_conn_id
) may be specified and are passed down to the underlying AwsBaseHook.See also
- tar_and_s3_upload(self, path, key, bucket)[source]¶
Tar the local file or directory and upload to s3
- configure_s3_resources(self, config)[source]¶
Extract the S3 operations from the configuration and execute them.
- check_training_config(self, training_config)[source]¶
Check if a training configuration is valid
- Parameters
training_config (dict) -- training_config
- Returns
None
- Return type
None
- check_tuning_config(self, tuning_config)[source]¶
Check if a tuning configuration is valid
- Parameters
tuning_config (dict) -- tuning_config
- Returns
None
- Return type
None
- get_log_conn(self)[source]¶
This method is deprecated. Please use
airflow.providers.amazon.aws.hooks.logs.AwsLogsHook.get_conn()
instead.
- log_stream(self, log_group, stream_name, start_time=0, skip=0)[source]¶
This method is deprecated. Please use
airflow.providers.amazon.aws.hooks.logs.AwsLogsHook.get_log_events()
instead.
- multi_stream_iter(self, log_group, streams, positions=None)[source]¶
Iterate over the available events coming from a set of log streams in a single log group interleaving the events from each stream so they're yielded in timestamp order.
- Parameters
- Returns
A tuple of (stream number, cloudwatch log event).
- Return type
Generator
- create_training_job(self, config, wait_for_completion=True, print_log=True, check_interval=30, max_ingestion_time=None)[source]¶
Create a training job
- Parameters
config (dict) -- the config for training
wait_for_completion (bool) -- if the program should keep running until job finishes
check_interval (int) -- the time interval in seconds which the operator will check the status of any SageMaker job
max_ingestion_time (Optional[int]) -- the maximum ingestion time in seconds. Any SageMaker jobs that run longer than this will fail. Setting this to None implies no timeout for any SageMaker job.
- Returns
A response to training job creation
- create_tuning_job(self, config, wait_for_completion=True, check_interval=30, max_ingestion_time=None)[source]¶
Create a tuning job
- Parameters
config (dict) -- the config for tuning
wait_for_completion (bool) -- if the program should keep running until job finishes
check_interval (int) -- the time interval in seconds which the operator will check the status of any SageMaker job
max_ingestion_time (Optional[int]) -- the maximum ingestion time in seconds. Any SageMaker jobs that run longer than this will fail. Setting this to None implies no timeout for any SageMaker job.
- Returns
A response to tuning job creation
- create_transform_job(self, config, wait_for_completion=True, check_interval=30, max_ingestion_time=None)[source]¶
Create a transform job
- Parameters
config (dict) -- the config for transform job
wait_for_completion (bool) -- if the program should keep running until job finishes
check_interval (int) -- the time interval in seconds which the operator will check the status of any SageMaker job
max_ingestion_time (Optional[int]) -- the maximum ingestion time in seconds. Any SageMaker jobs that run longer than this will fail. Setting this to None implies no timeout for any SageMaker job.
- Returns
A response to transform job creation
- create_processing_job(self, config, wait_for_completion=True, check_interval=30, max_ingestion_time=None)[source]¶
Create a processing job
- Parameters
config (dict) -- the config for processing job
wait_for_completion (bool) -- if the program should keep running until job finishes
check_interval (int) -- the time interval in seconds which the operator will check the status of any SageMaker job
max_ingestion_time (Optional[int]) -- the maximum ingestion time in seconds. Any SageMaker jobs that run longer than this will fail. Setting this to None implies no timeout for any SageMaker job.
- Returns
A response to transform job creation
- create_model(self, config)[source]¶
Create a model job
- Parameters
config (dict) -- the config for model
- Returns
A response to model creation
- create_endpoint_config(self, config)[source]¶
Create an endpoint config
- Parameters
config (dict) -- the config for endpoint-config
- Returns
A response to endpoint config creation
- create_endpoint(self, config, wait_for_completion=True, check_interval=30, max_ingestion_time=None)[source]¶
Create an endpoint
- Parameters
config (dict) -- the config for endpoint
wait_for_completion (bool) -- if the program should keep running until job finishes
check_interval (int) -- the time interval in seconds which the operator will check the status of any SageMaker job
max_ingestion_time (Optional[int]) -- the maximum ingestion time in seconds. Any SageMaker jobs that run longer than this will fail. Setting this to None implies no timeout for any SageMaker job.
- Returns
A response to endpoint creation
- update_endpoint(self, config, wait_for_completion=True, check_interval=30, max_ingestion_time=None)[source]¶
Update an endpoint
- Parameters
config (dict) -- the config for endpoint
wait_for_completion (bool) -- if the program should keep running until job finishes
check_interval (int) -- the time interval in seconds which the operator will check the status of any SageMaker job
max_ingestion_time (Optional[int]) -- the maximum ingestion time in seconds. Any SageMaker jobs that run longer than this will fail. Setting this to None implies no timeout for any SageMaker job.
- Returns
A response to endpoint update
- describe_training_job(self, name)[source]¶
Return the training job info associated with the name
- Parameters
name (str) -- the name of the training job
- Returns
A dict contains all the training job info
- describe_training_job_with_log(self, job_name, positions, stream_names, instance_count, state, last_description, last_describe_job_call)[source]¶
Return the training job info associated with job_name and print CloudWatch logs
- describe_processing_job(self, name)[source]¶
Return the processing job info associated with the name
- describe_endpoint_config(self, name)[source]¶
Return the endpoint config info associated with the name
- check_status(self, job_name, key, describe_function, check_interval, max_ingestion_time=None, non_terminal_states=None)[source]¶
Check status of a SageMaker job
- Parameters
job_name (str) -- name of the job to check status
key (str) -- the key of the response dict that points to the state
describe_function (Callable) -- the function used to retrieve the status
args -- the arguments for the function
check_interval (int) -- the time interval in seconds which the operator will check the status of any SageMaker job
max_ingestion_time (Optional[int]) -- the maximum ingestion time in seconds. Any SageMaker jobs that run longer than this will fail. Setting this to None implies no timeout for any SageMaker job.
non_terminal_states (Optional[Set]) -- the set of nonterminal states
- Returns
response of describe call after job is done
- check_training_status_with_log(self, job_name, non_terminal_states, failed_states, wait_for_completion, check_interval, max_ingestion_time=None)[source]¶
Display the logs for a given training job, optionally tailing them until the job is complete.
- Parameters
job_name (str) -- name of the training job to check status and display logs for
non_terminal_states (set) -- the set of non_terminal states
failed_states (set) -- the set of failed states
wait_for_completion (bool) -- Whether to keep looking for new log entries until the job completes
check_interval (int) -- The interval in seconds between polling for new log entries and job completion
max_ingestion_time (Optional[int]) -- the maximum ingestion time in seconds. Any SageMaker jobs that run longer than this will fail. Setting this to None implies no timeout for any SageMaker job.
- Returns
None
- list_training_jobs(self, name_contains=None, max_results=None, **kwargs)[source]¶
This method wraps boto3's list_training_jobs. The training job name and max results are configurable via arguments. Other arguments are not, and should be provided via kwargs. Note boto3 expects these in CamelCase format, for example:
list_training_jobs(name_contains="myjob", StatusEquals="Failed")
- Parameters
- Returns
results of the list_training_jobs request
- Return type
List[Dict]
- list_processing_jobs(self, **kwargs)[source]¶
This method wraps boto3's list_processing_jobs. All arguments should be provided via kwargs. Note boto3 expects these in CamelCase format, for example:
list_processing_jobs(NameContains="myjob", StatusEquals="Failed")
- Parameters
kwargs -- (optional) kwargs to boto3's list_training_jobs method
- Returns
results of the list_processing_jobs request
- Return type
List[Dict]