airflow.contrib.hooks.sagemaker_hook
Module Contents
- 
airflow.contrib.hooks.sagemaker_hook.argmin(arr, f)
- Return the index i in arr that minimizes f(arr[i]).
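The contract described above can be illustrated with a short sketch. This is not the module's actual source; returning None for an empty arr and keeping the first index on ties are assumptions made for the example.

```python
def argmin(arr, f):
    """Return the index i in arr that minimizes f(arr[i]).

    Assumption for this sketch: an empty arr yields None, and the
    first index wins when several items share the minimum value.
    """
    best_index = None
    best_value = None
    for i, item in enumerate(arr):
        value = f(item)
        if best_value is None or value < best_value:
            best_index, best_value = i, value
    return best_index


# Example: find the shortest string.
shortest = argmin(['aaa', 'b', 'cc'], len)
```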
- 
airflow.contrib.hooks.sagemaker_hook.secondary_training_status_changed(current_job_description, prev_job_description)
- Return True if the training job's secondary status message has changed.
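A comparison of this kind could look like the sketch below. It assumes the job descriptions are dicts shaped like the SageMaker DescribeTrainingJob response, where 'SecondaryStatusTransitions' is a list of transitions that may each carry a 'StatusMessage'. The function name status_message_changed is hypothetical and the logic is an illustration, not the module's source.

```python
def status_message_changed(current_description, prev_description):
    """Compare the latest secondary status message of two job descriptions."""
    transitions = current_description.get('SecondaryStatusTransitions') or []
    if not transitions:
        # Nothing has been reported yet, so nothing can have changed.
        return False

    prev_transitions = (prev_description or {}).get('SecondaryStatusTransitions') or []
    prev_message = prev_transitions[-1].get('StatusMessage') if prev_transitions else None

    return transitions[-1].get('StatusMessage') != prev_message
```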
- 
airflow.contrib.hooks.sagemaker_hook.secondary_training_status_message(job_description, prev_description)
- Return a string containing the start time and the secondary status message of the training job.
- 
class airflow.contrib.hooks.sagemaker_hook.SageMakerHook(*args, **kwargs)
- Bases: airflow.contrib.hooks.aws_hook.AwsHook
- Interact with Amazon SageMaker.
 - 
tar_and_s3_upload(self, path, key, bucket)
- Tar the local file or directory and upload it to S3.
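The tar step can be sketched with the standard library alone. The helper name tar_to_temp_file is hypothetical, and the choice of gzip compression and of archive member names is an assumption; the upload step is shown only as a comment because it needs AWS credentials.

```python
import os
import tarfile
import tempfile


def tar_to_temp_file(path):
    """Pack a local file or directory into a gzip-compressed tar archive.

    Returns an open temporary file positioned at the start of the archive.
    """
    temp_file = tempfile.TemporaryFile()
    with tarfile.open(mode='w:gz', fileobj=temp_file) as archive:
        if os.path.isdir(path):
            # Add each top-level entry under its own name (assumed layout).
            for name in sorted(os.listdir(path)):
                archive.add(os.path.join(path, name), arcname=name)
        else:
            archive.add(path, arcname=os.path.basename(path))
    temp_file.seek(0)
    return temp_file


# The upload could then be done with boto3, for example:
#   boto3.client('s3').upload_fileobj(temp_file, bucket, key)
```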
 - 
configure_s3_resources(self, config)
- Extract the S3 operations from the configuration and execute them. 
 - 
check_training_config(self, training_config)
- Check whether a training configuration is valid.
- Parameters
- training_config (dict) – the training configuration to check
- Returns
- None
 
 - 
check_tuning_config(self, tuning_config)
- Check whether a tuning configuration is valid.
- Parameters
- tuning_config (dict) – the tuning configuration to check
- Returns
- None
 
 - 
get_log_conn(self)
- Establish an AWS connection for retrieving logs during training.
- Return type
 
 - 
log_stream(self, log_group, stream_name, start_time=0, skip=0)
- A generator for log items in a single stream. It yields all the items that are available at the current moment.
- Parameters
- log_group (str) – The name of the log group. 
- stream_name (str) – The name of the specific stream. 
- start_time (int) – The timestamp from which to start reading the logs (default: 0).
- skip (int) – The number of log entries to skip at the start (default: 0). This handles the case where multiple entries share the same timestamp.
 
- Return type
- Returns
- A CloudWatch log event with the following key-value pairs:
- 'timestamp' (int): The time of the event, in milliseconds.
- 'message' (str): The log event data.
- 'ingestionTime' (int): The time the event was ingested, in milliseconds.
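The paging and skip behaviour described above can be sketched as a standalone generator. It assumes a client object exposing get_log_events with the keyword arguments and response keys of the boto3 CloudWatch Logs client ('events', 'nextForwardToken'). FakeLogsClient is a hypothetical stand-in so the example runs without AWS access, and the loop is an illustration rather than the hook's source.

```python
def log_stream(client, log_group, stream_name, start_time=0, skip=0):
    """Yield the log events currently available in a single stream."""
    next_token = None
    while True:
        kwargs = dict(logGroupName=log_group, logStreamName=stream_name,
                      startTime=start_time, startFromHead=True)
        if next_token is not None:
            kwargs['nextToken'] = next_token
        response = client.get_log_events(**kwargs)
        events = response['events']
        if not events:
            return  # nothing more is available right now
        next_token = response['nextForwardToken']
        if skip >= len(events):
            # The whole page was already seen by the caller.
            skip -= len(events)
            continue
        for event in events[skip:]:
            yield event
        skip = 0


class FakeLogsClient:
    """Hypothetical in-memory client that serves fixed pages of events."""

    def __init__(self, pages):
        self.pages = pages

    def get_log_events(self, **kwargs):
        index = int(kwargs.get('nextToken', 0))
        events = self.pages[index] if index < len(self.pages) else []
        return {'events': events, 'nextForwardToken': str(index + 1)}


pages = [
    [{'timestamp': 1, 'message': 'a'}, {'timestamp': 1, 'message': 'b'}],
    [{'timestamp': 2, 'message': 'c'}],
]
messages = [e['message'] for e in log_stream(FakeLogsClient(pages), 'group', 'stream', skip=1)]
```

With skip=1 the first event of the first page is dropped, which is how a caller can avoid re-reading an entry it already saw at the same timestamp.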
 
 - 
multi_stream_iter(self, log_group, streams, positions=None)
- Iterate over the available events from a set of log streams in a single log group, interleaving the events from each stream so that they are yielded in timestamp order.
- Parameters
- Returns
- A tuple of (stream number, cloudwatch log event). 
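Interleaving by timestamp can be illustrated with heapq.merge from the standard library. This swaps in a generic merge technique and is not necessarily how the hook does it. The helper names are hypothetical, and the sketch assumes each input stream is already sorted by 'timestamp'.

```python
import heapq


def _tag(index, stream):
    """Pair every event with the number of the stream it came from."""
    for event in stream:
        yield index, event


def interleave_by_timestamp(streams):
    """Yield (stream number, event) tuples in timestamp order.

    heapq.merge is stable, so events with equal timestamps come out
    in stream order.
    """
    tagged = [_tag(index, stream) for index, stream in enumerate(streams)]
    return heapq.merge(*tagged, key=lambda pair: pair[1]['timestamp'])


stream_a = [{'timestamp': 1, 'message': 'a1'}, {'timestamp': 3, 'message': 'a3'}]
stream_b = [{'timestamp': 2, 'message': 'b2'}, {'timestamp': 3, 'message': 'b3'}]
merged = [(i, e['message']) for i, e in interleave_by_timestamp([stream_a, stream_b])]
```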
 
 - 
create_training_job(self, config, wait_for_completion=True, print_log=True, check_interval=30, max_ingestion_time=None)
- Create a training job.
- Parameters
- config (dict) – the config for training
- wait_for_completion (bool) – whether the program should keep running until the job finishes
- check_interval (int) – the interval, in seconds, at which the operator checks the status of the SageMaker job
- max_ingestion_time (int) – the maximum ingestion time in seconds. Any SageMaker job that runs longer than this will fail. Setting this to None implies no timeout.
 
- Returns
- A response to training job creation 
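As an illustration of what the config dict might contain, the sketch below builds a minimal dict using the field names of the SageMaker CreateTrainingJob request. The builder name minimal_training_config, the channel name, the instance type, and the limits are placeholder assumptions, not values taken from this module. The hook call is left as a comment because it needs Airflow and an AWS connection.

```python
def minimal_training_config(job_name, image, role_arn, input_s3_uri, output_s3_uri):
    """Build a small training config; all concrete values are placeholders."""
    return {
        'TrainingJobName': job_name,
        'AlgorithmSpecification': {
            'TrainingImage': image,
            'TrainingInputMode': 'File',
        },
        'RoleArn': role_arn,
        'InputDataConfig': [{
            'ChannelName': 'train',  # placeholder channel name
            'DataSource': {'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': input_s3_uri,
                'S3DataDistributionType': 'FullyReplicated',
            }},
        }],
        'OutputDataConfig': {'S3OutputPath': output_s3_uri},
        'ResourceConfig': {
            'InstanceCount': 1,
            'InstanceType': 'ml.m5.large',  # placeholder instance type
            'VolumeSizeInGB': 10,
        },
        'StoppingCondition': {'MaxRuntimeInSeconds': 3600},
    }


# Hypothetical usage (requires Airflow and a configured AWS connection):
#   from airflow.contrib.hooks.sagemaker_hook import SageMakerHook
#   hook = SageMakerHook(aws_conn_id='aws_default')
#   response = hook.create_training_job(config, wait_for_completion=True,
#                                       check_interval=30)
```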
 
 - 
create_tuning_job(self, config, wait_for_completion=True, check_interval=30, max_ingestion_time=None)
- Create a tuning job.
- Parameters
- config (dict) – the config for tuning
- wait_for_completion (bool) – whether the program should keep running until the job finishes
- check_interval (int) – the interval, in seconds, at which the operator checks the status of the SageMaker job
- max_ingestion_time (int) – the maximum ingestion time in seconds. Any SageMaker job that runs longer than this will fail. Setting this to None implies no timeout.
 
- Returns
- A response to tuning job creation 
 
 - 
create_transform_job(self, config, wait_for_completion=True, check_interval=30, max_ingestion_time=None)
- Create a transform job.
- Parameters
- config (dict) – the config for the transform job
- wait_for_completion (bool) – whether the program should keep running until the job finishes
- check_interval (int) – the interval, in seconds, at which the operator checks the status of the SageMaker job
- max_ingestion_time (int) – the maximum ingestion time in seconds. Any SageMaker job that runs longer than this will fail. Setting this to None implies no timeout.
 
- Returns
- A response to transform job creation 
 
 - 
create_model(self, config)
- Create a model.
- Parameters
- config (dict) – the config for the model
- Returns
- A response to model creation 
 
 - 
create_endpoint_config(self, config)
- Create an endpoint config.
- Parameters
- config (dict) – the config for the endpoint config
- Returns
- A response to endpoint config creation 
 
 - 
create_endpoint(self, config, wait_for_completion=True, check_interval=30, max_ingestion_time=None)
- Create an endpoint.
- Parameters
- config (dict) – the config for the endpoint
- wait_for_completion (bool) – whether the program should keep running until the job finishes
- check_interval (int) – the interval, in seconds, at which the operator checks the status of the SageMaker job
- max_ingestion_time (int) – the maximum ingestion time in seconds. Any SageMaker job that runs longer than this will fail. Setting this to None implies no timeout.
 
- Returns
- A response to endpoint creation 
 
 - 
update_endpoint(self, config, wait_for_completion=True, check_interval=30, max_ingestion_time=None)
- Update an endpoint.
- Parameters
- config (dict) – the config for the endpoint
- wait_for_completion (bool) – whether the program should keep running until the job finishes
- check_interval (int) – the interval, in seconds, at which the operator checks the status of the SageMaker job
- max_ingestion_time (int) – the maximum ingestion time in seconds. Any SageMaker job that runs longer than this will fail. Setting this to None implies no timeout.
 
- Returns
- A response to endpoint update 
 
 - 
describe_training_job(self, name)
- Return the training job info associated with the name.
- Parameters
- name (str) – the name of the training job
- Returns
- A dict containing all the training job info
 
 - 
describe_training_job_with_log(self, job_name, positions, stream_names, instance_count, state, last_description, last_describe_job_call)
- Return the training job info associated with job_name and print CloudWatch logs.
 - 
describe_tuning_job(self, name)
- Return the tuning job info associated with the name.
- Parameters
- name (str) – the name of the tuning job
- Returns
- A dict containing all the tuning job info
 
 - 
describe_model(self, name)
- Return the SageMaker model info associated with the name.
- Parameters
- name (str) – the name of the SageMaker model
- Returns
- A dict containing all the model info
 
 - 
describe_transform_job(self, name)
- Return the transform job info associated with the name.
- Parameters
- name (str) – the name of the transform job
- Returns
- A dict containing all the transform job info
 
 - 
describe_endpoint_config(self, name)
- Return the endpoint config info associated with the name.
- Parameters
- name (str) – the name of the endpoint config
- Returns
- A dict containing all the endpoint config info
 
 - 
describe_endpoint(self, name)
- Return the endpoint info associated with the name.
- Parameters
- name (str) – the name of the endpoint
- Returns
- A dict containing all the endpoint info
 
 - 
check_status(self, job_name, key, describe_function, check_interval, max_ingestion_time, non_terminal_states=None)
- Check the status of a SageMaker job.
- Parameters
- job_name (str) – the name of the job whose status is checked
- key (str) – the key of the response dict that points to the state
- describe_function (python callable) – the function used to retrieve the status
- args – the arguments for the function
- check_interval (int) – the interval, in seconds, at which the operator checks the status of the SageMaker job
- max_ingestion_time (int) – the maximum ingestion time in seconds. Any SageMaker job that runs longer than this will fail. Setting this to None implies no timeout.
- non_terminal_states (set) – the set of non-terminal states
 
- Returns
- response of describe call after job is done 
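The polling pattern these parameters describe can be sketched as follows. The function name poll_until_terminal, the default non-terminal states, the injectable sleep argument, and the use of TimeoutError are all assumptions for illustration. The sketch also does not inspect failed states, which the actual hook may handle differently.

```python
import time


def poll_until_terminal(job_name, key, describe_function, check_interval,
                        max_ingestion_time=None, non_terminal_states=None,
                        sleep=time.sleep):
    """Call describe_function until response[key] leaves the non-terminal set."""
    if non_terminal_states is None:
        non_terminal_states = {'InProgress', 'Stopping'}  # assumed defaults

    elapsed = 0
    while True:
        response = describe_function(job_name)
        if response[key] not in non_terminal_states:
            return response
        if max_ingestion_time is not None and elapsed >= max_ingestion_time:
            raise TimeoutError(
                'job %s still running after %s seconds' % (job_name, elapsed))
        sleep(check_interval)
        elapsed += check_interval


# Example with a fake describe function and no real waiting.
_statuses = iter(['InProgress', 'InProgress', 'Completed'])
final = poll_until_terminal(
    'job-1', 'TrainingJobStatus',
    lambda name: {'TrainingJobStatus': next(_statuses)},
    check_interval=30, sleep=lambda seconds: None)
```

Passing sleep as an argument keeps the loop testable without waiting for real time to pass.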
 
 - 
check_training_status_with_log(self, job_name, non_terminal_states, failed_states, wait_for_completion, check_interval, max_ingestion_time)
- Display the logs for a given training job, optionally tailing them until the job is complete.
- Parameters
- job_name (str) – the name of the training job to check the status of and display logs for
- non_terminal_states (set) – the set of non-terminal states
- failed_states (set) – the set of failed states
- wait_for_completion (bool) – whether to keep looking for new log entries until the job completes
- check_interval (int) – the interval, in seconds, between polls for new log entries and job completion
- max_ingestion_time (int) – the maximum ingestion time in seconds. Any SageMaker job that runs longer than this will fail. Setting this to None implies no timeout.
 
- Returns
- None 
 
 
-