Amazon Bedrock¶
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
Prerequisite Tasks¶
To use these operators, you must do a few things:
Create necessary resources using AWS Console or AWS CLI.
Install API libraries via pip.
pip install 'apache-airflow[amazon]'
Detailed information is available in Installation of Apache Airflow®.
Generic Parameters¶
- aws_conn_id
Reference to Amazon Web Services Connection ID. If this parameter is set to None then the default boto3 behaviour is used without a connection lookup. Otherwise use the credentials stored in the Connection. Default: aws_default
- region_name
AWS Region Name. If this parameter is set to None or omitted then region_name from AWS Connection Extra Parameter will be used. Otherwise use the specified value instead of the connection value. Default: None
- verify
Whether or not to verify SSL certificates.
False - Do not validate SSL certificates.
path/to/cert/bundle.pem - A filename of the CA cert bundle to use. You can specify this argument if you want to use a different CA cert bundle than the one used by botocore.
If this parameter is set to None or is omitted then verify from AWS Connection Extra Parameter will be used. Otherwise use the specified value instead of the connection value. Default: None
- botocore_config
The provided dictionary is used to construct a botocore.config.Config. This configuration can be used, for example, to avoid throttling exceptions or to set timeouts.
{
    "signature_version": "unsigned",
    "s3": {
        "us_east_1_regional_endpoint": True,
    },
    "retries": {
        "mode": "standard",
        "max_attempts": 10,
    },
    "connect_timeout": 300,
    "read_timeout": 300,
    "tcp_keepalive": True,
}
If this parameter is set to None or omitted then config_kwargs from AWS Connection Extra Parameter will be used. Otherwise use the specified value instead of the connection value. Default: None
Note
Specifying an empty dictionary, {}, will overwrite the connection configuration for botocore.config.Config.
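The fallback rule shared by region_name, verify, and botocore_config can be illustrated with a small sketch. This is not the provider's actual implementation; `resolve_parameter` and the `connection_extra` dictionary are hypothetical names used only to show the documented precedence (aws_conn_id behaves differently and is not covered here).

```python
from typing import Any, Optional


def resolve_parameter(operator_value: Optional[Any], connection_value: Optional[Any]) -> Optional[Any]:
    """Hypothetical helper: prefer the operator-level value unless it is None."""
    # Compare against None explicitly. Falsy values such as {} or False are
    # deliberate settings and must take precedence over the connection value.
    if operator_value is None:
        return connection_value
    return operator_value


# Assumed contents of the AWS Connection Extra field, for illustration only.
connection_extra = {
    "region_name": "eu-west-1",
    "config_kwargs": {"retries": {"mode": "standard", "max_attempts": 10}},
}

# region_name omitted on the operator: the connection value is used.
region = resolve_parameter(None, connection_extra["region_name"])

# botocore_config set to {}: the empty dictionary replaces the connection config.
config = resolve_parameter({}, connection_extra["config_kwargs"])

# verify set to False: the explicit value is used even though it is falsy.
verify = resolve_parameter(False, None)
```

The explicit `is None` check is the point of the sketch: a truthiness test would silently discard `{}` and `False`, which the text above treats as meaningful overrides.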
Operators¶
Invoke an existing Amazon Bedrock Model¶
To invoke an existing Amazon Bedrock model, you can use BedrockInvokeModelOperator.
Note that every model family has different input and output formats. For example, to invoke a Meta Llama model you would use:
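from airflow.providers.amazon.aws.operators.bedrock import BedrockInvokeModelOperator

# LLAMA_MODEL_ID and PROMPT are defined elsewhere in the DAG file.
invoke_llama_model = BedrockInvokeModelOperator(
    task_id="invoke_llama",
    model_id=LLAMA_MODEL_ID,
    input_data={"prompt": PROMPT},
)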
To invoke an Amazon Titan model you would use:
invoke_titan_model = BedrockInvokeModelOperator(
    task_id="invoke_titan",
    model_id=TITAN_MODEL_ID,
    input_data={"inputText": PROMPT},
)
For details on the different formats, see Inference parameters for foundation models.
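Because the request body differs per model family, a DAG that targets more than one model may find it convenient to select the prompt key in one place. The following is a minimal sketch under stated assumptions: `build_input_data` and `PROMPT_KEY_BY_PREFIX` are hypothetical, the prefix matching is an illustrative convention, and only the two keys shown in the examples above ("prompt" and "inputText") are covered.

```python
# Assumed mapping from a model-ID prefix to the key that carries the prompt.
# Only the two families shown above are listed; other families use other
# request formats and would need their own entries.
PROMPT_KEY_BY_PREFIX = {
    "meta.llama": "prompt",
    "amazon.titan-text": "inputText",
}


def build_input_data(model_id: str, prompt: str) -> dict:
    """Hypothetical helper: return a minimal input_data dict for a model ID."""
    for prefix, key in PROMPT_KEY_BY_PREFIX.items():
        if model_id.startswith(prefix):
            return {key: prompt}
    # Fail loudly rather than guess the format for an unknown family.
    raise ValueError(f"No known input format for model {model_id!r}")
```

The returned dictionary could then be passed as `input_data` to BedrockInvokeModelOperator. Real requests usually carry additional inference parameters, which this sketch leaves out.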
Customize an existing Amazon Bedrock Model¶
To create a fine-tuning job to customize a base model, you can use BedrockCustomizeModelOperator.
Model-customization jobs are asynchronous and the completion time depends on the base model and the training/validation data size. To monitor the state of the job, you can use the "model_customization_job_complete" Waiter, the BedrockCustomizeModelCompletedSensor Sensor, or the BedrockCustomizeModelCompletedTrigger Trigger.
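from airflow.providers.amazon.aws.operators.bedrock import BedrockCustomizeModelOperator

# The job name, model name, role ARN, hyperparameters, and S3 locations
# are defined elsewhere in the DAG file.
customize_model = BedrockCustomizeModelOperator(
    task_id="customize_model",
    job_name=custom_model_job_name,
    custom_model_name=custom_model_name,
    role_arn=test_context[ROLE_ARN_KEY],
    base_model_id=f"arn:aws:bedrock:us-east-1::foundation-model/{TITAN_MODEL_ID}",
    hyperparameters=HYPERPARAMETERS,
    training_data_uri=training_data_uri,
    output_data_uri=f"s3://{bucket_name}/myOutputData",
)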
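The base_model_id above hard-codes the Region inside the ARN. A small helper can keep the Region configurable. This is only a sketch: `foundation_model_arn` is a hypothetical name, and the ARN layout is copied from the example above, not taken from a formal specification.

```python
def foundation_model_arn(model_id: str, region: str = "us-east-1") -> str:
    """Hypothetical helper: build a foundation-model ARN like the one above."""
    # As in the example above, the account field between the two colons
    # is left empty.
    return f"arn:aws:bedrock:{region}::foundation-model/{model_id}"


# "example-model-id" is a placeholder, not a real model identifier.
arn = foundation_model_arn("example-model-id", region="us-west-2")
```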
Sensors¶
Wait for an Amazon Bedrock customize model job¶
To wait on the state of an Amazon Bedrock customize model job until it reaches a terminal state, you can use BedrockCustomizeModelCompletedSensor.
from airflow.providers.amazon.aws.sensors.bedrock import BedrockCustomizeModelCompletedSensor

await_custom_model_job = BedrockCustomizeModelCompletedSensor(
    task_id="await_custom_model_job",
    job_name=custom_model_job_name,
)
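Conceptually, waiting for a terminal state is a polling loop. The sketch below illustrates that idea outside of Airflow and is not the sensor's implementation. `wait_for_terminal_state`, `TERMINAL_STATES`, and the `get_status` callable are hypothetical, and the status strings are assumptions that should be checked against the Bedrock API documentation. In practice `get_status` might wrap a boto3 call that returns the job status.

```python
import time
from typing import Callable

# Assumed terminal status values for a model-customization job.
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_terminal_state(
    get_status: Callable[[], str],
    poke_interval: float = 60.0,
    max_attempts: int = 10,
) -> str:
    """Hypothetical helper: poll get_status until a terminal state is returned."""
    for attempt in range(1, max_attempts + 1):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        # Sleep between polls, but not after the final attempt.
        if attempt < max_attempts:
            time.sleep(poke_interval)
    raise TimeoutError(f"Job not in a terminal state after {max_attempts} attempts")


# Usage with a fake status source standing in for the real service.
fake_statuses = iter(["InProgress", "InProgress", "Completed"])
final_status = wait_for_terminal_state(lambda: next(fake_statuses), poke_interval=0.0)
print(final_status)  # prints "Completed"
```

A terminal state is not necessarily a successful one, so a caller would still need to distinguish "Completed" from "Failed" or "Stopped" before using the custom model.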