Google Cloud Life Sciences Operators¶
Google Cloud Life Sciences is a service that executes a series of Compute Engine containers on Google Cloud. It is used to process, analyze, and annotate genomics and biomedical data at scale.
Prerequisite Tasks¶
Pipeline Configuration¶
To run a pipeline, you must first configure the request body. Here is an example of a pipeline configuration with a single action.
SIMPLE_ACTION_PIPELINE = {
    "pipeline": {
        "actions": [
            {"imageUri": "bash", "commands": ["-c", "echo Hello, world"]},
        ],
        "resources": {
            "regions": [f"{LOCATION}"],
            "virtualMachine": {
                "machineType": "n1-standard-1",
            },
        },
    },
}
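The configuration snippets in this guide refer to the constants `PROJECT_ID`, `LOCATION`, `BUCKET`, and `FILENAME`, whose definitions are not shown here. A minimal sketch of one way to define them, assuming they are read from environment variables. The variable names and fallback values below are illustrative assumptions, not names taken from this guide:

```python
import os

# Hypothetical environment variable names and fallback values.
# Replace them with whatever your deployment actually uses.
PROJECT_ID = os.environ.get("GCP_PROJECT_ID", "example-project-id")
LOCATION = os.environ.get("GCP_LIFE_SCIENCES_LOCATION", "us-central1")
BUCKET = os.environ.get("GCP_LIFE_SCIENCES_BUCKET", "example-bucket")
FILENAME = os.environ.get("GCP_LIFE_SCIENCES_FILENAME", "input.in")
```

Reading these values from the environment keeps project-specific settings out of the DAG file itself.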
The pipeline can also be configured with multiple actions.
MULTI_ACTION_PIPELINE = {
    "pipeline": {
        "actions": [
            {
                "imageUri": "google/cloud-sdk",
                "commands": ["gsutil", "cp", f"gs://{BUCKET}/{FILENAME}", "/tmp"],
            },
            {"imageUri": "bash", "commands": ["-c", "echo Hello, world"]},
            {
                "imageUri": "google/cloud-sdk",
                "commands": [
                    "gsutil",
                    "cp",
                    f"gs://{BUCKET}/{FILENAME}",
                    f"gs://{BUCKET}/output.in",
                ],
            },
        ],
        "resources": {
            "regions": [f"{LOCATION}"],
            "virtualMachine": {
                "machineType": "n1-standard-1",
            },
        },
    }
}
Read about the request body parameters to understand all the fields you can include in the configuration.
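Both configurations above share the same shape: a list of actions plus a `resources` block. As a sketch, a small helper can assemble that body and catch an action without an `imageUri` before the request is sent. The function `build_pipeline_body` and its parameters are hypothetical names introduced for illustration; they are not part of Airflow or the Life Sciences API, and the check only mirrors the fields used in the examples above.

```python
from __future__ import annotations

from typing import Any


def build_pipeline_body(
    actions: list[dict[str, Any]],
    region: str,
    machine_type: str = "n1-standard-1",
) -> dict[str, Any]:
    """Assemble a request body shaped like the examples above (hypothetical helper)."""
    if not actions:
        raise ValueError("a pipeline needs at least one action")
    for index, action in enumerate(actions):
        # Every action in the examples above names a container image.
        if "imageUri" not in action:
            raise ValueError(f"action {index} is missing 'imageUri'")
    return {
        "pipeline": {
            "actions": list(actions),
            "resources": {
                "regions": [region],
                "virtualMachine": {"machineType": machine_type},
            },
        }
    }


# Rebuild the single-action configuration with the helper.
body = build_pipeline_body(
    [{"imageUri": "bash", "commands": ["-c", "echo Hello, world"]}],
    region="us-central1",
)
```

The resulting dictionary can be used wherever a hand-written configuration such as `SIMPLE_ACTION_PIPELINE` would be used.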
Running a pipeline¶
Use the LifeSciencesRunPipelineOperator to execute pipelines.
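from airflow.providers.google.cloud.operators.life_sciences import LifeSciencesRunPipelineOperator

# Run the single-action pipeline defined in SIMPLE_ACTION_PIPELINE.
simple_life_science_action_pipeline = LifeSciencesRunPipelineOperator(
    task_id='simple-action-pipeline',
    body=SIMPLE_ACTION_PIPELINE,
    project_id=PROJECT_ID,
    location=LOCATION,
)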