Google Cloud Life Sciences Operators

Google Cloud Life Sciences is a service that executes a series of Compute Engine containers on Google Cloud. It is used to process, analyze, and annotate genomics and biomedical data at scale.

Pipeline Configuration

To run a pipeline, you first need to configure the request body. The following example configures a pipeline with a single action.

airflow/providers/google/cloud/example_dags/example_life_sciences.py

SIMPLE_ACTION_PIPELINE = {
    "pipeline": {
        "actions": [
            {"imageUri": "bash", "commands": ["-c", "echo Hello, world"]},
        ],
        "resources": {
            "regions": [LOCATION],
            "virtualMachine": {
                "machineType": "n1-standard-1",
            },
        },
    },
}
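The configurations on this page refer to names such as LOCATION, BUCKET, FILENAME, and PROJECT_ID that the excerpts do not define. One possible way to supply them is to read them from environment variables, as in the minimal sketch below. The environment variable names and fallback values are assumptions made for illustration, not values required by the operator.

```python
import os

# Hypothetical environment variable names and fallback values;
# replace them with the ones that match your own project.
PROJECT_ID = os.environ.get("GCP_PROJECT_ID", "example-project-id")
LOCATION = os.environ.get("GCP_LIFE_SCIENCES_LOCATION", "us-central1")
BUCKET = os.environ.get("GCP_GCS_LIFE_SCIENCES_BUCKET", "example-bucket")
FILENAME = os.environ.get("GCP_GCS_LIFE_SCIENCES_FILENAME", "input.in")

# The Cloud Storage path that the multi-action example copies from.
SOURCE_URI = f"gs://{BUCKET}/{FILENAME}"
```

Reading the values at module level keeps the DAG file free of hard-coded project details, at the cost of evaluating them every time the file is parsed.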

The pipeline can also be configured with multiple actions.

airflow/providers/google/cloud/example_dags/example_life_sciences.py

MULTI_ACTION_PIPELINE = {
    "pipeline": {
        "actions": [
            {
                "imageUri": "google/cloud-sdk",
                "commands": ["gsutil", "cp", f"gs://{BUCKET}/{FILENAME}", "/tmp"],
            },
            {"imageUri": "bash", "commands": ["-c", "echo Hello, world"]},
            {
                "imageUri": "google/cloud-sdk",
                "commands": [
                    "gsutil",
                    "cp",
                    f"gs://{BUCKET}/{FILENAME}",
                    f"gs://{BUCKET}/output.in",
                ],
            },
        ],
        "resources": {
            "regions": [LOCATION],
            "virtualMachine": {
                "machineType": "n1-standard-1",
            },
        },
    }
}
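Both configurations repeat the same resources block and differ only in their list of actions. If you define several pipelines, a small helper can build the shared structure. The sketch below is one way to do that; build_pipeline is a hypothetical helper written for this illustration and is not part of the Google provider.

```python
from typing import Any, Dict, List


def build_pipeline(
    actions: List[Dict[str, Any]],
    location: str,
    machine_type: str = "n1-standard-1",
) -> Dict[str, Any]:
    """Wrap a list of actions in the request-body structure used above."""
    return {
        "pipeline": {
            # Copy the list so later changes to the caller's list
            # do not alter the returned body.
            "actions": list(actions),
            "resources": {
                "regions": [location],
                "virtualMachine": {"machineType": machine_type},
            },
        }
    }


# Rebuild the single-action example with the helper.
hello_action = {"imageUri": "bash", "commands": ["-c", "echo Hello, world"]}
simple_pipeline = build_pipeline([hello_action], location="us-central1")
```

The result has the same shape as SIMPLE_ACTION_PIPELINE, so it can be passed wherever that dictionary is used.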

Read about the request body parameters to understand all the fields you can include in the configuration.
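Because the request body is a plain dictionary, a typo in a key name is easy to miss until the request is sent. A lightweight check in the DAG file can catch the most obvious mistakes earlier. The sketch below only mirrors the shape of the examples above (a top-level "pipeline" object whose actions each name an "imageUri"); check_pipeline_body is a hypothetical helper and does not reproduce the validation performed by the API itself.

```python
from typing import Any, Dict, List


def check_pipeline_body(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a request body (empty if none)."""
    pipeline = body.get("pipeline")
    if not isinstance(pipeline, dict):
        return ["missing top-level 'pipeline' object"]

    problems = []
    actions = pipeline.get("actions")
    if not isinstance(actions, list) or not actions:
        problems.append("'pipeline.actions' must be a non-empty list")
    else:
        for index, action in enumerate(actions):
            # Every action in the examples names the container image to run.
            if not action.get("imageUri"):
                problems.append(f"action {index} has no 'imageUri'")
    return problems


# A body shaped like the examples passes; a misspelled key is reported.
good = {"pipeline": {"actions": [{"imageUri": "bash", "commands": ["-c", "true"]}]}}
bad = {"pipeline": {"actions": [{"imageUrl": "bash"}]}}
good_problems = check_pipeline_body(good)
bad_problems = check_pipeline_body(bad)
```

Returning a list of messages rather than raising on the first problem lets you report every issue in one pass.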

Running a pipeline

Use the LifeSciencesRunPipelineOperator to execute pipelines.

airflow/providers/google/cloud/example_dags/example_life_sciences.py

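from airflow.providers.google.cloud.operators.life_sciences import LifeSciencesRunPipelineOperator

# Run the single-action pipeline defined above.
# In the example DAG file this task is created inside the DAG definition.
simple_life_science_action_pipeline = LifeSciencesRunPipelineOperator(
    task_id='simple-action-pipeline',
    body=SIMPLE_ACTION_PIPELINE,
    project_id=PROJECT_ID,
    location=LOCATION,
)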
