Google Cloud Life Sciences Operators

Google Cloud Life Sciences is a service that executes a series of Compute Engine containers on Google Cloud. It is used to process, analyze, and annotate genomics and biomedical data at scale.

Prerequisite Tasks

Pipeline Configuration

To run a pipeline, you must first configure the request body. The following example shows a pipeline configuration with a single action.

airflow/providers/google/cloud/example_dags/example_life_sciences.py

SIMPLE_ACTION_PIPELINE = {
    "pipeline": {
        "actions": [
            {"imageUri": "bash", "commands": ["-c", "echo Hello, world"]},
        ],
        "resources": {
            "regions": [f"{LOCATION}"],
            "virtualMachine": {
                "machineType": "n1-standard-1",
            },
        },
    },
}
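The configuration above references LOCATION, and the later examples also use BUCKET, FILENAME, and PROJECT_ID, none of which are defined in the excerpts shown here. A minimal sketch of how such constants could be set up, assuming they are read from environment variables; the variable names and fallback values below are illustrative assumptions, not taken from this page:

```python
import os

# Assumed environment variable names and fallback values, for illustration only.
# Replace them with whatever your deployment actually uses.
PROJECT_ID = os.environ.get("GCP_PROJECT_ID", "example-project-id")
LOCATION = os.environ.get("GCP_LIFE_SCIENCES_LOCATION", "us-central1")
BUCKET = os.environ.get("GCP_GCS_LIFE_SCIENCES_BUCKET", "example-bucket")
FILENAME = os.environ.get("GCP_GCS_LIFE_SCIENCES_FILENAME", "input.in")
```

Reading the values from the environment keeps project-specific settings out of the DAG file itself.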

The pipeline can also be configured with multiple actions.

airflow/providers/google/cloud/example_dags/example_life_sciences.py

MULTI_ACTION_PIPELINE = {
    "pipeline": {
        "actions": [
            {
                "imageUri": "google/cloud-sdk",
                "commands": ["gsutil", "cp", f"gs://{BUCKET}/{FILENAME}", "/tmp"],
            },
            {"imageUri": "bash", "commands": ["-c", "echo Hello, world"]},
            {
                "imageUri": "google/cloud-sdk",
                "commands": [
                    "gsutil",
                    "cp",
                    f"gs://{BUCKET}/{FILENAME}",
                    f"gs://{BUCKET}/output.in",
                ],
            },
        ],
        "resources": {
            "regions": [f"{LOCATION}"],
            "virtualMachine": {
                "machineType": "n1-standard-1",
            },
        },
    }
}
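Both configurations share the same "resources" block and differ only in their list of actions. One possible way to avoid repeating that structure is a small builder function. The helpers make_action and make_pipeline below are hypothetical names introduced for this sketch, and "us-central1" is a placeholder location; neither is part of the Airflow provider.

```python
from typing import Any, Dict, List


def make_action(image_uri: str, commands: List[str]) -> Dict[str, Any]:
    """Build one action entry (hypothetical helper, not an Airflow API)."""
    return {"imageUri": image_uri, "commands": list(commands)}


def make_pipeline(
    actions: List[Dict[str, Any]],
    location: str,
    machine_type: str = "n1-standard-1",
) -> Dict[str, Any]:
    """Wrap a list of actions in the request body layout used on this page."""
    if not actions:
        raise ValueError("a pipeline needs at least one action")
    return {
        "pipeline": {
            "actions": list(actions),
            "resources": {
                "regions": [location],
                "virtualMachine": {"machineType": machine_type},
            },
        }
    }


# Rebuilds the single-action configuration; the location is a placeholder.
simple_pipeline = make_pipeline(
    [make_action("bash", ["-c", "echo Hello, world"])],
    location="us-central1",
)
```

The resulting dictionary has the same shape as SIMPLE_ACTION_PIPELINE and can be passed as the body in the same way.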

Read about the request body parameters to understand all the fields you can include in the configuration.
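Before submitting a body, a quick structural check can catch obvious mistakes such as a missing "actions" list. The sketch below is a hypothetical helper (check_pipeline_body is not part of Airflow or the Life Sciences API) and it only inspects the fields used in the examples on this page; it is not a substitute for the full request body schema.

```python
from typing import Any, Dict, List


def check_pipeline_body(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the body; an empty list means none."""
    pipeline = body.get("pipeline")
    if not isinstance(pipeline, dict):
        return ["missing 'pipeline' object"]

    problems: List[str] = []

    actions = pipeline.get("actions")
    if not isinstance(actions, list) or not actions:
        problems.append("'actions' must be a non-empty list")
    else:
        for index, action in enumerate(actions):
            if not isinstance(action, dict) or not action.get("imageUri"):
                problems.append(f"action {index} has no 'imageUri'")

    resources = pipeline.get("resources")
    if not isinstance(resources, dict):
        problems.append("missing 'resources' object")
    elif not resources.get("regions") and not resources.get("zones"):
        problems.append("'resources' should name at least one region or zone")

    return problems


# Example body mirroring the single-action configuration (placeholder location).
example_body = {
    "pipeline": {
        "actions": [{"imageUri": "bash", "commands": ["-c", "echo Hello, world"]}],
        "resources": {
            "regions": ["us-central1"],
            "virtualMachine": {"machineType": "n1-standard-1"},
        },
    }
}
problems = check_pipeline_body(example_body)
```

Returning a list of messages instead of raising on the first problem makes it possible to report every issue in one pass.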

Running a pipeline

Use the LifeSciencesRunPipelineOperator to execute pipelines.

airflow/providers/google/cloud/example_dags/example_life_sciences.py

    simple_life_science_action_pipeline = LifeSciencesRunPipelineOperator(
        task_id='simple-action-pipeline',
        body=SIMPLE_ACTION_PIPELINE,
        project_id=PROJECT_ID,
        location=LOCATION,
    )

Reference

For further information, look at:
