Google Cloud Life Sciences Operators

The Google Cloud Life Sciences is a service that executes series of compute engine containers on the Google Cloud. It is used to process, analyze and annotate genomics and biomedical data at scale.

Prerequisite Tasks

Pipeline Configuration

In order to run the pipeline, it is necessary to configure the request body. Here is an example of the pipeline configuration with a single action.

airflow/providers/google/cloud/example_dags/example_life_sciences.pyView Source

SIMPLE_ACTION_PIPELINE = {
    "pipeline": {
        "actions": [
            {"imageUri": "bash", "commands": ["-c", "echo Hello, world"]},
        ],
        "resources": {
            "regions": [f"{LOCATION}"],
            "virtualMachine": {
                "machineType": "n1-standard-1",
            },
        },
    },
}

The pipeline can also be configured with multiple action.

airflow/providers/google/cloud/example_dags/example_life_sciences.pyView Source

MULTI_ACTION_PIPELINE = {
    "pipeline": {
        "actions": [
            {
                "imageUri": "google/cloud-sdk",
                "commands": ["gsutil", "cp", f"gs://{BUCKET}/{FILENAME}", "/tmp"],
            },
            {"imageUri": "bash", "commands": ["-c", "echo Hello, world"]},
            {
                "imageUri": "google/cloud-sdk",
                "commands": [
                    "gsutil",
                    "cp",
                    f"gs://{BUCKET}/{FILENAME}",
                    f"gs://{BUCKET}/output.in",
                ],
            },
        ],
        "resources": {
            "regions": [f"{LOCATION}"],
            "virtualMachine": {
                "machineType": "n1-standard-1",
            },
        },
    }
}

Read about the request body parameters to understand all the fields you can include in the configuration

Running a pipeline

Use the LifeSciencesRunPipelineOperator to execute pipelines.

airflow/providers/google/cloud/example_dags/example_life_sciences.pyView Source

    simple_life_science_action_pipeline = LifeSciencesRunPipelineOperator(
        task_id='simple-action-pipeline',
        body=SIMPLE_ACTION_PIPELINE,
        project_id=PROJECT_ID,
        location=LOCATION,
    )

Reference

For further information, look at:

Was this entry helpful?