airflow.providers.google.cloud.operators.dataprep

This module contains Google Dataprep operators.

Module Contents

Classes

DataprepGetJobsForJobGroupOperator

Get information about the batch jobs within a Cloud Dataprep job.

DataprepGetJobGroupOperator

Get the specified job group.

DataprepRunJobGroupOperator

Create a jobGroup, which launches the specified job as the authenticated user.

DataprepCopyFlowOperator

Create a copy of the flow with the provided id, as well as all contained recipes.

DataprepDeleteFlowOperator

Delete the flow with the provided id.

DataprepRunFlowOperator

Run the flow with the provided id.

class airflow.providers.google.cloud.operators.dataprep.DataprepGetJobsForJobGroupOperator(*, dataprep_conn_id='dataprep_default', job_group_id, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Get information about the batch jobs within a Cloud Dataprep job.

API documentation: https://clouddataprep.com/documentation/api#section/Overview.

See also

For more information on how to use this operator, take a look at the guide: Get Jobs For Job Group

Parameters
  • job_group_id – The ID of the job group to request

template_fields: Sequence[str] = ('job_group_id',)[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
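
A minimal sketch of declaring this operator, assuming the Google provider package is installed and a connection named dataprep_default exists. The task_id, the helper names, and the job group ID passed in are hypothetical placeholders, not values from the provider documentation.

```python
from typing import Union


def jobs_for_job_group_kwargs(job_group_id: Union[int, str]) -> dict:
    """Collect the keyword arguments shown in the class signature."""
    return {
        "task_id": "get_jobs_for_job_group",  # hypothetical task name
        "dataprep_conn_id": "dataprep_default",  # default from the signature
        "job_group_id": job_group_id,
    }


def make_get_jobs_task(job_group_id: Union[int, str]):
    # Imported lazily so the helper above also works where Airflow is absent.
    from airflow.providers.google.cloud.operators.dataprep import (
        DataprepGetJobsForJobGroupOperator,
    )

    return DataprepGetJobsForJobGroupOperator(
        **jobs_for_job_group_kwargs(job_group_id)
    )
```

Because job_group_id is listed in template_fields, a Jinja expression could be passed instead of a literal ID.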

class airflow.providers.google.cloud.operators.dataprep.DataprepGetJobGroupOperator(*, dataprep_conn_id='dataprep_default', project_id=PROVIDE_PROJECT_ID, job_group_id, embed, include_deleted, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Get the specified job group.

A job group is a job that is executed from a specific node in a flow.

API documentation: https://clouddataprep.com/documentation/api#section/Overview.

See also

For more information on how to use this operator, take a look at the guide: Get Job Group

Parameters
  • job_group_id (int | str) – The ID of the job group to request

  • embed (str) – Comma-separated list of objects to pull in as part of the response

  • include_deleted (bool) – If true, deleted objects are included in the response

template_fields: Sequence[str] = ('job_group_id', 'embed', 'project_id')[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
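
The parameters above could be combined roughly as follows. This is a sketch under assumptions: the upstream task name run_job_group, the shape of its XCom value (a dict with an "id" key), and the object names passed to embed are all illustrative and should be checked against the Dataprep API documentation.

```python
from typing import Iterable


def embed_param(objects: Iterable[str]) -> str:
    """Join object names into the comma-separated string that embed expects."""
    return ",".join(objects)


# Rendered at run time because job_group_id is a template field.
# ASSUMPTION: an upstream task "run_job_group" pushed a dict with an "id" key.
JOB_GROUP_ID_TEMPLATE = (
    "{{ task_instance.xcom_pull(task_ids='run_job_group')['id'] }}"
)


def make_get_job_group_task():
    # Imported lazily so the helpers above work without Airflow installed.
    from airflow.providers.google.cloud.operators.dataprep import (
        DataprepGetJobGroupOperator,
    )

    return DataprepGetJobGroupOperator(
        task_id="get_job_group",  # hypothetical task name
        job_group_id=JOB_GROUP_ID_TEMPLATE,
        embed=embed_param(["jobs", "wrangledDataset"]),  # illustrative names
        include_deleted=False,
    )
```

Note that embed and include_deleted have no defaults in the signature, so both must be supplied explicitly.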

class airflow.providers.google.cloud.operators.dataprep.DataprepRunJobGroupOperator(*, project_id=PROVIDE_PROJECT_ID, dataprep_conn_id='dataprep_default', body_request, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Create a jobGroup, which launches the specified job as the authenticated user.

This performs the same action as clicking on the Run Job button in the application.

To get recipe_id please follow the Dataprep API documentation: https://clouddataprep.com/documentation/api#operation/runJobGroup.

See also

For more information on how to use this operator, take a look at the guide: Run Job Group

Parameters
  • dataprep_conn_id (str) – The Dataprep connection ID

  • body_request (dict) – Request body passed to GoogleDataprepHook’s run_job_group; it identifies the recipe to run

template_fields: Sequence[str] = ('body_request',)[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
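
As an illustration of what body_request might look like, the sketch below assumes the runJobGroup request identifies the recipe under a wrangledDataset key. That shape is an assumption to verify against the API documentation linked above; the task_id and helper names are hypothetical.

```python
def run_job_group_body(recipe_id: int) -> dict:
    """Build a request body that points at the recipe to run."""
    # ASSUMPTION: the recipe is identified as {"wrangledDataset": {"id": ...}}.
    return {"wrangledDataset": {"id": recipe_id}}


def make_run_job_group_task(recipe_id: int):
    # Imported lazily so the body builder can be used without Airflow.
    from airflow.providers.google.cloud.operators.dataprep import (
        DataprepRunJobGroupOperator,
    )

    return DataprepRunJobGroupOperator(
        task_id="run_job_group",  # hypothetical task name
        body_request=run_job_group_body(recipe_id),
    )
```

Since body_request is a template field, string values inside the dict may contain Jinja expressions.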

class airflow.providers.google.cloud.operators.dataprep.DataprepCopyFlowOperator(*, project_id=PROVIDE_PROJECT_ID, dataprep_conn_id='dataprep_default', flow_id, name='', description='', copy_datasources=False, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Create a copy of the flow with the provided id, as well as all contained recipes.

Parameters
  • dataprep_conn_id (str) – The Dataprep connection ID

  • flow_id (int | str) – ID of the flow to be copied

  • name (str) – Name for the copy of the flow

  • description (str) – Description of the copy of the flow

  • copy_datasources (bool) – Whether to also copy the flow’s data inputs.

template_fields: Sequence[str] = ('flow_id', 'name', 'project_id', 'description')[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
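
A short sketch of copying a flow with a templated name. The task_id, the naming scheme, and the description text are hypothetical; {{ ds }} is the standard Airflow template variable for the logical date.

```python
from typing import Union


def copy_flow_kwargs(flow_id: Union[int, str]) -> dict:
    """Keyword arguments for a flow copy that leaves data inputs shared."""
    return {
        "task_id": "copy_flow",  # hypothetical task name
        "flow_id": flow_id,
        # name is a template field, so {{ ds }} is rendered at run time.
        "name": "nightly copy {{ ds }}",
        "description": "Copy created by Airflow",  # hypothetical text
        "copy_datasources": False,  # default from the signature
    }


def make_copy_flow_task(flow_id: Union[int, str]):
    # Imported lazily so the helper above works without Airflow installed.
    from airflow.providers.google.cloud.operators.dataprep import (
        DataprepCopyFlowOperator,
    )

    return DataprepCopyFlowOperator(**copy_flow_kwargs(flow_id))
```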

class airflow.providers.google.cloud.operators.dataprep.DataprepDeleteFlowOperator(*, dataprep_conn_id='dataprep_default', flow_id, **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Delete the flow with the provided id.

Parameters
  • dataprep_conn_id (str) – The Dataprep connection ID

  • flow_id (int | str) – ID of the flow to be deleted

template_fields: Sequence[str] = ('flow_id',)[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
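
One plausible use is cleaning up a flow created earlier in the same DAG. The sketch below builds a Jinja expression for flow_id; it assumes an upstream task named copy_flow whose XCom value is a dict containing an "id" key, which is not confirmed by this page.

```python
def xcom_field_template(task_id: str, key: str) -> str:
    """Build a Jinja expression that reads one key from a task's XCom value."""
    return "{{ task_instance.xcom_pull(task_ids=%r)[%r] }}" % (task_id, key)


def make_delete_flow_task():
    # Imported lazily so the template helper works without Airflow installed.
    from airflow.providers.google.cloud.operators.dataprep import (
        DataprepDeleteFlowOperator,
    )

    return DataprepDeleteFlowOperator(
        task_id="delete_flow",  # hypothetical task name
        # ASSUMPTION: "copy_flow" returned a dict with an "id" key.
        flow_id=xcom_field_template("copy_flow", "id"),
    )
```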

class airflow.providers.google.cloud.operators.dataprep.DataprepRunFlowOperator(*, project_id=PROVIDE_PROJECT_ID, flow_id, body_request, dataprep_conn_id='dataprep_default', **kwargs)[source]

Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator

Run the flow with the provided id.

Parameters
  • dataprep_conn_id (str) – The Dataprep connection ID

  • flow_id (int | str) – ID of the flow to be run

  • body_request (dict) – Body of the POST request to be sent.

template_fields: Sequence[str] = ('flow_id', 'project_id')[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
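
A minimal sketch of running a flow. It assumes an empty request body is acceptable when no overrides are needed, which should be confirmed against the Dataprep API documentation; the task_id and helper names are hypothetical.

```python
from typing import Optional, Union


def run_flow_kwargs(
    flow_id: Union[int, str], body_request: Optional[dict] = None
) -> dict:
    """Keyword arguments for running a flow, defaulting to an empty body."""
    return {
        "task_id": "run_flow",  # hypothetical task name
        "flow_id": flow_id,
        # ASSUMPTION: an empty dict is a valid body when nothing is overridden.
        "body_request": {} if body_request is None else body_request,
    }


def make_run_flow_task(flow_id: Union[int, str]):
    # Imported lazily so the helper above works without Airflow installed.
    from airflow.providers.google.cloud.operators.dataprep import (
        DataprepRunFlowOperator,
    )

    return DataprepRunFlowOperator(**run_flow_kwargs(flow_id))
```

Only flow_id and project_id appear in template_fields here, so Jinja expressions placed inside body_request would not be rendered.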
