airflow.providers.presto.transfers.gcs_to_presto

This module contains the Google Cloud Storage to Presto operator.

Module Contents

Classes

GCSToPrestoOperator

Loads a CSV file from Google Cloud Storage into a Presto table.

class airflow.providers.presto.transfers.gcs_to_presto.GCSToPrestoOperator(*, source_bucket, source_object, presto_table, presto_conn_id='presto_default', gcp_conn_id='google_cloud_default', schema_fields=None, schema_object=None, delegate_to=None, impersonation_chain=None, **kwargs)[source]

Bases: airflow.models.BaseOperator

Loads a CSV file from Google Cloud Storage into a Presto table. Assumptions:

1. The CSV file should not have headers.
2. A Presto table with the requisite columns has already been created.
3. Optionally, a separate JSON file with headers, or a list of headers, can be provided.
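The assumptions above can be illustrated with a small standard-library sketch. This is not the operator's implementation: `pair_rows_with_headers` is a hypothetical helper, and the sample CSV and JSON contents are invented for illustration. It only shows how a headerless CSV and a separate JSON list of column names might fit together.

```python
import csv
import io
import json


def pair_rows_with_headers(csv_text, schema_json=None):
    """Return (column_names, rows) for a headerless CSV.

    column_names is None when no JSON header list is supplied; in that
    case the values would have to match the table's column order.
    """
    # Every non-empty CSV line is data, because the file has no header row.
    rows = [tuple(record) for record in csv.reader(io.StringIO(csv_text)) if record]
    columns = json.loads(schema_json) if schema_json is not None else None
    if columns is not None:
        for row in rows:
            if len(row) != len(columns):
                raise ValueError(f"expected {len(columns)} values, got {len(row)}")
    return columns, rows


# Hypothetical sample data standing in for the objects stored in GCS.
csv_text = "1,alice\n2,bob\n"
schema_json = '["id", "name"]'

columns, rows = pair_rows_with_headers(csv_text, schema_json)
print(columns)
print(rows)
```

Note that the `csv` module returns every value as a string, so any type conversion is left to the destination table in this sketch.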

Parameters
  • source_bucket (str) – Source GCS bucket that contains the CSV file.

  • source_object (str) – The CSV file, including its path.

  • presto_table (str) – Presto table to upload the data to.

  • presto_conn_id (str) – Destination Presto connection.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud and interact with the Google Cloud Storage service.

  • delegate_to (Optional[str]) – The account to impersonate using domain-wide delegation of authority, if any. For this to work, the service account making the request must have domain-wide delegation enabled.

  • impersonation_chain (Optional[Union[str, Sequence[str]]]) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, each identity in the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account in the list granting this role to the originating account.
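The grant relationships described for impersonation_chain can be hard to follow in prose. The sketch below spells them out as (grantor, grantee) pairs. `required_token_creator_grants` is a hypothetical helper, not part of Airflow or any Google library, and the account names are placeholders.

```python
from typing import List, Sequence, Tuple, Union


def required_token_creator_grants(
    originating_account: str,
    impersonation_chain: Union[str, Sequence[str]],
) -> List[Tuple[str, str]]:
    """Return (grantor, grantee) pairs for the Service Account Token Creator role."""
    # A single string is treated as a chain with one account.
    if isinstance(impersonation_chain, str):
        chain = [impersonation_chain]
    else:
        chain = list(impersonation_chain)

    grants = []
    previous = originating_account
    for account in chain:
        # Each account grants the role to the identity directly preceding it;
        # the first account grants it to the originating account.
        grants.append((account, previous))
        previous = account
    return grants


# Placeholder identities for illustration only.
grants = required_token_creator_grants(
    "origin@example.iam",
    ["first@example.iam", "last@example.iam"],
)
for grantor, grantee in grants:
    print(f"{grantor} grants Service Account Token Creator to {grantee}")
```

In this example the request is ultimately made as the last account in the chain, `last@example.iam`.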

template_fields :Sequence[str] = ['source_bucket', 'source_object', 'presto_table'][source]
execute(self, context)[source]

This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
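As a usage illustration, the following sketch shows how a task might be declared with the parameters documented above. The task ID, bucket, object path, and table name are all hypothetical. Because source_object is listed in template_fields, it may contain a Jinja expression such as `{{ ds }}`, which is rendered from the task context before execute runs. The schema_fields argument appears in the class signature but is not described on this page; the sketch assumes it carries the list of column names mentioned in the assumptions. The import is deferred into `make_task` (a hypothetical helper) so that the snippet loads even where Airflow and the provider packages are not installed.

```python
# Keyword arguments following the documented signature.
# All task, bucket, object, and table names below are hypothetical.
OPERATOR_KWARGS = {
    "task_id": "gcs_csv_to_presto",
    "source_bucket": "example-bucket",
    # Templated field: "{{ ds }}" is rendered to the run's date stamp.
    "source_object": "exports/{{ ds }}/data.csv",
    "presto_table": "example_table",
    "presto_conn_id": "presto_default",
    "gcp_conn_id": "google_cloud_default",
    # Assumption: the list of column names for the headerless CSV.
    "schema_fields": ["id", "name"],
}


def make_task(dag=None):
    """Create the transfer task; requires Airflow with the Presto and Google providers."""
    from airflow.providers.presto.transfers.gcs_to_presto import GCSToPrestoOperator

    return GCSToPrestoOperator(dag=dag, **OPERATOR_KWARGS)
```

Calling `make_task(dag=my_dag)` inside a DAG file would attach the task to `my_dag`, where `my_dag` is whatever DAG object the file defines.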
