Google API to Amazon S3 Transfer Operator

Use the GoogleApiToS3Operator to make requests to any Google API that supports discovery and save the response to Amazon S3.

Prerequisite Tasks

To use these operators, you must do a few things:

Google Sheets to Amazon S3

This example loads data from Google Sheets and saves it to an Amazon S3 file.

airflow/providers/amazon/aws/example_dags/example_google_api_sheets_to_s3.py

task_google_sheets_values_to_s3 = GoogleApiToS3Operator(
    task_id='google_sheet_data_to_s3',
    google_api_service_name='sheets',
    google_api_service_version='v4',
    google_api_endpoint_path='sheets.spreadsheets.values.get',
    google_api_endpoint_params={'spreadsheetId': GOOGLE_SHEET_ID, 'range': GOOGLE_SHEET_RANGE},
    s3_destination_key=S3_DESTINATION_KEY,
)

For more information about the Google API endpoint used (sheets.spreadsheets.values.get), see the Google Sheets API reference.
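
The snippet above refers to GOOGLE_SHEET_ID, GOOGLE_SHEET_RANGE and S3_DESTINATION_KEY without showing where they come from. A minimal sketch of one way to supply them, assuming they are read from environment variables of the same names; the helper load_sheet_settings, the default range, the default bucket name and the s3:// URL form of the key are all illustrative assumptions, not part of the example DAG shown here.

```python
import os
from typing import Dict, Mapping, Optional


def load_sheet_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the placeholder values used by the Sheets example.

    Hypothetical helper: variable names and defaults are assumptions.
    """
    env = os.environ if env is None else env
    # Assumed default bucket; replace with a bucket you own.
    bucket = env.get('S3_BUCKET_NAME', 'my-example-bucket')
    return {
        # The spreadsheet ID has no sensible default, so a missing value raises KeyError.
        'GOOGLE_SHEET_ID': env['GOOGLE_SHEET_ID'],
        # A1 notation, as expected by the 'range' parameter.
        'GOOGLE_SHEET_RANGE': env.get('GOOGLE_SHEET_RANGE', 'Sheet1!A1:D20'),
        # Assumed here to be a full s3:// URL.
        'S3_DESTINATION_KEY': env.get(
            'S3_DESTINATION_KEY', f's3://{bucket}/google_sheet.json'
        ),
    }
```

Passing a plain dictionary instead of os.environ keeps the helper easy to check outside a running Airflow deployment.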

Google YouTube to Amazon S3

This is a more advanced example DAG for GoogleApiToS3Operator. It uses XCom to pass data between tasks in order to retrieve specific information about YouTube videos.

Get YouTube Videos

The first task searches for up to 50 videos (the limit of a single result page) published in a given time range (YOUTUBE_VIDEO_PUBLISHED_AFTER, YOUTUBE_VIDEO_PUBLISHED_BEFORE) on a YouTube channel (YOUTUBE_CHANNEL_ID). It saves the response in Amazon S3 and also pushes the data to XCom.

airflow/providers/amazon/aws/example_dags/example_google_api_youtube_to_s3.py

task_video_ids_to_s3 = GoogleApiToS3Operator(
    task_id='video_ids_to_s3',
    google_api_service_name='youtube',
    google_api_service_version='v3',
    google_api_endpoint_path='youtube.search.list',
    google_api_endpoint_params={
        'part': 'snippet',
        'channelId': YOUTUBE_CHANNEL_ID,
        'maxResults': 50,
        'publishedAfter': YOUTUBE_VIDEO_PUBLISHED_AFTER,
        'publishedBefore': YOUTUBE_VIDEO_PUBLISHED_BEFORE,
        'type': 'video',
        'fields': 'items/id/videoId',
    },
    google_api_response_via_xcom='video_ids_response',
    s3_destination_key=f'{S3_BUCKET_NAME}/youtube_search.json',
    s3_overwrite=True,
)
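
The publishedAfter and publishedBefore parameters of youtube.search.list expect RFC 3339 date-time strings such as 1970-01-01T00:00:00Z. A minimal sketch of how values for YOUTUBE_VIDEO_PUBLISHED_AFTER and YOUTUBE_VIDEO_PUBLISHED_BEFORE could be built; the helper names rfc3339 and published_window are hypothetical, and the sketch assumes timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    if moment.tzinfo is None:
        raise ValueError('expected a timezone-aware datetime')
    return moment.astimezone(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')


def published_window(end: datetime, days: int) -> Tuple[str, str]:
    """Return (published_after, published_before) covering `days` days up to `end`."""
    start = end - timedelta(days=days)
    return rfc3339(start), rfc3339(end)


# Example: the seven days before 8 January 2020 (UTC).
YOUTUBE_VIDEO_PUBLISHED_AFTER, YOUTUBE_VIDEO_PUBLISHED_BEFORE = published_window(
    datetime(2020, 1, 8, tzinfo=timezone.utc), days=7
)
```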

The YouTube video IDs are then passed to the next request, which gets the information (YOUTUBE_VIDEO_FIELDS) for the requested videos and saves it in Amazon S3 (S3_BUCKET_NAME).

airflow/providers/amazon/aws/example_dags/example_google_api_youtube_to_s3.py

task_video_data_to_s3 = GoogleApiToS3Operator(
    task_id='video_data_to_s3',
    google_api_service_name='youtube',
    google_api_service_version='v3',
    google_api_endpoint_path='youtube.videos.list',
    google_api_endpoint_params={
        'part': YOUTUBE_VIDEO_PARTS,
        'maxResults': 50,
        'fields': YOUTUBE_VIDEO_FIELDS,
    },
    google_api_endpoint_params_via_xcom='video_ids',
    s3_destination_key=f'{S3_BUCKET_NAME}/youtube_videos.json',
    s3_overwrite=True,
)
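
The first task pushes its response under the XCom key video_ids_response, while the second task reads endpoint parameters from the key video_ids, so something has to reshape the data in between. A minimal sketch of such an intermediate step, assuming the search response has the shape selected by 'fields': 'items/id/videoId' and that the second operator merges the pulled dictionary into its endpoint parameters. The function names below are hypothetical; transform_video_ids expects an object offering xcom_pull and xcom_push, such as an Airflow task instance.

```python
from typing import Any, Dict, List


def extract_video_ids(search_response: Dict[str, Any]) -> List[str]:
    """Collect the video IDs from a youtube.search.list response."""
    items = search_response.get('items', [])
    return [item['id']['videoId'] for item in items]


def build_videos_params(search_response: Dict[str, Any]) -> Dict[str, str]:
    """Build the extra parameter for youtube.videos.list.

    The 'id' parameter takes a comma-separated list of video IDs.
    """
    return {'id': ','.join(extract_video_ids(search_response))}


def transform_video_ids(task_instance: Any) -> None:
    """Pull the search response from XCom and push the derived parameters."""
    response = task_instance.xcom_pull(
        task_ids='video_ids_to_s3', key='video_ids_response'
    )
    # An empty or missing response yields {'id': ''}.
    task_instance.xcom_push(key='video_ids', value=build_videos_params(response or {}))
```

In a DAG, a function like transform_video_ids would typically run in its own task placed between video_ids_to_s3 and video_data_to_s3.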
