Google API to Amazon S3 Transfer Operator¶
Use the GoogleApiToS3Operator transfer operator to make requests to any Google API that supports discovery and save the response to Amazon S3.
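One way to picture how a dotted endpoint path such as sheets.spreadsheets.values.get could map onto a discovery-based client is sketched below. This is an illustration under stated assumptions, not the provider's implementation: _StubResource and build_request are hypothetical names, and the stub only mimics the chained-call style of a discovery client (for example spreadsheets().values().get(...)) so that the example runs without credentials or network access.

```python
class _StubResource:
    """Hypothetical stand-in for a discovery client resource; records the calls made."""

    def __init__(self, trail=()):
        self.trail = trail

    def __getattr__(self, name):
        # Every unknown attribute behaves like a callable that returns the next nested resource.
        def call(**params):
            return _StubResource(self.trail + ((name, params),))

        return call


def build_request(client, endpoint_path, **params):
    """Walk 'service.resource[.resource...].method' and call the final method with params."""
    _service, *resources, method = endpoint_path.split(".")
    for resource in resources:
        client = getattr(client, resource)()
    return getattr(client, method)(**params)


request = build_request(
    _StubResource(),
    "sheets.spreadsheets.values.get",
    spreadsheetId="hypothetical-id",  # made-up value for illustration
    range="Sheet1!A1:C3",
)
print(request.trail)
```

The first path segment names the service, the middle segments are nested resources, and the last segment is the method that receives the endpoint parameters.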
Prerequisite Tasks¶
To use these operators, you must do a few things:
Create necessary resources using AWS Console or AWS CLI.
Install API libraries via pip.
pip install 'apache-airflow[amazon]'
Detailed information is available in Installation.
Google Sheets to Amazon S3¶
This example loads data from Google Sheets and saves it to an Amazon S3 file.
task_google_sheets_values_to_s3 = GoogleApiToS3Operator(
    task_id='google_sheet_data_to_s3',
    google_api_service_name='sheets',
    google_api_service_version='v4',
    google_api_endpoint_path='sheets.spreadsheets.values.get',
    google_api_endpoint_params={'spreadsheetId': GOOGLE_SHEET_ID, 'range': GOOGLE_SHEET_RANGE},
    s3_destination_key=S3_DESTINATION_KEY,
)
You can find more information about the Google API endpoint used here.
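Downstream code usually needs rows rather than the raw API response. The sketch below assumes the S3 object holds the JSON response of spreadsheets.values.get unchanged, which is a ValueRange with range, majorDimension, and values fields. The helper value_range_to_records and the sample cell values are hypothetical and only illustrate the shape of the data.

```python
import json

# Example payload in the shape of a Sheets API v4 ValueRange; the cell values are made up.
raw = json.dumps({
    "range": "Sheet1!A1:C3",
    "majorDimension": "ROWS",
    "values": [
        ["name", "city", "score"],
        ["Ada", "London", "91"],
        ["Linus", "Helsinki"],  # the API omits trailing empty cells
    ],
})


def value_range_to_records(payload):
    """Turn a ValueRange JSON string into one dict per data row (hypothetical helper)."""
    values = json.loads(payload).get("values", [])
    if not values:
        return []
    header, *rows = values
    # Pad short rows so every record carries all header keys.
    return [dict(zip(header, row + [""] * (len(header) - len(row)))) for row in rows]


records = value_range_to_records(raw)
```

Treating the first row as a header is a design choice for this example; a sheet without a header row would need the column names supplied separately.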
Google YouTube to Amazon S3¶
This is a more advanced example DAG that uses GoogleApiToS3Operator together with XCom to pass data between tasks in order to retrieve specific information about YouTube videos.
Get YouTube Videos¶
This task searches for up to 50 videos (the limit of a single result page) in a given time range (YOUTUBE_VIDEO_PUBLISHED_AFTER, YOUTUBE_VIDEO_PUBLISHED_BEFORE) on a YouTube channel (YOUTUBE_CHANNEL_ID), saves the response in Amazon S3, and also pushes the data to XCom.
task_video_ids_to_s3 = GoogleApiToS3Operator(
    task_id='video_ids_to_s3',
    google_api_service_name='youtube',
    google_api_service_version='v3',
    google_api_endpoint_path='youtube.search.list',
    google_api_endpoint_params={
        'part': 'snippet',
        'channelId': YOUTUBE_CHANNEL_ID,
        'maxResults': 50,
        'publishedAfter': YOUTUBE_VIDEO_PUBLISHED_AFTER,
        'publishedBefore': YOUTUBE_VIDEO_PUBLISHED_BEFORE,
        'type': 'video',
        'fields': 'items/id/videoId',
    },
    google_api_response_via_xcom='video_ids_response',
    s3_destination_key=f'{S3_BUCKET_NAME}/youtube_search.json',
    s3_overwrite=True,
)
The YouTube video IDs are then passed to the next request, which retrieves the information (YOUTUBE_VIDEO_FIELDS) for the requested videos and saves it in Amazon S3 (S3_BUCKET_NAME).
task_video_data_to_s3 = GoogleApiToS3Operator(
    task_id='video_data_to_s3',
    google_api_service_name='youtube',
    google_api_service_version='v3',
    google_api_endpoint_path='youtube.videos.list',
    google_api_endpoint_params={
        'part': YOUTUBE_VIDEO_PARTS,
        'maxResults': 50,
        'fields': YOUTUBE_VIDEO_FIELDS,
    },
    google_api_endpoint_params_via_xcom='video_ids',
    s3_destination_key=f'{S3_BUCKET_NAME}/youtube_videos.json',
    s3_overwrite=True,
)
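The first task pushes the raw search response under the XCom key video_ids_response, while the second task reads extra endpoint parameters from the key video_ids, so something between the two has to reshape the data. A minimal sketch of that reshaping, assuming the response has the structure implied by 'fields': 'items/id/videoId' and that youtube.videos.list receives the IDs as a comma-separated id parameter; the helper build_video_params and the sample IDs are hypothetical.

```python
# Shape of a youtube.search.list response when 'fields' is 'items/id/videoId';
# the IDs themselves are made up.
search_response = {
    "items": [
        {"id": {"videoId": "abc123"}},
        {"id": {"videoId": "def456"}},
    ]
}


def build_video_params(response):
    """Collect video IDs into the comma-separated 'id' parameter (hypothetical helper)."""
    video_ids = [item["id"]["videoId"] for item in response.get("items", [])]
    # Return no extra parameters when the search found nothing.
    return {"id": ",".join(video_ids)} if video_ids else {}


extra_params = build_video_params(search_response)
```

In a DAG, an intermediate Python task could pull video_ids_response from XCom, apply a function like this, and push the result under the key video_ids. An empty result is a natural signal to skip the second request.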
Reference¶
For further information, look at: