Google API to Amazon S3¶
Use the GoogleApiToS3Operator transfer to make requests to any Google API
that supports discovery and save the response in an Amazon S3 file.
Prerequisite Tasks¶
To use these operators, you must do a few things:
Create necessary resources using AWS Console or AWS CLI.
Install API libraries via pip.
pip install 'apache-airflow[amazon]'
Detailed information is available in Installation of Airflow®.
Operators¶
Google Sheets to Amazon S3 transfer operator¶
This example loads data from Google Sheets and saves it to an Amazon S3 file.
tests/system/amazon/aws/example_google_api_sheets_to_s3.py
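from airflow.providers.amazon.aws.transfers.google_api_to_s3 import GoogleApiToS3Operator

# GOOGLE_SHEET_ID, GOOGLE_SHEET_RANGE, s3_bucket, and s3_key are expected to be
# defined earlier in the example file.
task_google_sheets_values_to_s3 = GoogleApiToS3Operator(
    task_id="google_sheet_data_to_s3",
    google_api_service_name="sheets",
    google_api_service_version="v4",
    google_api_endpoint_path="sheets.spreadsheets.values.get",
    google_api_endpoint_params={"spreadsheetId": GOOGLE_SHEET_ID, "range": GOOGLE_SHEET_RANGE},
    s3_destination_key=f"s3://{s3_bucket}/{s3_key}",
)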
You can find more information about the Google API endpoint used here.
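To make the endpoint path less opaque: a value such as sheets.spreadsheets.values.get reads as the service name, then a chain of resources, then the method to call. The sketch below walks such a path against a stand-in client. It assumes, without confirmation from this page, that the path is resolved roughly this way through the Google discovery client; FakeSheetsClient, Spreadsheets, Values, and resolve_endpoint are hypothetical names used only for illustration.

```python
class Values:
    """Hypothetical stand-in for the ``values`` resource."""

    def get(self, **params):
        # A real discovery client would return a request object to execute;
        # here we only record what would have been requested.
        return {"requested": "spreadsheets.values.get", "params": params}


class Spreadsheets:
    """Hypothetical stand-in for the ``spreadsheets`` resource."""

    def values(self):
        return Values()


class FakeSheetsClient:
    """Hypothetical stand-in for a client built for service "sheets", version "v4"."""

    def spreadsheets(self):
        return Spreadsheets()


def resolve_endpoint(client, endpoint_path, params):
    """Walk a dotted endpoint path: service.resource[.resource...].method."""
    parts = endpoint_path.split(".")
    target = client
    # parts[0] is the service name, which was already used to build the client.
    for resource in parts[1:-1]:
        target = getattr(target, resource)()
    # The last part is the method; it receives the endpoint parameters.
    return getattr(target, parts[-1])(**params)


request = resolve_endpoint(
    FakeSheetsClient(),
    "sheets.spreadsheets.values.get",
    {"spreadsheetId": "example-sheet-id", "range": "Sheet1!A1:B2"},
)
print(request["requested"])
```

Read this way, google_api_endpoint_path names what to call and google_api_endpoint_params supplies the keyword arguments for that call.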
Google YouTube to Amazon S3¶
This is a more advanced example DAG for GoogleApiToS3Operator, which uses XCom
to pass data between tasks to retrieve specific information about YouTube videos.
It searches for up to 50 videos (a single page of results, capped by maxResults)
in a given time range (YOUTUBE_VIDEO_PUBLISHED_AFTER, YOUTUBE_VIDEO_PUBLISHED_BEFORE)
on a YouTube channel (YOUTUBE_CHANNEL_ID), saves the response in Amazon S3,
and also pushes the data to XCom.
tests/system/amazon/aws/example_google_api_youtube_to_s3.py
video_ids_to_s3 = GoogleApiToS3Operator(
    task_id="video_ids_to_s3",
    google_api_service_name="youtube",
    google_api_service_version="v3",
    google_api_endpoint_path="youtube.search.list",
    gcp_conn_id=conn_id_name,
    google_api_endpoint_params={
        "part": "snippet",
        "channelId": YOUTUBE_CHANNEL_ID,
        "maxResults": 50,
        "publishedAfter": YOUTUBE_VIDEO_PUBLISHED_AFTER,
        "publishedBefore": YOUTUBE_VIDEO_PUBLISHED_BEFORE,
        "type": "video",
        "fields": "items/id/videoId",
    },
    google_api_response_via_xcom="video_ids_response",
    s3_destination_key=f"https://s3.us-west-2.amazonaws.com/{s3_bucket_name}/youtube_search",
    s3_overwrite=True,
)
The DAG then passes the YouTube video IDs to the next request, which retrieves
the information (YOUTUBE_VIDEO_FIELDS) for the requested videos and saves it
in Amazon S3 (S3_BUCKET_NAME).
tests/system/amazon/aws/example_google_api_youtube_to_s3.py
video_data_to_s3 = GoogleApiToS3Operator(
    task_id="video_data_to_s3",
    google_api_service_name="youtube",
    google_api_service_version="v3",
    gcp_conn_id=conn_id_name,
    google_api_endpoint_path="youtube.videos.list",
    google_api_endpoint_params={
        "part": YOUTUBE_VIDEO_PARTS,
        "maxResults": 50,
        "fields": YOUTUBE_VIDEO_FIELDS,
    },
    google_api_endpoint_params_via_xcom="video_ids",
    s3_destination_key=f"https://s3.us-west-2.amazonaws.com/{s3_bucket_name}/youtube_videos",
    s3_overwrite=True,
)
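The hand-off between the two tasks can be sketched in plain Python. The search request restricts its response to items/id/videoId and publishes it under the XCom key video_ids_response, while the second operator reads extra endpoint parameters from the XCom key video_ids. Assuming an intermediate step converts one into the other, it might look like the following; build_video_ids_param and the sample response are hypothetical, and the comma-separated id parameter follows the YouTube videos.list API.

```python
def build_video_ids_param(search_response):
    """Collapse a youtube.search.list response into an ``id`` request parameter."""
    # With fields="items/id/videoId", each item carries only its video ID.
    items = search_response.get("items", [])
    video_ids = [item["id"]["videoId"] for item in items]
    # videos.list accepts several IDs as one comma-separated string.
    return {"id": ",".join(video_ids)}


# Hypothetical response, shaped like the filtered search result.
sample_response = {
    "items": [
        {"id": {"videoId": "abc123"}},
        {"id": {"videoId": "def456"}},
    ]
}

extra_params = build_video_ids_param(sample_response)
print(extra_params)  # {'id': 'abc123,def456'}
```

In a DAG, a value like extra_params would be pushed to XCom under the key video_ids so that it can be merged into the second request's endpoint parameters.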