MongoDB to Amazon S3
Use the MongoToS3Operator
transfer to copy data from a MongoDB collection into an Amazon Simple Storage Service
(S3) file.
Prerequisite Tasks
To use this operator, you must do the following:
Create the necessary resources using the AWS Console or AWS CLI.
Install the API libraries via pip:
    pip install 'apache-airflow[amazon]'
Detailed information is available in Installation of Apache Airflow®.
Operators
MongoDB To Amazon S3 transfer operator
This operator copies a set of data from a MongoDB collection to a file in Amazon S3.
To select the data you want to copy, use the mongo_query
parameter.
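As a rough illustration of what a query value can look like, the sketch below shows two shapes in PyMongo's style: a filter document, as passed to find(), and a list of stages, as passed to aggregate(). The field names (status, retries, order_id) and the helper query_kind are hypothetical, and the assumption that the operator tells the two shapes apart by type should be checked against the MongoToS3Operator reference.

```python
# Hypothetical field names; adjust them to your collection's schema.

# A filter document: matches documents whose "status" is "OK"
# and whose "retries" value is less than 3.
filter_query = {"status": "OK", "retries": {"$lt": 3}}

# An aggregation pipeline: a list of stages applied in order.
pipeline_query = [
    {"$match": {"status": "OK"}},
    {"$project": {"_id": 0, "order_id": 1, "status": 1}},
]


def query_kind(query):
    """Classify a query by shape (illustrative helper, not part of Airflow)."""
    if isinstance(query, list):
        return "aggregate"
    if isinstance(query, dict):
        return "find"
    raise TypeError(f"Unsupported query type: {type(query).__name__}")


print(query_kind(filter_query))
print(query_kind(pipeline_query))
```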
For more information about this operator, see
MongoToS3Operator.
Example usage:
from airflow.providers.amazon.aws.transfers.mongo_to_s3 import MongoToS3Operator

mongo_to_s3_job = MongoToS3Operator(
    task_id="mongo_to_s3_job",
    mongo_collection=mongo_collection,
    # Mongo query that matches on values.
    # Returns all documents whose "status" key has the value "OK".
    mongo_query={"status": "OK"},
    s3_bucket=s3_bucket,
    s3_key=s3_key,
    mongo_db=mongo_database,
    # Overwrite the S3 key if it already exists.
    replace=True,
)
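To get a feel for what the query in the example selects without connecting to MongoDB or S3, the following minimal sketch applies a top-level equality match to a few in-memory documents and serializes the result as one JSON document per line. It is not the operator's implementation: the sample documents and the helpers matches and to_json_lines are hypothetical, the matching covers scalar equality only (no operators such as $lt, no array or nested-field semantics), and the one-document-per-line output format is an assumption.

```python
import json


def matches(document, query):
    """Return True when every key in `query` equals the document's value.

    Simplified stand-in for MongoDB matching: top-level scalar equality only.
    """
    return all(document.get(key) == value for key, value in query.items())


def to_json_lines(documents):
    """Serialize documents as newline-separated JSON (assumed output format)."""
    return "\n".join(json.dumps(doc) for doc in documents)


# Hypothetical sample data standing in for a MongoDB collection.
sample_documents = [
    {"order_id": 1, "status": "OK"},
    {"order_id": 2, "status": "FAILED"},
    {"order_id": 3, "status": "OK"},
]

query = {"status": "OK"}
selected = [doc for doc in sample_documents if matches(doc, query)]
payload = to_json_lines(selected)
print(payload)
```

Only the documents with order_id 1 and 3 are selected here. Real BSON documents can also contain values such as ObjectId or datetime, which plain json.dumps cannot serialize, so this sketch applies only to JSON-compatible data.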
For more information about PyMongo, which Airflow uses
to communicate with MongoDB, see the PyMongo documentation.