airflow.providers.amazon.aws.transfers.mongo_to_s3¶
Classes¶
| Move data from MongoDB to S3. | 
Module Contents¶
- class airflow.providers.amazon.aws.transfers.mongo_to_s3.MongoToS3Operator(*, mongo_conn_id='mongo_default', aws_conn_id='aws_default', mongo_collection, mongo_query, s3_bucket, s3_key, mongo_db=None, mongo_projection=None, replace=False, allow_disk_use=False, compression=None, **kwargs)[source]¶
- Bases: - airflow.models.BaseOperator- Move data from MongoDB to S3. - See also - For more information on how to use this operator, take a look at the guide: MongoDB To Amazon S3 transfer operator - Parameters:
- mongo_conn_id (str) – reference to a specific mongo connection 
- aws_conn_id (str | None) – reference to a specific S3 connection 
- mongo_collection (str) – reference to a specific collection in your mongo db 
- mongo_query (list | dict) – query to execute. A list including a dict of the query 
- mongo_projection (list | dict | None) – optional parameter to filter the returned fields by the query. It can be a list of fields names to include or a dictionary for excluding fields (e.g - projection={"_id": 0})
- s3_bucket (str) – reference to a specific S3 bucket to store the data 
- s3_key (str) – in which S3 key the file will be stored 
- mongo_db (str | None) – reference to a specific mongo database 
- replace (bool) – whether or not to replace the file in S3 if it previously existed 
- allow_disk_use (bool) – enables writing to temporary files in the case you are handling large dataset. This only takes effect when mongo_query is a list - running an aggregate pipeline 
- compression (str | None) – type of compression to use for output file in S3. Currently only gzip is supported. 
 
 - template_fields: collections.abc.Sequence[str] = ('s3_bucket', 's3_key', 'mongo_query', 'mongo_collection')[source]¶
 - static transform(docs)[source]¶
- Transform the data for transfer. - This method is meant to be extended by child classes to perform transformations unique to those operators needs. Processes pyMongo cursor and returns an iterable with each element being a JSON serializable dictionary - The default implementation assumes no processing is needed, i.e. input is a pyMongo cursor of documents and just needs to be passed through. - Override this method for custom transformations.