airflow.providers.amazon.aws.transfers.mongo_to_s3
Module Contents
Classes
Operator meant to move data from mongo via pymongo to s3 via boto.
- class airflow.providers.amazon.aws.transfers.mongo_to_s3.MongoToS3Operator(*, mongo_conn_id='mongo_default', aws_conn_id='aws_default', mongo_collection, mongo_query, s3_bucket, s3_key, mongo_db=None, mongo_projection=None, replace=False, allow_disk_use=False, compression=None, **kwargs)
Bases: airflow.models.BaseOperator
Operator meant to move data from mongo via pymongo to s3 via boto.
See also
For more information on how to use this operator, take a look at the guide: MongoDB To Amazon S3 transfer operator
- Parameters
mongo_conn_id (str) – reference to a specific mongo connection
aws_conn_id (str) – reference to a specific S3 connection
mongo_collection (str) – reference to a specific collection in your mongo db
mongo_query (list | dict) – query to execute. A dict is run as a find query; a list is run as an aggregation pipeline
mongo_projection (list | dict | None) – optional parameter to filter the fields returned by the query. It can be a list of field names to include or a dictionary for excluding fields (e.g. projection={"_id": 0})
s3_bucket (str) – reference to a specific S3 bucket to store the data
s3_key (str) – the S3 key under which the file will be stored
mongo_db (str | None) – reference to a specific mongo database
replace (bool) – whether or not to replace the file in S3 if it previously existed
allow_disk_use (bool) – enables writing to temporary files when handling large datasets. This only takes effect when mongo_query is a list, i.e. when running an aggregation pipeline
compression (str | None) – type of compression to use for output file in S3. Currently only gzip is supported.
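The parameters above can be combined as in the following minimal sketch. It assumes two tasks, one passing a dict as mongo_query and one passing a list (an aggregation pipeline). The task IDs, collection name, bucket name, keys and query contents are hypothetical placeholders, not values from this module. The operator import is deferred into a helper so the keyword arguments can be inspected without Airflow installed.

```python
from __future__ import annotations

from typing import Any

# Settings shared by both tasks. Connection IDs are the documented defaults;
# the collection and bucket names are hypothetical placeholders.
COMMON: dict[str, Any] = {
    "mongo_conn_id": "mongo_default",
    "aws_conn_id": "aws_default",
    "mongo_collection": "events",
    "s3_bucket": "example-bucket",
    "replace": True,
}

# A dict query, with a projection that excludes the _id field.
find_kwargs: dict[str, Any] = {
    **COMMON,
    "task_id": "mongo_find_to_s3",
    "mongo_query": {"status": "active"},
    "mongo_projection": {"_id": 0},
    "s3_key": "exports/active.json",
}

# A list query (aggregation pipeline). allow_disk_use only matters here.
aggregate_kwargs: dict[str, Any] = {
    **COMMON,
    "task_id": "mongo_aggregate_to_s3",
    "mongo_query": [
        {"$match": {"status": "active"}},
        {"$group": {"_id": "$type", "count": {"$sum": 1}}},
    ],
    "allow_disk_use": True,
    "compression": "gzip",
    "s3_key": "exports/counts.json.gz",
}


def build_operator(kwargs: dict[str, Any]):
    """Instantiate the operator; requires the Amazon provider package."""
    from airflow.providers.amazon.aws.transfers.mongo_to_s3 import (
        MongoToS3Operator,
    )

    return MongoToS3Operator(**kwargs)
```

Inside a DAG definition, build_operator(find_kwargs) would then create the first task. Keeping the arguments in plain dictionaries is only a convenience for this sketch; passing them directly to MongoToS3Operator works the same way.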
- template_fields: Sequence[str] = ('s3_bucket', 's3_key', 'mongo_query', 'mongo_collection')
- static transform(docs)
This method is meant to be extended by child classes to perform transformations unique to those operators' needs. It processes a pyMongo cursor and returns an iterable in which each element is a JSON-serializable dictionary.
The base transform() assumes no processing is needed, i.e. docs is a pyMongo cursor of documents that just needs to be passed through.
Override this method for custom transformations.
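As an illustration of overriding transform(), the sketch below converts each document's _id to a string before it is written out. The helper stringify_ids and the subclass StringIdMongoToS3Operator are hypothetical names introduced for this example. The transformation is kept in a plain function so it can be exercised without Airflow, and the subclass is created inside a factory that imports the operator lazily.

```python
from __future__ import annotations

from typing import Any, Iterable, Iterator


def stringify_ids(docs: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield a shallow copy of each document with ``_id`` converted to str.

    Documents without an ``_id`` field are passed through unchanged.
    """
    for doc in docs:
        out = dict(doc)  # copy so the caller's documents are not mutated
        if "_id" in out:
            out["_id"] = str(out["_id"])
        yield out


def make_operator_class():
    """Return a MongoToS3Operator subclass that uses stringify_ids.

    Requires the Amazon provider package to be installed.
    """
    from airflow.providers.amazon.aws.transfers.mongo_to_s3 import (
        MongoToS3Operator,
    )

    class StringIdMongoToS3Operator(MongoToS3Operator):
        @staticmethod
        def transform(docs):
            # docs is the pyMongo cursor described above
            return stringify_ids(docs)

    return StringIdMongoToS3Operator
```

Because stringify_ids is a generator, documents are transformed one at a time as the result is iterated, rather than being collected into a list first.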