airflow.providers.amazon.aws.transfers.mongo_to_s3

Module Contents

Classes

MongoToS3Operator

Move data from MongoDB to S3.

class airflow.providers.amazon.aws.transfers.mongo_to_s3.MongoToS3Operator(*, mongo_conn_id='mongo_default', aws_conn_id='aws_default', mongo_collection, mongo_query, s3_bucket, s3_key, mongo_db=None, mongo_projection=None, replace=False, allow_disk_use=False, compression=None, **kwargs)

Bases: airflow.models.BaseOperator

Move data from MongoDB to S3.

See also

For more information on how to use this operator, take a look at the guide: MongoDB To Amazon S3 transfer operator

Parameters
  • mongo_conn_id (str) – reference to a specific mongo connection

  • aws_conn_id (str | None) – reference to a specific S3 connection

  • mongo_collection (str) – reference to a specific collection in your mongo db

  • mongo_query (list | dict) – query to execute. Either a dict for a find query, or a list of dicts to run as an aggregation pipeline (see the usage sketch after this parameter list)

  • mongo_projection (list | dict | None) – optional parameter to filter the fields returned by the query. It can be a list of field names to include or a dictionary for excluding fields (e.g. projection={"_id": 0})

  • s3_bucket (str) – reference to a specific S3 bucket to store the data

  • s3_key (str) – S3 key under which the file will be stored

  • mongo_db (str | None) – reference to a specific mongo database

  • replace (bool) – whether to replace the file in S3 if it already exists

  • allow_disk_use (bool) – enables writing to temporary files when handling a large dataset. This only takes effect when mongo_query is a list, i.e. when running an aggregation pipeline

  • compression (str | None) – type of compression to use for the output file in S3. Currently only gzip is supported.
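A minimal usage sketch of the operator inside a DAG, assuming Airflow 2.4+ style DAG arguments; the connection IDs, database, collection, bucket, and key names are hypothetical placeholders. The first task uses a dict as mongo_query (a find query with a projection), the second uses a list of dicts (an aggregation pipeline), which is the case where allow_disk_use takes effect.

```python
import pendulum
from airflow import DAG
from airflow.providers.amazon.aws.transfers.mongo_to_s3 import MongoToS3Operator

with DAG(
    dag_id="mongo_to_s3_example",  # hypothetical DAG id
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
):
    # Find query: mongo_query is a dict, optionally narrowed with mongo_projection.
    export_orders = MongoToS3Operator(
        task_id="export_orders",
        mongo_conn_id="mongo_default",
        aws_conn_id="aws_default",
        mongo_db="shop",                         # hypothetical database
        mongo_collection="orders",               # hypothetical collection
        mongo_query={"status": "shipped"},
        mongo_projection={"_id": 0},
        s3_bucket="my-data-bucket",              # hypothetical bucket
        s3_key="exports/orders/{{ ds }}.json",   # s3_key is a templated field
        replace=True,
        compression="gzip",
    )

    # Aggregation pipeline: mongo_query is a list, so allow_disk_use applies.
    export_daily_totals = MongoToS3Operator(
        task_id="export_daily_totals",
        mongo_conn_id="mongo_default",
        aws_conn_id="aws_default",
        mongo_db="shop",
        mongo_collection="orders",
        mongo_query=[
            {"$match": {"status": "shipped"}},
            {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
        ],
        allow_disk_use=True,
        s3_bucket="my-data-bucket",
        s3_key="exports/daily_totals/{{ ds }}.json",
        replace=True,
    )
```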

template_fields: Sequence[str] = ('s3_bucket', 's3_key', 'mongo_query', 'mongo_collection')
ui_color = '#589636'
template_fields_renderers
execute(context)

Execute the transfer. This method is written to depend on the transform() method for processing the documents fetched from MongoDB before they are uploaded to S3.

static transform(docs)

Transform the data for transfer.

This method is meant to be extended by child classes to perform transformations unique to that operator's needs. It processes the pymongo cursor and returns an iterable in which each element is a JSON-serializable dictionary.

The default implementation assumes no processing is needed, i.e. the input is a pymongo cursor of documents that just needs to be passed through.

Override this method for custom transformations.
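A hedged sketch of such an override; the subclass name, field names, and reshaping logic are hypothetical. The only contract, per the docstring above, is that the static method accepts the pymongo cursor and returns an iterable of JSON-serializable dictionaries.

```python
from airflow.providers.amazon.aws.transfers.mongo_to_s3 import MongoToS3Operator


class MongoToS3FlattenOperator(MongoToS3Operator):
    """Hypothetical subclass that strips Mongo-specific types before upload."""

    @staticmethod
    def transform(docs):
        # docs is the pymongo cursor produced by the find query or pipeline.
        # Yield plain dicts so the result is JSON serializable.
        return (
            {
                "order_id": str(doc["_id"]),           # ObjectId -> str
                "status": doc.get("status"),
                "amount": float(doc.get("amount", 0)),
            }
            for doc in docs
        )
```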
