airflow.contrib.operators.mongo_to_s3

Module Contents

class airflow.contrib.operators.mongo_to_s3.MongoToS3Operator(mongo_conn_id, s3_conn_id, mongo_collection, mongo_query, s3_bucket, s3_key, mongo_db=None, replace=False, *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Mongo -> S3 A more specific baseOperator meant to move data from mongo via pymongo to s3 via boto

things to note

.execute() is written to depend on .transform() .transform() is meant to be extended by child classes to perform transformations unique to those operators needs

template_fields = ['s3_key', 'mongo_query'][source]
execute(self, context)[source]

Executed by task_instance at runtime

static _stringify(iterable, joinable='n')[source]

Takes an iterable (pymongo Cursor or Array) containing dictionaries and returns a stringified version using python join

static transform(docs)[source]
Processes pyMongo cursor and returns an iterable with each element being

a JSON serializable dictionary

Base transform() assumes no processing is needed ie. docs is a pyMongo cursor of documents and cursor just needs to be passed through

Override this method for custom transformations