airflow.providers.amazon.aws.transfers.mongo_to_s3
¶
Module Contents¶
-
airflow.providers.amazon.aws.transfers.mongo_to_s3.
_DEPRECATION_MSG
= The s3_conn_id parameter has been deprecated. You should pass instead the aws_conn_id parameter.[source]¶
-
class
airflow.providers.amazon.aws.transfers.mongo_to_s3.
MongoToS3Operator
(*, s3_conn_id: Optional[str] = None, mongo_conn_id: str = 'mongo_default', aws_conn_id: str = 'aws_default', mongo_collection: str, mongo_query: Union[list, dict], s3_bucket: str, s3_key: str, mongo_db: Optional[str] = None, replace: bool = False, allow_disk_use: bool = False, compression: Optional[str] = None, **kwargs)[source]¶ Bases:
airflow.models.BaseOperator
Operator meant to move data from mongo via pymongo to s3 via boto.
- Parameters
mongo_conn_id (str) -- reference to a specific mongo connection
aws_conn_id (str) -- reference to a specific S3 connection
mongo_collection (str) -- reference to a specific collection in your mongo db
mongo_query (list) -- query to execute. A list including a dict of the query
s3_bucket (str) -- reference to a specific S3 bucket to store the data
s3_key (str) -- in which S3 key the file will be stored
mongo_db (str) -- reference to a specific mongo database
replace (bool) -- whether or not to replace the file in S3 if it previously existed
allow_disk_use (bool) -- in the case you are retrieving a lot of data, you may have to use the disk to save it instead of saving all in the RAM
compression (str) -- type of compression to use for output file in S3. Currently only gzip is supported.
-
static
_stringify
(iterable: Iterable, joinable: str = '\n')[source]¶ Takes an iterable (pymongo Cursor or Array) containing dictionaries and returns a stringified version using python join
-
static
transform
(docs: Any)[source]¶ This method is meant to be extended by child classes to perform transformations unique to those operators needs. Processes pyMongo cursor and returns an iterable with each element being a JSON serializable dictionary
Base transform() assumes no processing is needed ie. docs is a pyMongo cursor of documents and cursor just needs to be passed through
Override this method for custom transformations