airflow.providers.mongo.hooks.mongo

Hook for Mongo DB.

Module Contents

Classes

MongoHook

PyMongo wrapper to interact with MongoDB.

class airflow.providers.mongo.hooks.mongo.MongoHook(mongo_conn_id=default_conn_name, *args, **kwargs)[source]

Bases: airflow.hooks.base.BaseHook

PyMongo wrapper to interact with MongoDB.

Mongo Connection Documentation https://docs.mongodb.com/manual/reference/connection-string/index.html You can specify connection string options in extra field of your connection https://docs.mongodb.com/manual/reference/connection-string/index.html#connection-string-options

If you want use DNS seedlist, set srv to True.

ex.

{“srv”: true, “replicaSet”: “test”, “ssl”: true, “connectTimeoutMS”: 30000}

For enabling SSL, the “ssl”: true option can be used within the connection string options, under extra. In scenarios where SSL is enabled, allow_insecure option is not included by default in the connection unless specified. This is so that we ensure a secure medium while handling connections to MongoDB.

The allow_insecure only makes sense in ssl context and is configurable and can be used in one of the following scenarios:

HTTP (ssl = False) Here, ssl is disabled and using allow_insecure doesn’t make sense. Example connection extra: {“ssl”: false}

HTTPS, but insecure (ssl = True, allow_insecure = True) Here, ssl is enabled, and the connection allows insecure connections. Example connection extra: {“ssl”: true, “allow_insecure”: true}

HTTPS, but secure (ssl = True, allow_insecure = False - default when SSL enabled): Here, ssl is enabled, and the connection does not allow insecure connections (default behavior when SSL is enabled). Example connection extra: {“ssl”: true} or {“ssl”: true, “allow_insecure”: false}

Note: tls is an alias to ssl and can be used in place of ssl. Example: {“ssl”: false} or {“tls”: false}.

Parameters

mongo_conn_id (str) – The Mongo connection id to use when connecting to MongoDB.

conn_name_attr = 'mongo_conn_id'[source]
default_conn_name = 'mongo_default'[source]
conn_type = 'mongo'[source]
hook_name = 'MongoDB'[source]
__enter__()[source]
__exit__(exc_type, exc_val, exc_tb)[source]
get_conn()[source]

Fetch PyMongo Client.

get_collection(mongo_collection, mongo_db=None)[source]

Fetch a mongo collection object for querying.

Uses connection schema as DB unless specified.

aggregate(mongo_collection, aggregate_query, mongo_db=None, **kwargs)[source]

Run an aggregation pipeline and returns the results.

https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.aggregate https://pymongo.readthedocs.io/en/stable/examples/aggregation.html

find(mongo_collection: str, query: dict, find_one: typing_extensions.Literal[False], mongo_db: str | None = None, projection: list | dict | None = None, **kwargs) pymongo.cursor.Cursor[source]
find(mongo_collection: str, query: dict, find_one: typing_extensions.Literal[True], mongo_db: str | None = None, projection: list | dict | None = None, **kwargs) Any | None

Run a mongo find query and returns the results.

https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.find

insert_one(mongo_collection, doc, mongo_db=None, **kwargs)[source]

Insert a single document into a mongo collection.

https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.insert_one

insert_many(mongo_collection, docs, mongo_db=None, **kwargs)[source]

Insert many docs into a mongo collection.

https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.insert_many

update_one(mongo_collection, filter_doc, update_doc, mongo_db=None, **kwargs)[source]

Update a single document in a mongo collection.

https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.update_one

Parameters
  • mongo_collection (str) – The name of the collection to update.

  • filter_doc (dict) – A query that matches the documents to update.

  • update_doc (dict) – The modifications to apply.

  • mongo_db (str | None) – The name of the database to use. Can be omitted; then the database from the connection string is used.

update_many(mongo_collection, filter_doc, update_doc, mongo_db=None, **kwargs)[source]

Update one or more documents in a mongo collection.

https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.update_many

Parameters
  • mongo_collection (str) – The name of the collection to update.

  • filter_doc (dict) – A query that matches the documents to update.

  • update_doc (dict) – The modifications to apply.

  • mongo_db (str | None) – The name of the database to use. Can be omitted; then the database from the connection string is used.

replace_one(mongo_collection, doc, filter_doc=None, mongo_db=None, **kwargs)[source]

Replace a single document in a mongo collection.

https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.replace_one

Note

If no filter_doc is given, it is assumed that the replacement document contain the _id field which is then used as filters.

Parameters
  • mongo_collection (str) – The name of the collection to update.

  • doc (dict) – The new document.

  • filter_doc (dict | None) – A query that matches the documents to replace. Can be omitted; then the _id field from doc will be used.

  • mongo_db (str | None) – The name of the database to use. Can be omitted; then the database from the connection string is used.

replace_many(mongo_collection, docs, filter_docs=None, mongo_db=None, upsert=False, collation=None, **kwargs)[source]

Replace many documents in a mongo collection.

Uses bulk_write with multiple ReplaceOne operations https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.bulk_write

Note

If no filter_docs``are given, it is assumed that all replacement documents contain the ``_id field which are then used as filters.

Parameters
  • mongo_collection (str) – The name of the collection to update.

  • docs (list[dict]) – The new documents.

  • filter_docs (list[dict] | None) – A list of queries that match the documents to replace. Can be omitted; then the _id fields from airflow.docs will be used.

  • mongo_db (str | None) – The name of the database to use. Can be omitted; then the database from the connection string is used.

  • upsert (bool) – If True, perform an insert if no documents match the filters for the replace operation.

  • collation (pymongo.collation.Collation | None) – An instance of Collation. This option is only supported on MongoDB 3.4 and above.

delete_one(mongo_collection, filter_doc, mongo_db=None, **kwargs)[source]

Delete a single document in a mongo collection.

https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.delete_one

Parameters
  • mongo_collection (str) – The name of the collection to delete from.

  • filter_doc (dict) – A query that matches the document to delete.

  • mongo_db (str | None) – The name of the database to use. Can be omitted; then the database from the connection string is used.

delete_many(mongo_collection, filter_doc, mongo_db=None, **kwargs)[source]

Delete one or more documents in a mongo collection.

https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.delete_many

Parameters
  • mongo_collection (str) – The name of the collection to delete from.

  • filter_doc (dict) – A query that matches the documents to delete.

  • mongo_db (str | None) – The name of the database to use. Can be omitted; then the database from the connection string is used.

distinct(mongo_collection, distinct_key, filter_doc=None, mongo_db=None, **kwargs)[source]

Return a list of distinct values for the given key across a collection.

https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.distinct

Parameters
  • mongo_collection (str) – The name of the collection to perform distinct on.

  • distinct_key (str) – The field to return distinct values from.

  • filter_doc (dict | None) – A query that matches the documents get distinct values from. Can be omitted; then will cover the entire collection.

  • mongo_db (str | None) – The name of the database to use. Can be omitted; then the database from the connection string is used.

Was this entry helpful?