I fetch data daily from a source that gives me the same record two or three times. I want to ensure unique records in my database without using indexes.
1 Answer
If the records are in a canonical format and are not too large, you could take a hash of each record and store the hash as the _id key of the document during insertion (the _id field is automatically indexed, and you can't turn that index off).
In Python (you need the pymongo driver installed: pip install pymongo):
>>> import json
>>> import pymongo
>>> client = pymongo.MongoClient()
>>> db = client['test']
>>> collection = db['test']
>>> a = {"a": "b"}
>>> s = json.dumps(a)
>>> hash(s)
7926683998783294113
>>> collection.insert_one({"_id" : hash(s), "data" : a})
<pymongo.results.InsertOneResult object at 0x104ff7bc8>
>>> collection.insert_one({"_id" : hash(s), "data" : a})
...
raise DuplicateKeyError(error.get("errmsg"), 11000, error)
pymongo.errors.DuplicateKeyError: E11000 duplicate key error collection: test.test index: _id_ dup key: { : 7926683998783294113 }
>>>
So to detect your duplicate records, just catch DuplicateKeyError. This approach presumes that the json.dumps output is identical for identical records, which requires consistent key ordering (pass sort_keys=True). Note also that Python's built-in hash() is salted per interpreter process for strings, so the same record will produce a different _id on tomorrow's run; a deterministic digest such as hashlib.sha256 is safer here.
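A minimal sketch of that deterministic variant, assuming your daily batch arrives as a list of dicts (fetched_records below is a placeholder for your own fetch):

import hashlib
import json

import pymongo
from pymongo.errors import DuplicateKeyError

client = pymongo.MongoClient()
collection = client['test']['test']

def record_id(record):
    # Serialize with sorted keys so logically identical records always
    # produce the same byte string, then hash it with SHA-256, which is
    # stable across processes (unlike the built-in hash()).
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

fetched_records = [{"a": "b"}, {"a": "b"}, {"c": "d"}]  # placeholder batch

for record in fetched_records:
    try:
        collection.insert_one({"_id": record_id(record), "data": record})
    except DuplicateKeyError:
        pass  # record already stored on an earlier run; skip the duplicate

The hexadecimal digest works fine as an _id, since MongoDB accepts any BSON-serializable value for that field.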