
I fetch data daily from a source that gives the same record twice or three times. I want to ensure unique records in my database without using indexing.


1 Answer


If the records are in a canonical format and are not too large, you could take a hash of each record and store it as the _id key of the document during insertion (_id is automatically indexed, and that index cannot be turned off).

In Python (you need the pymongo driver installed: pip install pymongo):

>>> import json
>>> import pymongo
>>> client = pymongo.MongoClient()
>>> db = client['test']
>>> collection = db['test']
>>> a={"a":"b"}
>>> s = json.dumps(a)
>>> hash(s)
7926683998783294113
>>> collection.insert_one({"_id" : hash(s), "data" : a})
<pymongo.results.InsertOneResult object at 0x104ff7bc8>
>>> collection.insert_one({"_id" : hash(s), "data" : a})
...
    raise DuplicateKeyError(error.get("errmsg"), 11000, error)
pymongo.errors.DuplicateKeyError: E11000 duplicate key error collection: test.test index: _id_ dup key: { : 7926683998783294113 }
>>>

So to detect your duplicate records, just catch DuplicateKeyError. This approach presumes that the dumps output is identical for identical records (pass sort_keys=True to json.dumps if the key order can vary between fetches).
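One caveat: Python's built-in hash() is salted per process (since Python 3.3), so its values for strings are not stable across interpreter runs, which matters when the _id must stay consistent across daily imports. A minimal sketch of a deterministic alternative using hashlib, assuming the records are JSON-serialisable (the function name stable_id is my own, not from the answer above):

```python
import hashlib
import json

def stable_id(record):
    """Return a deterministic hash for a JSON-serialisable record.

    json.dumps with sort_keys=True produces a canonical string, and
    hashlib.sha256 (unlike the built-in hash()) yields the same digest
    in every Python process, so the same record always maps to the
    same _id.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# The same logical record hashes identically regardless of key order:
print(stable_id({"a": 1, "b": 2}) == stable_id({"b": 2, "a": 1}))  # True
```

The resulting hex digest can be passed as the _id in insert_one exactly as in the session above, with DuplicateKeyError caught the same way.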

