20

I have read somewhere that you can store Python objects (more specifically, dictionaries) as binaries in MongoDB by using BSON. However, right now I cannot find any documentation related to this.

Would anyone know how exactly this can be done?

5 Comments

  • It's not at all clear what you're trying to do, what you've tried and what didn't work. Please edit the question to include those helpful details. :) Commented Aug 6, 2013 at 20:48
  • If you're doing that for performance, this benchmark might surprise you. Commented Aug 6, 2013 at 21:15
  • @thg435: Thanks for the link, I will keep it in mind for a project where I/O is more critical to performance! Commented Aug 6, 2013 at 22:46
  • @thg435: The major problem for me is that I rely heavily on serialization of numpy data types, which is not supported by Python's json module. Commented Feb 23, 2014 at 19:31
  • As a side note, using pickle (as suggested in the answers) can have some issues: pyvideo.org/video/2566/pickles-are-for-delis-not-software. In summary: problems with the security and maintainability of your code. Commented Jul 2, 2015 at 8:53

3 Answers

42

There isn't a way to store an object in a file (database) without serializing it. If the data needs to move from one process to another process or to another server, it will need to be serialized in some form to be transmitted. Since you're asking about MongoDB, the data will absolutely be serialized in some form in order to be stored in the MongoDB database. When using MongoDB, it's BSON.

If you're actually asking whether there is a way to store a more raw form of a Python object in a MongoDB document, you can insert a Binary field into a document, and that field can contain any data you'd like. It's not directly queryable in that form, so you're potentially losing a lot of the benefits of using a NoSQL document database like MongoDB.

>>> from pymongo import MongoClient
>>> client = MongoClient('localhost', 27017)
>>> db = client['test-database']
>>> coll = db.test_collection
>>> # the collection is ready now
>>> from bson.binary import Binary
>>> import pickle
>>> # create a sample object
>>> myObj = {}
>>> myObj['demo'] = 'Some demo data'
>>> # convert it to the raw bytes
>>> thebytes = pickle.dumps(myObj)
>>> # insert_one replaces the older insert(), which was removed in PyMongo 4
>>> coll.insert_one({'bin-data': Binary(thebytes)})
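Reading the object back is the reverse of the steps above: fetch the document, then pass the stored bytes to pickle.loads. The following is a minimal sketch, not a definitive implementation. InMemoryCollection, save_object, load_object and the 'name' field are hypothetical names introduced only so the example runs without a MongoDB server; with PyMongo you would pass a real collection instead, and on Python 3 the binary field should come back as plain bytes.

```python
import pickle


class InMemoryCollection:
    """Hypothetical stand-in for a PyMongo collection (not part of PyMongo)."""

    def __init__(self):
        self._docs = []

    def insert_one(self, doc):
        self._docs.append(dict(doc))

    def find_one(self, query):
        # Return the first document whose fields match every key in the query.
        for doc in self._docs:
            if all(doc.get(k) == v for k, v in query.items()):
                return doc
        return None


def save_object(coll, name, obj):
    """Pickle obj and store the resulting bytes in the 'bin-data' field."""
    coll.insert_one({'name': name, 'bin-data': pickle.dumps(obj)})


def load_object(coll, name):
    """Fetch the document by name and unpickle the stored bytes."""
    doc = coll.find_one({'name': name})
    if doc is None:
        raise KeyError(name)
    # Only unpickle data you wrote yourself: pickle.loads can run arbitrary code.
    return pickle.loads(doc['bin-data'])


# With a real server this would be something like db.test_collection.
coll = InMemoryCollection()
save_object(coll, 'demo', {'demo': 'Some demo data'})
restored = load_object(coll, 'demo')
```

No preprocessing beyond pickle.loads is needed, as long as the classes of the pickled objects are importable in the process that reads them.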

5 Comments

Thanks for the extensive answer! After all, I think I will stick with pickle serialization to build a JSON object. It outputs an identical string for sets containing the same strings, which is critical for me. In addition, my I/O to the database isn't the most performance-critical part of my code.
There is a typo in the example code: it should read pickle.dumps(myObj) on the second-to-last line.
Thanks, pickle.dumps(obj) worked for me (scikit-learn.org/stable/modules/…)
I guess the answer should be changed, as pickle has changed now: it should be pickle.dumps(obj) and not pickle.dump(obj).
So what preprocessing is needed if we want to read the data again from Mongo?
5

Assuming you are not specifically interested in MongoDB, you are probably not looking for BSON. BSON is just a different serialization format from JSON, designed for more speed and space efficiency. Pickle, on the other hand, encodes Python objects more directly.
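To illustrate what "more directly" means in practice, here is a small sketch using only the standard library. The record dictionary is made-up sample data. It compares pickle with JSON rather than BSON, since JSON ships with Python; BSON has its own type rules that are not shown here.

```python
import json
import pickle

# Made-up sample data containing types JSON has no native equivalent for.
record = {'point': (1, 2), 'tags': {'a', 'b'}}

# pickle round-trips the tuple and the set unchanged.
restored = pickle.loads(pickle.dumps(record))

# json refuses the set outright...
try:
    json.dumps(record)
    json_error = None
except TypeError as exc:
    json_error = exc

# ...and silently turns the tuple into a list.
json_restored = json.loads(json.dumps({'point': (1, 2)}))
```

The trade-off is that pickled bytes are opaque to anything that is not Python, and to MongoDB queries.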

However, do your speed tests before you adopt pickle to ensure it is better for your use case.
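One way to run such a test is with timeit from the standard library. This is only a sketch: the payload is an arbitrary example, JSON stands in as the comparison format, and the numbers depend entirely on your data and machine, so measure with objects that resemble your own.

```python
import json
import pickle
import timeit

# Arbitrary example payload; replace it with data shaped like yours.
payload = {'values': list(range(1000)), 'label': 'demo'}


def roundtrip_pickle():
    return pickle.loads(pickle.dumps(payload))


def roundtrip_json():
    return json.loads(json.dumps(payload))


# Total time in seconds for 1000 serialize/deserialize round trips each.
pickle_seconds = timeit.timeit(roundtrip_pickle, number=1000)
json_seconds = timeit.timeit(roundtrip_json, number=1000)
print(f'pickle: {pickle_seconds:.4f}s  json: {json_seconds:.4f}s')
```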


0

It seems you would still need to serialize with the pickle module, which produces bytes; deserializing those bytes with pickle gives back the Python object directly.

Also, you can store the pickled object directly in Mongo.

import pickle as pkl
from uuid import uuid4

from pymongo import MongoClient

data = dict(key='mongo')
pickled_data = pkl.dumps(data)
# Stored as a string: PyMongo 4 will not encode a native UUID
# unless a uuidRepresentation is configured on the client.
uid = str(uuid4())

client = MongoClient()  # add the DB URL in the constructor if needed
db = client.test

# insertion
db.data.insert_one({
    'uuid': uid,
    'data': pickled_data
})

# retrieval
result = db.data.find_one({'uuid': uid})
assert pkl.loads(result['data']) == data
