How to use less memory to store data in a Python dict

Question

I have about 1.5 GB of data that I want to store in one big dict in Python. However, the dict takes far more than 1.5 GB of memory, maybe 10 times as much, and the machine doesn't have that much memory. Is there any way to use less memory while keeping the data in a dict structure? The keys and values are all integers.

Best Regards,

What are you going to do with the dictionary after loading it from file?
– piokuc, Commented Nov 22, 2012 at 14:08

user4815162342 · Accepted Answer · 2012-11-22 14:06:53Z

1

Use a fast database that stores the key-value pairs to disk and allows intelligent retrieval and indexing, such as sqlite.
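A minimal sketch of what this could look like with the standard-library sqlite3 module. The table name kv, the column names k and v, the helper names, and the file name are assumptions made for illustration, not something the answer specifies.

```python
import os
import sqlite3
import tempfile


def open_store(path):
    """Open (or create) an on-disk table mapping integer keys to integer values."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv (k INTEGER PRIMARY KEY, v INTEGER NOT NULL)"
    )
    return conn


def put_many(conn, pairs):
    """Insert or overwrite many (key, value) pairs inside a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", pairs)


def get(conn, key, default=None):
    """Look up one key; return default if it is absent."""
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row is not None else default


# Hypothetical file location, used only for this demo.
db_path = os.path.join(tempfile.mkdtemp(), "kv.sqlite")
conn = open_store(db_path)

# Feeding a generator keeps the pairs from all being held in memory at once.
put_many(conn, ((i, i * i) for i in range(1000)))

found = get(conn, 12)
missing = get(conn, 5000, default=-1)
```

Declaring the key as INTEGER PRIMARY KEY lets sqlite index lookups by key, so a single get does not scan the table.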

answered Nov 22, 2012 at 14:06


The Unfun Cat · Accepted Answer · 2012-11-22 14:15:39Z

1

You should try using a database so you do not have to store all the data in memory.

A Berkeley Database is ideal for your use as it only stores key-value pairs. It is a "dict" in database form!

The code would look something like:

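from bsddb3 import db

dbdict = db.DB()  # DB is a class inside the imported db module
dbdict.open("your database", None, db.DB_HASH, db.DB_CREATE)
# Works much like a dict, but keys and values must be byte strings,
# so integers have to be encoded first.
dbdict[b"3"] = b"2"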

Here are bindings: Python "bindings" for Oracle Berkeley DB

edited Nov 22, 2012 at 14:15

answered Nov 22, 2012 at 14:07


Darknight · Accepted Answer · 2012-11-22 14:21:33Z

0

Use the pickle module to store the dictionary's data. See http://wiki.python.org/moin/UsingPickle for how to use pickle.
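A minimal sketch of saving and reloading a dict with pickle. The tiny dict and the file name are stand-ins chosen for the example. Note that pickle.load rebuilds the whole dict in memory, so this keeps the data between runs but does not by itself lower the memory used while the dict is loaded.

```python
import os
import pickle
import tempfile

data = {1: 10, 2: 20, 3: 30}  # small stand-in for the real dict

# Hypothetical file location, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "data.pkl")

# Write the dict to disk using the most compact protocol available.
with open(path, "wb") as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

# Read it back; the full dict is reconstructed in memory here.
with open(path, "rb") as f:
    restored = pickle.load(f)
```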

answered Nov 22, 2012 at 14:21


piokuc · Accepted Answer · 2012-11-22 14:29:31Z

0

If the keys are integers, then, depending on the range of the keys, you can use an array (http://docs.python.org/2/library/array.html) instead of a dictionary. Your keys become indices into the array, that's all. This will be more memory-efficient than creating a dictionary.
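A rough sketch of the array idea, assuming the keys are non-negative and reasonably dense. The sentinel MISSING, the bound max_key, and the helper lookup are assumptions made for this example; the sentinel only works if that value never occurs in the real data.

```python
from array import array

MISSING = -1  # assumed sentinel for "no value stored at this key"
max_key = 9   # assumed: all keys fall in the range 0..max_key

# Type code 'q' is a signed 64-bit integer, so each slot costs 8 bytes.
values = array("q", [MISSING]) * (max_key + 1)

# The key is simply the index.
values[3] = 2
values[7] = 49


def lookup(key, default=None):
    """Return the value stored at key, or default if the slot is empty or out of range."""
    if 0 <= key < len(values) and values[key] != MISSING:
        return values[key]
    return default
```

With this layout the memory cost is about 8 bytes per possible key, whether or not the key is used, so it pays off only when the key range is not much larger than the number of entries.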

If you don't have enough RAM to fit all your data into an array, then use something like sqlite or Berkeley DB to, effectively, have a dictionary on file. Of course, it will be much slower.

edited Nov 22, 2012 at 14:29

answered Nov 22, 2012 at 14:09


LtWorf · Accepted Answer · 2012-11-22 14:52:57Z

0

Since your indexes and data are integers, you can save your data in a file and access it as if it were an array. Only the pages you are working on will be in RAM; the others will stay on disk.

See http://docs.python.org/2/library/mmap.html

mmap is byte-based, which means that the element at a given index lives at byte offset index*sizeof(int) on your architecture. You will need to read sizeof(int) bytes instead of just one byte, and use the struct module (http://docs.python.org/2/library/struct.html) to convert them into a Python integer.

This solution is a bit slower than using an array if all the data fits in RAM. If your system starts paging out, however, it will be faster than using normal arrays.
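The offset arithmetic described above can be sketched as follows. This is one possible layout, not a definitive implementation: it assumes a fixed 8-byte little-endian slot ("<q") rather than the platform's native sizeof(int), and the file name and slot count are made up for the demo.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<q")  # one little-endian signed 64-bit integer per slot
n_slots = 1000                # assumed: keys are 0..n_slots-1

# Hypothetical file location, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "values.bin")

# Create a zero-filled file large enough to hold every slot.
with open(path, "wb") as f:
    f.truncate(n_slots * RECORD.size)

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)  # length 0 maps the whole file
    try:
        # Store: the byte offset is index * slot size.
        RECORD.pack_into(mm, 3 * RECORD.size, 2)
        RECORD.pack_into(mm, 999 * RECORD.size, -7)

        # Load: read one slot's worth of bytes and decode it.
        (v3,) = RECORD.unpack_from(mm, 3 * RECORD.size)
        (v999,) = RECORD.unpack_from(mm, 999 * RECORD.size)
        (v0,) = RECORD.unpack_from(mm, 0)  # never written, still zero

        mm.flush()  # push modified pages back to the file
    finally:
        mm.close()
```

Only the pages touched by pack_into and unpack_from need to be resident; the operating system decides when to load and evict them.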

answered Nov 22, 2012 at 14:52


4 Comments

user4815162342 Over a year ago

Note that the poster has keys and values, which indicates a mapping and not an array. It would of course be possible to implement a hash table using only the mmap module, but it's much better (and faster) to use an existing on-disk database such as sqlite or Berkeley DB.

LtWorf Over a year ago

Yes, but both keys and values seem to be "int", and given the size of the whole thing I guess the keys aren't so sparse. So it's pretty much an array, which is way faster than using a database, because databases are made to handle a lot of things that are not needed here.

user4815162342 Over a year ago

The keys are integers, which could easily refer to 64-bit ints of unknown sparseness. The poster speaks of "big dict", so it's clear that a dict-like (as opposed to array-like) interface is desired. Without knowing the intended usage pattern it's hard to predict whether a database or a hand-rolled solution based on mmap/struct would be faster.

LtWorf Over a year ago

Yes, I agree. I posted the solution as one of the possible options, not as the answer to life, the universe and everything.
