In one of my programs I use a sparse array of data, which is currently implemented as an integer-indexed dict like this:

{
   0: {some dict with data},
   1: {some similar but yet different dict},
   10: {...},
   100: {...},
   200: {...},
   # etc.
}

It turned out that this dict takes too much memory for my purposes. Is there a way to store sparse arrays more efficiently? I'm willing to sacrifice milliseconds of access time for the sake of lower memory consumption. The key range is 0..0xFFFFFF, and the sparseness is about 30%.

Although a third-party module might be an option, I'm more interested in a pure Python solution.

Thanks.

To clarify, inner dicts are not subject to optimisation, I'm only trying to arrange them in a better way. For simplicity, let's pretend I have strings rather than dicts there:

data = {
   0: "foo",
   1: "bar",
   10: "...",
   100: "...",
   200: "...",
   # etc.
}
  • Not sure what you mean; you are using exactly as much memory as you have values, so how does this "take too much memory"? Isn't it just the amount of memory you need? Or maybe you are asking about compression? Maybe this is helpful: stackoverflow.com/questions/10264874/… Commented Apr 10, 2014 at 11:17
  • @Lawrence: That's not necessarily true: dictionaries have some overhead. As a simple example, compare sys.getsizeof([("a", 1), ("b", 2)]) to sys.getsizeof({"a": 1, "b": 2}) (a runnable version appears after this list). Commented Apr 10, 2014 at 11:22
  • @DavidRobinson: yes, but the overhead is quite minimal, and it most likely cannot get smaller, not with pure Python. Maybe if there is a pattern in the data, something could be achieved by a different arrangement or a custom 'compression' algorithm... Commented Apr 10, 2014 at 11:26
  • What kind of data is in the inner dicts? The memory consumption of this structure is dominated by the inner dicts, so without knowing more about them, it's impossible to give a useful answer. Commented Apr 10, 2014 at 11:26
  • I get that this is what you are hoping for. The point I'm trying to make is that you can't possibly gain anything this way. At least 95% of your memory consumption is from your inner dicts. Even if you managed to squeeze the consumption of the outer dict by a factor of ten, your total memory consumption would only be reduced to 95.5% at best, which would hardly help. Commented Apr 10, 2014 at 14:41
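
For reference, a runnable version of the size comparison mentioned in the comments above. Exact byte counts vary by Python version and platform, and note that sys.getsizeof measures only the container object itself, not the objects it refers to:

import sys

# Compare the container overhead of a dict against a plain list of pairs
# holding the same two key/value bindings.
pairs = [("a", 1), ("b", 2)]
mapping = {"a": 1, "b": 2}

print(sys.getsizeof(pairs))    # size of the list object itself
print(sys.getsizeof(mapping))  # size of the dict object, typically larger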

1 Answer

If the structure is a mapping, then a dict-like object really is the right option, and if memory is an issue then the obvious solution is to work off a file instead. The easiest approach may be to use a pandas Series, which can be used as a dict and can work directly through an HDF5 file (see http://pandas.pydata.org/pandas-docs/stable/io.html#hdf5-pytables).
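
A minimal sketch of that route, assuming pandas with its PyTables dependency is installed; the file name "sparse.h5", the key "data", and the sample values are placeholders:

import pandas as pd  # HDF5 I/O additionally requires the PyTables package

# A sparse integer-indexed Series: only the keys that actually exist are stored.
data = {0: "foo", 1: "bar", 10: "baz", 100: "qux", 200: "quux"}
s = pd.Series(data)
print(s[10])  # dict-like lookup by key

# Persist the Series to an HDF5 file so it need not stay in memory.
s.to_hdf("sparse.h5", key="data")  # placeholder file name and key

# Later (even in another process), load it back from disk.
s2 = pd.read_hdf("sparse.h5", key="data")
print(s2[100])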

Alternatively, for a pure Python solution, you could use the standard library's shelve module.
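
A minimal sketch of the shelve approach; shelve keys must be strings, so the integer keys are converted on the way in, and "sparse_shelf" is a placeholder file name:

import shelve

data = {0: "foo", 1: "bar", 10: "baz", 100: "qux", 200: "quux"}

# Write the sparse mapping to a disk-backed shelf.
with shelve.open("sparse_shelf") as shelf:
    for key, value in data.items():
        shelf[str(key)] = value  # shelve keys must be str

# Reopen later and look up entries without loading everything into RAM.
with shelve.open("sparse_shelf") as shelf:
    print(shelf["100"])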

