In one of my programs I use a sparse array of data, which is currently implemented as an integer-indexed dict like this:

{
   0: {some dict with data},
   1: {some similar but yet different dict},
   10: {...},
   100: {...},
   200: {...},
   # etc.
}

It turned out that this dict takes too much memory for my purposes. Is there a way to store sparse arrays more efficiently? I'm willing to sacrifice milliseconds of access time for the sake of lower memory consumption. The key range is 0..0xFFFFFF, and the sparseness is about 30%.

Although a third-party module might be an option, I'm more interested in a pure Python solution.

Thanks.

To clarify, inner dicts are not subject to optimisation, I'm only trying to arrange them in a better way. For simplicity, let's pretend I have strings rather than dicts there:

data = {
   0: "foo",
   1: "bar",
   10: "...",
   100: "...",
   200: "...",
   # etc.
}
  • Not sure what you mean; you are using exactly as much memory as you have values, so how does this "take too much memory"? Isn't it just the amount of memory you need? Or maybe you are asking about compression? Maybe this is helpful: stackoverflow.com/questions/10264874/… Commented Apr 10, 2014 at 11:17
  • @Lawrence: That's not necessarily true: dictionaries have some overhead. As a simple example, compare sys.getsizeof([("a", 1), ("b", 2)]) to sys.getsizeof({"a": 1, "b": 2}) (a runnable version appears after this list). Commented Apr 10, 2014 at 11:22
  • @DavidRobinson: yes, but the overhead is quite minimal, and it most likely cannot get smaller, not with pure Python. Maybe if there is a pattern in the data, something could be achieved by a different arrangement or a custom 'compression' algorithm... Commented Apr 10, 2014 at 11:26
  • What kind of data is in the inner dicts? The memory consumption of this structure is dominated by the inner dicts, so without knowing more about them, it's impossible to give a useful answer. Commented Apr 10, 2014 at 11:26
  • I get that this is what you are hoping for. The point I'm trying to make is that you can't possibly gain anything this way. At least 95% of your memory consumption is from your inner dicts. Even if you managed to squeeze the consumption of the outer dict by a factor of ten, your total memory consumption would only be reduced to 95.5% at best, which would hardly help. Commented Apr 10, 2014 at 14:41
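
For reference, a runnable version of the size comparison mentioned in the comments above. Exact byte counts vary by Python version and platform, and note that sys.getsizeof measures only the container object itself, not the objects it refers to:

import sys

# Compare the container overhead of a dict against a plain list of pairs
# holding the same two key/value bindings.
pairs = [("a", 1), ("b", 2)]
mapping = {"a": 1, "b": 2}

print(sys.getsizeof(pairs))    # size of the list object itself
print(sys.getsizeof(mapping))  # size of the dict object, typically larger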

1 Answer

If the structure is a mapping, then a dict-like object really is the right option, and if memory is an issue then the obvious solution is to work off a file instead. The easiest approach may be to use a pandas Series, which can be used as a dict and can work directly through an HDF5 file (see http://pandas.pydata.org/pandas-docs/stable/io.html#hdf5-pytables).
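
A minimal sketch of that route, assuming pandas with its PyTables dependency is installed; the file name "sparse.h5", the key "data", and the sample values are placeholders:

import pandas as pd  # HDF5 I/O additionally requires the PyTables package

# A sparse integer-indexed Series: only the keys that actually exist are stored.
data = {0: "foo", 1: "bar", 10: "baz", 100: "qux", 200: "quux"}
s = pd.Series(data)
print(s[10])  # dict-like lookup by key

# Persist the Series to an HDF5 file so it need not stay in memory.
s.to_hdf("sparse.h5", key="data")  # placeholder file name and key

# Later (even in another process), load it back from disk.
s2 = pd.read_hdf("sparse.h5", key="data")
print(s2[100])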

Alternatively, for a pure Python solution, you could use the standard library's shelve module.
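
A minimal sketch of the shelve approach; shelve keys must be strings, so the integer keys are converted on the way in, and "sparse_shelf" is a placeholder file name:

import shelve

data = {0: "foo", 1: "bar", 10: "baz", 100: "qux", 200: "quux"}

# Write the sparse mapping to a disk-backed shelf.
with shelve.open("sparse_shelf") as shelf:
    for key, value in data.items():
        shelf[str(key)] = value  # shelve keys must be str

# Reopen later and look up entries without loading everything into RAM.
with shelve.open("sparse_shelf") as shelf:
    print(shelf["100"])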

