
I have a huge collection of very small objects. To store the data compactly, I rewrote the class to keep all of its information in a byte array using variable-byte encoding. Most of these millions of objects need only 3 to 7 bytes to store all their data.

After memory profiling, I found that these byte arrays always take at least 32 bytes each.

Is there a way to store the information more compactly than bit-fiddling it into a byte[]? Would it be better to point to an unmanaged array?

```csharp
using System.Collections.Generic;

class MyClass
{
    byte[] compressed;

    public MyClass(IEnumerable<int> data)
    {
        compressed = compress(data);
    }

    private byte[] compress(IEnumerable<int> data)
    {
        // ...
    }

    private IEnumerable<int> decompress(byte[] compressedData)
    {
        // ...
    }

    public IEnumerable<int> Data { get { return decompress(compressed); } }
}
```
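The question does not show the bodies of compress and decompress. For readers unfamiliar with the technique, here is a minimal sketch of one common variable-byte scheme. It is LEB128-style: each byte carries 7 payload bits, and the high bit is set when another byte follows. This is an assumption about the encoding, not the asker's actual code. `VarByte`, `Encode` and `Decode` are hypothetical names, and only non-negative values are handled.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating one common variable-byte scheme (LEB128-style):
// 7 payload bits per byte; high bit set means "another byte follows".
// Assumes all input values are non-negative.
public static class VarByte
{
    public static byte[] Encode(IEnumerable<int> values)
    {
        var output = new List<byte>();
        foreach (int value in values)
        {
            if (value < 0)
                throw new ArgumentOutOfRangeException(nameof(values), "Only non-negative values are supported.");

            uint v = (uint)value;
            while (v >= 0x80)
            {
                output.Add((byte)(v | 0x80)); // low 7 bits plus continuation flag
                v >>= 7;
            }
            output.Add((byte)v); // final byte of this value: continuation flag clear
        }
        return output.ToArray();
    }

    public static IEnumerable<int> Decode(byte[] data)
    {
        uint current = 0;
        int shift = 0;
        foreach (byte b in data)
        {
            current |= (uint)(b & 0x7F) << shift; // add this byte's 7 payload bits
            if ((b & 0x80) != 0)
            {
                shift += 7; // more bytes belong to the same value
            }
            else
            {
                yield return (int)current; // value complete
                current = 0;
                shift = 0;
            }
        }
    }
}
```

With this scheme, 1 and 127 take one byte each, while 128 and 300 take two. A handful of small integers therefore fits in the 3 to 7 bytes mentioned above.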
  • I added the code. I have to store a few very small integers -- hence the variable-byte encoding. Commented Jul 6, 2017 at 20:47
  • Part of your problem is object overhead, which is even worse in a 64-bit build. This has some information about that: "Of Memory and Strings". Commented Jul 6, 2017 at 20:50
  • The array is fine -- it actually contains exactly the length I expect. It's the memory profiler that tells me that my byte[] of length 3 effectively takes up 32 bytes in memory. Commented Jul 6, 2017 at 20:51
  • It might be simpler, and use less memory, to just use longs for your little objects without bothering to make a class. You could write extension methods for packing/unpacking. That will avoid object overhead. Commented Jul 6, 2017 at 21:12
  • Or you could define a struct that has just one member that is a long value, and have methods for packing/unpacking in that struct. Commented Jul 6, 2017 at 21:18
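Here is a rough sketch of the first comment's suggestion, assuming each item fits in eight bytes. It uses plain ulong values with extension methods for reading and writing individual byte slots. `PackedLong`, `GetByte` and `WithByte` are hypothetical names, not an existing API.

```csharp
using System;

// Hypothetical extension methods that treat a ulong as eight 1-byte slots
// (slot 0 = least significant byte). No object is allocated per value.
public static class PackedLong
{
    public static byte GetByte(this ulong packed, int slot)
    {
        if ((uint)slot > 7) throw new ArgumentOutOfRangeException(nameof(slot));
        return (byte)(packed >> (slot * 8)); // shift the slot down, truncate to 8 bits
    }

    public static ulong WithByte(this ulong packed, int slot, byte value)
    {
        if ((uint)slot > 7) throw new ArgumentOutOfRangeException(nameof(slot));
        int shift = slot * 8;
        ulong cleared = packed & ~(0xFFUL << shift); // zero the target slot
        return cleared | ((ulong)value << shift);    // write the new byte into it
    }
}
```

Values stored this way live inline in whatever holds them, such as a ulong field or a ulong[], so there is no per-object header. The trade-off is a hard limit of eight bytes per item.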

1 Answer


There are a couple of problems eating up memory. One is object overhead; the other is that objects are aligned to 32- or 64-bit boundaries (depending on your build). Your current approach suffers from both issues. The following sources describe this in more detail:

I played around with this when I was fiddling with benchmarking sizes.
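To put numbers on the overhead, here is a back-of-the-envelope estimate. It assumes the commonly described CLR layout: an object header, a method-table pointer and, for arrays, a length field padded to 8 bytes on 64-bit, with objects rounded up to pointer size. The exact figures can vary between runtimes, so treat `SizeEstimate` as a hypothetical illustration rather than a measurement.

```csharp
using System;

// Rough size estimate for CLR objects, ASSUMING the commonly described layout:
//   object header + method-table pointer (+ length field for arrays),
//   followed by the payload, with the total rounded up to pointer size.
// Exact figures can differ between runtimes; this is an illustration, not a measurement.
public static class SizeEstimate
{
    public static int ByteArrayBytes(int length, bool is64Bit)
    {
        int pointer = is64Bit ? 8 : 4;
        int header = pointer;               // object header (sync block index)
        int methodTable = pointer;          // type handle
        int lengthField = is64Bit ? 8 : 4;  // 4-byte length, padded to 8 on 64-bit
        int raw = header + methodTable + lengthField + length;
        return RoundUp(raw, pointer);       // objects are aligned to pointer size
    }

    // A class with a single reference field, like MyClass in the question.
    public static int SingleReferenceObjectBytes(bool is64Bit)
    {
        int pointer = is64Bit ? 8 : 4;
        return pointer /* header */ + pointer /* method table */ + pointer /* field */;
    }

    private static int RoundUp(int value, int alignment) =>
        (value + alignment - 1) / alignment * alignment;
}
```

Under these assumptions, on a 64-bit build any byte[] of 1 to 8 elements comes out at 32 bytes, which lines up with the profiler figure in the question. The MyClass wrapper adds another 24 bytes on top of that.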

A simple solution would be to create a struct with a single long member. Its methods would handle packing bytes into, and unpacking them out of, that long using shift-and-mask bit fiddling.
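Here is a minimal sketch of that struct. It assumes at most 7 data bytes per item, so that the top byte of the ulong can hold the count. `PackedBytes` and its members are hypothetical names. The bytes could be, for example, the output of the existing compress method.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value type holding up to 7 bytes of data in a single ulong:
// bytes 0..6 hold the data, byte 7 (the most significant) holds the count.
// Being a struct, it has no object header of its own.
public readonly struct PackedBytes
{
    public const int MaxBytes = 7;
    private readonly ulong _bits;

    public PackedBytes(IReadOnlyList<byte> bytes)
    {
        if (bytes.Count > MaxBytes)
            throw new ArgumentException($"At most {MaxBytes} bytes fit.", nameof(bytes));

        ulong bits = (ulong)bytes.Count << 56;      // count goes in the top byte
        for (int i = 0; i < bytes.Count; i++)
            bits |= (ulong)bytes[i] << (i * 8);     // byte i goes in slot i
        _bits = bits;
    }

    public int Count => (int)(_bits >> 56);

    public byte this[int index]
    {
        get
        {
            if ((uint)index >= (uint)Count) throw new ArgumentOutOfRangeException(nameof(index));
            return (byte)(_bits >> (index * 8));    // shift down, truncate to 8 bits
        }
    }

    public byte[] ToArray()
    {
        var result = new byte[Count];
        for (int i = 0; i < result.Length; i++)
            result[i] = this[i];
        return result;
    }
}
```

Because PackedBytes is a struct, an array of them costs 8 bytes per item, with no separate array or header per item. Inputs longer than 7 bytes would need some fallback.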

Another idea would be a class that serves up objects by ID and stores the actual bytes in a single backing List<byte>, but this would get complicated and messy. I think the struct idea is much more straightforward.
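For comparison, a bare-bones version of the backing-list idea might look like the following. `ByteStore`, `Add` and `Get` are hypothetical names. The sketch deliberately ignores removal and updates, which is where the messiness comes in.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical store: all payloads live back-to-back in one List<byte>;
// an item's "ID" is just its index in the offsets list.
public sealed class ByteStore
{
    private readonly List<byte> _data = new List<byte>();
    private readonly List<int> _offsets = new List<int>(); // start of each item in _data

    public int Add(IReadOnlyList<byte> bytes)
    {
        _offsets.Add(_data.Count);
        _data.AddRange(bytes);
        return _offsets.Count - 1; // ID of the new item
    }

    public byte[] Get(int id)
    {
        int start = _offsets[id];
        // An item ends where the next one starts, or at the end of the data.
        int end = id + 1 < _offsets.Count ? _offsets[id + 1] : _data.Count;
        return _data.GetRange(start, end - start).ToArray();
    }

    public int Count => _offsets.Count;
}
```

Each item then costs its payload plus a 4-byte offset, and callers keep an int ID instead of an object reference.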


1 Comment

I thought a lot about a big backing array and only storing indexes, but like you said, it's going to be messy.
