
I have a project that handles a large amount of data that is written to an Excel file. I store this data in a static HashMap in the form Map<List<String>, Integer>, where the size of the list is only ever 3. The number of entries in the Map, however, can range anywhere from 0 to 11,300.

The flow of this project is:

  • Load Map up with entries

  • Iterate Map and do stuff

  • Clear map for next set of entries

What I recently found out about HashMap, though, is how it re-sizes when its capacity threshold is breached. So not only is my Map constantly re-sizing, sometimes dramatically, but it could very well have about 20,000 empty entries by the time I clear the largest set of entries.

So I'm trying to micro-optimize this thing and I'm stuck with a dilemma of how to do this. My two thoughts are to:

  1. Set the initial capacity of the HashMap to a value large enough that it would only ever re-size at most once

  2. Reinitialize the HashMap with the average size that is expected for each new entry set to limit re-sizing and allow the garbage collector to do some clean up

My intuition tells me option two might be the most reasonable one, but it could still lead to a lot of re-sizing depending on the next entry set. Option one, on the other hand, limits re-sizing to at most a one-time operation but leaves me with literally thousands of null entries.
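To make the two options concrete, here is roughly what I have in mind (simplified sketches, not my actual code; the names and capacity numbers are just guesses):

// Option 1: size the map once, large enough that it re-sizes at most once.
// 16384 is a guess at a capacity big enough for the largest entry set.
private static final Map<List<String>, Integer> data = new HashMap<>(16384);

// Option 2: replace the map each cycle, sized to the average expected entry
// count, and let the garbage collector reclaim the old table.
private static Map<List<String>, Integer> cycleData;

private static void startNextCycle(int averageExpectedEntries) {
    cycleData = new HashMap<>(averageExpectedEntries);
}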

Is one of my two proposed solutions better than the other, is there not much difference in memory improvement between the two, or is there some other solution I have overlooked (one that does not involve changing the data structure)?

EDIT: Just for some context, I want to do this because occasionally the project runs out of heap memory, and I'm trying to determine how much of an impact this gigantic map has or could have.

EDIT2: Just to clarify, the size of the Map itself is the larger value. The key size (i.e. the list) is ONLY ever 3.

  • Don't do option 2. If you have enough memory to handle the worst case, i.e. 11,300 List objects as keys to the map, then you have enough memory for the entire process. There is nothing really gained by shrinking the map, but you lose performance on the re-expansion. The memory saved by shrinking is minimal compared to everything else going on. This is of course assuming it's a continual process. Don't keep the large-but-empty map around for extended periods of time without using it; in that case, remove the map and reallocate it when needed next. Commented Jul 6, 2016 at 19:51
  • Any reason you can't use TreeMap? I don't think it'd be noticeably slower (log_2(11300) is only 13) and it won't have any wasted space. Commented Jul 6, 2016 at 19:53
  • @Oliver The map key is a List, which is not Comparable, preventing use of TreeMap. You could supply a custom Comparator, but then you'd still have to decide on an ordering of the lists, which may not be feasible. Besides, a TreeMap uses more memory than a HashMap at its peak. Commented Jul 6, 2016 at 19:55
  • Are you that much constrained in memory? Does your application run on an extremely low-end or embedded system? Or maybe it must be able to run inside a small container? 20,000 entries in a map is hardly a reason for running out of heap on a modern desktop or even a smartphone. Commented Jul 6, 2016 at 19:58
  • @Leon I'm not that constrained for memory on my local machine, but the server I'm running my app on may or may not have a lot to play with, which is why I don't just bump the heap size up to, say, 3GB. Commented Jul 6, 2016 at 20:03

2 Answers

16

The question and the accepted response here were so wrong that I had to reply.

I have a project that handles a large amount of data that is written to an Excel file. I store this data in a static HashMap in the form Map<List<String>, Integer>, where the size of the list is only ever 3. The number of entries in the Map, however, can range anywhere from 0 to 11,300.

Please don't take this the wrong way, but this is tiny!!! Don't even bother to optimize something like this! I quickly made a test: filling 11,300 elements into a HashMap takes less than a dozen milliseconds.
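Something along these lines (not the exact code, just the idea; the 3-element list keys mirror the question):

import java.util.*;

public class FillTest {
    public static void main(String[] args) {
        long start = System.nanoTime();
        Map<List<String>, Integer> map = new HashMap<>();
        for (int i = 0; i < 11_300; i++) {
            // keys are 3-element lists, as in the question
            map.put(Arrays.asList("a" + i, "b" + i, "c" + i), i);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(map.size() + " entries in ~" + elapsedMs + " ms");
    }
}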

What I recently found out about HashMap, though, is how it re-sizes when its capacity threshold is breached. So not only is my Map constantly re-sizing, sometimes dramatically, but it could very well have about 20,000 empty entries by the time I clear the largest set of entries.

...just to be clear: empty entries consume almost no space; they are just null pointers, 8 bytes per slot on 64-bit machines or 4 bytes per slot on 32-bit. Even 20,000 empty slots at 8 bytes each is only about 160 KB. We're talking about a couple hundred kilobytes at most here.

Reinitialize the HashMap with the average size that is expected for each new entry set to limit re-sizing and allow the garbage collector to do some clean up.

It's not the average "size" of the entries; it's the average number of entries to be expected.

EDIT: Just for some context, I want to do this because occasionally the project runs out of heap memory, and I'm trying to determine how much of an impact this gigantic map has or could have.

It's unlikely to be the map. Use a profiler! You can store millions of elements without a fuss.
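If you want a quick sanity check before reaching for a real profiler (VisualVM, Java Flight Recorder, etc.), a crude heap-delta measurement already shows the order of magnitude. This is only a rough sketch; System.gc() is just a hint and the numbers are noisy:

import java.util.*;

public class RoughHeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();

        Map<List<String>, Integer> map = new HashMap<>();
        for (int i = 0; i < 11_300; i++) {
            map.put(Arrays.asList("a" + i, "b" + i, "c" + i), i);
        }

        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        // Includes the map, its bucket table, and all the key lists and strings
        System.out.println("approx bytes used: " + (after - before) + " for " + map.size() + " entries");
    }
}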


The accepted answer is bad

You could change these values on initialisation: with a size of 11,300 and a load factor of 1, the map will not increase in size until your maximum has been met, which in your case, as I understand it, will be never.

This is not good advice. Using the same capacity as the expected number of items inserted and a load factor of one, you are bound to get a really large number of hash collisions. This will be a performance disaster.
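If you do want to pre-size, a common approach (just a sketch; presizedMap is a made-up name, and 11,300 is the figure from the question) is to keep the default load factor and derive the capacity from it, similar to what Guava's Maps.newHashMapWithExpectedSize does:

static <K, V> Map<K, V> presizedMap(int expectedEntries) {
    // e.g. 11,300 expected entries -> capacity 15,067, which HashMap rounds up to
    // 16,384 slots with a 12,288 threshold, so it never re-sizes and stays sparse.
    return new HashMap<>((int) Math.ceil(expectedEntries / 0.75));
}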


Conclusion

If you don't know how stuff works, don't try to micro-optimize.


1 Comment

I asked this back in what feels like ages ago, so yes, naivety got the best of me. If I recall correctly, the company I was working for had their server basically maxed out and I was looking for any way to free up memory usage in this app so as not to kill anything. That being said, I still appreciate looking back at information I can still use. You're getting downvoted probably because your answer was more of a criticism (at first) and seems a little standoffish. But reading it again, I didn't get that, and I get why you replied. Just wanted to let you know that I, as OP, appreciate the nuggets of info.
1

I did some research and ended up on this page: How does a HashMap work in Java

The second-to-last heading has to do with resizing overhead, stating that the defaults for a HashMap are a capacity of 16 and a load factor of 0.75.

You could change these values on initialisation: with a size of 11,300 and a load factor of 1, the map will not increase in size until your maximum has been met, which in your case, as I understand it, will be never.

I did a quick experiment, using this code:

public static void main(String[] args) throws Exception {
    // Pre-sized: initial capacity of 11,000,000 and a load factor of 1
    Map<String, Integer> map = new HashMap<>(11000000, 1);
    // Default-sized, for comparison:
    // Map<String, Integer> map = new HashMap<>();
    for (int i = 0; i < 11000000; i++) {
        map.put(i + "", i);
    }
    System.out.println(map.size());
    // Keep the process alive so its memory use can be read in Task Manager
    Thread.sleep(9000);
}

I swapped between the two Map initialisations and then checked the memory the process consumes in Task Manager.

With the initial size and load factor set, it uses ~1.45 GB of memory. Without the values set, it uses ~1.87 GB of memory.

Re-initialising the Map every time, instead of clearing it, so that a potentially smaller Map takes its place will be slower, but you could temporarily end up with more free memory.

You could also do both: re-initialise with the initial size and load factor properties set, should you know the number of List objects for each cycle.
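For instance, a sketch of that (startCycle and expectedEntries are names I made up; they are not from the question):

// Start each cycle with a fresh, pre-sized map instead of calling clear(),
// per the suggestion above. The old map becomes garbage and can be collected.
private static Map<List<String>, Integer> startCycle(int expectedEntries) {
    return new HashMap<>(expectedEntries, 1f);
}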

The article also suggests that the Java 8 HashMap, though potentially faster, could have more memory overhead than in Java 7. It might be worth compiling the program under both versions and seeing which provides better memory usage. It would be interesting if nothing else.

3 Comments

That is a very good find. Coincidentally, I am running Java 8, so that is definitely worth trying. I will experiment with both suggestions you have just made and see how that affects my performance.
It would seem this freed up enough memory for the application to run to completion. I'm "supposed" to refrain from saying thanks, but thanks. This was a good start in improving the memory usage
Now this is a really bad idea. Using the same upper limit as the expected number of items inserted and a load factor of one, you are bound to get a really large number of hash collisions, which will be a performance disaster.
