
My conceptual understanding of a java.util.HashMap is as follows:

  1. Its main asset over other Map implementations is constant-time lookup, assuming few collisions. For this reason the underlying implementation uses an array of fixed length - the classic data structure with O(1) index-based access.

  2. The fixed length array used to store the Map entries is initialised to a given size upon instantiation and expanded (by expanded, I mean a larger array is created and the values copied across) as the size of the Map approaches the length of the fixed length array.

  3. When a value is put into the Map, the key-value pair is placed into an internal linked list implementation for the given key's bucket. When there is a collision, subsequent key-value pairs are appended to the list.

  4. When getting from the Map, the hashCode() of the key is used to derive the array index of the internal linked list, and you either have your value immediately if the list has size 1, or you iterate through the list calling equals() on the key of each element until you find your value.

Based on point 2, HashMap has to expand an array, an operation which is surely linear. Why does it use an internal linked list implementation (O(n) lookup) for collision resolution? Why doesn't it use a data structure with O(log n) lookup, like a binary search tree or red-black tree, to enhance performance?
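To make the four points above concrete, here is a minimal separate-chaining map in Java. This is an illustrative sketch of the scheme described in the question, not the actual java.util.HashMap source; all names (ChainedMap, Node, indexFor) are invented for illustration:

```java
import java.util.Objects;

// Minimal sketch of separate chaining: an array of buckets, each a linked
// list of key-value nodes. Illustrative only, not the real HashMap code.
class ChainedMap<K, V> {
    private static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private Node<K, V>[] buckets = (Node<K, V>[]) new Node[16];
    private int size = 0;

    private int indexFor(Object key) {
        // hashCode() picks the bucket; masking works because the
        // array length is kept at a power of two
        return key.hashCode() & (buckets.length - 1);
    }

    public V get(Object key) {
        // Walk the chain, comparing keys with equals() (point 4)
        for (Node<K, V> n = buckets[indexFor(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) return n.value;
        }
        return null;
    }

    public void put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) { n.value = value; return; }
        }
        buckets[i] = new Node<>(key, value, buckets[i]); // chain on collision (point 3)
        if (++size > buckets.length * 3 / 4) resize();   // expand as the map fills (point 2)
    }

    private void resize() {
        Node<K, V>[] old = buckets;
        @SuppressWarnings("unchecked")
        Node<K, V>[] bigger = (Node<K, V>[]) new Node[old.length * 2];
        buckets = bigger;
        // Rehash every entry into the larger array - the linear step
        // the question asks about
        for (Node<K, V> head : old) {
            for (Node<K, V> n = head; n != null; n = n.next) {
                int i = indexFor(n.key);
                bigger[i] = new Node<>(n.key, n.value, bigger[i]);
            }
        }
    }

    public int size() { return size; }
}
```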

  • Because one is expecting each bucket to contain at most a handful of entries. Commented Dec 6, 2014 at 20:25
  • Because you hope to get only very few collisions in the first place. For only a handful of items, a linear search is not dramatic. Commented Dec 6, 2014 at 20:26
  • If I recall correctly, Java 8 does fall back to a binary search tree if the number of collisions in one bucket exceeds some threshold. Commented Dec 7, 2014 at 0:11
  • @LouisWasserman, thanks for your input, really interesting stuff! Any chance of a link to some documentation or source code? It would be great to formalise an answer! Commented Dec 7, 2014 at 0:22

2 Answers


http://openjdk.java.net/jeps/180

As of Java 8, HashMap does fall back to a binary tree if there are enough collisions.
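To see this treeification in practice, the sketch below forces every entry into a single bucket by returning a constant hashCode(). In the OpenJDK sources the chain is converted to a red-black tree once it grows past a threshold (8 entries, provided the table is large enough); implementing Comparable lets the tree order keys whose hash codes are all equal. CollidingKey and TreeifyDemo are illustrative names, not part of any API:

```java
import java.util.HashMap;
import java.util.Map;

// A deliberately bad key: every instance hashes to the same bucket.
final class CollidingKey implements Comparable<CollidingKey> {
    final int id;
    CollidingKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; }       // every key collides
    @Override public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).id == id;
    }
    // Comparable gives the tree a total order despite identical hashes
    @Override public int compareTo(CollidingKey other) {
        return Integer.compare(id, other.id);
    }
}

class TreeifyDemo {
    public static void main(String[] args) {
        Map<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put(new CollidingKey(i), i);
        }
        // All 1000 entries share one bucket; in Java 8+ that bucket is a
        // red-black tree, so lookups are O(log n) rather than O(n)
        System.out.println(map.get(new CollidingKey(500))); // prints 500
    }
}
```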


1 Comment

It is a balanced binary search tree, to be more specific

Although it doesn't guarantee O(1) insertion time, it does have amortized O(1) insertion time, which is to say that if you insert a large number of elements one by one, the total time taken to insert them will be proportional to the number of elements you insert.

Altering the data structure used for the buckets won't improve this. The point of the array expansion is to ensure that the expected number of entries in each bucket stays constant; this means there is still constant-time insertion and lookup, even with a linked list.

The numbers - when to expand, and by how much (doubling the size of the array) - are very carefully worked out. It is the same technique ArrayList uses to guarantee amortized O(1) addition to the list.
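The amortized claim can be checked with a quick back-of-the-envelope simulation (illustrative, not HashMap's actual resize code): under doubling, each resize copies the current capacity's worth of entries, and the total number of copies stays below 2n, so the cost per insertion is O(1) on average.

```java
// Simulate n insertions into a doubling array and count element copies.
// Illustrative sketch of the amortized-analysis argument only.
class AmortizedDoubling {
    public static void main(String[] args) {
        int capacity = 16;       // assumed initial capacity, as in HashMap's default
        long copies = 0;
        int n = 1_000_000;
        for (int size = 1; size <= n; size++) {
            if (size > capacity) {
                copies += capacity; // "resize": copy every existing element
                capacity *= 2;      // into an array twice as large
            }
        }
        // copies / n is bounded by 2, i.e. constant work per insertion
        System.out.println((double) copies / n); // well under 2.0
    }
}
```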

12 Comments

I'm not suggesting improving the insertion time. I'm suggesting improving the lookup time for an entry in a bucket with collisions. Searching for one entry in a linked list is linear (n/2 comparisons on average), but there are data structures such as some tree implementations with O(log n) lookup. I think the comments above have nailed it - you just don't expect enough collisions to make it worthwhile.
@RobertBain You said "Based on point 2, HashMap has to expand an array, an operation which is surely linear", i.e. that insertion time isn't guaranteed. The first paragraph is the answer to this: it does have amortized O(1) insertion time. The rest answers your main question: the expected number of entries in a bucket is constant, so the lookup is still constant. I've answered your main question, and also dealt with a misunderstanding in your post.
Firstly, thanks for your answer. I appreciate your time, but I still don't feel quite satisfied with my understanding. Can you explain the sentence "The point of the array expansion is to ensure that the expected number of entries in each bucket is constant"? In my mind the point of array expansion is that there are more separate hash values for keys than there are places to put the values in the HashMap. My understanding is that key-value pairs are assigned an array index based on the hashCode() value. Apologies if I'm missing something.
@RobertBain you're right on both counts. The point of expanding the array is to make more space for the entries, and it's expanded at just the right rate to make sure that, if the hash function returns effectively random values, then, on average, there will be a small (constant) number of entries in any given bucket. In other words, if you add a million elements, then the array will expand to the point where the expected (average) number of entries in a bucket is the same as it was before expansion. (Of course, it is up to the programmer to ensure that the hash function does its job...)
I see where you're coming from but disagree with "it's expanded at just the right rate to make sure that, if the hash function returns effectively random values, then, on average, there will be a small (constant) number of entries in any given bucket" - that's irrelevant to the HashMap implementation; it's purely down to the hashCode() implementation of the key. All the HashMap does is expand the array when it's getting full.
|
