1

Suppose one needs to store a list of items, and it can be stored in any type. What would be the most efficient type if it is used mostly for matching?

To clarify, a list of items needs to be contained, but the form it's contained in doesn't matter (enum, List, HashMap, ArrayList, etc.). This list of items would be matched against on a regular basis, but not edited. What would the most efficient storage method be, assuming you only need to write to the list once, but could be matching against it multiple times per second?

Note: No multi-threading

3
  • What exactly do you mean by "matched"? You need to determine whether a given item is present in the set? Commented Sep 17, 2014 at 8:35
  • 4
    You want something with the lowest read time complexity. HashMap lookups are O(1) on average, which is about as low as it gets. Commented Sep 17, 2014 at 8:36
  • Yeah, basically the idea is to determine whether a given item is contained in that list. Commented Sep 17, 2014 at 8:38

3 Answers 3

1

A HashSet (like a HashMap) offers O(1) lookups on average. Create it with a large enough initial capacity and a small load factor, so that once the hash code has selected a bucket, the element is found quickly within it (a bucket is searched sequentially). Ideally, each bucket holds at most one element.

You can read more about the concept of capacity and load factor in the Javadoc of HashMap.
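As a minimal sketch of the sizing advice above, assuming the items are strings: the class name `PresizedLookup`, the helper `buildLookup`, and the load factor of 0.5 are illustrative choices, not recommendations from the Javadoc.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PresizedLookup {
    // Hypothetical helper: builds a set sized up front so it does not
    // need to resize (rehash) while the items are being added.
    static Set<String> buildLookup(List<String> items, float loadFactor) {
        // Keep items.size() below capacity * loadFactor.
        int initialCapacity = (int) Math.ceil(items.size() / loadFactor) + 1;
        Set<String> lookup = new HashSet<>(initialCapacity, loadFactor);
        lookup.addAll(items);
        return lookup;
    }

    public static void main(String[] args) {
        // Written once, then only read.
        Set<String> allowed = buildLookup(Arrays.asList("red", "green", "blue"), 0.5f);
        System.out.println(allowed.contains("green")); // true
        System.out.println(allowed.contains("pink"));  // false
    }
}
```

A lower load factor trades memory for sparser buckets; whether that is measurably faster depends on the data and is worth benchmarking.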

If the number of items is no more than 64, an even faster solution is to define an enum for them and use an EnumSet or EnumMap. An EnumSet of that size stores its elements as bits in a single long, so a contains operation is just a bitmask test (larger enums still work, but are backed by an array of longs). An EnumMap stores its values in an array indexed by each constant's ordinal, so a lookup is a plain array access.
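A short sketch of the enum approach, assuming the items can be modeled as enum constants. The `Color` enum and the `isAllowed` helper are hypothetical names used only for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

public class EnumLookup {
    // Hypothetical item type; the constants are placeholders.
    enum Color { RED, GREEN, BLUE, YELLOW }

    // Written once, then only read.
    static final Set<Color> ALLOWED = EnumSet.of(Color.RED, Color.BLUE);

    static boolean isAllowed(Color c) {
        // For enums with at most 64 constants this is a bit test on a long.
        return ALLOWED.contains(c);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(Color.RED));   // true
        System.out.println(isAllowed(Color.GREEN)); // false
    }
}
```

If the incoming values arrive as strings, they would first have to be converted with `Color.valueOf(...)`, which has its own lookup cost.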

If you choose to go with the HashSet and not with the enum approach, keep in mind that HashSet relies on the hashCode() and equals() methods of its elements. You might consider overriding them to provide a faster implementation that exploits what you know about the internals of the items you wish to store.
One simple optimization is to cache the hash code in the item itself once it has been computed, provided it never changes, so that subsequent calls to hashCode() just return the cached value.
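The caching idea could look roughly like this, assuming the item is immutable. The `Item` class and its `name` and `version` fields are made up for the example.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class CachedHashDemo {
    // Hypothetical immutable item; its fields never change after
    // construction, so the hash code can safely be computed once.
    static final class Item {
        private final String name;
        private final int version;
        private final int hash; // cached hash code

        Item(String name, int version) {
            this.name = Objects.requireNonNull(name);
            this.version = version;
            this.hash = Objects.hash(name, version);
        }

        @Override
        public int hashCode() {
            return hash; // no recomputation on lookups
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Item)) return false;
            Item other = (Item) o;
            // Cheap comparisons first; the string comparison runs last.
            return hash == other.hash
                    && version == other.version
                    && name.equals(other.name);
        }
    }

    public static void main(String[] args) {
        Set<Item> items = new HashSet<>();
        items.add(new Item("widget", 1));
        System.out.println(items.contains(new Item("widget", 1))); // true
        System.out.println(items.contains(new Item("widget", 2))); // false
    }
}
```

This only pays off when computing the hash is expensive relative to a field read. java.lang.String already caches its own hash code, so wrapping plain strings this way gains little.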

0

From your description it seems that order doesn't matter. If this is so, use a Set. Java's standard implementation is the HashSet.

2 Comments

How can you recommend a HashSet without knowing what the data item is? What if it can't be hashed? The performance of a hash-based data structure tends towards that of a list when the hashcode implementation gives a poor distribution.
Everything can be hashed in Java. If the hash code implementation is bad, it tends towards being a List, but won't perform worse than that.
0

The most efficient option for repeated lookups would almost certainly be an EnumSet, provided the items can be modeled as enum constants. From the Javadoc:

... Enum sets are represented internally as bit vectors. This representation is extremely compact and efficient. The space and time performance of this class should be good enough to allow its use as a high-quality, typesafe alternative to traditional int-based "bit flags." Even bulk operations (such as containsAll and retainAll) should run very quickly if their argument is also an enum set.

...

Implementation note: All basic operations execute in constant time. They are likely (though not guaranteed) to be much faster than their HashSet counterparts. Even bulk operations execute in constant time if their argument is also an enum set.
