I have a very simple problem: I need to check if a large (150k) list of strings contains a certain string. Order does not matter, and I only need to check if the list contains a string. What is the most efficient data structure to use?
- A list is a data structure. Are you instead asking what is the most efficient approach to using a list data structure to find a matching string? – sapbucket, Jun 19, 2015 at 17:52
- If you already have a list: Set<String> set = new HashSet<String>(list); – John, Jun 19, 2015 at 17:55
- I would consider a trie. Apache Commons has a compressed trie, PatriciaTrie<E>, that performs well. – Amir Afghani, Jun 19, 2015 at 17:58
3 Answers
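Look at the hash-based Set implementations (HashSet, LinkedHashSet) and Map implementations (HashMap, LinkedHashMap). Their contains() and containsKey() lookups run in O(1) time on average. EnumSet and IdentityHashMap do not fit this problem: EnumSet only holds enum constants, and IdentityHashMap compares keys by reference rather than with equals().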
this is a great link to use
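As a rough sketch of the hash-based lookup described in this answer: the class name MembershipDemo, the helper toLookupSet, and the three sample words are made up for illustration and stand in for the real 150k strings. Building the set costs one pass over the list; after that, each contains() call is O(1) on average.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MembershipDemo {
    // Hypothetical helper: copies the list into a HashSet once (O(n)),
    // so that later lookups are O(1) on average.
    static Set<String> toLookupSet(List<String> words) {
        return new HashSet<>(words);
    }

    public static void main(String[] args) {
        // Sample data only; replace with the real list of strings.
        List<String> words = Arrays.asList("alpha", "beta", "gamma");
        Set<String> lookup = toLookupSet(words);

        System.out.println(lookup.contains("beta"));  // true
        System.out.println(lookup.contains("delta")); // false
    }
}
```

By comparison, List.contains() scans the elements one by one, so it is O(n) per lookup.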
You want a structure that uses a hash function to insert, retrieve and delete elements. Those operations usually have an average-case complexity of O(1).
If all the strings are different, you can use a HashSet. If elements can repeat and you need to know how many times each one occurs, you can use a HashMap that maps each String to an Integer holding its count. If you only need to know whether a string is present, a HashSet is enough even with repeats, since it simply ignores duplicates.
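A minimal sketch of the HashMap-with-counts idea, assuming you actually need the number of occurrences. The class name OccurrenceCountDemo, the helper countOccurrences, and the sample data are hypothetical; Map.merge requires Java 8 or later.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OccurrenceCountDemo {
    // Hypothetical helper: maps each distinct string to how often it appears.
    static Map<String, Integer> countOccurrences(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // Inserts 1 for a new key, otherwise adds 1 to the existing count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample data only; "beta" appears twice on purpose.
        List<String> words = Arrays.asList("alpha", "beta", "beta", "gamma");
        Map<String, Integer> counts = countOccurrences(words);

        System.out.println(counts.containsKey("beta"));    // true
        System.out.println(counts.get("beta"));            // 2
        System.out.println(counts.containsKey("delta"));   // false
    }
}
```

Membership is then checked with containsKey(), which is also O(1) on average.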