I am currently working on a Java application that uses a lot of strings (2000+). I want to store these strings in a proper structure so that, when I want to store a new string, I can quickly check whether the same string is already there. If it is not, I store the new one (basically, storing strings without repeats).
//PSEUDOCODE
private ?????? myCollectionOfStrings;

public void store_If_Not_Exist(String aNewString) {
    if (!exist_in_Collection(aNewString)) { // this must be fast
        store_in_Collection(aNewString);
    }
}
I am currently using a naive implementation, but I know it is really inefficient:
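import java.util.ArrayList;
import java.util.List;

private List<String> myCollectionOfStrings = new ArrayList<>();

public void store_If_Not_Exist(String aNewString) {
    // Linear scan: each call compares against every stored string, so it is O(n).
    boolean existInCollection = false;
    for (String s : myCollectionOfStrings) {
        if (s.equals(aNewString)) {
            existInCollection = true;
            break;
        }
    }
    if (!existInCollection) {
        myCollectionOfStrings.add(aNewString);
    }
}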
The question is: what kind of method, structure, or algorithm can I use to store the strings so that the existence check is fast? Maybe a trie, or a HashMap? Thanks!
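Since the question mentions a trie, here is a minimal sketch of one for exact-match storage, for comparison only. The class name `StringTrie` and its methods are hypothetical, not from any library. Each lookup or insert walks one node per character, so its cost depends on the string's length rather than on how many strings are stored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal trie for exact-match string storage (sketch only).
public class StringTrie {
    // Each node maps the next character to a child node.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true if a stored string ends at this node
    }

    private final Node root = new Node();
    private int size = 0;

    // Inserts s if absent; returns true only if it was newly added.
    public boolean add(String s) {
        Node node = root;
        for (int i = 0; i < s.length(); i++) {
            node = node.children.computeIfAbsent(s.charAt(i), c -> new Node());
        }
        if (node.isWord) {
            return false; // already stored
        }
        node.isWord = true;
        size++;
        return true;
    }

    // Returns true if exactly s was stored (a mere prefix does not count).
    public boolean contains(String s) {
        Node node = root;
        for (int i = 0; i < s.length(); i++) {
            node = node.children.get(s.charAt(i));
            if (node == null) {
                return false;
            }
        }
        return node.isWord;
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        StringTrie trie = new StringTrie();
        System.out.println(trie.add("car"));       // true
        System.out.println(trie.add("cart"));      // true
        System.out.println(trie.add("car"));       // false
        System.out.println(trie.contains("ca"));   // false
        System.out.println(trie.size());           // 2
    }
}
```

A trie mostly pays off when you also need prefix queries; for plain "is this exact string already stored?" checks, a hash-based set is simpler and uses less memory.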
Use a Set<String>. Anything that looks up by hash code is relatively efficient, and 2000 is not that large. I assume, of course, that you are looking for a direct match and not things like stemming, plurals, etc.

Actually, using a Set would let you bypass the check entirely, since only one instance of each string will be present: Set.add does nothing (and returns false) when the string is already there.

HashSet. It has expected O(1) lookup time for an element.

HashSet, which is very fast.

Set.
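A minimal sketch of the HashSet approach, assuming a hypothetical wrapper class `UniqueStrings`. It keeps the question's method name but returns the boolean from `Set.add` (instead of `void`) so callers can tell whether the string was new; that return type is an illustrative choice, not part of the question.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical wrapper around a HashSet (sketch only).
public class UniqueStrings {
    // Holds at most one copy of each string; lookups and inserts use
    // String.hashCode() and equals(), so they are expected O(1).
    private final Set<String> myCollectionOfStrings = new HashSet<>();

    // Set.add returns true only if the string was not already present,
    // so no separate existence check is needed.
    public boolean store_If_Not_Exist(String aNewString) {
        return myCollectionOfStrings.add(aNewString);
    }

    public boolean contains(String s) {
        return myCollectionOfStrings.contains(s);
    }

    public int size() {
        return myCollectionOfStrings.size();
    }

    public static void main(String[] args) {
        UniqueStrings store = new UniqueStrings();
        System.out.println(store.store_If_Not_Exist("apple"));  // true
        System.out.println(store.store_If_Not_Exist("banana")); // true
        System.out.println(store.store_If_Not_Exist("apple"));  // false
        System.out.println(store.size());                       // 2
    }
}
```

If insertion order matters, a LinkedHashSet can be swapped in with the same calls.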