I don't know how to put this, so I will give one example. I have a list of strings, say {"abc-1", "abc-2", "abc-3", "abc-4", "xyz-98", "xyz-76", "xyz-34", "xyz-87", "foo-1a", "foo-1b", "foo-1c"}. I hope that gives you a picture of the problem. Which algorithm is best for this kind of scenario, where there is a large number of similar strings? Or how can I optimize an existing algorithm to achieve the best performance?
-
What is the final desired output you need? Can you please elaborate? – CodeHunter, Jul 25, 2017 at 21:42
-
Oh, my bad. Basically, what I want is optimal search performance. The list is static and will have approximately 100K strings. – grigory mendel, Jul 26, 2017 at 6:18
-
That still doesn't answer my question. What do you want to search for, and what should you get in the output? Can you work out a use case? – CodeHunter, Jul 26, 2017 at 6:28
-
The search is basically exact string matching. In the example above, I want to check whether the string "abc-2" exists in the list. – grigory mendel, Jul 26, 2017 at 6:33
-
And by "similar strings of large number", do you mean that the pattern string you want to search for in the given pool of strings might be very long? – CodeHunter, Jul 26, 2017 at 6:36
2 Answers
Optimize it for what: speed or size? If the list is small enough and you are only looking for exact matches, a map (HashMap / HashTable) would work, but it would take up a good amount of space. You could use a Trie (prefix tree), which would save some space and also allow prefix matching, but it would be slightly slower than a map.
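For the exact-match case described in the question, a minimal sketch of the hash-based option could look like the following. It uses a HashSet rather than a map, since only membership is needed; the class name ExactLookup and its methods are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: wraps a HashSet built once from the static list.
class ExactLookup {
    private final Set<String> pool;

    ExactLookup(List<String> strings) {
        // The list is static, so the set is built once and never updated.
        this.pool = new HashSet<>(strings);
    }

    // Expected O(1) lookup after hashing the query string.
    boolean contains(String s) {
        return pool.contains(s);
    }

    public static void main(String[] args) {
        ExactLookup lookup = new ExactLookup(
                Arrays.asList("abc-1", "abc-2", "xyz-98", "foo-1a"));
        System.out.println(lookup.contains("abc-2"));
        System.out.println(lookup.contains("abc-5"));
    }
}
```

Whether the memory cost is acceptable for roughly 100K strings depends on the string lengths and the JVM; it is worth measuring before moving to a Trie.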
As per your requirement, you can use a Prefix Tree (Trie) together with a boolean array, after a little preprocessing of the strings in the list. Here is the idea:
1. Create a pool from the existing list of strings and build a
Trie-based lookup over it.
2. While creating the pool, split each string using "-" as the delimiter.
3. The string part is searched in the Trie you have created.
If you find the search string in the Trie, then look up the integer
part in the boolean array.
4. The boolean array stores true at the index equal to the number
that is the suffix of the string, and it is attached to the last
node of the Trie prefix.
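In short, suppose you want to search for String s = "abc-2":
String[] inputStr = s.split("-");
// The prefix must be in the Trie ...
if (searchTrieNode(inputStr[0]))
    // ... and the numeric suffix must be flagged in the boolean array.
    if (boolArr[Integer.parseInt(inputStr[1])])
        return true;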
Edit: If the array is large, we can instead use a bit string to store the number information of the pattern string and attach it to the next pointer of the last Trie node found. We simply need to set the nth bit. For example, if we have abc-12, we set the 12th bit to 1 and attach the bit string to the next pointer of the abc Trie structure. This avoids wasting memory on a full boolean array. While searching, we just need to retrieve the nth bit and check whether it is set to 1.
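The Trie plus bit string idea can be sketched roughly as below. This is only one possible reading of the answer: the class PrefixSuffixTrie, its Node type and the choice of java.util.BitSet are assumptions made for illustration, and the sketch assumes every stored string ends in "-" followed by a non-negative decimal number (so it would treat "abc-02" and "abc-2" as the same entry, and it does not handle suffixes like "1a").

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Trie over the part before the last '-',
// with a BitSet of numeric suffixes attached to the prefix's last node.
class PrefixSuffixTrie {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        BitSet suffixes; // stays null until some prefix ends at this node
    }

    private final Node root = new Node();

    // Assumes s looks like "<prefix>-<non-negative integer>".
    void add(String s) {
        int dash = s.lastIndexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("missing '-': " + s);
        }
        Node node = root;
        for (char c : s.substring(0, dash).toCharArray()) {
            node = node.next.computeIfAbsent(c, k -> new Node());
        }
        if (node.suffixes == null) {
            node.suffixes = new BitSet();
        }
        // Set the nth bit, where n is the numeric suffix.
        node.suffixes.set(Integer.parseInt(s.substring(dash + 1)));
    }

    boolean contains(String s) {
        int dash = s.lastIndexOf('-');
        if (dash < 0) {
            return false;
        }
        int n;
        try {
            n = Integer.parseInt(s.substring(dash + 1));
        } catch (NumberFormatException e) {
            return false; // non-numeric suffixes are never stored here
        }
        Node node = root;
        for (char c : s.substring(0, dash).toCharArray()) {
            node = node.next.get(c);
            if (node == null) {
                return false; // prefix is not in the Trie
            }
        }
        // Prefix found: check whether the nth bit is set.
        return node.suffixes != null && node.suffixes.get(n);
    }
}
```

A BitSet grows to cover the largest index set, so a prefix with a single very large suffix still allocates bits up to that number; for sparse, large suffixes a set of integers may be the better fit.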
11 Comments
abc-12, and you also pointed out that the post string comes from a limited source, then we can store a list of next pointers as values mapped to these strings. At any time, we would just fetch the list of values associated with the key inputStr[1] and iterate over that list to check whether the next pointer is present in it.
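One way to read this comment, given that its beginning is cut off, is as an index keyed by the suffix, which would also cover non-numeric suffixes such as "1a". The sketch below is an assumption-laden illustration: the class SuffixIndex is invented, it stores the prefix string itself in place of a Trie node pointer, and it uses a set instead of iterating over a list.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: suffix -> set of prefixes seen with that suffix.
// The prefix string stands in for the "next pointer" of a Trie node.
class SuffixIndex {
    private final Map<String, Set<String>> prefixesBySuffix = new HashMap<>();

    // Assumes s contains at least one '-'; splits at the last one.
    void add(String s) {
        int dash = s.lastIndexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("missing '-': " + s);
        }
        prefixesBySuffix
                .computeIfAbsent(s.substring(dash + 1), k -> new HashSet<>())
                .add(s.substring(0, dash));
    }

    boolean contains(String s) {
        int dash = s.lastIndexOf('-');
        if (dash < 0) {
            return false;
        }
        // Fetch the values stored under the suffix key, then check the prefix.
        Set<String> prefixes = prefixesBySuffix.get(s.substring(dash + 1));
        return prefixes != null && prefixes.contains(s.substring(0, dash));
    }
}
```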