1

I don't know how best to put this, so I will give an example. I have a list of strings, say {"abc-1", "abc-2", "abc-3", "abc-4", "xyz-98", "xyz-76", "xyz-34", "xyz-87", "foo-1a", "foo-1b", "foo-1c"}. I hope that gives a picture of the problem. Which algorithm is best for this kind of scenario, where a large number of the strings are similar to one another? Or how can I optimize an existing algorithm to get the best performance?

6
  • What is the final desired output you need? Can you please elaborate? Commented Jul 25, 2017 at 21:42
  • Oh, my bad. Basically, what I want is optimal search performance. The list is going to be static and will have approximately 100K strings. Commented Jul 26, 2017 at 6:18
  • That still doesn't answer my question. What do you want to search for, and what should you get in the output? Can you work out a use case? Commented Jul 26, 2017 at 6:28
  • The search is basically exact string matching. In the example above, I want to check whether the string "abc-2" exists in the list. Commented Jul 26, 2017 at 6:33
  • And by "similar strings of a large number", do you mean that the length of the pattern string you want to search for in the given pool of strings might be very large? Commented Jul 26, 2017 at 6:36

2 Answers

1

Optimize it for what: speed or size? If the list is small enough and you are only looking for exact matches, a map (HashMap / HashTable) would work but would take up a good amount of space. You could use a Trie (prefix tree), which would save some space and also allow prefix matching, but would be slightly slower than a map.
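As a rough illustration of the hash-based option for the exact-match case, here is a minimal sketch in Java. The class name `ExactLookup` and its `contains` method are hypothetical, and a `HashSet` stands in for the map since only membership is needed. It assumes the list is built once and never modified.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: wraps a static list of keys in a hash set for exact-match lookups.
final class ExactLookup {
    private final Set<String> keys;

    ExactLookup(List<String> source) {
        // Built once; the list is assumed to be static, so no later inserts are needed.
        this.keys = new HashSet<>(source);
    }

    // Expected constant-time membership test, plus the cost of hashing the query string.
    boolean contains(String key) {
        return keys.contains(key);
    }
}
```

With the example list loaded, `contains("abc-2")` would report a hit and `contains("abc-9")` a miss. The trade-off is the one mentioned above: every full string is stored, so shared prefixes are not compressed.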


1 Comment

This list is going to be static, so the insert operation is not a priority. The list is going to be big as well (~100K). If I use a Trie (assuming it's a compact Trie), it is going to be a Trie with very low depth, because the names may not have common prefixes other than the part after "-", and with a lot of branches (maybe more than 1000).
1

As per your requirement, you can use a Prefix Tree (Trie) with a little preprocessing of the strings in the list, using a Trie and a boolean array. Here is the idea:

1. Create a pool of the existing strings from the list and build a
   Trie-based lookup over it.
2. While creating the pool, split each string using "-" as the delimiter.
3. The string part is searched for in the Trie you have created.
   If you find it in the Trie, then look up the integer
   part in the boolean array.
4. The boolean array stores true at the index equal to the number
   that is the postfix of the search string, and it is
   attached to the last node of the Trie prefix.

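In short, suppose you want to search for String s = "abc-2".

String[] inputStr = s.split("-", 2);                // "abc-2" -> ["abc", "2"]
if (searchTrieNode(inputStr[0]))                    // the prefix is present in the Trie
    if (boolArr[Integer.parseInt(inputStr[1])])     // flag for the numeric postfix
        return true;
return false;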
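The steps above can be sketched end to end roughly as follows. This is only an illustration under stated assumptions: the class `PrefixTrie`, its `add` and `contains` methods, and the `maxSuffix` bound are hypothetical names, keys are assumed to have the form "prefix-number", and the boolean array is stored on the last node of each prefix.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a trie over the part before "-", with a boolean array
// on the prefix's last node marking which numeric postfixes exist.
final class PrefixTrie {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean[] suffixes; // stays null until some key's prefix ends at this node
    }

    private final Node root = new Node();
    private final int maxSuffix; // assumed upper bound on the numeric postfix

    PrefixTrie(int maxSuffix) {
        this.maxSuffix = maxSuffix;
    }

    // Adds a key of the assumed form "<prefix>-<integer in 0..maxSuffix>".
    void add(String key) {
        String[] parts = key.split("-", 2);
        Node node = root;
        for (char c : parts[0].toCharArray()) {
            node = node.next.computeIfAbsent(c, k -> new Node());
        }
        if (node.suffixes == null) {
            node.suffixes = new boolean[maxSuffix + 1];
        }
        node.suffixes[Integer.parseInt(parts[1])] = true;
    }

    // Returns true only if both the prefix and the numeric postfix were added.
    boolean contains(String key) {
        String[] parts = key.split("-", 2);
        if (parts.length < 2) return false;
        Node node = root;
        for (char c : parts[0].toCharArray()) {
            node = node.next.get(c);
            if (node == null) return false;
        }
        if (node.suffixes == null) return false;
        int n;
        try {
            n = Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            return false; // non-numeric postfixes such as "1a" are outside this sketch
        }
        return n >= 0 && n < node.suffixes.length && node.suffixes[n];
    }
}
```

Note that each prefix node allocates `maxSuffix + 1` flags as soon as one key uses it, so this layout suits dense, small postfix ranges best.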

Edit: If the array is large, we can instead use a bit string to store the number information of the pattern string and attach it to the next pointer of the last Trie node found. We simply need to set the nth bit. For example, if we have abc-12, we can set the 12th bit to 1 and attach the bit string to the next pointer of the abc Trie structure. This way we avoid most of the memory wastage too. While searching, we just need to retrieve the nth bit and check whether it is set to 1.
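A minimal sketch of that bit-string idea, using `java.util.BitSet` as the bit string. The wrapper class `SuffixBits` and its `mark` and `has` methods are hypothetical; one instance is assumed to hang off the last Trie node of each prefix.

```java
import java.util.BitSet;

// Hypothetical sketch of the per-prefix bit string: bit n is set when "<prefix>-n" exists.
final class SuffixBits {
    private final BitSet bits = new BitSet();

    // Records that the numeric postfix n exists for this prefix (n must be non-negative).
    void mark(int n) {
        bits.set(n);
    }

    // Checks whether postfix n was recorded; negative values are treated as absent.
    boolean has(int n) {
        return n >= 0 && bits.get(n);
    }
}
```

For abc-12 this amounts to calling `mark(12)` on the instance attached to the abc node and `has(12)` at search time. A `BitSet` grows to cover its highest set bit, so it uses about one bit per possible postfix value rather than one array element per value.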

11 Comments

The characters after "-" are not limited to numbers. They can be strings as well (but from a limited pool of strings).
Why don't we include the part after "-" in the Trie as well?
Including the part after "-" would unnecessarily increase the depth of the Trie, which is unwanted. We could instead use a HashMap to map the next pointer of each Trie node to the list of values associated with that node. For example, suppose you have abc-12. Since you pointed out that the post string comes from a limited source, we can store a list of next pointers as the values mapped to these strings. At any time, we would just fetch the list of values associated with the key inputStr[1] and iterate over that list to check whether the next pointer is present in it.
Including the post string wouldn't increase the depth that much (the max may be 5 or 6), but it might increase the width (or do we call it branches?). Basically, whatever comes after the "-" is a set of keys that uniquely identify, let's say, a DB record. (We can say that "abc" is a DB table and "-key1_key2" are the primary keys.) And we are searching for DB records.
And these keys are at most 5 or 6