1

There has about 100K strings - prefixes, now we need to know does a given string is matched with one of these prefixes or not. For example, the prefixes are:

12
123
1234
12345

Now the given string is 123abc, it will matched with "123" prefix; If the given string is 12340098, it will matched with "1234" prefix.

Since there has 100K prefixes, therefore we need a very fast way to match it, how could we use the C++ to implement it ?

5
  • 1
    Transform the prefixes into a deterministic finite automaton structure, then use each string to execute the automaton; basically a state machine. For more information, see a good computer science algorithm book. There is a reason it takes four years of college to earn a degree in computer science. This is an advanced subject that cannot be fully explained in a brief answer on stackoveflow.com Commented Apr 18, 2019 at 2:04
  • Put all prefixes in a hashtable, then try to find the given strings first N chars, increasing N, until it fails? Commented Apr 18, 2019 at 2:05
  • 1
    There is no built in type that will be the "fastest", one algorithm that does meet that bill is a ternary search tree which requires 1/25 of the pointer memory needed by a traditional TRIE. Commented Apr 18, 2019 at 2:35
  • @DavidC.Rankin Thanks, it's seems that github project is good to me, I will learn it. Commented Apr 18, 2019 at 3:25
  • It's a blistering fast prefix search written to implement word-completion for a text editor. The only complexity is on deleting a reordering nodes on word removal -- similar to balancing a Red-Black or AVL tree. (though I'm a bit biased having spent the time to write it :) Commented Apr 18, 2019 at 3:27

1 Answer 1

4

I think you're looking for the trie data structure, which is optimized for queries of the form "are any of these strings prefixes of a given string?" or "is this given string a prefix of any of these other strings?" (This is related to the deterministic finite automaton that @Sam Varshavchik mention in the comment, though that connection requires a bit of CS theory to fully understand).

There are many ways to implement a trie in C++. I'd advise starting off by reading up on the data structure to get a better sense for how it works, then using that to guide your implementation. If in the course of coding it up you run into some issues, feel free to post a follow-up question.

Sign up to request clarification or add additional context in comments.

2 Comments

This answer probably could be made better by adding at least some explanation, even if it's very basic about what's going on, in case the link breaks in the future or something.
Thanks, I will try the Ternary Search Tree

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.