
This question is a little complicated, so I will try to describe it through an example.

First, we get the string foo and put it into a collection S.

Then we get the string sample and put it into S as well.

Next, we get the string oo. Since oo is a substring of foo, S now contains three members (foo, sample, oo), and foo and oo are in the same group.

The next string added to S is food. Since foo is a substring of food, food joins the same group as foo and oo.

And so on.

Finally, we end up with a large collection in which every member belongs to a group.

I want to use such an algorithm (or combination of algorithms) to process duplicate files, but there are some obvious roadblocks:

  • the collection is dynamic (it keeps growing)
  • the strings are Unicode
  • the strings follow no specific pattern

In short, I am looking for a data structure and algorithm that can group the members of a growing string collection. In other words, I expect the collection to be organized as a set of trees, where each tree consists of a longest string together with those of its substrings that are in the collection.
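To make the intended behavior concrete, here is a minimal Python sketch of the grouping described in the example. It is only a naive baseline under stated assumptions, not a proposed solution: the class name `SubstringGroups` and its methods are hypothetical, each new string is compared against every existing one (so insertion is linear in the size of S), and a string that matches two groups is assumed to merge them.

```python
class SubstringGroups:
    """Naive baseline: groups strings that are linked by a substring relation."""

    def __init__(self):
        # Union-find parent map; every stored string points toward its group root.
        self.parent = {}

    def _find(self, s):
        # Follow parent links to the group root, shortening the path as we go.
        while self.parent[s] != s:
            self.parent[s] = self.parent[self.parent[s]]
            s = self.parent[s]
        return s

    def add(self, s):
        if s in self.parent:
            return  # already in the collection
        self.parent[s] = s
        # Compare against every existing member (assumption: brute force is acceptable).
        for other in list(self.parent):
            if other != s and (s in other or other in s):
                root_s, root_other = self._find(s), self._find(other)
                if root_s != root_other:
                    # Assumption: a string that fits two groups merges them.
                    self.parent[root_s] = root_other

    def groups(self):
        # Collect members under their group root.
        by_root = {}
        for s in self.parent:
            by_root.setdefault(self._find(s), set()).add(s)
        return list(by_root.values())


groups = SubstringGroups()
for word in ["foo", "sample", "oo", "food"]:
    groups.add(word)
print(groups.groups())
```

With the four strings from the example, this puts foo, oo, and food in one group and sample in another. Python's `in` operator on `str` works on code points, so Unicode input needs no special handling here; what this sketch does not address is the cost of the all-pairs comparison as S grows.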

Any suggestions?

  • See Trie. Commented Feb 8 at 5:53
  • What do you intend to do when a new word can fit into two groups? Commented Feb 8 at 6:48
  • @4386427 In short, I am looking for a data structure and algorithm that can group the members of a growing string collection. In other words, I expect the collection to be organized as a set of trees, where each tree consists of a longest string together with those of its substrings that are in the collection. Commented Feb 8 at 7:13
  • @aaa A trie seems good, but as far as I know it will take more space for Unicode characters. Maybe I could treat a Unicode character as multiple ASCII characters? I am not sure, because I have to record a “valid char”. Commented Feb 8 at 7:17
  • My take. By an unknown algorithm we're to group things in a complicated way to gain an unclear benefit in some unspecified operations. As much as the pieces kinda look like they could go together in this way, I'd need to see some more detail on why to try it before I'd see the point in putting effort into the approach. Sorry. Commented Feb 8 at 18:48
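On the Unicode concern raised in the comments, one possible approach is to key the trie on UTF-8 bytes rather than on characters, so each node has at most 256 children regardless of the alphabet. The sketch below is only an illustration under assumptions: `ByteTrie` is a hypothetical name, `None` is used as the end-of-string marker, and a plain trie like this matches whole strings and prefixes only, so substring grouping would still need something more (for example, inserting every suffix).

```python
class ByteTrie:
    """Trie keyed on UTF-8 bytes, so the fan-out per node is at most 256."""

    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        # Walk or create one node per UTF-8 byte of the string.
        for b in word.encode("utf-8"):
            node = node.setdefault(b, {})
        node[None] = True  # end-of-string marker (byte keys are ints, so None cannot clash)

    def contains(self, word):
        node = self.root
        for b in word.encode("utf-8"):
            node = node.get(b)
            if node is None:
                return False
        # Only a complete stored string carries the marker.
        return None in node


trie = ByteTrie()
trie.insert("föö")
print(trie.contains("föö"), trie.contains("fö"))
```

Because the end marker is stored only at the end of a complete string, a lookup that stops partway through a multi-byte character or partway through a word is reported as absent.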
