If I have a list of strings like

var MyList = new List<string>
{
    "substring1", "substring2", "substring3", "substring4", "substring5"
};

is there any efficient way to determine which elements of that list are contained in the following string

"substring1 is the substring2 document that was processed electronically"

In this case the result should be

var MySubList = new List<string>
{
    "substring1", "substring2"
};

3 Comments

  • Aho–Corasick algorithm (en.wikipedia.org/wiki/…) Commented May 24, 2022 at 4:07
  • var'd variables shouldn't be PascalCase Commented May 24, 2022 at 6:08
  • Are you matching whole words, or should "hello asubstring1z world" match too? Commented May 24, 2022 at 6:12

2 Answers

We can use LINQ's Where to keep only those list elements for which Contains on the large string returns true:

var MyList = new List<string>
{
    "substring1", "substring2", "substring3", "substring4", "substring5"
};

var Text = "substring1 is the substring2 document that was processed electronically";

var output = MyList.Where(x => Text.Contains(x)).ToList();
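
Note that Contains with a single string argument is a case-sensitive, ordinal match. If casing should be ignored, one option is the overload that takes a StringComparison. The sketch below assumes a runtime that ships that overload (.NET Core 2.1 or later, .NET 5+); the helper name FindContained is made up for illustration and is not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns the candidates that occur anywhere in text.
// Assumes string.Contains(string, StringComparison) is available.
static List<string> FindContained(IEnumerable<string> candidates, string text, StringComparison comparison) =>
    candidates.Where(c => text.Contains(c, comparison)).ToList();

var myList = new List<string> { "substring1", "substring2", "substring3", "substring4", "substring5" };

// Capital "S" at the start, to show the difference between the two comparisons.
var text = "Substring1 is the substring2 document that was processed electronically";

var exact = FindContained(myList, text, StringComparison.Ordinal);
var ignoreCase = FindContained(myList, text, StringComparison.OrdinalIgnoreCase);

Console.WriteLine(string.Join(", ", exact));      // substring2
Console.WriteLine(string.Join(", ", ignoreCase)); // substring1, substring2
```

Either way this performs one scan of the text per list element, which is usually fine for a handful of short patterns.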

  1. Split the Text on whitespace
  2. Sort the words alphabetically
  3. Remove the duplicates so that every word appears only once
var words = Text.Split(" ").OrderBy(word => word).Distinct().ToList();
  4. Create an accumulator collection for the matches
  5. Create two index variables (one for the words, one for the patterns)
List<string> matches = new();
int patternIdx = 0, wordIdx = 0;
  6. Iterate through both lists until you reach the end of either one
while(patternIdx < patterns.Count && wordIdx < words.Count)
{

}
  7. Compare the current pattern with the current word
  8. Advance the index variable(s) based on the comparison result
int comparison = string.Compare(patterns[patternIdx],words[wordIdx]);
switch(comparison)
{
    case > 0: wordIdx++; break;
    case < 0: patternIdx++; break;
    default: 
    {
        matches.Add(patterns[patternIdx]); 
        wordIdx++;
        patternIdx++;
        break;
    }
}

Here I've used the relational patterns (> 0, < 0) that C# 9 added to switch.
If you can't use C# 9, an if ... else if ... else block works just as well.
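
For reference, the if / else if / else variant could look roughly like the sketch below. It avoids C# 9 features altogether (no top-level statements, no target-typed new). The class and method names are invented for illustration, and it compares ordinally via string.CompareOrdinal, which is an assumption on my part; whichever comparison you pick, both lists have to be sorted with that same comparison.

```csharp
using System;
using System.Collections.Generic;

public static class MergeMatcher
{
    // Hypothetical helper name. Both lists must already be sorted in
    // ordinal order, because the scan relies on string.CompareOrdinal.
    public static List<string> FindMatches(List<string> patterns, List<string> words)
    {
        var matches = new List<string>();
        int patternIdx = 0, wordIdx = 0;

        while (patternIdx < patterns.Count && wordIdx < words.Count)
        {
            int comparison = string.CompareOrdinal(patterns[patternIdx], words[wordIdx]);
            if (comparison > 0)
            {
                wordIdx++;      // pattern sorts after the word: try the next word
            }
            else if (comparison < 0)
            {
                patternIdx++;   // pattern sorts before the word: try the next pattern
            }
            else
            {
                matches.Add(patterns[patternIdx]); // equal: record and advance both
                wordIdx++;
                patternIdx++;
            }
        }
        return matches;
    }

    public static void Main()
    {
        var patterns = new List<string> { "substring1", "substring2", "substring3" };
        var words = new List<string> { "document", "is", "substring1", "substring2", "the" };

        Console.WriteLine(string.Join(", ", FindMatches(patterns, words))); // substring1, substring2
    }
}
```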


For the sake of completeness, here is the whole code:

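using System.Collections.Generic;
using System.Linq;

var Text = "substring1 is the substring2 document that was processed electronically";
// Unique words of the text, sorted so they can be scanned in step with the patterns
var words = Text.Split(" ").OrderBy(x => x).Distinct().ToList();
// The patterns must already be sorted the same way as the words for the scan below to work
var patterns = new List<string> { "substring1", "substring2", "substring3", "substring4", "substring5" };

List<string> matches = new();
int patternIdx = 0, wordIdx = 0;
while (patternIdx < patterns.Count && wordIdx < words.Count)
{
    int comparison = string.Compare(patterns[patternIdx], words[wordIdx]);
    switch (comparison)
    {
        case > 0: wordIdx++; break;    // pattern sorts after the word: skip the word
        case < 0: patternIdx++; break; // pattern sorts before the word: skip the pattern
        default:                       // equal: record the match and advance both
        {
            matches.Add(patterns[patternIdx]);
            wordIdx++;
            patternIdx++;
            break;
        }
    }
}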

Dotnetfiddle link

6 Comments

If I change the text to "substring1 is the substring2, document that was processed electronically" then this code doesn't work anymore.
@Enigmativity Did you just add a comma after substring2?
@Enigmativity In that case all you need to do is replace the Split(" ") call with Split(' ', ',')
I could have added any character after the substring.
It was a poor specification from the OP. I took it as literally a substring.
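
To illustrate the Split suggestion from the comments: a rough sketch of splitting on several separators at once, dropping the empty entries that appear when two separators are adjacent (as in ", "). The separator set here is only an example chosen for illustration; arbitrary trailing characters, as raised above, would still need a real substring search such as Contains.

```csharp
using System;
using System.Linq;

var text = "substring1 is the substring2, document that was processed electronically";

// Example separators only; real input may need more (or a different approach).
char[] separators = { ' ', ',', '.', ';', ':', '!', '?' };

var words = text
    .Split(separators, StringSplitOptions.RemoveEmptyEntries) // no "" entries from ", "
    .Distinct()
    .OrderBy(w => w, StringComparer.Ordinal)
    .ToList();

Console.WriteLine(string.Join(" | ", words));
```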