
I am searching a string for operators. I need each operator and its index in the string.

For example: x>10&y>=10

Operators

>
&
>=
=

So I need results like

>  1
&  4
>= 6

So I wrote the code like this

string str = "x>10&y>=10";
List<string> substringList = new List<string>{">", "&", ">=", "="};

// IndexOf only reports the first occurrence of each operator,
// and it cannot tell that the "=" at index 7 is part of ">=".
var orderedOccurrences = substringList
      .Where(substr => str.IndexOf(substr, StringComparison.Ordinal) >= 0)
      .Select(substr => new 
          { substr, inx = str.IndexOf(substr, StringComparison.Ordinal) })
      .OrderBy(x => x.inx).ToList();

However, I got results like this (obviously):

 > 1
 & 4
 > 6
 = 7

I can use a for loop for the search and cover this error scenario, but I like the LINQ shorthand. Is there any way to cover the error condition using lambdas/LINQ?

  • Where is this monstrous LINQ statement shorter, and in particular more readable, than a simple foreach loop? Commented Feb 21, 2017 at 14:46
  • I sort of agree with that point. I would still like to know whether this scenario can be handled with LINQ. Commented Feb 21, 2017 at 14:49
  • Don't do any of this. You need a lexer, so write a lexer. Commented Feb 21, 2017 at 14:53
  • This is the sort of thing regular expressions handle best; see this question: stackoverflow.com/questions/1851795/… Commented Feb 21, 2017 at 14:53
  • @David Sure, now add support for parentheses. Still convinced regular expressions are best? Commented Feb 21, 2017 at 14:57

3 Answers

1

Here is a more general alternative:

string str = "x>10&y>=10";

var result = Regex.Matches(str, @">=|>|&|=").Cast<Match>()
    .Select(m => new { s = m.Value, i = m.Index }).ToList();

Result:

>   1
&   4
>=  6

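or a bit shorter, if the operators are the only non-word characters in the string (\W+ matches any run of non-word characters, so spaces or parentheses would be picked up too):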

var d = Regex.Matches(str, @"\W+").Cast<Match>().ToDictionary(m => m.Index, m => m.Value);
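
Regex alternation tries its alternatives from left to right, which is why `>=` comes before `>` in the pattern above. As a minimal sketch, assuming a hypothetical helper named `OperatorScanner.Find` (not part of the original answer), the pattern could be built from the operator list by sorting the operators by length first:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class OperatorScanner
{
    // Hypothetical helper: returns every operator occurrence as (index, operator).
    // Assumes the operator list is non-empty and contains no empty strings.
    public static List<KeyValuePair<int, string>> Find(string input, IEnumerable<string> operators)
    {
        // Longest operators first, so ">=" is tried before ">" and "=".
        // Regex.Escape protects operators such as "|" or "*" that are regex metacharacters.
        string pattern = string.Join("|",
            operators.OrderByDescending(op => op.Length)
                     .Select(op => Regex.Escape(op)));

        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => new KeyValuePair<int, string>(m.Index, m.Value))
            .ToList();
    }
}
```

Calling `OperatorScanner.Find("x>10&y>=10", new[] { ">", "&", ">=", "=" })` should yield the pairs (1, ">"), (4, "&") and (6, ">="), regardless of the order in which the operators are listed.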

1

So basically, you want to scan the string for the characters '<', '>', '=' and '&' and record the index and the character of each one found. If '<' or '>' is found, you also want to know whether '=' follows it; if so, the two form a single operator and the next search should start after the '='.

Note that you didn't specify what should happen with &= or ==.

Whenever you have to scan strings for some syntax, it is always wise to at least consider the use of regular expressions.

According to the specification above you want a regular expression that matches if you find any of the following:

  • '<='
  • '>='
  • '='
  • '&'
  • '<' followed by something else than '='
  • '>' followed by something else than '='

Code would be simple:

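using System;
using System.Text.RegularExpressions;

string expression = ...;
// (?!=) is a negative lookahead: the next character must not be '=',
// but that character does not become part of the match.
var regex = new Regex("&|<=|>=|=|[<>](?!=)");
MatchCollection matches = regex.Matches(expression);

The object matches is a MatchCollection containing Match objects. Every Match object has the properties Index, Length and Value; exactly the properties you want.

foreach (Match match in matches)
{
    Console.WriteLine($"Match {match.Value} found"
        + $" at index {match.Index} with length {match.Length}");
}

The vertical bar | in the regular expression means OR; [ ] means any one of the characters between the brackets; (?!=) is a negative lookahead that succeeds only if the next character is not '='. A negated character class such as [^=] would not work here, because it consumes the following character (the match would be ">1" rather than ">") and it fails when the operator is the last character of the string.

So a match is found for &, <=, >=, =, or for < or > when it is not followed by =.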
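
The difference between a negated character class such as `[^=]`, which consumes the character after the operator, and a negative lookahead such as `(?!=)`, which only inspects it, is easy to check. The following is a small sketch; the `LookaheadDemo.Describe` helper is a hypothetical name introduced only for this comparison:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Hypothetical helper: formats every match of the pattern as "value@index"
    // so that two patterns can be compared side by side.
    public static string[] Describe(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => m.Value + "@" + m.Index)
            .ToArray();
    }
}
```

For the input `x>10&y>=10`, `Describe(input, "[<>][^=]")` returns only `>1@1`: the digit after the operator is swallowed into the match. `Describe(input, "&|<=|>=|=|[<>](?!=)")` returns `>@1`, `&@4` and `>=@6`.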

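If you also want to find &= and ==, your regular expression becomes even simpler:

  • find any of <>&= that is followed by =
  • or find any of <>&= that is not followed by =

Code:

// One of < > & =, optionally followed by '='.
// The quantifier is greedy, so ">=" is matched as a whole rather than as ">" and "=".
var regex = new Regex("[<>&=]=?");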

A good online regex tester where you can check your regular expression can be found here. This shows also which matches are found and a description of the syntax of regular expressions.


0

Well, if you are bent on using LINQ you could do the following:

// Extension methods must be declared inside a static class.
public static IEnumerable<(int Index, string Substring)> GetAllIndicees(this string str, IEnumerable<string> substrings)
{
    // Local function: all positions at which one substring occurs in str.
    IEnumerable<(int Index, string Substring)> GetAllIndicees(string substring)
    {
        // Guard: Enumerable.Range throws if the count is negative.
        if (substring.Length > str.Length)
            return Enumerable.Empty<(int, string)>();

        return from start in Enumerable.Range(0, str.Length - substring.Length + 1)
               where str.Substring(start, substring.Length).Equals(substring)
               select (start, substring);
    }

    // Materialize once; the filter below enumerates this sequence repeatedly.
    var alloperators = substrings.SelectMany(s => GetAllIndicees(s)).ToList();

    // Drop every occurrence that lies inside another operator containing it,
    // e.g. the ">" and "=" that are part of ">=".
    return alloperators.Where(o => !alloperators.Except(new[] { o })
                                                .Any(other => o.Index >= other.Index &&
                                                              o.Index < other.Index + other.Substring.Length &&
                                                              other.Substring.Contains(o.Substring)));    
}

I'm using C# 7 syntax here because the code is more concise and readable, but it is easily translatable to earlier versions.

And now if you do:

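var substr = "x>10&y>=10";
var operators = new HashSet<string>(new[] { ">", "&", ">=", "=" });
// Sort by index so the output order does not depend on the set's enumeration order.
var filteredOperators = substr.GetAllIndicees(operators).OrderBy(o => o.Index);
Console.WriteLine(string.Join(", ", filteredOperators.Select(o => $"[{o.Substring}: {o.Index}]")));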

You'll get the expected result:

[>: 1], [&: 4], [>=: 6]

Is this "better" than using other tools? I'm not so sure.
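
For comparison, here is a rough sketch of the plain loop that the question mentions as the alternative. It is not taken from any of the answers; the `LoopScanner.Scan` name is hypothetical, and the sketch assumes that at each position the longest matching operator should win:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LoopScanner
{
    // Hypothetical helper: walks the input once and records (index, operator) pairs.
    public static List<Tuple<int, string>> Scan(string input, IEnumerable<string> operators)
    {
        // Longest operators first, so ">=" wins over ">" at the same position.
        // Empty strings are skipped because they would match everywhere.
        var byLength = operators.Where(op => op.Length > 0)
                                .OrderByDescending(op => op.Length)
                                .ToList();
        var found = new List<Tuple<int, string>>();

        int i = 0;
        while (i < input.Length)
        {
            // Ordinal comparison of the input at position i against each operator.
            string hit = byLength.FirstOrDefault(
                op => string.CompareOrdinal(input, i, op, 0, op.Length) == 0);

            if (hit != null)
            {
                found.Add(Tuple.Create(i, hit));
                i += hit.Length;   // continue after the whole operator
            }
            else
            {
                i++;
            }
        }
        return found;
    }
}
```

With the input `x>10&y>=10` and the operators `>`, `&`, `>=`, `=`, this should produce (1, ">"), (4, "&") and (6, ">="). Because the scan jumps past each operator it finds, the `=` inside `>=` is never reported on its own.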
