5

A friend asked me how to improve some code with LINQ. How would you do a character by character comparison between two strings to count the number of matches at an index? Here's the original code, could it be improved with LINQ?

private int Fitness(string individual, string target)
   {
       int sum = 0;
       for (int i = 0; i < individual.Length; i++)
           if (individual[i] == target[i]) sum++;
       return sum;
   }
1
  • What's the meaning of "improved with LINQ"? Commented Jan 7, 2022 at 18:57

5 Answers 5

7
return Enumerable.Range(0, individual.Length)
                 .Count(i => individual[i] == target[i]);

A more fool-proof way would be (the above snippet will fail if target is shorter than individual):

return Enumerable.Range(0, Math.Min(individual.Length, target.Length))
                 .Count(i => individual[i] == target[i]);

I believe the code is correct as is. Enumerable.Range method takes two arguments. The first of which is the start index (should be 0), the second is the count of items. The complete code snippet to test and make sure:

class Program {
  static void Main(string[] args) {
      Console.WriteLine(Fitness("hello", "world"));
  }
  static int Fitness(string individual, string target) {
      return Enumerable.Range(0, Math.Min(individual.Length, target.Length))
                       .Count(i => individual[i] == target[i]);
  }
}
Sign up to request clarification or add additional context in comments.

7 Comments

Need to do Length-1 since arrays are zero-based
The second argument to Range is count, not last item.
That's why you need to do Length-1. You will get an "Index out of range exception" when you evaluate individual[individual.Length]. e.g. If the array length is 1 then the only element is foo[0], and foo[1] does not exist.
@Kirk: As I said, it's correct since Enumerable.Range expects the count, not length. It never tries to evaluate individual[individual.Length]. Did you actually test the code or just speculating?
When I tested, I got index out of range unless I used Length-1
|
2

You could write something similar with LINQ, but since "Zip" isn't built in until .NET 4.0 it will be more code than we'd like, and/or not as efficient. I'd be tempted to leave it "as is", but I'd probably check target.Length to avoid an out-of-range exception.

Perhaps I'd make an extension method, though:

public static int CompareFitness(this string individual, string target)
{
   int sum = 0, len = individual.Length < target.Length
        ? individual.Length : target.Length;
   for (int i = 0; i < len; i++)
       if (individual[i] == target[i]) sum++;
   return sum;
}

Then you can use:

string s = "abcd";
int i = s.CompareFitness("adc"); // 2

5 Comments

Agreed on length check. How it's related to Zip though?
Zip takes two sequences and brings them together term-by-term...?
Agree, Marc, but I'm accepting Mehrdad's answer because I wanted to see how this would be done in LINQ. Voted for your answer, too. Thanks!
I understand but how can Zip reduce code size in this case (relative to my answer)? At least for collections that support direct access with an index, it won't make much difference in code size.
I don't have the 4.0 version of Zip to hand, but wouldn't it be something like individual.Zip(target).Count(pair=>pair.X==pair.Y); - but importantly we haven't had to mess with the Min (i.e. this replaces your second version)
0

The currently selected answer doesn't handle different length strings well. It only compares a maximum substring of the input strings if the lengths differ.

For example:

Fitness("ABC", "ABC") -> 3
Fitness("ABC", "ABC this is an 'unfit' string") -> 3

This can be easily fixed by subtracting the differing lengths from your score. The improvement:

return Enumerable.Range(0, Math.Min(individual.Length, target.Length))
                 .Count(i => individual[i] == target[i])
                 - Math.Abs(individual.Length - target.Length);

Now:

Fitness("ABC", "ABC") -> 3
Fitness("ABC", "ABC this is an 'unfit' string") -> -23

Technically, there are 23 characters difference between those two inputs.

Comments

0

I am a little puzzled. The PO asked to improve the solution with Linq. Benchmarking the solutions I received the following result in the table (code see below). Unless I am missing something (in which case I'd appreciate comments) I cannot see that Linq improves the simple solution with the for-loop.

Result: the for loop is a lot faster and needs less memory.

Consequence: I would keep the simple for-loop. The argument of Conciseness that favors Linq does not apply here because the code is hidden away in a method. And the performance is not better but worse.

I like Linq. Tried to work without it, recently, and failed. But it is not always better.

Method Mean Error StdDev Gen 0 Allocated
RunLinqZip 1,179.16 ns 2.362 ns 1.844 ns 0.1831 1,152 B
RunLinqRange 556.54 ns 1.359 ns 1.135 ns 0.1726 1,088 B
RunForLoop 63.84 ns 0.101 ns 0.094 ns - -
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;


namespace Benchmarks
{
    public class Program
    {
        static void Main(string[] args)
        {
            var summary = BenchmarkRunner.Run<MemoryBenchmarkerDemo>();
        }
    }


    [MemoryDiagnoser]
    public class MemoryBenchmarkerDemo
    {
        public string[] Id = new string[] { "ATTR", "TAL", "SPELL", "LITURGY", "REGENERATE", "INI", "CT_9", "CT_9/ITEMTEMPLATE_231" };
        public string[] Att = new string[] { "ATTR_2", "TAL_99", "SPELL_232", "LITURGY_123", "REGENERATE", "INI", "CT_9", "CT_9/ITEMTEMPLATE_231" };


        public int FitnessFor(string individual, string target)
        {
            int sum = 0;
            for (int i = 0; i < Math.Min(individual.Length, target.Length); i++)
                if (individual[i] == target[i]) sum++;
            return sum;
        }

        public int FitnessRange(string individual, string target)
        {
            return Enumerable.Range(0, individual.Length)
                     .Count(i => individual[i] == target[i]);
        }

        public int FitnessZip(string individual, string target)
        {
            return individual.Zip(target).Count(pair => pair.Item1 == pair.Item2);
        }



        [Benchmark]
        public void RunLinqZip()
        {
            for(int i = 0; i< Id.Length; i++)
            {
                FitnessZip(Id[i], Att[i]);
            }
        }

        [Benchmark]
        public void RunLinqRange()
        {
            for (int i = 0; i < Id.Length; i++)
            {
                FitnessRange(Id[i], Att[i]);
            }
        }


        [Benchmark]
        public void RunForLoop()
        {
            for (int i = 0; i < Id.Length; i++)
            {
                FitnessFor(Id[i], Att[i]);
            }
        }

    }
}

1 Comment

Well, nobody mentioned performance as point of improvement. What does "improved with LINQ" mean anyway? Nowadays the question would probably get closed as too vague. The check on lengths in the accepted answer is one improvement in robustness, which is probably why it was accepted.
-1

How about a join, or did I misunderstand?

    static void Main(string[] args)
    {
        var s1 = "abcde";
        var s2 = "hycdh";
        var count = s1.Join(s2, c => c, c => c, (a,b) => a).Count();
        Console.WriteLine(count);
        Console.ReadKey();
    }

2 Comments

Now compare "ababa" to "babab" - should be zero, but returns 12 ;-p
Aww. Thought I'd leave it as an example of the diff between zip and join.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.