Count matching characters between two strings using LINQ

Question

A friend asked me how to improve some code with LINQ. How would you do a character by character comparison between two strings to count the number of matches at an index? Here's the original code, could it be improved with LINQ?

private int Fitness(string individual, string target)
   {
       int sum = 0;
       for (int i = 0; i < individual.Length; i++)
           if (individual[i] == target[i]) sum++;
       return sum;
   }

What's the meaning of "improved with LINQ"?

Gert Arnold
– Gert Arnold

2022-01-07 18:57:51 +00:00
Commented Jan 7, 2022 at 18:57 — Gert Arnold
– Gert Arnold, Commented Jan 7, 2022 at 18:57

Mehrdad Afshari · Accepted Answer · 2009-08-16 22:18:20Z

7

return Enumerable.Range(0, individual.Length)
                 .Count(i => individual[i] == target[i]);

A more fool-proof way would be (the above snippet will fail if target is shorter than individual):

return Enumerable.Range(0, Math.Min(individual.Length, target.Length))
                 .Count(i => individual[i] == target[i]);

I believe the code is correct as is. Enumerable.Range method takes two arguments. The first of which is the start index (should be 0), the second is the count of items. The complete code snippet to test and make sure:

class Program {
  static void Main(string[] args) {
      Console.WriteLine(Fitness("hello", "world"));
  }
  static int Fitness(string individual, string target) {
      return Enumerable.Range(0, Math.Min(individual.Length, target.Length))
                       .Count(i => individual[i] == target[i]);
  }
}

edited Aug 16, 2009 at 22:18

answered Aug 16, 2009 at 21:15

Mehrdad Afshari

424k93 gold badges864 silver badges796 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

Jon Galloway Over a year ago

Need to do Length-1 since arrays are zero-based

Mehrdad Afshari Over a year ago

The second argument to Range is count, not last item.

Kirk Broadhurst Over a year ago

That's why you need to do Length-1. You will get an "Index out of range exception" when you evaluate individual[individual.Length]. e.g. If the array length is 1 then the only element is foo[0], and foo[1] does not exist.

Mehrdad Afshari Over a year ago

@Kirk: As I said, it's correct since Enumerable.Range expects the count, not length. It never tries to evaluate individual[individual.Length]. Did you actually test the code or just speculating?

Jon Galloway Over a year ago

When I tested, I got index out of range unless I used Length-1

|

Marc Gravell · Accepted Answer · 2009-08-16 21:16:53Z

2

You could write something similar with LINQ, but since "Zip" isn't built in until .NET 4.0 it will be more code than we'd like, and/or not as efficient. I'd be tempted to leave it "as is", but I'd probably check target.Length to avoid an out-of-range exception.

Perhaps I'd make an extension method, though:

public static int CompareFitness(this string individual, string target)
{
   int sum = 0, len = individual.Length < target.Length
        ? individual.Length : target.Length;
   for (int i = 0; i < len; i++)
       if (individual[i] == target[i]) sum++;
   return sum;
}

Then you can use:

string s = "abcd";
int i = s.CompareFitness("adc"); // 2

answered Aug 16, 2009 at 21:16

Marc Gravell

1.1m273 gold badges2.6k silver badges3k bronze badges

5 Comments

Mehrdad Afshari Over a year ago

Agreed on length check. How it's related to Zip though?

Marc Gravell Over a year ago

Zip takes two sequences and brings them together term-by-term...?

Jon Galloway Over a year ago

Agree, Marc, but I'm accepting Mehrdad's answer because I wanted to see how this would be done in LINQ. Voted for your answer, too. Thanks!

Mehrdad Afshari Over a year ago

I understand but how can Zip reduce code size in this case (relative to my answer)? At least for collections that support direct access with an index, it won't make much difference in code size.

Marc Gravell Over a year ago

I don't have the 4.0 version of Zip to hand, but wouldn't it be something like individual.Zip(target).Count(pair=>pair.X==pair.Y); - but importantly we haven't had to mess with the Min (i.e. this replaces your second version)

jamesrom · Accepted Answer · 2015-10-20 02:33:43Z

0

The currently selected answer doesn't handle different length strings well. It only compares a maximum substring of the input strings if the lengths differ.

For example:

Fitness("ABC", "ABC") -> 3
Fitness("ABC", "ABC this is an 'unfit' string") -> 3

This can be easily fixed by subtracting the differing lengths from your score. The improvement:

return Enumerable.Range(0, Math.Min(individual.Length, target.Length))
                 .Count(i => individual[i] == target[i])
                 - Math.Abs(individual.Length - target.Length);

Now:

Fitness("ABC", "ABC") -> 3
Fitness("ABC", "ABC this is an 'unfit' string") -> -23

Technically, there are 23 characters difference between those two inputs.

answered Oct 20, 2015 at 2:33

jamesrom

8663 gold badges10 silver badges19 bronze badges

Comments

Jan · Accepted Answer · 2022-01-07 18:46:05Z

I am a little puzzled. The PO asked to improve the solution with Linq. Benchmarking the solutions I received the following result in the table (code see below). Unless I am missing something (in which case I'd appreciate comments) I cannot see that Linq improves the simple solution with the for-loop.

Result: the for loop is a lot faster and needs less memory.

Consequence: I would keep the simple for-loop. The argument of Conciseness that favors Linq does not apply here because the code is hidden away in a method. And the performance is not better but worse.

I like Linq. Tried to work without it, recently, and failed. But it is not always better.

Method	Mean	Error	StdDev	Gen 0	Allocated
RunLinqZip	1,179.16 ns	2.362 ns	1.844 ns	0.1831	1,152 B
RunLinqRange	556.54 ns	1.359 ns	1.135 ns	0.1726	1,088 B
RunForLoop	63.84 ns	0.101 ns	0.094 ns	-	-

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;


namespace Benchmarks
{
    public class Program
    {
        static void Main(string[] args)
        {
            var summary = BenchmarkRunner.Run<MemoryBenchmarkerDemo>();
        }
    }


    [MemoryDiagnoser]
    public class MemoryBenchmarkerDemo
    {
        public string[] Id = new string[] { "ATTR", "TAL", "SPELL", "LITURGY", "REGENERATE", "INI", "CT_9", "CT_9/ITEMTEMPLATE_231" };
        public string[] Att = new string[] { "ATTR_2", "TAL_99", "SPELL_232", "LITURGY_123", "REGENERATE", "INI", "CT_9", "CT_9/ITEMTEMPLATE_231" };


        public int FitnessFor(string individual, string target)
        {
            int sum = 0;
            for (int i = 0; i < Math.Min(individual.Length, target.Length); i++)
                if (individual[i] == target[i]) sum++;
            return sum;
        }

        public int FitnessRange(string individual, string target)
        {
            return Enumerable.Range(0, individual.Length)
                     .Count(i => individual[i] == target[i]);
        }

        public int FitnessZip(string individual, string target)
        {
            return individual.Zip(target).Count(pair => pair.Item1 == pair.Item2);
        }



        [Benchmark]
        public void RunLinqZip()
        {
            for(int i = 0; i< Id.Length; i++)
            {
                FitnessZip(Id[i], Att[i]);
            }
        }

        [Benchmark]
        public void RunLinqRange()
        {
            for (int i = 0; i < Id.Length; i++)
            {
                FitnessRange(Id[i], Att[i]);
            }
        }


        [Benchmark]
        public void RunForLoop()
        {
            for (int i = 0; i < Id.Length; i++)
            {
                FitnessFor(Id[i], Att[i]);
            }
        }

    }
}

Well, nobody mentioned performance as point of improvement. What does "improved with LINQ" mean anyway? Nowadays the question would probably get closed as too vague. The check on lengths in the accepted answer is one improvement in robustness, which is probably why it was accepted.

spender · Accepted Answer · 2009-08-16 21:24:11Z

-1

How about a join, or did I misunderstand?

    static void Main(string[] args)
    {
        var s1 = "abcde";
        var s2 = "hycdh";
        var count = s1.Join(s2, c => c, c => c, (a,b) => a).Count();
        Console.WriteLine(count);
        Console.ReadKey();
    }

answered Aug 16, 2009 at 21:24

spender

121k36 gold badges245 silver badges369 bronze badges

2 Comments

Marc Gravell Over a year ago

Now compare "ababa" to "babab" - should be zero, but returns 12 ;-p

spender Over a year ago

Aww. Thought I'd leave it as an example of the diff between zip and join.

Collectives™ on Stack Overflow

Count matching characters between two strings using LINQ

5 Answers 5

7 Comments

5 Comments

Comments

1 Comment

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

7 Comments

5 Comments

Comments

1 Comment

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related