3

To compare two List<string> objects and extract their differences, I use LINQ's Except.

For example, say I want to compare the following two lists for equality:

var List1 = new List<string> { "0", "1", "2", "2", "3" };
var List2 = new List<string> { "0", "1", "2", "3" };

List<string> differences1 = List1.Except(List2).ToList();
List<string> differences2 = List2.Except(List1).ToList();

Both differences1 and differences2 will be empty, because Except treats its inputs as sets and the value 2 exists in both lists. The lists are still NOT equal, though, since List1 contains 2 twice. I want to be able to extract all differences between the lists, including duplicates that one list has and the other does not.

What is the best method of extracting all differences between two List<string> objects?

4
  • Have you tried using Distinct() to remove duplicates and comparing resulting lists? Commented Dec 17, 2013 at 15:24
  • You should really include same input/output. Commented Dec 17, 2013 at 15:35
  • I'm confused after the edit. You say you want "0,1,2,2,3" and "0,1,2,3" to be equal and you want to know that they are different. Huh? Commented Dec 17, 2013 at 15:37
  • @Becuzz No. List1 and List2 SHOULD be equal, which is why I am comparing them for validation. If they are different (including one having duplicate items) then I want to know WHY they are different. I am sorry if that was unclear. Not sure it warranted a downvote, however... Commented Dec 17, 2013 at 15:41

5 Answers

5

So what you're looking for is an Except that works on bags (multisets), not on sets. If one sequence has two copies of an item and you subtract a sequence with one copy, one copy should be left. Except instead reduces both sequences to distinct sets before performing the subtraction.

This makes it slightly less elegant to handle, but it's still not terrible. Rather than having a HashSet to represent the items in the other set, you simply need to have a dictionary mapping the item to the number of copies. Then for each item, if it's in the dictionary, remove one from the count and don't yield it, and if it isn't in the dictionary then it should be yielded.

public static IEnumerable<T> BagDifference<T>(IEnumerable<T> first,
    IEnumerable<T> second)
{
    // Count how many copies of each item the second sequence holds.
    var dictionary = second.GroupBy(x => x)
        .ToDictionary(group => group.Key, group => group.Count());

    foreach (var item in first)
    {
        int count;
        if (dictionary.TryGetValue(item, out count))
        {
            // Matched: use up one copy from the second sequence.
            if (count - 1 == 0)
                dictionary.Remove(item);
            else
                dictionary[item] = count - 1;
        }
        else
            yield return item; // No copy left in second to cancel this one out.
    }
}
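
Applied to the lists from the question, the method can be called in both directions to see what each list has that the other lacks. The sketch below repeats the same logic so that it compiles on its own; the BagOps and Program class names are placeholders and not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BagOps
{
    // Same logic as BagDifference above, repeated so this sample is self-contained.
    public static IEnumerable<T> BagDifference<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        var counts = second.GroupBy(x => x)
            .ToDictionary(g => g.Key, g => g.Count());

        foreach (var item in first)
        {
            if (counts.TryGetValue(item, out int count))
            {
                // Cancel this item against one copy from the second sequence.
                if (count == 1) counts.Remove(item);
                else counts[item] = count - 1;
            }
            else
            {
                yield return item;
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var list1 = new List<string> { "0", "1", "2", "2", "3" };
        var list2 = new List<string> { "0", "1", "2", "3" };

        // Items list1 has beyond what list2 can account for.
        var differences1 = BagOps.BagDifference(list1, list2).ToList();
        // Items list2 has beyond what list1 can account for.
        var differences2 = BagOps.BagDifference(list2, list1).ToList();

        Console.WriteLine(string.Join(",", differences1)); // prints "2"
        Console.WriteLine(differences2.Count);             // prints "0"
    }
}
```

The two results together describe why the lists differ: here list1 carries one extra "2" and list2 carries nothing extra.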

2 Comments

@TestK Your question is still not very clear; you should edit it to clarify it.
@TestK It does not, although it's better than it was before. As I said a while ago, you really should have some example input/output.
0

You could group by the key and then compare the groups using Except()

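It would look like this (untested, may contain typos). Each group is projected to a key/count pair, because IGrouping objects are compared by reference while anonymous types compare by value:

var groupList1 = List1.GroupBy(x => x).Select(g => new { g.Key, Count = g.Count() }).ToList();
var groupList2 = List2.GroupBy(x => x).Select(g => new { g.Key, Count = g.Count() }).ToList();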

var differences1 = groupList1.Except(groupList2).ToList();
var differences2 = groupList2.Except(groupList1).ToList();
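
As a quick check against the lists from the question, here is a minimal self-contained sketch. It assumes each group is reduced to a key/count pair before calling Except, since Except on raw IGrouping objects falls back to reference equality and would report every group as different. GroupCompare and CountDifferences are placeholder names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupCompare
{
    // Hypothetical helper: returns the items of 'first' whose occurrence count
    // has no identical (item, count) pair in 'second'.
    public static List<(string Item, int Count)> CountDifferences(
        IEnumerable<string> first, IEnumerable<string> second)
    {
        // Anonymous types implement value-based Equals/GetHashCode,
        // so Except compares key and count together.
        var a = first.GroupBy(x => x).Select(g => new { g.Key, Count = g.Count() });
        var b = second.GroupBy(x => x).Select(g => new { g.Key, Count = g.Count() });

        return a.Except(b).Select(p => (p.Key, p.Count)).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var list1 = new List<string> { "0", "1", "2", "2", "3" };
        var list2 = new List<string> { "0", "1", "2", "3" };

        foreach (var d in GroupCompare.CountDifferences(list1, list2))
            Console.WriteLine($"list1 has {d.Item} x{d.Count}"); // list1 has 2 x2

        foreach (var d in GroupCompare.CountDifferences(list2, list1))
            Console.WriteLine($"list2 has {d.Item} x{d.Count}"); // list2 has 2 x1
    }
}
```

Unlike a bag difference, this reports the full count on each side rather than the number of surplus copies, which may be easier to read in a validation message.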


0

You could call .Distinct() on the lists before comparing them:

List<string> differences1 = List1.Distinct().Except(List2).ToList();
List<string> differences2 = List2.Distinct().Except(List1).ToList();

1 Comment

There's no point in calling Distinct before using Except; Except itself will remove duplicates.
0

You could use Distinct to eliminate the duplicates then do the comparison.

var distinctList1 = List1.Distinct().ToList();
var distinctList2 = List2.Distinct().ToList();

var differences1 = distinctList1.Except(distinctList2).ToList();
var differences2 = distinctList2.Except(distinctList1).ToList();

1 Comment

There's no point in calling Distinct before using Except; Except itself will remove duplicates.
0

You could create copies of the lists and then remove from each copy everything that also exists in the other:

var diff1 = list1.ToList();
var diff2 = list2.ToList();
diff1.RemoveAll(diff2.Remove);
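
To see what this leaves behind, here is a small sketch using the lists from the question. It relies on List<T>.Remove returning true only when it finds and removes one occurrence, so each match cancels exactly one item on each side. The CopyDiff class and Diff method are placeholder names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CopyDiff
{
    // Hypothetical wrapper: works on copies, so the inputs are left untouched.
    public static (List<T> OnlyInFirst, List<T> OnlyInSecond) Diff<T>(
        IEnumerable<T> first, IEnumerable<T> second)
    {
        var diff1 = first.ToList();
        var diff2 = second.ToList();

        // For each item in diff1, try to remove one matching item from diff2.
        // If that succeeds, the item is dropped from diff1 as well.
        diff1.RemoveAll(diff2.Remove);

        return (diff1, diff2);
    }
}

public static class Program
{
    public static void Main()
    {
        var list1 = new List<string> { "0", "1", "2", "2", "3" };
        var list2 = new List<string> { "0", "1", "2", "3" };

        var result = CopyDiff.Diff(list1, list2);

        Console.WriteLine(string.Join(",", result.OnlyInFirst)); // prints "2"
        Console.WriteLine(result.OnlyInSecond.Count);            // prints "0"
    }
}
```

Each call to Remove is a linear search, so the cost grows with the product of the two list lengths; that is tolerable for short lists but not for large ones.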

2 Comments

1) This is mutating both collections, rather than simply determining what the difference is. 2) This will perform really horribly, as searching for and then removing items from lists is not cheap.
Bad performance indeed! But that need not always be an issue, and it is a simple way to get the difference.
