Similar to this question, I'm trying to iterate over a list of strings and keep only one string per distinct sub-string. For example:

List<string> keys = new List<string>()
{
    "foo_boo_1",
    "foo_boo_2",
    "foo_boo_3",
    "boo_boo_1"
};

The output should contain one key per distinct sub-string (arbitrarily, the first key found for each):

foo_boo_1 (the first one)
boo_boo_1

I've tried to implement this solution using the IEqualityComparer with:

public class MyEqualityComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {            
        int xIndex = x.LastIndexOf("_"); 
        int yIndex = y.LastIndexOf("_");
        if (xIndex > 0 && yIndex > 0)
            return x.Substring(0, xIndex) == y.Substring(0, yIndex);
        else
            return false;
    }

    public int GetHashCode(string obj)
    {
        return obj.GetHashCode();
    }
}

foreach (var key in keys.Distinct(new MyEqualityComparer()))
{
    Console.WriteLine(key);
}

But the resulting output is:

foo_boo_1
foo_boo_2
foo_boo_3
boo_boo_1

Using IEqualityComparer, how do I remove the keys that share a sub-string with an earlier key (foo_boo_2 and foo_boo_3)?

*Please note that the "real" keys are a lot longer, something like "1_0_8-B153_GF_6_2", so I must use LastIndexOf.

3 Answers

Your current implementation has some flaws:

  1. Neither Equals nor GetHashCode should ever throw an exception, so both have to check for null.
  2. If Equals returns true for x and y, then GetHashCode(x) == GetHashCode(y) must hold. A counterexample in your code is "abc_1" and "abc_2": Equals returns true, but the hash codes differ.

The second flaw can well cause Distinct to return incorrect results, because Distinct computes and compares hash codes first.
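
To make the second flaw concrete, here is a minimal sketch that checks the contract on the counterexample. It reuses the comparer from the question, renamed `BrokenComparer` purely for illustration. String hash codes are randomized per process on current .NET, so the two hashes could in principle collide, but that is extremely unlikely.

```csharp
using System;
using System.Collections.Generic;

// The comparer from the question, renamed for illustration only.
public class BrokenComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        int xIndex = x.LastIndexOf('_');
        int yIndex = y.LastIndexOf('_');
        return xIndex > 0 && yIndex > 0
            && x.Substring(0, xIndex) == y.Substring(0, yIndex);
    }

    // Hashes the whole string, so "abc_1" and "abc_2" almost certainly differ.
    public int GetHashCode(string obj) => obj.GetHashCode();
}

public static class ContractDemo
{
    public static void Main()
    {
        var comparer = new BrokenComparer();

        // Equals treats the two keys as the same value...
        Console.WriteLine(comparer.Equals("abc_1", "abc_2"));

        // ...but the hash codes (almost certainly) disagree, which breaks the contract.
        Console.WriteLine(comparer.GetHashCode("abc_1") == comparer.GetHashCode("abc_2"));
    }
}
```

Because Distinct looks at the hash codes before anything else, a mismatch here means the two keys are treated as different and Equals is never consulted.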

Correct code can be something like this:

public class MyEqualityComparer : IEqualityComparer<string> {
  public bool Equals(string x, string y) {            
    if (ReferenceEquals(x, y))
      return true;
    else if ((null == x) || (null == y))
      return false;

    int xIndex = x.LastIndexOf('_'); 
    int yIndex = y.LastIndexOf('_');

    if (xIndex >= 0)         
      return (yIndex >= 0) 
        ? x.Substring(0, xIndex) == y.Substring(0, yIndex)
        : false;
    else if (yIndex >= 0)         
      return false;
    else
      return x == y; 
  }

  public int GetHashCode(string obj) {
    if (null == obj)  
      return 0;

    int index = obj.LastIndexOf('_');

    return index < 0 
      ? obj.GetHashCode() 
      : obj.Substring(0, index).GetHashCode();
  }
}

Now you are ready to use it with Distinct:

   foreach (var key in myList.Distinct(new MyEqualityComparer())) {
     Console.WriteLine(key);
   }

2 Comments

Hey @dmitry, great answer. Can you please explain why the breakpoint inside Equals(...) is never hit?
@Shahar Shokrani: Distinct compares x and y in two stages. First it compares hash codes; if they differ, there is no need to call Equals. Only if the hash codes are equal does it run Equals. Our hashes are good, so it may well turn out that we don't need Equals at all.
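
One way to observe the two stages without a debugger is to count the Equals calls. The sketch below uses a hypothetical `CountingPrefixComparer` that is not part of the answer above. It is simplified on two assumptions: keys are never null, and a key without '_' is its own prefix.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer for illustration: compares keys by the part before
// the last '_' and counts how often Equals is invoked.
public class CountingPrefixComparer : IEqualityComparer<string>
{
    public int EqualsCalls { get; private set; }

    // Assumption: keys are non-null; a key without '_' is its own prefix.
    private static string Prefix(string s)
    {
        int index = s.LastIndexOf('_');
        return index < 0 ? s : s.Substring(0, index);
    }

    public bool Equals(string x, string y)
    {
        EqualsCalls++;
        return Prefix(x) == Prefix(y);
    }

    // Hash the same prefix that Equals compares, so equal keys share a hash.
    public int GetHashCode(string obj) => Prefix(obj).GetHashCode();
}

public static class EqualsCallDemo
{
    public static void Main()
    {
        var keys = new List<string> { "foo_boo_1", "foo_boo_2", "foo_boo_3", "boo_boo_1" };
        var comparer = new CountingPrefixComparer();

        var distinct = keys.Distinct(comparer).ToList();

        Console.WriteLine(string.Join(", ", distinct)); // foo_boo_1, boo_boo_1
        Console.WriteLine(comparer.EqualsCalls);
    }
}
```

Equals runs for "foo_boo_2" and "foo_boo_3", because their hash matches that of "foo_boo_1". With the question's original GetHashCode the hashes almost never match, which is why a breakpoint inside Equals is not hit there.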

The GetHashCode method in your equality comparer returns the hash code of the entire string. Make it hash only the substring instead, for example:

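public int GetHashCode(string obj)
{
    // Hash only the part before the last '_' so keys with the same prefix get the same hash.
    var index = obj.LastIndexOf("_");
    // No '_' found: hash the whole string instead of letting Substring throw.
    return index < 0 ? obj.GetHashCode() : obj.Substring(0, index).GetHashCode();
}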

5 Comments

Hey @DavidG, thanks. What about Equals(...)? Can I leave it throwing NotImplementedException?
The Equals method won't even be called, because all your strings are different. Make this change and it will be.
When does Equals get called?
You know, you can debug this code and work that out for yourself, right?
I've already tried putting a breakpoint inside Equals(...), but it is never hit. When I make it throw NotImplementedException, the program crashes. It seems to work only when Equals(...) is consistent with GetHashCode(...), thanks.

For a more succinct solution that avoids using a custom IEqualityComparer<>, you could utilise GroupBy. For example:

var keys = new List<string>()
{
    "foo_boo_1",
    "foo_boo_2",
    "foo_boo_3",
    "boo_boo_1"
};

var distinct = keys
    .Select(k => new
    {
        original = k,
        truncated = k.Contains("_") ? k.Substring(0, k.LastIndexOf("_")) : k
    })
    .GroupBy(k => k.truncated)
    .Select(g => g.First().original);

This outputs:

foo_boo_1
boo_boo_1

1 Comment

Nice, but what if the index is -1?
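
On the -1 case: when a key has no '_', LastIndexOf returns -1 and Substring(0, -1) would throw, so the lookup has to be guarded. The sketch below shows one possible guard using a hypothetical `Prefix` helper; the sample key "plain" is invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NoUnderscoreDemo
{
    // Hypothetical helper: the part before the last '_',
    // or the whole key when there is no '_' (index is -1).
    public static string Prefix(string key)
    {
        int index = key.LastIndexOf('_');
        return index < 0 ? key : key.Substring(0, index);
    }

    public static void Main()
    {
        var keys = new List<string> { "foo_boo_1", "plain", "foo_boo_2", "plain" };

        // GroupBy yields groups in order of first appearance of each prefix.
        var distinct = keys
            .GroupBy(k => Prefix(k))
            .Select(g => g.First());

        foreach (var key in distinct)
            Console.WriteLine(key); // foo_boo_1, then plain
    }
}
```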
