2

I want to remove duplicates from a list of strings. I do this by using Distinct, but I want to ignore the first character when comparing.

I already have working code that removes the duplicates, but it also strips the first character from every string.

List<string> mylist = new List<string>();

List<string> newlist = 
  mylist.Select(e => e.Substring(1, e.Length - 1)).Distinct().ToList();

Input: "1A","1B","2A","3C","4D"

Output: "A","B","C","D"

Expected output: "1A","1B","3C","4D" (it doesn't matter whether "1A" or "2A" is removed)

I guess I am pretty close but.... any input is highly appreciated!

As always a solution should work as fast as possible ;)

  • That code doesn't delete duplicates, it selects non-recurring sequences. Commented Aug 21, 2014 at 8:06
  • If you want the fastest solution possible, we need to know how long the string lists will be. The fastest solution for lists of less than, say, 10 elements is likely to be different from the fastest solution for lists of a million elements. Commented Aug 21, 2014 at 8:22
  • The list will have 7 elements, each 2-3 characters long. Commented Aug 21, 2014 at 8:25

4 Answers

5

You can implement an IEqualityComparer<string> that compares your strings while ignoring the first letter, then pass it to the Distinct method.

myList.Distinct(new MyComparer());

There is also an example on MSDN that shows you how to implement and use a custom comparer with Distinct.
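A minimal sketch of what such a comparer might look like, assuming non-null strings and assuming that strings shorter than two characters should simply be compared as-is. The class name `IgnoreFirstCharComparer` is made up for illustration; it stands in for the `MyComparer` placeholder above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two strings count as equal when everything
// after their first character matches. Assumes non-null input strings.
sealed class IgnoreFirstCharComparer : IEqualityComparer<string>
{
    // Strings shorter than 2 characters are used unchanged as their own key.
    private static string Key(string s) => s.Length < 2 ? s : s.Substring(1);

    public bool Equals(string x, string y) =>
        string.Equals(Key(x), Key(y), StringComparison.Ordinal);

    // Must agree with Equals: equal keys always produce equal hash codes.
    public int GetHashCode(string s) => Key(s).GetHashCode();
}

static class Program
{
    static void Main()
    {
        var myList = new List<string> { "1A", "1B", "2A", "3C", "4D" };
        List<string> distinct = myList.Distinct(new IgnoreFirstCharComparer()).ToList();
        Console.WriteLine(string.Join(",", distinct));
    }
}
```

This version allocates a substring for every comparison and every hash computation, so it trades some speed for brevity.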


3 Comments

I guess this solution is way too slow, as I have to do this a billion times. I am searching for a one-liner ;)
So you think one-liner solutions are always faster?
@user3868224: How slow it is depends on how you write GetHashCode. If it's implemented as return str.Substring(1).GetHashCode(), it's pretty efficient (you should take care of strings that are shorter than 2 characters).
4

You can GroupBy all but the first character and take the first of every group:

List<string> result = mylist.GroupBy(s => s.Length < 2 ? s : s.Substring(1))
                            .Select(g => g.First())
                            .ToList();

Result:

Console.Write(string.Join(",", result)); // 1A,1B,3C,4D

it doesn't matter if "1A" or "2A" will be deleted

If you change your mind you have to replace g.First() with the new logic.
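For instance, to keep the last item of each group rather than the first, only the Select step changes. A minimal sketch under that assumption; the names `KeepLastDemo` and `DistinctKeepLast` are illustrative and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class KeepLastDemo
{
    // Same grouping key as before; only the element picked per group differs.
    public static List<string> DistinctKeepLast(IEnumerable<string> source) =>
        source.GroupBy(s => s.Length < 2 ? s : s.Substring(1))
              .Select(g => g.Last())
              .ToList();

    static void Main()
    {
        var mylist = new List<string> { "1A", "1B", "2A", "3C", "4D" };
        Console.WriteLine(string.Join(",", DistinctKeepLast(mylist))); // 2A,1B,3C,4D
    }
}
```

GroupBy yields groups in the order their keys first appear, which is why "2A" lands in the first position here.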

However, if performance really matters and it is never important which duplicate gets removed, you should prefer Selman's approach of writing a custom IEqualityComparer<string>. That will be more efficient than my GroupBy approach if its GetHashCode is implemented like:

return (s.Length < 2 ? s : s.Substring(1)).GetHashCode();

2 Comments

A question of curiosity, how do you do the link to Selman's answer? (I know the "hard way" going via his profile, but I get the impression that you used a smart trick?)
@flindeberg: There is a "share"-link below every answer/question.
1

I'm going to suggest a simple extension that you can reuse in similar situations

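// Extension methods must be declared inside a non-generic static class.
public static IEnumerable<T> DistinctBy<T, U>(this IEnumerable<T> source, Func<T, U> keySelector)
{
    // Remembers every key seen so far; Add returns false for a repeated key.
    var set = new HashSet<U>();
    foreach (var item in source)
    {
        if (set.Add(keySelector(item)))
            yield return item;
    }
}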

This is basically how Distinct is implemented in LINQ.

Usage:

List<string> newlist = 
  mylist.DistinctBy(e => e.Substring(1, e.Length - 1)).ToList();
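On .NET 6 or later, System.Linq ships its own Enumerable.DistinctBy, so the hand-written extension may not be needed there. A minimal sketch, assuming .NET 6+ and adding a length guard so that an empty string does not make Substring throw; the names `BuiltInDistinctByDemo`, `KeyWithoutFirstChar` and `DistinctIgnoringFirstChar` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BuiltInDistinctByDemo
{
    // Hypothetical helper: the comparison key is everything after the first
    // character; strings shorter than 2 characters are their own key.
    public static string KeyWithoutFirstChar(string s) =>
        s.Length < 2 ? s : s.Substring(1);

    // Uses the framework's Enumerable.DistinctBy (available from .NET 6).
    public static List<string> DistinctIgnoringFirstChar(IEnumerable<string> source) =>
        source.DistinctBy(s => KeyWithoutFirstChar(s)).ToList();

    static void Main()
    {
        var mylist = new List<string> { "1A", "1B", "2A", "3C", "4D" };
        Console.WriteLine(string.Join(",", DistinctIgnoringFirstChar(mylist)));
    }
}
```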


0

I realise the answer has already been given, but since I was working on this answer anyway I'm still going to post it, in case it's any use.

If you really want the fastest solution for large lists, then something like this might be optimal. You would need to do some accurate timings to be sure, though!

This approach does not make any additional string copies when comparing or computing the hash codes:

using System;
using System.Collections.Generic;
using System.Linq;

namespace Demo
{
    internal static class Program
    {
        static void Main()
        {
            var myList = new List<string>
            {
                "1A",
                "1B",
                "2A",
                "3C",
                "4D"
            };

            var newList = myList.Distinct(new MyComparer());
            Console.WriteLine(string.Join("\n", newList));
        }

        sealed class MyComparer: IEqualityComparer<string>
        {
            public bool Equals(string x, string y)
            {
                if (x.Length != y.Length)
                    return false;

                if (x.Length == 0)
                    return true;

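                // Ordinal comparison keeps Equals consistent with the ordinal
                // hash below and avoids culture-sensitive comparison rules.
                return string.CompareOrdinal(x, 1, y, 1, x.Length) == 0;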
            }

            public int GetHashCode(string s)
            {
                if (s.Length <= 1)
                    return 0;

                int result = 17;

                unchecked
                {
                    bool first = true;

                    foreach (char c in s)
                    {
                        if (first)
                            first = false;
                        else
                            result = result*23 + c;
                    }
                }

                return result;
            }
        }
    }
}
