Excuse my pseudo code below. I'm pretty sure there is a magical way to write this in a single LINQ statement that would also dramatically improve performance. I have a list of millions of records in AList, and the id may not be unique. What I'm after is the original list with all duplicates (based on the id) removed, always keeping the record with the earliest date. mystring is almost always a different value when there is a duplicate id.

public class A
{
    public string id { get; set; }
    public string mystring { get; set; }
    public DateTime mydate { get; set; }
}

List<A> aListNew = new List<A>();
foreach (var v in AList)
{
    var first = AList.Where(d => d.id == v.id).OrderBy(d => d.mydate).First();

    // If not already added, then we add
    if (!aListNew.Where(t => t.id == first.id).Any())
        aListNew.Add(first);
}
3 Answers

5

You could use grouping directly to accomplish this in one LINQ statement:

List<A> aListNew = AList
                   .GroupBy(d => d.id)
                   .Select(g => g.OrderBy(i => i.mydate).First())
                   .ToList();
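
For anyone who wants to try this out, here is a minimal self-contained sketch of the same GroupBy query. The wrapper class GroupByDemo, the method name EarliestPerId and the sample records are made up for illustration and are not part of the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class A
{
    public string id { get; set; }
    public string mystring { get; set; }
    public DateTime mydate { get; set; }
}

// Hypothetical wrapper class, only here so the query can be called and checked.
public static class GroupByDemo
{
    // Returns one record per id: the one with the earliest mydate.
    public static List<A> EarliestPerId(IEnumerable<A> source)
    {
        return source
            .GroupBy(d => d.id)
            .Select(g => g.OrderBy(i => i.mydate).First())
            .ToList();
    }

    // Made-up sample data: id "x" appears twice, id "y" once.
    public static List<A> Sample()
    {
        return new List<A>
        {
            new A { id = "x", mystring = "late",  mydate = new DateTime(2020, 5, 1) },
            new A { id = "x", mystring = "early", mydate = new DateTime(2020, 1, 1) },
            new A { id = "y", mystring = "only",  mydate = new DateTime(2021, 3, 1) },
        };
    }
}
```

Calling GroupByDemo.EarliestPerId(GroupByDemo.Sample()) should give back two records, with the "early" one kept for id "x".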
5 Comments

I'm not sure I see where you're checking aListNew for membership, or how you're adding the new element to aListNew.
@ReacherGilt I'm doing it by grouping on the original list instead. I'm using GroupBy to get the items by ID up front, and pulling out the right one, then converting the results to a list. There's no need for that check with this implementation.
@ReacherGilt This just makes the entire operation far more efficient and easier to follow.
Ah, I'd assumed that aListNew could have other members in it prior to this statement. That's clearly not possible from the question's sample.
@ReedCopsey - good stuff. This takes seconds to run (as opposed to 15 min!). Thanks to the others for the Dictionary suggestions as well, I can live with the current execution speed and find the code easy to read.
4

The fastest is probably going to be a straight foreach loop with a dictionary:

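// id is a string in class A, so the dictionary is keyed by string
Dictionary<string, A> lookup = new Dictionary<string, A>();

foreach (var v in AList)
{
    if (!lookup.ContainsKey(v.id))
        // first record seen for this id: add it
        lookup[v.id] = v;
    else if (lookup[v.id].mydate > v.mydate)
        // stored record is later than this one: replace it
        lookup[v.id] = v;
}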

// convert to list
List<A> aListNew = lookup.Values.ToList();

A LINQ GroupBy / First() query might be comparable if there are few collisions. Both approaches traverse the whole list once, so either one is going to be roughly O(N), whereas the nested Where inside the loop in the question is O(N²).
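
As a rough sketch of the single-pass idea, the loop can also be written with TryGetValue so each record costs one dictionary lookup instead of two. The class name DictionaryDemo and the method name EarliestPerId are hypothetical, and I haven't benchmarked this against the GroupBy version:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class A
{
    public string id { get; set; }
    public string mystring { get; set; }
    public DateTime mydate { get; set; }
}

// Hypothetical wrapper class, only here so the loop can be called and checked.
public static class DictionaryDemo
{
    // Single pass over the source: keeps the earliest record seen for each id.
    public static List<A> EarliestPerId(IEnumerable<A> source)
    {
        var lookup = new Dictionary<string, A>();
        foreach (var v in source)
        {
            // Store v when its id is new, or when v is earlier than the
            // record stored so far. On equal dates the first one seen stays.
            if (!lookup.TryGetValue(v.id, out var existing) || existing.mydate > v.mydate)
                lookup[v.id] = v;
        }
        return lookup.Values.ToList();
    }
}
```

The result has one entry per distinct id. I would not rely on the order of lookup.Values matching the order of the original list.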

0

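This should be the easiest. It avoids nested LINQ queries; the only LINQ calls are a single sort and the final ToList().

var lookup = new Dictionary<string, A>();
// Iterate the original list from latest to earliest date, so the last
// write for each id is the record with the earliest date.
foreach (var a in AList.OrderByDescending(d => d.mydate)) {
    lookup[a.id] = a;
}
var result = lookup.Values.ToList();

Note that nested LINQ queries (a query run inside a loop or inside another query) will hurt performance, and that's why I chose not to use them. Remember that LINQ is there to make your task easier, not to make the execution faster.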
