C# Linq help improve performance?

Question

Excuse my pseudo code below. I'm pretty sure there is a magical way to write this in a single LINQ statement that will also dramatically improve performance. I have a list of millions of records in AList, and the id may not be unique. What I'm after is the original list with all duplicates removed (based on the id), always keeping the record with the earliest date. mystring is almost always a different value when there is a duplicate id.

public class A
{
    public string id { get; set; }
    public string mystring { get; set; }
    public DateTime mydate { get; set; }
}

List<A> aListNew = new List<A>();
foreach (var v in AList)
{
    var first = AList.Where(d => d.id == v.id).OrderBy(d => d.mydate).First();

    // If not already added, then we add
    if (!aListNew.Where(t => t.id == first.id).Any())
        aListNew.Add(first);
}

Reed Copsey · Accepted Answer · 2013-10-03 19:05:40Z

5

You could use grouping directly to accomplish this in one LINQ statement:

List<A> aListNew = AList
                   .GroupBy(d => d.id)
                   .Select(g => g.OrderBy(i => i.mydate).First())
                   .ToList();
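
If you are on .NET 6 or later (an assumption, since the question predates it), `Enumerable.MinBy` can pick the earliest record of each group without sorting the group first. The following is only a sketch of that variation: the `GroupingSketch` class and `EarliestPerId` method names are made up for illustration, and `A` is the class from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class A
{
    public string id { get; set; }
    public string mystring { get; set; }
    public DateTime mydate { get; set; }
}

// Hypothetical helper, not part of the original answer.
public static class GroupingSketch
{
    public static List<A> EarliestPerId(IEnumerable<A> source)
    {
        return source
            .GroupBy(d => d.id)
            // MinBy (assumes .NET 6+) scans each group once instead of sorting it.
            .Select(g => g.MinBy(i => i.mydate))
            .ToList();
    }
}
```

Calling `GroupingSketch.EarliestPerId(AList)` should return the same records as the `OrderBy(...).First()` version; the difference only matters when individual ids have many duplicates.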

5 Comments

Reacher Gilt Over a year ago

I'm not sure I see where you're checking aListNew for membership, or how you're adding the new element to aListNew.

Reed Copsey Over a year ago

@ReacherGilt I'm doing it by grouping on the original list instead. I'm using GroupBy to get the items by ID up front, and pulling out the right one, then converting the results to a list. There's no need for that check with this implementation.

Reed Copsey Over a year ago

@ReacherGilt This just makes the entire operation far more efficient and easier to follow.

Reacher Gilt Over a year ago

Ah, I'd assumed that aListNew could have other members in it prior to this statement. That's clearly not possible from the question's sample.

downatone Over a year ago

@ReedCopsey - good stuff. This takes seconds to run (as opposed to 15 min!). Thanks to the others for the Dictionary suggestions as well, I can live with the current execution speed and find the code easy to read.

D Stanley · Answer · 2013-10-03 19:09:41Z

4

The fastest is probably going to be a straight foreach loop with a dictionary:

// id is a string in class A, so the dictionary is keyed by string
Dictionary<string, A> lookup = new Dictionary<string, A>();

foreach (var v in AList)
{
    if (!lookup.ContainsKey(v.id))
        // add it
        lookup[v.id] = v;
    else if (lookup[v.id].mydate > v.mydate)
        // replace it with the earlier record
        lookup[v.id] = v;
}

// convert to list
List<A> aListNew = lookup.Values.ToList();

A Linq GroupBy / First() query might be comparable if there are few collisions, but either one is going to be O(N) since it has to traverse the whole list.
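
As a rough sketch of the same single-pass idea, the loop can be wrapped in a helper that uses `TryGetValue`, which avoids the separate `ContainsKey` call and indexer read on every iteration. The `DictionarySketch` class and `EarliestPerId` method names are hypothetical, `A` is the class from the question, and `out var` assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class A
{
    public string id { get; set; }
    public string mystring { get; set; }
    public DateTime mydate { get; set; }
}

// Hypothetical helper, shown only to illustrate the dictionary approach.
public static class DictionarySketch
{
    public static List<A> EarliestPerId(IEnumerable<A> source)
    {
        var lookup = new Dictionary<string, A>();

        foreach (var item in source)
        {
            // Store the item if its id is new, or if it is earlier than
            // the record currently held for that id.
            if (!lookup.TryGetValue(item.id, out var current)
                || item.mydate < current.mydate)
            {
                lookup[item.id] = item;
            }
        }

        return lookup.Values.ToList();
    }
}
```

Each record is visited once, so the work grows linearly with the size of the list. When two records share an id and a date, this version keeps whichever one appears first in the list.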

tia · Answer · 2013-10-03 19:19:59Z

0

This should be the easiest, with no nested LINQ queries involved.

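var lookup = new Dictionary<string, A>();
// Iterate from latest to earliest so the earliest record for each id is written last and wins
foreach (var a in AList.OrderByDescending(d => d.mydate)) {
    lookup[a.id] = a;
}
var result = lookup.Values.ToList();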

Note that nested LINQ queries (a query run inside a loop or inside another query) will hurt performance, and that's why I chose not to use them. Remember that LINQ is there to make your task easier, not to make the execution faster.
