C# - Linq optimize code with List and Where clause

Question

I have a following code:

        var tempResults = new Dictionary<Record, List<Record>>();            
        errors = new List<Record>();
        foreach (Record record in diag)
        {
            var code = Convert.ToInt16(Regex.Split(record.Line, @"\s{1,}")[4], 16);                
            var cond = codes.Where(x => x.Value == code && x.Active).FirstOrDefault();
            if (cond == null)
            {
                errors.Add(record);
                continue;
            }

            var min = record.Datetime.AddSeconds(downDiff);
            var max = record.Datetime.AddSeconds(upDiff);

            //PROBLEM PART - It takes around 4,5ms
            var possibleResults = cas.Where(x => x.Datetime >= min && x.Datetime <= max).ToList();

            if (possibleResults.Count == 0)
                errors.Add(record);
            else
            {                    
                if (!CompareCond(record, possibleResults, cond, ref tempResults, false))
                {                        
                        errors.Add(record);
                }
            }
        }

variable diag is List of Record

variable cas is List of Record with around 50k items.

The problem is that it's too slowly. The part with the first where clause needs around 4,6599ms, e.g. for 3000 records in List diag it makes 3000*4,6599 = 14 seconds. Is there any option to optimize the code?

You could prefilter codes outside the forEach loop to only contain items that are active. This would reduce the number you'd have to search through inside your loop. — Marco
– Marco, Commented May 4, 2018 at 8:56
Yes, that's true, thank you for the notice. I have already done it. Now, it is a bit better but the main problem is still the where clause with time. — Filip Procházka
– Filip Procházka, Commented May 4, 2018 at 9:19

Evk · Accepted Answer · 2018-05-04 12:37:27Z

7

You can speed up that specific statement you emphasized

cas.Where(x => x.Datetime >= min && x.Datetime <= max).ToList();

With binary search over cas list. First pre-sort cas by Datetime:

cas.Sort((a,b) => a.Datetime.CompareTo(b.Datetime));

Then create comparer for Record which will compare only Datetime properties (implementation assumes there are no null records in the list):

private class RecordDateComparer : IComparer<Record> {
    public int Compare(Record x, Record y) {
        return x.Datetime.CompareTo(y.Datetime);
    }
}

Then you can translate your Where clause like this:

var index = cas.BinarySearch(new Record { Datetime = min }, new RecordDateComparer());
if (index < 0)
    index = ~index;
var possibleResults = new List<Record>();    
// go backwards, for duplicates            
for (int i = index - 1; i >= 0; i--) {
    var res = cas[i];
    if (res.Datetime <= max && res.Datetime >= min)
        possibleResults.Add(res);
    else break;
}
// go forward until item bigger than max is found
for (int i = index; i < cas.Count; i++) {
    var res = cas[i];
    if (res.Datetime <= max &&res.Datetime >= min)
        possibleResults.Add(res);
    else break;
}

Idea is to find first record with Datetime equal or greater to your min, with BinarySearch. If exact match is found - it returns index of matched element. If not found - it returns negative value, which can be translated to the index of first element greater than target with ~index operation.

When we found that element, we can just go forward the list and grab items until we find item with Datetime greater than max (because list is sorted). We need to go a little backwards also, because if there are duplicates - binary search will not necessary return the first one, so we need to go backwards for potential duplicates.

Additional improvements might include:

Putting active codes in a Dictionary (keyed by Value) outside of for loop, and thus replacing codes Where search with Dictionary.ContainsKey.
As suggested in comments by @Digitalsa1nt - parallelize foreach loop, using Parallel.For, PLINQ, or any similar techniques. It's a perfect case for parallelization, because loop contains only CPU bound work. You need to make a little adjustments to make it thread-safe of course, such as using thread-safe collection for errors (or locking around adding to it).

edited May 4, 2018 at 12:37

answered May 4, 2018 at 9:27

Evk

102k8 gold badges156 silver badges204 bronze badges

Sign up to request clarification or add additional context in comments.

8 Comments

Filip Procházka Over a year ago

Thank you! It's great. 4,5 ms was decreased to around 0,5ms and it's much better. This is the first time I have heard of BinarySearch, good to know it. :)

Evk Over a year ago

@FilipProcházka so that's fast enough or you want even faster? :)

Filip Procházka Over a year ago

It's enough. :) But if there is another option to optimize it and you have time to describe/explain how to do it I like to learn them. :) Or just tell me keywords what I should search for and I'll try to learn/do them myself.

Evk Over a year ago

@FilipProcházka I have a feeling it can be faster, but cannot provide more guidance with just code, without the goal of that code. But you can move active codes out of the loop and put into HashSet. If you have a lot of codes - it might speed up things a bit. But otherwise I have no more ideas.

Evk Over a year ago

@Digitalsa1nt yes actually this is a perfect case for parallization. Almost no mutable shared state, CPU bound work.

|

Fenrir · Accepted Answer · 2018-05-04 08:41:15Z

1

Try adding AsNoTracking in the list

The AsNoTracking method can save both execution times and memory usage. Applying this option really becomes important when we retrieve a large amount of data from the database.

var possibleResults = cas.Where(x => x.Datetime >= min && x.Datetime <= max).AsNoTracking().ToList(); //around 4,6599ms

answered May 4, 2018 at 8:41

Fenrir

1381 silver badge11 bronze badges

1 Comment

Filip Procházka Over a year ago

Unfortunately, It has no effect.

RichardMc · Accepted Answer · 2018-05-04 09:27:01Z

There a few improvements you can make here. It might only be a minor performance increase but you should try using groupby instead of where in this circumstance.

So instead you should have something like this:

cas.GroupBy(x => x.DateTime >= min && x.DateTime <= max).Select(h => h.Key == true);

This ussually works for seaching through lists for distinct values, but in you case I'm unsure if it will provide you any benefit when using a clause.

Also a few other things you can do throughout you code:

Avoid using ToList when possible and stick to IEnumerable. ToList performs an eager evaluation which is probably causing a lot of slowdown in your query.
use .Any() instead of Count when checking if values exist (This only applies if the list is IEnumerable)

Collectives™ on Stack Overflow

C# - Linq optimize code with List and Where clause

3 Answers 3

8 Comments

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

8 Comments

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related