7

I have a linq statement like this:

var records = from line in myfile 
              let data = line.Split(',')
              select new { a=int.Parse(data[0]), b=int.Parse(data[1]) };
var average = records.Sum(r => r.b)!=0?records.Sum(r => r.a) / records.Sum(r => r.b):0;

My question is: How many times records.Sum(r => r.b) is computed in the last line? Does LINQ loop over all the records each time when it needs to compute a sum (in this case, 3 Sum() so loop 3 times)? Or does it smartly loop over all the records just once andcompute all the sums?


Edit 1:

  1. I wonder if there is any way to improve it by only going through all the records just once (as we only need to do it in a single loop when use a plain for loop)?

  2. And there is really no need to load everything into memory before we can do the sum and average. Surely we can sum each element while loading it from the file. Is there any way to reduce the memory consumption as well?


Edit 2

Just to clarify a bit, I didn't use LINQ before I ended up like above. Using plain while/for loop can achieve all the performance requirements. But I then tried to improve the readability and also reduce the lines of code by using LINQ. It seems that we can't get both at the same time.

2
  • 1
    @AndyWiesendanger, the query doesn't seem to be executed against a database... Commented Dec 30, 2015 at 14:48
  • 2
    It will execute three times (or once, if the sum == 0), regardless of whether it's against a database or not. Commented Dec 30, 2015 at 14:49

6 Answers 6

9

Twice, write it like this and it will be once:

var sum = records.Sum(r => r.b);

var avarage = sum != 0 ? records.Sum(r => r.a)/sum: 0;
Sign up to request clarification or add additional context in comments.

2 Comments

Your are iterating twice, not once. Use Aggregate.
Hey, twice is still better than three times, amirite?
6

There are plenty of answers, but none that wrap all of your questions up.

How many times records.Sum(r => r.b) is computed in the last line?

Three times.

Does LINQ loop over all the records each time when it needs to compute a sum (in this case, 3 Sum() so loop 3 times)?

Yes.

Or does it smartly loop over all the records just once andcompute all the sums?

No.

I wonder if there is any way to improve it by only going through all the records just once (as we only need to do it in a single loop when use a plain for loop)?

You can do that, but it requires you to eagerly-load all the data which contradicts your next question.

And there is really no need to load everything into memory before we can do the sum and average. Surely we can sum each element while loading it from the file. Is there any way to reduce the memory consumption as well?

That's correct. In your original post you have a variable called myFile and you're iterating over it and putting it into a local variable called line (read: basically a foreach). Since you didn't show how you got your myFile data, I'm assuming that you're eagerly loading all the data.

Here's a quick example of lazy-loading your data:

public IEnumerable<string> GetData()
{
    using (var fileStream = File.OpenRead(@"C:\Temp\MyData.txt"))
    {
        using (var streamReader = new StreamReader(fileStream))
        {
            string line;
            while ((line = streamReader.ReadLine()) != null)
            {                       
                yield return line;
            }
        }
    }
}

public void CalculateSumAndAverage()
{
    var sumA = 0;
    var sumB = 0;
    var average = 0;

    foreach (var line in GetData())
    {
        var split = line.Split(',');
        var a = Convert.ToInt32(split[0]);
        var b = Convert.ToInt32(split[1]);

        sumA += a;
        sumB += b;
    }

    // I'm not a big fan of ternary operators,
    // but feel free to convert this if you so desire.
    if (sumB != 0)
    {
        average = sumA / sumB;
    }
    else 
    {
        // This else clause is redundant, but I converted it from a ternary operator.
        average = 0;
    }
}

9 Comments

@james Purely a business logic error rather than a syntactical error. However, I will correct it.
Thanks, @Cameron. this is actually what I tried in the first place.. I then tried to improve the readability and also reduce the lines of code by using LINQ. It seems that we can't get both at the same time.
@james I'd implore you to embrace simplicity over the reduction of lines of code. (And personally, I think this is more readable.) Just because you can wrap things into 3 lines of Linq doesn't always make it the best solution.
agreed. I just feel that this should be a common problem and given that LINQ is such a mature tool, that's why I expect the performance improvements can be done using linq as well. I think Pedro Mora 's solution is very close.
I'll accept this answer as this answers most of my questions. For people who are interested for a linq solution while still keep the performance the same, please refer to Pedro Mora 's solution .
|
4

Three times, and what you should use here is Aggregate, not Sum.

// do your original selection
var records = from line in myfile 
              let data = line.Split(',')
              select new { a=int.Parse(data[0]), b=int.Parse(data[1]) };
// aggregate them into one record
var sumRec = records.Aggregate((runningSum, next) =>
          { 
            runningSum.a += next.a;
            runningSum.b += next.b;                
            return runningSum;
          });
// Calculate your average
var average = sumRec.b != 0 ? sumRec.a / sumRec.b : 0;

6 Comments

Thanks, @flindeberg. Isn't runningSum read only? How can it be assigned?
@james is correct. Properties of anonymous types are read-only. You'd either have to use local variables or create a strong type. Otherwise, this is the best answer.
This solution does look quite efficient, but I wonder if it loads everything into records before doing Aggregate? i.e. there is really no need to load all the records into memory before we can do the sum and average.
@james It depends... Refer to my answer.
@Cameron Totally missed the read-only effect of anonymous types, freehanded the answer =/
|
2

Each call to the Sum method iterate through all lines in myfile. To improve performance write:

var records = (from line in myfile 
          let data = line.Split(',')
          select new { a=int.Parse(data[0]), b=int.Parse(data[1]) }).ToList();

so it would create the list with all elements (with "a" and "b" properties) and then each call to the Sum method will iterate through this list without splitting and parsing data. Of course you can go further and remember the result of the Sum method in some temporary variable.

3 Comments

"To improve performance write:" - If this is executing against a database this will vastly decrease performance. Depending on the implementation of the enumerable, it may be slower while entirely in memory, too. Adding a ToList means you load all the rows into memory, when you only care about the Sum. And further, I would assume that Sum is more efficient when running in the database than it is against a List<>
But this query looks like it is against file! So it will improve performance as there is no possibility to invoke T-SQL query against file content.
You're absolutely right - my bad, I've removed the downvote. This will still evaluate the Sum twice, but it's definitely an improvement from the original.
1

james, I'm not an expert at all nut this is my idea. I think it may be reduced to 1. Maybe there is a little bit more code. records is still an IEnumerable of AnonymousType {int a,int b}.

*Dynamic was a quick way to solve it. You should write an structure for it.

int sum_a = 0,sum_b = 0;
Func<string[], dynamic> b = (string[] data) => { 
    sum_a += int.Parse(data[0]); 
    sum_b += int.Parse(data[1]);
    return new {a = int.Parse(data[0]),b = int.Parse(data[0]) }; 
};
var records = from line in fileLines 
              let data = line.Split(',')
              let result = b(data)
              select new { a = (int)result.a, b = (int)result.b };
var average = sum_b != 0 ? sum_a / sum_b : 0;

For other structures it's simple.

public struct Int_Int //May be a class or interface for mapping
{
    public int a = 0, b = 0;        
}

Then

int sum_a = 0,sum_b = 0;    
Func<string[], Int_Int> b = (string[] data) => { 
    sum_a += int.Parse(data[0]); 
    sum_b += int.Parse(data[1]);
    return new Int_Int() { a = int.Parse(data[0]), b = int.Parse(data[0]) }; 
};
var records = from line in fileLines
              let data = line.Split(',')
              select b(data);
var average = sum_b != 0 ? sum_a / sum_b : 0;

2 Comments

many thanks, @Pedro. I have up voted your answer although it seems that not so many other people up voted it. I think this is the most efficient solution so far, compared to all the other answers. It does not require everything loaded into memory before doing the sum and it only loop over all the records once!
I also recommend you to create write the lambda as a function. It doesn't require to create it every time have to do this job, mor better yet, a private function inside the class
0

SUM gets all records any time that you call it, I recommend you use ToList() --> Do you ToList()?

var records = from line in myfile 
              let data = line.Split(',')
              select new { a=int.Parse(data[0]), b=int.Parse(data[1]) }.ToList();

var sumb = records.Sum(r => r.b);
var average = sumb !=0?records.Sum(r => r.a) / sumb :0;

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.