
Problem: I have millions of rows from a database that need to be processed.

I need to implement a method that will return a "stream"(?) of database rows. I don't want to load all of them into memory at once.

I was thinking about returning a lazy IEnumerable<Record> and using yield. The method would handle loading consecutive records using SqlDataReader.

But what will happen when a client calls .Count() on my IEnumerable? Counting all the records would mean fetching them all.

Is there any good, modern way to return a stream of objects without storing all of them in memory, so they can be processed one by one? My method should return a stream of records.

It seems like Reactive Extensions might solve the problem for me but I have never used it.

Any ideas?

Thanks

2 Answers


First, why reinvent the wheel? Entity Framework makes it easier to do this kind of thing and adds all of the abstraction for you. The DbSet<TEntity> on the DbContext object implements IQueryable<TEntity> and IEnumerable<TEntity>, so you can (see the sketch after this list):

  • Execute a Count() (with or without a lambda filter argument) via an extension method when you need to figure out the number of records (or some other aggregate function).
  • Loop through them as an IEnumerable, which opens a connection and reads one record from it each time MoveNext is called.
  • If you do want to load everything into memory at once (I understand from your description that you don't), call the ToList or ToArray extension method.
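A minimal sketch of that usage, assuming a DbContext subclass named MyContext with a DbSet<Record> property called Records (the class, property, and column names are placeholders for your own model):

    using System;
    using System.Data.Entity; // EF 6; EF Core uses Microsoft.EntityFrameworkCore
    using System.Linq;

    public class Record
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public class MyContext : DbContext
    {
        public DbSet<Record> Records { get; set; }
    }

    public static class EfExample
    {
        public static void Run()
        {
            using (var context = new MyContext())
            {
                // Both counts are translated to SQL and executed on the server.
                int total    = context.Records.Count();
                int nonEmpty = context.Records.Count(r => r.Name != null);

                // Enumerating the DbSet materializes one entity per MoveNext call;
                // call ToList()/ToArray() only if you want everything in memory at once.
                foreach (var record in context.Records)
                {
                    Console.WriteLine(record.Id);
                }
            }
        }
    }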

If you insist on using ADO.NET and doing this manually (I understand that with legacy code there is not always a choice to use EF), then opening a data reader from the connection is the best approach. It will read the next record with each corresponding call to the Read() method, and this is the least expensive way to read records from the DB.
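A rough sketch of that pattern, assuming a Record class of your own (the query, column names, and mapping below are placeholders):

    using System.Collections.Generic;
    using System.Data.SqlClient;

    public class Record
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public static class RecordReader
    {
        public static IEnumerable<Record> StreamRecords(string connectionString)
        {
            // The using blocks sit inside the iterator, so the connection and
            // reader are disposed when the caller finishes (or abandons) the
            // enumeration.
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("SELECT Id, Name FROM dbo.Records", connection))
            {
                connection.Open();
                using (var reader = command.ExecuteReader())
                {
                    // Each Read() fetches the next row; yield return hands it to
                    // the caller without buffering the whole result set.
                    while (reader.Read())
                    {
                        yield return new Record
                        {
                            Id = reader.GetInt32(0),
                            Name = reader.GetString(1)
                        };
                    }
                }
            }
        }
    }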

If you want a Count, then I suggest you write a separate SQL query which returns a count executed on your database server, akin to

SELECT COUNT(field) FROM table 

as this is best practice. Do not iterate over all of your records from a reader with some custom workaround to compute the count in memory; that would be a waste of resources, not to mention complex code with no benefit.
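For example, in plain ADO.NET (the table name and connection string are placeholders), the count can be fetched with ExecuteScalar so the counting happens entirely on the database server:

    using System.Data.SqlClient;

    public static class RecordCounter
    {
        public static int CountRecords(string connectionString)
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("SELECT COUNT(*) FROM dbo.Records", connection))
            {
                connection.Open();
                // ExecuteScalar returns the single value produced by the query.
                return (int)command.ExecuteScalar();
            }
        }
    }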


2 Comments

I could be wrong, but I have read that it was only in EF 6 that they finally added the ability to open an underlying data reader by default. I tested EF 5 against a data reader, and the data reader gave about a 33% performance gain when pulling a page of 50k rows out of a table of 1 million. EF 6 takes almost exactly the same amount of time as the data reader.
@NightOwl888 - Interesting. We still have a legacy app that uses EF5, but other than that I do not do much with this version. I tried looking, but all I could find was that the initial load time of the framework improved between versions. If you find a link, I would be interested to learn more about it. It might also be due to a more efficient query being generated by EF6?

For the count, query the database and return the result to the user.

On the other hand, you only need to implement Count for ICollection; IEnumerable does not require it. Just return an IEnumerable for iterating over the records.

Just make sure that you handle the connection to the database correctly, e.g. by wrapping the connection and reader in using blocks inside the iterator so they are disposed when the caller finishes (or abandons) the enumeration.

