4

I'm trying to fetch every record from a table in a MS SQL database with about 20 million entries via an entity data model. My initial idea was to retrieve the data in chunks, like so:

public IEnumerable<IEnumerable<device>> GetDevicesInChunks(int chunkSize)
{
    using (var db = new AccountsEntities())
    {
        for (int i = 0; i < db.devices.Count(); i += chunkSize)
        {
            yield return db.devices.Skip(i).Take(chunkSize);
        }
    }
}

However, it appears that I must call OrderBy before I call Skip, judging by the exception thrown when I use the above method:

The method 'Skip' is only supported for sorted input in LINQ to Entities. The method 
'OrderBy' must be called before the method 'Skip'.

I'm sure calling OrderBy on every subset of records I retrieve will be costly since the devices are in no particular order - I feel like I'm walking down the wrong path here.

What's the best approach to handling large SQL queries via LINQ?

7
  • Can you not use Where and filter by a primary key instead of Skip? Or simply OrderBy the primary key? Commented Mar 6, 2013 at 23:53
  • Is there any particular reason why you must do this in chunks? Commented Mar 6, 2013 at 23:56
  • 2
    @Tory I think that loading 20 million entities at once would be reason enough.. Commented Mar 7, 2013 at 0:04
  • What kind of data is it? Why do you need to load all of it? Commented Mar 7, 2013 at 0:05
  • 1
    OrderBy the clustered index, if one is present on the table; if not, OrderBy the Primary Key ... if you must OrderBy. Seems strange that it's required. Commented Mar 7, 2013 at 0:15
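
The suggestion in the comments above, filtering by the primary key with Where instead of using Skip, might look roughly like the following. This is only a sketch under assumptions: the Device class, its Id property, and the KeysetChunker helper are hypothetical stand-ins, the key is assumed to be unique, and the code runs against any IQueryable (here an in-memory one) rather than a real Entity Framework context.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the "device" entity; only the key matters here.
public class Device
{
    public int Id { get; set; }
}

public static class KeysetChunker
{
    // Yields chunks ordered by Id, resuming each chunk after the last key seen
    // instead of skipping an ever-growing number of rows.
    public static IEnumerable<List<Device>> InChunks(IQueryable<Device> source, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        int? lastId = null;
        while (true)
        {
            IQueryable<Device> query = source;
            if (lastId.HasValue)
            {
                int after = lastId.Value; // copy to a local so the lambda captures a plain int
                query = query.Where(d => d.Id > after);
            }

            // Materialize one chunk; an empty chunk means the table is exhausted.
            List<Device> chunk = query.OrderBy(d => d.Id).Take(chunkSize).ToList();
            if (chunk.Count == 0) yield break;

            yield return chunk;
            lastId = chunk[chunk.Count - 1].Id;
        }
    }
}
```

A call such as KeysetChunker.InChunks(devices.AsQueryable(), 1000) would walk the rows in key order, one list per chunk. Whether this is cheaper than Skip on a real table depends on the key being indexed.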

2 Answers

4

The error happens because Skip has to run after OrderBy; LINQ to Entities does not allow Skip on unsorted input. Skip needs to know which row counts as the first one, and that depends on the order of the select: without an explicit order there is no way to tell whether "the first N rows" are counted from the beginning to the end or from the end to the beginning.
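
To illustrate why the sort direction matters for Skip and Take, here is a small sketch using LINQ to Objects (not LINQ to Entities). SkipOrderDemo and Page are hypothetical names introduced only for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SkipOrderDemo
{
    // Returns the page at pageIndex after sorting the ids ascending or descending.
    public static List<int> Page(IEnumerable<int> ids, int pageIndex, int pageSize, bool descending)
    {
        IOrderedEnumerable<int> ordered = descending
            ? ids.OrderByDescending(x => x)
            : ids.OrderBy(x => x);

        // Skip and Take are only meaningful relative to the order chosen above.
        return ordered.Skip(pageIndex * pageSize).Take(pageSize).ToList();
    }
}
```

For the ids 3, 1, 5, 2, 4, the first page of size 2 is 1, 2 when sorted ascending and 5, 4 when sorted descending, so the same Skip and Take values select different rows depending on the order.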

You can read more here

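So your code would look like this:

public IEnumerable<IEnumerable<device>> GetDevicesInChunks(int chunkSize)
{
    using (var db = new AccountsEntities())
    {
        // Count once; Count() in the loop condition would run a COUNT query on every iteration.
        int total = db.devices.Count();
        for (int i = 0; i < total; i += chunkSize)
        {
            // Skip requires sorted input, so order by a key column first.
            // "id" is assumed to be the key property of device.
            yield return db.devices.OrderByDescending(y => y.id).Skip(i).Take(chunkSize);
        }
    }
}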

If you think that is a heavy query, remember that Entity Framework can cache queries and data. If you don't like the SQL that this method generates, you can run the query manually.

From personal experience: I use this with a database of 2 billion rows and it was not slow. But my table has an index and I always use the cache.

For more: you can use stored procedures, if you prefer. See more here

1 Comment

Very comprehensive answer, thanks for all the info and explanation! Just to make sure I'm understanding correctly, the OrderByDescending expression should be y => y.id not y => y, right?
-1

You may not have to do an OrderBy - you simply may need to populate the list before doing Skip and Take. I think you can use a .Count() to populate the query, after which you can use Skip and Take.

2 Comments

That will be very heavy. Using Count or ToList, as you said, will populate the query, but it will retrieve all 20 million records.
True ... but it won't push the entire set out of the server just to do a count - it'll populate it in MSSQL.
