0

Let's say I have a large query (for the purposes of this exercise say it returns 1M records) in MongoDB, like:

users = Users.where(:last_name => 'Smith')

If I loop through this result, working with each member, with something like:

users.each do |user|
  # Some manipulation to "user"
  # Some calculation for "user"
  ...
  # Saving "user"
end

I'll often get a Mongo cursor timeout, since the loop keeps the reserved database cursor open longer than the default timeout allows. I know I can extend the cursor timeout, or even turn it off, but this isn't always the most efficient method. So, one way I get around this is to change the code to:

users = Users.where(:last_name => 'Smith')
user_array = []
users.each do |u|
  user_array << u
end

THEN, I can loop through user_array (since it's a Ruby array), doing manipulations and calculations, without worrying about a MongoDB timeout.

This works fine, but there has to be a better way--does anyone have a suggestion?

2
  • I'm going to feel really silly if there is. Testing now @tokland Commented Jun 18, 2012 at 21:09
  • Yes, Users.where(:last_name => 'Smith').to_a works. Thank you +1... With Sergio's comment below, I'll be implementing his batch approach plus Ruby's native to_a instead of a manual loop. Thank you Commented Jun 18, 2012 at 21:15

2 Answers

3

If your result set is so large that it causes cursor timeouts, it's not a good idea to load it entirely into RAM.

A common approach is to process records in batches.

  1. Get 1000 users (sorted by _id).
  2. Process them.
  3. Get another batch of 1000 users where _id is greater than the _id of the last processed user.
  4. Repeat until done.
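
The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: `each_in_batches`, `fetch_batch`, and the in-memory `RECORDS` stand-in are hypothetical names introduced here so the example runs without a database.

```ruby
# Batching by "greater than the last processed _id".
# fetch_batch is a hypothetical callable: given the last processed id
# (nil for the first batch) and a batch size, it returns up to that many
# records sorted ascending by _id.
def each_in_batches(fetch_batch, batch_size = 1000)
  last_id = nil
  loop do
    batch = fetch_batch.call(last_id, batch_size)
    break if batch.empty?                  # nothing left: done
    batch.each { |record| yield record }   # process this batch
    last_id = batch.last[:_id]             # remember where we stopped
  end
end

# In-memory stand-in for the collection (assumption, for illustration only).
RECORDS = (1..2500).map { |i| { :_id => i, :last_name => 'Smith' } }

in_memory_fetch = lambda do |last_id, size|
  remaining = last_id.nil? ? RECORDS : RECORDS.select { |r| r[:_id] > last_id }
  remaining.sort_by { |r| r[:_id] }.first(size)
end

processed = []
each_in_batches(in_memory_fetch, 1000) { |user| processed << user[:_id] }
```

Against a real collection, `fetch_batch` would issue one short query per batch, so no cursor stays open while a batch is being processed. With Mongoid that query might look something like `Users.where(:last_name => 'Smith').where(:_id.gt => last_id).asc(:_id).limit(size).to_a`, but treat that as an assumption and check it against the Mongoid version you are using.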

2 Comments

Good approach. And "greater than _id" supports the standard BSON format for _id?
Yes, ObjectIds sort just fine.
0

For a long-running task, consider using rails runner.

runner runs Ruby code in the context of Rails non-interactively. For instance:

$ rails runner "Model.long_running_method"

For further details, see:

http://guides.rubyonrails.org/command_line.html

1 Comment

Interesting. I didn't know about rails runner, and I'm sure I can find a use for it somewhere, but I'm not sure it will resolve any MongoDB-specific timeouts.
