Best way to convert a Mongo query to a Ruby array?

Question

Let's say I have a large query in MongoDB (for the purposes of this exercise, say it returns 1M records), like:

users = Users.where(:last_name => 'Smith')

If I loop through this result, working with each member, with something like:

users.each do |user|
  # Some manipulation to "user"
  # Some calculation for "user"
  ...
  # Saving "user"
end

I'll often get a Mongo cursor timeout, because the server-side cursor sits idle between fetches longer than the default timeout allows while I work on each record. I know I can extend the cursor timeout, or even turn it off, but that isn't always the most efficient approach. So one way I get around this is to change the code to:

users = Users.where(:last_name => 'Smith')
user_array = []
users.each do |u|
  user_array << u
end

Then I can loop through user_array (since it's a plain Ruby array), doing manipulations and calculations, without worrying about a MongoDB timeout.

This works fine, but there has to be a better way. Does anyone have a suggestion?

I'm going to feel really silly if there is. Testing now @tokland
– jbnunn, Commented Jun 18, 2012 at 21:09
Yes, Users.where(:last_name => 'Smith').to_a works. Thank you, +1. With Sergio's answer below, I'll be implementing his batch approach plus Ruby's native to_a instead of a manual loop. Thank you
– jbnunn, Commented Jun 18, 2012 at 21:15
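A minimal sketch of why to_a can replace the manual loop from the question: a Mongoid criteria object is Enumerable (it responds to each), and Enumerable#to_a simply calls each and collects every yielded element. The FakeCursor class below is a hypothetical stand-in for a query result so the example runs without a database. Note that to_a still loads every record into memory, which is the concern raised in the accepted answer.

```ruby
# Hypothetical stand-in for a query result: any class that defines `each`
# and includes Enumerable gets `to_a` (and map, select, ...) for free.
class FakeCursor
  include Enumerable

  def initialize(records)
    @records = records
  end

  # Yield each record in turn, as a database cursor would.
  def each
    return to_enum(:each) unless block_given?

    @records.each { |record| yield record }
    self
  end
end

users = FakeCursor.new([{ last_name: 'Smith', first_name: 'Ann' },
                        { last_name: 'Smith', first_name: 'Bob' }])

# The manual loop from the question...
user_array = []
users.each { |u| user_array << u }

# ...collects the same elements as a single call to Enumerable#to_a.
same_array = users.to_a

puts same_array == user_array # prints "true"
```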

Sergio Tulentsev · Accepted Answer · 2012-06-18 21:08:46Z

3

If your result set is so large that it causes cursor timeouts, it's not a good idea to load it entirely into RAM.

A common approach is to process records in batches.

1. Get 1000 users (sorted by _id).
2. Process them.
3. Get another batch of 1000 users whose _id is greater than the _id of the last processed user.
4. Repeat until done.
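The batching steps above can be sketched in Ruby as follows. To keep it runnable without a database, the hypothetical fetch_batch lambda and the in-memory COLLECTION stand in for the real query, and integer ids stand in for ObjectIds. With Mongoid the fetcher might look roughly like Users.where(:last_name => 'Smith', :_id.gt => last_id).asc(:_id).limit(1000).to_a (an untested assumption; the first batch would omit the _id condition). Draining each batch with to_a before processing it means no cursor is left idle during the slow per-record work.

```ruby
# Keyset pagination: walk the records in _id order one batch at a time,
# resuming after the last _id seen instead of holding one long-lived cursor.
def each_in_batches(batch_size, fetch_batch)
  last_id = nil
  loop do
    batch = fetch_batch.call(last_id, batch_size)
    break if batch.empty?

    batch.each { |record| yield record }
    last_id = batch.last[:_id]
    break if batch.size < batch_size # a short batch means we reached the end
  end
end

# Hypothetical in-memory "collection" so the sketch runs without MongoDB.
COLLECTION = (1..2500).map { |i| { _id: i, last_name: 'Smith' } }

# Hypothetical fetcher: up to `limit` records with _id > after_id, sorted by _id.
# after_id is nil for the first batch, meaning "start from the beginning".
fetch_batch = lambda do |after_id, limit|
  COLLECTION
    .select { |r| after_id.nil? || r[:_id] > after_id }
    .sort_by { |r| r[:_id] }
    .first(limit)
end

processed = []
each_in_batches(1000, fetch_batch) { |user| processed << user[:_id] }
puts processed.size # prints 2500
```

Here the 2,500 records are fetched as batches of 1000, 1000 and 500; each record is processed exactly once and in _id order.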



2 Comments

jbnunn Over a year ago

Good approach. And "greater than _id" supports the standard BSON format for _id?

Sergio Tulentsev Over a year ago

Yes, ObjectIds sort just fine.

Anil · Accepted Answer · 2012-06-18 21:11:07Z

0

For a long-running task, consider using rails runner.

runner runs Ruby code in the context of Rails non-interactively. For instance:

$ rails runner "Model.long_running_method"

For further details, see:

http://guides.rubyonrails.org/command_line.html


1 Comment

jbnunn Over a year ago

Interesting. I didn't know about rails runner, I'm sure I can find a use for that somewhere, but not sure it will resolve any MongoDB-specific timeouts.
