
I'm seeing strange ActiveRecord behavior (possibly in combination with the MySQL server). There is a huge table (hundreds of millions of rows). If I make this simple call:

SearchResult.where(id: ids[0..15000]).select('uid').to_a

...it will take less than 1 second.

Now if I make this call:

SearchResult.where(id: ids[0..16000]).select('uid').to_a

...it may take minutes!

At the same time, if I run the following two queries to get all 16k entries, it runs smoothly, in under 1 second total:

SearchResult.where(id: ids[0..15000]).select('uid').to_a +
  SearchResult.where(id: ids[15001..16000]).select('uid').to_a
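Since two smaller queries run fast where one large one stalls, a stopgap is to split the id list into slices below the threshold observed here (about 15,000) and concatenate the results. This is a minimal sketch with a hypothetical helper named fetch_in_slices; the block stands in for the real query (e.g. SearchResult.where(id: slice).pluck(:uid)), and the slice size is an assumption taken from the question, not a documented limit. It only covers direct queries, not .includes preloading, which builds its own IN list.

```ruby
# Hypothetical helper: run a query once per slice of ids and concatenate
# the results. The default slice size of 15_000 mirrors the threshold
# observed in the question; it is an assumption, not a MySQL limit.
def fetch_in_slices(ids, slice_size: 15_000)
  raise ArgumentError, "slice_size must be positive" unless slice_size.positive?

  # each_slice yields consecutive sub-arrays of at most slice_size elements;
  # flat_map joins the per-slice results into one flat Array.
  ids.each_slice(slice_size).flat_map { |slice| yield(slice) }
end

# Usage with ActiveRecord (not executed here):
#   uids = fetch_in_slices(ids) { |slice| SearchResult.where(id: slice).pluck(:uid) }

# Stand-in "query" so the sketch runs without a database:
# it records each slice size and returns fake uids.
slice_sizes = []
fake_uids = fetch_in_slices((1..16_001).to_a) do |slice|
  slice_sizes << slice.size
  slice.map { |id| "uid-#{id}" }
end

puts slice_sizes.inspect # prints [15000, 1001]
puts fake_uids.size      # prints 16001
```

Each slice is a separate round trip, so this trades one stalled query for a few fast ones; results come back in slice order.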

Moreover, if I call .to_sql on the long-running query and execute the resulting SQL directly via the mysql command-line client or MySQL Workbench, it takes less than a second.

I ran ANALYZE TABLE and then tried EXPLAIN both directly and through ActiveRecord, and the plans are identical. Both use an ALL scan (a full table scan), though. The only difference is that the query executes quickly when run directly but hangs when executed through ActiveRecord. Now the fun part: if I force the PRIMARY index, it starts working well through ActiveRecord, but that's not a solution because I can't use it with ActiveRecord association preloading.

This weird behavior really stymies me because I use a lot of ActiveRecord association preloading (e.g. .includes(searches: :search_results)), which sometimes ends up requesting tens of thousands of records (yes, I really do need all of them), and such a query stalls for several minutes. I would just force the index, but that does not resolve the problem in some places, creates more problems in others, and I wouldn't be able to use ActiveRecord preloading then. Any ideas?

  • What does the ids array consist of? It may make a lot more sense to rethink your query pattern to use a subquery or a range. An Array value will create an IN clause, a Range value will create a BETWEEN clause, and another ActiveRecord::Relation will create a subquery. Commented Mar 5, 2019 at 21:21
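To act on the Range suggestion, sorted ids can be collapsed into runs of consecutive values, each expressed as a Ruby Range. This is a minimal sketch with a hypothetical helper ids_to_ranges; whether it helps depends entirely on how contiguous the real ids are, which the question does not say. ActiveRecord renders a Range in where(id: ...) as BETWEEN, and an Array of Ranges is expected to become BETWEEN conditions joined by OR, but check the generated SQL with .to_sql on your Rails version.

```ruby
# Hypothetical helper: collapse integer ids into Ranges of consecutive values.
# Duplicates are dropped and the ids sorted before grouping.
def ids_to_ranges(ids)
  ids.uniq.sort
     .slice_when { |a, b| b != a + 1 } # start a new run at each gap
     .map { |run| run.first..run.last }
end

ranges = ids_to_ranges([7, 1, 2, 3, 5, 6, 10])
p ranges # prints [1..3, 5..7, 10..10]

# Usage with ActiveRecord (not executed here):
#   SearchResult.where(id: ids_to_ranges(ids)).pluck(:uid)
```

If the ids are scattered, this yields roughly one Range per id and gains nothing over a plain IN list.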

1 Answer


Maybe the bottleneck is not the SQL but the instantiation of SearchResult objects.
If you are only interested in the uid, try

SearchResult.where(id: ids[0..16000]).pluck(:uid)

This returns an array of uids, not an array of SearchResult objects.


7 Comments

@neolancer This is what I would have said too. There is no other reason it would take time: MySQL (or any other DB) does not care where it receives requests from (that is a connection-level detail, and the execution engine sits a lot deeper). So the instantiation must be the cause.
Also note that ids[0..16000] will create (a) a new Array with up to 16,001 elements and (b) an IN clause a mile long (e.g. IN(1,2,3,4,5,...,16000)), which is also horrific; it may be worth rethinking the query pattern.
Agreed, but that was not my point; I may have given a misleading example. pluck has the same performance problem. Even if I execute bare SQL via ActiveRecord::Base.connection.execute("select uid from search_results where id in (...)"), it is still slow for 16k ids and fast for 15k ids. It seems that at some point there are simply too many ids.
@engineersmnky Well, that's how ActiveRecord works when preloading associations, and I believe MySQL should have no problem processing long queries. I do agree that it produces a large query, though. Given a large enough max_allowed_packet setting on the server, that should not be a problem, unless I'm missing something else.
@neolancer My point was that there are other ways, as indicated in my comment under your question. IN will generally be slower than a known range using BETWEEN, or a subquery. That said, I have to agree with this answer that in this case the performance degradation is most likely caused by the object-creation overhead.
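The comments disagree on whether a 16,000-element IN list is a problem in itself. As a rough check on the max_allowed_packet argument, this sketch estimates the byte size of such a clause. It assumes ids are rendered as plain integers joined by ", " (close to, but not guaranteed to match, ActiveRecord's output) and uses hypothetical 9-digit ids, since the real values are unknown; in_clause_bytes is an illustrative helper, not an ActiveRecord API.

```ruby
# Rough estimate of the size of an "id IN (...)" clause.
# Assumption: ids are rendered as plain integers joined by ", ",
# which approximates, but may not exactly match, ActiveRecord's SQL.
def in_clause_bytes(ids)
  "id IN (#{ids.join(', ')})".bytesize
end

# Hypothetical sample: 16,001 nine-digit ids (the table has hundreds of
# millions of rows, so real ids may be about this long).
sample_ids = (100_000_000...100_016_001).to_a
bytes = in_clause_bytes(sample_ids)
puts bytes # prints 176017
```

At roughly 170 KB, this is far below MySQL 5.7's 4 MB default for max_allowed_packet, so the packet limit alone seems unlikely to explain a jump between 15k and 16k ids.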
