Nested query performance

Question

I have two queries below. The first one has a nested select. The second one makes use of a group by clause.

select
  posts.*,
  (select count(*) from comments where comments.post_id = posts.id and comments.is_approved = 1) as comments_count
from
  posts 

select
  posts.*,
  count(comments.id) comments_count
from
  posts
  left join comments on
     comments.post_id = posts.id 
group by
  posts.*

From my understanding, the first query is worse because it has to run a select for each record in posts, whereas the second query does not.

Is this true or false?
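Before comparing speed, it is worth checking that the two queries count the same thing. The first counts only approved comments; the second, as written, counts every comment. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The rows are made up, and `GROUP BY p.id` replaces the invalid `GROUP BY posts.*`.

```python
import sqlite3

# Hypothetical rows: post 1 has 2 approved + 1 unapproved comment,
# post 2 has 1 unapproved comment, post 3 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL,
        is_approved INTEGER NOT NULL
    );
    INSERT INTO posts (id) VALUES (1), (2), (3);
    INSERT INTO comments (post_id, is_approved) VALUES (1, 1), (1, 1), (1, 0), (2, 0);
""")

def counts(sql):
    """Run a query returning (post_id, count) rows and return them as a dict."""
    return dict(conn.execute(sql).fetchall())

# Query 1 from the question: counts approved comments only.
nested = counts("""
    SELECT p.id,
           (SELECT COUNT(*) FROM comments c
             WHERE c.post_id = p.id AND c.is_approved = 1)
    FROM posts p
    ORDER BY p.id
""")

# Query 2 from the question, grouped by p.id: no is_approved filter,
# so every comment is counted.
joined_as_written = counts("""
    SELECT p.id, COUNT(c.id)
    FROM posts p
    LEFT JOIN comments c ON c.post_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""")

# Putting the filter in the ON clause (rather than WHERE) keeps posts
# that have no approved comments, with a count of 0.
joined_filtered = counts("""
    SELECT p.id, COUNT(c.id)
    FROM posts p
    LEFT JOIN comments c ON c.post_id = p.id AND c.is_approved = 1
    GROUP BY p.id
    ORDER BY p.id
""")

print(nested)             # {1: 2, 2: 0, 3: 0}
print(joined_as_written)  # {1: 3, 2: 1, 3: 0}
print(joined_filtered)    # {1: 2, 2: 0, 3: 0}
```

Timings are only meaningful between versions that return the same counts. Here that means the first query and the ON-filtered join.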

The second won't work at all; you need to join comments to posts. Also, I've never grouped that way, so I can't be sure, but even if it is legal syntax, your GROUP BY would be just as effective, and possibly faster, if you just did GROUP BY posts.id. Once properly written, I would expect the latter to be faster.
– Uueerdo, Commented Jul 7, 2015 at 17:20
Thanks, and sorry I left out the LEFT JOIN; I've edited the question to include it.
– Ryan-Neal Mes, Commented Jul 7, 2015 at 17:30

Gordon Linoff · Accepted Answer · 2015-07-07 14:46:20Z

1

As with all performance questions, you should test the performance on your system with your data.

However, I would expect the first to perform better, with the right indexes. The right index for:

select p.*,
       (select count(*)
        from comments c
        where c.post_id = p.id and c.is_approved = 1
       ) as comments_count
from posts p

is comments(post_id, is_approved).
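One way to confirm that the correlated subquery actually uses this index is to inspect the query plan. Below is a small sketch in SQLite, assumed here as a stand-in for MySQL's EXPLAIN. The index name `idx_comments_post_approved` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL,
        is_approved INTEGER NOT NULL
    );
    -- The composite index suggested above; the name is hypothetical.
    CREATE INDEX idx_comments_post_approved ON comments (post_id, is_approved);
""")

query = """
    SELECT p.id,
           (SELECT COUNT(*) FROM comments c
             WHERE c.post_id = p.id AND c.is_approved = 1) AS comments_count
    FROM posts p
"""

# Each EXPLAIN QUERY PLAN row ends with a human-readable "detail" column.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)
```

The subquery step should show a search on `comments` using the index for both `post_id` and `is_approved`. Each post then costs one index lookup rather than a scan of all comments. The exact wording of the plan varies between SQLite versions.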

MySQL implements GROUP BY by doing a filesort. This version avoids a filesort over all the data, so my guess is that it will be faster than the second method.
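Whether a GROUP BY needs a sort depends on whether an index already returns rows in group order. The comments below dispute the filesort claim on exactly this point. Here is a rough sketch using SQLite as an analogy, not as a model of MySQL's optimizer. SQLite reports such a sort as "USE TEMP B-TREE FOR GROUP BY" rather than MySQL's "Using filesort". The index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL,
        is_approved INTEGER NOT NULL
    );
""")

query = "SELECT post_id, COUNT(*) FROM comments GROUP BY post_id"

def plan_details(sql):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# No usable index: rows must be sorted into post_id order first.
before = plan_details(query)

# An index leading with post_id already delivers rows in group order.
conn.execute("CREATE INDEX idx_comments_post_approved ON comments (post_id, is_approved)")
after = plan_details(query)

print(before)
print(after)
```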

As a note: group by posts.* is not valid syntax. I assume this was intended for illustration purposes only.

answered Jul 7, 2015 at 14:46

Gordon Linoff


2 Comments

AdamMc331 Over a year ago

This is speculation, but it looks to me like OP wants to count comments based on post ID, so would it be more efficient to just do an aggregation for post ID and COUNT(*)?

Vatev Over a year ago

1. MySQL will not do a filesort in this case.
2. Even if it does, the join will be much faster than a dependent subquery.

Uueerdo · Accepted Answer · 2015-07-07 17:31:52Z

0

This is the standard way I would do it (using LEFT JOIN with SUM also lets you see which posts have no comments):

SELECT posts.*
   , SUM(IF(comments.id IS NULL, 0, 1)) AS comments_count
FROM posts
LEFT JOIN comments ON comments.post_id = posts.id
GROUP BY posts.id
;
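The SUM over a NULL check gives the same result as `COUNT(comments.id)`, because a LEFT JOIN emits a single all-NULL comment row for a post with no comments. `COUNT(*)` would count that row as 1. Below is a quick check in SQLite, assumed as a stand-in for MySQL. The portable CASE expression replaces MySQL's `IF()`, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER NOT NULL);
    INSERT INTO posts (id) VALUES (1), (2);
    INSERT INTO comments (post_id) VALUES (1), (1);  -- post 2 has no comments
""")

rows = conn.execute("""
    SELECT p.id,
           SUM(CASE WHEN c.id IS NULL THEN 0 ELSE 1 END) AS sum_case,
           COUNT(c.id) AS count_col,   -- skips the NULL row from the LEFT JOIN
           COUNT(*)    AS count_star   -- counts that NULL row as 1
    FROM posts p
    LEFT JOIN comments c ON c.post_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
# (1, 2, 2, 2)
# (2, 0, 0, 1)
```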

But if I were aiming for speed, this might be better:

SELECT posts.*, IFNULL(subQ.comments_count, 0) AS comments_count
FROM posts
LEFT JOIN (
   SELECT post_id, COUNT(1) AS comments_count
   FROM comments
   GROUP BY post_id
) AS subQ
ON subQ.post_id = posts.id
;
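This pre-aggregated version, which is also what the earlier comment about grouping on post ID suggests, aggregates comments once per post_id and then joins the totals back. `IFNULL` turns the missing match for a comment-less post into 0. The sketch below uses SQLite as a stand-in and hypothetical rows. It only checks that the result matches the correlated subquery. Which version is faster depends on the engine, the indexes, and the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER NOT NULL);
    INSERT INTO posts (id) VALUES (1), (2), (3);
    INSERT INTO comments (post_id) VALUES (1), (1), (3);  -- post 2 has none
""")

# Aggregate comments once per post_id, then attach the totals to posts.
pre_aggregated = conn.execute("""
    SELECT p.id, IFNULL(subQ.comments_count, 0) AS comments_count
    FROM posts p
    LEFT JOIN (
        SELECT post_id, COUNT(*) AS comments_count
        FROM comments
        GROUP BY post_id
    ) AS subQ ON subQ.post_id = p.id
    ORDER BY p.id
""").fetchall()

# Correlated subquery: one COUNT per post row, for comparison.
correlated = conn.execute("""
    SELECT p.id,
           (SELECT COUNT(*) FROM comments c WHERE c.post_id = p.id) AS comments_count
    FROM posts p
    ORDER BY p.id
""").fetchall()

print(pre_aggregated)  # [(1, 2), (2, 0), (3, 1)]
print(correlated)      # [(1, 2), (2, 0), (3, 1)]
```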

edited Jul 7, 2015 at 17:31

answered Jul 7, 2015 at 17:26

Uueerdo


Ryan-Neal Mes · Accepted Answer · 2015-07-08 06:20:10Z

After a bit more research, I found no meaningful time difference between the two queries:

require 'benchmark'

# Assumes a Rails/ActiveRecord environment with a live database connection
# (e.g. run from a Rails console or runner script).
Benchmark.bm do |b|
  # 'nested': dependent (correlated) subquery, one count per post row
  b.report('nested') do
    1000.times do
      ActiveRecord::Base.connection.execute('
        select
          p.id,
          (select count(c.id) from comments c where c.post_id = p.id) comment_count
        from
          posts p;')
    end
  end

  # 'joined': LEFT JOIN plus GROUP BY over the post id
  b.report('joined') do
    1000.times do
      ActiveRecord::Base.connection.execute('
        select
          p.id,
          count(c.id) comment_count
        from
          posts p
          left join comments c on
            c.post_id = p.id
        group by
          p.id;')
    end
  end
end

       user     system      total        real
nested  2.120000   0.900000   3.020000 (  3.349015)
joined  2.110000   0.990000   3.100000 (  3.402986)
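The numbers above come from MySQL through ActiveRecord. For a self-contained way to repeat the comparison without a Rails app, here is a rough analog in Python with SQLite. The data set, the index name `idx_comments_post_id`, and the repetition count are all hypothetical. Absolute timings will differ from the ones above and between machines, so only the equality of the two result sets is a meaningful check.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER NOT NULL);
    CREATE INDEX idx_comments_post_id ON comments (post_id);
""")
# Hypothetical data: 1,000 posts; posts 1-800 get 10 comments each, 801-1000 get none.
conn.executemany("INSERT INTO posts (id) VALUES (?)", [(i,) for i in range(1, 1001)])
conn.executemany("INSERT INTO comments (post_id) VALUES (?)",
                 [(i % 800 + 1,) for i in range(8_000)])

QUERIES = {
    "nested": """
        SELECT p.id,
               (SELECT COUNT(c.id) FROM comments c WHERE c.post_id = p.id) AS comment_count
        FROM posts p
        ORDER BY p.id
    """,
    "joined": """
        SELECT p.id, COUNT(c.id) AS comment_count
        FROM posts p
        LEFT JOIN comments c ON c.post_id = p.id
        GROUP BY p.id
        ORDER BY p.id
    """,
}

results = {}
for label, sql in QUERIES.items():
    start = time.perf_counter()
    for _ in range(20):  # repeat to smooth out noise
        results[label] = conn.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    print(f"{label:<7} {elapsed:.4f}s")
```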

However, when running EXPLAIN on both queries, I did notice that more possible indexes are listed for the first query. That makes me think it would be the better option if the columns needed in the SELECT changed.
