
I have two queries below. The first one has a nested select. The second one makes use of a group by clause.

select
  posts.*,
  (select count(*) from comments where comments.post_id = posts.id and comments.is_approved = 1) as comments_count
from
  posts 

select
  posts.*,
  count(comments.id) comments_count
from
  posts

  left join comments on
     comments.post_id = posts.id
     and comments.is_approved = 1
group by
  posts.*

My understanding is that the first query is worse because it has to run a subquery for each row in posts, whereas the second query does not.

Is this true or false?
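The intuition in the question can be modeled with plain Ruby collections. This is a conceptual sketch only, assuming both queries count only approved comments (is_approved = 1). The `Post` and `Comment` structs and their data are hypothetical stand-ins for the tables. It shows a naive per-post rescan next to a single bucketing pass. The answers below point out that, with an index, MySQL does not need to rescan for every post, so the naive cost model does not settle the question.

```ruby
# Conceptual model only: Ruby arrays stand in for the posts and comments
# tables. Neither loop below is how MySQL actually executes these queries.
Post = Struct.new(:id)
Comment = Struct.new(:id, :post_id, :is_approved)

posts = [Post.new(1), Post.new(2), Post.new(3)]
comments = [
  Comment.new(10, 1, 1), Comment.new(11, 1, 0),
  Comment.new(12, 2, 1), Comment.new(13, 1, 1)
]

# Query 1, evaluated naively: re-scan every comment once per post,
# which is O(posts * comments) work if nothing narrows the scan.
nested = posts.to_h do |post|
  [post.id, comments.count { |c| c.post_id == post.id && c.is_approved == 1 }]
end

# Query 2, evaluated as one pass: bucket approved comments by post_id,
# then read each post's bucket (missing buckets default to 0, matching
# what COUNT(comments.id) returns for a post with no joined comments).
counts = Hash.new(0)
comments.each { |c| counts[c.post_id] += 1 if c.is_approved == 1 }
grouped = posts.to_h { |post| [post.id, counts[post.id]] }

# Both give {1 => 2, 2 => 1, 3 => 0} for this data.
```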

  • The second won't work at all; you need to join comments to posts. Also, I've never grouped that way, so I can't be sure, but even if it is legal syntax, your GROUP BY would be just as effective, and possibly faster, if you just did GROUP BY posts.id. Once properly written, I would expect the latter to be faster. (Commented Jul 7, 2015 at 17:20)
  • Thanks, and sorry, I left out the LEFT JOIN; I've edited the question to include it. (Commented Jul 7, 2015 at 17:30)

3 Answers


As with all performance questions, you should test the performance on your system with your data.

However, I would expect the first to perform better, with the right indexes. The right index for:

select p.*,
       (select count(*)
        from comments c
        where c.post_id = p.id and c.is_approved = 1
       ) as comments_count
from posts p

is comments(post_id, is_approved).
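To see why comments(post_id, is_approved) helps the first query, here is a toy model in Ruby. A sorted array of [post_id, is_approved, comment_id] entries stands in for the composite index. `prefix_count` is a hypothetical helper that answers one correlated COUNT(*) with two binary searches instead of a full scan. Real MySQL indexes are B+trees, so this only illustrates the key ordering the index provides, not its storage layout.

```ruby
# Toy model of a composite index on comments(post_id, is_approved).
# Each entry is [post_id, is_approved, comment_id]; keeping the array
# sorted mirrors the key order an index maintains.
comments = [[1, 1, 10], [1, 0, 11], [2, 1, 12], [1, 1, 13], [3, 0, 14]]
index = comments.sort  # Arrays compare element by element, like a composite key

# Hypothetical helper: count entries whose (post_id, is_approved) prefix
# equals the lookup key, using two binary searches instead of a full scan.
def prefix_count(index, post_id, approved)
  key = [post_id, approved]
  first = index.bsearch_index { |e| (e.first(2) <=> key) >= 0 } || index.size
  past  = index.bsearch_index { |e| (e.first(2) <=> key) > 0 } || index.size
  past - first
end

prefix_count(index, 1, 1)  # => 2 (comments 10 and 13)
```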

MySQL implements GROUP BY by doing a filesort. This version avoids a filesort over all of the data, so my guess is that it will be faster than the second method.
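The filesort argument refers to sort-based grouping: sort the joined rows by the grouping key, then count each run of equal keys. Below is a minimal Ruby sketch of that strategy. `sort_group_count` is a hypothetical helper, and the input data is made up. MySQL can also group through an index or a temporary table, and a comment below disputes that a filesort happens here at all. Treat this as an illustration of the strategy, not of MySQL's actual plan.

```ruby
# Hypothetical data: the post_id of each row produced by the join.
joined_post_ids = [2, 1, 3, 1, 2, 1]

# Sort-based grouping: after sorting, rows with the same key are adjacent,
# so each group is one contiguous run whose length is its count.
def sort_group_count(keys)
  keys.sort                               # the sort step (a filesort, for large inputs)
      .chunk_while { |a, b| a == b }      # split into runs of equal keys
      .map { |run| [run.first, run.size] }
end

sort_group_count(joined_post_ids)  # => [[1, 3], [2, 2], [3, 1]]
```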

As a note: group by posts.* is not valid syntax. I assume this was intended for illustration purposes only.


2 Comments

This is speculation, but it looks to me like OP wants to count comments based on post ID, so would it be more efficient to just do an aggregation for post ID and COUNT(*)?
1. MySQL will not do a filesort in this case. 2. Even if it did, the join would be much faster than a dependent subquery.

This is the standard way I would do it (using LEFT JOIN with SUM also tells you which posts have no comments):

SELECT posts.*
   , SUM(IF(comments.id IS NULL, 0, 1)) AS comments_count
FROM posts
LEFT JOIN comments ON comments.post_id = posts.id
GROUP BY posts.id
;
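Why SUM(IF(comments.id IS NULL, 0, 1)), which is equivalent to COUNT(comments.id), rather than COUNT(*)? A LEFT JOIN pads a post that has no comments with one row of NULLs. COUNT(*) counts that padded row, but COUNT(column) skips NULLs. The toy Ruby model below uses hypothetical data and uses `nil` as the stand-in for NULL.

```ruby
# Toy LEFT JOIN: every post appears at least once; a post with no comments
# is paired with nil, the Ruby stand-in for SQL NULL.
posts = [1, 2, 3]                        # post ids (hypothetical data)
comments = [[10, 1], [11, 1], [12, 2]]   # [comment_id, post_id]

joined = posts.flat_map do |post_id|
  matches = comments.select { |_, pid| pid == post_id }.map(&:first)
  matches.empty? ? [[post_id, nil]] : matches.map { |cid| [post_id, cid] }
end

by_post = joined.group_by(&:first)

# SUM(IF(comments.id IS NULL, 0, 1)): count only rows with a real comment id.
sum_if = by_post.transform_values { |rows| rows.sum { |_, cid| cid.nil? ? 0 : 1 } }

# COUNT(*) counts rows, so the NULL-padded row makes post 3 look like it has 1.
count_star = by_post.transform_values(&:size)
```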

If I were optimizing for speed, though, this might be better:

SELECT posts.*, IFNULL(subQ.comments_count, 0) AS comments_count
FROM posts
LEFT JOIN (
   SELECT post_id, COUNT(1) AS comments_count
   FROM comments
   GROUP BY post_id
) AS subQ
ON subQ.post_id = posts.id
;
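The derived-table version aggregates comments first. As a result, each post is matched against at most one pre-aggregated row rather than against every comment. A toy Ruby model of the same two steps, with hypothetical data: the tally plays the role of subQ, and the lookup default plays the role of IFNULL(..., 0).

```ruby
# Toy version of the derived-table approach: aggregate comments first,
# then attach the per-post counts to every post.
posts = [1, 2, 3]               # post ids (hypothetical data)
comment_post_ids = [1, 1, 2]    # post_id of each comment

# Derived table subQ: SELECT post_id, COUNT(1) ... GROUP BY post_id
sub_q = comment_post_ids.tally  # one entry per post that has comments

# LEFT JOIN subQ + IFNULL(subQ.comments_count, 0): a missing key means
# the post has no comments, so fall back to 0 instead of NULL (nil).
comments_count = posts.to_h { |id| [id, sub_q.fetch(id, 0)] }
```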


After a bit more research, I found no meaningful time difference between the two queries:

require 'benchmark'

# Assumes a Rails app (ActiveRecord connected to the MySQL database).
Benchmark.bm do |b|
  # Correlated (nested) subquery: one count per post row
  b.report('nested') do
    1000.times do
      ActiveRecord::Base.connection.execute('
        select
          p.id,
          (select count(c.id) from comments c where c.post_id = p.id) comment_count
        from
          posts p;')
    end
  end

  # LEFT JOIN + GROUP BY: a single aggregation over the joined rows
  b.report('joined') do
    1000.times do
      ActiveRecord::Base.connection.execute('
        select
          p.id,
          count(c.id) comment_count
        from
          posts p
          left join comments c on
            c.post_id = p.id
        group by
          p.id;')
    end
  end
end

       user     system      total        real
nested  2.120000   0.900000   3.020000 (  3.349015)
joined  2.110000   0.990000   3.100000 (  3.402986)

However, when I ran EXPLAIN on both queries, more possible indexes were listed for the first query. That makes me think it is the better option if the columns needed in the SELECT change.
