
Consider this table (comments):

          id | post_id |      text
 ------------+---------+----------------
       79507 |      12 | Lorem Ipsum
       79544 |      12 | Foo, bar
       79545 |      14 | Interesting...

And this aggregate query:

SELECT comment_id, SUM(vote) AS votes
FROM votes 
GROUP BY comment_id;

 comment_id | votes 
------------+-------
      79507 |    3
      79544 |    4
      79545 |    1

I'm looking to join the comments table with the aggregate query, but I'm only interested in a very small subset of the data (a single post_id). This naive approach uses a subquery and correctly returns the result for post_id 12:

SELECT comment_id, votes, text
FROM comments c
LEFT JOIN
  (SELECT comment_id, SUM(vote) AS votes
   FROM votes
   GROUP BY comment_id) AS v
ON c.id = v.comment_id
WHERE c.post_id = 12;

 comment_id | votes |      text
------------+-------+----------------
      79507 |     3 | Lorem Ipsum
      79544 |     4 | Foo, bar

However, this is highly inefficient: the inner subquery aggregates the entire votes table (which is huge in this application), even though we only need the rows for a handful of comments.

Intuitively, it seems we should be filtering the inner query, i.e. the subselect is missing a WHERE comment_id IN (...). However, we don't know which comment_ids we will need at that stage in the computation. Another subselect inside the subselect could be used to retrieve the appropriate comment_ids, but that seems very clumsy.
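
For reference, the nested-subselect variant described above might look roughly like the sketch below. It runs the SQL through Python's built-in sqlite3 module against an in-memory database purely for illustration. The individual vote rows are made up so that they sum to the totals shown earlier, and the real application runs on PostgreSQL, whose planner may behave differently.

```python
import sqlite3

# In-memory SQLite database standing in for the real PostgreSQL one (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, text TEXT);
    CREATE TABLE votes (comment_id INTEGER, vote INTEGER);
    INSERT INTO comments VALUES
        (79507, 12, 'Lorem Ipsum'),
        (79544, 12, 'Foo, bar'),
        (79545, 14, 'Interesting...');
    -- Made-up vote rows, chosen only so the sums come out as 3, 4 and 1.
    INSERT INTO votes VALUES
        (79507, 1), (79507, 1), (79507, 1),
        (79544, 1), (79544, 1), (79544, 1), (79544, 1),
        (79545, 1);
""")

# The inner aggregate is restricted to the comments of post 12
# by a second subselect nested inside the first one.
filtered_rows = conn.execute("""
    SELECT c.id AS comment_id, v.votes, c.text
    FROM comments c
    LEFT JOIN (
        SELECT comment_id, SUM(vote) AS votes
        FROM votes
        WHERE comment_id IN (SELECT id FROM comments WHERE post_id = 12)
        GROUP BY comment_id
    ) AS v ON c.id = v.comment_id
    WHERE c.post_id = 12
    ORDER BY c.id
""").fetchall()

print(filtered_rows)
```

It returns the two rows for post_id 12, but the post_id filter has to be written twice, which is the clumsiness mentioned above.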

I'm inexperienced in SQL and not sure if there exists a cleaner solution. Perhaps the subselect approach is the wrong one altogether.

  • You forgot to declare your PostgreSQL version, which should be a given. Commented May 29, 2013 at 14:05
  • If you are working with a current version of Postgres, there is probably no need to list all columns redundantly. The primary key covers all columns of a table. Details in this related answer. Commented May 29, 2013 at 14:10

1 Answer


Not sure I understood correctly, but don't you need something like this?

SELECT c.id as comment_id, SUM (v.vote) as votes, c.text
FROM comments c
LEFT JOIN votes v ON c.id = v.comment_id
WHERE c.post_id = 12
GROUP BY c.id, c.text
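
As a quick sanity check, the sketch below compares this query with the subquery version from the question. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than PostgreSQL, the individual vote rows are invented to match the totals in the question, and comment 79550 is a hypothetical extra row added to show how a comment without votes comes out of the LEFT JOIN.

```python
import sqlite3

# In-memory SQLite database standing in for the real PostgreSQL one (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, text TEXT);
    CREATE TABLE votes (comment_id INTEGER, vote INTEGER);
    INSERT INTO comments VALUES
        (79507, 12, 'Lorem Ipsum'),
        (79544, 12, 'Foo, bar'),
        (79545, 14, 'Interesting...'),
        (79550, 12, 'No votes yet');  -- hypothetical comment, not in the question
    -- Made-up vote rows, chosen only so the sums come out as 3, 4 and 1.
    INSERT INTO votes VALUES
        (79507, 1), (79507, 1), (79507, 1),
        (79544, 1), (79544, 1), (79544, 1), (79544, 1),
        (79545, 1);
""")

# Subquery version from the question: aggregates every row of votes first.
subquery_sql = """
    SELECT c.id AS comment_id, v.votes, c.text
    FROM comments c
    LEFT JOIN (SELECT comment_id, SUM(vote) AS votes
               FROM votes
               GROUP BY comment_id) AS v
           ON c.id = v.comment_id
    WHERE c.post_id = 12
    ORDER BY c.id
"""

# Version from this answer: join first, filter on post_id, then aggregate.
join_sql = """
    SELECT c.id AS comment_id, SUM(v.vote) AS votes, c.text
    FROM comments c
    LEFT JOIN votes v ON c.id = v.comment_id
    WHERE c.post_id = 12
    GROUP BY c.id, c.text
    ORDER BY c.id
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()

print(join_rows)
print(join_rows == subquery_rows)
```

Both queries return the same rows here. The comment without votes shows up with a NULL total (None in Python); wrapping the sum in COALESCE(..., 0) would turn that into 0 if preferred.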

5 Comments

Oh, wow. That's embarrassingly simple, can't believe I missed it. Thanks!
Also, as I understand it, this implementation will need the GROUP BY clause to contain every column in both tables (except votes), which is a bit ugly and won't work with SELECT *.
@DavidChouinard well, every column you wanna retrieve, yes. This is not "ugly" (while select * is), this is the only way to go ;) : all fields in the select which are not in an aggregate function must be in the group by clause.
Well, I said it was ugly because there's redundant information, ie. the columns we want to retrieve listed twice: in the SELECT and the GROUP BY clause. It's a basic principle that redundant data ought to be factored out. But I understand, this is as good as it gets given the limits of SQL. :)
@DavidChouinard: I think the latest version of PG lets you get away with merely listing the primary key in the group by statement, on grounds that it's unique. (At the very least, there was a discussion on PG Hackers on this topic at some point.)
