9

I have a query where I want to select all users who like a given set of artists. There also are some other WHERE criteria on country etc. Here's what the schema looks like.

          users                     favourite_artists             artists

+----------+------------+    +-----------+------------+    +--------+--------+
|    id    |  country   |    |  user_id  |  artist_id |    |  id    |  name  |
+----------+------------+    +-----------+------------+    +--------+--------+
|     1    |     gb     |    |     1     |      6     |    |   1    |  Muse  |
|     2    |     gb     |    |     1     |      5     |    |   2    |  RATM  |
|     3    |     us     |    |     1     |      3     |    |   3    |  ABBA  | 
|     4    |     us     |    |     2     |      3     |    |   4    |   U2   |
+----------+------------+    +-----------+------------+    +--------+--------+

I want to order them by the number of those artists they like. I also want to include users who don't like any of the artists but who match the WHERE criteria. The expected result set would look like.

+--------+---------------+----------------+
|   id   |    country    |   match_count  |
+--------+---------------+----------------+
|    6   |      gb       |       4        |
|    9   |      gb       |       4        |
|    2   |      gb       |       3        |
|    1   |      gb       |       2        |
|    5   |      gb       |       0        |
|    4   |      gb       |       0        |
+--------+---------------+----------------+

I've been trying to do it using a subquery to get the match_count and ordering by that but it's performing pretty slowly so I thought there'd have to be a better way.

   SELECT users.id, users.country
   (SELECT COUNT(*) FROM favourite_artists 
    WHERE user_id = users.id AND artist_id IN (1,3,4,9)) AS match_count        
   FROM "users" 
   WHERE users.country = 'gb'
   ORDER BY match_count DESC;

I'm using Postgresql 9.0.7. Any thoughts?

1 Answer 1

11

Your query is executing one subquery for every row in users. Such queries are called "correlated subqueries" and their performance, quite understandably, sucks.

Instead you want a join:

SELECT users.id, users.country, count(artist_id) as match_count
FROM users
LEFT JOIN favourite_artists ON user_id = users.id AND artist_id IN (1,3,4,9)
WHERE users.country = 'gb'
GROUP BY 1, 2
ORDER BY 3 DESC;

This query will get the joining rows far more efficiently, assuming you have an index on favourite_artists(user_id) - or better yet a multi-column index favourite_artists(user_id, artist_id).

Sign up to request clarification or add additional context in comments.

2 Comments

Relying on the position in the ORDER BY clause violates relational theory, which holds that the order of fields in the result set is not determined. I suggest "ORDER BY match_count" instead.
My panties are not in a bunch. I simply consider it poor practice to ever rely on numeric positional indicators when 1) it is just as easy and more clear to specify the field name and 2) the ordering of the fields, while supported by the SQL language, violates the relational model. It's not a big-enough deal to start a war over. I simply offered it as a useful suggestion.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.