1

I'm a Rails developer and I'm new to writing SQL scripts. I have users, portfolios, views, favorites and endorsements tables. users have many portfolios and many endorsements. portfolios have many views, many favorites and many endorsements.

Here is the script I wrote:

top_users = User.find_by_sql(
  "SELECT users.*, 
    COUNT(portfolios.id) + 
    COUNT(views.id) + 
    COUNT(favorites.id) + 
    COUNT(case when endorsements.portfolio_id = portfolios.id AND portfolios.user_id = users.id then 1 else 0 end) +
    COUNT(case when endorsements.user_id = users.id then 1 else 0 end)
    AS total 
  FROM users 
  LEFT OUTER JOIN portfolios ON portfolios.user_id = users.id 
  LEFT OUTER JOIN views ON views.subject_id = portfolios.id AND portfolios.user_id = users.id 
  LEFT OUTER JOIN favorites ON favorites.subject_id = portfolios.id AND portfolios.user_id = users.id 
  LEFT OUTER JOIN endorsements ON endorsements.portfolio_id = portfolios.id AND portfolios.user_id = users.id OR endorsements.user_id = users.id 
  GROUP BY users.id 
  ORDER BY total DESC LIMIT 8"
)

The total is not quite what I expect, because each portfolio is worth 50 points, each view 2 points, each favorite 10 points, and each endorsement 2 points.

Let's say we have 3 users:

user | COUNT 1 | COUNT 2 | COUNT 3 | COUNT 4 | COUNT 5
------------------------------------------------------
1    | 0       | 0       | 0       | 0       | 10
2    | 2       | 2       | 2       | 2       | 0
3    | 5       | 0       | 0       | 0       | 0
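
As a quick check of the expected ordering, the weights can be applied to the counts in this table. The sketch below is plain arithmetic in Python. It assumes that COUNT 1 through COUNT 5 correspond to portfolios, views, favorites, portfolio endorsements and user endorsements, following the order of the COUNT expressions in the query; the names WEIGHTS and total_points are made up for illustration.

```python
# Points per item, in the assumed column order:
# portfolios, views, favorites, portfolio endorsements, user endorsements.
WEIGHTS = (50, 2, 10, 2, 2)

# Counts per user, copied from the table (COUNT 1 .. COUNT 5).
counts = {
    1: (0, 0, 0, 0, 10),
    2: (2, 2, 2, 2, 0),
    3: (5, 0, 0, 0, 0),
}


def total_points(row):
    """Multiply each count by its weight and add the results."""
    return sum(count * weight for count, weight in zip(row, WEIGHTS))


# Weighted totals, which is what the ranking should be based on.
totals = {user: total_points(row) for user, row in counts.items()}
ranking = sorted(totals, key=totals.get, reverse=True)

# Unweighted sums, which is what adding the raw counts amounts to.
unweighted = {user: sum(row) for user, row in counts.items()}
unweighted_ranking = sorted(unweighted, key=unweighted.get, reverse=True)

print(totals)              # {1: 20, 2: 128, 3: 250}
print(ranking)             # [3, 2, 1]
print(unweighted_ranking)  # [1, 2, 3]
```

Adding the raw counts ranks user 1 first, while the weighted totals rank user 3 first.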

With my script, the results come out in the order user 1, user 2, then user 3. However, based on the points system, they should come out in the order user 3, user 2, then user 1, because user 3's total is 250 points, user 2's is 128 and user 1's is 20. That is the order I expect. I tried this:

top_users = User.find_by_sql(
  "SELECT users.*, 
    COUNT(portfolios.id) * 50 + 
    COUNT(views.id) * 2 + 
    COUNT(favorites.id) * 10 + 
    COUNT(case when endorsements.portfolio_id = portfolios.id AND portfolios.user_id = users.id then 1 else 0 end) * 2 +
    COUNT(case when endorsements.user_id = users.id then 1 else 0 end) * 2
    AS total 
  FROM users 
  LEFT OUTER JOIN portfolios ON portfolios.user_id = users.id 
  LEFT OUTER JOIN views ON views.subject_id = portfolios.id AND portfolios.user_id = users.id 
  LEFT OUTER JOIN favorites ON favorites.subject_id = portfolios.id AND portfolios.user_id = users.id 
  LEFT OUTER JOIN endorsements ON endorsements.portfolio_id = portfolios.id AND portfolios.user_id = users.id OR endorsements.user_id = users.id 
  GROUP BY users.id 
  ORDER BY total DESC LIMIT 8"
)

I tried the above script, but it does not work for me. Any thoughts or help would be much appreciated. Again, I'm very new to raw SQL.


UPDATE: I ended up doing this to avoid the double-counting issue that appears when LEFT OUTER JOINing multiple tables.

    SELECT t4.id, t4.username, t4.avatar_url, p_count * 50 + ue_count * 2 + fav_count * 10 + ep_count * 2 + COUNT(vp.id) * 2 as point
    FROM (SELECT t3.id, t3.username, t3.avatar_url, p_count, ue_count, fav_count, COUNT(ep.id) as ep_count
          FROM( SELECT t2.id, t2.username, t2.avatar_url, p_count, ue_count, COUNT(fav_p.id) as fav_count
                FROM (SELECT t1.id, t1.username, t1.avatar_url, p_count, COUNT(e.user_id) as ue_count
                      FROM (SELECT u.*, COUNT(p.user_id) as p_count
                            FROM users u
                            LEFT OUTER JOIN (SELECT user_id, id
                                            FROM portfolios) p
                                            ON u.id = p.user_id
                                            GROUP BY u.id) t1
                      LEFT OUTER JOIN (SELECT user_id
                                      FROM endorsements) e
                                      ON e.user_id = t1.id
                      GROUP BY t1.id, t1.username, t1.avatar_url, p_count ) t2
                LEFT OUTER JOIN (SELECT p.id, p.user_id
                                FROM portfolios p
                                INNER JOIN favorites
                                ON favorites.subject_id = p.id) fav_p
                ON fav_p.user_id = t2.id
                GROUP BY t2.id, t2.username, t2.avatar_url, p_count, ue_count) t3
          LEFT OUTER JOIN (SELECT p.id, p.user_id
                          FROM portfolios p
                          INNER JOIN endorsements
                          ON endorsements.portfolio_id = p.id) ep
          ON ep.user_id = t3.id
          GROUP BY t3.id, t3.username, t3.avatar_url, p_count, ue_count, fav_count) t4
    LEFT OUTER JOIN (SELECT p.id, p.user_id
                    FROM portfolios p
                    INNER JOIN views
                    ON views.subject_id = p.id) vp
    ON vp.user_id = t4.id
    GROUP BY t4.id, t4.username, t4.avatar_url, p_count, ue_count, fav_count, ep_count
    ORDER BY point DESC
    LIMIT 8

I'm a complete beginner and not familiar with SQL. The updated code above solves my problem, but I wonder how bad the performance would be if I do it this way. Thanks for any input.
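
One flatter way to express the same idea, offered as a sketch rather than a measured improvement: count each child table in its own grouped subquery, then LEFT JOIN those small results to users and combine them with COALESCE. The example runs on an in-memory SQLite database through Python's sqlite3 module only so that it is self-contained. The sample rows are hypothetical and mirror the three-user table above, the users table is reduced to its id column, and the aliases (p, v, f, ep, eu, cnt) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users        (id INTEGER PRIMARY KEY);
    CREATE TABLE portfolios   (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE views        (id INTEGER PRIMARY KEY, subject_id INTEGER);
    CREATE TABLE favorites    (id INTEGER PRIMARY KEY, subject_id INTEGER);
    CREATE TABLE endorsements (id INTEGER PRIMARY KEY, user_id INTEGER, portfolio_id INTEGER);

    INSERT INTO users (id) VALUES (1), (2), (3);
    -- hypothetical data: user 2 owns portfolios 1 and 2, user 3 owns portfolios 3 to 7
    INSERT INTO portfolios (id, user_id)
        VALUES (1, 2), (2, 2), (3, 3), (4, 3), (5, 3), (6, 3), (7, 3);
    INSERT INTO views (subject_id) VALUES (1), (1);
    INSERT INTO favorites (subject_id) VALUES (1), (1);
""")
# Ten endorsements on user 1, two endorsements on portfolio 1 (owned by user 2).
conn.executemany(
    "INSERT INTO endorsements (user_id, portfolio_id) VALUES (?, ?)",
    [(1, None)] * 10 + [(None, 1)] * 2,
)

# Each subquery yields at most one row per user, so the outer joins
# cannot multiply rows and no DISTINCT is needed.
QUERY = """
SELECT u.id,
       COALESCE(p.cnt, 0)  * 50
     + COALESCE(v.cnt, 0)  * 2
     + COALESCE(f.cnt, 0)  * 10
     + COALESCE(ep.cnt, 0) * 2
     + COALESCE(eu.cnt, 0) * 2 AS point
FROM users u
LEFT JOIN (SELECT user_id, COUNT(*) AS cnt
           FROM portfolios GROUP BY user_id) p ON p.user_id = u.id
LEFT JOIN (SELECT po.user_id, COUNT(*) AS cnt
           FROM views vw JOIN portfolios po ON po.id = vw.subject_id
           GROUP BY po.user_id) v ON v.user_id = u.id
LEFT JOIN (SELECT po.user_id, COUNT(*) AS cnt
           FROM favorites fa JOIN portfolios po ON po.id = fa.subject_id
           GROUP BY po.user_id) f ON f.user_id = u.id
LEFT JOIN (SELECT po.user_id, COUNT(*) AS cnt
           FROM endorsements en JOIN portfolios po ON po.id = en.portfolio_id
           GROUP BY po.user_id) ep ON ep.user_id = u.id
LEFT JOIN (SELECT user_id, COUNT(*) AS cnt
           FROM endorsements WHERE user_id IS NOT NULL
           GROUP BY user_id) eu ON eu.user_id = u.id
ORDER BY point DESC
LIMIT 8
"""

top_users = conn.execute(QUERY).fetchall()
print(top_users)  # [(3, 250), (2, 128), (1, 20)]
```

Whether this is faster than the nested version depends on the data volume and the available indexes; running EXPLAIN ANALYZE on both queries against the real database would be the way to find out.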

6
  • Is there a reason for doing it all in raw SQL? It would be more readable and probably easier to get that sum with Rails itself. Commented Mar 7, 2017 at 20:14
  • "That try does not work for me": can you please be more specific here? What do you observe? Commented Mar 7, 2017 at 20:46
  • @TarynEast I tried the above script but it does not work for me. Commented Mar 7, 2017 at 23:00
  • Yes, you said that... but "does not work for me" contains zero information that will help us. Can you please say what you observed happening that you have interpreted as "not working"? Commented Mar 7, 2017 at 23:34
  • @TarynEast "With my script, the result come in the order of user 1, user 2, then users 3. However base on the points system, it should come out in the order of user 3, user 2 then user 1 because user 3 total points is 250, users 2 total is 128 and user 1 is 20, and this is the order I expect." So I tried the second script, and the result still comes out in the order user 1, user 2, user 3. The correct order should be 3, 2, 1. Commented Mar 8, 2017 at 5:01

3 Answers

1

After reading through a few more times, I think I got what you were saying. Try this.

SELECT users.id
    ,COUNT(portfolios.id) * 50 + 
     COUNT(VIEWS.id) * 2 + 
     COUNT(favorites.id) * 10 + 
     COUNT(e1.id) * 2 + 
     COUNT(e2.id) * 2 
     AS total
FROM users
LEFT JOIN portfolios
    ON portfolios.user_id = users.id
LEFT JOIN VIEWS
    ON VIEWS.subject_id = portfolios.id
LEFT JOIN favorites
    ON favorites.subject_id = portfolios.id
LEFT JOIN endorsements e1
    ON e1.portfolio_id = portfolios.id
LEFT JOIN endorsements e2
    ON e2.user_id = users.id
GROUP BY users.id
ORDER BY total DESC LIMIT 8

I assumed that endorsements relate to either a user OR a portfolio. I don't know what the values in your tables look like, but in theory, since an endorsement relates to a user or a portfolio, and a portfolio always relates to a user, it wouldn't be strictly necessary to join on both user_id and portfolio_id in a single join condition. In a case like that it's fine to join the portfolios table to endorsements as e1 and the users table to endorsements as e2, and just add the two counts.

2 Comments

I will try this tonight and let you know how it goes. Thanks first :)
Sorry for the late response, I was out of town. Yes, you are right: endorsements relate to either a user or a portfolio. I now understand that, since the users table is already joined with LEFT JOIN portfolios ON portfolios.user_id = users.id, we don't need portfolios.user_id = users.id in LEFT JOIN views ON views.subject_id = portfolios.id, and so on. By the way, your code's result comes out as I expect. Thanks!
1

First of all, unless your "users" table only has one column, this breaks the rule that when you have aggregate functions in your SELECT clause, every column that isn't passed into an aggregate function has to be in your GROUP BY clause.

Second, I don't think the CASE expressions inside your COUNT() functions make sense. They repeat the same conditions as your joins. You should be able to just count endorsements.id and portfolios.id, I think. I may be a little fuzzy on what you're looking for. Also, what is a subject_id? Is that an id field that determines whether an endorsement belongs to a user or a portfolio?
Does a portfolio have both a user_id and a portfolio_id, or is it one or the other but not both?
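
A related detail about those CASE expressions, shown as a small sketch: COUNT(expr) counts every row where expr is not NULL, so a CASE with ELSE 0 is counted on every row whether or not the condition matched. Leaving out the ELSE (so non-matching rows yield NULL) or using SUM gives the conditional count. The example uses an in-memory SQLite database through Python's sqlite3 module so that it is self-contained; the three endorsement rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE endorsements (id INTEGER PRIMARY KEY, user_id INTEGER, portfolio_id INTEGER);
    -- hypothetical data: two endorsements on user 1, one on portfolio 7
    INSERT INTO endorsements (user_id, portfolio_id)
        VALUES (1, NULL), (1, NULL), (NULL, 7);
""")

row = conn.execute("""
    SELECT
        -- ELSE 0 is not NULL, so all three rows are counted
        COUNT(CASE WHEN user_id = 1 THEN 1 ELSE 0 END) AS count_else_zero,
        -- without ELSE, non-matching rows become NULL and are skipped
        COUNT(CASE WHEN user_id = 1 THEN 1 END)        AS count_no_else,
        -- SUM adds the 1s and 0s, which also gives the conditional count
        SUM(CASE WHEN user_id = 1 THEN 1 ELSE 0 END)   AS sum_else_zero
    FROM endorsements
""").fetchone()

print(row)  # (3, 2, 2)
```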

3 Comments

subject_id is basically portfolio_id; it is just how the previous developer chose to name it.
"this breaks the rule that" -- Because you are grouping by the table's primary key, Postgres is smart enough to see that users.any_column_you_like will still have a single unambiguous value after grouping, so you are allowed to select it. This was added in 9.1.
"Postgres is smart enough" -- That seems like an awesome feature that I wasn't aware of. Is it based on de facto uniqueness of values in the table, or does it work based on an enforced constraint? Do you still have to use at least one column in the GROUP BY clause? Can you group by users.*? This generates a million questions. I plan to look it up on the support site, but do you have a link to get me started?
1

Any time you have multiple outer joins in a GROUP BY query, you have to be careful of double-counting. So I would change COUNT(portfolios.id) to COUNT(DISTINCT portfolios.id) etc. That should also remove the need for your CASE statements. Once you have those counts, you can multiply by their score values, as you say in your question (* 2 or * 50 or whatever you like).
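
To make the double-counting concrete, here is a minimal sketch using an in-memory SQLite database through Python's sqlite3 module (chosen only so the example is self-contained). The data is hypothetical: one user with one portfolio, three views and two favorites. Joining views and favorites to the same portfolio produces 3 x 2 = 6 rows, so every plain COUNT reports 6, while COUNT(DISTINCT ...) recovers the real numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users      (id INTEGER PRIMARY KEY);
    CREATE TABLE portfolios (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE views      (id INTEGER PRIMARY KEY, subject_id INTEGER);
    CREATE TABLE favorites  (id INTEGER PRIMARY KEY, subject_id INTEGER);

    -- hypothetical data: 1 user, 1 portfolio, 3 views, 2 favorites
    INSERT INTO users (id) VALUES (1);
    INSERT INTO portfolios (id, user_id) VALUES (10, 1);
    INSERT INTO views (subject_id) VALUES (10), (10), (10);
    INSERT INTO favorites (subject_id) VALUES (10), (10);
""")

# The same joins are used for both queries; only the aggregates differ.
JOINS = """
    FROM users
    LEFT JOIN portfolios ON portfolios.user_id = users.id
    LEFT JOIN views      ON views.subject_id = portfolios.id
    LEFT JOIN favorites  ON favorites.subject_id = portfolios.id
    GROUP BY users.id
"""

# Plain COUNT counts joined rows, so each column sees the 6-row fan-out.
plain = conn.execute(
    "SELECT COUNT(portfolios.id), COUNT(views.id), COUNT(favorites.id)" + JOINS
).fetchone()

# COUNT(DISTINCT ...) counts each id once per user.
distinct = conn.execute(
    "SELECT COUNT(DISTINCT portfolios.id), COUNT(DISTINCT views.id),"
    " COUNT(DISTINCT favorites.id)" + JOINS
).fetchone()

print(plain)     # (6, 6, 6)
print(distinct)  # (1, 3, 2)
```

With the weights from the question, the distinct counts give 1 * 50 + 3 * 2 + 2 * 10 = 76 points for this user, whereas the inflated counts would give 372.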

1 Comment

You are right about double counting, Paul. I added DISTINCT to Chris's code above and the counts come out correctly.
