
I'm trying to fetch 100 posts and order them by the number of times they've been "remixed" in the last week. Here is my query thus far:

SELECT COUNT(remixes.post_id) AS count, posts.title
FROM posts 
LEFT JOIN (
    SELECT * FROM remixes WHERE created_at >= 1343053513
) AS remixes ON posts.id = remixes.post_id
GROUP BY posts.id 
ORDER BY count DESC, posts.created_at DESC
LIMIT 100
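The sketch below runs this query against Python's built-in sqlite3 module to show what it computes. SQLite stands in for MySQL here, so DESCRIBE/EXPLAIN output will differ. The toy schema, the sample rows, and the `top_remixed`/`demo` helpers are hypothetical. The only substantive changes are that the hard-coded 1343053513 is replaced by a cutoff computed as "now minus seven days" and bound as a parameter, and that the alias `count` is renamed `remix_count`.

```python
import sqlite3
import time

ONE_WEEK = 7 * 24 * 60 * 60  # seconds in a week

# Hypothetical toy schema: only the columns the query touches.
SCHEMA = """
CREATE TABLE posts   (id INTEGER PRIMARY KEY, title TEXT, created_at INTEGER);
CREATE TABLE remixes (id INTEGER PRIMARY KEY, post_id INTEGER, created_at INTEGER);
"""

# The question's query, with the cutoff bound as a parameter (?) instead of
# the hard-coded 1343053513, and the alias renamed from count to remix_count.
QUERY = """
SELECT COUNT(remixes.post_id) AS remix_count, posts.title
FROM posts
LEFT JOIN (
    SELECT * FROM remixes WHERE created_at >= ?
) AS remixes ON posts.id = remixes.post_id
GROUP BY posts.id
ORDER BY remix_count DESC, posts.created_at DESC
LIMIT 100
"""


def top_remixed(conn, now=None):
    """Rank posts by how many remixes they received in the week before `now`."""
    if now is None:
        now = int(time.time())  # current Unix timestamp
    cutoff = now - ONE_WEEK
    return conn.execute(QUERY, (cutoff,)).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    now = 1343053513 + ONE_WEEK  # chosen so the cutoff equals the original literal
    conn.executemany(
        "INSERT INTO posts (id, title, created_at) VALUES (?, ?, ?)",
        [(1, "alpha", 100), (2, "beta", 200), (3, "gamma", 300)],
    )
    conn.executemany(
        "INSERT INTO remixes (post_id, created_at) VALUES (?, ?)",
        [
            (1, now - 10),
            (1, now - 20),
            (2, now - 30),
            (2, now - 2 * ONE_WEEK),  # older than a week: not counted
        ],
    )
    return top_remixed(conn, now=now)
```

`demo()` returns `[(2, 'alpha'), (1, 'beta'), (0, 'gamma')]`. Gamma has no remixes at all but still appears with a count of 0, because the LEFT JOIN keeps posts that have no matching row.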

This produces the correct result; however, after running DESCRIBE I get this:

[Screenshot: result of DESCRIBE for this query]

And here are my indexes on posts:

[Screenshot: indexes on posts]

And my indexes on remixes:

[Screenshot: indexes on remixes]

And here are my questions:

  1. Can you explain what the terms in the Extra column are really trying to tell me?
  2. Could you give me some tips on optimizing this query so that it scales better?

Thanks in advance!

Update

Per Zane's solution, I've updated my query to:

SELECT COUNT(remixes.post_id) AS count, posts.title
FROM posts 
LEFT JOIN remixes ON posts.id = remixes.post_id AND remixes.created_at >= 1343053513
GROUP BY posts.id 
ORDER BY count DESC, posts.created_at DESC
LIMIT 100

And here's the latest DESCRIBE output:

[Screenshot: latest DESCRIBE output]

I'm still worried about the filesort part. Any ideas?

  • The obvious optimisation would be denormalising your schema to keep an appropriately indexed remix counter in the posts table instead of having to count and sort them every time the query runs. Commented Jul 22, 2012 at 22:41
  • Thanks. I have that already for total remixes but I need to limit that down to those in the last week. Commented Jul 22, 2012 at 22:44
  • What you want to avoid is having to iterate over all the posts every time; that's the part that grows without bound. I'm not sure how you'd avoid this in a simple way. One thing that comes to mind is using a background job to keep a running tally in a remixes_past_week column that updates, say, hourly. E.g. if the job runs on Aug 10th at 12:00, you look at all the remixes made on Aug 3rd between 11:00 and 12:00, subtract them from the tallies for the respective posts, then look at the remixes made on Aug 10th between 11:00 and 12:00 and add them to the respective tallies. Commented Jul 22, 2012 at 22:57
  • I was thinking about a background worker, but was trying to get around that if possible. Thanks for your help! Commented Jul 22, 2012 at 23:00
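The hourly tally job described in the comments can be sketched as follows. This is a minimal sketch, not a production job. It uses sqlite3 as a stand-in for MySQL. The `remixes_past_week` column name comes from the comment, but the `apply_hourly_update` helper and the toy data are assumptions. It also assumes the job runs exactly once per hour with no missed runs and that remixes are never inserted with a created_at earlier than the last run. If either assumption breaks, the tallies drift and would need a periodic full recount.

```python
import sqlite3

HOUR = 60 * 60
WEEK = 7 * 24 * HOUR


def _counts_between(conn, start, end):
    """Per-post remix counts with start <= created_at < end."""
    return conn.execute(
        "SELECT post_id, COUNT(*) FROM remixes "
        "WHERE created_at >= ? AND created_at < ? GROUP BY post_id",
        (start, end),
    ).fetchall()


def apply_hourly_update(conn, run_at):
    """Slide each affected post's remixes_past_week window forward one hour.

    Invariant (given no missed runs and no back-dated inserts): after this
    runs, remixes_past_week counts remixes in [run_at - WEEK, run_at).
    """
    # Remixes that have just aged out: made between 7 days + 1 hour ago and 7 days ago.
    expired = _counts_between(conn, run_at - WEEK - HOUR, run_at - WEEK)
    # Remixes made during the last hour.
    fresh = _counts_between(conn, run_at - HOUR, run_at)
    with conn:  # apply both adjustments in one transaction
        conn.executemany(
            "UPDATE posts SET remixes_past_week = remixes_past_week - ? WHERE id = ?",
            [(n, post_id) for post_id, n in expired],
        )
        conn.executemany(
            "UPDATE posts SET remixes_past_week = remixes_past_week + ? WHERE id = ?",
            [(n, post_id) for post_id, n in fresh],
        )


def check_against_full_recount():
    """Run the job hourly for ten days and compare with a brute-force count."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE posts (id INTEGER PRIMARY KEY,
                            remixes_past_week INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE remixes (post_id INTEGER, created_at INTEGER);
        """
    )
    conn.executemany("INSERT INTO posts (id) VALUES (?)", [(1,), (2,), (3,)])
    # Toy data: 300 remixes spread over roughly 8 days, starting at t = 0.
    conn.executemany(
        "INSERT INTO remixes VALUES (?, ?)",
        [(i % 3 + 1, i * 2357) for i in range(300)],
    )
    for hour in range(1, 10 * 24 + 1):
        run_at = hour * HOUR
        apply_hourly_update(conn, run_at)
        for post_id in (1, 2, 3):
            tally = conn.execute(
                "SELECT remixes_past_week FROM posts WHERE id = ?", (post_id,)
            ).fetchone()[0]
            recount = conn.execute(
                "SELECT COUNT(*) FROM remixes WHERE post_id = ? "
                "AND created_at >= ? AND created_at < ?",
                (post_id, run_at - WEEK, run_at),
            ).fetchone()[0]
            if tally != recount:
                return False
    return True
```

With the tally maintained this way, the ranking query no longer needs to join or count at all. Something like `SELECT title FROM posts ORDER BY remixes_past_week DESC, created_at DESC LIMIT 100` could be served by an index on `(remixes_past_week, created_at)` without a filesort. That index is a suggestion to verify with EXPLAIN, not something tested in this thread.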

2 Answers


Try not to wrap the joined table in a subselect. MySQL will store the subselect's result in an unindexed temporary table and then join against that unindexed table.

Instead, put created_at as an additional join condition when joining the remixes table:

SELECT 
    a.title, COUNT(b.post_id) AS remixcnt
FROM 
    posts a
LEFT JOIN 
    remixes b ON a.id = b.post_id AND b.created_at >= 1343053513
GROUP BY 
    a.id, a.title
ORDER BY 
    remixcnt DESC, a.created_at DESC
LIMIT 100
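A minimal sketch can check that moving the filter into the join condition does not change the result. It runs the question's derived-table version and the join-condition version side by side. It uses sqlite3 as a stand-in for MySQL, so it says nothing about MySQL's EXPLAIN output. The toy data, the `remix_count` alias, and the composite index `remixes_post_created` on `(post_id, created_at)` are assumptions added for illustration. That index covers both columns the join condition uses, but whether MySQL actually uses it here is something to confirm with EXPLAIN on the real tables.

```python
import sqlite3

CUTOFF = 1343053513

# The question's original form: filter inside a derived table (subselect).
SUBSELECT_FORM = """
SELECT COUNT(remixes.post_id) AS remix_count, posts.title
FROM posts
LEFT JOIN (
    SELECT * FROM remixes WHERE created_at >= ?
) AS remixes ON posts.id = remixes.post_id
GROUP BY posts.id
ORDER BY remix_count DESC, posts.created_at DESC
LIMIT 100
"""

# The answer's form: filter moved into the LEFT JOIN condition.
# Columns are listed in the same order as above so the rows compare directly.
JOIN_CONDITION_FORM = """
SELECT COUNT(b.post_id) AS remix_count, a.title
FROM posts a
LEFT JOIN remixes b ON a.id = b.post_id AND b.created_at >= ?
GROUP BY a.id, a.title
ORDER BY remix_count DESC, a.created_at DESC
LIMIT 100
"""


def run_both():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE posts   (id INTEGER PRIMARY KEY, title TEXT, created_at INTEGER);
        CREATE TABLE remixes (id INTEGER PRIMARY KEY, post_id INTEGER, created_at INTEGER);
        -- Hypothetical composite index on both columns the join condition uses.
        CREATE INDEX remixes_post_created ON remixes (post_id, created_at);
        """
    )
    # Six posts; post 6 is never remixed.
    conn.executemany(
        "INSERT INTO posts (id, title, created_at) VALUES (?, ?, ?)",
        [(i, f"post {i}", 1000 + i) for i in range(1, 7)],
    )
    # Thirty remixes of posts 1-5, straddling the cutoff.
    conn.executemany(
        "INSERT INTO remixes (post_id, created_at) VALUES (?, ?)",
        [(i % 5 + 1, CUTOFF - 50 + 7 * i) for i in range(30)],
    )
    return (
        conn.execute(SUBSELECT_FORM, (CUTOFF,)).fetchall(),
        conn.execute(JOIN_CONDITION_FORM, (CUTOFF,)).fetchall(),
    )
```

Both forms return the same six rows here, including post 6 with a count of 0. The difference the answer is after lies in how MySQL executes them, not in what they return.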

2 Comments

@JesseBunch, the reason it's not using any keys is that there is no need to. You are not specifying any filtering in the WHERE clause, so it simply returns ALL rows from posts.
@JesseBunch if you're interested in learning more about DESCRIBE/EXPLAIN, take a look at this awesome presentation by Baron Schwartz

It seems to me that

SELECT COUNT(remixes.post_id) AS count, posts.title
FROM posts 
LEFT JOIN (
    SELECT * FROM remixes WHERE created_at >= 1343053513
) AS remixes ON posts.id = remixes.post_id
GROUP BY posts.id 
ORDER BY count DESC, posts.created_at DESC
LIMIT 100

could be rewritten as

SELECT COUNT(r.post_id) AS count, posts.title
FROM posts 
LEFT JOIN remixes r ON posts.id = r.post_id
WHERE r.created_at >= 1343053513
GROUP BY posts.id 
ORDER BY count DESC, posts.created_at DESC
LIMIT 100

which should give you a better EXPLAIN plan and run faster.

2 Comments

Not quite. I want the query to also return posts where the remix count is zero. Is that possible with your optimization?
I believe so, but try it out to see. If COUNT(r.post_id) is returning NULL, simply wrap it in an IFNULL(). For example: IFNULL(COUNT(r.post_id), 0)
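A minimal sketch can test whether the WHERE rewrite keeps zero-remix posts. It uses sqlite3 again as a stand-in for MySQL, and the table layout, sample rows, and `remix_count` alias are assumptions. It runs the same query twice: once with the created_at filter in WHERE and once with it in the ON clause.

```python
import sqlite3

CUTOFF = 1343053513

FILTER_IN_WHERE = """
SELECT COUNT(r.post_id) AS remix_count, posts.title
FROM posts
LEFT JOIN remixes r ON posts.id = r.post_id
WHERE r.created_at >= ?
GROUP BY posts.id
ORDER BY remix_count DESC, posts.created_at DESC
LIMIT 100
"""

FILTER_IN_ON = """
SELECT COUNT(r.post_id) AS remix_count, posts.title
FROM posts
LEFT JOIN remixes r ON posts.id = r.post_id AND r.created_at >= ?
GROUP BY posts.id
ORDER BY remix_count DESC, posts.created_at DESC
LIMIT 100
"""


def compare():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE posts   (id INTEGER PRIMARY KEY, title TEXT, created_at INTEGER);
        CREATE TABLE remixes (id INTEGER PRIMARY KEY, post_id INTEGER, created_at INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO posts (id, title, created_at) VALUES (?, ?, ?)",
        [(1, "remixed", 10), (2, "only old remixes", 20), (3, "never remixed", 30)],
    )
    conn.executemany(
        "INSERT INTO remixes (post_id, created_at) VALUES (?, ?)",
        [(1, CUTOFF + 5), (2, CUTOFF - 5)],  # one recent remix, one too old
    )
    return (
        conn.execute(FILTER_IN_WHERE, (CUTOFF,)).fetchall(),
        conn.execute(FILTER_IN_ON, (CUTOFF,)).fetchall(),
    )
```

Here the WHERE version returns only `[(1, 'remixed')]`, while the ON version also returns the two posts without a recent remix, each with a count of 0. With the filter in WHERE, the rows that the LEFT JOIN padded with NULLs fail `r.created_at >= ?` (a comparison with NULL is never true) and are discarded. Those posts disappear from the result instead of showing up with a NULL count. COUNT() returns 0, not NULL, for a group with no matching rows, so wrapping it in IFNULL() would not bring them back.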
