
Is it possible to combine the following into a single query?

I am assuming here that a single query would be more efficient than multiple queries using a temporary table, so please let me know if my assumption is incorrect.

$id is the current memberid. $list is a comma-separated list of itemids to be removed from the final results (e.g. items already downloaded).

What these queries are supposed to do is find the 500 members who have downloaded the most items in common with member $id. Then they find all the items that these members have downloaded, ranked by a score based on the number of similar downloads from each member and the total number of downloads of each item. The final result is therefore a list of recommendations for member $id.
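The two steps described above can be sketched in plain Python to make the intended ranking concrete. This is a minimal, hypothetical sketch: recommend and its parameters are illustrative names, the list of (memberid, itemid) pairs stands in for table_downloads, and each pair is assumed to appear at most once.

```python
from collections import Counter


def recommend(downloads, member_id, exclude, neighbours=500, top=30):
    """Hypothetical in-memory sketch of the two-step recommendation.

    downloads: list of (memberid, itemid) pairs standing in for table_downloads
    exclude:   set of itemids playing the role of $list
    """
    mine = {item for member, item in downloads if member == member_id}

    # Step 1 (temp1): score every other member by how many of
    # member_id's items they have also downloaded, keep the best ones.
    similarity = Counter(
        member for member, item in downloads
        if member != member_id and item in mine
    )
    top_members = dict(similarity.most_common(neighbours))

    # Step 2: count one row per (top member, item) download,
    # skipping excluded items, and keep the highest counts.
    item_scores = Counter(
        item for member, item in downloads
        if member in top_members and item not in exclude
    )
    return item_scores.most_common(top)


downloads = [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (2, 'c'),
             (3, 'a'), (3, 'c'), (3, 'd'), (4, 'z')]
print(recommend(downloads, 1, exclude={'a', 'b'}))
```

Step 2 counts rows, which is what COUNT(table_downloads.itemid*temp1.score) does: COUNT skips only NULLs, so (assuming itemid is never NULL) the product never changes the total, and temp1.score only decides which 500 members are included. SUM(temp1.score) would weight each item by member similarity, if that is what was intended.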

The queries are:

mysql_query('CREATE TEMPORARY TABLE temp1 ENGINE=MEMORY AS
(SELECT a.memberid, COUNT(*) `score` FROM table_downloads a INNER JOIN
(SELECT itemid FROM table_downloads WHERE memberid='.$id.') b ON a.itemid = b.itemid
WHERE a.memberid!='.$id.' GROUP BY a.memberid HAVING score>0 ORDER BY score DESC LIMIT 500)');

$res=mysql_query('SELECT table_downloads.itemid,COUNT(table_downloads.itemid*temp1.score) AS score2
FROM table_downloads,temp1
WHERE table_downloads.memberid=temp1.memberid AND table_downloads.itemid NOT IN ('.$list.') 
GROUP BY table_downloads.itemid
ORDER BY score2 DESC LIMIT 30');

mysql_query('DROP TABLE temp1');

It's possible that this query could take so long as to be unusable if there were several million rows. Any advice on making sure it executes quickly would also be greatly appreciated.

*I am using mysql_query deliberately. Please do not tell me to use mysqli.*

  • You can always join the strings, separated by a semicolon (;). Commented Apr 10, 2013 at 11:22
  • "I am assuming here that a single query would be more efficient that multiple queries using a temporary table" --- why do you think so? Try to cut the whole book with scissors at one attempt. Then try one single page after another. Which would be more efficient? Commented Apr 10, 2013 at 11:23
  • It would really help if you made your query human readable ;) Commented Apr 10, 2013 at 11:24
  • @Anyone, you should have seen it before (it was all on one line). ;) Commented Apr 10, 2013 at 11:25
  • While I am generally a fan of single queries, in this situation you have non-trivial queries and only 2 of them, so any saving would likely be nominal. However, looking at your query, the HAVING appears to be redundant (you have an INNER JOIN, while the score is counting the resulting records). Also, you can probably eliminate the subselect easily. These changes may slightly speed up your query. Also consider putting the $list details into another temp table and using an outer join to exclude them. Commented Apr 10, 2013 at 12:31
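The comment's three suggestions (drop the HAVING, replace the subselect with a plain self-join, and exclude $list via a temp table and an outer join) can be tried out roughly as follows. This is a sketch run against an in-memory SQLite database through Python's sqlite3 module, not MySQL, so ENGINE=MEMORY is omitted; the (memberid, itemid) table layout, the recommend_sql function and the excluded_items table name are all hypothetical. Whether it is actually faster on MySQL would need to be checked with EXPLAIN on real data.

```python
import sqlite3


def recommend_sql(conn, member_id, excluded_items, neighbours=500, top=30):
    """Hypothetical sketch of the comment's suggestions, on SQLite."""
    cur = conn.cursor()

    # Put $list into its own temporary table instead of a NOT IN (...) string.
    cur.execute("CREATE TEMP TABLE excluded_items (itemid INTEGER PRIMARY KEY)")
    cur.executemany("INSERT OR IGNORE INTO excluded_items (itemid) VALUES (?)",
                    [(item,) for item in excluded_items])

    # Plain self-join instead of the subselect; no HAVING, because every
    # group produced by an inner join already has at least one row.
    cur.execute("CREATE TEMP TABLE temp1 (memberid INTEGER, score INTEGER)")
    cur.execute("""
        INSERT INTO temp1 (memberid, score)
        SELECT a.memberid, COUNT(*) AS score
        FROM table_downloads a
        INNER JOIN table_downloads b ON a.itemid = b.itemid
        WHERE b.memberid = ? AND a.memberid <> ?
        GROUP BY a.memberid
        ORDER BY score DESC
        LIMIT ?""", (member_id, member_id, neighbours))

    # Outer join against the exclusion table: rows without a match have
    # x.itemid IS NULL and are kept. COUNT(*) equals the original COUNT of
    # a non-NULL product; d.itemid only breaks ties deterministically.
    cur.execute("""
        SELECT d.itemid, COUNT(*) AS score2
        FROM table_downloads d
        INNER JOIN temp1 t ON d.memberid = t.memberid
        LEFT JOIN excluded_items x ON x.itemid = d.itemid
        WHERE x.itemid IS NULL
        GROUP BY d.itemid
        ORDER BY score2 DESC, d.itemid
        LIMIT ?""", (top,))
    rows = cur.fetchall()

    cur.execute("DROP TABLE temp1")
    cur.execute("DROP TABLE excluded_items")
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_downloads (memberid INTEGER, itemid INTEGER)")
conn.executemany("INSERT INTO table_downloads VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 10), (2, 11), (2, 12),
                  (3, 10), (3, 12), (3, 13), (4, 99)])
print(recommend_sql(conn, 1, [10, 11]))
```

The PRIMARY KEY on the exclusion table gives the outer join an index to probe, which is the usual reason to prefer it over a long literal NOT IN list.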

2 Answers

1
  1. It's not possible to do with mysql_query(), which does not support multiple statements in one call.
  2. It's a mistake to think that combining 3 calls into one would save anything noticeable in this case.

And to be clear, it's a very common delusion to think that a single messy query would run faster than multiple simpler ones. It wouldn't.


1 Comment

In addition to this answer: sending the queries to the SQL server is a matter of milliseconds. Running the queries is what takes a long time. By sending all three queries at once you would save only a few milliseconds. Optimize your SQL query instead.
0

Out of interest I had a play.

I think it is possible, although it's not pretty and I suspect it's slower than your current script.

SELECT table_downloads.itemid, COUNT(table_downloads.itemid * temp1.score) AS score2
FROM table_downloads
INNER JOIN (SELECT a.memberid, COUNT(*) `score` 
            FROM table_downloads a 
            INNER JOIN table_downloads b
            ON  a.itemid = b.itemid AND b.memberid='.$id.' AND a.memberid<>b.memberid
            GROUP BY a.memberid 
            ORDER BY score DESC
            LIMIT 500) temp1
ON table_downloads.memberid = temp1.memberid 
WHERE table_downloads.itemid NOT IN ('.$list.') 
GROUP BY table_downloads.itemid
ORDER BY score2 DESC 
LIMIT 30

Not really tested, though (I don't know your table layouts).
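One way to check that the single query returns the same rows as the original temp-table version is to run both against the same small data set. This sketch uses Python's sqlite3 module with an in-memory database rather than MySQL; the table layout, sample rows and function names are hypothetical, the join condition is written as a.memberid <> b.memberid (so member $id is left out, as a.memberid!=$id does in the question), and d.itemid is added to ORDER BY only to make ties deterministic.

```python
import sqlite3

# Hypothetical sample data: (memberid, itemid) rows for table_downloads.
ROWS = [(1, 10), (1, 11), (2, 10), (2, 11), (2, 12),
        (3, 10), (3, 12), (3, 13), (4, 99)]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_downloads (memberid INTEGER, itemid INTEGER)")
    conn.executemany("INSERT INTO table_downloads VALUES (?, ?)", ROWS)
    return conn


def two_step(conn, member_id, excluded):
    """The question's temp-table approach (ENGINE=MEMORY dropped for SQLite)."""
    marks = ",".join("?" * len(excluded))  # "?,?,..." stands in for $list
    conn.execute("CREATE TEMP TABLE temp1 (memberid INTEGER, score INTEGER)")
    conn.execute("""
        INSERT INTO temp1 (memberid, score)
        SELECT a.memberid, COUNT(*) AS score
        FROM table_downloads a
        INNER JOIN (SELECT itemid FROM table_downloads WHERE memberid = ?) b
            ON a.itemid = b.itemid
        WHERE a.memberid <> ?
        GROUP BY a.memberid
        ORDER BY score DESC
        LIMIT 500""", (member_id, member_id))
    rows = conn.execute(f"""
        SELECT d.itemid, COUNT(d.itemid * temp1.score) AS score2
        FROM table_downloads d
        INNER JOIN temp1 ON d.memberid = temp1.memberid
        WHERE d.itemid NOT IN ({marks})
        GROUP BY d.itemid
        ORDER BY score2 DESC, d.itemid
        LIMIT 30""", tuple(excluded)).fetchall()
    conn.execute("DROP TABLE temp1")
    return rows


def single_query(conn, member_id, excluded):
    """The answer's derived-table version, using a.memberid <> b.memberid."""
    marks = ",".join("?" * len(excluded))
    return conn.execute(f"""
        SELECT d.itemid, COUNT(d.itemid * temp1.score) AS score2
        FROM table_downloads d
        INNER JOIN (SELECT a.memberid, COUNT(*) AS score
                    FROM table_downloads a
                    INNER JOIN table_downloads b
                        ON a.itemid = b.itemid
                       AND b.memberid = ?
                       AND a.memberid <> b.memberid
                    GROUP BY a.memberid
                    ORDER BY score DESC
                    LIMIT 500) temp1
            ON d.memberid = temp1.memberid
        WHERE d.itemid NOT IN ({marks})
        GROUP BY d.itemid
        ORDER BY score2 DESC, d.itemid
        LIMIT 30""", (member_id, *excluded)).fetchall()


conn = make_db()
print(two_step(conn, 1, [10, 11]))
print(single_query(conn, 1, [10, 11]))
```

With the join condition a.memberid=b.memberid instead, the derived table keeps only member $id itself, so every candidate item is one of $id's own downloads; for this sample data, with 10 and 11 excluded, that version returns an empty list.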
