1

Hey guys. This is a follow-on from this question:

After getting the right data and making some tweaks based on requests from business, I've now got this mini-beast on my hands. This query should return the total number of new jobseeker registrations and the number of new uploaded CV's:

SELECT COUNT(j.jobseeker_id) as new_registrations,
(
    SELECT 
      COUNT(c.cv_id)
    FROM 
      tb_cv as c, tb_jobseeker, tb_industry
    WHERE
      UNIX_TIMESTAMP(c.created_at) >= '1241125200'
    AND 
      UNIX_TIMESTAMP(c.created_at) <= '1243717200'
    AND 
      tb_jobseeker.industry_id = tb_industry.industry_id
) 
AS uploaded_cvs
FROM 
  tb_jobseeker as j, tb_industry as i
WHERE
  j.created_at BETWEEN '2009-05-01' AND '2009-05-31'
AND
  i.industry_id = j.industry_id
GROUP BY i.description, MONTH(j.created_at) 

Notes: - The two values in the UNIX TIMESTAMP functions are passed in as parameters from the report module in our backend.

Every time I run it, MySQL chokes and lingers silently into the ether of the Interweb.

Help is appreciated.

Update: Hey guys. Thanks a lot for all the thoughtful and helpful comments. I'm only 2 weeks into my role here, so I'm still learning the schema. So, this query is somewhere between a thumbsuck and an educated guess. Will start to answer all your questions now.

6
  • You'll have to provide some information about the tables involved in this query... which columns have indexes, etc.? ... Also, could you format the query a little friendlier to the eyes? Commented Jun 5, 2009 at 7:37
  • What are you trying to do? You will need to give us the schema and the indexes you're using if you want help on optimization. Commented Jun 5, 2009 at 7:37
  • What's this cv_id? A full table? Also, in the subquery, tb_cv is not joined/linked to tb_jobseeker and tb_industry. Are you sure you want to do this? Commented Jun 5, 2009 at 7:42
  • @Midiane: I suspect the query should return the number of jobseeker registrations and CVs created per industry per month? Is that correct? Commented Jun 5, 2009 at 9:09
  • @Tomalak yes, you're right. Sorry guys, got called into two long meetings. Commented Jun 5, 2009 at 10:10

2 Answers 2

6

tb_cv is not connected to the other tables in the sub-query. I guess this is the root cause for the slow query. It causes generation of a Cartesian product, yielding a lot more rows than you probably need.

Other than that I'd say you need indexes on tb_jobseeker.created_at, tb_cv.created_at and tb_industry.industry_id, and you might want to get rid of the UNIX_TIMESTAMP() calls in the sub-query since they prevent use of an index. Use BETWEEN and the actual field values instead.

Here is my attempt at understanding your query and writing a better version. I guess you want to get the count of new jobseeker registrations and new uploaded CVs per month per industry:

SELECT 
  i.industry_id,
  i.description, 
  MONTH(j.created_at)            AS month_created,
  YEAR(j.created_at)             AS year_created,
  COUNT(DISTINCT j.jobseeker_id) AS new_registrations,
  COUNT(cv.cv_id)                AS uploaded_cvs
FROM 
  tb_cv AS cv
  INNER JOIN tb_jobseeker AS j ON j.jobseeker_id = cv.jobseeker_id
  INNER JOIN tb_industry  AS i ON i.industry_id  = j.industry_id
WHERE
  j.created_at BETWEEN '2009-05-01' AND '2009-05-31'
  AND cv.created_at BETWEEN '2009-05-01' AND '2009-05-31'
GROUP BY 
  i.industry_id,
  i.description, 
  MONTH(j.created_at),
  YEAR(j.created_at)

A few things I noticed while writing the query:

  • you GROUP BY values you don't output in the end. Why? (I've added the grouped field to the output list.)
  • you JOIN three tables in the sub-query while only ever using values from one of them. Why? I don't see what it would be good for, other than filtering out CV records that don't have a jobseeker or an industry attached — which I find hard to imagine. (I've removed the entire sub-query and used a simple COUNT instead.)
  • Your sub-query returns the same value every time. Did you maybe mean to correlate it in some way, to the industry maybe?.
  • The sub-query runs once for every record in a grouped query without being wrapped in an aggregate function.
Sign up to request clarification or add additional context in comments.

3 Comments

+1 Indeed there is a cartesian product on tb_cv and tb_jobseeker
I must admit that I'm not strong on SQL and I used parts of other queries used in the system to get to this... sheepish
hey tomalak, i just tried your query. it works perfectly! thanks. and it's much easier to read and not as complex as my poor attempt. i stripped out a few fields I don't need. Thanks a lot, seriously.
0

First and foremost it may be worth moving the 'UNIX_TIMESTAMP' conversions to the other side of the equation (that is, perform a reverse function on the literal timestamp values at the other side of the >= and <=). That'll avoid the inner query having to perform the conversions for every record, rather than once for the query.

Also, why does the uploaded_cvs query not have any where clause linking it to the outer query? Am I missing something here?

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.