0
SELECT COUNT(organization.ID)
FROM organization
WHERE organization.NAME IN (
    SELECT organization.NAME
    FROM organization
    WHERE organization.NAME <> ''
        AND organization.APPROVED = 0 
        AND organization.CREATED_AT > '2012-07-31 04:31:08'
    GROUP BY organization.NAME
    HAVING COUNT(organization.ID) > 1
)

This query finds duplicates. The problem is that it takes 6 seconds for the page to load because of the inner statement. Is there a way to make it run faster? The database is MySQL 5.1.

4
  • Isn't the inner statement useless? SELECT COUNT(organization.ID) FROM organization WHERE organization.NAME <> '' AND organization.APPROVED = 0 AND organization.CREATED_AT > '2012-07-31 04:31:08' GROUP BY organization.NAME HAVING COUNT(organization.ID) > 1 Commented Aug 31, 2012 at 20:47
  • 2
    No. It will return a different result. Commented Aug 31, 2012 at 20:51
  • No. Mine, for instance, returns 67 duplicates; your query breaks that down into 55, 10 and 2, which add up to 67. Commented Aug 31, 2012 at 20:53
  • @SativaNL: the OP query gets a count of all organizations that have a duplicate name, but ONLY for those organization names that have two (or more) rows matching the specified predicates on APPROVED and CREATED_AT. The OP query will include additional rows in the total count. Commented Aug 31, 2012 at 21:38

4 Answers

1

Yes. This is slow because MySQL is slow at processing IN subqueries. You can fix it by using a correlated EXISTS instead:

SELECT COUNT(o.ID)
FROM organization o
WHERE EXISTS (
    -- o2 is the inner copy of the table; every column is qualified by alias
    SELECT o2.NAME
    FROM organization o2
    WHERE o2.NAME <> ''
        AND o2.APPROVED = 0
        AND o2.CREATED_AT > '2012-07-31 04:31:08'
        AND o2.NAME = o.NAME   -- correlate with the outer row
    GROUP BY o2.NAME
    HAVING COUNT(o2.ID) > 1
)

0

Try to avoid IN.

SELECT COUNT(organization.ID)
FROM 
    organization
    INNER JOIN 
    (
        SELECT organization.NAME
        FROM organization
        WHERE organization.NAME <> ''
            AND organization.APPROVED = 0 
            AND organization.CREATED_AT > '2012-07-31 04:31:08'
        GROUP BY organization.NAME
        HAVING COUNT(organization.ID) > 1
    ) AS t ON organization.NAME = t.NAME
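
One way to see what each variant costs is to prefix it with EXPLAIN and compare the plans. The sketch below reuses the IN query from the question and the join above. The comments describe what MySQL 5.1 typically reports; the actual output depends on the server version and the indexes present, so none is shown here.

```sql
-- Plan for the original IN form. On MySQL 5.1 the inner SELECT is typically
-- listed with select_type DEPENDENT SUBQUERY, i.e. it may be re-evaluated
-- for each row of the outer query.
EXPLAIN
SELECT COUNT(organization.ID)
FROM organization
WHERE organization.NAME IN (
    SELECT organization.NAME
    FROM organization
    WHERE organization.NAME <> ''
        AND organization.APPROVED = 0
        AND organization.CREATED_AT > '2012-07-31 04:31:08'
    GROUP BY organization.NAME
    HAVING COUNT(organization.ID) > 1
);

-- Plan for the join form. The subquery should be listed with
-- select_type DERIVED, i.e. it is materialized once and then joined.
EXPLAIN
SELECT COUNT(organization.ID)
FROM organization
INNER JOIN (
    SELECT organization.NAME
    FROM organization
    WHERE organization.NAME <> ''
        AND organization.APPROVED = 0
        AND organization.CREATED_AT > '2012-07-31 04:31:08'
    GROUP BY organization.NAME
    HAVING COUNT(organization.ID) > 1
) AS t ON organization.NAME = t.NAME;
```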

1 Comment

This one is pretty fast, will test it later on again thanks :)
0

I also find that creating indexes on the columns involved vastly improves the speed of complex queries.
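
As an illustration of that advice, here is a minimal sketch of indexes aimed at the query from the question. The index names `organization_dup_ix` and `organization_name_ix` and the column order are assumptions chosen for this example, not something the question confirms. Whether the optimizer actually uses them should be checked with EXPLAIN.

```sql
-- Hypothetical index for the duplicate-finding subquery: the equality
-- column (APPROVED) first, then NAME for the GROUP BY, then CREATED_AT
-- for the range filter.
CREATE INDEX organization_dup_ix
    ON organization (APPROVED, NAME, CREATED_AT);

-- Hypothetical index on NAME alone, for matching outer rows by name.
CREATE INDEX organization_name_ix
    ON organization (NAME);
```

If NAME is a very long VARCHAR or a TEXT column, MySQL may require a prefix length on it, for example NAME(100).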

1 Comment

I think he already has indexes. The problem is the IN: it executes the subquery for each row.
0

If what you want to return is a total count of all duplicates, but only for those organization NAME values that have two or more rows matching the specified predicates on APPROVED and CREATED_AT, then an alternate statement can return an equivalent result:

SELECT SUM(c.cnt) 
  FROM ( SELECT COUNT(o.ID) AS cnt
           FROM organization o
          WHERE o.NAME <> ''
          GROUP
             BY o.NAME
         HAVING SUM(o.APPROVED = 0 AND o.CREATED_AT > '2012-07-31 04:31:08') > 1
       ) c

MySQL can use a suitable covering index to satisfy this query; otherwise, it is likely a full scan of the organization table. Either way, it avoids referencing the organization table twice and avoids a JOIN operation.

One suitable covering index for this query would be:

CREATE INDEX organization_ix ON organization (NAME, CREATED_AT, APPROVED, ID)

Note that if the ID column is guaranteed to be non-NULL (either through a NOT NULL constraint or because it's the PRIMARY KEY of the table), you can avoid referencing that column, and you can leave it out of the index definition:

SELECT SUM(c.cnt) 
  FROM ( SELECT SUM(1) AS cnt
           FROM organization o
          WHERE o.NAME <> ''
          GROUP
             BY o.NAME
         HAVING SUM(o.APPROVED = 0 AND o.CREATED_AT > '2012-07-31 04:31:08') > 1
       ) c

The EXPLAIN output shows this query using the index to satisfy the query without referencing any data blocks from the table:

id  select_type  table       type    possible_keys    key              key_len  ref       rows  Extra                     
--  -----------  ----------  ------  ---------------  ---------------  -------  ------  ------  --------------------------
 1  PRIMARY      <derived2>  ALL     (NULL)           (NULL)           (NULL)   (NULL)       2                            
 2  DERIVED      o           index   organization_ix  organization_ix  44       (NULL)      29  Using where; Using index  
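
The equivalence claimed above can be checked on a small scratch table. This is a sketch only: the table name `organization_demo`, the column types and the sample rows are all assumptions made up for illustration. A regular table is used rather than a TEMPORARY one, because MySQL cannot refer to a temporary table twice in the same query.

```sql
-- Scratch table; name, types and rows are hypothetical.
CREATE TABLE organization_demo (
    ID         INT         NOT NULL PRIMARY KEY,
    NAME       VARCHAR(50) NOT NULL,
    APPROVED   TINYINT     NOT NULL,
    CREATED_AT DATETIME    NOT NULL
);

INSERT INTO organization_demo (ID, NAME, APPROVED, CREATED_AT) VALUES
    (1, 'Acme',  0, '2012-08-01 00:00:00'),  -- matches the predicates
    (2, 'Acme',  0, '2012-08-02 00:00:00'),  -- matches the predicates
    (3, 'Acme',  1, '2012-01-01 00:00:00'),  -- same name, does not match
    (4, 'Beta',  0, '2012-08-03 00:00:00'),  -- only one row for this name
    (5, 'Gamma', 0, '2012-08-04 00:00:00'),  -- matches
    (6, 'Gamma', 0, '2012-06-01 00:00:00'),  -- too old, does not match
    (7, '',      0, '2012-08-05 00:00:00'),  -- empty name is excluded
    (8, '',      0, '2012-08-06 00:00:00');  -- empty name is excluded

-- Original IN form: only 'Acme' has two matching rows, and all three
-- 'Acme' rows are counted, so this returns 3.
SELECT COUNT(d.ID)
FROM organization_demo d
WHERE d.NAME IN (
    SELECT i.NAME
    FROM organization_demo i
    WHERE i.NAME <> ''
        AND i.APPROVED = 0
        AND i.CREATED_AT > '2012-07-31 04:31:08'
    GROUP BY i.NAME
    HAVING COUNT(i.ID) > 1
);

-- Single-pass form: also returns 3.
SELECT SUM(c.cnt)
  FROM ( SELECT COUNT(o.ID) AS cnt
           FROM organization_demo o
          WHERE o.NAME <> ''
          GROUP BY o.NAME
         HAVING SUM(o.APPROVED = 0 AND o.CREATED_AT > '2012-07-31 04:31:08') > 1
       ) c;

DROP TABLE organization_demo;
```

One difference to keep in mind: when no name qualifies, the COUNT form returns 0 while the SUM form returns NULL, so the caller may want to wrap the latter in IFNULL(..., 0).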
