Mysql Select rows with same values in one column and different in another

Question

I'd really appreciate it if someone could validate my SQL query.

For the following dataset:

MD5      UserPK      CategoryPK    
ADCDE    1           7  
ADCDE    1           4  
ADCDE    1           7  
dffrf    1           7  
dffrf    2           7  
dffrf    2           6 
dffrf    1           1

I'd like to select MD5 and CategoryPK where two or more rows exist with identical MD5 values, identical CatgegoryPK and two or more DIFFERENT UserPK values.

In other words, I'd like to know the MD5 and categoryPK of all records where two or more different users (UserPK) have assigned the same category (UserPK) to the same file (Md5). I'm not interested in records the same user has assigned the category to multiple times, (unless a different user has also assigned the same category to that file).

So from the above data, I would like to be returned just:

md5    CategoryPK
dffrf  7

The query I've written is:

SELECT md5, 
       count(md5), 
       count(distinct categorypk) as cntcat, 
       count(distinct userpk) as cntpk
FROM Hash
       group by md5 having count(md5) > 1 
                           and cntpk > 1
                           and cntcat = 1;

It seems to work, but before I start using it in anger, I'd appreciate a second opinion in case I've missed something or if there is a better way of doing it.

Thanks

This sort of question might be more appropriate for Code Review than StackOverflow. — eggyal
– eggyal, Commented May 18, 2012 at 9:31

eggyal · Accepted Answer · 2012-05-18 09:37:47Z

13

I don't think your code will give you what you're after; what happens when a file has been assigned more than one category by multiple users, with some categories overlapping? Then cntcat != 1, so your HAVING clause will fail to match even though the file has indeed been categorised the same way by multiple users.

I would instead use a self-join:

SELECT   a.MD5, a.CategoryPK
FROM     Hash a
  JOIN   Hash b
      ON a.MD5 = b.MD5
     AND a.UserPK <> b.UserPK
     AND a.CategoryPK = b.CategoryPK
GROUP BY a.MD5, a.CategoryPK
HAVING   COUNT(DISTINCT a.UserPK) > 2  -- you said "more than 2" ?

answered May 18, 2012 at 9:37

eggyal

127k18 gold badges220 silver badges244 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

user1395659 Over a year ago

Good point - I just checked on one of my tables and I found 10 further files using this query!

GarethD · Accepted Answer · 2012-05-18 09:37:22Z

1

I can't see any problems with what you have written apart from you are not getting the category in your select list which appears to be in the criteria? I think you could simplify it slightly and get the category out:

SELECT  MD5, CategoryPK
FROM    Hash
GROUP BY MD5, CategoryPK
HAVING MIN(UserPK) <> MAX(UserPK)

Alternatively, you could look at solving this with a join, you may need to run a few tests and use EXPLAIN, but sometimes joins perform better than GROUP BY. It is worth trying anyway to see if you see any significant difference.

SELECT  DISTINCT t1.MDF, t2.CategoryPK
FROM    Hash T1
        INNER JOIN Hash T2
            ON T1.MD5 = T2.MD5
            AND T1.CategoryPK = T2.CategoryPK
            AND T1.UserPK < T2.UserPK

answered May 18, 2012 at 9:37

GarethD

70k11 gold badges90 silver badges130 bronze badges

2 Comments

eggyal Over a year ago

+1 for both suggestions (although if the OP actually meant "more than two different users" as described in his text, they obviously won't give the desired outcome - but based on his attempt I agree that he probably meant at least two..). However, I think your comment that there aren't any problems with what the OP wrote is erroneous: see my observation at the start of my answer.

user1395659 Over a year ago

Yes - you're correct, I did mean at least two and I've corrected the question.

Collectives™ on Stack Overflow

Mysql Select rows with same values in one column and different in another

2 Answers 2

1 Comment

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related