MySQL query to get all duplicates based on values from two columns

Question

When I execute

select *, count(*) c 
FROM mytable 
GROUP BY col3, col4 
HAVING c > 1
order by col4, col3;

I expected to get every row whose (col3, col4) pair occurs at least twice in the table. Instead, I get results in which some col3 values appear in only one row. Could anybody please explain why?

In other words, I'm trying to build a query that gets all rows for which the pair (col3, col4) occurs more than once.

An example of the unexpected result:

id   col1   col2    col3  col4    c
123  val1   val123  43    val444  2
456  val14  val52   45    val444  2

The value 43 in col3 appears in only one row of the result, but since c is 2 I would expect it to appear in two rows. Otherwise this row should not be in the result at all.

Correct: the query

select * from ukberu1m where col3 = 43 and col4 = 'val444';

returns two rows from the original table, but the grouped result shows only one row for that pair, not two, for some reason.

That query should do what you want. Can you post some sample data that gets the wrong result?
– Barmar, Commented Jan 1, 2017 at 10:41
You are probably selecting more non-aggregated columns than are listed in the GROUP BY clause; to get all columns, use a subquery.
– Stephan Lechner, Commented Jan 1, 2017 at 10:42

Barmar · Accepted Answer · 2017-01-01 11:00:32Z

2

If you want to see all the rows that have the duplicates, not just one instance of each, you need to join your query with the original table.

SELECT t1.*
FROM mytable AS t1
JOIN (SELECT col3, col4
      FROM mytable
      GROUP BY col3, col4
      HAVING COUNT(*) > 1) AS t2
ON t1.col3 = t2.col3 AND t1.col4 = t2.col4
ORDER BY t1.col4, t1.col3;
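
A minimal, self-contained sketch of the join approach, in case it helps to see it on data. The table name dup_demo, the column types, and the sample rows are all made up for illustration; only the column names come from the question.

```sql
-- Hypothetical sample table; column types are assumptions.
CREATE TABLE dup_demo (
    id   INT PRIMARY KEY,
    col1 VARCHAR(20),
    col2 VARCHAR(20),
    col3 INT,
    col4 VARCHAR(20)
);

INSERT INTO dup_demo (id, col1, col2, col3, col4) VALUES
    (1, 'a', 'p', 43, 'val444'),
    (2, 'b', 'q', 43, 'val444'),  -- same (col3, col4) pair as id 1
    (3, 'c', 'r', 45, 'val444'),  -- this pair occurs only once
    (4, 'd', 's', 45, 'val555');  -- this pair occurs only once

-- The derived table t2 lists each duplicated pair once;
-- the join then pulls every row of dup_demo that carries such a pair.
SELECT t1.*
FROM dup_demo AS t1
JOIN (SELECT col3, col4
      FROM dup_demo
      GROUP BY col3, col4
      HAVING COUNT(*) > 1) AS t2
  ON t1.col3 = t2.col3 AND t1.col4 = t2.col4
ORDER BY t1.col4, t1.col3;
```

With these four rows, only the rows with id 1 and id 2 come back, because (43, 'val444') is the only pair that occurs more than once.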


4 Comments

Haradzieniec Over a year ago

Thanks. It took a while to test it, but it works. I would appreciate it if you could explain why the query in the question does something different. It groups, yes, but why does it display just the first row? Thanks a lot.

Barmar Over a year ago

Because that's what GROUP BY does: it combines all the rows that have the same values into a single row.

Barmar Over a year ago

You use GROUP BY when you want to get totals for a particular column. For example, if you want to count rows by date, you would use SELECT DATE(timestamp) as date, COUNT(*) ... GROUP BY date.

Barmar Over a year ago

You wouldn't expect to get multiple rows for each day if you did that, would you? Why then do you expect to get multiple rows in your query?
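
To make the comments above concrete, here is a small sketch of the count-by-date case. The table events_demo, its columns, and its rows are hypothetical and exist only for this illustration.

```sql
-- Hypothetical table for illustration; names and types are assumptions.
CREATE TABLE events_demo (
    id         INT PRIMARY KEY,
    created_at DATETIME
);

INSERT INTO events_demo (id, created_at) VALUES
    (1, '2017-01-01 09:00:00'),
    (2, '2017-01-01 17:30:00'),
    (3, '2017-01-02 08:15:00');

-- Three input rows, but GROUP BY emits one output row per distinct day.
SELECT DATE(created_at) AS event_day, COUNT(*) AS c
FROM events_demo
GROUP BY DATE(created_at)
ORDER BY event_day;
-- Returns two rows: 2017-01-01 with c = 2, and 2017-01-02 with c = 1.
```

The query in the question behaves the same way: each (col3, col4) group becomes a single row, and c reports how many rows were folded into it. Because that query also selects columns that are not in the GROUP BY, MySQL takes their values from an arbitrary row of each group when ONLY_FULL_GROUP_BY is disabled. With that mode enabled (the default since MySQL 5.7.5), the query is rejected instead.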

Gordon Linoff · Answer · 2017-01-01 15:31:34Z

1

Assuming the id is unique per row, an alternative method is:

select t.*
from mytable t
where exists (select 1
              from mytable t2
              where t2.col3 = t.col3 and t2.col4 = t.col4 and t2.id <> t.id
             );

The benefit of this method is that it can make very good use of an index on mytable(col3, col4).
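
A sketch of how that index could be created and checked, using a hypothetical table dup_demo_indexed so that it can be run in isolation. The index name, column types, and sample rows are assumptions, not taken from the question.

```sql
-- Hypothetical sample table; column types are assumptions.
CREATE TABLE dup_demo_indexed (
    id   INT PRIMARY KEY,
    col3 INT,
    col4 VARCHAR(20)
);

-- Composite index on the two columns compared in the subquery.
-- The index name is arbitrary.
CREATE INDEX idx_col3_col4 ON dup_demo_indexed (col3, col4);

INSERT INTO dup_demo_indexed (id, col3, col4) VALUES
    (1, 43, 'val444'),
    (2, 43, 'val444'),  -- same (col3, col4) pair as id 1
    (3, 45, 'val444');  -- this pair occurs only once

-- For each row t, look for a different row carrying the same pair.
SELECT t.*
FROM dup_demo_indexed AS t
WHERE EXISTS (SELECT 1
              FROM dup_demo_indexed AS t2
              WHERE t2.col3 = t.col3
                AND t2.col4 = t.col4
                AND t2.id <> t.id);

-- EXPLAIN reports which indexes the optimizer chose for the same query.
EXPLAIN
SELECT t.*
FROM dup_demo_indexed AS t
WHERE EXISTS (SELECT 1
              FROM dup_demo_indexed AS t2
              WHERE t2.col3 = t.col3
                AND t2.col4 = t.col4
                AND t2.id <> t.id);
```

The SELECT returns the rows with id 1 and id 2. Whether the optimizer actually picks idx_col3_col4 depends on the MySQL version and on table size, so the EXPLAIN output is worth checking on real data rather than on a three-row sample.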

