
When I execute

select *, count(*) c 
FROM mytable 
GROUP BY col3, col4 
HAVING c > 1
order by col4, col3;

I expected to get every row whose (col3, col4) pair occurs at least twice in the table. Instead, the result contains only one row for some col3 values. Could anybody please explain why?

In other words, I'm trying to build a query that returns all rows for which the pair (col3, col4) occurs more than once.

An example of the unexpected result:

id  | col1  | col2   | col3 | col4   | c
123 | val1  | val123 | 43   | val444 | 2
456 | val14 | val52  | 45   | val444 | 2

The value 43 in col3 occurs only once in the result, although I would expect it twice. Otherwise this row should not be in the result at all.

Indeed,

select * from ukberu1m where col3 = 43 and col4 = 'val444';

returns two rows from the original table, but the result of the grouped query shows only one row for that pair, not two, for some reason.

  • That query should do what you want. Can you post some sample data that gets the wrong result? Commented Jan 1, 2017 at 10:41
  • You are probably selecting more non-aggregated columns than are listed in the GROUP BY clause; to get all columns, use a subquery. Commented Jan 1, 2017 at 10:42

2 Answers


If you want to see all the rows that have the duplicates, not just one instance of each, you need to join your query with the original table.

SELECT t1.*
FROM mytable AS t1
JOIN (SELECT col3, col4
      FROM mytable
      GROUP BY col3, col4
      HAVING COUNT(*) > 1) AS t2
ON t1.col3 = t2.col3 AND t1.col4 = t2.col4
ORDER BY t1.col4, t1.col3;

4 Comments

Thanks. It took a while to test it, but it works. I would appreciate it if you could explain why the query in the question behaves differently. It groups, yes, but why does it display just the first row? Thanks a lot.
Because that's what GROUP BY does: it combines all the rows that have the same values into a single row.
You use GROUP BY when you want to get totals for a particular column. Like if you want to count rows by date, you would use SELECT DATE(timestamp) as date, COUNT(*) ... GROUP BY date.
You wouldn't expect to get multiple rows for each day if you did that, would you? Why then do you expect to get multiple rows in your query?
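
To make the difference concrete, here is a minimal sketch that runs both queries against an in-memory SQLite database from Python. The sample rows and the choice of SQLite are assumptions made for illustration; only the table and column names come from the question, and other engines may treat the non-aggregated columns of `SELECT *` differently.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, col3 INTEGER, col4 TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, col3, col4) VALUES (?, ?, ?)",
    [(1, 43, "val444"), (2, 43, "val444"), (3, 45, "val444"),
     (4, 45, "val444"), (5, 50, "val999")],
)

# GROUP BY collapses each (col3, col4) group into a single output row.
grouped = conn.execute(
    "SELECT col3, col4, COUNT(*) AS c FROM mytable "
    "GROUP BY col3, col4 HAVING COUNT(*) > 1 "
    "ORDER BY col4, col3"
).fetchall()

# Joining the grouped result back to the table returns every member row.
joined = conn.execute(
    "SELECT t1.id, t1.col3, t1.col4 FROM mytable AS t1 "
    "JOIN (SELECT col3, col4 FROM mytable "
    "      GROUP BY col3, col4 HAVING COUNT(*) > 1) AS t2 "
    "ON t1.col3 = t2.col3 AND t1.col4 = t2.col4 "
    "ORDER BY t1.col4, t1.col3, t1.id"
).fetchall()

print(grouped)  # one row per duplicated pair
print(joined)   # all rows that belong to a duplicated pair
```

With this data the grouped query yields two rows (one per duplicated pair), while the join yields the four underlying rows.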

Assuming the id is unique per row, an alternative method is:

select t.*
from mytable t
where exists (select 1
              from mytable t2
              where t2.col3 = t.col3 and t2.col4 = t.col4 and t2.id <> t.id
             );

The advantage of this method is that it can make good use of an index on mytable(col3, col4).
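
As a rough check, the sketch below runs the EXISTS query in an in-memory SQLite database from Python, with a composite index in place. The sample rows, the index name `idx_col3_col4`, and the use of SQLite are assumptions for illustration; the query plan is printed without an expected value because its exact wording depends on the engine and version.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, col3 INTEGER, col4 TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, col3, col4) VALUES (?, ?, ?)",
    [(1, 43, "val444"), (2, 43, "val444"), (3, 45, "val444"),
     (4, 45, "val444"), (5, 50, "val999")],
)

# Composite index the correlated subquery can use (the index name is an assumption).
conn.execute("CREATE INDEX idx_col3_col4 ON mytable (col3, col4)")

exists_sql = (
    "SELECT t.id, t.col3, t.col4 FROM mytable AS t "
    "WHERE EXISTS (SELECT 1 FROM mytable AS t2 "
    "              WHERE t2.col3 = t.col3 AND t2.col4 = t.col4 "
    "                AND t2.id <> t.id) "
    "ORDER BY t.id"
)

# Rows for which another row with the same (col3, col4) pair exists.
rows = conn.execute(exists_sql).fetchall()
print(rows)

# Show how SQLite plans the query; the output format varies between versions.
for step in conn.execute("EXPLAIN QUERY PLAN " + exists_sql):
    print(step)
```

On this data the query returns the same four rows as the join-based answer; the row with the unique pair (50, val999) is excluded.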
