How to get rid of duplicates in a SQL query

Question

I have a database table that stores employee ids and the projects each employee has worked on. I want to retrieve every pair of employees who have worked on the same projects, together with the number of projects the two have in common. With a self-join, however, I get duplicate rows, one for each ordering of the pair:

SELECT DISTINCT ep1.employee_id, ep2.employee_id, COUNT(p.id)
FROM employee_project ep1, employee_project ep2, project p
WHERE ep1.project_id=ep2.project_id 
AND ep1.employee_id <> ep2.employee_id
AND p.id = ep1.project_id
GROUP BY ep1.employee_id, ep2.employee_id

Result:

employee1 | employee2 | 5
employee2 | employee1 | 5

Andres · Accepted Answer · 2014-01-04 19:07:13Z

2

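Add ep1.employee_id >= ep2.employee_id to the WHERE clause. Together with the existing <> condition, this keeps only one ordering of each pair, so each pair appears once instead of once in each direction.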
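Here is a minimal sketch of this fix. It uses Python's built-in sqlite3 module as a stand-in for the asker's database. The schema and rows (employees 1, 2 and 3, projects 10, 11 and 12) are hypothetical. The query groups by the two employee ids only, so the count is the number of shared projects:

```python
import sqlite3


def shared_project_counts():
    """Self-join with an ordering condition so each employee pair appears once."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and data, for illustration only.
    conn.executescript("""
        CREATE TABLE project (id INTEGER PRIMARY KEY);
        CREATE TABLE employee_project (employee_id INTEGER, project_id INTEGER);
        INSERT INTO project (id) VALUES (10), (11), (12);
        INSERT INTO employee_project VALUES
            (1, 10), (1, 11), (1, 12),
            (2, 10), (2, 11),
            (3, 12);
    """)
    rows = conn.execute("""
        SELECT ep1.employee_id, ep2.employee_id, COUNT(p.id)
        FROM employee_project ep1, employee_project ep2, project p
        WHERE ep1.project_id = ep2.project_id
          AND ep1.employee_id <> ep2.employee_id
          AND ep1.employee_id >= ep2.employee_id  -- with <> this keeps only ep1 > ep2
          AND p.id = ep1.project_id
        GROUP BY ep1.employee_id, ep2.employee_id
        ORDER BY ep1.employee_id, ep2.employee_id
    """).fetchall()
    conn.close()
    return rows


print(shared_project_counts())  # [(2, 1, 2), (3, 1, 1)]
```

Employees 1 and 2 share two projects, and employees 1 and 3 share one. Neither pair is repeated in the opposite order.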

edited Jan 4, 2014 at 19:07

answered Jan 4, 2014 at 18:30

Andres

Noyan · Accepted Answer · 2014-01-04 18:37:56Z

0

I would create a new table with a unique index over the columns that must be unique, then copy the rows from the old table into the new one, ignoring the duplicate-row warnings. Finally, drop (or rename) the old table and replace it with the new one. In MySQL, this would look like:

CREATE TABLE tmp LIKE mytable;
ALTER TABLE tmp ADD UNIQUE INDEX myindex (emp_name, emp_address, sex, marital_status);
INSERT IGNORE INTO tmp SELECT * FROM mytable;
DROP TABLE mytable;
RENAME TABLE tmp TO mytable;
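You can also try this approach in SQLite through Python's sqlite3 module. This sketch rests on a few assumptions. SQLite has no CREATE TABLE ... LIKE, so the copy is declared by hand. INSERT OR IGNORE takes the place of MySQL's INSERT IGNORE. The table contents are hypothetical:

```python
import sqlite3


def dedupe_via_unique_index():
    """Copy rows into a uniquely indexed table, skipping duplicates, then swap tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical table containing one exact duplicate row.
        CREATE TABLE mytable (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT);
        INSERT INTO mytable VALUES
            ('Ann', '1 Main St', 'F', 'single'),
            ('Ann', '1 Main St', 'F', 'single'),
            ('Bob', '2 Oak Ave', 'M', 'married');

        -- Same structure, plus a unique index over the columns that must be unique.
        CREATE TABLE tmp (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT);
        CREATE UNIQUE INDEX myindex ON tmp (emp_name, emp_address, sex, marital_status);

        -- Rows that would violate the unique index are silently skipped.
        INSERT OR IGNORE INTO tmp SELECT * FROM mytable;

        -- Replace the old table with the deduplicated copy.
        DROP TABLE mytable;
        ALTER TABLE tmp RENAME TO mytable;
    """)
    rows = conn.execute("SELECT * FROM mytable ORDER BY emp_name").fetchall()
    conn.close()
    return rows


print(dedupe_via_unique_index())
# [('Ann', '1 Main St', 'F', 'single'), ('Bob', '2 Oak Ave', 'M', 'married')]
```

One caveat applies in both MySQL and SQLite. A unique index treats NULLs as distinct, so rows that differ only by NULLs in the indexed columns are not collapsed.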

answered Jan 4, 2014 at 18:37

Noyan

user330315 · Accepted Answer · 2014-01-04 19:14:06Z

0

Another option is to "normalize" each combination so that the same two employees always appear in the same order, and group on that:

SELECT greatest(ep1.employee_id, ep2.employee_id),
       least(ep1.employee_id, ep2.employee_id),
       count(DISTINCT ep1.project_id)
FROM employee_project ep1
  JOIN employee_project ep2
    ON ep1.project_id = ep2.project_id
   AND ep1.employee_id <> ep2.employee_id
  JOIN project p ON p.id = ep1.project_id
GROUP BY greatest(ep1.employee_id, ep2.employee_id),
         least(ep1.employee_id, ep2.employee_id)
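Here is a sketch of the normalized grouping in SQLite via Python's sqlite3 module, run on a hypothetical data set. SQLite does not provide GREATEST() and LEAST(), so its two-argument scalar max() and min() stand in for them. The self-join produces both orderings of every pair, so the sketch counts distinct project ids rather than rows. count(*) would report each pair's total twice:

```python
import sqlite3


def normalized_pair_counts():
    """Group on (larger id, smaller id) so both orderings of a pair collapse together."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and data, for illustration only.
    conn.executescript("""
        CREATE TABLE project (id INTEGER PRIMARY KEY);
        CREATE TABLE employee_project (employee_id INTEGER, project_id INTEGER);
        INSERT INTO project (id) VALUES (10), (11), (12);
        INSERT INTO employee_project VALUES
            (1, 10), (1, 11), (1, 12),
            (2, 10), (2, 11),
            (3, 12);
    """)
    # Two-argument max()/min() are SQLite's scalar equivalents of GREATEST()/LEAST().
    rows = conn.execute("""
        SELECT max(ep1.employee_id, ep2.employee_id),
               min(ep1.employee_id, ep2.employee_id),
               count(DISTINCT ep1.project_id)
        FROM employee_project ep1
          JOIN employee_project ep2
            ON ep1.project_id = ep2.project_id
           AND ep1.employee_id <> ep2.employee_id
          JOIN project p ON p.id = ep1.project_id
        GROUP BY max(ep1.employee_id, ep2.employee_id),
                 min(ep1.employee_id, ep2.employee_id)
        ORDER BY 1, 2
    """).fetchall()
    conn.close()
    return rows


print(normalized_pair_counts())  # [(2, 1, 2), (3, 1, 1)]
```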

answered Jan 4, 2014 at 19:14

user330315

Ben · Accepted Answer · 2014-01-04 18:34:58Z

-1

It looks like all four column values are duplicated, so you can do this:

select distinct emp_name, emp_address, sex, marital_status
from YourTable

However, if marital_status can differ and there is another column to choose by (for example, you want the latest record according to a create_date column), you can do this:

select emp_name, emp_address, sex, marital_status
from YourTable a
where not exists (select 1 
                   from YourTable b
                  where b.emp_name = a.emp_name and
                        b.emp_address = a.emp_address and
                        b.sex = a.sex and
                        b.create_date > a.create_date)
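Here is a sketch of the latest-record query in SQLite via Python's sqlite3 module. The rows are hypothetical, and create_date is assumed to be stored as ISO-8601 text, which sorts chronologically. The comparison has to be strict (>). With >=, every row would match itself in the subquery and nothing would be returned:

```python
import sqlite3


def latest_rows():
    """Keep each person's row for which no later create_date exists."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical data: Ann has an older and a newer record.
    conn.executescript("""
        CREATE TABLE YourTable (emp_name TEXT, emp_address TEXT, sex TEXT,
                                marital_status TEXT, create_date TEXT);
        INSERT INTO YourTable VALUES
            ('Ann', '1 Main St', 'F', 'single',  '2013-01-01'),
            ('Ann', '1 Main St', 'F', 'married', '2013-06-01'),
            ('Bob', '2 Oak Ave', 'M', 'single',  '2013-03-01');
    """)
    rows = conn.execute("""
        SELECT emp_name, emp_address, sex, marital_status
        FROM YourTable a
        WHERE NOT EXISTS (SELECT 1
                          FROM YourTable b
                          WHERE b.emp_name = a.emp_name
                            AND b.emp_address = a.emp_address
                            AND b.sex = a.sex
                            AND b.create_date > a.create_date)
        ORDER BY emp_name
    """).fetchall()
    conn.close()
    return rows


print(latest_rows())
# [('Ann', '1 Main St', 'F', 'married'), ('Bob', '2 Oak Ave', 'M', 'single')]
```

If two rows for the same person share the latest create_date, both are kept.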

edited Jan 4, 2014 at 18:34

Ben

answered Jan 4, 2014 at 18:32

Noyan

1 Comment

user330315 Over a year ago

How is that in any way related to the question?