1

I have a database table that stores the ids of employees and the projects they have previously worked on. Now I want to retrieve pairs of employees who have worked on the same projects, together with the number of projects each pair has in common. If I use a self-join, I get duplicate rows: each pair appears twice, once in each order.

SELECT DISTINCT ep1.employee_id, ep2.employee_id, COUNT(p.id)
FROM employee_project ep1, employee_project ep2, project p
WHERE ep1.project_id=ep2.project_id 
AND ep1.employee_id <> ep2.employee_id 
AND p.id=ep1.project_id
GROUP BY ep1.employee_id, ep2.employee_id

Result:

employee1 | employee2 | 5
employee2 | employee1 | 5

4 Answers

2

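Add ep1.employee_id > ep2.employee_id to the WHERE clause so each pair is returned only once, in one fixed order. Since it also rules out matching an employee with themselves, it can replace the <> condition.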
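A minimal sketch to check this, using Python's built-in sqlite3 module and a small hypothetical data set (employee 1 shares projects 10 and 20 with employee 2, and project 30 with employee 3). It runs the question's query (with the <> comparison and grouping by the pair only) next to the filtered version. It uses > rather than >=, so the condition on its own already excludes self-pairs:

```python
import sqlite3

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (id INTEGER PRIMARY KEY);
    CREATE TABLE employee_project (employee_id INTEGER, project_id INTEGER);
    INSERT INTO project VALUES (10), (20), (30);
    INSERT INTO employee_project VALUES
        (1, 10), (1, 20), (1, 30),
        (2, 10), (2, 20),
        (3, 30);
""")

# The question's self-join: every pair comes back twice, once per order.
mirrored = conn.execute("""
    SELECT ep1.employee_id, ep2.employee_id, COUNT(p.id)
    FROM employee_project ep1, employee_project ep2, project p
    WHERE ep1.project_id = ep2.project_id
      AND ep1.employee_id <> ep2.employee_id
      AND p.id = ep1.project_id
    GROUP BY ep1.employee_id, ep2.employee_id
    ORDER BY 1, 2
""").fetchall()

# Requiring ep1.employee_id > ep2.employee_id keeps exactly one order per pair.
unique_pairs = conn.execute("""
    SELECT ep1.employee_id, ep2.employee_id, COUNT(p.id)
    FROM employee_project ep1, employee_project ep2, project p
    WHERE ep1.project_id = ep2.project_id
      AND ep1.employee_id > ep2.employee_id
      AND p.id = ep1.project_id
    GROUP BY ep1.employee_id, ep2.employee_id
    ORDER BY 1, 2
""").fetchall()

print(mirrored)      # [(1, 2, 2), (1, 3, 1), (2, 1, 2), (3, 1, 1)]
print(unique_pairs)  # [(2, 1, 2), (3, 1, 1)]
```

Flipping the comparison to < puts the smaller id first instead; either way each unordered pair appears once.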

0

I would create a new table with a unique index over the columns that must be unique, then copy the rows from the old table into the new one, letting the duplicates be skipped (INSERT IGNORE turns the duplicate-key errors into warnings). Finally, I would drop (or rename) the old table and rename the new one to take its place. In MySQL, this would look like:

CREATE TABLE tmp LIKE mytable;
ALTER TABLE tmp ADD UNIQUE INDEX myindex (emp_name, emp_address, sex, marital_status);
INSERT IGNORE INTO tmp SELECT * FROM mytable;
DROP TABLE mytable;
RENAME TABLE tmp TO mytable;
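The same steps, sketched in SQLite through Python's sqlite3 module so they can be tried without a MySQL server. SQLite has no CREATE TABLE ... LIKE or RENAME TABLE, so the copy's columns are spelled out, ALTER TABLE ... RENAME TO is used instead, and INSERT OR IGNORE stands in for INSERT IGNORE. The rows are made-up sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT);
    INSERT INTO mytable VALUES
        ('Ann', '1 Main St', 'F', 'single'),
        ('Ann', '1 Main St', 'F', 'single'),
        ('Bob', '2 Oak Ave', 'M', 'married');

    -- No CREATE TABLE ... LIKE in SQLite, so the columns are repeated here.
    CREATE TABLE tmp (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT);
    CREATE UNIQUE INDEX myindex ON tmp (emp_name, emp_address, sex, marital_status);

    -- Rows that would violate the unique index are silently skipped.
    INSERT OR IGNORE INTO tmp SELECT * FROM mytable;

    DROP TABLE mytable;
    ALTER TABLE tmp RENAME TO mytable;
""")

rows = conn.execute("SELECT * FROM mytable ORDER BY emp_name").fetchall()
print(rows)  # [('Ann', '1 Main St', 'F', 'single'), ('Bob', '2 Oak Ave', 'M', 'married')]
```

One caveat that applies in MySQL as well: a unique index treats NULLs as distinct, so duplicate rows with a NULL in an indexed column are not removed this way.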

0

Another option is to "normalize" each pair so that the larger id always comes first, and group on that:

SELECT greatest(ep1.employee_id, ep2.employee_id), 
       least(ep1.employee_id, ep2.employee_id), 
       count(DISTINCT ep1.project_id)  -- (a,b) and (b,a) land in the same group, so count each project once
FROM employee_project ep1
  JOIN employee_project ep2 
    ON ep1.project_id=ep2.project_id 
   AND ep1.employee_id<>ep2.employee_id 
  JOIN project p ON p.id=ep1.project_id
GROUP BY greatest(ep1.employee_id, ep2.employee_id), 
         least(ep1.employee_id, ep2.employee_id)
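A minimal sketch of the normalized query, run through Python's sqlite3 module on hypothetical sample data. SQLite has no GREATEST/LEAST, so its multi-argument scalar MAX()/MIN() are used instead. Because both (a, b) and (b, a) collapse to the same normalized pair, every shared project shows up twice in its group; the extra COUNT(*) column makes that visible, while COUNT(DISTINCT ep1.project_id) gives the actual number of common projects:

```python
import sqlite3

# Hypothetical sample data: employees 1 and 2 share projects 10 and 20,
# employees 1 and 3 share project 30.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (id INTEGER PRIMARY KEY);
    CREATE TABLE employee_project (employee_id INTEGER, project_id INTEGER);
    INSERT INTO project VALUES (10), (20), (30);
    INSERT INTO employee_project VALUES
        (1, 10), (1, 20), (1, 30),
        (2, 10), (2, 20),
        (3, 30);
""")

pairs = conn.execute("""
    SELECT MAX(ep1.employee_id, ep2.employee_id),  -- scalar MAX plays GREATEST
           MIN(ep1.employee_id, ep2.employee_id),  -- scalar MIN plays LEAST
           COUNT(DISTINCT ep1.project_id),         -- projects in common
           COUNT(*)                                -- joined rows: twice as many
    FROM employee_project ep1
      JOIN employee_project ep2
        ON ep1.project_id = ep2.project_id
       AND ep1.employee_id <> ep2.employee_id
      JOIN project p ON p.id = ep1.project_id
    GROUP BY MAX(ep1.employee_id, ep2.employee_id),
             MIN(ep1.employee_id, ep2.employee_id)
    ORDER BY 1, 2
""").fetchall()

print(pairs)  # [(2, 1, 2, 4), (3, 1, 1, 2)]
```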

-1

If all four column values are duplicated, you can do this:

select distinct emp_name, emp_address, sex, marital_status
from YourTable

However, if marital_status can differ between duplicates and some other column decides which row to keep (for example, you want the latest record according to a create_date column), you can do this:

select emp_name, emp_address, sex, marital_status
from YourTable a
where not exists (select 1 
                   from YourTable b
                  where b.emp_name = a.emp_name and
                        b.emp_address = a.emp_address and
                        b.sex = a.sex and
                        b.create_date > a.create_date)
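A sketch of the latest-record query in SQLite via Python's sqlite3 module, with made-up rows where create_date is stored as ISO-8601 text (so string comparison follows date order). It compares a strict > against >=: with >=, every row finds itself in the subquery, so NOT EXISTS is never true and nothing comes back:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (
        emp_name TEXT, emp_address TEXT, sex TEXT,
        marital_status TEXT, create_date TEXT  -- ISO-8601 dates
    );
    INSERT INTO YourTable VALUES
        ('Ann', '1 Main St', 'F', 'single',  '2020-01-01'),
        ('Ann', '1 Main St', 'F', 'married', '2022-06-01'),
        ('Bob', '2 Oak Ave', 'M', 'single',  '2021-03-15');
""")

LATEST_SQL = """
    SELECT emp_name, emp_address, sex, marital_status
    FROM YourTable a
    WHERE NOT EXISTS (SELECT 1
                      FROM YourTable b
                      WHERE b.emp_name = a.emp_name
                        AND b.emp_address = a.emp_address
                        AND b.sex = a.sex
                        AND b.create_date {op} a.create_date)
    ORDER BY emp_name
"""

# Strict ">": a row survives only if no newer row exists for the same person.
latest = conn.execute(LATEST_SQL.format(op=">")).fetchall()
# ">=": each row matches itself in the subquery, so every row is filtered out.
with_ge = conn.execute(LATEST_SQL.format(op=">=")).fetchall()

print(latest)   # [('Ann', '1 Main St', 'F', 'married'), ('Bob', '2 Oak Ave', 'M', 'single')]
print(with_ge)  # []
```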

1 Comment

How is that in any way related to the question?
