I have a MySQL table that looks like this:
(unique_id, uid_data1, uid_data2, sorting_data1, sorting_data2)

This table is used in a tool that did not support bidirectional relations until now, so the table contains data that looks like this (field order as in the line above):
(1, 1212, 2034, 1, 1)
(2, 2034, 1212, 1, 1)
(3, 4567, 9876, 1, 0)
(4, 9876, 4567, 0, 1)

The table also contains "single-directed" (one-way) relations, e.g.
(5, 5566, 8899, 1, 9)
=> no row exists for (?, 8899, 5566, 9, 1)

As the tool now supports bidirectional/symmetric relations, I would like to remove the duplicate data from the MySQL table. However, I'm having trouble finding an appropriate query to do this.
In the example above I would like to delete the rows with the unique_ids 2 and 4 (as their data is already stored in rows 1 and 3).

First, I tried to set up a SELECT statement to see which entries would be deleted.
I thought of a JOIN query:

SELECT x.uid, x.uid_link1, x.uid_link2, y.uid_link1 as 'uid_link2', y.uid_link2 as 'uid_link1'
FROM tx_sdfilmbase_hilfstab x
INNER JOIN tx_sdfilmbase_hilfstab y ON x.uid_link1=y.uid_link2 AND x.uid_link2=y.uid_link1
WHERE ???
ORDER BY x.uid_link1, x.uid_link2

However, I'm stuck at the point where I have to tell MySQL to select only one half of each mirrored pair of records.
Any suggestions on how to do this?

P.S. Deleting each single record manually in the table isn't an option, as the table contains several thousand rows ;-)

  • This isn't going to be syntactically accurate, but something like delete from tx_sdfilmbase_hilfstab where uid_link2 in (select uid_link1 from tx_sdfilmbase_hilfstab) might work... Commented Sep 22, 2012 at 19:44
  • but then I might be deleting rows with no double entry, like: (1122, 2233), (2233, 1122), (5566, 1122) => here both (2233, 1122) and (5566, 1122) would be deleted, although only (2233, 1122) should be deleted, as (5566, 1122) has no double entry Commented Sep 22, 2012 at 19:54

2 Answers

SELECT t.* FROM MyTable t
INNER JOIN MyTable tt
ON t.uid_data1 = tt.uid_data2 AND t.uid_data2 = tt.uid_data1 AND t.unique_id > tt.unique_id

This should find the "second" row of each pair (records 2 and 4 in your example).

If I got it right then

DELETE t FROM MyTable t
INNER JOIN MyTable tt
ON t.uid_data1 = tt.uid_data2 AND t.uid_data2 = tt.uid_data1 AND t.unique_id > tt.unique_id

should do the job
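
To check the join condition against the sample rows from the question, here is a minimal sketch in Python using the built-in sqlite3 module as a stand-in for MySQL. This is an assumption-laden test harness, not the production query: SQLite does not accept MySQL's multi-table DELETE t FROM ... JOIN form, so the sketch collects the matching ids with the SELECT and deletes them by id instead.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable ("
    "unique_id INTEGER PRIMARY KEY, uid_data1 INTEGER, uid_data2 INTEGER, "
    "sorting_data1 INTEGER, sorting_data2 INTEGER)"
)
# Sample rows taken from the question.
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1212, 2034, 1, 1),
        (2, 2034, 1212, 1, 1),
        (3, 4567, 9876, 1, 0),
        (4, 9876, 4567, 0, 1),
        (5, 5566, 8899, 1, 9),
    ],
)

# Same join condition as the SELECT above: t is the later row of each mirrored pair.
mirrored_ids = [
    row[0]
    for row in conn.execute(
        "SELECT t.unique_id FROM MyTable t "
        "INNER JOIN MyTable tt "
        "ON t.uid_data1 = tt.uid_data2 AND t.uid_data2 = tt.uid_data1 "
        "AND t.unique_id > tt.unique_id "
        "ORDER BY t.unique_id"
    )
]

# SQLite has no DELETE ... JOIN, so delete by the collected ids instead.
conn.executemany(
    "DELETE FROM MyTable WHERE unique_id = ?", [(i,) for i in mirrored_ids]
)

remaining_ids = [
    row[0] for row in conn.execute("SELECT unique_id FROM MyTable ORDER BY unique_id")
]
print(mirrored_ids, remaining_ids)  # prints [2, 4] [1, 3, 5]
```

Running the SELECT first and inspecting the ids before deleting anything is a cheap safety check on a table with several thousand rows.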

5 Comments

This is even better than having a WHERE clause. Thx!
This is a problem I've had to solve once or twice myself. :( :(
Hm... The SELECT statement works perfectly, but the DELETE doesn't seem to work; it reports a syntax error. However, according to the manual, JOINs should be allowed in DELETE statements.
Oops, fixed I think. I haven't got MySQL handy.
Uff right, didn't see that either - maybe a bit late already/sleeping time ;-)

So one row will be

uid_link1=1, uid_link2=9

and the other one

uid_link1=9, uid_link2=1

right?

What about

... WHERE x.uid_link1 < y.uid_link1 ...

However, this will not catch duplicates where uid_link1 = uid_link2.

edit: or you can use ... WHERE x.unique_id < y.unique_id
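
The difference between the two WHERE variants can be seen in a small sketch, again in Python with sqlite3 as a stand-in for MySQL. The table name hilfstab, the id column name uid (as in the question's query), and rows 6 and 7 are assumptions made up for illustration; rows 6 and 7 model a duplicated relation whose two link values are equal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced version of the question's table (assumption).
conn.execute(
    "CREATE TABLE hilfstab (uid INTEGER PRIMARY KEY, uid_link1 INTEGER, uid_link2 INTEGER)"
)
conn.executemany(
    "INSERT INTO hilfstab VALUES (?, ?, ?)",
    [
        (1, 1212, 2034),
        (2, 2034, 1212),
        (5, 5566, 8899),
        (6, 7777, 7777),  # hypothetical duplicate with uid_link1 = uid_link2
        (7, 7777, 7777),  # hypothetical duplicate with uid_link1 = uid_link2
    ],
)


def matched_uids(where_clause):
    """Return x.uid for every mirrored pair that passes the given WHERE clause."""
    sql = (
        "SELECT x.uid FROM hilfstab x INNER JOIN hilfstab y "
        "ON x.uid_link1 = y.uid_link2 AND x.uid_link2 = y.uid_link1 "
        "WHERE " + where_clause + " ORDER BY x.uid"
    )
    return [row[0] for row in conn.execute(sql)]


by_link = matched_uids("x.uid_link1 < y.uid_link1")  # compares link values
by_id = matched_uids("x.uid < y.uid")                # compares row ids
print(by_link, by_id)  # prints [1] [1, 6]
```

Comparing link values misses the pair 6/7 because neither value is smaller than the other, while comparing row ids picks exactly one row of every pair. Note that with `<` the selected row x is the one with the lower id, so this form returns the first row of each pair rather than the second.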

1 Comment

Nice, "WHERE x.unique_id < y.unique_id" seems to do the job. thanks a lot!
