I've added some data to my database and just found out that I have a lot of duplicates (with different keys, of course), and I want to merge them into a single record.

I'd like to do it within the SQL database itself. I don't want to truncate the table and insert the values again without duplicates, because the script is quite slow.

Here's a sample of my scenario:

Table track:

key |   artist  | title
----|-----------|--------
k1  |  artist1  | title1
----|-----------|--------
k2  |  artist1  | title1
----|-----------|--------
k3  |  artist1  | title1

Table chart:

trackKey | otherKey |  anotherKey  |  value
---------|----------|--------------|---------
k1       |   ok1    |      ak4     |    v1
---------|----------|--------------|---------
k3       |   ok2    |      ak2     |    v2
---------|----------|--------------|---------
k1       |   ok3    |      ak9     |    v2
---------|----------|--------------|---------
k2       |   ok4    |      ak1     |    v6

where chart.trackKey references track.key

The result that I'd like to achieve is:

Table track:

key |   artist  | title
----|-----------|--------
k1  |  artist1  | title1

Table chart:

trackKey | otherKey |  anotherKey  |  value
---------|----------|--------------|---------
k1       |   ok1    |      ak4     |    v1
---------|----------|--------------|---------
k1       |   ok2    |      ak2     |    v2
---------|----------|--------------|---------
k1       |   ok3    |      ak9     |    v2
---------|----------|--------------|---------
k1       |   ok4    |      ak1     |    v6

so that each duplicate of the same entry in track is merged into one row and the old keys in chart are updated with the only one that remained in the track table.

Is there any way to do this in SQL?

EDIT:

Solution #1 based on @popovitsj's answer

UPDATE chart c SET trackUri =
(WITH track_unique AS
(
    SELECT MIN(uri) AS key, artist, title, album. artwork FROM track
    GROUP BY artist, title
)
SELECT tu.key FROM chart c1
INNER JOIN track t ON c1.trackUri = t.key
INNER JOIN track_unique tu ON t.artist = tu.artist AND t.title = tu.title
WHERE c1.trackUri = c.trackUri and c1.countryId = c.countryId and c1.date = c.date);

returns

#1064 - Syntax error near 
'track_unique AS (
SELECT MIN(uri) AS key, artist, title, album. artwork FR' line 2 

Solution #2 based on @juergen d's answer

update chart
join track t1 on t1.uri = chart.trackUri
left join 
(
   select min(uri) as key
   from track 
   group by artist, title
) tmp_track on tmp_track.key = chart.trackUri
set trackkey = tmp_tbl.key
where chart.trackUri not in 
(
  select min(uri)
  from track
  group by artist, title
  having count(*) > 1
);

returns

#1064 - Syntax error near
   'key
   from track
   group by artist, title
) tmp_track on tmp_track.key = c' line 5 

I don't know what I'm doing wrong so I'm adding the schema definitions (taken from phpMyAdmin)

[Image: schema definitions from phpMyAdmin]

  • If you have to do this in the first place, it is definitely a code smell. Why are you getting duplicates? Commented Jun 28, 2014 at 10:51
  • Because the API that I'm using returns duplicates. Each track may have a different uri, depending on the country in which it is published. This means that track x has one uri for the Italian album and a different one for the USA album, but it's still the same track and, for what I need to do, those are considered duplicates. Commented Jun 28, 2014 at 11:00
  • This comment explains what's really going on. It isn't data mismanagement (aka code smell). It's that you are seeking a projection of the data onto a different space. "Projection" is one of the fundamental operations on relations, the basis of the entire relational system. Unfortunately, projection requires time, so your best answer is going to run a while. If you are the DBA, you might find it worthwhile to materialize the projection in a secondary table, but this carries baggage of its own. Commented Jun 28, 2014 at 11:53

2 Answers

1

The WITH clause gets the IDs you want to keep; the SELECT that follows it then matches those IDs to the chart's track ID.

I edited this answer based on your modification of my original answer. This answer assumes that chart(countryId, date) uniquely identifies a chart, and that tracks may be merged only if track(artist, title, album, artwork) is equal.

UPDATE chart c SET trackUri =
(WITH track_unique AS
(
    -- one surviving uri per group of identical tracks
    SELECT MIN(uri) AS min_uri, artist, title, album, artwork FROM track
    GROUP BY artist, title, album, artwork
)
SELECT tu.min_uri FROM chart c1
INNER JOIN track t ON c1.trackUri = t.uri
INNER JOIN track_unique tu
ON t.artist = tu.artist
AND t.title = tu.title
AND t.album = tu.album
AND t.artwork = tu.artwork
WHERE c1.trackUri = c.trackUri
AND c1.countryId = c.countryId
AND c1.date = c.date);

To delete the leftover duplicates after doing this update:

DELETE FROM track
WHERE uri NOT IN
    (SELECT min_uri FROM
        (SELECT MIN(uri) AS min_uri
         FROM track
         GROUP BY artist, title, album, artwork) AS kept);
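
To see the two steps (repoint chart, then delete the leftover tracks) on the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. It is an illustration of the idea rather than a drop-in MySQL script: the tables are simplified to the columns shown in the question, the key columns are named uri and trackUri as in the attempted solutions, and it assumes that (artist, title) alone decides whether two tracks are duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE track (uri TEXT PRIMARY KEY, artist TEXT, title TEXT);
    CREATE TABLE chart (trackUri TEXT REFERENCES track(uri),
                        otherKey TEXT, anotherKey TEXT, value TEXT);
    INSERT INTO track VALUES ('k1', 'artist1', 'title1'),
                             ('k2', 'artist1', 'title1'),
                             ('k3', 'artist1', 'title1');
    INSERT INTO chart VALUES ('k1', 'ok1', 'ak4', 'v1'),
                             ('k3', 'ok2', 'ak2', 'v2'),
                             ('k1', 'ok3', 'ak9', 'v2'),
                             ('k2', 'ok4', 'ak1', 'v6');
""")

# Step 1: point every chart row at the smallest uri among the tracks
# that share artist and title with the track it currently references.
conn.execute("""
    UPDATE chart
    SET trackUri = (
        SELECT MIN(keep.uri)
        FROM track AS dup
        JOIN track AS keep
          ON keep.artist = dup.artist AND keep.title = dup.title
        WHERE dup.uri = chart.trackUri
    )
""")

# Step 2: delete every track that is not the survivor of its group.
conn.execute("""
    DELETE FROM track
    WHERE uri NOT IN (SELECT MIN(uri) FROM track GROUP BY artist, title)
""")
conn.commit()

tracks = conn.execute("SELECT uri, artist, title FROM track").fetchall()
charts = conn.execute(
    "SELECT trackUri, otherKey FROM chart ORDER BY otherKey"
).fetchall()
print(tracks)  # [('k1', 'artist1', 'title1')]
print(charts)  # [('k1', 'ok1'), ('k1', 'ok2'), ('k1', 'ok3'), ('k1', 'ok4')]
```

Two caveats if you adapt this: a chart row whose trackUri matches no track would be set to NULL by step 1, and rows with NULL in a grouping column will not match on equality. On MySQL specifically, KEY is a reserved word (so an unquoted alias named key is a syntax error), and a subquery may not select directly from the table being updated or deleted from, which is why a derived table is usually wrapped around it.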

5 Comments

chart's key is formed by (trackKey, otherKey, anotherKey) and track's key is a URI. Is your code still valid?
Yeah, sure, that just makes the query a little bit longer.
I've edited the first comment while you posted yours, please check it up
I don't see why it would matter if the track's key is a URI. It's still comparable so MIN() should work.
It doesn't surprise me that that would give a syntax error. Try my revised answer.
0

If the duplicate values are exact duplicates, you could use

SELECT MIN(key),artist,title FROM track GROUP BY artist,title;

to get a duplicate-free version of the data in the track table. You could put this in a temporary table and swap the two tables over, or use your SQL client to download the data and re-import it. For safety's sake, I wouldn't try to do it all in a single statement.
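
The copy-and-swap idea could look roughly like the sketch below, again using Python's sqlite3 with an in-memory database so it can be run as-is. The table names track_dedup and track_old are made up for the illustration, the key column is named uri, and the k4 row is an extra hypothetical track added only to show that non-duplicates survive. The row-count check before the swap is one possible safety step, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE track (uri TEXT PRIMARY KEY, artist TEXT, title TEXT);
    INSERT INTO track VALUES ('k1', 'artist1', 'title1'),
                             ('k2', 'artist1', 'title1'),
                             ('k3', 'artist1', 'title1'),
                             ('k4', 'artist2', 'title2');  -- hypothetical non-duplicate

    -- Build the duplicate-free copy next to the original table.
    CREATE TABLE track_dedup AS
        SELECT MIN(uri) AS uri, artist, title
        FROM track
        GROUP BY artist, title;
""")

# Sanity check before swapping: the copy should hold exactly one row
# per distinct (artist, title) pair in the original.
before = conn.execute("SELECT COUNT(*) FROM track").fetchone()[0]
after = conn.execute("SELECT COUNT(*) FROM track_dedup").fetchone()[0]
groups = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT artist, title FROM track)"
).fetchone()[0]

if after == groups:
    # Keep the original around under another name instead of dropping it.
    conn.executescript("""
        ALTER TABLE track RENAME TO track_old;
        ALTER TABLE track_dedup RENAME TO track;
    """)

rows = conn.execute(
    "SELECT uri, artist, title FROM track ORDER BY uri"
).fetchall()
print(before, after, rows)
```

Note that CREATE TABLE ... AS SELECT does not copy the primary key or other constraints, so those would have to be re-created on the new table, and any chart rows still pointing at a removed uri need to be repointed separately before track_old is dropped.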
