0

I'm having a problem with duplicates using mysql server with stuff function. The database contains few million records and using distinct is out of question (it seems that the query does not even process).

This is my database structure:

PersonID    Freetext    Importance
PersonID    Freetext    Importance
PersonID    Freetext    Importance
PersonID    Freetext    Importance
PersonID    Freetext    Importance

I have been using following structure for this query. The query works fine (regarding stuff function), but it returns duplicates. For example, if there are five ID's which are the same, the query returns five rows:

SELECT PersonID, Importance, Freetext = STUFF(
             (SELECT '~' + Freetext
              FROM TABLE t1
              WHERE t1.PersonID = t2.PersonID
              FOR XML PATH (''))
             , 1, 1, '') from TABLE t2
group By PersonID, Importance
order by Importance

How to avoid this?

I would like to have all the freetexts that belong to the same PersonID stuffed together to one single row, ordering the stuffing so that the row that has smallest importance would have it's freetext stuffed first.

3
  • 1
    If you have three freetexts for a person and they all have different importance, what sense does "Importance" make in the output? Which one do you want there? And is importance a number, a string, ... is higher more important or less important (think sev 1 vs. def con 5). Please add real sample data instead of five rows of column names, and show the exact output you expect. Commented Mar 25, 2019 at 15:58
  • 1
    STUFF won't be creating duplicates here, nor can it get rid of them (STUFF just removes the first '~' from the string). If you are getting duplicates that means your subquery is returning duplicates. Perhaps you need to include Importance in the WHERE of the correlated subquery? Commented Mar 25, 2019 at 15:59
  • Actually the importance itself does not matter in the output. Only thing the importance should effect is that the row which has the smallest importance number should be stuffed first. Commented Mar 25, 2019 at 15:59

1 Answer 1

2

Just guessing (can't be sure this gets the desired result without knowing that or the source data), but if you want FOR XML PATH to list the values in order of importance (smallest to largest), you need to move the ORDER BY:

SELECT PersonID, [Freetext] = STUFF(
       (SELECT '~' + Freetext
          FROM dbo.[TABLE] AS t1
          WHERE t1.PersonID = t2.PersonID
          ORDER BY Importance
          FOR XML PATH(''), TYPE
       ).value(N'./text()[1]', N'nvarchar(max)'), 1, 1, '') 
FROM dbo.[Table] AS t2
GROUP BY PersonID;

If you want to include importance somehow, you'll need to concatenate it with the freetext; it didn't make sense in the outer query or the group by (the group by importance was actually what was causing your duplicates).

If you need to prevent duplicate values for freetext, you'll need to add grouping within the subquery. But that gets complicated too; what if you have two identical values for freetext but they were entered with different importance? Do you discard the one with higher or lower importance?

Going back to the original problem, on SQL Server 2017 and above, you can use a much simpler (and more efficient) struct:

SELECT PersonID,  
       [Freetext] = STRING_AGG([Freetext],'~') 
                    WITHIN GROUP (ORDER BY Importance)
  FROM dbo.[Table]
  GROUP BY PersonID;
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.