SQL - Find duplicate values and remove in a field

Question

Consider a column named ArticleCategories in Article table. The goal is to delete repeated numbers from ArticleCategories column for each row (.

ArticleCategories
------------
3193;2867;3193;
2871;2923;2923;
3278;3337;3337;
2878;2876;2878;
3720;3680;3680;

Any help on how I can get this done with SQL Server?

desired result

ArticleCategories
------------
3193;2867;
2871;2923;
3278;3337;
2878;2876;
3720;3680;

Thanks in advance if anyone could help.

possible duplicate of T-SQL: Opposite to string concatenation - how to split string into multiple records — Tassadaque
– Tassadaque, Commented Sep 10, 2013 at 4:52
God, when are programmer going to stop concatenating multiple values into a single column?!?! You should stop wasting time on *cleaning" this up - you should instead re-architect it to use proper, normalized table design so that you don't need such a kludge in the first place — marc_s
– marc_s, Commented Sep 10, 2013 at 5:13
@marc_s, it's been like this for years :-). Moving to a new platform soon, everything is to be redesigned next, hopefully... — Ray
– Ray, Commented Sep 10, 2013 at 5:38

Community · Accepted Answer · 2017-05-23 11:56:34Z

First create the function SplitAndRemoveDuplicates which will split a given string by a delimiter, remove the duplicates, and return the string:

CREATE FUNCTION [dbo].[SplitAndRemoveDuplicates] (@sep VARCHAR(32), @s VARCHAR(MAX))

RETURNS VARCHAR(MAX)  
AS
BEGIN

   DECLARE @t TABLE (val VARCHAR(MAX)) 

   DECLARE @xml XML
   SET @xml = N'<root><r>' + REPLACE(@s, @sep, '</r><r>') + '</r></root>'

   INSERT INTO @t(val) SELECT r.value('.','VARCHAR(MAX)') as Item FROM @xml.nodes('//root/r') AS RECORDS(r)

   ;WITH cte
    AS (SELECT ROW_NUMBER() OVER (PARTITION BY val ORDER BY val desc) RN
    FROM  @t)
    DELETE FROM cte
    WHERE  RN > 1

    RETURN (SELECT val + ';' from @t WHERE val <> '' FOR XML PATH ('') )
END

Make use of the function to update the existing data:

UPDATE Article
SET ArticleCategories = (SELECT dbo.[SplitAndRemoveDuplicates](';', ArticleCategories))

Now you have a reusable function.

credit to:https://stackoverflow.com/a/314917/455770 and https://stackoverflow.com/a/3822833/455770

Ray · Accepted Answer · 2013-09-10 05:34:41Z

Found a solution to my problem, hope it'd help the others as well.

CREATE FUNCTION dbo.DistinctList
(
@List VARCHAR(MAX),
@Delim CHAR
)
RETURNS
VARCHAR(MAX)
AS
BEGIN
DECLARE @ParsedList TABLE
(
Item VARCHAR(MAX)
)
DECLARE @list1 VARCHAR(MAX), @Pos INT, @rList VARCHAR(MAX)
SET @list = LTRIM(RTRIM(@list)) + @Delim
SET @pos = CHARINDEX(@delim, @list, 1)
WHILE @pos > 0
BEGIN
SET @list1 = LTRIM(RTRIM(LEFT(@list, @pos - 1)))
IF @list1 <> ''
INSERT INTO @ParsedList VALUES (CAST(@list1 AS VARCHAR(MAX)))
SET @list = SUBSTRING(@list, @pos+1, LEN(@list))
SET @pos = CHARINDEX(@delim, @list, 1)
END
SELECT @rlist = COALESCE(@rlist+',','') + item
FROM (SELECT DISTINCT Item FROM @ParsedList) t
RETURN @rlist
END
GO
SELECT dbo.DistinctList('342,34,456,34,3454,456,aa,bb,cc,aa',',') DistinctList
GO

Giving credit to the Author of the script here. http://blog.sqlauthority.com/2009/01/15/sql-server-remove-duplicate-entry-from-comma-delimited-string-udf/

Community · Accepted Answer · 2017-05-23 12:20:19Z

1

without cursors nor while, is a combination of previous questions in this site before

so the answer is create a string Split function

CREATE FUNCTION [dbo].[StringSplit]
(
  @delimited nvarchar(max),
  @delimiter nvarchar(100)     
) RETURNS @t TABLE
(
-- Id column can be commented out, not required for sql splitting string
  id int identity(1,1), -- I use this column for numbering splitted parts
  val nvarchar(max)
)
AS
BEGIN
  declare @xml xml
  set @xml = N'<root><r>' + replace(@delimited,@delimiter,'</r><r>') + '</r></root>'

  insert into @t(val)
  select 
    r.value('.','varchar(max)') as item
  from @xml.nodes('//root/r') as records(r)

  RETURN
END

The query looks like:

declare @Article  table
(
  ArticleCategories varchar(100)
)

insert into @Article values
('3193;2867;3193;'),
('2871;2923;2923;'),
('3278;3337;3337;'),
('2878;2876;2878;'),
('3720;3680;3680;')

;WITH DistinctArticles AS (
    SELECT DISTINCT ArticleCategories, Val
    FROM @article
    CROSS APPLY dbo.[StringSplit](ArticleCategories, ';')
    WHERE Val <> ''
),
Concatenated AS (
    SELECT 
        ArticleCategories,
        STUFF((
            SELECT '; ' + CAST([Val] AS VARCHAR(MAX)) 
            FROM DistinctArticles
            WHERE (ArticleCategories = Results.ArticleCategories) 
    FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)'),1,2,'') + ';' AS DistinctArticleCategories
    FROM DistinctArticles Results
    GROUP BY ArticleCategories
)
UPDATE @Article 
SET ArticleCategories = Concatenated.DistinctArticleCategories
FROM @Article a
INNER JOIN Concatenated ON a.ArticleCategories = Concatenated.ArticleCategories

edited May 23, 2017 at 12:20

CommunityBot

11 silver badge

answered Sep 10, 2013 at 6:04

Luis LL

2,9932 gold badges21 silver badges22 bronze badges

2 Comments

Rots Over a year ago

You took the same approach as I did, upvoted for that. Mine is simpler... let me know what you think: stackoverflow.com/a/18711930/455770

Luis LL Over a year ago

@Rots I upvoted you too. I think that your will do better with a big tables.

Dhaval · Accepted Answer · 2013-09-10 06:03:10Z

try following solution

create a UDF first which is as below

alter function dbo.SplitString2(@inputStr varchar(1000),@del varchar(5))
RETURNS varchar(max)
As
BEGIN

DECLARE @t table(col1 varchar(100))
DECLARE @table table(col1 varchar(100))
DECLARE @ret varchar(max)
Set @ret = ''
INSERT INTO @t
select @inputStr

if CHARINDEX(@del,@inputStr,1) > 0
BEGIN
    ;WITH CTE1 as (
    select ltrim(rtrim(LEFT(col1,CHARINDEX(@del,col1,1)-1))) as col,RIGHT(col1,LEN(col1)-CHARINDEX(@del,col1,1)) as rem from @t
    union all
    select ltrim(rtrim(LEFT(rem,CHARINDEX(@del,rem,1)-1))) as col,RIGHT(rem,LEN(rem)-CHARINDEX(@del,rem,1))
    from CTE1 c
    where CHARINDEX(@del,rem,1)>0
    )

        INSERT INTO @table 
        select col from CTE1
        union all
        select rem from CTE1 where CHARINDEX(@del,rem,1)=0
    END
ELSE
BEGIN
    INSERT INTO @table 
    select col1 from @t
END

Set @ret = (Select distinct col1 + ';'  from @table for xml path(''))

return @ret

END

then you can run following select/update query as per you requirement

update query -- this will update records in your table

update ArticleCategory Set ArticleCategories = dbo.SplitString2(ArticleCategories, ';' )

select query -- this will select distinct record only

Select ArticleCategories , dbo.SplitString2(ArticleCategories, ';' ) from ArticleCategory

LS_ᴅᴇᴠ · Accepted Answer · 2013-09-10 09:50:49Z

0

If it doesn't matter which rows must stay:

DELETE FROM Article WHERE (
    SELECT COUNT(*) FROM Article t WHERE
    t.ArticleCategories=Article.ArticleCategories AND t.%%physloc%%<Article.%%physloc%%
)>0;

This is how I would do it in SQLite, not sure if I matched syntax of MS SQL Server...

answered Sep 10, 2013 at 9:50

LS_ᴅᴇᴠ

11.2k1 gold badge27 silver badges46 bronze badges

Comments

Aamir Shahzad · Accepted Answer · 2014-01-24 15:47:07Z

Please try following solution,its fully tested for situation like yours, I hope this will help you.

DECLARE @ArticleCategories varchar(500)
DECLARE Cursor_Article CURSOR FOR 
SELECT ArticleCategories from Article
OPEN Cursor_Article
FETCH NEXT FROM Cursor_Article INTO @ArticleCategories
WHILE @@FETCH_STATUS = 0
BEGIN

   DECLARE @individualNum varchar(500),@ArticleCategoriesNew varchar(500) ,@ArticleCategoriesRem varchar(500)
   SET @ArticleCategoriesNew='';
   SET @ArticleCategoriesRem=@ArticleCategories

   WHILE PATINDEX('%;%',@ArticleCategoriesRem) > 0
   BEGIN
       SET @individualNum = SUBSTRING(@ArticleCategoriesRem, 0, PATINDEX('%;%',@ArticleCategoriesRem)) 

       SET @ArticleCategoriesRem = SUBSTRING(@ArticleCategoriesRem, LEN(@individualNum + ';') + 1, LEN(@ArticleCategoriesRem))
       if(@ArticleCategoriesRem not like '%'+@individualNum+'%')
       set @ArticleCategoriesNew=CASE WHEN ISNULL(@ArticleCategoriesNew,'')='' THEN @individualNum ELSE @ArticleCategoriesNew+';'+@individualNum+';' END

   END
   --If ArticleCategories is unique then leave same condition else Fetch primary key in cursor and append in Where condition of Update query
   Update Article SET ArticleCategories=@ArticleCategoriesNew WHERE ArticleCategories=@ArticleCategories    

   FETCH NEXT FROM Cursor_Article INTO @ArticleCategories
END 
CLOSE Cursor_Article;
DEALLOCATE Cursor_Article;

user2763202 · Accepted Answer · 2013-09-10 04:48:13Z

-2

SELECT DISTINCT ArticleCategories FROM Article

OR

SELECT ArticleCategories 
FROM Article
GROUP BY ArticleCategories

AND THIS IS FOR DELETE DUPLICATE VALUES

DELETE
FROM Article
WHERE ArticleCategories NOT IN
(
   SELECT MAX(ArticleCategories)
   FROM Article
   GROUP BY ArticleCategories
)

edited Sep 10, 2013 at 4:48

answered Sep 10, 2013 at 4:40

user2763202

12 bronze badges

5 Comments

Rots Over a year ago

The goal is to delete repeated numbers from ArticleCategories column for each row. This answer doesn't delete them.

ruakh Over a year ago

I don't think you're interpreting the question correctly. By my reading, the OP has individual values such as 3193;2867;3193; -- where the semicolons are actually part of the field values -- and (s)he wishes to write an UPDATE statement that will modify those values to remove duplicates -- resulting in values such as 3193;2867;.

ruakh Over a year ago

Also, even by your apparent interpretation, your DELETE statement doesn't really make sense: it says to delete any rows from Article whose ArticleCategories has a value that doesn't appear in any Article rows' ArticleCategories. That will never delete anything.

Rots Over a year ago

Now you are answering the question, but your answer is incorrect I'm afraid.

Rots Over a year ago

@user2763202 Don't worry too much. This is a question and answer site, so the answers need to be accurate - don't take it personally.

Collectives™ on Stack Overflow

SQL - Find duplicate values and remove in a field

7 Answers 7

Comments

Comments

2 Comments

Comments

Comments

Comments

5 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

7 Answers 7

Comments

Comments

2 Comments

Comments

Comments

Comments

5 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related