2

Consider a column named ArticleCategories in Article table. The goal is to delete repeated numbers from ArticleCategories column for each row (.

ArticleCategories
------------
3193;2867;3193;
2871;2923;2923;
3278;3337;3337;
2878;2876;2878;
3720;3680;3680;

Any help on how I can get this done with SQL Server?

desired result

ArticleCategories
------------
3193;2867;
2871;2923;
3278;3337;
2878;2876;
3720;3680;

Thanks in advance if anyone could help.

4
  • possible duplicate of T-SQL: Opposite to string concatenation - how to split string into multiple records Commented Sep 10, 2013 at 4:52
  • 1
    God, when are programmer going to stop concatenating multiple values into a single column?!?! You should stop wasting time on *cleaning" this up - you should instead re-architect it to use proper, normalized table design so that you don't need such a kludge in the first place Commented Sep 10, 2013 at 5:13
  • @marc_s, it's been like this for years :-). Moving to a new platform soon, everything is to be redesigned next, hopefully... Commented Sep 10, 2013 at 5:38
  • @Ray see my answer stackoverflow.com/a/18711704/1230248 Commented Sep 10, 2013 at 6:07

7 Answers 7

3

First create the function SplitAndRemoveDuplicates which will split a given string by a delimiter, remove the duplicates, and return the string:

CREATE FUNCTION [dbo].[SplitAndRemoveDuplicates] (@sep VARCHAR(32), @s VARCHAR(MAX))

RETURNS VARCHAR(MAX)  
AS
BEGIN

   DECLARE @t TABLE (val VARCHAR(MAX)) 

   DECLARE @xml XML
   SET @xml = N'<root><r>' + REPLACE(@s, @sep, '</r><r>') + '</r></root>'

   INSERT INTO @t(val) SELECT r.value('.','VARCHAR(MAX)') as Item FROM @xml.nodes('//root/r') AS RECORDS(r)

   ;WITH cte
    AS (SELECT ROW_NUMBER() OVER (PARTITION BY val ORDER BY val desc) RN
    FROM  @t)
    DELETE FROM cte
    WHERE  RN > 1

    RETURN (SELECT val + ';' from @t WHERE val <> '' FOR XML PATH ('') )
END

Make use of the function to update the existing data:

UPDATE Article
SET ArticleCategories = (SELECT dbo.[SplitAndRemoveDuplicates](';', ArticleCategories))

Now you have a reusable function.

credit to:https://stackoverflow.com/a/314917/455770 and https://stackoverflow.com/a/3822833/455770

Sign up to request clarification or add additional context in comments.

Comments

2

Found a solution to my problem, hope it'd help the others as well.

CREATE FUNCTION dbo.DistinctList
(
@List VARCHAR(MAX),
@Delim CHAR
)
RETURNS
VARCHAR(MAX)
AS
BEGIN
DECLARE @ParsedList TABLE
(
Item VARCHAR(MAX)
)
DECLARE @list1 VARCHAR(MAX), @Pos INT, @rList VARCHAR(MAX)
SET @list = LTRIM(RTRIM(@list)) + @Delim
SET @pos = CHARINDEX(@delim, @list, 1)
WHILE @pos > 0
BEGIN
SET @list1 = LTRIM(RTRIM(LEFT(@list, @pos - 1)))
IF @list1 <> ''
INSERT INTO @ParsedList VALUES (CAST(@list1 AS VARCHAR(MAX)))
SET @list = SUBSTRING(@list, @pos+1, LEN(@list))
SET @pos = CHARINDEX(@delim, @list, 1)
END
SELECT @rlist = COALESCE(@rlist+',','') + item
FROM (SELECT DISTINCT Item FROM @ParsedList) t
RETURN @rlist
END
GO
SELECT dbo.DistinctList('342,34,456,34,3454,456,aa,bb,cc,aa',',') DistinctList
GO

Giving credit to the Author of the script here. http://blog.sqlauthority.com/2009/01/15/sql-server-remove-duplicate-entry-from-comma-delimited-string-udf/

Comments

1

without cursors nor while, is a combination of previous questions in this site before

  1. Split Columns
  2. Join The Rows Back

so the answer is create a string Split function

CREATE FUNCTION [dbo].[StringSplit]
(
  @delimited nvarchar(max),
  @delimiter nvarchar(100)     
) RETURNS @t TABLE
(
-- Id column can be commented out, not required for sql splitting string
  id int identity(1,1), -- I use this column for numbering splitted parts
  val nvarchar(max)
)
AS
BEGIN
  declare @xml xml
  set @xml = N'<root><r>' + replace(@delimited,@delimiter,'</r><r>') + '</r></root>'

  insert into @t(val)
  select 
    r.value('.','varchar(max)') as item
  from @xml.nodes('//root/r') as records(r)

  RETURN
END

The query looks like:

declare @Article  table
(
  ArticleCategories varchar(100)
)

insert into @Article values
('3193;2867;3193;'),
('2871;2923;2923;'),
('3278;3337;3337;'),
('2878;2876;2878;'),
('3720;3680;3680;')

;WITH DistinctArticles AS (
    SELECT DISTINCT ArticleCategories, Val
    FROM @article
    CROSS APPLY dbo.[StringSplit](ArticleCategories, ';')
    WHERE Val <> ''
),
Concatenated AS (
    SELECT 
        ArticleCategories,
        STUFF((
            SELECT '; ' + CAST([Val] AS VARCHAR(MAX)) 
            FROM DistinctArticles
            WHERE (ArticleCategories = Results.ArticleCategories) 
    FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)'),1,2,'') + ';' AS DistinctArticleCategories
    FROM DistinctArticles Results
    GROUP BY ArticleCategories
)
UPDATE @Article 
SET ArticleCategories = Concatenated.DistinctArticleCategories
FROM @Article a
INNER JOIN Concatenated ON a.ArticleCategories = Concatenated.ArticleCategories

2 Comments

You took the same approach as I did, upvoted for that. Mine is simpler... let me know what you think: stackoverflow.com/a/18711930/455770
@Rots I upvoted you too. I think that your will do better with a big tables.
0

try following solution

create a UDF first which is as below

alter function dbo.SplitString2(@inputStr varchar(1000),@del varchar(5))
RETURNS varchar(max)
As
BEGIN

DECLARE @t table(col1 varchar(100))
DECLARE @table table(col1 varchar(100))
DECLARE @ret varchar(max)
Set @ret = ''
INSERT INTO @t
select @inputStr

if CHARINDEX(@del,@inputStr,1) > 0
BEGIN
    ;WITH CTE1 as (
    select ltrim(rtrim(LEFT(col1,CHARINDEX(@del,col1,1)-1))) as col,RIGHT(col1,LEN(col1)-CHARINDEX(@del,col1,1)) as rem from @t
    union all
    select ltrim(rtrim(LEFT(rem,CHARINDEX(@del,rem,1)-1))) as col,RIGHT(rem,LEN(rem)-CHARINDEX(@del,rem,1))
    from CTE1 c
    where CHARINDEX(@del,rem,1)>0
    )

        INSERT INTO @table 
        select col from CTE1
        union all
        select rem from CTE1 where CHARINDEX(@del,rem,1)=0
    END
ELSE
BEGIN
    INSERT INTO @table 
    select col1 from @t
END

Set @ret = (Select distinct col1 + ';'  from @table for xml path(''))

return @ret

END

then you can run following select/update query as per you requirement

update query -- this will update records in your table

update ArticleCategory Set ArticleCategories = dbo.SplitString2(ArticleCategories, ';' )

select query -- this will select distinct record only

Select ArticleCategories , dbo.SplitString2(ArticleCategories, ';' ) from ArticleCategory

Comments

0

If it doesn't matter which rows must stay:

DELETE FROM Article WHERE (
    SELECT COUNT(*) FROM Article t WHERE
    t.ArticleCategories=Article.ArticleCategories AND t.%%physloc%%<Article.%%physloc%%
)>0;

This is how I would do it in SQLite, not sure if I matched syntax of MS SQL Server...

Comments

0

Please try following solution,its fully tested for situation like yours, I hope this will help you.

DECLARE @ArticleCategories varchar(500)
DECLARE Cursor_Article CURSOR FOR 
SELECT ArticleCategories from Article
OPEN Cursor_Article
FETCH NEXT FROM Cursor_Article INTO @ArticleCategories
WHILE @@FETCH_STATUS = 0
BEGIN

   DECLARE @individualNum varchar(500),@ArticleCategoriesNew varchar(500) ,@ArticleCategoriesRem varchar(500)
   SET @ArticleCategoriesNew='';
   SET @ArticleCategoriesRem=@ArticleCategories

   WHILE PATINDEX('%;%',@ArticleCategoriesRem) > 0
   BEGIN
       SET @individualNum = SUBSTRING(@ArticleCategoriesRem, 0, PATINDEX('%;%',@ArticleCategoriesRem)) 

       SET @ArticleCategoriesRem = SUBSTRING(@ArticleCategoriesRem, LEN(@individualNum + ';') + 1, LEN(@ArticleCategoriesRem))
       if(@ArticleCategoriesRem not like '%'+@individualNum+'%')
       set @ArticleCategoriesNew=CASE WHEN ISNULL(@ArticleCategoriesNew,'')='' THEN @individualNum ELSE @ArticleCategoriesNew+';'+@individualNum+';' END

   END
   --If ArticleCategories is unique then leave same condition else Fetch primary key in cursor and append in Where condition of Update query
   Update Article SET ArticleCategories=@ArticleCategoriesNew WHERE ArticleCategories=@ArticleCategories    

   FETCH NEXT FROM Cursor_Article INTO @ArticleCategories
END 
CLOSE Cursor_Article;
DEALLOCATE Cursor_Article;

Comments

-2
SELECT DISTINCT ArticleCategories FROM Article

OR

SELECT ArticleCategories 
FROM Article
GROUP BY ArticleCategories 

AND THIS IS FOR DELETE DUPLICATE VALUES

DELETE
FROM Article
WHERE ArticleCategories NOT IN
(
   SELECT MAX(ArticleCategories)
   FROM Article
   GROUP BY ArticleCategories
)

5 Comments

The goal is to delete repeated numbers from ArticleCategories column for each row. This answer doesn't delete them.
I don't think you're interpreting the question correctly. By my reading, the OP has individual values such as 3193;2867;3193; -- where the semicolons are actually part of the field values -- and (s)he wishes to write an UPDATE statement that will modify those values to remove duplicates -- resulting in values such as 3193;2867;.
Also, even by your apparent interpretation, your DELETE statement doesn't really make sense: it says to delete any rows from Article whose ArticleCategories has a value that doesn't appear in any Article rows' ArticleCategories. That will never delete anything.
Now you are answering the question, but your answer is incorrect I'm afraid.
@user2763202 Don't worry too much. This is a question and answer site, so the answers need to be accurate - don't take it personally.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.