I'm cleaning up some data and have several rows that contain repeated words within a string, for instance "concrete-concrete" or "art-art". I could use a CASE WHEN with LIKE to find these, but there are so many different repetitions that listing them all would take too long. Is there a SQL function for finding duplicated patterns like this?

Here's some sample data:

category

  1. concrete-concrete
  2. art-art
  3. concrete-art
  4. music-classical
  5. music-music-classical
  • Please provide sample data, that'd help. Commented Apr 27, 2022 at 13:11
  • Are the words always separated by a "-"? Commented Apr 27, 2022 at 13:21
  • Yes, the words are always separated by "-". Here's some sample data: category 1. concrete-concrete 2. art-art 3. concrete-art 4. music-classical 5. music-music-classical Commented Apr 27, 2022 at 13:22

2 Answers

The script below, developed on SQL Server 2016, may help with your task.

CREATE TABLE #temp(Id int,[StringValue] varchar(100))

INSERT INTO #temp VALUES
(1,'word1-art-word2-art'),
(2,'concrete-concrete-word1-word2'),
(3,'word1-word2-art-concrete'),
(4,'art-art-concrete-concrete')

SELECT Id --Gives the ID of string where repetitive word(s) exist
FROM #temp d
WHERE EXISTS(
              SELECT value 
              FROM STRING_SPLIT(d.[StringValue],'-') 
              WHERE value<>'' 
              GROUP BY value 
              HAVING COUNT(*)>1
            )

SELECT Id, --shows the repeated word(s) for each Id
  STUFF((
         SELECT CAST(', ' + value AS VARCHAR(MAX))
         FROM STRING_SPLIT(d.[StringValue],'-')
         WHERE value<>N''
         GROUP BY value HAVING COUNT(*)>1
         FOR XML PATH ('')), 1, 2, '') AS repeatingwords
FROM #temp d
WHERE EXISTS(SELECT value FROM STRING_SPLIT(d.[StringValue],'-') WHERE value<>N'' GROUP BY value HAVING COUNT(*)>1)

1 Comment

Thanks for responding. I want to achieve it from a Regex if possible.
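
On the regex idea: as far as I know, SQL Server 2016 has no built-in regular-expression functions, and LIKE only supports simple wildcards, so a backreference pattern is not available there. Purely as a sketch, and assuming a database whose regex engine supports backreferences (PostgreSQL is used here as an assumption, it is not mentioned in the question), the check could look like this. The inline VALUES list and the alias t are hypothetical stand-ins for the real table.

```sql
-- Assumption: PostgreSQL, where the ~ operator matches a regular expression
-- and \2 is a backreference to the second capturing group.
-- The VALUES list stands in for the real table (sample data from the question).
SELECT category
FROM (VALUES
        ('concrete-concrete'),
        ('art-art'),
        ('concrete-art'),
        ('music-classical'),
        ('music-music-classical')
     ) AS t(category)
-- (^|-)     start of the string or a separator
-- ([^-]+)   one whole word, captured as group 2
-- -(.*-)?   a separator, optionally followed by other words
-- \2(-|$)   the same word again, ending at a separator or the end of the string
WHERE category ~ '(^|-)([^-]+)-(.*-)?\2(-|$)';
```

On the sample data this should keep concrete-concrete, art-art and music-music-classical and drop the other two rows. Because each word has to be bounded by a separator or by the start or end of the string, a value such as art-artist is not treated as a repeat.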
I would find them using a HAVING clause and then delete them.

SELECT category, COUNT(*)
FROM users
GROUP BY category
HAVING COUNT(*) > 1
