Probably it's a duplicate, but I couldn't find a solution.

Requirement:

I have the below strings:

Heelloo
Heeelloo
Heeeelloo
Heeeeeelloo
Heeeeeeeelloo
.
.
.
Heeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeelloo

Expected output: Hello

What is the best way to achieve this in SQL?

Version I am using:

Microsoft SQL Server 2012 - 10.0.7365.0 (X64) Jul 28 2015 00:39:54 Copyright (c) 
Microsoft Corporation Parallel Data Warehouse (64-bit) on Windows NT 6.2 <X64> 
(Build 9200: )
  • So it's always multiple e's to replace with a single e? Commented Jan 14, 2016 at 14:51
  • Yes that's the requirement more or less. Commented Jan 14, 2016 at 14:52
  • Possible duplicate of Removing repeated duplicated characters Commented Jan 14, 2016 at 14:53
  • Nice challenge! You would need an approach that recognises ee and oo as errors but not ll. Because language is so complex I'm not sure SQL Server is the best tool for the job. Have you considered writing a small app to loop over and spell check the field? Commented Jan 14, 2016 at 14:57
  • Please avoid editing your question to ask more or different questions. It gets too chaotic that way. Commented Jan 14, 2016 at 16:23

3 Answers

There is a nice trick for removing such duplicates for a single letter:

select replace(replace(replace(col, 'e', '<>'
                              ), '><', ''
                      ), '<>', 'e'
              )

This does require two characters ("<" and ">") that are not in the string (or more specifically, not in the string next to each other). The particular characters are not important.

How does this work?

Heeello
H<><><>llo
H<>llo
Hello
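
To watch the intermediate strings appear, the three replacements can be run one at a time. This is only a sketch: the variables @s and @t and the column aliases are made up for illustration, and it assumes '<' and '>' never occur in the data. The second query nests the same pattern once more for 'o', since the strings in the question also end in "oo".

```sql
-- Hypothetical test value; @s is not part of the original question.
DECLARE @s varchar(100) = 'Heeello';

SELECT
    REPLACE(@s, 'e', '<>')                                        AS step1,  -- H<><><>llo
    REPLACE(REPLACE(@s, 'e', '<>'), '><', '')                     AS step2,  -- H<>llo
    REPLACE(REPLACE(REPLACE(@s, 'e', '<>'), '><', ''), '<>', 'e') AS step3;  -- Hello

-- Collapsing a second letter: wrap the 'e' result in the same pattern for 'o'.
DECLARE @t varchar(100) = 'Heeelloo';  -- hypothetical test value

SELECT REPLACE(REPLACE(REPLACE(
           REPLACE(REPLACE(REPLACE(@t, 'e', '<>'), '><', ''), '<>', 'e'),
           'o', '<>'), '><', ''), '<>', 'o') AS cleaned;  -- Hello
```

Under a case-insensitive collation, REPLACE would also match an uppercase 'E' and restore it as lowercase 'e', so this sketch is best suited to data where that does not matter.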

6 Comments

Wow!! Awesome answer!
Gordon - I am really sorry but I had an additional requirement here and I forgot to ask it in the question. Can I edit the current question please? (I would have to unmark your reply as an answer).
@SouravA . . . Your original question was quite clear. You should actually ask the revised question as a new question. Changing the question invalidates answers, which draws downvotes. Hence, it is rude.
Gordon - point taken. I will edit it back to original and mark this as an answer.
stackoverflow.com/q/34805863/2993606 Here is the link to the new question.

Based on "T-SQL String Manipulation Tips and Techniques, Part 1", especially the section "Replacing Multiple Contiguous Spaces With a Single Space", and on an idea from Peter Larsson, a SQL Server MVP:

Then, the solution involves three steps (assuming the token is ~):

  1. Replace in @str each occurrence of ' ' (space) with '~ ' (token plus space).
  2. Replace in the result of the previous step each occurrence of ' ~' (space plus token) with '' (an empty string).
  3. Replace in the result of the previous step each occurrence of '~ ' (token plus space) with ' ' (space).
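
Applied literally to spaces, the three quoted steps might look like the sketch below. The value assigned to @str is a made-up example, and '~' is assumed not to occur in the data.

```sql
-- Hypothetical input with runs of four and two spaces.
DECLARE @str varchar(100) = 'a    b  c';

SELECT REPLACE(            -- step 3: '~ ' -> ' '
         REPLACE(          -- step 2: ' ~' -> ''
           REPLACE(@str, ' ', '~ '),  -- step 1: ' ' -> '~ '
           ' ~', ''),
         '~ ', ' ') AS single_spaced;  -- a b c
```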

The same idea applied to the letter e:

CREATE TABLE #tab(val NVARCHAR(100));

INSERT INTO #tab
SELECT 'Hello'
UNION ALL SELECT 'Heello'
UNION ALL SELECT 'Heeello'
UNION ALL SELECT 'Heeeello'
UNION ALL SELECT 'Heeeeeello'
UNION ALL SELECT 'Heeeeeeeello'
UNION ALL SELECT 'Heeeeeeeeeello';

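-- Version for one vowel (it can be extended to handle others if needed).
-- Note: each block of 8 consecutive e's is first swapped for '^^' and
-- restored as 'ee' at the end, so runs of 8 or more keep extra e's.
SELECT val,
       cleaned = REPLACE(
                   REPLACE(
                     REPLACE(
                       REPLACE(
                         REPLACE(val, REPLICATE('e', 8), '^^'),
                         'e', '~ '),
                       ' ~', ''),
                     '~ ', 'e'),
                   '^^', 'ee')
FROM #tab;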

LiveDemo

Output:

╔════════════════╦═════════╗
║      val       ║ cleaned ║
╠════════════════╬═════════╣
║ Hello          ║ Hello   ║
║ Heello         ║ Hello   ║
║ Heeello        ║ Hello   ║
║ Heeeello       ║ Hello   ║
║ Heeeeeello     ║ Hello   ║
║ Heeeeeeeello   ║ Heello  ║
║ Heeeeeeeeeello ║ Heeello ║
╚════════════════╩═════════╝

1 Comment

I upvoted it. If you copy this answer to that question, I will mark it as the answer. Gordon got there first, so I marked his as the answer here.

Try this user defined function:

CREATE FUNCTION dbo.TrimDuplicates(@String varchar(max))
RETURNS varchar(max)
AS
BEGIN
    -- Replace 'ee' with 'e' repeatedly until no adjacent pair of e's remains.
    WHILE CHARINDEX('ee', @String) > 0
        SET @String = REPLACE(@String, 'ee', 'e');
    -- Do the same for o's.
    WHILE CHARINDEX('oo', @String) > 0
        SET @String = REPLACE(@String, 'oo', 'o');
    RETURN @String;
END

Example Usage:

select dbo.TrimDuplicates ('Heeeeeeeelloo')

returns Hello
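
If more letters need the same treatment, the loop could take the character as a parameter instead of hard-coding 'ee' and 'oo'. The following is a rough sketch only: the name dbo.CollapseRepeats and the @Ch parameter are hypothetical, and it is worth checking that scalar user-defined functions are available on your platform before relying on it.

```sql
-- Hypothetical generalisation: the function name and the @Ch parameter are
-- illustrative and not part of the original answer.
CREATE FUNCTION dbo.CollapseRepeats(@String varchar(max), @Ch varchar(1))
RETURNS varchar(max)
AS
BEGIN
    -- Replace each doubled character with a single one until no pair is left.
    WHILE CHARINDEX(@Ch + @Ch, @String) > 0
        SET @String = REPLACE(@String, @Ch + @Ch, @Ch);
    RETURN @String;
END
GO  -- batch separator understood by SSMS and sqlcmd, not a T-SQL statement

-- Collapse e's first, then o's.
SELECT dbo.CollapseRepeats(dbo.CollapseRepeats('Heeeeeeeelloo', 'e'), 'o');  -- Hello
```

Each pass of REPLACE roughly halves a run, so even very long runs finish in a handful of iterations.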
