4

I have a table with a description column that contains some HTML.

That HTML is then passed to a page and rendered. But sometimes it contains comments, and some of those comments are never closed. I need to clean this data.

I've found them with

select * from table where description like '%<!--%'

I don't think it's correct, but I have no idea how to do it better.

Can you suggest a solution to my problem?

Edit
Here is an example:

<div class="text"> some data </div>
<ul><!-- Some comment -->
  <li>test</li>
  <li>test2</li> <!-- Some comment
</ul>

Regards, Dmitry.

8
  • If your code works, why do you want to change it? Commented Jun 3, 2013 at 7:38
  • You'd be better off doing that job in your program rather than in SQL Server. Otherwise, you could always write a CLR method. Commented Jun 3, 2013 at 7:38
  • Your question is "how to clean", not "how to find", right? Commented Jun 3, 2013 at 7:39
  • The problem is that I've got about 16000 rows in that table, so a CLR method may be a very resource-expensive solution. @Andrey Gordeev, the problem is that when a comment tag is left open, nothing after that tag is rendered on my page. Commented Jun 3, 2013 at 7:42
  • @Serge, my question is how to find them and clean them. Commented Jun 3, 2013 at 7:43

2 Answers

2

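Try this one. It cuts each description into fragments at every '<', drops the fragments that start with '<!--', and joins the rest back together with FOR XML PATH: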

Query:

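DECLARE @temp TABLE
(
      id INT IDENTITY(1,1)
    , [description] NVARCHAR(MAX)
)

INSERT INTO @temp ([description])
VALUES ('
<div class="text"> some data <!--test</div>
<ul>
  <li>test</li>
  <li>test2</li> <!-- Some comment
</ul>')

-- Cut each description into fragments, each starting at a '<'.
-- [master].dbo.spt_values with type 'p' serves as a numbers table; it only
-- goes up to 2047, so longer descriptions need a larger numbers table.
-- Any text before the first '<' is not kept.
;WITH cte AS 
(
    SELECT t.id, t.number, t.token
    FROM (
        SELECT 
              t.id
            , n.number
            , token = 
                SUBSTRING(
                      t.[description]
                    , n.number
                    -- up to the next '<'; the last fragment runs to the end of the string
                    , COALESCE(
                          NULLIF(CHARINDEX('<', t.[description], n.number + 1), 0) - n.number
                        , DATALENGTH(t.[description])))
        FROM @temp t
        CROSS JOIN [master].dbo.spt_values n
        WHERE n.[type] = 'p'
            AND n.number <= LEN(t.[description]) - 1
            AND SUBSTRING(t.[description], n.number, 1) = '<'
    ) t
    -- Drop every fragment that opens a comment
    WHERE t.token NOT LIKE '<!--%'
)
-- Glue the remaining fragments back together in their original order
UPDATE t
SET [description] = (
    SELECT c.token
    FROM cte c
    WHERE c.id = t.id
    ORDER BY c.number
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)')
FROM @temp t

SELECT *
FROM @temp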

Results:

<div class="text"> some data </div>
<ul>
  <li>test</li>
  <li>test2</li> 
 </ul>

1 Comment

Well, thank you, it works fine, but it only removes a comment up to the nearest '<'. I think it could be dangerous to leave tags that were commented out. For example, if I had a ul with 5 li items and the 5th was commented out, and the comment was accidentally left unclosed, the unwanted 5th li item would be shown.
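A sketch of one way to handle the case raised in this comment, assuming that everything inside a comment, tags included, should disappear rather than be exposed. It repeatedly removes each complete comment, from '<!--' up to the first '-->' that follows it, and then cuts off any remaining unclosed '<!--' together with everything after it. The table variable and its sample row are hypothetical stand-ins for the real table.

```sql
-- Hypothetical sample data standing in for the real table
DECLARE @temp TABLE (id INT IDENTITY(1,1), [description] NVARCHAR(MAX));

INSERT INTO @temp ([description])
VALUES (N'<ul><!-- Some comment -->
  <li>test</li>
  <!-- <li>test3</li> -->
  <li>test2</li> <!-- Some comment
</ul>');

-- Each pass removes, per row, the first '<!--' together with everything
-- up to and including the first '-->' that follows it
WHILE 1 = 1
BEGIN
    UPDATE @temp
    SET [description] = STUFF(
          [description]
        , CHARINDEX(N'<!--', [description])
        , CHARINDEX(N'-->', [description], CHARINDEX(N'<!--', [description]) + 4)
            - CHARINDEX(N'<!--', [description]) + 3
        , N'')
    WHERE CHARINDEX(N'<!--', [description]) > 0
      AND CHARINDEX(N'-->', [description], CHARINDEX(N'<!--', [description]) + 4) > 0;

    -- stop once no row has a closed comment left
    IF @@ROWCOUNT = 0 BREAK;
END

-- Any '<!--' still present has no closing '-->':
-- drop it together with everything after it
UPDATE @temp
SET [description] = LEFT([description], CHARINDEX(N'<!--', [description]) - 1)
WHERE CHARINDEX(N'<!--', [description]) > 0;

SELECT * FROM @temp;
```

With the sample row, both complete comments (including the commented-out `<li>test3</li>`) are removed, and the text is cut just before the unclosed `<!-- Some comment`, so the trailing `</ul>` goes as well; rows cut this way may need their closing tags checked afterwards. The loop runs as many passes as the largest number of closed comments in a single row, plus one final pass that updates nothing.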
1

Use CASE WHEN to check for an opening comment sequence ('<!--') and the absence of a closing one ('-->'), and return a column that appends a closing comment where needed.

SELECT * ,
  CASE WHEN CHARINDEX('<!--', description) > 0
    AND CHARINDEX('-->', description) = 0 
  THEN description + '-->'
  ELSE description
  END AS clean_description
FROM dbo.[table]

Alternatively, if you want to remove the broken comment along with everything after it, use

SUBSTRING(description, 0, CHARINDEX('<!--', description))

for the THEN branch of the statement.
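The CASE check above only fires when a row contains no '-->' at all, so a row like the one in the question, where a closed comment comes before the unclosed one, is not caught. Since a '-->' that follows the last '<!--' also closes every earlier comment, a row has an unclosed comment exactly when nothing closes its last '<!--'. A minimal sketch of that test, using REVERSE so that the last '<!--' becomes the first '--!<' and the last '-->' becomes the first '>--'; the table variable @t and its rows are hypothetical stand-ins for dbo.[table]:

```sql
-- Hypothetical stand-in for dbo.[table]
DECLARE @t TABLE (id INT IDENTITY(1,1), [description] NVARCHAR(MAX));

INSERT INTO @t ([description]) VALUES
      (N'<ul><!-- Some comment --><li>test</li> <!-- Some comment</ul>') -- closed, then unclosed
    , (N'<p>ok</p><!-- closed -->')                                        -- all comments closed
    , (N'<p>no comments</p>');                                             -- no comments

SELECT r.id, r.[description],
    CASE WHEN r.open_pos > 0 AND (r.close_pos = 0 OR r.close_pos > r.open_pos)
         -- unclosed: keep only the text before the last '<!--'
         THEN REVERSE(STUFF(r.rev, 1, r.open_pos + 3, N''))
         ELSE r.[description]
    END AS clean_description
FROM (
    SELECT t.id, t.[description], x.rev,
        -- first '--!<' in the reversed text = last '<!--' in the original
        open_pos  = CHARINDEX(N'--!<', x.rev),
        -- first '>--' in the reversed text = last '-->' in the original
        close_pos = CHARINDEX(N'>--', x.rev)
    FROM @t t
    CROSS APPLY (SELECT REVERSE(t.[description]) AS rev) x
) r;
```

To close the comment instead of cutting it off, as the answer does, the THEN branch can return `r.[description] + N'-->'`. The same condition, `open_pos > 0 AND (close_pos = 0 OR close_pos > open_pos)`, can also be used in a WHERE clause to find the affected rows.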

1 Comment

Good solution, I'll think about it.
