SQL replace query for variable content

Question

I am looking for a SQL query to clean up a hacked SQL Server database. I have some basic SQL knowledge, but I have no idea how to solve the below.

For one of our websites we have a SQL Server database that was recently hacked into. Thousands of records were filled with hidden divs, containing all sorts of dodgy references. Our ISP says the content of the database is not their responsibility and they have no knowledge how to help us clean up the database. There are no clean backups available. It is way too much work to go through all records manually.

So I am now desperately trying to find a SQL query to remove these blocks of hidden text from the database.

Two useful bits of information:

All the spammy content is contained within div tags. The information between the tags is different in every instance, but they all open and close with the div tag.
Our original data will have some HTML-content, but will never contain div tags. So if we can find a way to remove everything from the starting div up to and including the closing div, then we would be sorted.

Any assistance here is much appreciated. Thanks for your time.

Before you get much further along, you need to go back to your code and fix the SQL injection vulnerability. Whenever you build up a SQL string containing user-supplied content instead of using parameters, your code is vulnerable. I know it seems backwards, but you probably need to fix that before fixing the data. Most times when somebody finds that exploit they just keep using it, so any data cleanup is pointless because the site will just get defaced again. Once that is done you can start the cleanup.
– Sean Lange, Commented May 9, 2016 at 20:56
The cleanup is going to be fun in its own right. You will have to get really familiar with SUBSTRING, PATINDEX and CHARINDEX.
– Sean Lange, Commented May 9, 2016 at 20:57
No idea what programming language your site is in, but this site has some basics for preventing SQL injection in many languages: bobby-tables.com
– Sean Lange, Commented May 9, 2016 at 20:58
Also, once you're back in business, back up your databases regularly. Please.
– Jeffrey Van Laethem, Commented May 9, 2016 at 20:58
I realize that, but it would hide them until they can fix the code to prevent it from coming back. It would be a stopgap at best, for sure.
– Sean Lange, Commented May 9, 2016 at 21:05
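
To illustrate the point made in the comments about building SQL strings from user input, here is a minimal T-SQL sketch. The temp table `#pages`, the variable names, and the sample input are all hypothetical; the real fix belongs in the website's own code, whatever language that is written in.

```sql
-- Hypothetical demo table standing in for a real content table.
CREATE TABLE #pages (id INT IDENTITY, title NVARCHAR(100));
INSERT #pages (title) VALUES (N'Home'), (N'Contact');

-- Hypothetical hostile input as it might arrive from a web form.
DECLARE @userInput NVARCHAR(100) = N''' OR 1 = 1 --';

-- Vulnerable pattern: the input is concatenated into the statement text,
-- so it can rewrite the WHERE clause.
DECLARE @unsafe NVARCHAR(MAX) =
    N'SELECT id, title FROM #pages WHERE title = N''' + @userInput + N'''';
PRINT @unsafe;               -- inspect the statement that would run
EXEC sp_executesql @unsafe;  -- the injected OR 1 = 1 matches every row

-- Parameterized pattern: the input is passed as data and never parsed as SQL.
EXEC sp_executesql
    N'SELECT id, title FROM #pages WHERE title = @title',
    N'@title NVARCHAR(100)',
    @title = @userInput;     -- matches only a row whose title equals the input

DROP TABLE #pages;
```

The same idea applies in application code: pass user values through the data access library's parameter mechanism rather than splicing them into the query string.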

Dharmendar Kumar 'DK' · Accepted Answer · 2016-05-09 21:18:28Z

3

Try this out. It will only work if your assumptions are correct, plus one more assumption: the hacker did not add nested DIVs. And yes, TEST this thoroughly and back up your data before running the update.

CREATE TABLE #temp(id INT IDENTITY, html VARCHAR(MAX));

-- Sample rows: injected div at the end, inside another tag, and in the middle.
INSERT #temp(html)
VALUES ('<p>Some text</p><strong>other text</strong><div>added by hacker</div>')
     , ('<p>Some text</p><strong>other text<div>added by hacker within html tag</div></strong>')
     , ('<p>Some text</p><div>some other text added by <a href="http://google.com">hack</a></div><strong>other text</strong>');

-- Preview what would be removed. The block runs from the first '<div' up to
-- and including the first '</div>' (the +6 is the length of '</div>').
SELECT html
     , CHARINDEX('<div', html) AS startPos
     , CHARINDEX('</div>', html) AS endPos
     , (CHARINDEX('</div>', html) + 6) - CHARINDEX('<div', html) AS stringLenToRemove
     , SUBSTRING(html, CHARINDEX('<div', html), (CHARINDEX('</div>', html) + 6) - CHARINDEX('<div', html)) AS HtmlAddedByHack
     , REPLACE(html, SUBSTRING(html, CHARINDEX('<div', html), (CHARINDEX('</div>', html) + 6) - CHARINDEX('<div', html)), '') AS sanitizedHtml
FROM #temp;

-- Once the preview looks right, uncomment to apply the change.
--UPDATE #temp
--SET html = REPLACE(html, SUBSTRING(html, CHARINDEX('<div', html), (CHARINDEX('</div>', html) + 6) - CHARINDEX('<div', html)), '');

--SELECT *
--FROM #temp;


13 Comments

m4v21 Over a year ago

Thank you very much for your input, DK. Out of the code suggestions received so far this is the only one that I can actually understand :] It looks pretty logical and is along the lines I was thinking. I will give this one a try and will let you know.

m4v21 Over a year ago

Hi DK, I have been experimenting with the above and have been able to get this code to produce results. What I am running now is this:

m4v21 Over a year ago

SELECT tekst
, CHARINDEX('<div',tekst) AS startPos
, CHARINDEX('</div>',tekst) AS endPos
, (CHARINDEX('</div>',tekst)+6)-(CHARINDEX('<div',tekst)) AS stringLenToRemove
, SUBSTRING(tekst, CHARINDEX('<div',tekst), (CHARINDEX('</div>',tekst)+6)-(CHARINDEX('<div',tekst))) AS HtmlAddedByHack
FROM [xxx].[dbo].[xxx]
WHERE id = 1;

UPDATE [xxx].[dbo].[xxx]
SET tekst = REPLACE(tekst,SUBSTRING(tekst, CHARINDEX('<div',tekst), (CHARINDEX('</div>',tekst)+6)-(CHARINDEX('<div',tekst))), '')
WHERE id = 1;

m4v21 Over a year ago

As you can see, I am limiting the query to one record for testing purposes. What happens is that it looks through the record for the first instance of the div and then successfully removes it. There are two problems though: 1) the records contain multiple instances (but I can run the query multiple times to take them all out); 2) when I repeat the query, at some point it runs out of divs and starts deleting the first six characters of the record (presumably from the +6 in the code). How can I tell the query to not delete anything when there are no more divs?

Dharmendar Kumar 'DK' Over a year ago

You can add a WHERE clause to ensure only rows that have the issue are cleaned: WHERE html LIKE '%<div%' AND html LIKE '%</div>%'
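
Both problems raised in the comments above (several div blocks per record, and the update eating characters once no divs remain) can be handled by guarding the update and repeating it until nothing matches. The following is a sketch under assumptions, not the answerer's code: it uses a hypothetical temp table `#demo` with a `tekst` column, swaps in `STUFF` for the `REPLACE`/`SUBSTRING` pair, and still assumes the div blocks are not nested.

```sql
-- Hypothetical demo data; substitute your own table and column.
CREATE TABLE #demo (id INT IDENTITY, tekst VARCHAR(MAX));
INSERT #demo (tekst)
VALUES ('<p>clean row</p>')
     , ('<p>a</p><div style="display:none">spam 1</div><p>b</p><div>spam 2</div>');

WHILE 1 = 1
BEGIN
    -- Remove the first '<div ...>...</div>' block of every row that still has one.
    -- The closing tag is searched for starting at the opening tag's position.
    UPDATE #demo
    SET tekst = STUFF(tekst,
                      CHARINDEX('<div', tekst),
                      CHARINDEX('</div>', tekst, CHARINDEX('<div', tekst)) + 6
                          - CHARINDEX('<div', tekst),
                      '')
    WHERE tekst LIKE '%<div%</div>%';   -- rows without a complete block are skipped

    -- Stop once a pass changes nothing.
    IF @@ROWCOUNT = 0 BREAK;
END;

SELECT id, tekst FROM #demo;

DROP TABLE #demo;
```

With the sample rows, the first record is never touched and the second ends up as `<p>a</p><p>b</p>` after two passes. Run it against a copy of the data first, since a nested div would leave a stray closing tag behind.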


Community · Accepted Answer · 2018-11-13 23:03:11Z

A UDF that uses PATINDEX may be able to do it.

Assuming

All malicious content is in <DIV>...</DIV> sections
No <DIV>...</DIV> sections exist that are not malicious content
You TEST THIS EXTENSIVELY on a backup of your data before applying it to your live database
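
One way to check these assumptions before running any update is to count opening and closing tags per row and flag possible nesting. This is a hedged sketch only: the temp table `#check`, its sample rows, and the output column names are hypothetical, and the nesting flag only inspects the first block of each row.

```sql
-- Hypothetical sample table; point the query at your own table and column.
CREATE TABLE #check (id INT IDENTITY, html VARCHAR(MAX));

INSERT #check (html)
VALUES ('<p>clean</p>')
     , ('<p>a</p><div>spam</div>')
     , ('<div>outer<div>inner</div></div>')
     , ('<p>a</p><div>never closed');

SELECT id
     -- Each REPLACE lengthens the string by one character per occurrence,
     -- so the length difference equals the number of tags found.
     , LEN(REPLACE(html, '<div', '<div#')) - LEN(html)     AS openTags
     , LEN(REPLACE(html, '</div>', '</div>#')) - LEN(html) AS closeTags
     -- 1 when a second '<div' starts before the first '</div>' (nesting).
     , CASE WHEN CHARINDEX('<div', html, CHARINDEX('<div', html) + 1)
                 BETWEEN 1 AND CHARINDEX('</div>', html)
            THEN 1 ELSE 0 END                              AS nestedInFirstBlock
FROM #check;

DROP TABLE #check;
```

Rows where `openTags` and `closeTags` differ, or where `nestedInFirstBlock` is 1, are the ones a simple first-open-to-first-close removal would mishandle, so they are worth reviewing by hand.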

First create this UDF for pattern replacement (the original answer linked to its source):

CREATE FUNCTION dbo.PatternReplace
(
   @InputString VARCHAR(4000),
   @Pattern VARCHAR(100),
   @ReplaceText VARCHAR(4000)
)
RETURNS VARCHAR(4000)
AS
BEGIN
   DECLARE @Result VARCHAR(4000) SET @Result = ''
   -- First character in a match
   DECLARE @First INT
   -- Next character to start search on
   DECLARE @Next INT SET @Next = 1
   -- Length of the total string -- 8001 if @InputString is NULL
   DECLARE @Len INT SET @Len = COALESCE(LEN(@InputString), 8001)
   -- End of a pattern
   DECLARE @EndPattern INT

   WHILE (@Next <= @Len) 
   BEGIN
      SET @First = PATINDEX('%' + @Pattern + '%', SUBSTRING(@InputString, @Next, @Len))
      IF COALESCE(@First, 0) = 0 --no match - return
      BEGIN
         SET @Result = @Result + 
            CASE --return NULL, just like REPLACE, if inputs are NULL
               WHEN  @InputString IS NULL
                     OR @Pattern IS NULL
                     OR @ReplaceText IS NULL THEN NULL
               ELSE SUBSTRING(@InputString, @Next, @Len)
            END
         BREAK
      END
      ELSE
      BEGIN
         -- Concatenate characters before the match to the result
         SET @Result = @Result + SUBSTRING(@InputString, @Next, @First - 1)
         SET @Next = @Next + @First - 1

         SET @EndPattern = 1
         -- Find start of end pattern range
         WHILE PATINDEX(@Pattern, SUBSTRING(@InputString, @Next, @EndPattern)) = 0
            SET @EndPattern = @EndPattern + 1
         -- Find end of pattern range
         WHILE PATINDEX(@Pattern, SUBSTRING(@InputString, @Next, @EndPattern)) > 0
               AND @Len >= (@Next + @EndPattern - 1)
            SET @EndPattern = @EndPattern + 1

         --Either at the end of the pattern or @Next + @EndPattern = @Len
         SET @Result = @Result + @ReplaceText
         SET @Next = @Next + @EndPattern - 1
      END
   END
   RETURN(@Result)
END

Then, make use of the UDF:

UPDATE ContentTable SET ContentColumn = dbo.PatternReplace(ContentColumn, '<DIV>%</DIV>', '')
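
Before running that update, it may help to preview the result on the affected rows only. The sketch below reuses the answer's placeholder names `ContentTable` and `ContentColumn` and assumes the `dbo.PatternReplace` function above has been created. The length filter is an added precaution: the function's parameters are `VARCHAR(4000)`, so longer values would be silently truncated when passed in.

```sql
-- Preview only: nothing is modified.
SELECT ContentColumn
     , dbo.PatternReplace(ContentColumn, '<DIV>%</DIV>', '') AS CleanedColumn
FROM ContentTable
WHERE ContentColumn LIKE '%<DIV>%</DIV>%'   -- only rows that contain a block
  AND LEN(ContentColumn) <= 4000;           -- longer values would be truncated by the UDF

-- Rows that need separate handling because they exceed the UDF's limit.
SELECT COUNT(*) AS RowsTooLong
FROM ContentTable
WHERE ContentColumn LIKE '%<DIV>%</DIV>%'
  AND LEN(ContentColumn) > 4000;
```

Note that the pattern `<DIV>` only matches tags without attributes; if the injected divs carry a style attribute, a pattern such as `<DIV%</DIV>` would be needed instead, and whether matching is case-sensitive depends on the column's collation.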

M.Ali · Accepted Answer · 2016-05-09 21:48:03Z

Maybe a cursor like this ....

DECLARE @ColumnName sysname, @TableName sysname,
        @Schema sysname, @Col NVARCHAR(258), @Sql NVARCHAR(MAX);

-- Every character column of every user table in the current database.
DECLARE Cur CURSOR FOR
SELECT c.name, t.name, s.name
FROM sys.columns c
INNER JOIN sys.tables  t ON c.object_id = t.object_id
INNER JOIN sys.types   p ON p.user_type_id = c.user_type_id
INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE t.is_ms_shipped = 0
AND p.name IN ('varchar', 'nvarchar', 'char', 'nchar');

OPEN Cur;

FETCH NEXT FROM Cur INTO @ColumnName, @TableName, @Schema;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @Col = QUOTENAME(@ColumnName);

    -- Remove the first <div ...>...</div> block. '<div' (no closing bracket) also
    -- matches divs with attributes. The LIKE filter skips rows without a block,
    -- where CHARINDEX would return 0 and LEFT(..., -1) would raise an error.
    SET @Sql = N'UPDATE ' + QUOTENAME(@Schema) + N'.' + QUOTENAME(@TableName)
             + N' SET ' + @Col + N' = LEFT(' + @Col + N', CHARINDEX(''<div'', ' + @Col + N') - 1)'
             + N' + SUBSTRING(' + @Col + N', CHARINDEX(''</div>'', ' + @Col
             + N', CHARINDEX(''<div'', ' + @Col + N')) + 6, LEN(' + @Col + N'))'
             + N' WHERE ' + @Col + N' LIKE ''%<div%</div>%''';

    EXEC sp_executesql @Sql;

    FETCH NEXT FROM Cur INTO @ColumnName, @TableName, @Schema;
END

CLOSE Cur;
DEALLOCATE Cur;

Note

The cursor loops through all the tables and selects the columns with varchar, nvarchar, char, or nchar data types. For each column it builds an UPDATE statement that strips out the first div block (everything from the opening div tag up to and including the closing </div>) if one exists; otherwise the column is left as it is.

Warning

Test the script before you actually run it against the live database.
