
I want to strip all JavaScript out of a small snippet (4-6 lines) of HTML. I've read on here before that it's best not to use regex on HTML, so if anybody knows a better way, please advise.

So, for example, I have the following code:

<a href="go/to/my/link" onclick="fetchMeSomeData(this)">My Link</a>
<p onfocus="doSomethingAmazing();"></p>

Now, in PHP, I want to remove each on* event attribute (whatever the event is), replacing it with nothing.

2 Answers


Use the HTML Purifier library to strip things like JavaScript and plugins from the code. It's much better than a blacklist-based regex approach because it uses a full HTML parser and a whitelist to clean the HTML.
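HTML Purifier is a PHP library, and the sketch below is not its API. It only illustrates the same parse-then-whitelist idea in Java, using the JDK's built-in XML parser. Assumptions: the snippet is well-formed XHTML (the JDK parser rejects things like &nbsp; or an unclosed <br>, which is exactly why a forgiving HTML parser such as HTML Purifier's matters for real input), and the class name WhitelistSanitizer and the ALLOWED_TAGS / ALLOWED_ATTRS sets are hypothetical choices made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: parse, keep only whitelisted tags/attributes, re-serialize.
final class WhitelistSanitizer {
    // Hypothetical whitelists for this example; anything not listed is dropped.
    private static final Set<String> ALLOWED_TAGS = Set.of("a", "p", "b", "i", "em", "strong", "span");
    private static final Set<String> ALLOWED_ATTRS = Set.of("href", "title");

    static String sanitize(String snippet) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations so no external entities can be loaded.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            // Wrap the fragment in a dummy root so sibling elements parse as one document.
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<root>" + snippet + "</root>")));
            root = doc.getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("snippet is not well-formed markup", e);
        }
        clean(root);
        StringBuilder out = new StringBuilder();
        for (Node c = root.getFirstChild(); c != null; c = c.getNextSibling()) {
            write(c, out);
        }
        return out.toString();
    }

    // Removes non-whitelisted attributes, then recurses into children,
    // dropping any child element (with its content) whose tag is not whitelisted.
    private static void clean(Element el) {
        NamedNodeMap attrs = el.getAttributes();
        for (int i = attrs.getLength() - 1; i >= 0; i--) { // backwards: the live map shrinks
            String name = attrs.item(i).getNodeName();
            if (!ALLOWED_ATTRS.contains(name.toLowerCase(Locale.ROOT))) {
                el.removeAttribute(name);
            }
        }
        Node c = el.getFirstChild();
        while (c != null) {
            Node next = c.getNextSibling(); // grab before a possible removal
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                Element child = (Element) c;
                if (ALLOWED_TAGS.contains(child.getTagName().toLowerCase(Locale.ROOT))) {
                    clean(child);
                } else {
                    el.removeChild(child); // e.g. <script>, <iframe>
                }
            } else if (c.getNodeType() != Node.TEXT_NODE) {
                el.removeChild(c); // comments, CDATA sections, processing instructions
            }
            c = next;
        }
    }

    // Serializes the cleaned tree, re-escaping every text and attribute value.
    private static void write(Node n, StringBuilder out) {
        if (n.getNodeType() == Node.TEXT_NODE) {
            out.append(escape(n.getNodeValue()));
            return;
        }
        Element el = (Element) n; // only text and whitelisted elements remain after clean()
        out.append('<').append(el.getTagName());
        NamedNodeMap attrs = el.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            out.append(' ').append(a.getNodeName())
               .append("=\"").append(escape(a.getNodeValue())).append('"');
        }
        out.append('>');
        for (Node c = el.getFirstChild(); c != null; c = c.getNextSibling()) {
            write(c, out);
        }
        out.append("</").append(el.getTagName()).append('>');
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Calling WhitelistSanitizer.sanitize on the question's two lines should return the same markup without the onclick and onfocus attributes. Because only listed attributes survive, event attributes are removed without having to be enumerated, which is the advantage of a whitelist over a blacklist. One gap this sketch leaves open: it does not inspect attribute values, so href="javascript:..." would pass; HTML Purifier additionally restricts URI schemes.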


I built such a regex some time ago; it looks a bit scary, though :). Here is the pure regex; you might need to escape special characters further to meet your language's requirements.

(\son[a-z]+\s*=\s*"[^"\\\r\n]*(?:\\.[^"\\\r\n]*)*"(?=[^<]*?>))|(\son[a-z]+\s*=\s*'[^'\\\r\n]*(?:\\.[^'\\\r\n]*)*'(?=[^<]*?>))

Here is the escaped version (following Java string-literal rules), which you should be able to use directly as a string.

(\\son[a-z]+\\s*=\\s*\"[^\"\\\\\\r\\n]*(?:\\\\.[^\"\\\\\\r\\n]*)*\"(?=[^<]*?>))|(\\son[a-z]+\\s*=\\s*'[^'\\\\\\r\\n]*(?:\\\\.[^'\\\\\\r\\n]*)*'(?=[^<]*?>))

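It only matches inside tags (the lookahead requires a closing > before any <) and takes backslash-escaped quotes inside event values into account. It only handles quoted attribute values, and I'm sure it is not 100% bulletproof.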
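A minimal Java sketch of applying the escaped pattern above with java.util.regex, replacing every match with an empty string. The class name EventAttrStripper is hypothetical, and the CASE_INSENSITIVE flag is an addition that is not part of the answer's pattern (HTML attribute names are case-insensitive, so onClick would otherwise slip through).

```java
import java.util.regex.Pattern;

// Hypothetical helper applying the answer's regex.
final class EventAttrStripper {
    // The escaped pattern from the answer, split at the "|" for readability.
    // CASE_INSENSITIVE is an addition so that onClick / ONCLICK are caught too.
    private static final Pattern EVENT_ATTR = Pattern.compile(
            "(\\son[a-z]+\\s*=\\s*\"[^\"\\\\\\r\\n]*(?:\\\\.[^\"\\\\\\r\\n]*)*\"(?=[^<]*?>))"
            + "|(\\son[a-z]+\\s*=\\s*'[^'\\\\\\r\\n]*(?:\\\\.[^'\\\\\\r\\n]*)*'(?=[^<]*?>))",
            Pattern.CASE_INSENSITIVE);

    // Deletes every quoted on*="..." / on*='...' attribute, including the whitespace before it.
    static String strip(String html) {
        return EVENT_ATTR.matcher(html).replaceAll("");
    }
}
```

On the question's example this should remove onclick="fetchMeSomeData(this)" and onfocus="doSomethingAmazing();", leaving <a href="go/to/my/link">My Link</a> and <p></p>. Because each match includes the whitespace before the attribute, no stray spaces are left behind. An unquoted handler such as <b onclick=alert(1)> is not matched at all, which illustrates the "not 100% bulletproof" caveat.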
