I have a rich text area where the user can type something. I am trying to prevent JavaScript injection using the following regex:

return input == null ? null : input
        .replaceAll("(?i)<script.*?>.*?</script.*?>", "")    // case 1
        .replaceAll("(?i)<.*?javascript:.*?>.*?</.*?>", "")  // case 2
        .replaceAll("(?i)<.*?\\s+on.*?>.*?</.*?>", "");      // case 3

Here, input is the text from the rich text area.

The problem is case 3. If the user's text contains "on", all the text before "on" gets removed.
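The over-matching can be reproduced with a short sketch. The class name Case3Demo, the method name stripCase3, and the sample input are assumptions made up for illustration; only the regex itself comes from the code above.

```java
class Case3Demo {
    // Case 3 exactly as posted above. The lazy ".*?" after "<" is still
    // allowed to run past ">" until it finds whitespace followed by "on".
    static String stripCase3(String input) {
        return input == null ? null
                : input.replaceAll("(?i)<.*?\\s+on.*?>.*?</.*?>", "");
    }

    public static void main(String[] args) {
        // Hypothetical user text: harmless markup plus the ordinary word "on".
        String input = "<b>bold</b> and turn on the <i>light</i>";
        // The match starts at the first "<" and extends to the last "</i>",
        // so the whole string is removed and this prints "[]".
        System.out.println("[" + stripCase3(input) + "]");
    }
}
```

The pattern never requires "on" to sit inside a tag, which is why ordinary prose is enough to trigger it.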

How can I make the last case more rigid and avoid the above problem?

1 Answer

If you want to remove "on" and everything up to the end of the tag, you can use this: .replaceAll("(?i)(<.*?\\s+)on.*?(>.*?)", "$1$2");

This renders "ACD" as "ACD". Be aware, though, that a ">" character inside the script will break the regex.
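As a rough illustration of a stricter variant, the sketch below only matches an on... attribute that sits inside a single tag, by using [^>] instead of "." so the match cannot cross a ">". The class name EventAttributeStripper, the method name, and the pattern itself are assumptions for illustration, not a vetted sanitizer. It still fails when another attribute value in the same tag contains ">", and it is no substitute for a real HTML parser.

```java
import java.util.regex.Pattern;

class EventAttributeStripper {
    // Assumed pattern: "<", anything except ">", whitespace, then an
    // on<name>=<value> attribute whose value is double-quoted,
    // single-quoted, or unquoted. Group 1 is the part of the tag to keep.
    private static final Pattern ON_ATTR = Pattern.compile(
            "(?i)(<[^>]*?)\\s+on\\w+\\s*=\\s*(?:\"[^\"]*\"|'[^']*'|[^\\s>]+)");

    static String stripEventAttributes(String input) {
        if (input == null) {
            return null;
        }
        String previous;
        String current = input;
        // One pass removes at most one handler per tag, so repeat until
        // the text stops changing.
        do {
            previous = current;
            current = ON_ATTR.matcher(previous).replaceAll("$1");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        String input = "<a href=\"x\" onclick=\"evil()\">C</a> turn on the light";
        // Prints: <a href="x">C</a> turn on the light
        System.out.println(stripEventAttributes(input));
    }
}
```

Because the word "on" outside a tag is never preceded by an unclosed "<", plain text such as "turn on the light" is left alone.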

EDIT: The moral of my remark is that I would not recommend custom parsing to remove JavaScript code. I suggest reading the answers to the question "Java: Best way to remove Javascript from HTML" and probably using Jsoup.clean (if that is possible in your environment).

2 Comments

Jsoup removes the attributes from HTML. Does it also work with plain text? Example: it doesn't work on "I like this site because <script>alert('Injected!');</script> teaches me a lot"
It does accept plain text, but it might do some things you don't want: it removed the <body> tag completely (it should not be within the text anyway) and it added a newline when I tried it with <p>. Have you thought about escaping the HTML (including the JavaScript) instead of removing it?
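The escaping idea from the last comment could look roughly like the sketch below. The class name HtmlEscaper and the method name escapeHtml are hypothetical. It assumes the text is later written into an HTML element body or a quoted attribute value. Other contexts (URLs, inline scripts, CSS) need different encoding.

```java
class HtmlEscaper {
    // Replaces the five characters that carry meaning in HTML with
    // character references, so markup is displayed instead of interpreted.
    static String escapeHtml(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "I like this site because <script>alert('Injected!');</script> teaches me a lot";
        System.out.println(escapeHtml(input));
    }
}
```

With this approach nothing the user typed is deleted. The script tag shows up as visible text rather than running, which also sidesteps the false positives of case 3.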
