
I'm writing a server-side script that replaces all URLs in a body of text with <a> tags so they become clickable.

How can I make sure that the URLs I convert don't contain any XSS-style JavaScript?

I'm currently filtering for "javascript:" in the string, but I suspect that isn't sufficient.
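To illustrate why a plain substring check is fragile, here is a small PHP sketch of a filter like the one described (`naiveFilterPasses()` is a hypothetical name, not the asker's code). URL schemes are case-insensitive, and the WHATWG URL parser that browsers use removes tab and newline characters before parsing, so both of the variants that slip through below are still treated as `javascript:` URLs.

```php
<?php
// Hypothetical naive filter: reject a URL only if it contains the exact,
// lowercase substring "javascript:".
function naiveFilterPasses(string $url): bool
{
    return strpos($url, 'javascript:') === false;
}

$attempts = [
    "javascript:alert(1)",   // blocked: exact substring match
    "JavaScript:alert(1)",   // passes: URL schemes are case-insensitive
    "java\tscript:alert(1)", // passes: browsers strip tab characters when parsing URLs
];

foreach ($attempts as $url) {
    // json_encode() makes the embedded tab visible as \t in the output.
    echo json_encode($url), ' => ', naiveFilterPasses($url) ? 'passes' : 'blocked', "\n";
}
```

Switching to `stripos()` only closes the first gap, which is why allowing a short list of known-good schemes is generally safer than blocking known-bad ones.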

  • There are also onclick, onmouseout, and every other on* event attribute. Commented Jan 28, 2011 at 19:37
  • That shouldn't be an issue, since I'm generating <a href="\1">\1</a>, and neither captured string can contain ", <, or >. Commented Jan 28, 2011 at 19:41
  • What server-side language are you using? There are lots of open-source XSS filters available. Commented Jan 28, 2011 at 19:46
  • PHP. I can always run htmlspecialchars() on the content, but I'm not sure that will suffice. Commented Jan 28, 2011 at 19:48
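The question of whether htmlspecialchars() suffices can be checked with a short PHP sketch (the payload strings are made up for illustration). With `ENT_QUOTES`, it stops a URL from breaking out of `href="..."` to add an `onmouseover` handler, but it leaves a `javascript:` URL untouched, because that payload contains none of the characters it escapes.

```php
<?php
// Payload that tries to close href="..." early and add an event handler.
$breakout = 'http://example.com/"onmouseover="alert(1)';
// Payload that needs no special characters at all.
$scheme = 'javascript:alert(1)';

$safeBreakout = htmlspecialchars($breakout, ENT_QUOTES, 'UTF-8');
$safeScheme   = htmlspecialchars($scheme, ENT_QUOTES, 'UTF-8');

// The quotes become &quot;, so the whole payload stays inside one attribute value:
// prints <a href="http://example.com/&quot;onmouseover=&quot;alert(1)">link</a>
echo '<a href="', $safeBreakout, '">link</a>', "\n";

// Nothing was escaped, so this link would still run script when clicked.
echo '<a href="', $safeScheme, '">link</a>', "\n";
```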

3 Answers


Most modern server-side languages have at least one implementation of Markdown or a similar lightweight markup language, and many of these implementations can turn URLs into clickable links.

Unless you have a lot of time to research this topic and implement the script yourself, I'd suggest finding the best Markdown implementation for your language and either studying its code or simply using it directly.

Markdown is usually shipped as a library, and some implementations let you configure what they process and what they ignore; in your case, you want to process URLs and ignore every other element.

Here's an (incomplete) list of solid Markdown implementations for different languages:


2 Comments

I've opted not to use Markdown: it does far more than I need, and since I'm implementing this in a messaging system I'm building, I don't want people to start using alternative syntax. It's only meant to make URLs clickable. I already have the regex set up to identify URLs; I just need a way to filter any potential JS out of the links.
If you can't configure the Markdown library to make URLs clickable while ignoring other tags, I'd locate the code that makes URLs clickable, extract it from the Markdown implementation, and use that.

You need to attribute-encode the URLs.
You should also make sure that they start with http:// or https://.
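A minimal PHP sketch of these two rules, assuming your own regex already finds the URLs. `linkifyUrl()` and `linkifyText()` are hypothetical names, and the `\S+://\S+` pattern is only a placeholder for a real URL regex. The scheme check uses `parse_url()` against an allow-list instead of searching for bad substrings, so unknown or obfuscated schemes are rejected by default.

```php
<?php
// Turn one URL into a link only if its scheme is http or https.
// The URL is HTML-escaped with ENT_QUOTES before it goes into href="...".
function linkifyUrl(string $url): string
{
    $escaped = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    // parse_url() returns null when there is no scheme and false for badly malformed URLs.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
        return $escaped; // not an allowed scheme: emit as plain, escaped text
    }
    return '<a href="' . $escaped . '">' . $escaped . '</a>';
}

// Hypothetical usage on a whole message. With one capture group and
// PREG_SPLIT_DELIM_CAPTURE, even indices are plain text and odd indices are URLs.
function linkifyText(string $text): string
{
    $parts = preg_split('#(\S+://\S+)#', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $i => $part) {
        $out .= ($i % 2 === 1)
            ? linkifyUrl($part)
            : htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
    }
    return $out;
}

echo linkifyText('see https://example.com/?a=1&b=2 and javascript://%0Aalert(1)'), "\n";
```

Because the text between URLs is escaped as well, the result can be echoed straight into the page, and a `javascript://` URL that the placeholder regex does match still comes out as plain text.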

1 Comment

I believe http://javascript: works in some browsers... I also want plain "www.example.com" to become a link.
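One way to handle scheme-less "www." links is to give them an explicit `http://` prefix first and then apply the same http/https allow-list, so "www." input never bypasses the scheme check. This is a sketch; `normalizeWww()` and `isHttpUrl()` are hypothetical helper names.

```php
<?php
// Give scheme-less "www." URLs an explicit http:// prefix (case-insensitive match).
function normalizeWww(string $url): string
{
    if (stripos($url, 'www.') === 0) {
        return 'http://' . $url;
    }
    return $url;
}

// Allow-list check on the parsed scheme rather than a substring search.
function isHttpUrl(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    return is_string($scheme) && in_array(strtolower($scheme), ['http', 'https'], true);
}

foreach (['www.example.com', 'https://example.com', 'javascript:alert(1)'] as $input) {
    $url = normalizeWww($input);
    echo $input, ' => ', isHttpUrl($url) ? "link to $url" : 'leave as text', "\n";
}
```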

This is taken from the Kohana framework's XSS filtering code. It's not a complete answer, but it might get you on your way.

// Remove javascript: and vbscript: protocols
$str = preg_replace('#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2nojavascript...', $str);
$str = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2novbscript...', $str);
$str = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*-moz-binding[\x00-\x20]*:#u', '$1=$2nomozbinding...', $str);

// Only works in IE: <span style="width: expression(alert('Ping!'));"></span>
$str = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#is', '$1>', $str);
$str = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?behaviour[\x00-\x20]*\([^>]*+>#is', '$1>', $str);
$str = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:*[^>]*+>#ius', '$1>', $str);
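A quick PHP check of the first Kohana rule, wrapped in a hypothetical `kohanaStripJavascript()` function with the pattern copied verbatim, shows that it only fires on an attribute assignment such as `href="javascript:..."` (tolerating mixed case and whitespace inside the word), while plain text containing `javascript:` is left alone. So if you use it at all, it would have to run on the generated HTML after the links are built, not on the raw message text.

```php
<?php
// Hypothetical wrapper around the first Kohana rule quoted above (pattern unchanged).
function kohanaStripJavascript(string $str): string
{
    return preg_replace(
        '#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu',
        '$1=$2nojavascript...',
        $str
    );
}

// Inside an attribute: the scheme is rewritten, so the link no longer runs script.
echo kohanaStripJavascript('<a href="javascript:alert(1)">click</a>'), "\n";
// Mixed case and a space inside the word are still caught (the "i" flag and [\x00-\x20]*).
echo kohanaStripJavascript('<a href="JaVa Script:alert(1)">click</a>'), "\n";
// Plain text with no "=" in front: the pattern cannot match, so nothing changes.
echo kohanaStripJavascript('javascript:alert(1)'), "\n";
```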
