0

I would like to remove all event attributes (e.g. from all events based on Event reference list).

Is there a function in PHP's DOMDocument class that recognizes event attributes?

I tried using RegEx but it has gotten complicated with single quotes and double quotes:

preg_replace('/on*[a-z]+=".*?"/i', '', $html); // Doesn't match onclick="alert(\"hello\");"

I tried an external library called HTMLPurifier but it has no option to remove all event attributes.

Any idea what directions to take or a simple solution?

10
  • @Dagon I have a form that allows user to display their HTML of their article, but restrict them from using zero javascript: no script tags, no link tags and DEFINITELY no event attribute tags. Commented Jun 24, 2013 at 1:34
  • 2
    Is this what you mean Commented Jun 24, 2013 at 1:34
  • 1
    Whitelisting is safer than blacklisting. Commented Jun 24, 2013 at 1:38
  • 1
    With your current code, I can do this: onClick='omghaxorz();' Commented Jun 24, 2013 at 1:40
  • 1
    @user1105430: Please use the DOM for that. Commented Jun 24, 2013 at 1:56

4 Answers 4

1

If you want truly secure code, a whitelist approach ('only allow these things: ...') is typically sturdier than a blacklist approach ('don't allow these things: ...').

You mentioned HTML Purifier and that "it has no option to remove all event attributes".

That's... technically correct, in that you can't tell it to remove event attributes. The reason is the selling point, though: It does that automatically. The option that's "missing" is an ability to configure HTML Purifier to allow event attributes. That's deliberately amiss. HTML Purifier (as the name suggests) has a strong security focus.

There are some 'unsafe HTML' aspects that you can allow using the HTML Purifier configuration (the default configuration is deliberately picky), but the event attributes are not amongst those. (Well, you could teach HTML Purifier to accept them if you jumped through hoops, but it would take a lot of effort.)

I'd recommend giving it another try if you want to accept user HTML. It's a fairly established tool that's been tested by a lot of people.

There are some very tricky ways to break HTML and inject JavaScript. For example, did you know that you can inject JavaScript using the src or href attribute? Did you know you can, in some browsers, inject JavaScript using the style tag? Take a look at this XSS cheatsheet. It might give you a rough idea what you're up against, and why whitelisting is generally considered more efficient.

Either way, good luck!

Sign up to request clarification or add additional context in comments.

Comments

1
function filterText($value)
{
  if(!$value) return $value;

  return escapeJsEvent(removeScriptTag($value));

}

function escapeJsEvent($value){
  return preg_replace('/(<.+?)(?<=\s)on[a-z]+\s*=\s*(?:([\'"])(?!\2).+?\2|(?:\S+?\(.*?\)(?=[\s>])))(.*?>)/i', "$1 $3", $value);        
}

function removeScriptTag($text)
{
   $search = array("'<script[^>]*?>.*?</script>'si",
         "'<iframe[^>]*?>.*?</iframe>'si");

  $replace = array('','');

  $text = preg_replace($search, $replace, $text);

  return preg_replace_callback("'&#(\d+);'", function ($m) {
    return chr($m[1]);
  }, $text);
}


echo filterText('<img src=1 href=1 onerror="javascript:alert(1)"></img>');

Comments

0

A way to make it with the DOM.

The following code seeks and removes attributes whose name starts by "on" in all html tags.
($html stands for the html code)

$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$onAttributes = $xpath->query("//*/@*[starts-with(name(), 'on')]");
foreach ($onAttributes as $onAttribute) {
    $onAttribute->ownerElement->removeAttributeNode($onAttribute);
}

$body = $xpath->query('body')->item(0);
$result = substr($doc->saveHTML($body),6,-7);

Comments

0

Load the HTML document, iterate over all elements and then over all attributes of them (nested), remove attributes if they start with on:

$doc = new DOMDocument();
$doc->loadHTML($html);

foreach ($doc->getElementsByTagname('*') as $element) 
{
    foreach (iterator_to_array($element->attributes) as $name => $attribute)
    {
        if (substr_compare($name, 'on', 0, 2, TRUE) === 0)
        {
            $element->removeAttribute($name);
        }
    }
}

You also might want to scrape the list of known attribute names and give a warning if an unknown one is found (or have a whitelist of attributes you allow). Hope this helps, the code is quickly typed to might have some little errors.

1 Comment

any reason for the iterator_to_array() ? i mean.. foreach is meant for iterators.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.