0

I'm creating an "HTML editor" for a webpage of mine. At the moment, I only want the editor to allow entry of HTML and CSS, not JavaScript (or jQuery, for that matter).

I'm trying to find a way to disable the use of <script> or <script type="text/javascript"> </script> using PHP. However, my current approach produces messy output:

        $content_in_before = str_replace('<script','',$content_in_before);
        $content_in_before = str_replace('script>','',$content_in_before);

It's also not very well coded!

Is there a more bulletproof way of coding this that stops all types of JavaScript from being entered into this form, while still allowing CSS and HTML?

Thanks in advance!

11
  • strip_tags() perhaps? Commented Dec 20, 2014 at 3:22
  • Regular Expressions Commented Dec 20, 2014 at 3:25
  • This will not solve your problem: you can still add JavaScript via event attributes like onLoad, onClick, etc. If you really want to make sure no scripts get uploaded, use another language like Markdown or use a proven library. Commented Dec 20, 2014 at 3:27
  • The use of attributes such as onLoad and onClick does not worry me, as there is only a minimal amount you can do with them. <script> tags do worry me, however, because functions can be written inside them, for example. Commented Dec 20, 2014 at 3:30
  • Anything you can do in a script tag, you can do in an onLoad attribute... Commented Dec 20, 2014 at 3:32

4 Answers

2

I'd recommend using a sanitization library, like HTML Purifier. Just stripping <script> tags isn't enough to prevent XSS attacks, because JS can also be executed automatically through attributes like onLoad, onMouseOver, onUnload, etc.
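For reference, a minimal sketch of calling HTML Purifier with its default configuration. It assumes the library was installed with Composer (package ezyang/htmlpurifier) and that vendor/autoload.php has already been loaded; the wrapper name purify_html is hypothetical and only there for illustration.

```php
<?php
// Hypothetical wrapper around HTML Purifier's default configuration.
// Assumes the library's classes are autoloadable (e.g. via Composer).
function purify_html(string $dirty): string
{
    if (!class_exists('HTMLPurifier')) {
        throw new RuntimeException('HTML Purifier is not loaded');
    }

    // Default configuration; allowed tags and attributes can be tuned on $config.
    $config   = HTMLPurifier_Config::createDefault();
    $purifier = new HTMLPurifier($config);

    // Returns the input with disallowed markup removed.
    return $purifier->purify($dirty);
}
```

With the library loaded, something like `echo purify_html($_POST['textbox']);` would output the cleaned markup.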

To remove tags while allowing some, you can use PHP's strip_tags() function, but it doesn't strip attributes, hence my recommendation of an HTML sanitization library. If you're able to run it, one of the best choices is perhaps Google's Caja library, although it doesn't work in shared hosting environments because it's written in Java; it can, however, be hosted on Google's App Engine.
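A small sketch of the strip_tags() limitation described above. The input string is made up for illustration: it contains one allowed tag that carries an event handler and one script tag.

```php
<?php
// Hypothetical user input: an allowed tag with an event handler, plus a script tag.
$dirty = '<b onmouseover="alert(1)">x</b><script>alert(2)</script>';

// Allow only <b>; every other tag is removed.
$result = strip_tags($dirty, '<b>');

// The onmouseover attribute is still on <b>, and the text that was inside
// <script> is kept as plain text because only the tags themselves are removed.
echo $result, PHP_EOL;
```

So the allowed tag keeps its onmouseover handler, which is exactly the kind of attribute a sanitization library would remove.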

Also, simple regex solutions aren't always reliable, since even malformed tags can still be parsed. For example, <script > wouldn't be caught by simple regex detection of normal script tags unless it's looking for spaces after the tag name. It's possible to check for this, but using an established library would save you time, and would give you the added bonus of a battle-tested library.

Example: Script Tags with Spaces producing an alert
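To make the point about simple pattern matching concrete, here is a sketch using a deliberately naive filter. The function name naive_strip and the sample strings are hypothetical; the filter only recognises the exact tags <script> and </script>.

```php
<?php
// Hypothetical naive filter: matches only the exact tags "<script>" and "</script>".
function naive_strip(string $html): string
{
    return preg_replace('/<script>.*?<\/script>/is', '', $html);
}

$plain  = '<p>hi</p><script>alert(1)</script>';    // textbook script tag
$spaced = '<p>hi</p><script >alert(1)</script >';  // space before the closing '>'
$event  = '<img src="x.png" onerror="alert(1)">';  // no script tag at all

var_dump(naive_strip($plain) === '<p>hi</p>');  // the textbook case is removed
var_dump(naive_strip($spaced) === $spaced);     // the spaced variant is left untouched
var_dump(naive_strip($event) === $event);       // the event attribute is left untouched
```

The second and third inputs come back unchanged, even though a browser would still run the code in both.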


2 Comments

Thank you very much! I decided to implement HTML Purifier onto the page and it works like a charm!
This is the right way to do it. One could write a regex to replace the attributes, but that would require too much work, and regex isn't the right tool.
1

You could use a regexp like this:

echo preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', "", $var);

source: https://stackoverflow.com/a/1886842/2046700
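One caveat with a single preg_replace() pass, sketched below with a made-up input: removing an inner match can leave the surrounding fragments joined into a new script tag. The helper strip_scripts_repeatedly is hypothetical and simply repeats the replacement until the string stops changing. Even then, this only deals with script tags, not event attributes.

```php
<?php
$pattern = '/<script\b[^>]*>(.*?)<\/script>/is';

// Hypothetical input where an empty script element is nested inside a broken one.
$nested = '<scr<script></script>ipt>alert(1)</script>';

// One pass removes the inner "<script></script>", and the leftover pieces
// "<scr" and "ipt>" join up into a fresh script tag.
$once = preg_replace($pattern, '', $nested);

// Hypothetical helper: repeat the replacement until nothing changes any more.
function strip_scripts_repeatedly(string $html, string $pattern): string
{
    do {
        $before = $html;
        $html   = preg_replace($pattern, '', $html);
    } while ($html !== $before);

    return $html;
}

$stable = strip_scripts_repeatedly($nested, $pattern);
```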

Or, as stated, use a library to do this for you, such as: http://htmlpurifier.org/

Another possible example:

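<?php
   // Note: this pattern only matches <script> tags whose attributes mention
   // "javascript" (e.g. type="text/javascript"); a bare <script> tag is not matched.
   $javascript = '/<script[^>]*?javascript[^>]*?>.*?<\/script>/si';
   $noscript = '';
   $document = file_get_contents('test.html');
   echo preg_replace($javascript, $noscript, $document);
?>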


1

Whitelist tags you permit, and attributes you permit, then remove everything else. You can use DOMDocument for this.

I wrote this piece of code once, but never had anyone else review it:

function legal_html($str, $tags='<a><b><br><i><span><table><tbody><tr><td><thead><th><img>', $attribArray=false) {
    if ($attribArray===false) {
        $attribs = array('id','class','src','href','alt');
    } else {
        $attribs = $attribArray;
    }
    $stripped = strip_tags($str,$tags);
    $dom = new DOMDocument();
    @$dom->loadHTML('<div>'.$stripped.'</div>');
    foreach ($dom->getElementsByTagName('*') as $node) {
        for ($i = $node->attributes->length -1; $i >= 0; $i--) {
            $attrib = $node->attributes->item($i);
            if (!in_array($attrib->name,$attribs)) $node->removeAttributeNode($attrib);
        }
    }
    $stripped = $dom->saveHTML();
    $start = strpos($stripped,'<div>')+5;
    $end = strrpos($stripped,'</div>');
    $stripped = trim(substr($stripped,$start,$end-$start));
    return $stripped;
}
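One thing to watch with the whitelist above: href and src are permitted, so a value such as javascript:alert(1) would survive. A possible extra check is sketched below. The function is_allowed_url and its list of schemes are assumptions for illustration, not part of the original code, and this is not a complete URL validator.

```php
<?php
// Hypothetical helper: accept relative URLs and a short list of schemes only.
function is_allowed_url(string $url, array $schemes = ['http', 'https', 'mailto']): bool
{
    // Browsers drop some whitespace and control characters when parsing a URL,
    // so remove them before inspecting the value.
    $compact = preg_replace('/[\x00-\x20]+/', '', $url);

    // No "scheme:" prefix means a relative URL, which is allowed here.
    if (!preg_match('/^([a-z][a-z0-9+.\-]*):/i', $compact, $m)) {
        return true;
    }

    // Otherwise the scheme must be on the allow list (compared case-insensitively).
    return in_array(strtolower($m[1]), $schemes, true);
}
```

Inside the attribute loop, an href or src attribute could then be removed whenever is_allowed_url($attrib->value) returns false.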


-3

You can use something like this:

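$content = $_POST['textbox'];

// stripos() is case-insensitive, and searching for '<script' (without the closing '>')
// also catches tags with attributes, such as <script type="text/javascript">.
if (stripos($content, '<script') !== false) {
    // show error
} else {
    // proceed with work
}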

1 Comment

Although this is a good idea, I'd prefer the other text to still show; and somehow have the <script> removed by itself.
