I have prepared a whitelist of allowed CSS properties, and I want to remove every style that is not on the whitelist from an HTML string.

$allowed_styles = array('font-size','color','font-family','text-align','margin-left');
$html = 'xyz html';
$html_string = '<body>' . $html . '</body>';
$dom = new DOMDocument();
$dom->loadHTML($html_string);
$elements = $dom->getElementsByTagName('body');
foreach ($elements as $element) {
    foreach ($element->childNodes as $child) {
        if ($child->hasAttribute('style')) {
            $style = strtolower(trim($child->getAttribute('style')));

            // match and get only the CSS property names
            preg_match_all('/(?<names>[a-z\-]+):/', $style, $matches);

            for ($i = 0; $i < sizeof($matches["names"]); $i++) {
                $style_property = $matches["names"][$i];

                // if the CSS property is not in the allowed styles array,
                // remove the whole style attribute from this child
                if (!in_array($style_property, $allowed_styles)) {
                    $child->removeAttribute('style');
                    break;
                }
            }
        }
    }
}

$html_output = $dom->saveHTML();

I have tested many HTML strings and it works fine everywhere. But when I tried to filter this HTML string:

$html_string = '<div style="font-style: italic; text-align: center; 
background-color: red;">On The Contrary</div><span 
style="font-style: italic; background-color: rgb(244, 249, 255); 
font-size: 32px;"><b style="text-align: center; 
background-color: rgb(255, 255, 255);">This is USA</b></span>';

All the other disallowed styles are removed from this string, except on this element:

<b style="text-align: center; background-color: rgb(255, 255, 255);">

Can someone suggest a more efficient and robust way to remove all styles that are not on the whitelist?

2 Answers

Similar to Oleja's solution, but this one removes only the disallowed properties rather than the whole style attribute.

// Usage: removeStylesheet($doc, ['color','font-weight']);
// (inside a class, declare it as a method and call it via $this-> instead)

function removeStylesheet($tree, $allowed_styles) {
    // Only element nodes have attributes; text and comment nodes are skipped.
    if ($tree instanceof DOMElement && $tree->hasAttribute('style')) {
        $style = strtolower(trim($tree->getAttribute('style')));
        preg_match_all('/(?<names>[a-z\-]+) *:(?<values>[^\'";]+)/', $style, $matches);
        $new_styles = array();
        for ($i = 0; $i < sizeof($matches['names']); $i++) {
            // keep the declaration only if its property is whitelisted
            if (in_array($matches['names'][$i], $allowed_styles)) {
                $new_styles[] = $matches['names'][$i].':'.$matches['values'][$i];
            }
        }
        if ($new_styles) {
            $tree->setAttribute('style', implode(';', $new_styles));
        } else {
            $tree->removeAttribute('style');
        }
    }
    // Recurse so that nested elements are filtered as well.
    if ($tree->childNodes) {
        foreach ($tree->childNodes as $child) {
            removeStylesheet($child, $allowed_styles);
        }
    }
}
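
One caveat with the regex above: the value pattern stops at the first quote, so a declaration such as font-family: "Times New Roman" loses its value, and strtolower() lowercases the values too. Below is a minimal sketch of an alternative that splits the attribute on semicolons instead. It assumes that no value contains a semicolon itself (a data: URL would break it), and filterStyle is a hypothetical helper name, not something from the answer above.

```php
<?php
// Hypothetical helper: keep only whitelisted declarations from a style string.
// Assumption: declarations are separated by ';' and no value contains ';'.
function filterStyle(string $style, array $allowed_styles): string {
    $kept = array();
    foreach (explode(';', $style) as $declaration) {
        // Split "name: value" at the first colon only, so values may contain ':'.
        $parts = explode(':', $declaration, 2);
        if (count($parts) !== 2) {
            continue; // empty or malformed declaration
        }
        $name  = strtolower(trim($parts[0])); // property names are case-insensitive
        $value = trim($parts[1]);             // the value is kept as written
        if ($value !== '' && in_array($name, $allowed_styles, true)) {
            $kept[] = $name . ': ' . $value;
        }
    }
    return implode('; ', $kept);
}

$allowed = array('font-size', 'color', 'font-family', 'text-align', 'margin-left');
echo filterStyle('font-style: italic; FONT-FAMILY: "Times New Roman"; background-color: red', $allowed);
// prints: font-family: "Times New Roman"
```

The result could then be passed to setAttribute('style', ...) in place of the implode() call, or the attribute removed when the returned string is empty.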

Your loop only visits the direct children of <body>, so the nested <b> element is never checked. For this (and any other nested) HTML you need a recursive function like this:

$html = 'your html';
$allowed_styles = array('font-size','color','font-family','text-align','margin-left');
$html_string = '<body>' . $html . '</body>';
$dom = new DOMDocument();
$dom->loadHTML($html_string);
$elements = $dom->getElementsByTagName('body');
foreach ($elements as $element) {
    clearHtml($element, $allowed_styles);
}
$html_output = $dom->saveHTML();

function clearHtml($tree, $allowed_styles) {
    // Only element nodes have attributes; text and comment nodes are skipped.
    if ($tree instanceof DOMElement) {
        if ($tree->hasAttribute('style')) {
            $style = strtolower(trim($tree->getAttribute('style')));
            preg_match_all('/(?<names>[a-z\-]+):/', $style, $matches);
            foreach ($matches['names'] as $style_property) {
                // One disallowed property is enough to drop the whole attribute.
                if (!in_array($style_property, $allowed_styles)) {
                    $tree->removeAttribute('style');
                    break;
                }
            }
        }
        // Recurse into the children so that nested elements are checked too.
        foreach ($tree->childNodes as $child) {
            clearHtml($child, $allowed_styles);
        }
    }
}
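
As a variation on the recursion above, the same traversal can be sketched with DOMXPath, which selects every element that carries a style attribute at any depth. This is only a sketch: it keeps the all-or-nothing behaviour of clearHtml (the whole attribute is dropped if any property is not whitelisted), and the function name stripDisallowedStyleAttributes is made up for illustration.

```php
<?php
// Hypothetical name; removes the style attribute from every element whose
// style contains at least one property outside the whitelist.
function stripDisallowedStyleAttributes(DOMDocument $dom, array $allowed_styles): void {
    $xpath = new DOMXPath($dom);
    // '//*[@style]' matches elements with a style attribute at any nesting level.
    // The node list is copied into an array as a defensive measure before editing.
    $styled = iterator_to_array($xpath->query('//*[@style]'));
    foreach ($styled as $element) {
        $style = strtolower($element->getAttribute('style'));
        preg_match_all('/([a-z\-]+)\s*:/', $style, $matches);
        foreach ($matches[1] as $property) {
            if (!in_array($property, $allowed_styles, true)) {
                $element->removeAttribute('style');
                break; // one disallowed property is enough
            }
        }
    }
}

$allowed = array('font-size', 'color', 'font-family', 'text-align', 'margin-left');
$dom = new DOMDocument();
$dom->loadHTML('<body><span style="font-size: 32px;"><b style="text-align: center; background-color: rgb(255, 255, 255);">This is USA</b></span></body>');
stripDisallowedStyleAttributes($dom, $allowed);
echo $dom->saveHTML();
```

In this example the span keeps its style, because font-size is whitelisted, while the nested b loses its attribute because of background-color.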
