This is the expression I have come up with to remove comments from my source code. It removes all comments except browser-specific (conditional) comments.

/<\!--(?!\[if).*?-->/s

I do not 100% understand regex but I managed to "Frankenstein" this together from various expressions I have found while searching through Stack. This is how I am seeing the breakdown of this expression and I would love if someone could help me understand it further.

/<\!--

Is this the start of the part I want to replace, with the backslash escaping the ! since it is part of the expression?

(?!\[if)

Says to match only if it does not contain an [if block?

 .*?

A wildcard, so it matches no matter what is in between?

-->

The end of the bit I want the expression to find.

/s

Tells the expression to run even if there is white space inside the part being matched?
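
A minimal sketch for checking each of these readings by running the pattern. The sample markup is hypothetical; only the pattern comes from the question. The comments give the usual PCRE meaning of each piece. Note that the lookahead only tests what comes immediately after `<!--`, and that `s` is about newlines rather than white space in general.

```php
<?php
// Hypothetical sample input: one ordinary multi-line comment, one conditional comment.
$html = "<p>keep</p>\n"
      . "<!-- plain\n     multi-line comment -->\n"
      . "<!--[if IE]><p>IE only</p><![endif]-->\n";

// /  ...  /   delimiters that mark where the pattern starts and ends
// <\!--       literal "<!--" (the backslash before ! is harmless but not required)
// (?!\[if)    negative lookahead: give up if "[if" follows "<!--" directly
// .*?         lazy match: as few characters as possible before the next "-->"
// -->         literal end of the comment
// s           modifier: "." also matches newlines, so multi-line comments match
$clean = preg_replace('/<\!--(?!\[if).*?-->/s', '', $html);

// Same pattern without the s modifier: "." stops at a newline,
// so the multi-line comment is no longer matched.
$withoutS = preg_replace('/<\!--(?!\[if).*?-->/', '', $html);

echo $clean;
```

With this input, the plain comment is removed and the conditional comment is kept. Without `s`, the multi-line comment is left in place.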

I don't want to use code just because it works for what I need. I want to actually understand what I am using and learn how to use it better in the future.

This expression works great, but I do need help taking it one step further. In my source I have code snippets within script tags, e.g.

<script type="text/javascript">
  <!--
    // Main vBulletin Javascript Initialization
    vBulletin_init();
  //-->
</script>

Is there a way I can adapt my current expression to exclude <!-- comments within JavaScript?

An example of what I am trying to achieve can be seen HERE

1 Answer

You have reached the point where you realize that regex patterns are a poor fit for parsing markup and programming languages. The tool to use here is an HTML parser. Example using the DOM and XPath:

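// $html is assumed to hold the page source as a string
$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// Comments whose parent element is not <script> and whose text does not contain "[if"
$comments = $xp->query('//*[not(local-name()="script")]/comment()[not(contains(.,"[if"))]');

// Detach each matching comment from its parent
foreach($comments as $comment) {
    $comment->parentNode->removeChild($comment);
}

echo $dom->saveHTML();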
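
To see the effect on the kind of page described in the question, here is a self-contained sketch. The sample page is hypothetical, and the `libxml_use_internal_errors()` call is an addition that only keeps parser warnings quiet. The query is the one shown above.

```php
<?php
// Hypothetical sample page: a conditional comment, a commented-out script
// body, and one ordinary comment.
$html = <<<'HTML'
<!DOCTYPE html>
<html>
<head>
<title>Demo</title>
<!--[if IE]><link rel="stylesheet" href="ie.css"><![endif]-->
<script type="text/javascript">
  <!--
    vBulletin_init();
  //-->
</script>
</head>
<body>
<!-- ordinary comment -->
<p>Hello</p>
</body>
</html>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($dom);
$comments = $xp->query('//*[not(local-name()="script")]/comment()[not(contains(.,"[if"))]');

foreach ($comments as $comment) {
    $comment->parentNode->removeChild($comment);
}

$out = $dom->saveHTML();
echo $out;
```

The ordinary comment should be gone, while the conditional comment and the script body remain. The HTML parser treats the contents of a script element as plain text, so the `<!--` inside it never becomes a comment node.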

4 Comments

Lol, I'll give that a shot. I have never worked with the DOM. Also, if the DOM is the solution, when would it be acceptable to use regex rather than the DOM?
@DrCustUmz anything involving tags with unpredictable contents is generally not something you want to use regex for. Patterns become too complicated and unreliable.
Thank you emsimpson, I will try to implement this DOM example and report back. Thank you guys.
I have implemented this method successfully. The only oddity I have found is that, for some reason, it turns ' (apostrophe) into &acirc;&#128;&#153; within the script tag. The ' I had in it was part of a variable showing a person's ownership and not really needed, so it wasn't an issue, but I found it odd that it did that. I will note that my page uses UTF-8 encoding.
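
The apostrophe oddity in the last comment is consistent with `loadHTML()` reading UTF-8 bytes as ISO-8859-1 when it finds no charset declaration it recognises. That is an assumption about the commenter's page, not something confirmed in the thread. A minimal sketch of one widely used workaround, with hypothetical sample markup: prepend an XML encoding declaration as a hint, then remove the processing instruction it leaves behind.

```php
<?php
// Hypothetical input with a typographic apostrophe (U+2019) as UTF-8 bytes.
$html = "<html><body><!-- remove me --><p>DrCustUmz\xE2\x80\x99s page</p></body></html>";

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument;
// The prefix tells the parser to decode the input as UTF-8.
$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();

// Drop the processing instruction node the prefix may have added.
foreach ($dom->childNodes as $node) {
    if ($node->nodeType === XML_PI_NODE) {
        $dom->removeChild($node);
        break;
    }
}

// DOM strings are exposed as UTF-8, so the apostrophe should round-trip intact.
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
$out  = $dom->saveHTML();
echo $out;
```

Declaring the charset in the page itself, for example with a meta tag in the head, is another way to give the parser the same hint.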
