52

I'm looking for a sequence of regex matches and replacements (preferably in PHP, but the language doesn't matter) to change the input below into the output below. The text at the start and end is arbitrary and must be preserved.

IN:

fkdshfks khh fdsfsk 
<!--g1-->
<div class='codetop'>CODE: AutoIt</div>
<div class='geshimain'>
    <!--eg1-->
    <div class="autoit" style="font-family:monospace;">
        <span class="kw3">msgbox</span>
    </div>
    <!--gc2-->
    <!--bXNnYm94-->
    <!--egc2-->
    <!--g2-->
</div>
<!--eg2-->
fdsfdskh

to this OUT:

fkdshfks khh fdsfsk 
<div class='codetop'>CODE: AutoIt</div>
<div class='geshimain'>
    <div class="autoit" style="font-family:monospace;">
        <span class="kw3">msgbox</span>
    </div>
</div>
fdsfdskh

Thanks.

15 Answers

103

Are you just trying to remove the comments? How about

s/<!--[^>]*-->//g

or the slightly better (suggested by the questioner himself):

<!--(.*?)-->

But remember, HTML is not regular, so using regular expressions to parse it will lead you into a world of hurt when somebody throws bizarre edge cases at it.
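
As a rough illustration of how the two patterns above differ, here is a minimal JavaScript sketch. The sample string is made up for this example; its second comment contains a ">" character.

```javascript
// Hypothetical sample input: the second comment contains ">".
const html = "a<!--g1-->b<!-- x > y -->c";

// Pattern 1: the comment body may not contain ">".
const noAngle = html.replace(/<!--[^>]*-->/g, "");

// Pattern 2: non-greedy match up to the first "-->".
const lazy = html.replace(/<!--(.*?)-->/g, "");

console.log(noAngle); // ab<!-- x > y -->c  (the second comment survives)
console.log(lazy);    // abc
```

Neither pattern handles comments that span several lines, because "." and the sample here stay on one line.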


10 Comments

No, I want to make the IN become the OUT, exactly how it is.
I don't see any differences other than the comments. Are you going to make us guess?
@James Brooks, the only difference between the IN and the OUT is that IN has comments and OUT not. So what else do you want but stripping the comments?
This does not work when you have an > inside your comment. A better version would be /(?=<!--)([\s\S]*?)-->/
@EugenMihailescu well, when you come right down to it, using regexes on a non-regular language is a mistake anyway.
52
preg_replace('/<!--(.*)-->/Uis', '', $html)

This PHP code will remove all html comment tags from the $html string.
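
To show why the ungreedy behaviour matters, here is a small JavaScript sketch. JavaScript has no U modifier, so the lazy quantifier ".*?" stands in for it; the sample markup is invented for illustration.

```javascript
// Hypothetical sample input with two comments on one line.
const html = "<p>one</p><!-- a --><p>two</p><!-- b --><p>three</p>";

// Greedy: ".*" runs to the LAST "-->", swallowing the markup in between.
const greedy = html.replace(/<!--(.*)-->/gis, "");

// Lazy: ".*?" stops at the FIRST "-->", like ".*" under PHP's U modifier.
const lazy = html.replace(/<!--(.*?)-->/gis, "");

console.log(greedy); // <p>one</p><p>three</p>
console.log(lazy);   // <p>one</p><p>two</p><p>three</p>
```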

3 Comments

Details on the modifiers: U makes the pattern ungreedy, so it matches only up to the first closing comment marker. i makes it case-insensitive (not sure why this is needed here). s lets . match newlines, so comments that span several lines are matched too.
+1 for Ungreedy specifically being called separately from the regex making it easier to understand/read.
We have been using this script for a couple of years now and it has worked fine. But please consider that it ignores the fact that the string <!-- can occur inside a JavaScript tag and be valid there. Example: <script> //... <!--</script> <script type="text/javascript" src="http://.."></script> </head> <!-- body -->
42

A better version would be:

(?=<!--)([\s\S]*?)-->

It matches html comments like these:

<!--
multi line html comment
-->

or

<!-- single line html comment -->

and, most importantly, it matches comments like this one (not every regex shown in the other answers covers this situation):

<!-- this is my blog: <mynixworld.inf> -->

Note

Although the string below is syntactically an HTML comment, your browser might parse it differently, so it might have a special meaning. Stripping such strings might break your code.

<!--[if !(IE 8) ]><!-->
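
The following JavaScript sketch compares the pattern above with the plain variant <!--[\s\S]*?--> that comes up in the comments. The "x" and "y" padding is added only to make the result easy to check. On the three sample comments both patterns behave the same; the one difference this sketch finds is the bare string <!-->, where the lookahead lets the closing --> overlap the opening <!--.

```javascript
const withLookahead = /(?=<!--)([\s\S]*?)-->/g;
const plain = /<!--[\s\S]*?-->/g;

// The three comment shapes from the answer, padded with x and y.
const samples = [
  "x<!--\nmulti line html comment\n-->y",
  "x<!-- single line html comment -->y",
  "x<!-- this is my blog: <mynixworld.inf> -->y",
];

for (const s of samples) {
  // Both patterns strip the whole comment and leave "xy".
  console.log(s.replace(withLookahead, ""), s.replace(plain, ""));
}

// Edge case: the lookahead consumes nothing, so "-->" may reuse the dashes.
const edge = "<!-->";
const edgeLookahead = edge.replace(withLookahead, "");
const edgePlain = edge.replace(plain, "");
console.log(JSON.stringify(edgeLookahead)); // ""
console.log(JSON.stringify(edgePlain));     // "<!-->"
```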

7 Comments

A great regex here. However, it might be more useful to change to the following so that the comment can be completely stripped out. (?=<!--)([\s\S]*?-->)
this is the best one
Why couldn't it just be (<!--[\s\S]*?-->) ?
Honestly I think the solution posted in this comment @andy_roddman should be marked as the only solution.
@FabrizioSabato: Andy's solution is a verbatim copy of the solution I suggested, with a very small difference that does not improve the result at all. Test: regex101.com/r/lUYBNb/1
17

Do not forget to consider conditional comments, as

<!--(.*?)-->

will remove them. Try this instead:

<!--[^\[](.*?)-->

This will also remove downlevel-revealed conditional comments, though.

EDIT:

The following pattern won't remove downlevel-revealed or downlevel-hidden comments:

<!--(?!<!)[^\[>].*?-->
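
Here is a minimal JavaScript check of that last pattern against an ordinary comment and the two kinds of conditional comment. The sample strings are illustrative and follow the usual conditional-comment syntax; this is a sketch, not an exhaustive test.

```javascript
const keepConditional = /<!--(?!<!)[^\[>].*?-->/g;

// Hypothetical samples, one per comment kind.
const plainComment = "a<!-- note -->b";
const downlevelHidden = "<!--[if IE]><p>IE only</p><![endif]-->";
const downlevelRevealed = "<!--[if !IE]><!--><p>not IE</p><!--<![endif]-->";

const strippedPlain = plainComment.replace(keepConditional, "");
const strippedHidden = downlevelHidden.replace(keepConditional, "");
const strippedRevealed = downlevelRevealed.replace(keepConditional, "");

console.log(strippedPlain);                          // ab
console.log(strippedHidden === downlevelHidden);     // true (left untouched)
console.log(strippedRevealed === downlevelRevealed); // true (left untouched)
```

Because the pattern uses "." without an s modifier, it does not match comments that span several lines.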

9

Ah I've done it,

<!--(.*?)-->

7 Comments

@Paul: It's actually better, because > not preceded by -- doesn't end an HTML comment. The important bit that changed was using a non-greedy, or shortest, match.
Oh right, I forgot that .*? was non-greedy. Still kind of cheesy to ask a horribly vague question, complain when somebody answers with a regex that strips the comments, and then post your own "strip the comment" answer.
Why all this downvoting? The question was badly written, but this answer seems fine.
@cube: because an equivalent answer had already been posted and was rejected by the asker for no good reason.
I'm giving him an upvote because in spite of his stubborn refusal to explain what he wanted and his refusal to explain what he didn't like about my answer, when you come right down to it, his answer is a tiny bit better than mine.
3

With the following pattern:

/( )*<!--((.*)|[^<]*|[^!]*|[^-]*|[^>]*)-->\n*/g

you can remove multiline comments. I tested it with this string:

fkdshfks khh fdsfsk 
<!--g1-->
<div class='codetop'>CODE: AutoIt</div>
    <div class='geshimain'>
    <!--eg1-->
    <div class="autoit" style="font-family:monospace;">
        <span class="kw3">msgbox</span>
    </div>
    <!--gc2-->
    <!--bXNnYm94-->
    <!--egc2-->
    <!--g2-->
</div>
<!--eg2-->
fdsfdskh

<!-- --
> test
- -->

<!-- --
<- test <
>
- -->

<!--
test !<
- <!--
-->

<script type="text/javascript">//<![CDATA[
    var xxx = 'a';   
    //]]></script>

ok

1 Comment

This helped me clean out a bunch of HTML comments added in by Angular. Fantastic 🚀
2

Try the following if your comments contain line breaks:

/<!--(.|\n)*?-->/g

3 Comments

That's not a good way to deal with the newline issue. In PHP you can use the s modifier to allow . to match newlines. JavaScript doesn't have an s mode, but the preferred workaround is to replace the . with [\S\s]. It's much more efficient than (.|\n). By the way, the OP was using PHP, which doesn't have a g modifier.
preg_match_all is the way to indicate global (g modifier)
This must be removed as it is a dangerous and very resource consuming pattern.
2
<!--([\s\S]*?)-->

This also works in JavaScript and VBScript, because "." doesn't match line breaks in every language.
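
A short JavaScript sketch of the line-break behaviour, using a made-up multi-line comment. The third variant relies on the s (dotAll) flag, which assumes an ES2018 or later engine; older engines need the [\s\S] form.

```javascript
// Hypothetical input: one comment spread over several lines.
const html = "x<!--\nline one\nline two\n-->y";

// "." does not match "\n" by default, so nothing is removed.
const dot = html.replace(/<!--(.*?)-->/g, "");

// [\s\S] matches any character, including line breaks.
const anyChar = html.replace(/<!--([\s\S]*?)-->/g, "");

// With the s flag (ES2018+), "." matches line breaks too.
const dotAll = html.replace(/<!--(.*?)-->/gs, "");

console.log(dot === html); // true
console.log(anyChar);      // xy
console.log(dotAll);       // xy
```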

2

Here is my attempt:

<!--(?!<!)[^\[>][\s\S]*?-->

This will also remove multi line comments and won't remove downlevel-revealed or downlevel-hidden comments.

2

I know that this is quite an old post, but I felt that it would be useful to add to this post in case anyone wants an easy to implement PHP function that directly answers the original question.

/**
 * Strip all the html comments from $text
 *
 * @param $text - text to modify
 * @param string $new replacement string
 * @return array|string|string[]|null
 */
function strip_html_comments($text, $new=''){
    $search = array ("|<!--[\s\S]*?-->|si");
    $replace = array ($new);
    return preg_replace($search, $replace, $text);
}

1

This code also removes JavaScript code. That's too bad :|

Here's an example of JavaScript code that will be removed by this code:

<script type="text/javascript"><!--
    var xxx = 'a';
    //-->
    </script>

3 Comments

but... why would you wrap JS in html comment tags?
Very, very, very, very old browsers don't run javascript and instead print it on the page. The comment tags prevent the script from being printed.
you can use this /(?:[^>]\s|^)<!--(?!<!)[^\[>][\s\S]*?-->/g jsfiddle.net/3vkrrt8e
1
function remove_html_comments($html) {
   $expr = '/<!--[\s\S]*?-->/';
   $func = 'rhc';
   $html = preg_replace_callback($expr, $func, $html);
   return $html;
}

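function rhc($search) {
   // $search[0] holds the full text of the matched comment
   list($l) = $search;
   // Keep conditional comments such as <!--[if IE]> ... <![endif]-->
   if (mb_eregi("\[if",$l) || mb_eregi("\[endif",$l) )  {
      return $l;
   }
   // Drop every other comment
   return '';
}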
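
The same callback idea can be sketched in JavaScript with String.prototype.replace and a replacer function. The function name and the sample input are made up for illustration.

```javascript
// Removes HTML comments but keeps those that look like conditional comments.
function removeHtmlComments(html) {
  return html.replace(/<!--[\s\S]*?-->/g, (comment) =>
    // Keep anything containing "[if" or "[endif"; drop the rest.
    /\[(if|endif)/i.test(comment) ? comment : ""
  );
}

const input = "a<!-- drop me -->b<!--[if IE]><p>IE</p><![endif]-->c";
const output = removeHtmlComments(input);
console.log(output); // ab<!--[if IE]><p>IE</p><![endif]-->c
```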

1

You can achieve this with modern JavaScript.

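function RemoveHtmlComments() {
    // Copy the live NodeList first so removing nodes doesn't skip siblings
    let children = Array.from(document.body.childNodes);
    for (let child of children) {
        // Only direct children of <body> are checked; nested comments are left alone
        if (child.nodeType === Node.COMMENT_NODE) child.remove();
    }
}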

It should be safer than RegEx.
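
The function above only looks at direct children of document.body. A recursive variant that also reaches nested comments might look like the sketch below. The name removeCommentsDeep is hypothetical, and the constant 8 is hard-coded (it is the value of Node.COMMENT_NODE) so that the function does not depend on a browser global.

```javascript
const COMMENT_NODE = 8; // same value as Node.COMMENT_NODE in browsers

// Recursively removes every comment node below root.
function removeCommentsDeep(root) {
  // Snapshot the child list, since childNodes is live in a real DOM.
  for (const child of Array.from(root.childNodes)) {
    if (child.nodeType === COMMENT_NODE) {
      root.removeChild(child);
    } else {
      removeCommentsDeep(child);
    }
  }
}

// In a browser: removeCommentsDeep(document.body);
```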

0
// Remove multiline comment
    $mlcomment = '/\/\*(?!-)[\x00-\xff]*?\*\//';
    $code = preg_replace ($mlcomment, "", $code);
// Remove single line comment
    $slcomment = '/[^:]\/\/.*/';
    $code = preg_replace ($slcomment, "", $code);
// Remove extra spaces
    $extra_space = '/\s+/';
    $code = preg_replace ($extra_space, " ", $code);
// Remove spaces that can be removed
    $removable_space = '/\s?([\{\};\=\(\)\/\+\*-])\s?/';
    $code = preg_replace ($removable_space, "\\1", $code);

0

If you just want the text, or the text with specific tags, you can handle this with PHP's strip_tags. It also deletes HTML comments, and you can keep the HTML tags you need (passing the allowed tags as an array requires PHP 7.4 or later), like this:

$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text, ['p', 'a']);

the output will be:

<p>Test paragraph.</p> <a href="#fragment">Other text</a>

I hope it helps somebody!
