RegExp to strip HTML comments [duplicate]

Question

I'm looking for a sequence of regexp matches and replaces (preferably in PHP, but the language doesn't matter) to change this (the text at the start and end is just random text that needs to be preserved).

IN:

fkdshfks khh fdsfsk 
<!--g1-->
<div class='codetop'>CODE: AutoIt</div>
<div class='geshimain'>
    <!--eg1-->
    <div class="autoit" style="font-family:monospace;">
        <span class="kw3">msgbox</span>
    </div>
    <!--gc2-->
    <!--bXNnYm94-->
    <!--egc2-->
    <!--g2-->
</div>
<!--eg2-->
fdsfdskh

to this OUT:

fkdshfks khh fdsfsk 
<div class='codetop'>CODE: AutoIt</div>
<div class='geshimain'>
    <div class="autoit" style="font-family:monospace;">
        <span class="kw3">msgbox</span>
    </div>
</div>
fdsfdskh

Thanks.

Paul Tomblin · Accepted Answer · 2010-03-15 23:19:22Z

103

Are you just trying to remove the comments? How about

s/<!--[^>]*-->//g

or the slightly better (suggested by the questioner himself):

<!--(.*?)-->

But remember, HTML is not regular, so using regular expressions to parse it will lead you into a world of hurt when somebody throws bizarre edge cases at it.
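To see why the second pattern is the safer of the two, here is a small JavaScript sketch. The sample string and variable names are made up for illustration and are not taken from the question. It suggests that [^>]* cannot cross a > inside a comment, while the non-greedy .*? simply stops at the first -->.

```javascript
// Hypothetical sample input: the first comment contains a ">" character.
const html = "a<!-- x > y -->b<!-- z -->c";

// [^>]* cannot step over the ">" inside the first comment, so that comment survives.
const strictResult = html.replace(/<!--[^>]*-->/g, "");

// .*? is non-greedy: it allows ">" inside the comment and stops at the first "-->".
const lazyResult = html.replace(/<!--.*?-->/g, "");

console.log(strictResult); // a<!-- x > y -->bc
console.log(lazyResult);   // abc
```

Neither pattern handles comments that span several lines, because . does not match newlines by default.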

edited Mar 15, 2010 at 23:19

answered Jul 5, 2009 at 20:24

Paul Tomblin



10 Comments

James Brooks Over a year ago

No, I want to make the IN become the OUT, exactly how it is.

Paul Tomblin Over a year ago

I don't see any differences other than the comments. Are you going to make us guess?

Toon Krijthe Over a year ago

@James Brooks, the only difference between the IN and the OUT is that IN has comments and OUT not. So what else do you want but stripping the comments?

Eugen Mihailescu Over a year ago

This does not work when you have a > inside your comment. A better version would be /(?=<!--)([\s\S]*?)-->/

Paul Tomblin Over a year ago

@EugenMihailescu well, when you come right down to it, using regexes on a non-regular language is a mistake anyway.


Benoit Villière · Accepted Answer · 2010-07-13 09:26:12Z

52

preg_replace('/<!--(.*)-->/Uis', '', $html)

This PHP code will remove all HTML comments from the $html string.
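For readers outside PHP: the U modifier inverts greediness, so (.*) behaves like (.*?). The JavaScript sketch below, using a made-up input string, illustrates what could go wrong without it. JavaScript has no U flag, so the lazy quantifier is written explicitly, and the s flag is assumed to be available (it was added in ES2018).

```javascript
// Hypothetical input with two comments and a paragraph in between that must be kept.
const html = "<p>one</p><!-- a -->\n<p>two</p><!-- b\nb -->\n<p>three</p>";

// Greedy: .* runs from the first "<!--" to the LAST "-->", deleting <p>two</p> as well.
const greedy = html.replace(/<!--(.*)-->/gis, "");

// Lazy: .*? stops at the first "-->", which is the effect of PHP's U modifier.
const lazy = html.replace(/<!--(.*?)-->/gis, "");

console.log(JSON.stringify(greedy)); // "<p>one</p>\n<p>three</p>"
console.log(JSON.stringify(lazy));   // "<p>one</p>\n<p>two</p>\n<p>three</p>"
```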

answered Jul 13, 2010 at 9:26

Benoit Villière


3 Comments

Brian Layman Over a year ago

Details on the modifiers: U makes it Ungreedy, so the match goes only to the first close comment. i makes it case insensitive (not sure why this is needed here). s means that newlines are allowed inside the comments too.

Nathan J.B. Over a year ago

+1 for Ungreedy specifically being called separately from the regex making it easier to understand/read.

KTB Over a year ago

We have been using this script for a couple of years now and it worked fine. But please consider that it ignores the fact that the String

elboletaire · Accepted Answer · 2017-05-29 14:08:02Z

42

A better version would be:

(?=<!--)([\s\S]*?)-->

It matches HTML comments like these:

<!--
multi line html comment
-->

or

<!-- single line html comment -->

and, most importantly, it matches comments like this (patterns built on [^>]*, such as <!--[^>]*-->, do not cover this case):

<!-- this is my blog: <mynixworld.inf> -->

Note

Although the line below is syntactically an HTML comment, your browser might parse it differently, so it might have a special meaning. Stripping such strings might break your code.

<!--[if !(IE 8) ]><!-->
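A quick JavaScript check of the claims above, offered as a sketch: the sample comments are the ones from this answer, wrapped in made-up surrounding text. On these samples the lookahead variant and the plain <!--[\s\S]*?--> pattern give the same output, and the conditional-comment marker from the note is removed along with the ordinary comments.

```javascript
const withLookahead = /(?=<!--)([\s\S]*?)-->/g;
const plain = /<!--[\s\S]*?-->/g;

// Each sample wraps one comment between the letters "a" and "b".
const samples = [
  "a<!--\nmulti line html comment\n-->b",
  "a<!-- single line html comment -->b",
  "a<!-- this is my blog: <mynixworld.inf> -->b",
  "a<!--[if !(IE 8) ]><!-->b", // the conditional-comment marker from the note
];

// Strip every sample with both patterns.
const stripped = samples.map((s) => s.replace(withLookahead, ""));
const strippedPlain = samples.map((s) => s.replace(plain, ""));

console.log(stripped);      // [ 'ab', 'ab', 'ab', 'ab' ]
console.log(strippedPlain); // [ 'ab', 'ab', 'ab', 'ab' ]
```

The last sample shows why the note matters: the marker disappears, which may change how old Internet Explorer versions treat the page.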

edited May 29, 2017 at 14:08

elboletaire


answered Mar 22, 2015 at 12:20

Eugen Mihailescu


7 Comments

andy_roddam Over a year ago

A great regex here. However, it might be more useful to change to the following so that the comment can be completely stripped out. (?=)

Ismail Sahin Over a year ago

this is the best one

Jarod Over a year ago

Why couldn't it just be () ?

Fabrizio Sabato Over a year ago

Honestly I think the solution posted in this comment @andy_roddman should be marked as the only solution.

Eugen Mihailescu Over a year ago

@FabrizioSabato: Andy's solution is a verbatim copy of the solution I suggested, with a very small difference that does not improve the result at all. Test: regex101.com/r/lUYBNb/1


Pierre Wahlgren · Accepted Answer · 2012-03-14 21:59:30Z

17

Do not forget to consider conditional comments, as

<!--(.*?)-->

will remove them. Try this instead:

<!--[^\[](.*?)-->

This will also remove downlevel-revealed conditional comments, though.

EDIT:

The following won't remove downlevel-revealed or downlevel-hidden conditional comments:

<!--(?!<!)[^\[>].*?-->
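As a rough check of the last pattern, here is a JavaScript sketch with made-up single-line inputs (the conditional-comment snippets are illustrative, not from the question). It suggests that both kinds of conditional comment survive while an ordinary comment is dropped.

```javascript
// Pattern from the edit above: skip comments whose body starts with "[", ">" or "<!".
const pattern = /<!--(?!<!)[^\[>].*?-->/g;

// Hypothetical single-line inputs (no newlines, since "." does not match them here).
const hidden = "<!--[if IE]><p>old IE</p><![endif]-->";
const revealed = "<!--[if !IE]><!--><p>others</p><!--<![endif]-->";
const ordinary = "<p>kept</p><!-- dropped -->";

const results = [hidden, revealed, ordinary].map((s) => s.replace(pattern, ""));

console.log(results[0] === hidden);   // true: downlevel-hidden comment untouched
console.log(results[1] === revealed); // true: downlevel-revealed comment untouched
console.log(results[2]);              // <p>kept</p>
```

One side effect worth knowing: because [^\[>] must consume a character, an empty comment such as <!----> is not matched by this pattern.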

edited Mar 14, 2012 at 21:59

answered Apr 15, 2011 at 15:58

Pierre Wahlgren


Comments

James Brooks · Accepted Answer · 2009-07-05 20:31:52Z

9

Ah I've done it,

<!--(.*?)-->

answered Jul 5, 2009 at 20:31

James Brooks


7 Comments

Phil Miller Over a year ago

@Paul: It's actually better, because > not preceded by -- doesn't end an HTML comment. The important bit that changed was using a non-greedy, or shortest, match.

Paul Tomblin Over a year ago

Oh right, I forgot that .*? was non-greedy. Still kind of cheesy to ask a horribly vague question, complain when somebody answers with a regex that strips the comments, and then post your own "strip the comment" answer.

cube Over a year ago

Why all this downvoting? The question was badly written, but this answer seems fine.

Konrad Rudolph Over a year ago

@cube: because an equivalent answer had already been posted and was rejected by the asker for no good reason.

Paul Tomblin Over a year ago

I'm giving him an upvote because in spite of his stubborn refusal to explain what he wanted and his refusal to explain what he didn't like about my answer, when you come right down to it, his answer is a tiny bit better than mine.


davlem · Accepted Answer · 2020-02-11 12:18:05Z

3

With the following pattern:

/( )*<!--((.*)|[^<]*|[^!]*|[^-]*|[^>]*)-->\n*/g

you can remove multiline comments. Test string:

fkdshfks khh fdsfsk 
<!--g1-->
<div class='codetop'>CODE: AutoIt</div>
    <div class='geshimain'>
    <!--eg1-->
    <div class="autoit" style="font-family:monospace;">
        <span class="kw3">msgbox</span>
    </div>
    <!--gc2-->
    <!--bXNnYm94-->
    <!--egc2-->
    <!--g2-->
</div>
<!--eg2-->
fdsfdskh

<!-- --
> test
- -->

<!-- --
<- test <
>
- -->

<!--
test !<
- <!--
-->

<script type="text/javascript">//<![CDATA[
    var xxx = 'a';   
    //]]></script>

ok

answered Feb 11, 2020 at 12:18

davlem


1 Comment

Jessy Over a year ago

This helped me clean out a bunch of HTML comments added in by Angular. Fantastic 🚀

Hadrian · Accepted Answer · 2010-12-03 01:01:45Z

2

Try the following if your comments contain line breaks:

/<!--(.|\n)*?-->/g

answered Dec 3, 2010 at 1:01

Hadrian


3 Comments

Alan Moore Over a year ago

That's not a good way to deal with the newline issue. In PHP you can use the s modifier to allow . to match newlines. JavaScript doesn't have an s mode, but the preferred workaround is to replace the . with [\S\s]. It's much more efficient than (.|\n). By the way, the OP was using PHP, which doesn't have a g modifier.

Snapey Over a year ago

preg_match_all is the way to indicate global (g modifier)

Wiktor Stribiżew Over a year ago

This must be removed as it is a dangerous and very resource consuming pattern.

Toshinou Kyouko · Accepted Answer · 2014-02-05 10:29:32Z

2

<!--([\s\S]*?)-->

This also works in JavaScript and VBScript, since "." doesn't match line breaks in every language.
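The difference can be illustrated with a short JavaScript sketch. The input string is made up, and no regex flags other than g are used.

```javascript
// Hypothetical input with a comment that spans two lines.
const html = "<p>a</p><!-- line one\nline two --><p>b</p>";

// Without the s flag, "." stops at the newline, so the multi-line comment is not matched.
const dotOnly = html.replace(/<!--(.*?)-->/g, "");

// [\s\S] matches any character, newlines included, with no flag required.
const anyChar = html.replace(/<!--([\s\S]*?)-->/g, "");

console.log(dotOnly === html); // true: nothing was removed
console.log(anyChar);          // <p>a</p><p>b</p>
```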

answered Feb 5, 2014 at 10:29

Toshinou Kyouko


Comments

Alexandr Kondrashov · Accepted Answer · 2015-06-04 20:36:33Z

2

Here is my attempt:

<!--(?!<!)[^\[>][\s\S]*?-->

This will also remove multi line comments and won't remove downlevel-revealed or downlevel-hidden comments.

answered Jun 4, 2015 at 20:36

Alexandr Kondrashov


Comments

Clinton · Accepted Answer · 2022-08-04 08:33:34Z

2

I know that this is quite an old post, but I felt that it would be useful to add to this post in case anyone wants an easy to implement PHP function that directly answers the original question.

/**
 * Strip all the html comments from $text
 *
 * @param $text - text to modify
 * @param string $new replacement string
 * @return array|string|string[]|null
 */
function strip_html_comments($text, $new=''){
    $search = array ("|<!--[\s\S]*?-->|si");
    $replace = array ($new);
    return preg_replace($search, $replace, $text);
}

edited Aug 4, 2022 at 8:33

answered Mar 11, 2022 at 10:26

Clinton


Comments

TomSawyer · Accepted Answer · 2012-02-25 12:04:03Z

1

This code also removes JavaScript code, which is unfortunate :|

Here's an example of JavaScript code that will be removed by these patterns:

<script type="text/javascript"><!--
    var xxx = 'a';
    //-->
    </script>
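One possible workaround, sketched here in JavaScript under stated assumptions: match whole script elements first and copy them through unchanged, so only comments outside them are stripped. The function name stripCommentsOutsideScripts and the sample input are hypothetical, and this is still a regex heuristic rather than a parser. For example, it can be fooled by a literal </script> inside a string in the script.

```javascript
// Hypothetical helper: strip comments, but copy <script>...</script> blocks through untouched.
function stripCommentsOutsideScripts(html) {
  // Alternative 1 captures a whole script element; alternative 2 matches a comment.
  const pattern = /(<script\b[\s\S]*?<\/script>)|<!--[\s\S]*?-->/gi;
  // If group 1 matched, keep the script element; otherwise drop the comment.
  return html.replace(pattern, (match, script) =>
    script !== undefined ? script : ""
  );
}

const input =
  '<p>a</p><!-- note --><script type="text/javascript"><!--\n' +
  "    var xxx = 'a';\n" +
  "    //-->\n" +
  "    </script>";

// The naive pattern removes the script body together with the comment markers.
const naive = input.replace(/<!--[\s\S]*?-->/g, "");
const output = stripCommentsOutsideScripts(input);

console.log(naive.includes("var xxx"));  // false
console.log(output.includes("var xxx")); // true
```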

answered Feb 25, 2012 at 12:04

TomSawyer


3 Comments

totallyNotLizards Over a year ago

but... why would you wrap JS in html comment tags?

lededje Over a year ago

Very, very, very, very old browsers don't run javascript and instead print it on the page. The comment tags prevent the script from being printed.

Abdullah Over a year ago

you can use this /(?:[^>]\s|^)/g jsfiddle.net/3vkrrt8e

TurkiM · Accepted Answer · 2014-07-11 10:07:47Z

1

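// Remove HTML comments but keep IE conditional comments.
function remove_html_comments($html) {
   $expr = '/<!--[\s\S]*?-->/';
   $func = 'rhc';
   $html = preg_replace_callback($expr, $func, $html);
   return $html;
}

// Callback: keep the comment if it contains "[if" or "[endif", otherwise drop it.
function rhc($search) {
   list($l) = $search;
   if (mb_eregi("\[if",$l) || mb_eregi("\[endif",$l) )  {
      return $l;
   }
   return ''; // explicit empty replacement for ordinary comments
}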
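The same callback idea can be sketched in JavaScript. This is a hypothetical counterpart with made-up names and input, not a translation endorsed by the answer. Each matched comment is passed to a function that decides whether to keep it.

```javascript
// Hypothetical JavaScript counterpart of remove_html_comments()/rhc() above.
function removeHtmlComments(html) {
  return html.replace(/<!--[\s\S]*?-->/g, (comment) =>
    // Keep anything that looks like a conditional comment, drop the rest.
    /\[(?:if|endif)/i.test(comment) ? comment : ""
  );
}

const input =
  "<!--[if IE]><p>old IE</p><![endif]-->\n<!-- ordinary\ncomment --><p>kept</p>";
const output = removeHtmlComments(input);

console.log(JSON.stringify(output));
// "<!--[if IE]><p>old IE</p><![endif]-->\n<p>kept</p>"
```

Compared with encoding the exceptions in the pattern itself, the callback keeps the regex simple and puts the keep-or-drop rule in ordinary code, at the cost of also keeping any ordinary comment that happens to contain "[if".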

answered Jul 11, 2014 at 10:07

TurkiM


Comments

Kxmode · Accepted Answer · 2023-01-22 06:53:41Z

1

You can achieve this with modern JavaScript.

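function RemoveHtmlComments() {
    // Copy the live NodeList first: removing nodes while iterating it would skip siblings.
    // Note: this only covers direct children of <body>, not nested comments.
    const children = Array.from(document.body.childNodes);
    for (const child of children) {
        if (child.nodeType === Node.COMMENT_NODE) child.remove();
    }
}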

It should be safer than RegEx.

answered Jan 22, 2023 at 6:53

Kxmode


Comments

Mister X · Accepted Answer · 2017-01-29 00:15:27Z

0

// Note: these patterns strip C-style comments (/* ... */ and // ...) and whitespace,
// not <!-- ... --> HTML comments.
// Remove multiline comment
$mlcomment = '/\/\*(?!-)[\x00-\xff]*?\*\//';
$code = preg_replace($mlcomment, "", $code);
// Remove single line comment
$slcomment = '/[^:]\/\/.*/';
$code = preg_replace($slcomment, "", $code);
// Remove extra spaces
$extra_space = '/\s+/';
$code = preg_replace($extra_space, " ", $code);
// Remove spaces that can be removed
$removable_space = '/\s?([\{\};\=\(\)\/\+\*-])\s?/';
$code = preg_replace($removable_space, "\\1", $code);

answered Jan 29, 2017 at 0:15

Mister X


Comments

ThisIsWilliam · Accepted Answer · 2020-11-24 11:48:55Z

0

If you just want the text, or the text with specific tags, you can handle this with PHP's strip_tags. It also deletes HTML comments, and you can keep the HTML tags you need like this:

$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text, ['p', 'a']);

The output will be:

<p>Test paragraph.</p> <a href="#fragment">Other text</a>

I hope it helps somebody!

answered Nov 24, 2020 at 11:48

ThisIsWilliam

