I want to capture HTML comments, with the exception of one specific comment, i.e.,

 <!-- end-readmore-item --> 

At the moment, I can successfully capture all HTML comments using the regex below:

(?=<!--)([\s\S]*?)-->

To ignore the specified comment, I have tried lookahead and lookbehind assertions, but being new to advanced regex, I am probably missing something.

So far, I have been able to devise the following regex using lookarounds:

^((?!<!-- end-readmore-item -->).)*$

I expect it to ignore the end-readmore-item comment and capture only other comments, such as:

<!-- Testing-->

It does skip that comment, but it also captures the regular HTML tags, which I want ignored as well.

I have been using the following HTML code as a test case:

<div class="collapsible-item-body" data-defaulttext="Further text">Further 
text</div>
<!-- end-readmore-item --></div>
</div>
&nbsp;<!-- -->

It should match only <!-- -->, but it selects everything except <!-- end-readmore-item -->.
The goal is to remove all HTML comments except <!-- end-readmore-item -->.
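
For anyone reproducing this, here is a minimal sketch of what the second regex does on the test case above. Python and the multiline flag are assumptions on my part (the question does not name a language or flags), and the names HTML, LINE_PATTERN and lines_matched are made up for illustration. Because the pattern is anchored with ^ and $, it matches whole lines that do not contain the marker comment, rather than individual comments.

```python
import re

# The test case from the question, without a trailing newline.
HTML = (
    '<div class="collapsible-item-body" data-defaulttext="Further text">Further \n'
    'text</div>\n'
    '<!-- end-readmore-item --></div>\n'
    '</div>\n'
    '&nbsp;<!-- -->'
)

# The question's pattern; MULTILINE is assumed so ^ and $ work per line.
LINE_PATTERN = re.compile(r'^((?!<!-- end-readmore-item -->).)*$', re.MULTILINE)


def lines_matched(text):
    """Return every full match of LINE_PATTERN in text."""
    return [m.group(0) for m in LINE_PATTERN.finditer(text)]


if __name__ == "__main__":
    for line in lines_matched(HTML):
        print(repr(line))
```

Every line except the one holding the marker comment comes back as a match, tags included, which is the behaviour described above.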

2 Answers

You can use the following pattern:

<!--(?!\s*?end-readmore-item\s*-->)[\s\S]*?-->

Regex101 demo.

Breakdown:

<!--                    # Matches `<!--` literally.
(?!                     # Start of a negative Lookahead (not followed by).
    \s*?                # Matches zero or more whitespace characters (lazily).
    end-readmore-item   # Matches literal string.
    \s*                 # Matches zero or more whitespace characters.
    -->                 # Matches `-->` literally.
)                       # End of the negative Lookahead.
[\s\S]*?                # Matches any character zero or more times (lazy match), 
                        # including whitespace and non-whitespace characters.
-->                     # Matches `-->` literally.

Which basically means:

Match <!-- that is not followed by [whitespace* + end-readmore-item + whitespace* + -->], then match as few characters as possible up to the first -->.


* Whitespace repeated zero or more times, so it is optional.
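
Since the stated goal is to remove every comment except the marker, here is a minimal sketch of the pattern used for that. Python is an assumption (the question does not name a language), and COMMENT_EXCEPT_MARKER and strip_comments are names invented for this example.

```python
import re

# The pattern from this answer, unchanged.
COMMENT_EXCEPT_MARKER = re.compile(
    r'<!--(?!\s*?end-readmore-item\s*-->)[\s\S]*?-->'
)


def strip_comments(html):
    """Remove every HTML comment except the end-readmore-item marker."""
    return COMMENT_EXCEPT_MARKER.sub('', html)


if __name__ == "__main__":
    sample = (
        '<p>a</p><!-- Testing-->'
        '<!-- end-readmore-item --></div>&nbsp;<!-- -->'
    )
    print(strip_comments(sample))
```

Because [\s\S] also matches line breaks, comments that span several lines are removed as well, without needing a dot-all flag.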

1 Comment

This works for me! Can you please explain the expression to me as well?

You are very close with your negative lookahead assertion; you just need to modify it as follows:

<!--((?!end-readmore-item).)*?-->

Here, *? makes the repetition non-greedy (lazy).

This will match all comments except those that contain the string end-readmore-item inside the comment body.
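
A minimal sketch of how this behaves in practice, assuming Python (the question does not name a language). The names TEMPERED, ANCHORED, remove_tempered and remove_anchored are made up for the comparison, and ANCHORED is the pattern from the other answer. Two differences show up: this pattern leaves alone any comment that merely mentions end-readmore-item, and without a dot-all flag the . does not cross line breaks, so multi-line comments are not matched.

```python
import re

# This answer's pattern: the lookahead is checked before every character.
TEMPERED = re.compile(r'<!--((?!end-readmore-item).)*?-->')

# The other answer's pattern: the lookahead is checked once, after <!--.
ANCHORED = re.compile(r'<!--(?!\s*?end-readmore-item\s*-->)[\s\S]*?-->')


def remove_tempered(html):
    """Remove comments that do not contain end-readmore-item anywhere."""
    return TEMPERED.sub('', html)


def remove_anchored(html):
    """Remove comments that are not exactly the end-readmore-item marker."""
    return ANCHORED.sub('', html)


if __name__ == "__main__":
    samples = [
        '<!-- Testing--><!-- end-readmore-item -->',
        '<!-- see end-readmore-item below -->',
        '<!-- a\nb -->',
    ]
    for s in samples:
        print(repr(remove_tempered(s)), repr(remove_anchored(s)))
```

Which behaviour is preferable depends on whether comments that only mention the marker text should survive.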

1 Comment

Nice one! My only critique is that this evaluates the lookahead at every character of the comment, which is inefficient, especially for long comments.
