I'm looking for a regex that matches strings from multiple lines that do not include certain words/characters.

In my case it is for refactoring HTML template files. I have to remove inline styles except when they contain display:none; or a $TEMPLATE_VARIABLE. For this I'm trying to use the regex search and replace function in NetBeans.

What I had first is the following:

    style="[^"(?!\$)]*"

Regex Test 1: This matches all style declarations that do not include template variables, but unfortunately it also matches those that contain display:none.

After some research I came up with the following:

    style="(?!display\s*:\s*none)[^"(?!\$)]*"

Regex Test 2: This works until something in the style declaration precedes the display:none style.
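
The failure can be reproduced outside NetBeans with a short sketch. Python's re module is used here as a stand-in (an assumption; the question only mentions NetBeans), and the name ATTEMPT_2 is purely illustrative. The lookahead is only checked at the position right after style=", so a display:none later in the value goes unnoticed.

```python
import re

# Second attempt from the question, copied unchanged. Inside the character
# class, "(?!\$)" is not a lookahead: it just adds ( ? ! $ ) to the excluded set.
ATTEMPT_2 = re.compile(r'style="(?!display\s*:\s*none)[^"(?!\$)]*"')

lines = [
    '<div style="display: none;">',
    '<span style="color: green; display: none;">Test</span>',
]
for line in lines:
    match = ATTEMPT_2.search(line)
    print(match.group(0) if match else None)
# prints: None
# prints: style="color: green; display: none;"
```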

Trying different approaches with negative lookbehinds and lookaheads did not result in success. For example:

    style="(?!.*(\$|display)).*"

Regex Test 3: This seemed to work at first glance but has two problems. First, other HTML element attributes that follow a style definition are matched together with the style definition. Second, if a template variable is used somewhere after the style definition, that style is not matched at all.
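
Both problems show up in a small sketch, again assuming Python's re module as a stand-in for the NetBeans search. ATTEMPT_3 and the second sample line are hypothetical names and data chosen for illustration.

```python
import re

# Third attempt from the question, copied unchanged.
ATTEMPT_3 = re.compile(r'style="(?!.*(\$|display)).*"')

# Problem 1: the greedy .* runs to the last quote on the line,
# so the following method attribute is swallowed as well.
form_line = '<form style="border: 1px solid black" method="POST">'
print(ATTEMPT_3.search(form_line).group(0))
# prints: style="border: 1px solid black" method="POST"

# Problem 2: the lookahead scans the rest of the line, so a "$" in the
# element text (hypothetical example line) blocks the match entirely.
text_line = '<span style="color: green">Test $SOME_VARIABLE</span>'
print(ATTEMPT_3.search(text_line))
# prints: None
```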

Does anyone have an idea how the regex has to look so that it turns this

    <span style="border: 1px solid red">Test</span>
    <form style="border: 1px solid black" method="POST">
        <span style="color:red; $TEMPLATE_VARIABLE"><span style="background-color:blue;" >Test</span>Test</span>
        <div style="display: none;">
            <span style="color: green; display: none;">Test</span>
            <span style="display: inline-block">Test $NOT_STYLING_TEMPLATE_VARIABLE</span>
        </div>
    </form>

into this?

    <span>Test</span>
    <form method="POST">
        <span style="color:red; $TEMPLATE_VARIABLE"><span>Test</span>Test</span>
        <div style="display: none;">
            <span style="color: green; display: none;">Test</span>
            <span>Test $NOT_STYLING_TEMPLATE_VARIABLE</span>
        </div>
    </form>

The remaining stylings where display:none or template variables are used will be cleaned by hand.

Thanks in advance!

1 Answer

Brief

You shouldn't be using regex to parse HTML, but since you asked for a regex and haven't mentioned any other language, I'll answer with a regex anyway.

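Also, I'd suggest changing \$ in the regex to \$\w+. A bare $ can legitimately appear in CSS (a[href$=".pdf"] is a valid selector, for example), and although I'm not sure how something like that would end up inside a style attribute, requiring word characters after the $ is a cheap preventative measure: only something that looks like a variable name will protect a style from removal.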
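
A rough comparison of the two variants, sketched in Python (an assumed stand-in for NetBeans). The names LOOSE and STRICT and the sample markup are hypothetical and only serve to show the difference.

```python
import re

# Pattern from this answer (bare \$) and the suggested \$\w+ variant.
LOOSE = re.compile(r'\s*style="(?![^"]*(\$|display:\s*none))[^"]*"(?:\s*(?=>))?')
STRICT = re.compile(r'\s*style="(?![^"]*(\$\w+|display:\s*none))[^"]*"(?:\s*(?=>))?')

# Hypothetical markup: the first style holds a lone "$" that is not a
# template variable, the second holds a real-looking variable.
sample = '''<p style="content: '$';">a</p><p style="color: $COLOR">b</p>'''

print(LOOSE.sub('', sample))
# prints: <p style="content: '$';">a</p><p style="color: $COLOR">b</p>
print(STRICT.sub('', sample))
# prints: <p>a</p><p style="color: $COLOR">b</p>
```

With the bare \$ both styles are left alone; with \$\w+ only the one that contains something shaped like a variable name is kept.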

P.S. Your regex was very close. In regex, . matches any character. I've changed it to [^"] because the . was also matching ", which let both the lookahead and the match run past the end of the attribute value.


Code

    \s*style="(?![^"]*(\$|display:\s*none))[^"]*"(?:\s*(?=>))?

Results

Input

    <span style="border: 1px solid red">Test</span>
    <form style="border: 1px solid black" method="POST">
        <span style="color:red; $TEMPLATE_VARIABLE"><span style="background-color:blue;" >Test</span>Test</span>
        <div style="display: none;">
            <span style="color: green; display: none;">Test</span>
            <span style="display: inline-block">Test $NOT_STYLING_TEMPLATE_VARIABLE</span>
        </div>
    </form>

Output

    <span>Test</span>
    <form method="POST">
        <span style="color:red; $TEMPLATE_VARIABLE"><span>Test</span>Test</span>
        <div style="display: none;">
            <span style="color: green; display: none;">Test</span>
            <span>Test $NOT_STYLING_TEMPLATE_VARIABLE</span>
        </div>
    </form>

Explanation

  • \s* Match any whitespace character any number of times
  • style=" Match this string literally
  • (?![^"]*(\$|display:\s*none)) Negative lookahead ensuring that what follows does not match the following
    • [^"]* Match any character except " any number of times
    • (\$|display:\s*none) Match either of the following
      • \$ Match $ literally
      • display:\s*none Match display: literally, followed by any number of whitespace characters, followed by none literally
  • [^"]* Match any character except " any number of times
  • " Match " literally
  • (?:\s*(?=>))? Potentially match any following whitespace characters if the positive lookahead is true (if the following character is >) - This removes extra whitespace when it's not followed by any other attributes
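
To check the pattern outside NetBeans, here is a minimal sketch that runs it over the sample input with an empty replacement. It assumes Python's re module as a stand-in for the editor's search and replace; PATTERN, html and cleaned are illustrative names.

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r'\s*style="(?![^"]*(\$|display:\s*none))[^"]*"(?:\s*(?=>))?')

html = '''<span style="border: 1px solid red">Test</span>
<form style="border: 1px solid black" method="POST">
    <span style="color:red; $TEMPLATE_VARIABLE"><span style="background-color:blue;" >Test</span>Test</span>
    <div style="display: none;">
        <span style="color: green; display: none;">Test</span>
        <span style="display: inline-block">Test $NOT_STYLING_TEMPLATE_VARIABLE</span>
    </div>
</form>'''

# Replace every match with the empty string, as the editor would.
cleaned = PATTERN.sub('', html)
print(cleaned)
```

One thing to keep in mind: display:\s*none does not allow whitespace before the colon, so a value written as display : none would not be protected. The display\s*:\s*none form from the question covers that case.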

1 Comment

Ah, replacing both . with [^"] would have done the trick. Thank you very much! And the last part for the remaining whitespace is pretty nice as well.
