Yes, another regex question. However, my implementation runs within a Grunt process, over a known set of files that contain known combinations of script tags. There is zero chance of user interference, and the target files will not change over time.

Here are the combinations that I want to catch in a single regex:

<script>*</script>
<script type="text/javascript">*</script>

EDIT: The above combo should exclude:

<script src=""></script>
<script src="" type="text/javascript"></script>
<script SRC=""></script>
<script SRC="" TYPE="text/javascript"></script>

And then I need a second regex to catch the following:

<!--[if lt IE 9]><script>*</script><![endif]-->

And finally a third regex to catch the following:

<!--[if lte IE 9]><script>*</script><![endif]-->

Please don't combine the regexes, as I need different outcomes for each.

For reference, I've worked my way through this SO Q&A: Removing all script tags from html with JS Regular Expression

But the suggestions there catch too much, and none of them provides a separate regex for the conditional IE comments, which I need to treat separately.

I have also tried grunt-dom-munger, but it produced too many undesirable outcomes, so I am trying a simpler solution: regex replacements with separate outcomes, using grunt-text-replace.

Many thanks, you clever, clever regex folk!

  • inb4 The Pony. (Please refrain; this is a sufficiently constrained case.) Commented May 25, 2016 at 0:01
  • What are you having trouble with? Pretty much each regex will be exactly what you put, just with .*? in place of each *, and backslashes to escape the brackets and forward slashes. Commented May 25, 2016 at 0:06
  • What about the combo, which must catch both but ignore anything else within the opening script tag? Sorry, I didn't add that to the question; I will do so shortly. There are also instances of <script src=""></script>, <script src="" type="text/javascript"></script>, <script SRC=""></script> and <script SRC="" TYPE="text/javascript"></script>, and these I do not want to catch. Commented May 25, 2016 at 0:17
  • I recommend breaking this up into multiple questions, one for each thing you're trying to match. Commented May 25, 2016 at 1:20
  • For the first regex, see regex101.com/r/aQ2yD1/1; it just needs an optional group. Commented May 25, 2016 at 5:10

2 Answers

First regex:

<script(?: type.*)?>.*<\/script>

Second regex:

<!--\[if lt IE 9\]><script>.*<\/script><!\[endif\]-->

Third regex:

<!--\[if lte IE 9\]><script>.*<\/script><!\[endif\]-->

Regex that matches both second and third:

<!--\[if lte? IE 9\]><script>.*<\/script><!\[endif\]-->
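
As a rough check, the three separate patterns above can be written as JavaScript literals and run against one sample of each tag. This is only a sketch: the sample strings and the `report` object are made up for illustration, and every script is assumed to sit on a single line.

```javascript
// The three patterns from this answer, as JavaScript regex literals.
// No "g" flag, so .test() keeps no state between calls.
const first = /<script(?: type.*)?>.*<\/script>/;
const second = /<!--\[if lt IE 9\]><script>.*<\/script><!\[endif\]-->/;
const third = /<!--\[if lte IE 9\]><script>.*<\/script><!\[endif\]-->/;

// Hypothetical one-line samples, one per case listed in the question.
const samples = {
  plain: '<script>var a = 1;</script>',
  typed: '<script type="text/javascript">var a = 1;</script>',
  external: '<script src="lib.js"></script>',
  ltIE9: '<!--[if lt IE 9]><script>var a = 1;</script><![endif]-->',
  lteIE9: '<!--[if lte IE 9]><script>var a = 1;</script><![endif]-->'
};

// Record which pattern matches which sample.
const report = {};
for (const name of Object.keys(samples)) {
  report[name] = {
    first: first.test(samples[name]),
    second: second.test(samples[name]),
    third: third.test(samples[name])
  };
}

console.log(report);
```

In this sketch the first pattern skips the `src` tag, but it also finds the `<script>` inside both conditional comments, so the order in which separate replacements run matters.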

4 Comments

.* has two problems. First, it can't match across newline characters. Second, it is greedy, which means it will consume everything from the first <script> tag all the way to the last </script>. A better solution is [\S\s]*?, which accepts any character and is not greedy.
@Luk Storms Although I can't get grunt to find the matches, I can see from regex101 that this deals with exactly everything I need, thanks for that.
@4castle aaand the reason I can't get Grunt to pick up my matches, is because of the difference you've pointed out between .* and [\S\s]*?
[\S\s]* matches anything: all characters, spaces, tabs, line breaks... which is useful for a multiline search, while .* matches any character, including whitespace, except line breaks, as in your example. So 4castle is right if the </script> may not be on the same line as the <script>.
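
The difference described in these comments can be seen on a small multi-line sample. This is a minimal sketch; the `html` string is invented for illustration and is not taken from the asker's files.

```javascript
// Hypothetical sample: one script spanning several lines,
// then two scripts sharing a single line.
const html = [
  '<script>',
  '  var a = 1;',
  '</script>',
  '<p>text</p>',
  '<script>var b = 2;</script><script>var c = 3;</script>'
].join('\n');

// Greedy, and "." stops at line breaks.
const greedy = /<script>.*<\/script>/g;
// Lazy, and [\S\s] matches any character including line breaks.
const lazy = /<script>[\S\s]*?<\/script>/g;

const greedyMatches = html.match(greedy) || [];
const lazyMatches = html.match(lazy) || [];

console.log(greedyMatches.length, greedyMatches);
console.log(lazyMatches.length, lazyMatches);
```

Here the greedy pattern misses the multi-line script and swallows the two same-line scripts as one match, while the lazy pattern returns each script separately.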

Here is one big regex that you can use, which uses capture groups to let you distinguish the matches from one another. I chose to create one regex because otherwise the first regex would also fire inside the second and third matches. I've formatted it Perl-style, with comments, for readability:

(<!--\[if lt(e)? IE 9\]>)?                # opening IE with capture groups
    <script(?: type="text\/javascript")?> # opening script tag
        [\S\s]*?                          # lazily match any characters
    <\/script>                            # closing script tag
(?:<!\[endif\]-->)?                       # closing IE

Regex101 Tested

If the regex matches option #1, there won't be a first or second capture group.
If it matches option #2, there will be a first but not a second capture group.
If it matches option #3, there will be a first and second capture group.

Here's how to use it:

html.replace(
    /(<!--\[if lt(e)? IE 9\]>)?<script(?: type="text\/javascript")?>[\S\s]*?<\/script>(?:<!\[endif\]-->)?/g,
    function(match, $1, $2) {
        if ($1) {
            if ($2) {
                // handle option 3
            } else {
                // handle option 2
            }
        } else {
            // handle option 1
        }
        return match; // this is what the match will be replaced by
        // returning the match unchanged leaves the original string as-is
    });
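
To see which branch fires for each kind of tag, the same callback can be run on a small sample. This is a sketch under assumptions: the `classify` helper and the `sample` string are hypothetical, added only to make the three options visible.

```javascript
// The combined regex from this answer.
const pattern = /(<!--\[if lt(e)? IE 9\]>)?<script(?: type="text\/javascript")?>[\S\s]*?<\/script>(?:<!\[endif\]-->)?/g;

// Hypothetical helper: records which option each match falls under.
function classify(html) {
  const seen = [];
  html.replace(pattern, function (match, $1, $2) {
    if ($1) {
      // $2 is "e" only for "lte", so it separates option 3 from option 2.
      seen.push($2 ? 'option 3' : 'option 2');
    } else {
      seen.push('option 1');
    }
    return match; // leave the text unchanged
  });
  return seen;
}

// Made-up sample covering every case from the question.
const sample = [
  '<script src="lib.js"></script>',
  '<script>var a = 1;</script>',
  '<script type="text/javascript">var b = 2;</script>',
  '<!--[if lt IE 9]><script>var c = 3;</script><![endif]-->',
  '<!--[if lte IE 9]><script>var d = 4;</script><![endif]-->'
].join('\n');

const result = classify(sample);
console.log(result);
```

The tag with a `src` attribute is skipped because the pattern requires `>` directly after `<script` or after the optional `type` attribute.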

JSFiddle Example

1 Comment

This was a bit outside my context, but I appreciate your thoroughness, and also I didn't know regex101 was a thing - amazing.
