JavaScript regex for known script tag combinations

Question

Another regex question, yes. However, my implementation runs within a Grunt process that iterates over a known set of files, and those files contain known combinations of script tags. There is zero chance of user interference, and the target files will not change over time.

Here are the combinations that I want to catch in a single regex:

<script>*</script>
<script type="text/javascript">*</script>

EDIT: The above combo should exclude:

<script src=""></script>
<script src="" type="text/javascript"></script>
<script SRC=""></script>
<script SRC="" TYPE="text/javascript"></script>

And then I need a second regex to catch the following:

<!--[if lt IE 9]><script>*</script><![endif]-->

And finally a third regex to catch the following:

<!--[if lte IE 9]><script>*</script><![endif]-->

Please don't combine the regexes, as I need different outcomes for each.

For reference, I've worked my way through this SO Q&A: Removing all script tags from html with JS Regular Expression

But the suggestions there catch too much, and none of them provide a separate regex for the conditional IE comments, which I need to treat separately.

I have also tried grunt-dom-munger, but it produced too many undesirable outcomes, so I am trying a simpler solution: regex replacements with separate outcomes, within grunt-text-replace.

Many thanks, you clever, clever regex folk!

inb4 The Pony. (Please refrain; this is a sufficiently constrained case.)
– Andrew Cheong, Commented May 25, 2016 at 0:01
What are you having trouble with? Pretty much each regex will be exactly what you put, just with .*? at each *, and some backslashes for escaping brackets and forward slashes.
– castletheperson, Commented May 25, 2016 at 0:06
What about the combo that must catch both, but ignore anything else within the opening script tag? Sorry, I didn't add that to the question (I will do shortly), but there are also instances of <script src=""></script>, <script src="" type="text/javascript"></script>, <script SRC=""></script> and <script SRC="" TYPE="text/javascript"></script>, and these I do not want to catch.
– danjah, Commented May 25, 2016 at 0:17
I recommend breaking this up into multiple questions, one question for each thing you're trying to match.
– Ro Yo Mi, Commented May 25, 2016 at 1:20
The first regex: regex101.com/r/aQ2yD1/1. Just use an optional group.
– Jan, Commented May 25, 2016 at 5:10

castletheperson · Accepted Answer · 2016-05-25 12:28:04Z

1

First regex:

<script(?: type.*)?>.*<\/script>

Second regex:

<!--\[if lt IE 9\]><script>.*<\/script><!\[endif\]-->

Third regex:

<!--\[if lte IE 9\]><script>.*<\/script><!\[endif\]-->

Regex that matches both second and third:

<!--\[if lte? IE 9\]><script>.*<\/script><!\[endif\]-->
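
As a rough check, the patterns above can be run against one-line samples of each tag combination from the question. This is only a sketch: the sample strings and the `matchedBy` helper are made up for illustration, and every sample sits on a single line.

```javascript
// Patterns copied from the answer above; the sample strings are invented.
var first = /<script(?: type.*)?>.*<\/script>/;
var second = /<!--\[if lt IE 9\]><script>.*<\/script><!\[endif\]-->/;
var third = /<!--\[if lte IE 9\]><script>.*<\/script><!\[endif\]-->/;

var samples = {
    plain: '<script>var a = 1;</script>',
    typed: '<script type="text/javascript">var a = 1;</script>',
    src: '<script src="app.js"></script>',
    srcUpper: '<script SRC="app.js" TYPE="text/javascript"></script>',
    lt: '<!--[if lt IE 9]><script>shim();</script><![endif]-->',
    lte: '<!--[if lte IE 9]><script>shim();</script><![endif]-->'
};

// Hypothetical helper: names of the samples that a regex matches.
function matchedBy(regex) {
    return Object.keys(samples).filter(function (name) {
        return regex.test(samples[name]);
    });
}

console.log(matchedBy(first));  // [ 'plain', 'typed', 'lt', 'lte' ]
console.log(matchedBy(second)); // [ 'lt' ]
console.log(matchedBy(third));  // [ 'lte' ]
```

Note that the first pattern skips the src variants but still matches the script inside both conditional comments, so the order in which the replacements run matters.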

edited May 25, 2016 at 12:28

castletheperson

answered May 25, 2016 at 9:39

LukStorms

4 Comments

castletheperson Over a year ago

.* has two problems. First, it can't match across newline characters. Second, it is greedy, which means it will consume everything from the first <script> tag all the way to the last script's </script>. A better solution is to use [\S\s]*?, which accepts any character and is not greedy.

danjah Over a year ago

@Luk Storms Although I can't get grunt to find the matches, I can see from regex101 that this deals with exactly everything I need, thanks for that.

danjah Over a year ago

@4castle aaand the reason I can't get Grunt to pick up my matches, is because of the difference you've pointed out between .* and [\S\s]*?

LukStorms Over a year ago

[\S\s]* matches anything: all characters, spaces, tabs, line breaks... which is useful when you do a multiline search, while .* matches any characters and whitespace except line breaks, as in your example. So 4castle is right if the </script> may not be on the same line as the <script>.
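
The difference discussed in these comments can be seen in a small sketch. The markup below is invented for illustration: two inline scripts share the first line, and a third script spans several lines.

```javascript
// Sample markup invented for illustration.
var html = '<script>a();</script><p>kept</p><script>b();</script>\n' +
    '<script>\nc();\n</script>';

var greedyDot = /<script>.*<\/script>/g;     // dot stops at line breaks and is greedy
var lazyAny = /<script>[\S\s]*?<\/script>/g; // any character, as few as possible

var greedyMatches = html.match(greedyDot);
var lazyMatches = html.match(lazyAny);

// One match: it runs from the first <script> to the last </script> on line one
// (swallowing the <p> in between) and misses the multi-line script entirely.
console.log(greedyMatches.length); // 1

// Three matches: each script block on its own, including the multi-line one.
console.log(lazyMatches.length); // 3
```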

castletheperson · Accepted Answer · 2016-05-25 22:37:52Z

1

Here is one big regex that you can use, which uses capture groups to let you distinguish the matches from one another. I chose to create a single regex because otherwise the first regex would also fire inside the second and third matches. I've formatted it Perl-style (spaced out, with comments) for readability:

(<!--\[if lt(e)? IE 9\]>)?                # opening IE with capture groups
    <script(?: type="text\/javascript")?> # opening script tag
        [\S\s]*?                          # lazily match any characters
    <\/script>                            # closing script tag
(?:<!\[endif\]-->)?                       # closing IE

Regex101 Tested

If the regex matches option #1, neither the first nor the second capture group is set (both are undefined).
If it matches option #2, the first capture group is set but the second is not.
If it matches option #3, both the first and second capture groups are set.

Here's how to use it:

html = html.replace(
    /(<!--\[if lt(e)? IE 9\]>)?<script(?: type="text\/javascript")?>[\S\s]*?<\/script>(?:<!\[endif\]-->)?/g,
    function(match, $1, $2) {
        if ($1) {
            if ($2) {
                // handle option 3 (lte IE 9 conditional comment)
            } else {
                // handle option 2 (lt IE 9 conditional comment)
            }
        } else {
            // handle option 1 (plain script tag)
        }
        return match; // this is what the match will be replaced by;
        // returning the match itself leaves the original string unchanged
    });

JSFiddle Example
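
For a version that runs outside the fiddle, here is a minimal sketch using the same regex. The sample markup and the outcome chosen for each option (drop option 1, keep the conditional blocks) are assumptions for illustration only, not what the asker's build actually needs.

```javascript
var pattern = /(<!--\[if lt(e)? IE 9\]>)?<script(?: type="text\/javascript")?>[\S\s]*?<\/script>(?:<!\[endif\]-->)?/g;

// Sample markup invented for illustration.
var html = [
    '<script>one();</script>',
    '<script type="text/javascript">\ntwo();\n</script>',
    '<script src="app.js"></script>',
    '<!--[if lt IE 9]><script>three();</script><![endif]-->',
    '<!--[if lte IE 9]><script>four();</script><![endif]-->'
].join('\n');

var seen = [];
var output = html.replace(pattern, function (match, $1, $2) {
    // $1 is set only inside a conditional comment; $2 only for "lte".
    var option = $1 ? ($2 ? 3 : 2) : 1;
    seen.push(option);
    // Hypothetical outcomes: remove option 1, leave options 2 and 3 untouched.
    return option === 1 ? '' : match;
});

console.log(seen); // [ 1, 1, 2, 3 ]
console.log(output);
```

The `<script src="app.js">` tag is never passed to the callback, because the pattern only allows an optional `type="text/javascript"` attribute directly after `<script`.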

edited May 25, 2016 at 22:37

answered May 25, 2016 at 13:23

castletheperson

1 Comment

danjah Over a year ago

This was a bit outside my context, but I appreciate your thoroughness, and also I didn't know regex101 was a thing - amazing.
