
I need to find a string that contains "script", with any number of characters before or after it, enclosed in < and >. I can do this with: <*script.*>

I also want to match only when that string is NOT followed by a <. The closest I've come so far is this: (<*script.*>)([^=?<*]*)$

However, that will fail for something like <script></script> because the last > isn't followed by a < (so it doesn't match).

How can I check whether only the first > is followed by a < or not?

For example, <script> abc () ; </script> MATCH

<< ScriPT >abc (”XXX”);//<</ ScriPT > MATCH

<script></script> DON'T MATCH

And, a case that I still am working on: <script/script> DON'T MATCH

Thanks!

  • Are you trying to parse HTML using regex? That isn't a great idea. Commented Jun 1, 2017 at 14:48
  • No, I'm parsing console output. Commented Jun 1, 2017 at 14:50
  • Maybe you can give an example or two of short sample source strings and what you would like in the "match". It's hard to understand what you want in your match from the description above. Commented Jun 1, 2017 at 14:53
  • Sure, thanks SactoJosh. Commented Jun 1, 2017 at 14:53

3 Answers


You were close with your regex. You just needed to make the .* before the first > non-greedy by adding a ? after that second *. Try this out:

(?i)<*\s*script.*?>[^<]+<*[^>]+>

There is an app called Expresso that really helps with designing Regex strings. Give it a shot.

Explanation: Without the ? (the non-greedy modifier), the second *, the one before the first >, makes the match run all the way to the end of the string and then settle on the last > it can find. By that point there was nothing meaningful left for the rest of your pattern to check.
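To make the greedy versus non-greedy difference concrete, here is a minimal sketch using Python's re module (the thread itself doesn't fix a regex flavor, so Python is an assumption here). The variable names text, greedy, and lazy are just for illustration.

```python
import re

text = "<script> abc () ; </script>"

# Greedy: .* runs to the end of the string, then backs up to the LAST '>'.
greedy = re.search(r"<*script.*>", text)

# Non-greedy: .*? stops at the FIRST '>' it can reach.
lazy = re.search(r"<*script.*?>", text)

print(greedy.group(0))  # <script> abc () ; </script>
print(lazy.group(0))    # <script>
```

With the greedy version the whole line is consumed by the first group, which is why the "not followed by <" check never applied to the first >.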

EDIT: Added (?i) at the beginning for case-insensitivity. If you want a JavaScript-specific case-insensitive regex, you would write it like this:

/<*\s*script.*?>[^<]+<*[^>]+>/i

I noticed you have parentheses in your regex to make groups, but you didn't specifically say you were trying to capture groups. Do you want to capture what's between the <script> and </script>? If so, that would be:

/<*\s*script.*?>([^<]+)<*[^>]+>/i
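As a rough check, the capturing pattern can be run against the four sample strings from the question. This is only a sketch in Python, with re.IGNORECASE standing in for the /i flag; the names pattern, samples, and results are illustrative and not part of the original pattern.

```python
import re

# Same pattern as the capturing version; re.IGNORECASE replaces the /i flag.
pattern = re.compile(r"<*\s*script.*?>([^<]+)<*[^>]+>", re.IGNORECASE)

# The four sample strings from the question.
samples = [
    "<script> abc () ; </script>",
    "<< ScriPT >abc (”XXX”);//<</ ScriPT >",
    "<script></script>",
    "<script/script>",
]

# Map each sample to its captured text, or None when nothing matches.
results = {}
for sample in samples:
    m = pattern.search(sample)
    results[sample] = m.group(1) if m else None
    print(repr(sample), "->", repr(results[sample]))
```

The first two samples should produce a captured group, and the last two should come back as None, which lines up with the MATCH / DON'T MATCH list in the question.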


1 Comment

Thank you SactoJosh! I appreciate your time! I've been using regex101.com to do quick testing.

If I understand what you are looking for, give this a try:

regex = r"<\s*script\s*>([^<]+)<"

Here is an example in Python:

import re

# Sample inputs: the first should match, the second should not
textlist = ["<script>show this</script>", "<script></script>"]

# Raw string so \s reaches the regex engine unchanged
regex = r"<\s*script\s*>([^<]+)<"

for text in textlist:
    thematch = re.search(regex, text, re.IGNORECASE)
    if thematch:
        print("match found:")
        print(thematch.group(1))
    else:
        print("no match sir!")

Explanation: start with <, then optional whitespace, the word script, optional whitespace, and a >; then capture everything that is not a < (at least one character) and make sure that is followed by a <.

Hope that helps!

1 Comment

much appreciated sniperd! This works nicely as well!

This would be better solved by using the substring() and/or indexOf() JavaScript methods.
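For illustration only, here is one way that idea might look, sketched in Python with str.find and slicing as stand-ins for indexOf() and substring(). The function name script_tag_has_body and the exact rules it applies are assumptions made for this sketch, not something specified in the thread.

```python
def script_tag_has_body(text):
    """Hypothetical helper: True when the first 'script' tag is followed
    by something other than '<' right after its closing '>'."""
    lowered = text.lower()  # case-insensitive search for the keyword

    start = lowered.find("script")
    if start == -1:
        return False

    # Require a '<' before the keyword, allowing whitespace in between.
    if not lowered[:start].rstrip().endswith("<"):
        return False

    # Locate the first '>' after the keyword.
    close = lowered.find(">", start)
    if close == -1:
        return False

    # Whatever follows the first '>' must exist and must not start with '<'.
    rest = text[close + 1:]
    return rest != "" and not rest.startswith("<")


for sample in ["<script> abc () ; </script>", "<script></script>", "<script/script>"]:
    print(repr(sample), "->", script_tag_has_body(sample))
```

This trades the compactness of a regex for explicit steps, which can be easier to adjust when a new edge case shows up.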
