I have a regex pattern ^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}. It validates in Java and in the browser console but fails in Python. Various online regex testers produce different results: regex101 displays an error for this pattern in its JavaScript flavor, yet the pattern compiles without error in JavaScript code using the new RegExp() constructor, and other regex testers show no error at all. Why does the same regex pattern behave so differently?

I intend to collect user input as a regex, validate it on the frontend, and then send it in a request to the back end. I validate the regex with the code below in JS:

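isValidRegex(regex) {
  try {
    // Strip a leading "/" and a trailing "/flags" so only the pattern body remains.
    // Note: the flags are discarded, so the pattern is compiled without them.
    regex = regex.replace(/^\/|\/[gimuy]*$/g, '')
    new RegExp(regex) // throws a SyntaxError if the pattern is invalid
    return true
  } catch (e) {
    console.log(e)
    return false
  }
},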

I tried it in the browser console, a JS script, a Java compiler and a Python compiler. I understand that different languages compile regexes with different engines, but I am confused by the JS behaviour. On the regex101 website, the pattern throws this error for JS:

  • You cannot create a range with shorthand escape sequences

Please suggest a good method to validate user input as a valid regex. This pattern made me doubt whether I am using the correct method to validate a regex.

  • There are many different regexp implementations, they each operate differently. Commented Jul 3, 2024 at 17:36
  • Keep in mind that even a "valid" pattern is able to crash your backend. Commented Jul 4, 2024 at 4:37

1 Answer

TL;DR

Historically, [\w-\.] is an issue.

As written, it basically translates to "a range from any word char to a period", so the JS interpreter needs to be intelligent enough to realize that you are being a silly goose and meant to say "a word char, a hyphen, or a period."


The issue with ^[\w-\.]+@([\w-]+\.)+[\w-]{2,4} specifically is that you are declaring an invalid range from the \w meta-escape to a period.
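
A minimal sketch of how JavaScript itself treats that class, assuming a runtime with Unicode-mode regexes (any current browser or Node.js). The helper name compiles is made up for illustration. Without flags the engine falls back to reading the hyphen as a literal, while the u flag makes it reject the class:

```javascript
// Hypothetical helper: reports whether a pattern compiles with the given flags.
function compiles(pattern, flags = '') {
  try {
    new RegExp(pattern, flags)
    return true
  } catch (e) {
    return false // SyntaxError for an invalid pattern or flag
  }
}

const pattern = String.raw`^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}`

console.log(compiles(pattern))      // true: legacy (non-Unicode) mode is lenient
console.log(compiles(pattern, 'u')) // false: Unicode mode rejects the \w-\. range
```

This is probably why a bare new RegExp() call and a stricter checker can disagree about the same pattern.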

I'd say you are lucky that JS code interprets it without a problem. The rule of thumb for literal hyphens inside a character class is to either escape them, as in [\w\-\.], or place them at the end to avoid ambiguity, as in [\w\.-].
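
Both spellings can be checked quickly. The sketch below assumes Unicode mode (the u flag) as the strict baseline and uses a made-up sample address. It only shows that the two rewritten classes compile and match the same input, not that the pattern is a good email validator:

```javascript
// Two unambiguous spellings of "word char, hyphen, or period".
const escaped = new RegExp(String.raw`^[\w\-\.]+@([\w-]+\.)+[\w-]{2,4}`, 'u')
const hyphenLast = new RegExp(String.raw`^[\w\.-]+@([\w-]+\.)+[\w-]{2,4}`, 'u')

const sample = 'first.last-name@mail.example.com' // made-up test input

console.log(escaped.test(sample))    // true
console.log(hyphenLast.test(sample)) // true
```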

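Presumably, regex101's pre-flight error checker is flagging [\w-\.] as a common mistake. Whether the checker should accept the sequence is debatable: JavaScript engines tolerate it only in patterns compiled without the u flag, as a leniency kept for web compatibility, and reject it with a SyntaxError when the u flag is set.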
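
On the validation question itself, one option is to keep the flags the user typed instead of stripping them, and to compile in Unicode mode so that legacy leniencies are caught early. The following is a sketch under those assumptions, not a definitive implementation: the function name validateRegexInput and the choice to force the u flag are mine. A pattern that compiles in JS can still be rejected, or behave differently, in a Python or Java back end, so the back end should compile it again with its own engine.

```javascript
// Hypothetical validator: accepts "/pattern/flags" or a bare pattern string.
function validateRegexInput(input) {
  // Split "/body/flags" into its parts; anything else is treated as a bare pattern.
  const literal = /^\/(.*)\/([a-z]*)$/s.exec(input)
  const source = literal ? literal[1] : input
  let flags = literal ? literal[2] : ''

  // Assumption: force Unicode mode for the strictest syntax check.
  // The v flag cannot be combined with u, so leave the flags alone if v is present.
  if (!flags.includes('u') && !flags.includes('v')) flags += 'u'

  try {
    new RegExp(source, flags) // throws SyntaxError on a bad pattern or bad flags
    return { valid: true, source, flags }
  } catch (e) {
    return { valid: false, error: e.message }
  }
}

console.log(validateRegexInput(String.raw`/^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}/i`).valid) // false
console.log(validateRegexInput(String.raw`/^[\w.-]+@([\w-]+\.)+[\w-]{2,4}/i`).valid)  // true
```

Forcing the u flag is a trade-off: it also rejects other legacy-only syntax, such as unnecessary escapes like \@, so it may be stricter than some users expect.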
