
I want to block all characters that could be used to form a script, such as #$%^&*<>~\[]{}@.,?|/

I cannot use /^[a-zA-Z]([\w -]*[a-zA-Z])?$/i.test(value) because my application supports Spanish, which includes letters such as ę, Æ and so on.

How can I achieve this with a regex? Can anyone help me here? I am new to regex.

I want to block the special characters specified above, that is, characters that could potentially form a script. The purpose is to restrict user input.

  • 1
    Sorry, what is the problem? Do you want to match a string that is fully composed of non-special chars? Commented Nov 20, 2020 at 14:33
  • 1
    “Block” them for what purpose? And what do single characters have to do with “script”? Commented Nov 20, 2020 at 14:33
  • 1
    I want to block the special characters specified above, that is, characters that could potentially form a script. The purpose is to restrict user input. @CBroe Commented Nov 20, 2020 at 14:35
  • 3
    Yes, but what is the context? While this might make sense for a specific value such as a user name, it makes little sense if we are talking about any free-form text input. You say you are worried about characters from the Spanish language, but then you want to block simple punctuation characters such as the dot or comma, so entering actual natural-language text with multiple sentences would already be impossible in English. So, what is the context? Commented Nov 20, 2020 at 14:38
  • 2
    So just use a negated character class containing all those “bad” ones then? Commented Nov 20, 2020 at 14:42
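The negated character class suggested in the last comment could look roughly like the sketch below. It assumes the blocked set is exactly the characters listed in the question; the name `noBlockedChars` is hypothetical and used only for illustration.

```javascript
// Negated character class: the string may contain anything EXCEPT the
// characters listed in the question (# $ % ^ & * < > ~ \ [ ] { } @ . , ? | /).
// `noBlockedChars` is a hypothetical name chosen for this sketch.
const noBlockedChars = /^[^#$%^&*<>~\\\[\]{}@.,?|\/]*$/;

console.log(noBlockedChars.test("Ángel Núñez")); // true  (accented letters are not in the blocked set)
console.log(noBlockedChars.test("a<b"));         // false ("<" is blocked)
console.log(noBlockedChars.test("hello."));      // false ("." is blocked)
```

Because the class only lists what is forbidden, every other character, including letters from any script, digits and whitespace, is accepted.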

1 Answer


The /^[a-zA-Z]([\w -]*[a-zA-Z])?$/i regex only matches ASCII characters.

If you want it to work with Spanish, you need to make it Unicode-aware.

Bearing in mind that a Unicode aware \w can be represented with [\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}] (see What's the correct regex range for javascript's regexes to match all the non word characters in any script?) and the Unicode letter pattern is \p{L}, the direct Unicode equivalent of your pattern is

/^\p{L}(?:[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}\s-]*\p{L})?$/iu.test(value)

I also replaced the regular space with \s to match any kind of Unicode whitespace.

Details

  • ^ - start of string
  • \p{L} - any Unicode letter
  • (?:[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}\s-]*\p{L})? - an optional sequence of zero or more Unicode word chars (letters, diacritics, numbers, connector punctuation (like _), join control chars), whitespace chars or hyphens, followed by a single Unicode letter
  • $ - end of string.
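
As a quick check of the pattern described above, here is a minimal sketch that runs it against a few sample inputs. The helper name `isValidName` and the sample strings are assumptions made for illustration; the regex itself is the one from this answer.

```javascript
// Pattern from the answer; the `u` flag is required for \p{...} property escapes.
const namePattern = /^\p{L}(?:[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}\s-]*\p{L})?$/iu;

// Hypothetical helper, named only for this example.
function isValidName(value) {
  return namePattern.test(value);
}

console.log(isValidName("Ángel Núñez"));  // true  (letters with diacritics and a space)
console.log(isValidName("name_1x"));      // true  (underscore and digit allowed in the middle)
console.log(isValidName("a<b"));          // false ("<" is not an allowed char)
console.log(isValidName("user@example")); // false ("@" is not an allowed char)
console.log(isValidName("example "));     // false (must end with a letter)
console.log(isValidName("9lives"));       // false (must start with a letter)
```

Note that the first and last characters must both be letters, so leading digits and trailing spaces are rejected by this version of the pattern.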

7 Comments

What if I want to allow a space after the input value? If I enter "example" it works, but if I enter "example " (with a trailing space) it shows an error. I want to allow a space after "example" like this. Other than that, it works great!
@ZeusCarl Do you mean you have this where example has trailing spaces and you want to allow them? Add \s* at the end. See this regex demo where I am using a regular space instead of \s since the demo is run against a single multiline text.
Thanks, This is what I wanted.
@ZeusCarl Then change \p{L} into [\p{L}\p{N}], see this regex demo. If you have a list of specific requirements, please share. Fixing example after example is too time consuming.
So, all allowed chars are allowed everywhere and in any succession? Just use /^(?:[^\p{P}\p{S}]|[_-])*$/u (see demo). Or, just use /^\p{L}[\p{L}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation} -]*$/u, see this demo (where the first char must be a letter, and the rest is any zero or more of allowed chars).
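
The two simpler alternatives from the last comment can be compared side by side. This is only a sketch: the constant names and sample inputs are assumptions for illustration, while the patterns are copied from the comment.

```javascript
// Variant 1: any chars except Unicode punctuation (\p{P}) and symbols (\p{S}),
// with "_" and "-" explicitly allowed back in.
const noPunctOrSymbols = /^(?:[^\p{P}\p{S}]|[_-])*$/u;

// Variant 2: the first char must be a letter; the rest is zero or more
// letters, marks, decimal digits, connector punctuation, spaces or hyphens.
const letterFirst = /^\p{L}[\p{L}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation} -]*$/u;

for (const s of ["Ángel Núñez", "example ", "a_b-c", "9lives", "a<b", "$5"]) {
  console.log(JSON.stringify(s), noPunctOrSymbols.test(s), letterFirst.test(s));
}
```

With these inputs, both variants accept the trailing space in "example " and reject "a<b" and "$5"; only the first variant accepts "9lives", because it places no restriction on the first character. The first variant also accepts an empty string.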