Block script character using regex

Question

I want to block all characters that could be used to form a script, such as #$%^&*<>~\[]{}@.,?|/

I cannot use /^[a-zA-Z]([\w -]*[a-zA-Z])?$/i.test(value) because my application has Spanish language support, which includes letters such as ę, Æ and so on.

How can I achieve this with a regex? Can anyone help me here? I am new to regex.

I want to block the special characters listed above, i.e. characters that could potentially form a script, in order to restrict user input.

Sorry, what is the problem? Do you want to match a string that is fully composed of non-special chars?
– Wiktor Stribiżew, Commented Nov 20, 2020 at 14:33
“Block” them for what purpose? And what do single characters have to do with “script”?
– C3roe, Commented Nov 20, 2020 at 14:33
I want to block the special characters specified above, characters which can potentially form a script, in order to restrict user input. @CBroe
– Zeus Carl, Commented Nov 20, 2020 at 14:35
Yes, but what is the context? While this might make sense for a specific value such as a user name, it makes little sense if we are talking about any free-form text input. You say you are worried about characters from the Spanish language, but then you want to block simple punctuation characters such as the dot or comma, so entering actual natural-language text with multiple sentences would already be impossible in English. So, what is the context?
– C3roe, Commented Nov 20, 2020 at 14:38
So just use a negated character class containing all those “bad” ones then?
– C3roe, Commented Nov 20, 2020 at 14:42
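
For illustration, the negated character class suggested in the last comment could look roughly like the sketch below. The character list is copied from the question; the names blockedCharsOnly and isAllowed are hypothetical, not from the original posts.

```javascript
// Matches only strings that contain none of the characters listed in the question:
// #$%^&*<>~\[]{}@.,?|/
// Inside the class, the backslash, the square brackets and the slash are escaped.
const blockedCharsOnly = /^[^#$%^&*<>~\\\[\]{}@.,?|\/]*$/;

// Hypothetical helper name, used here only for the demonstration.
function isAllowed(value) {
  return blockedCharsOnly.test(value);
}

console.log(isAllowed("Ana Maria"));   // true
console.log(isAllowed("a<b"));         // false
console.log(isAllowed("user@host"));   // false
console.log(isAllowed("50%"));         // false
```

Note that this only rejects the listed characters: anything not in the list (quotes, parentheses, "!" and so on) still passes, and the empty string is accepted too.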

Wiktor Stribiżew · Accepted Answer · 2020-11-20 16:09:07Z


The /^[a-zA-Z]([\w -]*[a-zA-Z])?$/i regex only matches ASCII characters.

If you plan to make it work with the Spanish language, you need to make it Unicode-aware.

Bearing in mind that a Unicode aware \w can be represented with [\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}] (see What's the correct regex range for javascript's regexes to match all the non word characters in any script?) and the Unicode letter pattern is \p{L}, the direct Unicode equivalent of your pattern is

/^\p{L}(?:[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}\s-]*\p{L})?$/iu.test(value)

I also replaced the regular space with \s to match any kind of Unicode whitespace.

Details

^ - start of string
\p{L} - any Unicode letter
(?:[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}\s-]*\p{L})? - an optional group: zero or more Unicode word chars (letters, diacritics, numbers, connector punctuation (like _), join control chars), whitespace or hyphens, followed by a single Unicode letter
$ - end of string.
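
A quick way to try the pattern above is a small script like the following sketch. The regex is the one from this answer; the helper name isValidName and the sample inputs are made up for the demonstration. It assumes a JavaScript engine that supports Unicode property escapes (\p{...} with the u flag).

```javascript
// Unicode-aware pattern from the answer. The `u` flag is required for \p{...}.
const namePattern = /^\p{L}(?:[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}\s-]*\p{L})?$/iu;

// Hypothetical helper name, not part of the original answer.
function isValidName(value) {
  return namePattern.test(value);
}

console.log(isValidName("Ángel"));       // true
console.log(isValidName("Æther"));       // true
console.log(isValidName("Ana-Sofía B")); // true
console.log(isValidName("<script>"));    // false: must start with a letter
console.log(isValidName("user@host"));   // false: "@" is not an allowed char
console.log(isValidName("abc1"));        // false: must end with a letter
```

The first and last characters must be letters, so digits, spaces and hyphens are only accepted in the middle of the value.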



7 Comments

Zeus Carl Over a year ago

What if I want to allow a space after the input value? If I enter example it works, but if I enter example followed by a space, it shows an error. I want to allow trailing spaces after example. Other than that it works great!

Wiktor Stribiżew Over a year ago

@ZeusCarl Do you mean the case where example has trailing spaces and you want to allow them? Add \s* at the end. See this regex demo, where I am using a regular space instead of \s since the demo is run against a single multiline text.
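
Since the linked demo is not reproduced here, the following is only a sketch of what "add \s* at the end" would presumably look like when applied to the pattern from the answer. The variable name is made up.

```javascript
// Pattern from the answer with \s* inserted before the end anchor,
// so any amount of trailing whitespace is accepted.
const allowTrailingSpace = /^\p{L}(?:[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}\s-]*\p{L})?\s*$/iu;

console.log(allowTrailingSpace.test("example"));   // true
console.log(allowTrailingSpace.test("example  ")); // true
console.log(allowTrailingSpace.test("  example")); // false: leading whitespace is still rejected
console.log(allowTrailingSpace.test("example!"));  // false
```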

Zeus Carl Over a year ago

Thanks, This is what I wanted.

Wiktor Stribiżew Over a year ago

@ZeusCarl Then change \p{L} into [\p{L}\p{N}], see this regex demo. If you have a list of specific requirements, please share. Fixing example after example is too time consuming.

Wiktor Stribiżew Over a year ago

So, all allowed chars are allowed everywhere and in any succession? Just use /^(?:[^\p{P}\p{S}]|[_-])*$/u (see demo). Or, just use /^\p{L}[\p{L}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation} -]*$/u, see this demo (where the first char must be a letter, and the rest is any zero or more of allowed chars).
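
To compare the two patterns from the last comment, one could run something like the sketch below. The regexes are copied from the comment; the variable names and sample inputs are made up for illustration.

```javascript
// Option 1: any chars except punctuation (\p{P}) and symbols (\p{S}),
// with "_" and "-" explicitly allowed back in.
const noPunctOrSymbol = /^(?:[^\p{P}\p{S}]|[_-])*$/u;

// Option 2: first char must be a letter; the rest may be letters, marks,
// decimal digits, connector punctuation, regular spaces or hyphens.
const letterFirst = /^\p{L}[\p{L}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation} -]*$/u;

// Every character listed in the question is rejected by both patterns.
const listed = "#$%^&*<>~\\[]{}@.,?|/";
const allRejected = [...listed].every(
  (ch) => !noPunctOrSymbol.test("a" + ch + "b") && !letterFirst.test("a" + ch + "b")
);
console.log(allRejected);                         // true

console.log(noPunctOrSymbol.test("123 abc_d-e")); // true
console.log(letterFirst.test("123 abc_d-e"));     // false: must start with a letter
console.log(letterFirst.test("abc 123 "));        // true: a trailing space is an allowed char
console.log(noPunctOrSymbol.test(""));            // true: the empty string passes option 1
```

If empty input should be rejected with the first pattern, replacing * with + would presumably be enough.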
