
In JavaScript we can match individual Unicode codepoints or codepoint ranges by using the Unicode escape sequences, e.g.:

"A".match(/\u0041/) // => ["A"]
"B".match(/[\u0041-\u007A]/) // => ["B"]

But how can we write a JavaScript regular expression that matches a proper name, which may contain any Unicode "letter"? Is there a range that covers all letters, or a special escape sequence or character class for this in JavaScript?

Say my website must validate names that could be in Latin-based languages as well as Hebrew, Cyrillic, or Japanese (Katakana, Hiragana, etc.). Is this feasible in JavaScript, or is the only sane choice to delegate to a backend language with better Unicode support?


2 Answers

5

Here's a JS plugin that adds Unicode support to regular expressions:

http://xregexp.com/plugins/


0

To look up the Unicode codepoint of a symbol, I use this site: http://www.fileformat.info.

Unicode Blocks (Basic Latin, ..., Cyrillic, ..., Arabic, and others): http://www.fileformat.info/info/unicode/block/index.htm

Unicode Character Categories (this does not work in JS): http://www.fileformat.info/info/unicode/category/index.htm

Letters (A-я): http://www.fileformat.info/info/unicode/char/a.htm

Fonts (which chars are supported in each font): http://www.fileformat.info/info/unicode/font/index.htm

Index for all of the above: http://www.fileformat.info/info/unicode/index.htm
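As a rough illustration of this block-based approach, the ranges from the block index can be written as `\uXXXX` escapes inside a character class. The sketch below is only an assumption about which ranges one might pick: Basic Latin letters, the letters of Latin-1 Supplement and Latin Extended-A, the Cyrillic block (U+0400 to U+04FF), the Hebrew letters (U+05D0 to U+05EA), Hiragana (U+3040 to U+309F) and Katakana (U+30A0 to U+30FF). The names `NAME_PATTERN` and `looksLikeName`, and the extra punctuation allowed, are hypothetical.

```javascript
// Hypothetical helper: accepts strings made only of characters from a few
// hand-picked Unicode ranges, plus spaces, apostrophes and hyphens.
// The ranges are an assumption and do not cover every letter.
var NAME_PATTERN = new RegExp(
  "^[" +
    "A-Za-z" +                                         // Basic Latin letters
    "\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u017F" +  // Latin-1 Supplement and Latin Extended-A letters
    "\\u0400-\\u04FF" +                                // Cyrillic block
    "\\u05D0-\\u05EA" +                                // Hebrew letters
    "\\u3040-\\u309F" +                                // Hiragana block
    "\\u30A0-\\u30FF" +                                // Katakana block
    " '\\-" +                                          // space, apostrophe, hyphen
  "]+$"
);

function looksLikeName(value) {
  return NAME_PATTERN.test(value);
}
```

Whole blocks such as Cyrillic, Hiragana and Katakana also contain a few non-letter characters, so a pattern like this is permissive rather than exact.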

1 Comment

You mustn’t use Unicode blocks as a proxy for Unicode scripts, which is what you really want. The Unicode Standard speaks to this matter specifically.
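To illustrate the difference, engines that implement the ES2018 Unicode property escapes (which require the `u` flag) can match by script or by general category directly. This is a sketch that assumes such an engine; older engines reject these patterns with a SyntaxError. The variable names are illustrative only.

```javascript
// Assumes an engine with ES2018 Unicode property escapes.
const cyrillicScript = /^\p{Script=Cyrillic}+$/u; // matches by script
const cyrillicBlock = /^[\u0400-\u04FF]+$/;       // matches the Cyrillic block only
const anyLetter = /^\p{L}+$/u;                    // general category "Letter"

// U+0500 is a Cyrillic-script letter that lives in the Cyrillic Supplement
// block, outside U+0400-U+04FF, so the two patterns disagree on it.
const supplementLetter = "\u0500";

const matchesScript = cyrillicScript.test(supplementLetter);
const matchesBlock = cyrillicBlock.test(supplementLetter);
```

Here the script-based pattern accepts the character while the block-based range does not, which is the kind of mismatch the comment warns about. Names may also contain combining marks, spaces or hyphens, which `\p{L}` alone does not cover.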
