Edit: I feel like a bit of an idiot now as I actually included the wrong expression in the question. The correct expression is /^([ \u00c0-\u01ffa-zA-Z'\.\-])+$/, although it still throws the same error (except the offset is 5, not 44).

I have the following regular expression that I use to validate names using JavaScript:

/^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$/i

(That was taken from a Stack Overflow answer, although I'm afraid I'm struggling to find the original question to link to.)

I use it to validate names before sending them to the server, but then obviously they need re-validating on the server because of other ways of sending the data to the server (the data is eventually entered into a MySQL database).

It works wonderfully in JavaScript, allowing me to input all sorts of names such as John Smith, Henry O'Conner, Jérémie Dent-O'Brien. However, on copying the RegExp into PHP (using the following code), it throws the error as shown here.

$nameRegEx = "[that expression from above]";
$r = $_POST["r"];
if(preg_match($nameRegEx,$r)){
    // do MySQL stuff
}else{
    trigger_error("Invalid name",E_USER_ERROR); // Obviously I won't use this in
                                                // the final script as it is
                                                // very un-user-friendly
                                                // (is that a word?)
}

Warning: preg_match() [function.preg-match]: Compilation failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 5 in /path/to/file.php on line 21

Fatal error: Invalid name in /path/to/file.php on line 48

Frankly, I know very little about RegEx and haven't got a clue what's going wrong here. A bit of research reveals that JS and PHP both base their RegEx on Perl syntax, so they're not too different, although there are some differences and this is why it's breaking. How do I convert it to work in PHP? Is there some kind of automated converter out there?

Other people have had similar issues, but because their issues are specific to their regular expressions I don't see how I can use that information in my RegEx.

Could someone point me in the right direction to convert this?

  • "...haven't got a clue what's going wrong here." Fundamentally, what's going wrong here is that PHP and JavaScript use different (though closely related) variants of regular expressions. Regular expressions are not a single, unified thing. There are lots of different syntaxes for them used in different environments. JavaScript's are most similar to Perl's, although they have differences even from those. Commented Aug 4, 2013 at 11:54
  • So how different are PHP and JS's RegEx syntaxes? How easy (or difficult) is it to convert them between the two? Commented Aug 4, 2013 at 11:56
  • preg_match treats strings as single-byte by default. Not sure how it works in PHP, but you either need to use the mbstring library (sorry, my info may be dated), or maybe there's something else to make it treat strings as multibyte. Also, I don't know whether you are trying to match some Unicode characters (are those whitespaces?), but if your PHP setup simply doesn't support Unicode, I believe you can just throw those away, as there is no chance of getting them into your program. Commented Aug 4, 2013 at 12:23

1 Answer

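The problem with your regex is that \u1234 matches the Unicode character U+1234 in JavaScript, but this syntax is not valid in PCRE. The correct syntax in PCRE is \x{1234} with a lowercase x (an uppercase \X means something else: it matches an extended grapheme cluster), and the pattern needs the u modifier so that PCRE treats both the pattern and the subject as UTF-8. As you are matching a range of Unicode characters, write each end of the range as its own escape and alter your regex as follows:

/^[ \x{00c0}-\x{01ff}a-zA-Z'\.\-]+$/u

Note that I used \x{00c0}-\x{01ff} to match any Unicode character in that range. I also removed the capture group, as it is slightly pointless to have one capture group repeated for every character in the matched string.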
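
As a minimal sketch of how such a pattern might be used from PHP: in PCRE the code point escape is written \x{hhhh} and the u modifier switches on UTF-8 mode. The helper name isValidName is hypothetical, the type declarations assume PHP 7 or later, and the sample names assume the source file is saved as UTF-8. The pattern is written in single quotes so that PHP passes the backslashes through to PCRE instead of interpreting them itself.

```php
<?php
// Hypothetical helper for illustration; not part of the original script.
function isValidName(string $name): bool
{
    // \x{00c0}-\x{01ff} is a range of code points; the trailing "u"
    // makes PCRE treat the pattern and the subject as UTF-8.
    $pattern = '/^[ \x{00c0}-\x{01ff}a-zA-Z\'.\-]+$/u';

    // preg_match() returns 1 on a match, 0 on no match and false on error
    // (for example when $name is not valid UTF-8), so compare against 1.
    return preg_match($pattern, $name) === 1;
}

var_dump(isValidName("Jérémie Dent-O'Brien")); // bool(true)
var_dump(isValidName("Henry O'Conner"));       // bool(true)
var_dump(isValidName("R2-D2"));                // bool(false): digits are not in the class
```

Comparing the result with === 1 rather than relying on truthiness keeps a PCRE error (a false return) from being mistaken for a simple non-match during debugging.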

This documentation might be helpful if you encounter other problems while converting a javascript regex to a PCRE regex.
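
On the question of an automated converter, I don't know of a complete one, but the one difference that triggers this error can be rewritten mechanically. The following is only a sketch under that assumption: jsUnicodeEscapesToPcre is a hypothetical helper that turns JavaScript-style \uXXXX escapes into PCRE-style \x{XXXX} and handles nothing else. It would also wrongly rewrite an escaped backslash followed by a literal "u" and four hex digits, so treat it as a starting point rather than a general tool.

```php
<?php
// Hypothetical helper: rewrites \uXXXX (JavaScript) as \x{XXXX} (PCRE).
// It covers only this one syntax difference between the two flavours.
function jsUnicodeEscapesToPcre(string $jsPattern): string
{
    $converted = preg_replace_callback(
        '/\\\\u([0-9a-fA-F]{4})/',          // a literal backslash, "u", four hex digits
        static function (array $m): string {
            return '\x{' . $m[1] . '}';     // single quotes keep the backslash literal
        },
        $jsPattern
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $converted ?? $jsPattern;
}

// The corrected expression from the question, as a single-quoted PHP string.
$js   = '/^([ \u00c0-\u01ffa-zA-Z\'\.\-])+$/';
// Append the "u" modifier so PCRE runs in UTF-8 mode.
$pcre = jsUnicodeEscapesToPcre($js) . 'u';

echo $pcre, PHP_EOL; // /^([ \x{00c0}-\x{01ff}a-zA-Z'\.\-])+$/u
```

Any other JavaScript-only construct would still need converting by hand with the help of the PCRE documentation.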

1 Comment

Wonderful. Working perfectly now. I understand that \x{charcode} is the bit I couldn't work out, which is \ucharcode in JavaScript. I did actually try adding a u modifier but that didn't do anything, surprisingly. This is working for now though. Thank you!
