Why does this regex not validate in the same way in PHP?

Question

When I try preg_match with the expression /.{0,5}/, it still matches strings longer than 5 characters. It does, however, work properly when I try it in an online regexp matcher.

Spudley · Accepted Answer · 2011-07-23 21:08:25Z

6

The site you reference, myregexp.com, is focussed on Java.

Java has a specific method for matching an exact pattern without needing anchor characters: Matcher.matches() (also exposed as String.matches()) succeeds only if the entire input matches the pattern. This is the function which myregexp.com uses.

In most other languages, matching an exact pattern requires the anchoring characters ^ and $ at the start and end of the pattern respectively. Without them, the regex only needs to find the pattern somewhere within the string, rather than the whole string being the match.

This means that without the anchors, your pattern will match any string, of any length, because whatever the string, it will contain within it somewhere a match for "zero to five of any character".

So in PHP, and Perl, and virtually any other language, you need your pattern to look like this:

/^.{0,5}$/
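To make the difference concrete, here is a minimal sketch in PHP. The helper names matchesUnanchored and matchesAnchored are hypothetical and only there for illustration; the patterns are the ones from the question and from this answer.

```php
<?php
// Hypothetical helpers for illustration; the names are not from the question.

function matchesUnanchored(string $input): bool
{
    // Succeeds if "zero to five characters" occurs anywhere, which is always.
    return preg_match('/.{0,5}/', $input) === 1;
}

function matchesAnchored(string $input): bool
{
    // Succeeds only if the pattern spans the string from start to end.
    return preg_match('/^.{0,5}$/', $input) === 1;
}

var_dump(matchesUnanchored('abcdefgh')); // bool(true)
var_dump(matchesAnchored('abc'));        // bool(true)
var_dump(matchesAnchored('abcdefgh'));   // bool(false)

// The unanchored match is just the first five characters of the string.
preg_match('/.{0,5}/', 'abcdefgh', $m);
var_dump($m[0]); // string(5) "abcde"
```

The last call shows what the unanchored pattern actually matched: a five-character slice of the longer string, which is why preg_match reports success.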

Having explained all that, I would make one final observation: this specific pattern really doesn't need to be a regular expression; you could achieve the same thing with strlen(). In addition, the dot character in regex may not work exactly as you expect. It typically matches almost any character, but newline characters are excluded by default, so if your string contains five characters and one of them is a newline, it will fail your regex when you might have expected it to pass. With this in mind, strlen() would be a safer option (or mb_strlen() if you expect multibyte Unicode characters, since strlen() counts bytes).
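A short sketch of both points, assuming the mbstring extension is available for mb_strlen() and that the text is UTF-8. The sample strings are made up for illustration.

```php
<?php
$withNewline = "ab\ncd";      // five characters, one of them a newline
$accented    = "h\u{E9}llo";  // "héllo": five characters, six bytes in UTF-8

// The dot does not match "\n" by default, so the anchored regex fails here.
var_dump(preg_match('/^.{0,5}$/', $withNewline)); // int(0)
var_dump(strlen($withNewline) <= 5);              // bool(true)

// strlen() counts bytes, mb_strlen() counts characters.
var_dump(strlen($accented));                      // int(6)
var_dump(mb_strlen($accented, 'UTF-8'));          // int(5)
```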

If you need to match any character in regex, and the default behaviour of the dot isn't good enough, there are two options. One is to add the s modifier at the end of the expression (i.e. it becomes /^.{0,5}$/s). The s modifier tells the regex engine to let the dot match newline characters as well.

The other option (which is useful for languages that don't support the s modifier) is to use a character class containing an expression and its negation, e.g. [\s\S], instead of the dot. \s matches any whitespace character, and \S is the negation of \s, so it matches any character that \s does not. Together in a character class they match any character. It's more long-winded and less readable than a dot, but in some languages it's the only way to be sure.
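Both options can be tried side by side in PHP. This is only a sketch with a made-up input string. The last two lines show a related quirk that is not covered above: by default `$` also matches just before a final newline, and `\z` is one way to rule that out.

```php
<?php
$input = "ab\ncd"; // five characters including a newline

var_dump(preg_match('/^.{0,5}$/', $input));       // int(0): dot rejects "\n"
var_dump(preg_match('/^.{0,5}$/s', $input));      // int(1): s modifier
var_dump(preg_match('/^[\s\S]{0,5}$/', $input));  // int(1): character class

// Related quirk: "$" also matches before a trailing newline, so this
// six-character string passes; "\z" only matches at the very end.
var_dump(preg_match('/^.{0,5}$/s', "abcde\n"));   // int(1)
var_dump(preg_match('/^.{0,5}\z/s', "abcde\n"));  // int(0)
```

If a trailing newline matters for your validation (textarea input, for instance), `\z` or the D modifier is worth considering alongside the s modifier.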

You can find out more about this here: http://www.regular-expressions.info/dot.html

Hope that helps.

edited Jul 23, 2011 at 21:08

answered Jul 23, 2011 at 19:49

Spudley


3 Comments

CookieMonster Over a year ago

Thanks! Yes, I'm aware of the fact that strlen() would work as well to a certain extent (for most purposes), but I'm working on a form validation engine which focuses solely on regular expressions.

Spudley Over a year ago

@Cookie - fair enough. I felt I needed to point it out though. btw, I've added a few extra comments about the dot 'any character' match. Just so you're aware of some quirks which might catch you out (especially if you'll be validating textarea form fields)

CookieMonster Over a year ago

That's cool, thanks for the additional info on the dot as well! I'm definitely going to use the s modifier, great stuff!

Michael Berkowski · Accepted Answer · 2011-07-23 19:36:28Z

4

You need to anchor it with ^ and $. These symbols match the beginning and end of the string respectively, so there must be 0-5 characters between the beginning and the end. Without the anchors, the pattern can match anywhere in the string, so the string could be longer.

/^.{0,5}$/

For better readability, I would probably also enclose the . in (), but that's kind of subjective.

/^(.){0,5}$/

answered Jul 23, 2011 at 19:36

Michael Berkowski


1 Comment

Spudley Over a year ago

Adding the brackets also has the effect of adding a capturing group. It's fine to use it for readability, but if that's all you're using it for, be aware that it adds some processing overhead.
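A small sketch of what the capturing group does to the result, next to a non-capturing group (?:...) which groups without capturing. The variable names are hypothetical.

```php
<?php
// Capturing group: the match array gains an extra entry.
preg_match('/^(.){0,5}$/', 'abc', $withGroup);

// Non-capturing group: same grouping, nothing extra is stored.
preg_match('/^(?:.){0,5}$/', 'abc', $withoutGroup);

var_dump($withGroup);    // ['abc', 'c']: a repeated group keeps its last iteration
var_dump($withoutGroup); // ['abc']
```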
