2

When I use preg_match with the expression /.{0,5}/, it still matches strings longer than 5 characters. It does, however, work properly when I try it in an online regexp matcher.

2 Answers

6

The site you reference, myregexp.com, is focussed on Java.

Java has a specific method for matching an exact pattern, matches(), which only succeeds if the entire input matches the pattern, so no anchor characters are needed. This is the function which myregexp.com uses.

In most other languages, to match an exact pattern you need to add the anchoring characters ^ and $ at the start and end of the pattern respectively. Otherwise the regex engine only needs to find the pattern somewhere within the string, rather than requiring the whole string to be the match.

This means that without the anchors, your pattern will match any string, of any length, because whatever the string, it will contain within it somewhere a match for "zero to five of any character".
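To make this concrete, here is a minimal sketch of what the unanchored pattern actually captures. The input string is just an illustrative value, not something from the question.

```php
<?php
// Hypothetical input, longer than five characters.
$input = 'abcdefgh';

// Without anchors, preg_match() succeeds as soon as the pattern
// matches anywhere in the string.
$found = preg_match('/.{0,5}/', $input, $matches);

var_dump($found);      // int(1)
var_dump($matches[0]); // string(5) "abcde"
```

The call reports a match because the first five characters satisfy the pattern; the remaining characters are simply ignored.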

So in PHP, and Perl, and virtually any other language, you need your pattern to look like this:

/^.{0,5}$/
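As a rough sketch, the anchored pattern could be wrapped like this in PHP. The function name isAtMostFiveChars is made up for illustration.

```php
<?php
// Hypothetical helper: true when $value is at most five characters long
// (newlines aside, see the note below).
function isAtMostFiveChars(string $value): bool
{
    // ^ and $ force the match to span the whole string.
    return preg_match('/^.{0,5}$/', $value) === 1;
}

var_dump(isAtMostFiveChars('abcde'));    // bool(true)
var_dump(isAtMostFiveChars('abcdefgh')); // bool(false)
var_dump(isAtMostFiveChars(''));         // bool(true)
```

One thing to be aware of: in PHP's PCRE, $ also matches just before a trailing newline, so "abcde\n" would pass this check. If that matters for your validation, the D modifier or \z in place of $ is stricter.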

Having explained all that, I would make one final observation: this specific pattern really doesn't need to be a regular expression, because you could achieve the same thing with strlen(). In addition, the dot character in regex may not work exactly as you expect. It matches almost any character, but some characters, including newline characters, are excluded by default. So if your string contains five characters but one of them is a newline, it can fail your regex when you might have expected it to pass. With this in mind, strlen() would be a safer option (or mb_strlen() if you expect multibyte Unicode characters, since strlen() counts bytes rather than characters).
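A length check without any regex might look roughly like this. The helper name fitsInFive is hypothetical, and the sketch assumes the mbstring extension is available and that the input is UTF-8.

```php
<?php
// Hypothetical helper: counts characters instead of matching a pattern.
// Assumes the mbstring extension is loaded and the input is UTF-8.
function fitsInFive(string $value): bool
{
    return mb_strlen($value, 'UTF-8') <= 5;
}

var_dump(fitsInFive("ab\ncd"));   // bool(true): the newline is just another character
var_dump(fitsInFive('abcdefgh')); // bool(false)

// strlen() counts bytes, so a multibyte character inflates the result.
$word = "h\u{00E9}llo";                // "héllo"
var_dump(strlen($word));               // int(6)
var_dump(mb_strlen($word, 'UTF-8'));   // int(5)
```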

If you need to match any character in regex, and the default behaviour of the dot isn't good enough, there are two options. One is to add the s modifier at the end of the expression (i.e. it becomes /^.{0,5}$/s). The s modifier tells the regex engine to include newline characters in the dot's "any character" match.
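A small sketch of the difference the s modifier makes, using a made-up five-character input that contains a newline:

```php
<?php
// Hypothetical input: five characters, one of them a newline.
$input = "ab\ncd";

// Default dot: stops at the newline, so the anchored pattern fails.
$withoutS = preg_match('/^.{0,5}$/', $input);

// With the s modifier the dot also matches the newline.
$withS = preg_match('/^.{0,5}$/s', $input);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
```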

The other option (which is useful for languages that don't support the s modifier) is to use a shorthand class and its negation together in a character class, e.g. [\s\S], instead of the dot. \s matches any whitespace character, and \S is the negation of \s, so it matches any character that \s does not. Together in a character class they match any character. It's more long-winded and less readable than a dot, but in some languages it's the only way to be sure.
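For comparison, here is the same check written with [\s\S] in PHP. PHP does support the s modifier, so this is only to illustrate the technique; the inputs are invented.

```php
<?php
// [\s\S] matches any character, newline included, with no modifier.
$short = preg_match('/^[\s\S]{0,5}$/', "ab\ncd");   // five characters
$long  = preg_match('/^[\s\S]{0,5}$/', "ab\ncdef"); // seven characters

var_dump($short); // int(1)
var_dump($long);  // int(0)
```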

You can find out more about this here: http://www.regular-expressions.info/dot.html

Hope that helps.

3 Comments

Thanks! Yes, I'm aware of the fact that strlen() would work as well to a certain extent (for most purposes), but I'm working on a form validation engine which focuses solely on regular expressions.
@Cookie - fair enough. I felt I needed to point it out though. btw, I've added a few extra comments about the dot 'any character' match. Just so you're aware of some quirks which might catch you out (especially if you'll be validating textarea form fields)
That's cool, thanks for the additional info on the dot as well! I'm definitely going to use the s modifier, great stuff!
4

You need to anchor it with ^ and $. These symbols match the beginning and end of the string respectively, so there must be 0-5 characters between the beginning and the end. Without the anchors, the pattern can match anywhere in the string, so the string itself could be longer.

/^.{0,5}$/

For better readability, I would probably also enclose the . in (), but that's kind of subjective.

/^(.){0,5}$/

1 Comment

Adding the brackets also has the effect of adding a capturing group. It's fine to use it for readability, but if that's all you're using it for, be aware that it does add some processing overhead because of this.
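To illustrate the capturing group mentioned here, the following sketch compares the bracketed pattern with a non-capturing (?:...) group, which keeps the visual grouping without the capture. The variable names are only for illustration.

```php
<?php
// Capturing group: $withGroup gets an extra entry for the group.
preg_match('/^(.){0,5}$/', 'abc', $withGroup);

// Non-capturing group: same grouping, nothing extra is stored.
preg_match('/^(?:.){0,5}$/', 'abc', $withoutGroup);

var_dump($withGroup);    // [0] => "abc", [1] => "c" (the last repetition of the group)
var_dump($withoutGroup); // [0] => "abc"
```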
