PHP regexp strange behavior

Question

I am developing a simple regex to parse part of a URL. The regex must capture part of the URL in a named group, and only a few characters are allowed (a-z, 0-9 and -). If any other character is present, the regex must fail for that string and capture nothing.

But as you can see in the screenshot, when the regex finds a % sign it stops and captures the part before it (if that part is at least two characters long). The result is the same without the word boundaries (\b).

I can't understand why % acts like \n, so that the engine captures the preceding characters and stops. % is not in the list of allowed characters, so the match should fail for that string... or not?

I've tried in the actual PHP code as well, with the very same result.

EDIT 1:

Actual PHP code:

// "~" is the delimiter here, so the "/" characters inside the pattern need no escaping.
if (preg_match('~/fixed_url_part/\b(?P<codename>[a-z0-9-]{2,})\b~', $url, $regs)) {
    return $regs['codename'];
}
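
For reference, the behaviour described above can be reproduced with a short script. This is only a sketch: the sample URLs and the helper name captureCodename are made up for illustration, and ~ is used as the pattern delimiter so the slashes in the pattern need no escaping.

```php
<?php
// Hypothetical helper wrapping the unanchored pattern from the question.
function captureCodename(string $url): ?string
{
    $pattern = '~/fixed_url_part/\b(?P<codename>[a-z0-9-]{2,})\b~';
    return preg_match($pattern, $url, $regs) === 1 ? $regs['codename'] : null;
}

// Made-up sample URLs: one valid, one containing "%", one containing "_".
$samples = [
    '/fixed_url_part/abc-def',
    '/fixed_url_part/abc%20def',
    '/fixed_url_part/abc_def',
];
foreach ($samples as $url) {
    echo $url . ' => ' . var_export(captureCodename($url), true) . "\n";
}
// /fixed_url_part/abc-def => 'abc-def'
// /fixed_url_part/abc%20def => 'abc'
// /fixed_url_part/abc_def => NULL
```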

Exact code in the question would be useful. It looks as though your placeholder simply looks for alphanumeric characters, which excludes %.
– mario, Commented Aug 25, 2015 at 17:02
I edited the question to add the code, but the point is: why does it capture the preceding characters when the string contains %, yet fail when the string contains, for example, _? Why does it not fail with %?
– SubniC, Commented Aug 25, 2015 at 17:16
Without the end anchor (as pointed out by @Halcyon) your pattern only matches "until" it finds no more matching characters. And the \b word boundary holds true when encountering %.
– mario, Commented Aug 25, 2015 at 17:23

Halcyon (last edited by Jonny 5) · Accepted Answer · 2015-08-25 18:26:20Z

3

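You didn't tell it to match the full line, so the pattern is free to match only the part of the string before the first disallowed character. Add $ to have it match through to the end (and ^ to anchor the start).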

^/fixed_url_part/\b(?P<codename>[a-z0-9\-]{2,})\b$
^-- match start of line                          ^-- match end of line
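
As a minimal sketch of how the anchored pattern might be used in PHP (the function name extractCodename and the sample URLs are assumptions, not part of the original code; ~ is used as the delimiter so the slashes need no escaping):

```php
<?php
// Hypothetical helper: returns the codename, or null if the whole URL does not match.
function extractCodename(string $url): ?string
{
    // ^ and $ force the pattern to cover the entire string, so any
    // character outside [a-z0-9-] after the prefix makes the match fail.
    $pattern = '~^/fixed_url_part/\b(?P<codename>[a-z0-9\-]{2,})\b$~';
    return preg_match($pattern, $url, $regs) === 1 ? $regs['codename'] : null;
}

var_dump(extractCodename('/fixed_url_part/abc-def'));    // string(7) "abc-def"
var_dump(extractCodename('/fixed_url_part/abc%20def'));  // NULL
var_dump(extractCodename('abc/fixed_url_part/def'));     // NULL
```

Note that in PCRE, $ by default also matches just before a trailing newline. If that matters for the input, the D modifier or \z can be used instead.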

edited Aug 25, 2015 at 18:26 by Jonny 5

answered Aug 25, 2015 at 17:01 by Halcyon


4 Comments

Arunesh Singh Over a year ago

Keep - in the character class too, as the OP wants.

Andrea Corbellini Over a year ago

I'd also add ^, just in case. (I guess that abc/fixed_url_part/def should fail.)

SubniC Over a year ago

With the end-of-string anchor ($) it works fine, but what I want to know is why, with % in the string, the regex captures part of it when it should fail (as it does if the character is _ instead of %).

Halcyon Over a year ago

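I think it's because of \b (word boundary). % is a non-word character, so there is a word boundary between the last letter and the %, whereas _ counts as a word character, so there is no boundary before it. So % satisfies the \b, and the match succeeds.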
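
The difference between % and _ can be checked in isolation. In PCRE, \w matches letters, digits and the underscore, and \b matches at a position between a \w character and a non-\w character. The helper names below are hypothetical; this is a small sketch, not part of the original code:

```php
<?php
// Hypothetical helper: true if $ch is a "word" character (\w) for PCRE.
function isWordChar(string $ch): bool
{
    return preg_match('/^\w$/', $ch) === 1;
}

// Hypothetical helper: true if \b matches between the two given characters.
function boundaryBetween(string $left, string $right): bool
{
    return preg_match('/^.\b.$/', $left . $right) === 1;
}

var_dump(isWordChar('%'));            // bool(false)
var_dump(isWordChar('_'));            // bool(true)
var_dump(boundaryBetween('c', '%'));  // bool(true), so a match can end at "c" in "abc%..."
var_dump(boundaryBetween('c', '_'));  // bool(false), so it cannot end at "c" in "abc_..."
```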
