My question is, why does the position of the '_' (underscore) character cause this problem?

I have inherited a script that uses PHP's preg_replace in a function. The regex returns 0 for any number it is applied to.

function foo($number){ 
  $number = preg_replace('/[a-z$,-_]/i','',$number);
  // more code...
}

I did a bunch of debugging and found the problem was with the preg_replace(). A co-worker mentioned that the order of the characters in the regex may be causing the bug. So I played with this and found it to be true. The position of the '_' (underscore) character is the sinister culprit. I changed this to:

'/[a-z$_,-]/i'

... and everything works fine.

So, the question, again, is why does the position of the '_' (underscore) character cause this problem? I've Googled on this but have not found it and I thought the minds in this forum may have the answer.

Thanks for any enlightenment! -jc

  • I suspect it's more the position of the - (hyphen) character than the underscore. Commented May 28, 2014 at 13:40
  • @Deadooshka A trifle overzealous, but yeah, that would work. You only really need to escape the hyphen in this example: '/[a-z$,\-_]/i'. Since the regexp itself is in single quotes, no interpolation will occur. Commented May 28, 2014 at 13:47
  • So it is always safe to precede a non-alphanumeric with "\" to specify that it stands for itself. Commented May 28, 2014 at 13:58

2 Answers

It's the position of the hyphen, not the underscore. With [a-z$,-_], you're inadvertently creating a character range from , to _. Put the hyphen on the end or escape it.

Comma , is ASCII 0x2C, underscore _ is 0x5F, and digits fall between those (0x30 to 0x39).
(ref: ASCII table)
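
To see the range in action, here is a minimal sketch. It assumes a sample input string that is not from the original script, and the variable names are made up for illustration. It compares the two patterns from the question and prints the byte values that explain the difference.

```php
<?php
// Patterns copied from the question; everything else here is illustrative.
$buggy = '/[a-z$,-_]/i';   // ",-_" is read as a range from 0x2C to 0x5F
$fixed = '/[a-z$_,-]/i';   // a trailing hyphen is a literal hyphen

// Hypothetical sample input, not taken from the original script.
$input = '$1,234-56_ab';

$buggyResult = preg_replace($buggy, '', $input);
$fixedResult = preg_replace($fixed, '', $input);

var_dump($buggyResult); // string(0) "" because the digits fall inside the range
var_dump($fixedResult); // string(6) "123456"

// Byte values: the digits 48..57 sit between ',' (44) and '_' (95).
printf("%d %d %d %d\n", ord(','), ord('0'), ord('9'), ord('_')); // 44 48 57 95
```

With the buggy pattern every digit is stripped, so the function is left with an empty string. If later code treats that string as a number, it would evaluate to 0, which may be what the question observed. That part is an assumption, since the rest of the function is not shown.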

1 Comment

Thank you! That makes sense and I'm kicking myself for not seeing that in the reading I've done on ranges in regexes. I should have realized that even if the text wasn't explicit. Cheers!

Some characters need to be escaped inside a character class, like this:

[a-z,\-_]
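
As a rough sketch of the options, the three patterns below all treat the hyphen as a literal: escaped, placed last, or placed first. The sample input and the array keys are made up for illustration. The character set is the one from the question, so it also includes the $ that the snippet above leaves out.

```php
<?php
// Three ways to get a literal hyphen inside a character class.
$patterns = [
    'escaped' => '/[a-z$,\-_]/i',  // backslash-escaped hyphen
    'last'    => '/[a-z$,_-]/i',   // hyphen at the end of the class
    'first'   => '/[-a-z$,_]/i',   // hyphen at the start of the class
];

// Hypothetical sample input, not taken from the original script.
$input = '$1,234-56_ab';

$results = [];
foreach ($patterns as $label => $pattern) {
    // Strip letters, "$", ",", "-" and "_" while keeping the digits.
    $results[$label] = preg_replace($pattern, '', $input);
}

print_r($results); // each entry is "123456"
```

Because the patterns are single-quoted PHP strings, the backslash in the first one reaches the regex engine unchanged.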

3 Comments

However, underscore is not one of those characters. The hyphen within the square brackets, on the other hand...
Escaping the hyphen does the trick. I tested this in my script. Both answers were spot on.
Escaping the hyphen will be more intuitive to a future coder who may inherit this script. I've commented both solutions but who knows if the comment will remain. Thank you all!
