
I've got the string

$result = "bei einer Temperatur, die etwa 20 bis 60°C unterhalb des Schmelzpunktes der kristallinen Modifikation"

which comes straight from a MySQL table. The table and the PHP headers are both set to UTF-8.

I want to replace the degree symbol (http://en.wikipedia.org/wiki/Degree_symbol) with the word 'degrees' to get:

"bei einer Temperatur, die etwa 20 bis 60degreesC unterhalb des Schmelzpunktes der kristallinen Modifikation"

but I can't get it to work with preg_replace.

If I do:

$result = preg_replace('/\xB0/u', " degrees ", $result);

I get an empty string.

And if I do:

$result = preg_replace('/\u00B0/u', " degrees ", $result);

I get the error:

Warning: preg_replace() [function.preg-replace]: Compilation failed: PCRE does not support \L, \l, \N, \U, or \u at offset 1 in /var/www/html/includes/classes/redeyeTable.inc.php on line 75

I'm not great with encodings... what am I doing wrong here?

  • The first solution you post works perfectly for me. Commented Jun 29, 2010 at 12:52
  • According to this page (and to your error message, BTW), you cannot use \u: fr.php.net/manual/en/reference.pcre.pattern.differences.php Commented Jun 29, 2010 at 12:53
  • Are you sure you have the same symbol? Unicode has many similar characters. Commented Jun 29, 2010 at 12:54
  • The first one should work fine. But if you are just replacing that, you can use the faster str_replace() instead. Commented Jun 29, 2010 at 12:57
  • Thanks for the comments. I agree the first option should work, and I have no idea why it strips ALL the text out. I'm finding PHP and UTF-8 to be a rather tricky combination. I'm just using a standard Ubuntu 10.04 install and the latest stable PHP build, and I don't know why character handling fails at every turn. Commented Jun 29, 2010 at 13:36
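
One possible cause of an empty result, offered only as a guess since the thread never confirms it: with the /u modifier, preg_replace() returns NULL when the subject is not valid UTF-8, and NULL prints as an empty string. The sketch below uses a hypothetical Latin-1 subject (not the asker's actual data) to show how preg_last_error() can reveal this.

```php
<?php
// Hypothetical Latin-1 subject: here the degree sign is the single byte B0,
// which is not a valid UTF-8 sequence on its own.
$latin1 = "60\xB0C";

// With /u, PCRE validates the subject as UTF-8 before matching.
$out = preg_replace('/\x{00B0}/u', ' degrees ', $latin1);

// On failure preg_replace() returns NULL rather than the original string.
$failed = ($out === null);
$badUtf8 = (preg_last_error() === PREG_BAD_UTF8_ERROR);

var_dump($failed);
var_dump($badUtf8);
```

If both checks are true for the real data, the string reaching PHP is probably not UTF-8, for example because of the database connection charset.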

2 Answers


Use

$result = preg_replace('/\x{00B0}/u', " degrees ", $result);

See the PCRE documentation for more information on the \x{FFFF} syntax.

It's important to note the difference between \xB0 and \x{00B0}:

  • \xB0 denotes a single character with hex code B0 (176 decimal), which is the degree symbol (°) in ISO-8859-1, for example.
  • \x{00B0} denotes the Unicode code point U+00B0, which is the degree symbol (°) in Unicode. In UTF-8 this code point is encoded as the two bytes \xC2\xB0.
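
As a minimal sketch of the points above: the subject is built from explicit bytes so that the example does not depend on how the source file is saved. The shortened sample string and the variable names are illustrative only.

```php
<?php
// "\xC2\xB0" in a double-quoted PHP string is the two-byte UTF-8 encoding
// of U+00B0, so this subject is valid UTF-8 regardless of file encoding.
$result = "etwa 20 bis 60\xC2\xB0C unterhalb des Schmelzpunktes";

// The degree sign occupies two bytes in UTF-8.
$hex = bin2hex("\xC2\xB0");

// Match the code point U+00B0; the /u modifier makes PCRE treat both
// pattern and subject as UTF-8.
$replaced = preg_replace('/\x{00B0}/u', ' degrees ', $result);

echo $hex, "\n";      // c2b0
echo $replaced, "\n"; // etwa 20 bis 60 degrees C unterhalb des Schmelzpunktes
```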

3 Comments

That works! Thank you Stefan and everyone who contributed. My mistake was not using the { } around the Unicode code point. I appreciate the difference between \xB0 and \x{00B0}; it was desperate trial and error that had me settling for a \xB0 replacement on a Unicode string. Stack Overflow is once again a life saver!
@Ed: You could mark the answer as "accepted" to show other users that this is the solution to your problem.
I don't know how many hours I spent searching for why my regex to replace some UTF8 chars was not working, and thanks to this trick with \xNN vs \x{NN}, I finally got it right. Many thanks Stefan :-)

If you use the 'u' modifier, the pattern is supposed to be treated as UTF-8, so why not simply write '°' instead of '\u00B0' or '\xB0'?
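
A short sketch of this approach, assuming the PHP source file itself is saved as UTF-8 (otherwise the literal '°' would be a different byte sequence). It also shows the plain str_replace() variant suggested in the question's comments; the variable names are illustrative.

```php
<?php
// Assumes this file is saved as UTF-8, so the literal "°" is the bytes C2 B0.
$result = "etwa 20 bis 60°C";

// Literal degree sign inside a UTF-8 pattern.
$viaRegex = preg_replace('/°/u', ' degrees ', $result);

// No regex needed for a fixed string: str_replace() works on raw bytes,
// so it matches as long as needle and subject use the same encoding.
$viaStr = str_replace('°', ' degrees ', $result);

echo $viaRegex, "\n";
echo $viaStr, "\n";
```

Both calls should produce the same string when the file and the data share the same encoding.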

1 Comment

$result = preg_replace('/°/u', " degrees ", $result); does work. Why doesn't it work when the character is given in hex?
