Unicode preg_replace problem in PHP

Question

I've got the string

$result = "bei einer Temperatur, die etwa 20 bis 60°C unterhalb des Schmelzpunktes der kristallinen Modifikation";

which comes straight from a MySQL table. The table and the PHP headers are both set to UTF-8.

I want to replace the degree symbol (°, http://en.wikipedia.org/wiki/Degree_symbol) with the word 'degrees' to get:

"bei einer Temperatur, die etwa 20 bis 60 degrees C unterhalb des Schmelzpunktes der kristallinen Modifikation"

but I can't get it to work with preg_replace.

If I do:

$result = preg_replace('/\xB0/u', " degrees ", $result);

I get an empty string.

And if I do:

$result = preg_replace('/\u00B0/u', " degrees ", $result);

I get the error:

Warning: preg_replace() [function.preg-replace]: Compilation failed: PCRE does not support \L, \l, \N, \U, or \u at offset 1 in /var/www/html/includes/classes/redeyeTable.inc.php on line 75

I'm not great with encodings... what am I doing wrong here?
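About the "empty string" from the first attempt: preg_replace() returns null (not a string) when it fails, and echoing null prints nothing, so a failure can look like an empty result. One common cause with the u modifier is a subject that is not valid UTF-8. The sketch below is a diagnostic aid under those assumptions, not a claim about what happened in this exact setup; the helper name replace_checked() and the sample subject are hypothetical, and the "\u{...}" string escape used in the success case needs PHP 7.0 or later.

```php
<?php
// Hypothetical helper: calls preg_replace() and explains a null (failed) result.
function replace_checked(string $pattern, string $replacement, string $subject): string
{
    $out = preg_replace($pattern, $replacement, $subject);
    if ($out === null) {
        $code = preg_last_error();
        // With the /u modifier PCRE validates the subject; a bad byte sequence
        // makes preg_replace() return null and sets PREG_BAD_UTF8_ERROR.
        $why = $code === PREG_BAD_UTF8_ERROR ? 'subject is not valid UTF-8' : "error code $code";
        throw new RuntimeException("preg_replace() failed: $why");
    }
    return $out;
}

// "\xB0" in a double-quoted PHP string is ONE raw byte 0xB0: the degree sign in
// ISO-8859-1, but not a valid UTF-8 sequence on its own.
$latin1 = "20 bis 60\xB0C";

try {
    replace_checked('/\x{00B0}/u', ' degrees ', $latin1);
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n"; // preg_replace() failed: subject is not valid UTF-8
}
```

If this reports invalid UTF-8 for data read from MySQL, the bytes reaching PHP are probably not UTF-8 even though the table is declared as UTF-8, and the connection character set is a likely place to look.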

According to this page (and to your error message, BTW), you cannot use \u in PCRE patterns: fr.php.net/manual/en/reference.pcre.pattern.differences.php
– greg0ire, Commented Jun 29, 2010 at 12:53
Are you sure you have the same symbol? Unicode has many similar characters.
– Kobi, Commented Jun 29, 2010 at 12:54
The first one should work fine. But if you are just replacing that, you can use the faster str_replace() instead.
– quantumSoup, Commented Jun 29, 2010 at 12:57
Thanks for the comments. I agree the first option should work; I have no idea why it strips ALL the text out. I'm finding PHP and UTF-8 to be a rather tricky combination. I'm just using a standard Ubuntu 10.04 install and the latest stable PHP build, and I don't know why character handling fails at every turn.
– Ed Lewis, Commented Jun 29, 2010 at 13:36
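Following quantumSoup's suggestion, a plain str_replace() is enough for a fixed character. A minimal sketch, assuming the subject is valid UTF-8; the degree sign is built with the "\u{00B0}" string escape, which needs PHP 7.0 or later and avoids depending on how the .php file itself is encoded.

```php
<?php
// "\u{00B0}" yields the UTF-8 bytes of U+00B0 (0xC2 0xB0).
$degree = "\u{00B0}";

$result = "bei einer Temperatur, die etwa 20 bis 60{$degree}C unterhalb des Schmelzpunktes der kristallinen Modifikation";

// str_replace() compares raw bytes: no regex engine and no /u modifier involved.
$result = str_replace($degree, ' degrees ', $result);

echo $result, "\n";
```

Byte-wise replacement is safe here because UTF-8 is self-synchronizing: the two-byte sequence for ° cannot appear in the middle of some other valid UTF-8 character.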

Community · Accepted Answer · 2017-05-23 12:00:10Z

30

Use

$result = preg_replace('/\x{00B0}/u'," degrees ", $result );

Please see here for more information on the \x{FFFF}-syntax.
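A runnable sketch of the line above, assuming a valid UTF-8 subject. The sample string is shortened from the question and is built with PHP 7+'s "\u{00B0}" escape so that it is UTF-8 regardless of the file encoding.

```php
<?php
// Shortened sample subject; "\u{00B0}" (PHP 7+) makes it valid UTF-8.
$result = "bei einer Temperatur, die etwa 20 bis 60\u{00B0}C unterhalb";

// \x{00B0} is PCRE syntax for the code point U+00B0; /u tells PCRE to treat
// both the pattern and the subject as UTF-8.
$result = preg_replace('/\x{00B0}/u', ' degrees ', $result);

echo $result, "\n"; // bei einer Temperatur, die etwa 20 bis 60 degrees C unterhalb
```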

It's important to note the difference between \xB0 and \x{00B0}:

\xB0 denotes a single character with hex code B0 (176 decimal), which is, for example, the degree symbol (°) in ISO-8859-1.
\x{00B0} denotes the Unicode code point U+00B0, which is the degree symbol (°) in Unicode. UTF-8 encodes this code point as the two bytes \xC2\xB0.
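The byte-level point can be checked directly. A small sketch using only core PHP functions; the "\u{00B0}" escape assumes PHP 7.0 or later.

```php
<?php
// The degree sign as a UTF-8 string.
$degree = "\u{00B0}";

echo bin2hex($degree), "\n";      // c2b0: UTF-8 uses two bytes, 0xC2 0xB0
echo strlen($degree), "\n";       // 2: strlen() counts bytes, not characters
var_dump($degree === "\xC2\xB0"); // bool(true): the same bytes via PHP's \x escapes

// Without /u PCRE sees two single-byte characters; with /u it sees one code point.
echo preg_match_all('/./', $degree), "\n";  // 2
echo preg_match_all('/./u', $degree), "\n"; // 1
```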

edited May 23, 2017 at 12:00

CommunityBot

answered Jun 29, 2010 at 13:37

Stefan Gehrig

Ed Lewis

That works! Thank you Stefan and everyone who contributed. My mistake was not using the { } around the Unicode code point. I appreciate the difference between \xB0 and \x{00B0}; it was more desperate trial and error that had me settling for a \xB0 replacement on a Unicode string. Stack Overflow once again is a lifesaver!

Stefan Gehrig

@Ed: You could mark the answer as "accepted" to show other users that this is the solution to your problem.

dregad

I don't know how many hours I spent searching for why my regex to replace some UTF-8 characters was not working; thanks to this trick with \xNN vs \x{NN}, I finally got it right. Many thanks, Stefan :-)

greg0ire · Accepted Answer · 2010-06-29 12:55:35Z

8

If you use the 'u' modifier, the pattern is supposed to be treated as UTF-8, so why not simply write '°' instead of '\u00B0' or '\xB0'?
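A sketch of this approach. Writing '/°/u' in the source works when the .php file is saved as UTF-8, because the pattern then contains the bytes C2 B0. As an assumption-labeled variant, the pattern below is built with PHP 7+'s "\u{00B0}" escape, which puts the same bytes into the pattern without relying on the file's encoding.

```php
<?php
$subject = "20 bis 60\u{00B0}C";

// Equivalent to typing '/°/u' in a UTF-8 encoded file: the pattern holds the
// raw UTF-8 bytes of the degree sign.
$pattern = "/\u{00B0}/u";

$out = preg_replace($pattern, ' degrees ', $subject);
echo $out, "\n"; // 20 bis 60 degrees C
```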

answered Jun 29, 2010 at 12:55

greg0ire

Ed Lewis

$result = preg_replace('/°/u', " degrees ", $result); does work... Why doesn't it work when I give the character in hex?
