PHP Utf8 Decoding Issue

Question

I have the following address line: Praha 5, Staré Město,

I need to use the utf8_decode() function on this string before I can write it to a PDF file (using the dompdf library).

However, the output of PHP's utf8_decode() for the above address line appears incorrect (or rather, incomplete).

The following code:

<?php echo utf8_decode('Praha 5, Staré Město,'); ?>

Produces this:

Praha 5, Staré M?sto,

Any idea why ě is not getting decoded?

utf8_decode simply converts a string encoded in UTF-8. Is your string UTF-8 encoded?
Rajeev Ranjan, commented Jun 20, 2013 at 10:22

deceze · Accepted Answer · 2013-06-20 10:19:43Z

15

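utf8_decode converts the string from a UTF-8 encoding to ISO-8859-1, a.k.a. "Latin-1".
The Latin-1 encoding cannot represent the letter "ě". It's that simple.
"Decode" is a total misnomer; it does essentially the same as iconv('UTF-8', 'ISO-8859-1', $string), except that utf8_decode replaces characters Latin-1 cannot represent with "?", whereas iconv without //TRANSLIT or //IGNORE fails on them.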

See What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text.
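A minimal sketch of the point above, assuming PHP with the mbstring extension and a source file saved as UTF-8. It uses mb_convert_encoding() rather than utf8_decode() itself (utf8_decode() is deprecated as of PHP 8.2); with mbstring's default substitute character, characters the target charset cannot represent become "?", just like the output in the question. Converting to ISO-8859-2 (Latin-2) instead shows that a Central European single-byte charset does have a byte for "ě".

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
$address = 'Praha 5, Staré Město,';

// Latin-1 has "é" (byte 0xE9) but no "ě" (U+011B), so "ě" is replaced by "?".
$latin1 = mb_convert_encoding($address, 'ISO-8859-1', 'UTF-8');
var_dump($latin1 === "Praha 5, Star\xE9 M?sto,"); // bool(true)

// Latin-2 (ISO-8859-2) encodes "é" as 0xE9 and "ě" as 0xEC, so nothing is lost here.
$latin2 = mb_convert_encoding($address, 'ISO-8859-2', 'UTF-8');
var_dump($latin2 === "Praha 5, Star\xE9 M\xECsto,"); // bool(true)
```

The same bytes still need a font and a consumer that interpret them as Latin-2; the conversion alone only shows which characters each charset can hold.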

answered Jun 20, 2013 at 10:19

deceze♦


2 Comments

user3402040 Over a year ago

Thanks, the best answer (2015). +1

whizzkid Over a year ago

@deceze "utf8_decode converts the string from a UTF-8 encoding to ISO-8859-1": this probably saved me a couple of hours! I would gladly buy you a drink if you were in our office :)

Peters V · Accepted Answer · 2013-08-10 01:25:10Z

I wound up using a home-grown UTF-8/UTF-16 decoding function that converts characters to &#number; representations. I haven't found any pattern to why UTF-8 isn't detected; I suspect it's because the "encoded-as" sequence isn't always in exactly the same position in the returned string. You might want to do some additional checking on that.

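Three-byte UTF-8 indicator (the UTF-8 byte order mark, bytes EF BB BF): $startutf8 = chr(0xEF).chr(187).chr(191); (if you see this ANYWHERE, not just in the first three characters, the string is UTF-8 encoded; its absence proves nothing, since UTF-8 text usually has no BOM)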
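A small sketch of checking for that byte order mark and stripping it, using the hypothetical helper names has_utf8_bom() and strip_utf8_bom(). Because the BOM is optional, the sketch also uses mbstring's mb_check_encoding(), which validates the whole byte sequence instead of looking for a marker (assumes the mbstring extension is loaded and the file is saved as UTF-8).

```php
<?php
// The UTF-8 byte order mark: EF BB BF (same bytes as chr(0xEF).chr(187).chr(191)).
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: true if the BOM appears anywhere in the string.
function has_utf8_bom(string $s): bool {
    return strpos($s, UTF8_BOM) !== false;
}

// Hypothetical helper: remove a leading BOM, if present.
function strip_utf8_bom(string $s): string {
    return strncmp($s, UTF8_BOM, 3) === 0 ? substr($s, 3) : $s;
}

$text = UTF8_BOM . 'Praha 5, Staré Město,';
var_dump(has_utf8_bom($text));   // bool(true)
var_dump(strip_utf8_bom($text)); // string(23) "Praha 5, Staré Město,"

// A BOM is not required: validate the bytes instead of relying on it.
var_dump(mb_check_encoding('Staré Město', 'UTF-8')); // bool(true)
var_dump(mb_check_encoding("Star\xE9", 'UTF-8'));    // bool(false)
```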

Decode according to UTF-8 rules (this replaced an earlier version that chugged through the string byte by byte), using:

function charset_decode_utf_8($string) {
    /* Only do the slow conversion if there are 8-bit characters */
    if (!preg_match('/[\x80-\xFF]/', $string)) {
        return $string;
    }

    // Decode three-byte UTF-8 sequences (U+0800..U+FFFF) into &#N; entities
    $string = preg_replace_callback(
        '/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/',
        function ($m) {
            return '&#' . ((ord($m[1]) - 224) * 4096 + (ord($m[2]) - 128) * 64 + (ord($m[3]) - 128)) . ';';
        },
        $string
    );

    // Decode two-byte UTF-8 sequences (U+0080..U+07FF) into &#N; entities
    $string = preg_replace_callback(
        '/([\xC0-\xDF])([\x80-\xBF])/',
        function ($m) {
            return '&#' . ((ord($m[1]) - 192) * 64 + (ord($m[2]) - 128)) . ';';
        },
        $string
    );

    // Note: four-byte sequences (U+10000 and above) are left untouched.
    return $string;
}
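If the goal is the same &#number; output, mbstring can do the conversion in a single call, including the four-byte characters that the regex approach above does not handle. A sketch, assuming the mbstring extension; to_numeric_entities() is a hypothetical wrapper name, and the map converts every code point from U+0080 upward.

```php
<?php
// Hypothetical wrapper: turn every non-ASCII character into a decimal &#N; entity.
// mb_encode_numericentity() map format: [start, end, offset, mask];
// code points U+0080..U+10FFFF are emitted as-is (offset 0, mask covering all bits).
function to_numeric_entities(string $utf8): string {
    return mb_encode_numericentity($utf8, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
}

echo to_numeric_entities('Praha 5, Staré Město,'), "\n";
// Praha 5, Star&#233; M&#283;sto,
```

Note that entities only move the problem to the renderer: a PDF library still needs a font containing the glyph for U+011B.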

vimal1083 · Accepted Answer · 2014-04-25 10:09:32Z

2

The problem is your PHP file's encoding. Save the file as UTF-8; then there is no need to use utf8_decode at all. If you get this data ('Praha 5, Staré Město,') from a database, better change its charset to UTF-8 as well.
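A sketch of the same idea for strings coming from elsewhere: accept a string only if it is already valid UTF-8, otherwise convert it. The helper name ensure_utf8() is hypothetical, and so is the assumption that non-UTF-8 input is legacy ISO-8859-2 (Latin-2) text; switching the database connection to UTF-8, as suggested above, remains the cleaner fix.

```php
<?php
// Hypothetical helper: return the input as UTF-8.
// Assumption: any input that is not valid UTF-8 is ISO-8859-2 (Latin-2) text.
function ensure_utf8(string $s, string $legacy = 'ISO-8859-2'): string {
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already valid UTF-8: leave it unchanged
    }
    return mb_convert_encoding($s, 'UTF-8', $legacy);
}

var_dump(ensure_utf8('Staré Město') === 'Staré Město');       // bool(true)
var_dump(ensure_utf8("Star\xE9 M\xECsto") === 'Staré Město'); // bool(true)
```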

answered Apr 25, 2014 at 10:09

vimal1083


scraaappy · Accepted Answer · 2013-06-20 10:47:26Z

0

You don't need that. (@Rajeev: this string is detected as UTF-8 encoded:

echo mb_detect_encoding('Praha 5, Staré Město,');

returns UTF-8, provided the PHP file itself is saved as UTF-8.)

Instead, see: https://code.google.com/p/dompdf/wiki/CPDFUnicode

answered Jun 20, 2013 at 10:47

scraaappy


3 Comments

Latheesan Over a year ago

I removed the utf8_decode call and set <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>, and DOMPDF_UNICODE_ENABLED is set to true in the config. However, it still does not work: ě appears as ?.

Latheesan Over a year ago

I am using the 'Helvetica' font; could that be why?

scraaappy Over a year ago

You may have to install another font. Check the answers here: stackoverflow.com/questions/990181/…
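A sketch of the font fix, assuming dompdf 0.7 or later installed via Composer (load vendor/autoload.php before calling render_address_pdf()) and the DejaVu Sans font that dompdf bundles, which covers Czech letters such as ě. The helper names build_address_html() and render_address_pdf() are hypothetical. As I understand it, the built-in PDF fonts like Helvetica only cover a Windows-1252-style character set, which would explain why ě comes out as ?.

```php
<?php
use Dompdf\Dompdf; // assumes dompdf >= 0.7, e.g. installed via Composer

// Hypothetical helper: wrap the text in an HTML document whose font covers "ě".
function build_address_html(string $address): string {
    $safe = htmlspecialchars($address, ENT_QUOTES, 'UTF-8');
    return '<!DOCTYPE html><html><head><meta charset="utf-8">'
        . '<style>body { font-family: "DejaVu Sans", sans-serif; }</style>'
        . '</head><body>' . $safe . '</body></html>';
}

// Hypothetical helper: render the HTML to PDF bytes (requires dompdf to be loaded).
function render_address_pdf(string $address): string {
    $dompdf = new Dompdf();
    $dompdf->loadHtml(build_address_html($address), 'UTF-8');
    $dompdf->render();
    return $dompdf->output();
}
```

Usage, once dompdf is autoloaded: file_put_contents('address.pdf', render_address_pdf('Praha 5, Staré Město,'));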
