9

I have the following address line: Praha 5, Staré Město,

I need to call utf8_decode() on this string before I can write it to a PDF file (using the dompdf library).

However, the output of PHP's utf8_decode() for the above address line appears incorrect (or rather, incomplete).

The following code:

<?php echo utf8_decode('Praha 5, Staré Město,'); ?>

Produces this:

Praha 5, Staré M?sto,

Any idea why ě is not getting decoded?

1
  • utf8_decode simply converts a string encoded in UTF-8. Is your string actually UTF-8 encoded? Commented Jun 20, 2013 at 10:22

4 Answers

15

utf8_decode converts the string from a UTF-8 encoding to ISO-8859-1, a.k.a. "Latin-1".
The Latin-1 encoding cannot represent the letter "ě", so utf8_decode substitutes "?" for it. It's that simple.
"Decode" is a total misnomer; it performs the same kind of conversion as iconv('UTF-8', 'ISO-8859-1', $string).

See What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text.
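
To make the point about Latin-1 concrete, here is a minimal sketch. It assumes PHP 7+ with the mbstring extension loaded (neither is stated in the question), and it uses mb_convert_encoding instead of utf8_decode, which is deprecated as of PHP 8.2. The variable names are illustrative only.

```php
<?php
// Build the string from code points so the example does not depend on
// how this file itself is saved (\u{...} escapes need PHP 7+).
$address = "Praha 5, Star\u{00E9} M\u{011B}sto,";

// Latin-1 has a slot for é (0xE9) but none for ě, so ě is replaced by
// mbstring's default substitute character, "?".
$latin1 = mb_convert_encoding($address, 'ISO-8859-1', 'UTF-8');
var_dump(strpos($latin1, 'M?sto') !== false); // bool(true)

// Latin-2 (ISO-8859-2) covers Czech, so the text survives a round trip.
$latin2 = mb_convert_encoding($address, 'ISO-8859-2', 'UTF-8');
$roundTrip = mb_convert_encoding($latin2, 'UTF-8', 'ISO-8859-2');
var_dump($roundTrip === $address); // bool(true)
```

Whether a single-byte encoding such as Latin-2 is an option depends on what the PDF library and its fonts accept; keeping the text in UTF-8 end to end avoids the question entirely.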

2 Comments

Thanks, the best answer (2015). +1
@deceze "utf8_decode converts the string from a UTF-8 encoding to ISO-8859-1": this probably saved me a couple of hours! I would gladly buy you a drink if you were in our office :)
2

I wound up using a home-grown UTF-8 / UTF-16 decoding function that converts characters to &#number; representations. I haven't found any pattern to why UTF-8 isn't detected; I suspect it's because the "encoded-as" sequence isn't always in exactly the same position in the returned string. You might do some additional checking on that.

Three-byte UTF-8 indicator (the byte order mark): $startutf8 = chr(0xEF).chr(0xBB).chr(0xBF); (if you see this ANYWHERE, not just in the first three bytes, the string is UTF-8 encoded)

Decode according to UTF-8 rules; this replaced an earlier version which chugged through the string byte by byte:

function charset_decode_utf_8($string) {
    // Only do the slow conversion if there are 8-bit (non-ASCII) bytes.
    // ereg() was removed in PHP 7, so preg_match() is used instead.
    if (!preg_match('/[\x80-\xFF]/', $string)) {
        return $string;
    }

    // Decode three-byte UTF-8 sequences into &#number; references.
    // The /e modifier was removed in PHP 7, so use preg_replace_callback().
    $string = preg_replace_callback(
        '/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/',
        function ($m) {
            return '&#' . ((ord($m[1]) - 224) * 4096 + (ord($m[2]) - 128) * 64 + (ord($m[3]) - 128)) . ';';
        },
        $string
    );

    // Decode two-byte UTF-8 sequences.
    $string = preg_replace_callback(
        '/([\xC0-\xDF])([\x80-\xBF])/',
        function ($m) {
            return '&#' . ((ord($m[1]) - 192) * 64 + (ord($m[2]) - 128)) . ';';
        },
        $string
    );

    // Four-byte sequences are left untouched, as in the original version.
    return $string;
}
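
For comparison, the same kind of &#number; output can be produced with the mbstring extension. This is a sketch of an alternative, not the function from this answer, and it assumes mbstring is installed and PHP 7+ (for the \u{...} escapes).

```php
<?php
// Assumes the mbstring extension is available.
$address = "Praha 5, Star\u{00E9} M\u{011B}sto,";

// Conversion map: [first code point, last code point, offset, mask].
// This one covers every code point above ASCII.
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

$entities = mb_encode_numericentity($address, $map, 'UTF-8');
echo $entities, "\n"; // Praha 5, Star&#233; M&#283;sto,

// mb_decode_numericentity reverses the conversion with the same map.
$restored = mb_decode_numericentity($entities, $map, 'UTF-8');
var_dump($restored === $address); // bool(true)
```

Numeric character references only help when the output is parsed as HTML, and the chosen font must still contain a glyph for each character.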

2

The problem is your PHP file's encoding. Save the file as UTF-8; then there is no need to use utf8_decode at all. If you get data such as 'Praha 5, Staré Město,' from a database, it is better to change its charset to UTF-8.

0

You don't need that. (@Rajeev: this string is automatically detected as UTF-8 encoded;

echo mb_detect_encoding('Praha 5, Staré Město,');

will always return UTF-8.)

You would do better to look at: https://code.google.com/p/dompdf/wiki/CPDFUnicode
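
Since mb_detect_encoding only guesses from a list of candidate encodings, a validity check can be a more direct way to confirm that a string really is UTF-8. The following is a minimal sketch assuming PHP 7+ with mbstring; the variable names are illustrative.

```php
<?php
$utf8 = "Praha 5, Star\u{00E9} M\u{011B}sto,";

// The same text after conversion to Latin-1 (ě becomes "?", é becomes byte 0xE9).
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false): a lone 0xE9 byte is not valid UTF-8
```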

3 Comments

I removed the utf8_decode call and set <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>, and DOMPDF_UNICODE_ENABLED is also set to true in the config. However, it still does not work: ě appears as ?
I am using the 'Helvetica' font, could that be why?
You may have to install another font. Check the answers here: stackoverflow.com/questions/990181/…
