
I've got the following double byte utf8 character:

\ud83d\ude04

(It's an iOS emoji.) I want to convert it to UTF-16:

U+1F604

How do I do this? I've tried the following:

$utf8_string = "\ud83d\ude04";
$utf16_string = mb_convert_encoding($utf8_string, 'UTF-16', 'UTF-8');

But I get the original string back. It doesn't get converted.

I'm thinking I may need to decode the string first. I've tried doing this with json_decode (which normally decodes these escape sequences quite nicely), but still no joy.

  • \u... is not UTF-8 and U+... is not UTF-16. The former looks like a JSON encoded representation of the character and the latter looks like a formal Unicode code point. Neither is a UTF encoding.

1 Answer


First off, let's get the terms right:

  • \ud83d\ude04 is a Unicode escape sequence as used in, for example, JavaScript. It is not "UTF-8".
  • It is also not "double byte", but rather a surrogate pair (see the sketch after this list).
  • U+1F604 is the official notation for a Unicode code point. It is not "UTF-16".
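
For illustration, here is a minimal sketch of the standard surrogate-pair arithmetic that turns the two halves back into a code point (the variable names are just for this example):

$hi = 0xD83D; // high (lead) surrogate, range U+D800–U+DBFF
$lo = 0xDE04; // low (trail) surrogate, range U+DC00–U+DFFF
$codepoint = 0x10000 + (($hi - 0xD800) << 10) + ($lo - 0xDC00);
printf("U+%04X\n", $codepoint); // prints U+1F604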

The first step is to get from "\ud83d\ude04" to a UTF-8 encoded string. The easiest method is:

$utf8 = json_decode('"\ud83d\ude04"'); // note the added "" quotes
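
If you want to sanity-check that step, the raw bytes should be the four-byte UTF-8 encoding of U+1F604:

echo bin2hex($utf8); // f09f9884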

To convert from here to a UTF-16 encoded string, simply do:

$utf16 = iconv('UTF-8', 'UTF-16', $utf8);

However, the result is not "U+1F604" but rather a UTF-16 encoded string (the hex representation of which is feffd83dde04).
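
The feff prefix is a byte order mark, which iconv (at least the common glibc implementation) emits when you don't specify an endianness. Requesting an explicit byte order should skip the BOM:

$utf16be = iconv('UTF-8', 'UTF-16BE', $utf8);
echo bin2hex($utf16be); // d83dde04 — just the surrogate pair, no BOM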

To get the Unicode code point notation, the easiest way is probably to convert to UCS-4, trim the leading zeros, and pad back to at least four digits:

$ucs4      = iconv('UTF-8', 'UCS-4', $utf8);                            // four bytes per code point, here 0001f604
$codepoint = sprintf('U+%04s', ltrim(strtoupper(bin2hex($ucs4)), '0')); // U+1F604
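
As an aside, on PHP 7.2 or later (assuming the mbstring extension is available), mb_ord gets you the code point directly as an integer:

$codepoint = sprintf('U+%04X', mb_ord($utf8, 'UTF-8'));
echo $codepoint; // U+1F604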