How to convert ASCII value to UTF-8 String in Java? 9812 -> ♔

Question

I want to convert the ASCII value 9812 to the UTF-8 string ♔. How can it be done?

Not clear what you're asking. Strings in Java are never stored in UTF-8, only in (slightly modified) UTF-16. Also, 9812 is not an ASCII value. Are you saying you want to convert that number to the String that it represents in UTF-16?
– Dawood ibn Kareem, Commented Nov 17, 2020 at 0:59
9812 is not an ASCII value, ASCII being limited to 128 numbers. See "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".
– Basil Bourque, Commented Nov 17, 2020 at 1:07
@BasilBourque yeah agree. I just followed the definition on the linked site.
– Ilya Gazman, Commented Nov 17, 2020 at 1:08
Yeah, I'm thinking of sending a comment in to the people who run that site. It's really misleading, especially considering that many of the people who use that site will be people who don't have their heads around all of this stuff.
– Dawood ibn Kareem, Commented Nov 17, 2020 at 1:14

Artefacto · Accepted Answer · 2020-11-17 01:07:04Z

2

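If it's a character in char range, then it's just the following (StandardCharsets is in java.nio.charset; Character.toString(int) needs Java 11 or later, so on older versions cast the argument to char):

Character.toString(9812).getBytes(StandardCharsets.UTF_8)

If it's a code point larger than U+FFFF, then you can use:

new String(Character.toChars(0x10400)).getBytes(StandardCharsets.UTF_8)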

If you just want a String, not the byte array with the UTF-8 representation, then omit getBytes.
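
As a minimal sketch of the two approaches above, assuming Java 8 or later: the class and method names (CodePointToUtf8, fromCodePoint, utf8Bytes) are made up for illustration. It passes StandardCharsets.UTF_8 to getBytes, which avoids the checked UnsupportedEncodingException that the String-named overload declares.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CodePointToUtf8 {

    // Builds a String from any valid Unicode code point.
    // Character.toChars returns one char for BMP code points and a
    // surrogate pair (two chars) for code points above U+FFFF.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Encodes the single-code-point String as UTF-8 bytes.
    static byte[] utf8Bytes(int codePoint) {
        return fromCodePoint(codePoint).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 9812 decimal is U+2654 (white chess king).
        System.out.println(fromCodePoint(9812));
        System.out.println(Arrays.toString(utf8Bytes(9812)));    // [-30, -103, -108]
        // 0x1F602 is above U+FFFF, so it needs four UTF-8 bytes.
        System.out.println(Arrays.toString(utf8Bytes(0x1F602))); // [-16, -97, -104, -126]
    }
}
```

Whether ♔ shows up correctly on the console depends on the terminal's encoding, not on this code.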

edited Nov 17, 2020 at 1:07

answered Nov 17, 2020 at 0:45

Artefacto


2 Comments

Ilya Gazman Over a year ago

Tnx. You only forgot the char casting in the toString call.

Basil Bourque Over a year ago

Your first approach seems to succeed with a code point over U+00FFFF. See this code run live at IdeOne.com using "Face with Tears of Joy" character at code point 128,514 to produce the same byte array [-16, -97, -104, -126] using both of your two approaches.

Aplet123 · Accepted Answer · 2020-11-17 00:44:05Z

1

You can use Character.toString:

String myString = Character.toString(9812);

Alternatively, if you only need a char, then you don't need any functions:

char myChar = 9812;

answered Nov 17, 2020 at 0:44

Aplet123


2 Comments

Basil Bourque Over a year ago

The char type is obsolete, unable to handle even half of the characters defined in Unicode.

rzwitserloot Over a year ago

@BasilBourque it is not obsolete. It represents UTF-16 code units; usually, entire symbols; sometimes, one half of a surrogate pair. This is baked into the Unicode spec (0xD800-0xDFFF are intentionally 'left blank', in perpetuity, in order for 'char' to continue to work fine).
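
The relationship between char and code points described in this thread can be checked with a small sketch, assuming Java 7 or later. The class name SurrogateDemo and the helper utf16Hex are hypothetical, chosen only for this illustration; the Character and String methods are standard library calls.

```java
class SurrogateDemo {

    // Returns the UTF-16 code units of a code point as space-separated hex,
    // one unit for BMP code points and two (a surrogate pair) otherwise.
    static String utf16Hex(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int king = 0x2654;  // 9812, fits in a single char
        int joy = 0x1F602;  // 128514, needs a surrogate pair

        System.out.println(Character.isBmpCodePoint(king)); // true
        System.out.println(Character.charCount(joy));       // 2

        String s = new String(Character.toChars(joy));
        System.out.println(s.length());                      // 2 (chars)
        System.out.println(s.codePointCount(0, s.length())); // 1 (code point)
        System.out.println(utf16Hex(joy));                   // D83D DE02
    }
}
```

So a char holding 9812 is a complete character, while anything above U+FFFF has to be handled as an int code point or as a String.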

Michael Gantman · Accepted Answer · 2020-11-17 01:27:27Z

The number 9812 in decimal is 2654 in hexadecimal. There is an open-source library that can convert any string into a sequence of Unicode escapes and vice versa. The following code will print your desired string.

System.out.println(StringUnicodeEncoderDecoder.decodeUnicodeSequenceToString("\\u2654"));

The output would be:

♔

Converting String to Unicode sequences would be as follows:

System.out.println(StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence("Hello World"));

would result in this output:

\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064

The library is called MgntUtils. It is available as a Maven artifact and on GitHub, including the jar, source code, and Javadoc. Javadoc for the class StringUnicodeEncoderDecoder is available as well.
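
For comparison, here is a rough sketch of the same kind of conversion using only the standard library. It is not how MgntUtils is implemented; the class UnicodeEscapes and its encode and decode methods are hypothetical names for this example, and decode does only minimal input checking (malformed hex digits cause a NumberFormatException).

```java
class UnicodeEscapes {

    // Writes every UTF-16 code unit of the input as a backslash-u escape
    // with four lowercase hex digits.
    static String encode(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            sb.append(String.format("\\u%04x", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    // Turns backslash-u escapes back into chars; any other text is copied as-is.
    static String decode(String escaped) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < escaped.length()) {
            if (escaped.startsWith("\\u", i) && i + 6 <= escaped.length()) {
                String hex = escaped.substring(i + 2, i + 6);
                sb.append((char) Integer.parseInt(hex, 16));
                i += 6;
            } else {
                sb.append(escaped.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(9812)); // 2654
        System.out.println(decode("\\u2654"));         // the chess king character
        System.out.println(encode("Hello World"));
    }
}
```

The last line prints the same escape sequence shown above for "Hello World". Because encode works on UTF-16 code units, a character above U+FFFF comes out as two escapes (its surrogate pair), and decode puts them back together.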
