
I want to convert the ASCII value 9812 to a UTF-8 string. How can it be done?

  • Not clear what you're asking. Strings in Java are never stored in UTF-8, only in (slightly modified) UTF-16. Also, 9812 is not an ASCII value. Are you saying you want to convert that number to the String that it represents in UTF-16? Commented Nov 17, 2020 at 0:59
  • 9812 is not an ASCII value, ASCII being limited to 128 numbers. The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) Commented Nov 17, 2020 at 1:07
  • @BasilBourque Yeah, agreed. I just followed the definition on the linked site. Commented Nov 17, 2020 at 1:08
  • Yeah, I'm thinking of sending a comment in to the people who run that site. It's really misleading, especially considering that many of the people who use that site will be people who don't have their heads around all of this stuff. Commented Nov 17, 2020 at 1:14

3 Answers


If it's a character in the char range (up to U+FFFF), then it's just:

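// Character.toString(int) needs Java 11+; import java.nio.charset.StandardCharsets
Character.toString(9812).getBytes(StandardCharsets.UTF_8)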

If it's a code point larger than U+FFFF, then you can use:

new String(Character.toChars(0x10400)).getBytes(StandardCharsets.UTF_8)

If you just want a String, not the byte array with the UTF-8 representation, then omit getBytes.
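Putting both approaches together, here is a minimal self-contained sketch. It assumes you want the UTF-8 bytes and uses StandardCharsets.UTF_8 instead of the "UTF8" charset name, so there is no checked UnsupportedEncodingException to handle. The class and method names (CodePointToUtf8, fromCodePoint, utf8Bytes) are just for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CodePointToUtf8 {

    // Builds a String from any Unicode code point. Character.toChars returns
    // one char for U+0000..U+FFFF and a surrogate pair above that, so this
    // also works on Java versions older than 11.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Encodes that String as UTF-8 without a checked exception.
    static byte[] utf8Bytes(int codePoint) {
        return fromCodePoint(codePoint).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 9812 is 0x2654, inside the char range: three UTF-8 bytes.
        System.out.println(Arrays.toString(utf8Bytes(9812)));    // [-30, -103, -108]
        // 0x10400 is above U+FFFF: four UTF-8 bytes.
        System.out.println(Arrays.toString(utf8Bytes(0x10400))); // [-16, -112, -112, -128]
    }
}
```

The bytes print as negative numbers because Java's byte is signed; -30, -103, -108 are 0xE2, 0x99, 0x94.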


2 Comments

Thanks. You only forgot the char casting in the toString call.
Your first approach seems to succeed with a code point over U+FFFF as well. See this code run live at IdeOne.com using the "Face with Tears of Joy" character at code point 128,514 (U+1F602), which produces the same byte array [-16, -97, -104, -126] with both approaches.
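Both comments come down to which Character.toString overload is available: Character.toString(int) (Java 11+) accepts any code point, while older versions only have Character.toString(char), hence the cast. A small sketch comparing them, with hypothetical method names:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ToStringComparison {

    // Java 11+: Character.toString(int) accepts any code point.
    static byte[] viaToString(int codePoint) {
        return Character.toString(codePoint).getBytes(StandardCharsets.UTF_8);
    }

    // Any Java version: toChars yields one char or a surrogate pair.
    static byte[] viaToChars(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int faceWithTearsOfJoy = 128514; // U+1F602
        System.out.println(Arrays.toString(viaToString(faceWithTearsOfJoy))); // [-16, -97, -104, -126]
        System.out.println(Arrays.toString(viaToChars(faceWithTearsOfJoy)));  // [-16, -97, -104, -126]

        // Pre-Java 11 style: cast to char. Fine for 9812, but for a code point
        // above U+FFFF the cast silently keeps only the low 16 bits.
        System.out.println(Character.toString((char) 9812).equals(Character.toString(9812))); // true
        System.out.println(Integer.toHexString((char) faceWithTearsOfJoy)); // f602 (wrong character)
    }
}
```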

You can use Character.toString:

String myString = Character.toString(9812);

Alternatively, if you only need a char, then you don't need any method call at all:

char myChar = 9812;
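To check that the char route and Character.toString agree, and to see which character 9812 actually is, here is a small sketch. It assumes Java 11+ for Character.toString(int); Character.getName has been in the JDK since Java 7. The describe helper is hypothetical.

```java
class WhichCharacter {

    // Formats a code point as "U+XXXX NAME" using the JDK's Unicode data.
    static String describe(int codePoint) {
        return String.format("U+%04X %s", codePoint, Character.getName(codePoint));
    }

    public static void main(String[] args) {
        char myChar = 9812;                        // allowed: the constant fits in a char
        String fromChar = String.valueOf(myChar);  // char to String
        String fromInt = Character.toString(9812); // Java 11+

        System.out.println(fromChar.equals(fromInt)); // true
        System.out.println(describe(myChar));         // U+2654 WHITE CHESS KING
    }
}
```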

2 Comments

The char type is obsolete, unable to handle even half of the characters defined in Unicode.
@BasilBourque it is not obsolete. It represents UTF-16 code units: usually an entire symbol, sometimes one half of a surrogate pair. This is baked into the Unicode spec (0xD800-0xDFFF are intentionally 'left blank', in perpetuity, in order for 'char' to continue to work fine).
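The surrogate-pair point is easy to see directly: U+2654 occupies one char, while U+1F602 from the earlier comment needs two, a high and a low surrogate, both inside the reserved 0xD800 to 0xDFFF range. A sketch using only JDK methods (toHexUnits is a hypothetical helper):

```java
class SurrogatePairs {

    // Lists the UTF-16 code units (Java chars) of a code point in hex.
    static String toHexUnits(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+2654 is in the Basic Multilingual Plane: a single char.
        System.out.println(toHexUnits(9812));   // 2654

        // U+1F602 is outside the BMP: a high and a low surrogate.
        String face = new String(Character.toChars(128514));
        System.out.println(toHexUnits(128514)); // D83D DE02
        System.out.println(face.length());      // 2 (chars)
        System.out.println(face.codePointCount(0, face.length())); // 1 (code point)
        System.out.println(Character.isHighSurrogate(face.charAt(0))
                && Character.isLowSurrogate(face.charAt(1)));       // true
    }
}
```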

The number 9812 in decimal is 2654 in hexadecimal. There is an open-source library that can convert any string into a Unicode sequence and vice versa, so the following code will print your desired String:

System.out.println(StringUnicodeEncoderDecoder.decodeUnicodeSequenceToString("\\u2654"));

The output would be:

♔

Converting a String to a Unicode sequence works as follows:

System.out.println(StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence("Hello World"));

would result in this output:

\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064

The library is called MgntUtils and can be found as a Maven artifact here and on GitHub (including the jar, source code, and Javadoc) here. The Javadoc for just the StringUnicodeEncoderDecoder class can be found here.
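If you'd rather not add a dependency, both directions can be sketched with the JDK alone. This is not MgntUtils' implementation, only an assumed equivalent for inputs like the ones above: encode escapes each UTF-16 char as a backslash-u escape with four lower-case hex digits (the same shape as the "Hello World" output), and decode assumes the input consists only of such six-character escapes. The class and method names are hypothetical, and Character.toString(int) in main needs Java 11+.

```java
class UnicodeEscapes {

    // Escapes every UTF-16 char as a backslash, 'u' and four lower-case
    // hex digits.
    static String encode(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    // Decodes a string made only of six-character escapes (backslash, 'u',
    // four hex digits) back into chars. Anything else is rejected.
    static String decode(String escapes) {
        if (escapes.length() % 6 != 0) {
            throw new IllegalArgumentException("not a sequence of 6-char escapes");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < escapes.length(); i += 6) {
            if (escapes.charAt(i) != '\\' || escapes.charAt(i + 1) != 'u') {
                throw new IllegalArgumentException("bad escape at index " + i);
            }
            sb.append((char) Integer.parseInt(escapes.substring(i + 2, i + 6), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(9812)); // 2654
        System.out.println(encode("Hello World"));
        System.out.println(decode("\\u2654").equals(Character.toString(9812))); // true
    }
}
```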
