I have the string "Château" encoded as UTF-8, and it gets converted to US-ASCII as "Ch??teau" (in the underlying library of my app).
Now I want to get the original string "Château" back from the US-ASCII-converted string "Ch??teau", but I am not able to do that using the code below.
StringBuilder masterBuffer = new StringBuilder();
byte[] rawDataBuffer = (read from InputStream); // say it holds the UTF-8 bytes of "Château"
String rawString = new String(rawDataBuffer, "UTF-8");
masterBuffer.append(rawString);
onMessageReceived(masterBuffer.toString().getBytes()); // getBytes() uses the platform's default charset, which here is US-ASCII
My application receives the US-ASCII encoded byte array. On the application side, even if I try to decode it as UTF-8, it is of no use: the conversion attempt still gives "Ch??teau".
String asciiString = "Ch??teau";
String originalString = new String(asciiString.getBytes("UTF-8"), "UTF-8");
System.out.println("originalString: " + originalString);
The value of "originalString" is still "Ch??teau".
Is this the right way to do this?
Thanks,
Strings store text data regardless of the character encoding, and this means that your problem lies beyond the code you posted. Please paste the full code.
Java's String (like C#'s, JavaScript's, …) is a counted sequence of UTF-16 code units, one or two of which encode a Unicode codepoint. (And there are apparently characters in the computer world that aren't in the Unicode character set.)
… char and String methods to act on UTF-16 data, so there would have to be additional conversions performed at runtime to facilitate ISO-8859-1 based strings in UTF-16 based code logic.
… String could be carrier pigeons; a String has no encoding.
… char-related operation you have to deal with how many UTF-16 code units are in each individual Unicode codepoint.
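To illustrate the point that bytes, not the String, carry an encoding, here is a minimal sketch. The class and method names are made up for this example and are not taken from the question, and it assumes the precomposed character â (U+00E2). Encoding to US-ASCII replaces every unmappable character with the byte '?', so the original character is gone and no later decoding can bring it back. Passing an explicit charset to getBytes avoids depending on the platform default. The last method shows one possible way two '?' per character could arise, namely decoding the two UTF-8 bytes of â as US-ASCII; whether the underlying library actually does that is only a guess.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
final class CharsetRoundTrip {

    // Encodes with US-ASCII. Characters outside ASCII become the byte '?' (0x3F).
    static byte[] toAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    // Encodes with UTF-8, independent of the platform default charset.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // True if encoding to UTF-8 and decoding again yields the same text.
    static boolean roundTripsViaUtf8(String text) {
        return new String(toUtf8(text), StandardCharsets.UTF_8).equals(text);
    }

    // Decodes UTF-8 bytes with the wrong charset (US-ASCII).
    // Each byte >= 0x80 turns into the replacement character U+FFFD.
    static String misdecodeAsAscii(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // "Château", written with an escape so the source file encoding does not matter.
        String original = "Ch\u00e2teau";

        System.out.println(toUtf8(original).length);     // 8: â takes two bytes in UTF-8
        System.out.println(toAscii(original).length);    // 7: â was replaced by a single '?'
        System.out.println(roundTripsViaUtf8(original)); // true

        // Once the '?' is in the byte array, decoding as UTF-8 cannot restore â.
        String lossy = new String(toAscii(original), StandardCharsets.UTF_8);
        System.out.println(lossy.equals(original));      // false

        // Wrong-charset decoding yields two replacement characters for â.
        System.out.println(misdecodeAsAscii(toUtf8(original)).length()); // 8
    }
}
```

In the question's code, this would correspond to calling getBytes(StandardCharsets.UTF_8) before handing the bytes to onMessageReceived, and decoding with the same charset on the receiving side.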