1

Suppose I have two JVMs running: one is a client and the other is a server, and the two are using different encodings. If I write a program on the client that sends Strings across the network to the server, does the client need to encode each String in the server's encoding before sending it? Or is that pointless, given that the two are using different encodings in the first place? How do clients and servers typically handle exchanging messages when each side uses a different encoding?

2 Answers

2

I suppose you are running into what is called the platform default encoding. For example, new String(byte[]) converts the bytes to a String using the default encoding. Different servers may be set up differently and so have different platform default encodings.

To keep servers from behaving differently because of their default encodings, specify the encoding explicitly when converting a byte[] to a String. If you don't know which encoding the bytes are in, that is another matter, but at least you get consistent results for the same byte stream.

For example, to convert a String to a UTF-8 byte stream use getBytes("UTF-8"), and to get the String back use new String(byte[], "UTF-8").
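
As a minimal sketch of the point above (the class and method names here are made up for illustration, and StandardCharsets.UTF_8 is used in place of the "UTF-8" string so there is no checked exception to handle), this shows that a round trip works when both sides name the same charset, and goes wrong when the decoder assumes a different one:

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Encode with a named charset so the result does not depend on the platform default.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same named charset that was used for encoding.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Gr\u00fc\u00dfe"; // "Gruesse" written with u-umlaut and sharp s
        byte[] wire = encodeUtf8(original);

        String sameCharset = decodeUtf8(wire);
        // Simulates a receiver whose assumed encoding differs from the sender's.
        String wrongCharset = new String(wire, StandardCharsets.ISO_8859_1);

        System.out.println(original.equals(sameCharset));  // true
        System.out.println(original.equals(wrongCharset)); // false
    }
}
```

The second decode stands in for what new String(byte[]) could do on a machine whose default encoding happens to be ISO-8859-1.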

1

JVMs always represent Strings internally as Unicode (UTF-16), independent of the platform default encoding (read this answer).

The critical part is the transmission of the String, which is likely to happen over a byte-based stream. Converting a String to a byte[] requires an encoding, and you should specify it explicitly rather than rely on the default. Use UTF-8 in most cases.

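// On the client side (needs: import java.nio.charset.StandardCharsets;)
byte[] bytes = myString.getBytes(StandardCharsets.UTF_8);
serverStream.write(bytes);

// On the server side: decode with the same charset the client used
byte[] bytes = /* read bytes */;
String myString = new String(bytes, StandardCharsets.UTF_8);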

I suggest using a DataOutputStream/DataInputStream pair, whose writeUTF/readUTF methods transmit Strings in a fixed encoding (a length-prefixed, modified UTF-8) that does not depend on the platform default.
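
A rough sketch of that suggestion follows. In-memory byte array streams stand in for the socket streams, and the class and method names are hypothetical. Note that writeUTF throws UTFDataFormatException if the encoded String is longer than 65535 bytes, so this only suits short messages.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class DataStreamStringDemo {
    // "Client": writes a 2-byte length followed by the String in modified UTF-8.
    static byte[] send(String message) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(message);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // "Server": reads the length prefix, then decodes exactly that many bytes.
    static String receive(byte[] wire) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "Gr\u00fc\u00dfe";
        byte[] wire = send(original);
        System.out.println(original.equals(receive(wire))); // true
    }
}
```

With a real connection, the two streams would wrap socket.getOutputStream() and socket.getInputStream() instead. The length prefix also tells the receiver where one String ends, which the raw getBytes approach leaves to you.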

4 Comments

Clear answer, though DataInputStream and DataOutputStream are meant more for binary I/O of Java data; using them for String I/O is a bit of an abuse. Better to use new InputStreamReader(InputStream, "UTF-8") and new OutputStreamWriter(OutputStream, "UTF-8").
@still_learning What do you mean when you say JVMs always use UTF in Strings? Are you saying that the JVM always uses UTF to convert bytes to a String? I have certainly seen different default encodings on different servers. So if your code calls new String(byte[]), it can yield different Strings on different servers even for the same byte stream.
Strings in Java are always Unicode, so they can represent all characters; byte[] is binary data. Hence in Java one has to say which encoding the bytes are in to convert between bytes and a String. Unfortunately, many I/O methods have an overload without an encoding parameter, which uses the default encoding of the operating system.
@JoopEggen Absolutely true as long as you just want to read Strings (or chars). And thanks for explaining my answer to @anonymous.
