Different JVMs with different encodings

Question

Suppose I have two JVMs running: one is a client and the other is a server, and the two use different encodings. If I write a program on the client that sends Strings across the network to the server, do I need to encode each String in the server's encoding before the client sends it? Or would that be pointless, given that the two use different encodings in the first place? How do clients and servers typically handle exchanging messages when they use different encodings?

anonymous · Accepted Answer · 2014-02-27 22:02:04Z

2

I suppose you are running into what is called the platform default encoding. For example, new String(byte[]) uses the default encoding to convert the bytes to a String. Different servers may be set up with different platform default encodings.

To keep servers with different default encodings from behaving differently, specify the encoding explicitly when converting a byte[] to a String. If you don't know which encoding the bytes are in, that is a separate problem, but at least you get consistent results for the same byte stream.

For example, to convert a String to a UTF-8 byte stream, use getBytes("UTF-8"); to get the String back, use new String(byte[], "UTF-8").
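A minimal sketch of that round trip, assuming Java 7 or later so that the StandardCharsets.UTF_8 constant can stand in for the "UTF-8" name (it avoids the checked UnsupportedEncodingException). The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: encode and decode with an explicitly named charset.
final class ExplicitCharsetDemo {

    // Sender side: never rely on the platform default encoding.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Receiver side: decode with the same charset the sender used.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute, written as an escape so the
        // encoding of this source file does not matter.
        String original = "caf\u00e9";
        byte[] wire = encode(original);
        System.out.println(Arrays.toString(wire));          // the bytes that go over the network
        System.out.println(decode(wire).equals(original));  // true on any platform
    }
}
```

Because both sides name the charset, the result does not depend on how either JVM's default encoding is configured.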

edited Feb 27, 2014 at 22:02

answered Feb 27, 2014 at 21:47

anonymous


Community · Accepted Answer · 2017-05-23 12:16:23Z

1

JVMs always use UTF-16 for Strings internally, whatever the platform default encoding is (read this answer).

The critical part is the transmission of the String, which is likely to happen over a byte-based stream. Converting a String to a byte[] requires an encoding, and you should specify it explicitly rather than rely on the default. Use UTF-8 in most cases.

// On the client side
byte[] bytes = myString.getBytes("UTF-8");
serverStream.write(bytes);
// On the server side
byte[] bytes = /* read bytes */;
String myString = new String(bytes, "UTF-8");

I suggest using a DataOutputStream/DataInputStream, whose writeUTF/readUTF methods transmit Strings in a fixed encoding (a modified UTF-8) that does not depend on either side's default charset.
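As a rough sketch of the DataOutputStream/DataInputStream suggestion, the example below uses in-memory byte arrays in place of a real socket (an assumption made only to keep it self-contained). The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class: a byte array stands in for the network connection.
final class DataStreamDemo {

    // Client side: writeUTF emits a 2-byte length followed by the
    // string in modified UTF-8, independent of the default charset.
    static byte[] send(String message) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(message);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Server side: readUTF reads the length prefix and decodes the string.
    static String receive(byte[] wire) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "h\u00e9llo"; // contains a non-ASCII character
        System.out.println(receive(send(original)).equals(original));
    }
}
```

Note that writeUTF throws UTFDataFormatException when the encoded string exceeds 65535 bytes, so longer messages need a different framing.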

edited May 23, 2017 at 12:16


answered Feb 27, 2014 at 21:50

Tobias


4 Comments

Joop Eggen Over a year ago

Clear answer, though DataInputStream and DataOutputStream are meant more for binary I/O of Java data; they are a bit abused for String I/O. Better to use new InputStreamReader(InputStream, "UTF-8") and new OutputStreamWriter(OutputStream, "UTF-8").
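The reader/writer approach from this comment might look like the sketch below. It assumes a line-based protocol (one message per line) and uses in-memory streams instead of a socket. Both are assumptions for illustration, as are the class and method names.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: text I/O with the charset fixed on both ends.
final class ReaderWriterDemo {

    // Client side: wrap the byte stream in a writer that encodes as UTF-8.
    static byte[] writeLine(String line) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new BufferedWriter(
                new OutputStreamWriter(sink, StandardCharsets.UTF_8))) {
            writer.write(line);
            writer.write('\n'); // line terminator marks the end of the message
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Server side: wrap the byte stream in a reader that decodes as UTF-8.
    static String readLine(byte[] wire) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(wire), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "na\u00efve"; // contains a non-ASCII character
        System.out.println(readLine(writeLine(original)).equals(original));
    }
}
```

With a real connection, the same wrappers would go around the socket's input and output streams.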

anonymous Over a year ago

@still_learning What do you mean when you say that JVMs always use UTF in Strings? Are you saying that the JVM always uses UTF to convert bytes to a String? I have certainly seen different default encodings on different servers. So code like new String(byte[]) can yield different Strings on different servers, even for the same byte stream.

Joop Eggen Over a year ago

Strings in Java are always Unicode, so they can represent all characters; a byte[] is binary data. Hence, in Java you have to say which encoding the bytes are in to convert between bytes and a String. Unfortunately, many I/O methods have an overload without an encoding parameter, which falls back to the default encoding of the operating system.
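To make the pitfall concrete, here is a small sketch in which the same two bytes are decoded under two different charsets. ISO-8859-1 stands in for "some other server's default encoding" (an assumption for illustration), and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: identical bytes, different decoded Strings.
final class DefaultCharsetPitfall {

    // The UTF-8 encoding of e-acute (U+00E9): the two bytes 0xC3 0xA9.
    static byte[] sample() {
        return "\u00e9".getBytes(StandardCharsets.UTF_8);
    }

    static String asUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static String asLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] wire = sample();
        System.out.println(asUtf8(wire).length());   // 1 character: the original e-acute
        System.out.println(asLatin1(wire).length()); // 2 characters: each byte became its own char
        // new String(wire) would behave like one of the above,
        // depending on what this prints on the machine it runs on:
        System.out.println(Charset.defaultCharset());
    }
}
```

This is the behaviour the encoding-less overloads expose: the bytes are the same, but the resulting String depends on the charset chosen for decoding.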

Tobias Over a year ago

@JoopEggen Absolutely true as long as you just want to read Strings (or chars). And thanks for explaining my answer to @anonymous.
