
I have a problem converting a string to bytes in Java while porting my C# library to it. It converts the string, but the resulting byte array is not the same.

I use this code in C#:

string input = "Test ěščřžýáíé 1234";
Encoding encoding = Encoding.UTF8;
byte[] data = encoding.GetBytes(input);

And this code in Java:

String input = "Test ěščřžýáíé 1234";
String encoding = "UTF8";
byte[] data = input.getBytes(encoding);

The left one is the Java output and the right one is the C# output. How do I make the Java output the same as the C# one?

[Screenshot comparing the Java output (left) with the C# output (right)]

  • It should be "UTF-8" (edit: shouldn't matter -- "UTF8" is an alias) Commented Feb 27, 2014 at 12:12
  • Can you try and use StandardCharsets.UTF_8 and the appropriate .getBytes() method? Commented Feb 27, 2014 at 12:15
  • Wait wait wait -- how do you test that the bytes are the same? Don't forget that byte in C# is unsigned while it is a signed value in Java. Commented Feb 27, 2014 at 12:17
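To check the first two comments, here is a minimal sketch comparing the charset-name version from the question with the StandardCharsets.UTF_8 version. The class and method names are illustrative only. The input is the question's string written with Unicode escapes, so the result does not depend on the encoding the .java file was saved or compiled with. Charset.forName is used instead of getBytes(String) only to avoid the checked UnsupportedEncodingException.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Check {
    // "Test ěščřžýáíé 1234" written with Unicode escapes
    static final String INPUT = "Test \u011B\u0161\u010D\u0159\u017E\u00FD\u00E1\u00ED\u00E9 1234";

    // Encode with the StandardCharsets constant (Java 7+)
    static byte[] encodeStandard(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Encode by looking the charset up by name; "UTF8" is an alias of "UTF-8"
    static byte[] encodeByName(String s) {
        return s.getBytes(Charset.forName("UTF8"));
    }

    public static void main(String[] args) {
        byte[] a = encodeStandard(INPUT);
        byte[] b = encodeByName(INPUT);
        // Both arrays hold the same 28 bytes: 10 ASCII bytes plus 2 bytes per accented letter
        System.out.println(a.length + " bytes, identical: " + Arrays.equals(a, b));
    }
}
```

Running main should print `28 bytes, identical: true`, so the choice of charset name is not what makes the output differ.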

2 Answers


In all likelihood, the byte arrays are the same. However, if you're formatting them to a string representation (e.g. to view them through a debugger), they will appear different, since the byte data type is unsigned in C# (values 0 to 255) but signed in Java (values -128 to 127). Refer to this question and my answer for an explanation.

Edit: Based on this answer, you can print unsigned values in Java using:

byte b = -60;
System.out.println((short)(b & 0xFF));   // output: 196
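The same masking can be applied to a whole array so that the Java values can be compared with a C# view that shows bytes as 0 to 255. This is a sketch assuming Java 8+ (for Byte.toUnsignedInt, which is equivalent to `b & 0xFF`). The class and method names are hypothetical. The sample uses the letter ě (U+011B) from the question, which UTF-8 encodes as 0xC4 0x9B.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class UnsignedView {
    // Widen each signed Java byte (-128..127) to an int in 0..255,
    // which is how C# displays the same bit pattern
    static int[] toUnsigned(byte[] data) {
        int[] out = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = Byte.toUnsignedInt(data[i]); // same as data[i] & 0xFF
        }
        return out;
    }

    public static void main(String[] args) {
        // "\u011B" is the letter ě from the question's string
        byte[] data = "\u011B".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(data));             // [-60, -101]
        System.out.println(Arrays.toString(toUnsigned(data))); // [196, 155]
    }
}
```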

3 Comments

Is there any way to get unsigned bytes in Java, as in C#, instead of signed ones?
@JanSchwar see my answer; but literally, you cannot get "unsigned bytes". Some libraries, like Guava, do provide helpers for such cases however.
To compare the lists (visually): for (byte b : data) System.out.println(b < 0 ? 256 + b : b);

These arrays are very probably the same.

You are hit by a big difference between C# and Java: in Java, byte is signed, whereas in C# it is unsigned.

To dump the bytes, try this:

public void dumpBytesToStdout(final byte[] array)
{
    for (final byte b: array)
        System.out.printf("%02X\n", b);
}

Then write an equivalent dump method in C# (I can't say exactly how; I don't do C#).
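If the C# side formats its array with BitConverter.ToString(data), which produces uppercase hex pairs separated by hyphens (for example "54-65-73-74"), a Java dump in the same layout lets you compare the two outputs as plain text. This is a sketch under that assumption; toHexDashed is a hypothetical helper name. It relies on the fact that the %X conversion applied to a Byte prints the unsigned value.

```java
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

public class HexDump {
    // Hypothetical helper: formats bytes as "54-65-73-74", the layout
    // produced by C#'s BitConverter.ToString(byte[])
    static String toHexDashed(byte[] array) {
        StringJoiner joiner = new StringJoiner("-");
        for (byte b : array) {
            // %02X on a Byte prints its unsigned value, e.g. -60 -> "C4"
            joiner.add(String.format("%02X", b));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHexDashed("Test".getBytes(StandardCharsets.UTF_8))); // 54-65-73-74
    }
}
```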

Alternatively, if your dump function involves integer types larger than byte, for instance an int, do:

i & 0xff

to strip the sign-extended bits. Note that if you cast the byte -1, which reads:

1111 1111

to an int, this will NOT give:

0000 0000 0000 0000 0000 0000 1111 1111

but:

1111 1111 1111 1111 1111 1111 1111 1111

i.e., the sign bit is "carried" into all the upper bits (sign extension); otherwise, the cast would yield the int value 255, which is not -1.
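The bit patterns above can be reproduced with a small sketch. The class and method names are illustrative. Integer.toBinaryString shows the 32-bit pattern of the widened value, and the mask turns it back into 255.

```java
public class SignExtension {
    // Implicit byte -> int widening: the sign bit is copied into the upper 24 bits
    static int widen(byte b) {
        return b;
    }

    // b is first widened to int (sign-extended), then & 0xff clears the upper 24 bits
    static int mask(byte b) {
        return b & 0xff;
    }

    public static void main(String[] args) {
        byte b = -1; // bit pattern 1111 1111
        System.out.println(widen(b) + " " + Integer.toBinaryString(widen(b))); // -1 11111111111111111111111111111111
        System.out.println(mask(b) + " " + Integer.toBinaryString(mask(b)));   // 255 11111111
    }
}
```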

