String.getBytes() returns array of Unicode chars

Question

I was reading about getBytes(), and the documentation states that it returns the resultant byte array.

But when I ran the following program, I found that it returns an array of Unicode symbols.

public class GetBytesExample {
    public static void main(String args[]) {
        String str = new String("A");
        byte[] array1 = str.getBytes();
        System.out.print("Default Charset encoding:");
        for (byte b : array1) {
            System.out.print(b);
        }

    }
}

The above program prints the following output:

Default Charset encoding:65

This 65 is equivalent to Unicode representation of A. My question is that where are the bytes whose return type is expected.

What do you expect this method to return? You just printed each byte of the returned array, so they are definitely there. What is your expectation?
– JB Nizet, Commented May 6, 2017 at 5:36
"My question where are the bytes that the return type is taking about." In your variable b? Quite unclear what your issue with this code is.
– Tom, Commented May 6, 2017 at 5:38
"where are the bytes" Well... in the array. You are simply misinterpreting what is happening when you print the elements of that array.
– Andy Turner, Commented May 6, 2017 at 5:40
Even if there was no widening, what would you expect to be printed? A byte is a signed 8-bit integer (so from -128 to 127). 65 looks like a valid byte value to me.
– JB Nizet, Commented May 6, 2017 at 5:45
My guess is that you were confused by the fact that you use print(), so all bytes are concatenated, without any space or new line between them.
– JB Nizet, Commented May 6, 2017 at 5:50

Community · Accepted Answer · 2020-06-20 09:12:55Z

5

There is no PrintStream.print(byte) overload, so the byte needs to be widened to invoke the method.

Per JLS 5.1.2:

19 specific conversions on primitive types are called the widening primitive conversions:

byte to short, int, long, float, or double

...

There's no PrintStream.print(short) overload either.

The next most-specific one is PrintStream.print(int). So that's the one that's invoked, hence you are seeing the numeric value of the byte.
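As a rough sketch of the widening described above: the class name ByteWideningDemo and its two helper methods are made up for illustration; only the implicit byte-to-int conversion, the char cast, and String.format are standard Java.

```java
public class ByteWideningDemo {
    // Hypothetical helper: mirrors what print(b) ends up showing,
    // because the byte is widened to int before being formatted.
    static String asDecimal(byte b) {
        int widened = b; // widening primitive conversion (JLS 5.1.2)
        return String.valueOf(widened);
    }

    // Hypothetical helper: the same byte rendered as two hex digits.
    static String asHex(byte b) {
        return String.format("%02x", b);
    }

    public static void main(String[] args) {
        byte b = 65;
        System.out.println(asDecimal(b)); // 65
        System.out.println(asHex(b));     // 41
        System.out.println((char) b);     // A (explicit cast selects print(char))
    }
}
```

The byte itself never changes; only the overload chosen (or the format applied) decides whether it shows up as 65, 41, or A.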

edited Jun 20, 2020 at 9:12

CommunityBot

answered May 6, 2017 at 5:34

Andy Turner

Stephen C · Accepted Answer · 2017-05-06 06:39:30Z

This 65 is equivalent to Unicode representation of A

It is also equivalent to a UTF-8 representation of A

It is also equivalent to an ASCII representation of A

It is also equivalent to an ISO/IEC 8859-1 representation of A

It so happens that the encoding for A is the same in a lot of character encodings, and that these all match the Unicode code point. And this is not a coincidence. It is a result of the history of character set / character encoding standards.

My question is that where are the bytes whose return type is expected.

In the byte array, of course :-)

You are (just) misinterpreting them.

When you do this:

    for (byte b : array1) {
        System.out.print(b);
    }

you output a series of bytes as decimal numbers with no spaces between them. This is consistent with the way that Java distinguishes between text / character data and binary data. Bytes are binary. The getBytes() method gives a binary encoding (in some character set) of the text in the string. You are then formatting and printing the binary (one byte at a time) as decimal numbers.

If you want more evidence of this, replace the "A" literal with a literal containing (say) some Chinese characters. Or any Unicode characters greater than \u00ff ... expressed using \u syntax.
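A small sketch of that experiment, assuming UTF-8 is chosen explicitly so the result does not depend on the platform. The class name NonAsciiBytesDemo and the helper utf8 are hypothetical; the character used is U+4E2D, written as a Unicode escape.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NonAsciiBytesDemo {
    // Hypothetical helper: encode text as UTF-8, with the charset
    // named explicitly so the output is the same on every machine.
    static byte[] utf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+4E2D written as a Unicode escape, so the source file
        // encoding does not matter.
        byte[] bytes = utf8("\u4e2d");
        System.out.println(bytes.length);           // 3
        System.out.println(Arrays.toString(bytes)); // [-28, -72, -83]
    }
}
```

One character becomes three bytes (0xE4 0xB8 0xAD), and they print as negative numbers because a Java byte is signed. None of them equals the code point, which makes it clear that getBytes() returns encoded bytes rather than Unicode characters.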

Henry · Accepted Answer · 2017-05-06 05:37:32Z

1

String.getBytes() returns the encoding of the string using the platform encoding. The result depends on the machine you run this on. If the platform encoding is UTF-8, or ASCII, or ISO-8859-1, or a few others, an 'A' will be encoded as 65 (aka 0x41).
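To illustrate the dependence on the encoding, here is a minimal sketch that passes the charset explicitly instead of relying on the platform default. The class name CharsetComparisonDemo and the helper encode are invented for this example; UTF-16BE is included only as an example of an encoding where 'A' is not a single byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetComparisonDemo {
    // Hypothetical helper: encode the text with the given charset and
    // render the resulting bytes as a readable list.
    static String encode(String text, Charset charset) {
        return Arrays.toString(text.getBytes(charset));
    }

    public static void main(String[] args) {
        // The charset that the no-argument getBytes() would use here.
        System.out.println("Default: " + Charset.defaultCharset());

        System.out.println(encode("A", StandardCharsets.UTF_8));      // [65]
        System.out.println(encode("A", StandardCharsets.US_ASCII));   // [65]
        System.out.println(encode("A", StandardCharsets.ISO_8859_1)); // [65]
        System.out.println(encode("A", StandardCharsets.UTF_16BE));   // [0, 65]
    }
}
```

Passing a Charset to getBytes() is generally the safer habit, since the output then no longer varies from machine to machine.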

answered May 6, 2017 at 5:37

Henry
