2

I was reading about String.getBytes(), and the documentation states that it returns the resulting byte array.

But when I ran the following program, I found that it appears to return an array of Unicode symbols.

public class GetBytesExample {
    public static void main(String[] args) {
        String str = "A";
        // Encode the string using the platform's default charset.
        byte[] array1 = str.getBytes();
        System.out.print("Default Charset encoding:");
        // Print each byte of the encoded string.
        for (byte b : array1) {
            System.out.print(b);
        }
    }
}

The above program prints:

Default Charset encoding:65

This 65 is equivalent to the Unicode representation of A. My question is: where are the bytes that the return type says to expect?

8
  • What do you expect this method to return? You just printed each byte of the returned array, so they are definitely there. What is your expectation? Commented May 6, 2017 at 5:36
  • 1
    "My question where are the bytes that the return type is talking about." In your variable b? It is quite unclear what your issue with this code is. Commented May 6, 2017 at 5:38
  • 3
    "where are the bytes" Well... in the array. You are simply misinterpreting what is happening when you print the elements of that array. Commented May 6, 2017 at 5:40
  • 1
    Even if there was no widening, what would you expect to be printed? A byte is a signed 8-bit integer (so from -128 to 127). 65 looks like a valid byte value to me. Commented May 6, 2017 at 5:45
  • 4
    My guess is that you were confused by the fact that you use print(), so all bytes are concatenated, without any space or newline between them. Commented May 6, 2017 at 5:50

3 Answers

5

There is no PrintStream.print(byte) overload, so the byte needs to be widened to invoke the method.

Per JLS 5.1.2:

19 specific conversions on primitive types are called the widening primitive conversions:

  • byte to short, int, long, float, or double
  • ...

There's no PrintStream.print(short) overload either.

The next most-specific one is PrintStream.print(int). So that's the one that's invoked, hence you are seeing the numeric value of the byte.
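To make the overload resolution above concrete, here is a minimal sketch that renders the same byte three ways. The class name ByteWideningDemo and its helper methods are made up for this illustration; String.valueOf is used because it has the same set of primitive overloads relevant here (char and int, but no byte or short).

```java
class ByteWideningDemo {
    // No valueOf(byte) overload exists, so the byte is widened and
    // valueOf(int) is chosen: the result is the decimal value.
    static String asDecimal(byte b) {
        return String.valueOf(b);
    }

    // An explicit cast selects the char overload instead,
    // so the byte is rendered as a character.
    static String asChar(byte b) {
        return String.valueOf((char) b);
    }

    // Masking with 0xff avoids sign extension before formatting as two hex digits.
    static String asHex(byte b) {
        return String.format("%02x", b & 0xff);
    }

    public static void main(String[] args) {
        byte b = 65;
        System.out.println(asDecimal(b)); // 65
        System.out.println(asChar(b));    // A
        System.out.println(asHex(b));     // 41
    }
}
```

The cast to char is only meaningful here because 65 is in the ASCII range; for arbitrary encoded bytes, decoding with new String(bytes, charset) is the safer route.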


2

This 65 is equivalent to the Unicode representation of A

It is also equivalent to the UTF-8 representation of A

It is also equivalent to the ASCII representation of A

It is also equivalent to the ISO/IEC 8859-1 representation of A

It so happens that the encoding for A is the same in a lot of character encodings, and that these all match the Unicode code point. This is not a coincidence. It is a result of the history of character set / character encoding standards.


My question is: where are the bytes that the return type says to expect?

In the byte array, of course :-)

You are (just) misinterpreting them.

When you do this:

    for (byte b : array1) {
        System.out.print(b);
    }

you output a series of bytes as decimal numbers with no spaces between them. This is consistent with the way that Java distinguishes between text / character data and binary data. Bytes are binary. The getBytes() method gives a binary encoding (in some character set) of the text in the string. You are then formatting and printing the binary (one byte at a time) as decimal numbers.

If you want more evidence of this, replace the "A" literal with a literal containing (say) some Chinese characters. Or any Unicode characters greater than \u00ff ... expressed using \u syntax.
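A minimal sketch of that experiment follows. It assumes UTF-8 is requested explicitly (rather than relying on the platform default) so the result is the same on every machine; the class name NonAsciiBytesDemo and the helper utf8Bytes are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class NonAsciiBytesDemo {
    // Encodes the text as UTF-8 and renders the bytes as a
    // bracketed, comma-separated list of signed decimal values.
    static String utf8Bytes(String text) {
        return Arrays.toString(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // One character, one byte.
        System.out.println(utf8Bytes("A"));      // [65]
        // One Chinese character (U+4E2D), three bytes: 0xE4 0xB8 0xAD.
        System.out.println(utf8Bytes("\u4e2d")); // [-28, -72, -83]
    }
}
```

Arrays.toString separates the values, which avoids the run-together output that print() in a loop produces. The negative numbers appear because Java bytes are signed.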


1

String.getBytes() returns the encoding of the string using the platform's default encoding, so the result depends on the machine you run it on. If the platform encoding is UTF-8, or ASCII, or ISO-8859-1, or one of a few others, an 'A' will be encoded as 65 (aka 0x41).
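One way to see this dependence, and to avoid it, is to pass a charset explicitly. The following is a sketch only; the class name ExplicitCharsetDemo and the helper encode are made up for this example, while the charset constants come from java.nio.charset.StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ExplicitCharsetDemo {
    // Encodes the text with the given charset and lists the resulting bytes.
    static String encode(String text, Charset charset) {
        return Arrays.toString(text.getBytes(charset));
    }

    public static void main(String[] args) {
        // The charset that the no-argument getBytes() would use on this machine.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // 'A' is a single byte 65 in ASCII-compatible encodings...
        System.out.println(encode("A", StandardCharsets.US_ASCII)); // [65]
        System.out.println(encode("A", StandardCharsets.UTF_8));    // [65]
        // ...but two bytes in big-endian UTF-16.
        System.out.println(encode("A", StandardCharsets.UTF_16BE)); // [0, 65]

        // U+00E9 (e with acute accent) differs even between ISO-8859-1 and UTF-8.
        System.out.println(encode("\u00e9", StandardCharsets.ISO_8859_1)); // [-23]
        System.out.println(encode("\u00e9", StandardCharsets.UTF_8));      // [-61, -87]
    }
}
```

Only the first line of output varies between machines; the others are fixed by the charset argument.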
