If you try:
final String test = "ппп";
you will find that -2 and -1 appear only at the beginning:
-2
-1
4
63
4
63
4
63
-2 is 0xFE and -1 is 0xFF when printed as signed Java bytes. Together they form a BOM (byte order mark):
In UTF-16, a BOM (U+FEFF) may be placed as the first character of a
file or character stream to indicate the endianness (byte order) of
all the 16-bit code units of the file or stream. If an attempt is made
to read this stream with the wrong endianness, the bytes will be
swapped, thus delivering the character U+FFFE, which is defined by
Unicode as a "non character" that should never appear in the text.
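To see why -2 and -1 correspond to 0xFE and 0xFF, here is a minimal sketch that masks a signed Java byte to its unsigned value before formatting it as hex. The class name SignedByteDemo and the helper toHex are made up for illustration.

```java
public class SignedByteDemo {
    // Hypothetical helper: renders a signed Java byte as two uppercase hex digits.
    static String toHex(byte b) {
        // b & 0xFF widens to int and keeps only the low 8 bits (0..255).
        return String.format("%02X", b & 0xFF);
    }

    public static void main(String[] args) {
        byte first = -2;
        byte second = -1;
        System.out.println(toHex(first) + " " + toHex(second)); // prints "FE FF"
    }
}
```

Java's byte type is signed, so any value above 0x7F prints as a negative number unless it is masked this way.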
test.getBytes("UTF-16") encodes in big-endian byte order by default, so it writes a BOM first to tell later processors that big-endian was used.
You can specify the byte order explicitly with UTF-16LE or UTF-16BE instead, which avoids the BOM in the output:
final byte[] bytes = test.getBytes("UTF-16BE");
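As a rough side-by-side comparison, the sketch below encodes the same string with all three charsets. It uses the StandardCharsets constants (Java 7 and later) rather than charset name strings, which is a substitution on my part that avoids the checked UnsupportedEncodingException. The class name Utf16EncodeDemo and the helper show are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16EncodeDemo {
    // Three U+043F (Cyrillic small letter pe) characters, the same text as the test string.
    static final String TEST = "\u043F\u043F\u043F";

    // Hypothetical helper: encodes TEST with the given charset and formats the signed bytes.
    static String show(Charset charset) {
        return Arrays.toString(TEST.getBytes(charset));
    }

    public static void main(String[] args) {
        System.out.println(show(StandardCharsets.UTF_16));   // [-2, -1, 4, 63, 4, 63, 4, 63]
        System.out.println(show(StandardCharsets.UTF_16BE)); // [4, 63, 4, 63, 4, 63]
        System.out.println(show(StandardCharsets.UTF_16LE)); // [63, 4, 63, 4, 63, 4]
    }
}
```

Only the plain UTF-16 charset adds the two BOM bytes. The BE and LE variants differ only in the order of the two bytes within each code unit.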
The Javadoc for java.nio.charset.Charset describes this behavior:
The UTF-16 charsets use sixteen-bit quantities and are therefore sensitive to byte order. In these encodings the byte order of a stream may be indicated by an initial byte-order mark represented by the Unicode character '\uFEFF'. Byte-order marks are handled as follows:
When decoding, the UTF-16BE and UTF-16LE charsets interpret the initial byte-order marks as a ZERO-WIDTH NON-BREAKING SPACE; when encoding, they do not write byte-order marks.
When decoding, the UTF-16 charset interprets the byte-order mark at the beginning of the input stream to indicate the byte-order of the stream but defaults to big-endian if there is no byte-order mark; when encoding, it uses big-endian byte order and writes a big-endian byte-order mark.
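The decoding rules quoted above can be checked with a small sketch along these lines. It decodes the same BOM-prefixed bytes with UTF-16 and with UTF-16BE, and also decodes BOM-less bytes with UTF-16 to show the big-endian default. The class name Utf16DecodeDemo and its constants are illustrative, not from any library.

```java
import java.nio.charset.StandardCharsets;

public class Utf16DecodeDemo {
    // Big-endian BOM (FE FF) followed by one U+043F code unit (04 3F).
    static final byte[] WITH_BOM = {(byte) 0xFE, (byte) 0xFF, 0x04, 0x3F};
    // The same code unit in big-endian order, with no BOM.
    static final byte[] NO_BOM = {0x04, 0x3F};

    public static void main(String[] args) {
        // UTF-16 consumes the BOM to pick the byte order, so one character remains.
        String viaUtf16 = new String(WITH_BOM, StandardCharsets.UTF_16);
        // UTF-16BE keeps the BOM as an ordinary U+FEFF character at index 0.
        String viaUtf16Be = new String(WITH_BOM, StandardCharsets.UTF_16BE);
        // Without a BOM, UTF-16 falls back to big-endian.
        String noBom = new String(NO_BOM, StandardCharsets.UTF_16);

        System.out.println(viaUtf16.length());                // 1
        System.out.println(viaUtf16Be.length());              // 2
        System.out.println(viaUtf16Be.charAt(0) == '\uFEFF'); // true
        System.out.println(noBom.equals(viaUtf16));           // true
    }
}
```

In practice this means bytes written with plain UTF-16 should be read back with plain UTF-16. Reading them with UTF-16BE leaves a stray U+FEFF at the start of the string.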