I expected that, when a java character is stored as "UTF-16", each character uses 2 bytes, so "hello" should consume 10 bytes, but this code:
String h = "hello";
System.out.println(new String(h.getBytes("UTF-16"), "UTF-16").length());
System.out.println(new String(h.getBytes("UTF-8"), "UTF-8").getBytes("UTF-16").length);
Will print "5 12"
My question:
(1) I expected that the first println should get "10" as I mentioned. But why 5?
(2) For the second println, I am trying to getBytes for it first as "UTF-8" then as "UTF-16". I suppose it should also be 10. But actually it's 12.
I'm using MAC and my region is HongKong. Would you help to explain what's happening in the program, and how "5 12" actually came out?
Thanks a lot!