I'm trying to convert Java Strings into their byte representations in various encodings and print them out as hex.

For example, luke would be 6C 75 6B 65 in UTF-8 and UTF-16, while the Chinese character 猪 (U+732A) would be E7 8C AA in UTF-8 and 732A in UTF-16.

How do I write a function that does that?

new String( org.apache.commons.codec.binary.Hex.encodeHex(str.getBytes("UTF-16")));

doesn't seem to work for UTF-16.

  • The UTF-16 encoding of luke would be 00 6C 00 75 00 6B 00 65. 6C 75 6B 65 would be 汵步. Commented Sep 10, 2013 at 4:14
  • @duskwuff I'm using this tool since I don't know how to do it in Java yet. How did you do the conversion? Commented Sep 10, 2013 at 4:25
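
One way to check the first comment in Java is to take the four UTF-8 bytes of luke and decode them as UTF-16BE. This is only a sketch; the class and method names (`ReinterpretDemo`, `decodeUtf16Be`) are made up for illustration and are not from the thread.

```java
import java.nio.charset.StandardCharsets;

final class ReinterpretDemo {
    // Decodes raw bytes as big-endian UTF-16, two bytes per code unit.
    static String decodeUtf16Be(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // The UTF-8 bytes of "luke" are 6C 75 6B 65.
        byte[] utf8 = "luke".getBytes(StandardCharsets.UTF_8);
        String s = decodeUtf16Be(utf8);
        // Two UTF-16 code units come out: U+6C75 and U+6B65.
        System.out.printf("U+%04X U+%04X%n", (int) s.charAt(0), (int) s.charAt(1));
    }
}
```

Read as UTF-16BE, the four bytes pair up into two CJK characters rather than four Latin letters, which is why 6C 75 6B 65 cannot be the UTF-16 encoding of luke.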

1 Answer

public class UseTheForce {
    public static void main(final String[] args)
        throws java.io.UnsupportedEncodingException {
        for (final byte b : args[0].getBytes(args[1])) {
            System.out.printf("%1$02X ", (b & 0xFF));
        }
        System.out.println();
    }
}

Test

$ java UseTheForce luke US-ASCII
6C 75 6B 65

$ java UseTheForce luke UTF-8
6C 75 6B 65

$ java UseTheForce luke UTF-16
FE FF 00 6C 00 75 00 6B 00 65

$ java UseTheForce luke UTF-16BE
00 6C 00 75 00 6B 00 65

$ java UseTheForce luke UTF-16LE
6C 00 75 00 6B 00 65 00

$ java UseTheForce luke UTF-32
00 00 00 6C 00 00 00 75 00 00 00 6B 00 00 00 65
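
If you want a function rather than a command-line tool, the same loop can be wrapped in a helper that returns the hex string. This is a minimal sketch, assuming Java 7+ for `StandardCharsets`; the names `HexDump` and `toHex` are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class HexDump {
    // Encodes text with the given charset and returns the bytes
    // as space-separated, two-digit uppercase hex.
    static String toHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // The mask mirrors the code above; it keeps the value in 0..255.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex("luke", StandardCharsets.UTF_8));    // 6C 75 6B 65
        System.out.println(toHex("luke", StandardCharsets.UTF_16));   // FE FF 00 6C 00 75 00 6B 00 65
        System.out.println(toHex("luke", StandardCharsets.UTF_16BE)); // 00 6C 00 75 00 6B 00 65
    }
}
```

Passing a `Charset` instead of a charset name avoids the checked `UnsupportedEncodingException`. Note that plain UTF-16 prepends the FE FF byte order mark when encoding, while UTF-16BE and UTF-16LE do not.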

May the force be with you.

UPDATE

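As described in Formatter.html#detail, the (b & 0xFF) part is not necessary: when a negative Byte is formatted with %X, the Formatter prints it as an unsigned value by adding 2^8.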
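
A small sketch to see this for yourself, using a byte with the high bit set. The class and method names (`MaskDemo`, `withMask`, `withoutMask`) are hypothetical.

```java
final class MaskDemo {
    // Formats the byte after widening it to an int in the range 0..255.
    static String withMask(byte b) {
        return String.format("%02X", b & 0xFF);
    }

    // Formats the byte directly; it is boxed as a Byte, so the
    // Formatter treats it as an 8-bit value.
    static String withoutMask(byte b) {
        return String.format("%02X", b);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE7; // -25 as a signed byte
        System.out.println(withMask(b));            // E7
        System.out.println(withoutMask(b));         // E7
        // Here the byte is sign-extended to an int first, so the mask would matter.
        System.out.println(Integer.toHexString(b)); // ffffffe7
    }
}
```

The mask only becomes necessary when the byte is widened to an int before formatting, as in the `Integer.toHexString` call.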

3 Comments

thanks buddy! Mine did actually work. I just forgot the FE FF that's in front for strictly UTF-16.
Can I know why (b & 0xFF) ?
@Mubasher That converts the signed 8-bit value into an unsigned one which, I just realized, is not necessary.
