How can I convert hexadecimal into a utf-8 encoded string in Java [duplicate]

Question

I have a hexadecimal string and I am trying to convert it back into a utf-8 encoded string.

Example:

String hexString = "6a6f65";

How do I convert the string above back into "joe"?

Off the top of my head: 1) take chunks of two characters from the string; 2) parse each chunk as a hexadecimal int (there's a version of parseInt that accepts a radix argument; pass 16); 3) convert to char; 4) reassemble the string.
– Federico klez Culloca, Commented Aug 13, 2021 at 18:59
Very quick and dirty and not properly tested, but you can do this ;) String s = new String(new BigInteger(hexString, 16).toByteArray());
– g00se, Commented Aug 13, 2021 at 19:02
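
A caveat on the BigInteger one-liner, as a small sketch rather than a verdict on it: `BigInteger#toByteArray` returns the two's-complement form of a number, not the original byte sequence. It therefore adds a leading zero byte when the first hex digit is 8 or higher and drops leading zero bytes. The class and method names below are made up for illustration.

```java
import java.math.BigInteger;
import java.util.Arrays;

final class BigIntegerHexPitfall {

    // Hypothetical helper: the byte array the BigInteger shortcut produces.
    static byte[] viaBigInteger(String hex) {
        return new BigInteger(hex, 16).toByteArray();
    }

    public static void main(String[] args) {
        // Works for the question's input: three bytes, no surprises.
        System.out.println(Arrays.toString(viaBigInteger("6a6f65"))); // [106, 111, 101]

        // "e282ac" is three bytes of hex, but the top bit is set,
        // so a sign byte (0x00) is added in front.
        System.out.println(viaBigInteger("e282ac").length); // 4

        // A leading zero byte in the input is lost.
        System.out.println(viaBigInteger("006a").length); // 1
    }
}
```

For plain ASCII input such as "6a6f65" the shortcut gives the expected bytes; the extra or missing bytes only show up for inputs like the last two.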

Mark Rotteveel · Accepted Answer · 2021-08-14 09:59:39Z

0

If you can be sure that the hex string comes from a byte array of a properly UTF-8 encoded string, all you need to do is:

Convert the hex string back into a byte array.
Convert the byte array further back into a string, with correct encoding of course.

For the first part, there's a range of ways to do it. Just see this question and pick one that suits your needs.

Once you get the byte array back from the hex string, do this:

String s = new String(bytearr, StandardCharsets.UTF_8);
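
Putting the two steps together, a minimal sketch could look like the following. The class name `HexToUtf8` and its helper methods are invented for this example, and the hand-written hex loop is only one of the many ways to do the first step; it assumes the input contains nothing but hex digits and has an even length.

```java
import java.nio.charset.StandardCharsets;

final class HexToUtf8 {

    // Step 1: turn a hex string such as "6a6f65" into the bytes it spells out.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex string must have an even length");
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            int high = Character.digit(hex.charAt(2 * i), 16);
            int low = Character.digit(hex.charAt(2 * i + 1), 16);
            if (high < 0 || low < 0) {
                throw new IllegalArgumentException("Not a hex digit near index " + (2 * i));
            }
            bytes[i] = (byte) ((high << 4) | low);
        }
        return bytes;
    }

    // Step 2: interpret those bytes as UTF-8 text.
    static String decode(String hex) {
        return new String(hexToBytes(hex), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("6a6f65")); // joe
        // "e282ac" is the three-byte UTF-8 encoding of U+20AC (euro sign).
        System.out.println(decode("e282ac").length()); // 1
    }
}
```

Note that `new String(bytes, StandardCharsets.UTF_8)` silently replaces malformed byte sequences with U+FFFD instead of throwing, so this sketch does not detect input that is not valid UTF-8.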

edited Aug 14, 2021 at 9:59

Mark Rotteveel

answered Aug 13, 2021 at 19:44

Miigon

Basil Bourque · Accepted Answer · 2021-08-13 21:02:06Z

-1

You cannot do so reliably.

Unicode characters are assigned code points anywhere in the range U+0000 to U+10FFFF.

So there is no way for us to know how many characters at a time in your input should be parsed as the hexadecimal number of a Unicode code point.

Substring > code point integer > `StringBuilder#appendCodePoint` > `String`

If you know for certain the input should be parsed two characters at a time, use String#substring to retrieve each pair of characters. Parse each pair using Integer.parseInt with a radix of 16.

int codePoint = Integer.parseInt( hexInput ,16 ) ;

Build up your results by using StringBuilder#appendCodePoint.

String hexString = "6a6f65";
StringBuilder builder = new StringBuilder();
for ( int i = 0 ; i < hexString.length() ; i += 2 ) {
    String substring = hexString.substring( i , i + 2 );
    int codePoint = Integer.parseInt( substring , 16 );
    builder.appendCodePoint( codePoint );
}
String result = builder.toString();

See this code run live at IdeOne.com.

result = joe

Caveat: If such inputs are coming from UTF-8 encoded text, this approach is not reliable. Such text may use 1, 2, 3, or 4 octets of data to represent any one character. If your input is indeed UTF-8 encoded text, then you should parse it as such.
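
To make the caveat concrete, here is a small sketch comparing the two readings of the same hex input. The class and method names are invented for illustration. The input "c3a9" is the two-byte UTF-8 encoding of U+00E9 (é); treating each pair as its own code point yields two characters, while decoding the bytes as UTF-8 yields one.

```java
import java.nio.charset.StandardCharsets;

final class PairwiseVsUtf8 {

    // The pair-by-pair approach: every two hex digits become one code point.
    static String pairsAsCodePoints(String hex) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < hex.length(); i += 2) {
            builder.appendCodePoint(Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return builder.toString();
    }

    // The byte-oriented approach: every two hex digits become one byte,
    // and the whole byte array is then decoded as UTF-8.
    static String pairsAsUtf8Bytes(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String hex = "c3a9"; // UTF-8 bytes of U+00E9
        System.out.println(pairsAsCodePoints(hex).length()); // 2 (U+00C3 followed by U+00A9)
        System.out.println(pairsAsUtf8Bytes(hex).length());  // 1 (U+00E9)
    }
}
```

For pure ASCII input such as "6a6f65" both methods return "joe", which is why the difference is easy to miss.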

Streams

Not that I recommend doing so in this case, but you could use streams.

StringBuilder builder = new StringBuilder();
String input = "6a6f65";
// Step through the input two hex digits at a time (needs java.util.stream.IntStream, Java 9+).
IntStream.iterate( 0 , i -> i < input.length() , i -> i + 2 )
        .forEach( i -> builder.appendCodePoint( Integer.parseInt( input.substring( i , i + 2 ) , 16 ) ) );
System.out.println( "builder = " + builder );

builder = joe

edited Aug 13, 2021 at 21:02

answered Aug 13, 2021 at 19:03

Basil Bourque

3 Comments

NotAidan Over a year ago

I got the hexadecimal from a UTF-8 encoded byte array, so I know where it comes from.

Miigon Over a year ago

Actually, it can be done reliably. No UTF encoding actually stores code points like that (with variable-length code points and no length indication of any sort), since doing it that way would make the text impossible for any program to decode and therefore pretty much useless.

g00se Over a year ago

Correct. UTF-8 has the number of bytes used encoded in its high bits
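
As an illustration of that last point, the length of a UTF-8 sequence can be read from the leading bits of its first byte. The sketch below uses an invented class and method name and checks the bit pattern only; it does not validate the continuation bytes or reject overlong forms, which a real decoder would also do.

```java
final class Utf8LeadByte {

    // Returns how many bytes the UTF-8 sequence starting with leadByte occupies,
    // or -1 if the byte cannot start a sequence (for example a continuation byte).
    static int sequenceLength(int leadByte) {
        int b = leadByte & 0xFF;
        if ((b & 0x80) == 0x00) return 1; // 0xxxxxxx: single-byte (ASCII)
        if ((b & 0xE0) == 0xC0) return 2; // 110xxxxx: start of a 2-byte sequence
        if ((b & 0xF0) == 0xE0) return 3; // 1110xxxx: start of a 3-byte sequence
        if ((b & 0xF8) == 0xF0) return 4; // 11110xxx: start of a 4-byte sequence
        return -1;                        // 10xxxxxx (continuation) or invalid
    }

    public static void main(String[] args) {
        System.out.println(sequenceLength(0x6a)); // 1
        System.out.println(sequenceLength(0xc3)); // 2
        System.out.println(sequenceLength(0xe2)); // 3
        System.out.println(sequenceLength(0x82)); // -1
    }
}
```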
