-1

I have a hexadecimal string and I am trying to convert it back into the UTF-8 encoded string it came from.

Example:

String hexString = "6a6f65";

How do I convert the string above back into "joe"?

2
  • From the top of my head: 1) take chunks of two characters from string; 2) parse as hexadecimal ints (there's a version of parseInt that accepts a radix argument, pass 16), 3) convert to char, 4) reassemble string. Commented Aug 13, 2021 at 18:59
  • Very quick and dirty and not properly tested - you can do that ;) String s = new String(new BigInteger(hexString, 16).toByteArray()); Commented Aug 13, 2021 at 19:02

2 Answers

0

If you can be sure that the hex string comes from a byte array of a properly UTF-8 encoded string, all you need to do is:

  1. Convert the hex string back into a byte array.
  2. Convert the byte array further back into a string, with correct encoding of course.

For the first part, there's a range of ways to do it. Just see this question and pick one that suits your needs.

Once you get the byte array back from the hex string, do this:

String s = new String(bytearr, StandardCharsets.UTF_8);
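Putting the two steps together, a minimal sketch could look like the following. The class and method names (HexToUtf8, hexToBytes, decode) are made up for illustration, and the hand-written loop is just one of the many ways to do the first step; it assumes the input contains only hex digits and has an even length.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative, not from a library.
final class HexToUtf8 {

    // Step 1: turn each pair of hex digits into one byte.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex string must have an even length");
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Not a hex digit near index " + (2 * i));
            }
            bytes[i] = (byte) ((hi << 4) | lo);
        }
        return bytes;
    }

    // Step 2: decode the bytes as UTF-8.
    static String decode(String hex) {
        return new String(hexToBytes(hex), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("6a6f65")); // prints "joe"
    }
}
```

Because the bytes are decoded as a whole, multi-byte UTF-8 sequences are handled as well, not just ASCII.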

-1

You cannot do so reliably.

Unicode assigns characters to code points ranging from U+0000 to U+10FFFF.

So, without knowing how the input was produced, there is no way for us to know how many hex digits at a time should be parsed as the hexadecimal number of a Unicode code point.

Substring > code point integer > StringBuilder#appendCodePoint > String

If you know for certain the input should be parsed two characters at a time, use String#substring to retrieve each pair of characters. Parse each pair using Integer.parseInt with a radix of 16.

int codePoint = Integer.parseInt( hexInput ,16 ) ;

Build up your results by using StringBuilder#appendCodePoint.

String hexString = "6a6f65";
StringBuilder builder = new StringBuilder();
for ( int i = 0 ; i < hexString.length() ; i += 2 ) {
    String substring = hexString.substring( i , i + 2 );
    int codePoint = Integer.parseInt( substring , 16 );
    builder.appendCodePoint( codePoint );
}
String result = builder.toString();

See this code run live at IdeOne.com.

result = joe

Caveat: If such inputs are coming from UTF-8 encoded text, this approach is not reliable. Such text may use 1, 2, 3, or 4 octets of data to represent any one character. If your input is indeed UTF-8 encoded text, then you should parse it as such.
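To make the caveat concrete, here is a small sketch comparing the two interpretations on the hex string "c3a9", which is the UTF-8 encoding of é (U+00E9). The class and method names are hypothetical, and the input is assumed to be valid hex of even length.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
final class PerByteVsUtf8 {

    // The per-pair approach: each byte value is treated as its own code point.
    static String perPairCodePoints(String hex) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < hex.length(); i += 2) {
            builder.appendCodePoint(Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return builder.toString();
    }

    // The same hex digits, collected into bytes and decoded as UTF-8.
    static String asUtf8(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String hex = "c3a9"; // UTF-8 bytes of U+00E9
        System.out.println(perPairCodePoints(hex).length()); // prints 2 (U+00C3 followed by U+00A9)
        System.out.println(asUtf8(hex).length());            // prints 1 (U+00E9)
    }
}
```

For pure ASCII input such as "6a6f65" both methods agree, which is why the per-pair loop appears to work on the example.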

Streams

Not that I recommend doing so in this case, but you could use streams.

// Needs: import java.util.stream.IntStream; the three-argument iterate requires Java 9 or later.
StringBuilder builder = new StringBuilder();
String input = "6a6f65";
IntStream.iterate( 0 , i -> i < input.length() , i -> i + 2 )
        .forEach( i -> builder.appendCodePoint( Integer.parseInt( input.substring( i , i + 2 ) , 16 ) ) );
System.out.println( "builder = " + builder );

builder = joe

3 Comments

I got the hexadecimal from a UTF-8 encoded byte array, so I know where it comes from.
Actually, it can be done reliably. No UTF encoding actually stores code points like that (with variable-length code points and no length indication of any sort), since doing so would make the data impossible for any program to decode and therefore pretty much useless.
Correct. UTF-8 encodes the number of bytes in a sequence in the high bits of its first byte.
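As a rough illustration of that last comment, the sketch below reads the sequence length off the high bits of a lead byte. The class and method names are hypothetical, and this is not a full validator: it only inspects the bit pattern and does not reject lead bytes that are never legal in UTF-8.

```java
// Hypothetical helper; illustrates the UTF-8 lead-byte bit patterns only.
final class Utf8LeadByte {

    // Returns the sequence length announced by a lead byte,
    // or -1 for a continuation byte (10xxxxxx) or an unrecognized pattern.
    static int sequenceLength(int b) {
        b &= 0xFF;                         // treat the value as an unsigned byte
        if ((b & 0x80) == 0x00) return 1;  // 0xxxxxxx
        if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(sequenceLength(0x6a)); // prints 1 ('j' is a single byte)
        System.out.println(sequenceLength(0xc3)); // prints 2 (lead byte of a two-byte sequence)
    }
}
```

In practice new String(bytes, StandardCharsets.UTF_8) does this bookkeeping internally, so there is normally no need to write it by hand.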
