Converting byte[] to String in HBase

Question

I have the following row stored in an HBase table:

 DIEp(^o^)q3    column=DIE:ID, timestamp=1346194191174, value=\x00\x00\x00\x01

I am trying to read the value and convert it to its string representation, which should be 1. However, I don't get the right string when I cat the file that my output is redirected to:

cat /hadoop/logs/userlogs/job_201209121654_0027/attempt_201209121654_0027_m_000000_0/stdout

I got garbage like this: NUL NUL NUL SOH

Below is the code fragment that I am using.

byte[] result1 = value.getValue("DIE".getBytes(), "ID".getBytes());
String myresult = Bytes.toString(result1);
System.out.println(myresult);

What is the character encoding? And does the call to new String(byte[], charset) give you the right string?
– Stephen Connolly, Commented Sep 12, 2012 at 16:47
Guessing is the route to failure when it comes to character encoding. The versions of the methods that take byte arrays and don't take an encoding should never have been let loose in the first place ;-)
– Stephen Connolly, Commented Sep 12, 2012 at 17:07
@StephenConnolly: As per the comment on my answer, looking at the data and the desired output, I don't believe "value" is meant to be the binary representation of a string to start with, at which point the question of an encoding is moot.
– Jon Skeet, Commented Sep 12, 2012 at 17:13

David · Accepted Answer · 2012-09-12 20:53:04Z

8

The standard HBase way of string conversion is Bytes.toBytes(string) and Bytes.toString(bytes). But Jon Skeet is correct in that you need to consider how you put the data into the column in the first place. If you used Bytes.toBytes(int), then you need to convert your bytes back into an integer before you convert to a string.
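A minimal sketch of that point, using only the JDK so it runs without an HBase dependency. Here java.nio.ByteBuffer stands in for Bytes.toBytes(int) and Bytes.toInt(byte[]), on the assumption that the int was stored as 4 big-endian bytes (which matches the \x00\x00\x00\x01 shown in the question). The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class StoredFormatDemo {
    // Stand-in for writing an int value: 4 bytes, big-endian (assumed format).
    public static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Stand-in for writing a text value: the UTF-8 bytes of the string.
    public static byte[] encodeText(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Read path for a column that was written as an int:
    // decode the int first, then format it as text.
    public static String readIntColumn(byte[] stored) {
        return String.valueOf(ByteBuffer.wrap(stored).getInt());
    }

    // Read path for a column that was written as text.
    public static String readTextColumn(byte[] stored) {
        return new String(stored, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] asInt = encodeInt(1);     // 00 00 00 01
        byte[] asText = encodeText("1"); // 31

        System.out.println(asInt.length);           // prints 4
        System.out.println(asText.length);          // prints 1
        System.out.println(readIntColumn(asInt));   // prints 1
        System.out.println(readTextColumn(asText)); // prints 1
    }
}
```

Both read paths end in the string "1", but only when each is paired with the matching write path. Reading the int-encoded bytes with the text path is what produces the control characters in the question.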

answered Sep 12, 2012 at 20:53

David


vikas · Accepted Answer · 2012-09-12 17:17:07Z

3

We have simply used new String(byte[]), where the byte[] comes from org.apache.hadoop.hbase.KeyValue.getValue(), to parse the bytes from an HBase column as a string, and it is working fine for our projects. :) Sorry if I missed something in the question. Hope this helps.

answered Sep 12, 2012 at 17:17

vikas


7 Comments

fanbondi Over a year ago

I really did some research before posting this question, and I have tried that, but it gives me the same output as specified in the question. I was doing something like String val = new String(myByteArray).

vikas Over a year ago

Are you trying to read from the log/output files of Hadoop? We have done this successfully with Hadoop in map-reduce. Can you please be more specific about where you want to read the HBase table from?

vikas Over a year ago

I am also unable to understand which HBase API contains value.getValue("DIE".getBytes(), "ID".getBytes());

fanbondi Over a year ago

I am reading the HBase table from my standalone HBase cluster. As for the API, getValue is org.apache.hadoop.hbase.client.Result.getValue(byte[], byte[]).

vikas Over a year ago

As your problem is solved by @Jon's solution, it seems you inserted the record using p.add(Bytes.toBytes("DIE"), Bytes.toBytes("ID"), Bytes.toBytes(1)); i.e. the value stored is 1, not "1". If this is the case, String.valueOf(Bytes.toInt(result1)); should have worked fine without any bitwise operation. Also, I think the question is not solved yet, because it does not answer why the HBase API is not able to do what it is intended to do. :)


Jon Skeet · Accepted Answer · 2012-09-12 17:13:11Z

2

Firstly, I'd avoid using String.getBytes() without specifying an encoding. What encoding does the code actually expect? Specify it explicitly when you call "DIE".getBytes() and "ID".getBytes().

Next, it looks like you should be converting the 4 bytes into an integer first - then convert that integer into a string. For example:

byte[] valueAsBytes = ...;
int valueAsInt = ((valueAsBytes[0] & 0xff) << 24) |
                 ((valueAsBytes[1] & 0xff) << 16) |
                 ((valueAsBytes[2] & 0xff) << 8) |
                 (valueAsBytes[3] & 0xff);
String valueAsString = String.valueOf(valueAsInt);

There may well be something in the Java API to do the bit manipulation directly, but I can't think of it right now. (There's DataInputStream, but that would require wrapping the byte array in a ByteArrayInputStream first, then you'd need to check the endianness...)
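One JDK option for the bit manipulation is java.nio.ByteBuffer, which reads big-endian by default. The sketch below compares it with the manual shifts above; the class and method names are illustrative only and are not part of the original answer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class FourBytesToInt {
    // Manual big-endian assembly; the 0xff masks stop sign extension
    // from corrupting the result when a byte is negative.
    public static int viaShifts(byte[] b) {
        return ((b[0] & 0xff) << 24)
             | ((b[1] & 0xff) << 16)
             | ((b[2] & 0xff) << 8)
             |  (b[3] & 0xff);
    }

    // Same conversion through ByteBuffer. BIG_ENDIAN is already the
    // default order; it is spelled out here to make the assumption visible.
    // Throws BufferUnderflowException if fewer than 4 bytes are supplied.
    public static int viaByteBuffer(byte[] b) {
        return ByteBuffer.wrap(b).order(ByteOrder.BIG_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        byte[] stored = {0, 0, 0, 1};
        System.out.println(viaShifts(stored));                     // prints 1
        System.out.println(viaByteBuffer(stored));                 // prints 1
        System.out.println(String.valueOf(viaByteBuffer(stored))); // prints 1
    }
}
```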

Your current code is doing exactly what you ask it to - admittedly with the default encoding of the platform. You've got "\u0000\u0000\u0000\u0001" basically.
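To see this concretely, here is a small sketch that decodes the same four bytes as text and prints the resulting code units. It assumes UTF-8 for the decoding step; the class and helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class DecodeAsTextDemo {
    // Decode raw bytes as UTF-8 text, the way a bytes-to-string call would.
    public static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Render each char of the string as U+XXXX so control characters are visible.
    public static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] stored = {0, 0, 0, 1};
        String text = decode(stored);

        System.out.println(text.length());     // prints 4
        System.out.println(codeUnits(text));   // prints U+0000 U+0000 U+0000 U+0001
        System.out.println(text.equals("1"));  // prints false
    }
}
```

U+0000 is NUL and U+0001 is SOH, which is the NUL NUL NUL SOH seen in the log file. The text "1" would instead be the single byte 0x31.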

edited Sep 12, 2012 at 17:13

answered Sep 12, 2012 at 16:46

Jon Skeet


11 Comments

Stephen Connolly Over a year ago

new String(byte[],Charset) not via an int

Jon Skeet Over a year ago

@StephenConnolly: If the OP wants the value to be "1" (as per the question) then I stand by my answer. What encoding would you suggest to turn bytes of 00 00 00 01 into "1"?

Stephen Connolly Over a year ago

UCS-4 is one encoding, but if those bytes are supposed to be an integer then he should be using something like new BigInteger(byte[]).toString(). Down vote retracted

fanbondi Over a year ago

Okay, @JonSkeet's answer has given me the expected output, but I really don't get why my code above is not working, because I followed the HBase API and my output was strange.

Jon Skeet Over a year ago

@fanbondi: Have you looked at the bytes of data you're getting? Those aren't a text-encoding of "1".

