3

I have the row below stored in an HBase table:

 DIEp(^o^)q3    column=DIE:ID, timestamp=1346194191174, value=\x00\x00\x00\x01

I am trying to read the value and convert it to its string representation, which should be 1, but I don't get the right string when I cat the file my output is redirected to:

cat /hadoop/logs/userlogs/job_201209121654_0027/attempt_201209121654_0027_m_000000_0/stdout

I get garbage like this: NUL NUL NUL SOH

Below is the code fragment that I am using:

byte[] result1 = value.getValue("DIE".getBytes(), "ID".getBytes());
String myresult = Bytes.toString(result1);
System.out.println(myresult);
4
  • What is the character encoding? And does the call to new String(byte[],charset) give you the right string? Commented Sep 12, 2012 at 16:47
  • I guess the character encoding should be UTF-8 Commented Sep 12, 2012 at 16:53
  • Guessing is the route to failure when it comes to character encoding. The versions of the methods that take byte arrays and don't take encoding should never have been let loose in the first place ;-) Commented Sep 12, 2012 at 17:07
  • @StephenConnolly: As per the comment on my answer, looking at the data and the desired output, I don't believe "value" is meant to be the binary representation of a string to start with, at which point the question of an encoding is moot. Commented Sep 12, 2012 at 17:13

3 Answers

8

The standard HBase way of string conversion is Bytes.toBytes(string) and Bytes.toString(bytes). But Jon Skeet is correct in that you need to consider how you put the data into the column in the first place. If you used Bytes.toBytes(int), then you need to convert your bytes back into an integer before you convert to a string.
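To make the distinction concrete, here is a minimal sketch in plain Java (no HBase dependency) of why an int-encoded value and a text-encoded value are different bytes. The class and method names are hypothetical, and `ByteBuffer` is used as a stand-in on the assumption that `Bytes.toBytes(int)` writes four big-endian bytes, which matches the `\x00\x00\x00\x01` shown in the question. With HBase on the classpath, `Bytes.toInt(bytes)` would be the natural call for the decoding step.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class IntVersusText {
    // Four big-endian bytes for an int (assumed to mirror Bytes.toBytes(int)).
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // UTF-8 bytes of the text form (what storing the string "1" would give).
    static byte[] encodeText(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Reads four big-endian bytes back into an int.
    static int decodeInt(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodeInt(1)));    // [0, 0, 0, 1]
        System.out.println(Arrays.toString(encodeText("1"))); // [49]
        // Decode as an int first, then convert to a string.
        System.out.println(String.valueOf(decodeInt(encodeInt(1)))); // 1
    }
}
```

The value has to be decoded with the same kind of method that encoded it: bytes written from an int go back through an int before becoming text.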

Comments

3

We have simply used new String(byte[]), where byte[] comes from org.apache.hadoop.hbase.KeyValue.getValue(), to parse the bytes from an HBase column as a string, and it is working fine for our projects. :) Sorry if I missed something in the question. Hope this helps.

7 Comments

I really did some research before posting this question, and I have used that, but it gives me the same output as specified in the question. I was doing something like String val = new String(myByteArray)
Are you trying to read from the log/output files of Hadoop? We have done this successfully with Hadoop in map-reduce. Can you please be more specific about where you want to read the HBase table from?
I am also unable to work out which HBase API contains value.getValue("DIE".getBytes(), "ID".getBytes());
I am reading the HBase table from my standalone HBase cluster. As for the API, getValue is org.apache.hadoop.hbase.client.Result.getValue(byte[], byte[])
As your problem is solved by @Jon's solution, it seems you inserted the record using p.add(Bytes.toBytes("DIE"), Bytes.toBytes("ID"), Bytes.toBytes(1)); i.e. the value stored is 1, not "1". If this is the case, String.valueOf(Bytes.toInt(result1)); should have worked fine without any bitwise operation. Also, I think the question is not solved yet, because it does not answer why the HBase API is not able to do what it is intended to do. :)
2

Firstly, I'd avoid using String.getBytes() without specifying an encoding. What encoding does the code actually expect? Specify it explicitly when you call "DIE".getBytes() and "ID".getBytes().

Next, it looks like you should be converting the 4 bytes into an integer first - then convert that integer into a string. For example:

byte[] valueAsBytes = ...;
int valueAsInt = ((valueAsBytes[0] & 0xff) << 24) |
                 ((valueAsBytes[1] & 0xff) << 16) |
                 ((valueAsBytes[2] & 0xff) << 8) |
                 (valueAsBytes[3] & 0xff);
String valueAsString = String.valueOf(valueAsInt);

There may well be something in the Java API to do the bit manipulation directly, but I can't think of it right now. (There's DataInputStream, but that would require wrapping the byte array in a ByteArrayInputStream first, then you'd need to check the endianness...)
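For reference, here is a hedged sketch of two standard-library routes that avoid the manual shifting. It assumes the four bytes are big-endian, as in the question's data; the class and method names are hypothetical. `ByteBuffer` reads big-endian by default, and `DataInputStream.readInt()` is always big-endian.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

class FourBytesToInt {
    // ByteBuffer's default byte order is big-endian.
    static int viaByteBuffer(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    // DataInputStream.readInt() reads four bytes, high byte first.
    static int viaDataInputStream(byte[] b) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(b))) {
            return in.readInt();
        } catch (IOException e) {
            // Not expected for an in-memory stream with enough bytes.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] valueAsBytes = {0, 0, 0, 1};
        System.out.println(String.valueOf(viaByteBuffer(valueAsBytes)));      // 1
        System.out.println(String.valueOf(viaDataInputStream(valueAsBytes))); // 1
    }
}
```

If the data were little-endian instead, `ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN)` would be needed before `getInt()`.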

Your current code is doing exactly what you ask it to - admittedly with the default encoding of the platform. You've got "\u0000\u0000\u0000\u0001" basically.
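A small sketch of what that decoding yields, assuming UTF-8 (which is what HBase's Bytes.toString uses, as far as I know); the class name is hypothetical. The result is a four-character string of control characters, which is why a terminal shows NUL NUL NUL SOH rather than a digit.

```java
import java.nio.charset.StandardCharsets;

class DecodedControlChars {
    // Interprets the raw bytes as UTF-8 text, as the question's code effectively does.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = decode(new byte[] {0, 0, 0, 1});
        System.out.println(s.length()); // 4
        // Print each character's numeric value rather than the unprintable character.
        for (int i = 0; i < s.length(); i++) {
            System.out.println((int) s.charAt(i)); // 0, 0, 0, then 1
        }
    }
}
```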

11 Comments

new String(byte[],Charset) not via an int
@StephenConnolly: If the OP wants the value to be "1" (as per the question) then I stand by my answer. What encoding would you suggest to turn bytes of 00 00 00 01 into "1"?
UCS-4 is one encoding, but if those bytes are supposed to be an integer then he should be using something like new BigInteger(byte[]).toString(). Down vote retracted
Okay, @JonSkeet's answer has given me the expected output, but really, JonSkeet, I don't get why my code above is not working, because I followed the HBase API and my output was strange.
@fanbondi: Have you looked at the bytes of data you're getting? Those aren't a text-encoding of "1".
