Suppose I have a byte array that ends with many zero bytes. The call

new String(barray, "UTF-8")

gives me a string with the wrong length, because each 0 byte is decoded as a \0 character (Java does not treat strings as character sequences terminated by \0). Is this function correct?

public String convertFromByteArray(byte[] a) throws UnsupportedEncodingException {
    String s = new String(a, "UTF-8");
    return s.trim();
}

Or is there a more efficient way?
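
To make the problem concrete, here is a minimal sketch. The class name PaddingDemo, the helper decodeAll, and the sample bytes are made up for illustration, and it uses StandardCharsets.UTF_8 instead of the "UTF-8" name only to avoid the checked exception. It shows that the padding survives decoding as \0 characters, and that trim() removes them because it strips every leading and trailing character with a code point of U+0020 or below. That also means ordinary leading and trailing whitespace is stripped, which may or may not be wanted.

```java
import java.nio.charset.StandardCharsets;

class PaddingDemo {
    // Decodes the whole array, padding included.
    static String decodeAll(byte[] a) {
        return new String(a, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // " Hi" followed by five zero bytes (Java zero-initializes arrays).
        byte[] a = new byte[8];
        a[0] = ' ';
        a[1] = 'H';
        a[2] = 'i';

        String s = decodeAll(a);
        System.out.println(s.length());        // 8: each 0 byte became a '\0' char
        System.out.println(s.trim().length()); // 2: the '\0' chars and the leading space are gone
    }
}
```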

  • possible duplicate of Converting a C-ctyle string encoded as a character array to a Java String Commented Mar 16, 2014 at 18:56
  • @BheshGurung Phate is asking about efficiency; the question you've referenced says nothing about that. Commented Mar 16, 2014 at 19:06
  • Very good, thanks! Anyway, I will go with the 6x speed version as I am 100% sure my byte string will not have null bytes in between :) Commented Mar 16, 2014 at 21:52

1 Answer

Yes, there is.

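import java.io.UnsupportedEncodingException;

public class Main {
    public static void main(String[] args) {
        // "Hello!" followed by roughly 100 million zero bytes.
        byte[] barray = new byte[99999999];
        barray[0] = 72;
        barray[1] = 101;
        barray[2] = 108;
        barray[3] = 108;
        barray[4] = 111;
        barray[5] = 33;
        // Java already zero-initializes arrays; this loop only makes the padding explicit.
        for (int k = 6; k < barray.length; k++) {
            barray[k] = 0;
        }
        try {
            // Time the decode-then-trim version.
            long a = System.nanoTime();
            convertFromByteArray(barray);
            long b = System.nanoTime();
            long tot_1 = b - a;
            // Time the scan-then-decode version.
            long c = System.nanoTime();
            convertFromByteArray2(barray);
            long d = System.nanoTime();
            long tot_2 = d - c;
            // Prints both durations in nanoseconds, then their ratio.
            System.out.println(tot_1 + " - " + tot_2 + " " + (tot_1 * 1.0 / tot_2));
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }

    // Decodes the whole array, then trims the resulting '\0' characters.
    public static String convertFromByteArray(byte[] a) throws UnsupportedEncodingException {
        String s = new String(a, "UTF-8");
        return s.trim();
    }

    // Finds the first zero byte and decodes only the bytes before it.
    public static String convertFromByteArray2(byte[] barray) throws UnsupportedEncodingException {
        int i = 0;
        // The bounds check avoids an ArrayIndexOutOfBoundsException when the array has no zero byte.
        while (i < barray.length && barray[i] != 0) {
            i++;
        }
        return new String(barray, 0, i, "UTF-8");
    }
}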

Output:

426205180 - 69702 6114.676479871453

That is roughly 6,000x faster in this single, un-warmed-up run.

EDIT:

As @SotiriosDelimanolis and @BheshGurung noticed, if a 0 byte is followed by a valid character, the solution above is incorrect, because it stops decoding at the first 0 byte.
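
A minimal sketch of that failure case, with a hypothetical class name (EmbeddedZeroDemo), a hypothetical helper (untilFirstZero) and made-up sample bytes: stopping at the first 0 byte silently drops everything after an embedded zero.

```java
import java.nio.charset.StandardCharsets;

class EmbeddedZeroDemo {
    // Stops at the first 0 byte (C-string semantics).
    static String untilFirstZero(byte[] b) {
        int end = 0;
        while (end < b.length && b[end] != 0) {
            end++;
        }
        return new String(b, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 'a', 'b', an embedded 0, 'c', then three padding zeros.
        byte[] b = {'a', 'b', 0, 'c', 0, 0, 0};
        // Prints "ab": the 'c' after the embedded zero is lost.
        System.out.println(untilFirstZero(b));
    }
}
```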

To cover all cases, scan backward from the end of the array and drop only the trailing zeros:

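public static String convertFromByteArray2(byte[] barray) throws UnsupportedEncodingException {
    // Walk back over the trailing zero bytes; end is the number of bytes to decode.
    // Checking end > 0 first keeps empty and all-zero arrays from misbehaving.
    int end = barray.length;
    while (end > 0 && barray[end - 1] == 0) {
        end--;
    }
    return new String(barray, 0, end, "UTF-8");
}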

Tested with http://ideone.com/mg2U23

3x faster.
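
For reference, here is a self-contained sketch of the backward scan that uses StandardCharsets.UTF_8 (available since Java 7) instead of the charset name, so no UnsupportedEncodingException has to be declared. The class and method names (TrailingZeros, stripTrailingZeros) are placeholders, and the explicit bounds check is an assumption about the desired behavior: empty and all-zero arrays yield an empty string.

```java
import java.nio.charset.StandardCharsets;

class TrailingZeros {
    // Decodes the array as UTF-8, ignoring the run of 0 bytes at its end.
    static String stripTrailingZeros(byte[] b) {
        int end = b.length;
        // Test the bound before reading b[end - 1] so an empty array is safe.
        while (end > 0 && b[end - 1] == 0) {
            end--;
        }
        return new String(b, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] padded = {'H', 'i', 0, 0, 0};
        byte[] embedded = {'a', 0, 'b', 0, 0};
        System.out.println(stripTrailingZeros(padded).length());      // 2
        System.out.println(stripTrailingZeros(embedded).length());    // 3: the embedded zero is kept
        System.out.println(stripTrailingZeros(new byte[4]).length()); // 0
    }
}
```

Embedded zeros are preserved as \0 characters here; only the padding at the end is dropped.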

2 Comments

Very nice! I would use a Charset object in place of "UTF-8", but I think this is the best too.
So what about reading two bytes at once for a character, and returning when there are two consecutive \0? Then the problem with a 0 in the middle may be resolved.
