Is it correct/efficient to call trim() after a byte[] to String conversion?

Question

Suppose a byte array ends with a lot of zeros. The call

new String(barray,"UTF-8")

will give me a string with the wrong length, because the 0 bytes are translated to \0 characters (Java does not treat strings as char sequences terminated by \0). Is this function correct:

public String convertFromByteArray(byte[] a) throws UnsupportedEncodingException {
    String s = new String(a, "UTF-8");
    return s.trim();
}

Or is there a more efficient way?

possible duplicate of Converting a C-ctyle string encoded as a character array to a Java String
– Bhesh Gurung, Commented Mar 16, 2014 at 18:56
@BheshGurung Phate is asking about efficiency; the question you've referenced says nothing about that.
– rpax, Commented Mar 16, 2014 at 19:06
Very good, thanks! Anyway, I will go with the 6x speed version, as I am 100% sure my byte string will not have null bytes in between :)
– Phate, Commented Mar 16, 2014 at 21:52

Community · Accepted Answer · 2017-05-23 12:21:29Z

4

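Yes, there is: find where the data ends and decode only that part, instead of decoding the whole array and trimming afterwards.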

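import java.io.UnsupportedEncodingException;

// The class name is arbitrary; it only makes the snippet compile on its own.
public class TrimBenchmark {
    public static void main(String[] args) {
        // Roughly 100 MB: "Hello!" followed by zero padding.
        byte[] barray = new byte[99999999];
        barray[0] = 72;
        barray[1] = 101;
        barray[2] = 108;
        barray[3] = 108;
        barray[4] = 111;
        barray[5] = 33;
        // Redundant (Java zero-initializes new arrays), kept to make the padding explicit.
        for (int k = 6; k < barray.length; k++) {
            barray[k] = 0;
        }
        try {
            long a = System.nanoTime();
            convertFromByteArray(barray);
            long b = System.nanoTime();
            long tot_1 = b - a;
            long c = System.nanoTime();
            convertFromByteArray2(barray);
            long d = System.nanoTime();
            long tot_2 = d - c;
            // Prints both timings in nanoseconds, then their ratio.
            System.out.println(tot_1 + " - " + tot_2 + " " + (tot_1 * 1.0 / tot_2));
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }

    // Decodes the whole array, padding included, then trims the result.
    public static String convertFromByteArray(byte[] a) throws UnsupportedEncodingException {
        String s = new String(a, "UTF-8");
        return s.trim();
    }

    // Decodes only the bytes before the first zero byte.
    public static String convertFromByteArray2(byte[] barray) throws UnsupportedEncodingException {
        int i = 0;
        // The bounds check avoids an ArrayIndexOutOfBoundsException when no zero byte exists.
        while (i < barray.length && barray[i] != 0) {
            i++;
        }
        return new String(barray, 0, i, "UTF-8");
    }
}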

Output:

426205180 - 69702 6114.676479871453

That is roughly 6,000 times faster in this single run.
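
The ratio above comes from one pair of System.nanoTime() calls per method, so it may include one-off costs such as JIT compilation. A slightly more cautious sketch repeats each measurement and keeps the fastest run. The class name, the helper names, the smaller array size and the use of StandardCharsets.UTF_8 are all choices made for this illustration, not part of the answer's code, and this is still not a rigorous benchmark (the JIT is free to optimize away unused results).

```java
import java.nio.charset.StandardCharsets;

final class RepeatedTiming {
    // Decode everything, then trim (the approach from the question).
    static String decodeThenTrim(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).trim();
    }

    // Decode only the bytes before the first zero byte.
    static String untilFirstZero(byte[] bytes) {
        int i = 0;
        while (i < bytes.length && bytes[i] != 0) {
            i++;
        }
        return new String(bytes, 0, i, StandardCharsets.UTF_8);
    }

    // Runs the task several times and returns the fastest run in nanoseconds.
    static long bestOf(int runs, Runnable task) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        // Smaller than the answer's array (an assumption); zero-filled by default.
        byte[] data = new byte[10_000_000];
        byte[] hello = "Hello!".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(hello, 0, data, 0, hello.length);

        long trimNanos = bestOf(10, () -> decodeThenTrim(data));
        long scanNanos = bestOf(10, () -> untilFirstZero(data));
        System.out.println(trimNanos + " ns vs " + scanNanos + " ns");
    }
}
```

The absolute numbers will differ between machines and JVMs; only the order of magnitude of the gap is meaningful.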

EDIT:

As @SotiriosDelimanolis and @BheshGurung noticed, the solution above is incorrect if a 0 byte is followed by valid characters, because it stops decoding at the first 0 byte.
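
A small sketch of that failure case, assuming an input with a zero byte in the middle of the data. The class and method names are made up for the illustration, and StandardCharsets.UTF_8 stands in for the "UTF-8" string only to avoid the checked exception.

```java
import java.nio.charset.StandardCharsets;

final class FirstZeroDemo {
    // Same idea as convertFromByteArray2 above: stop at the first zero byte.
    static String untilFirstZero(byte[] bytes) {
        int i = 0;
        while (i < bytes.length && bytes[i] != 0) {
            i++;
        }
        return new String(bytes, 0, i, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Hi", an embedded zero byte, "there", then zero padding.
        byte[] data = {72, 105, 0, 116, 104, 101, 114, 101, 0, 0, 0};
        System.out.println(untilFirstZero(data)); // prints "Hi"; "there" is lost
    }
}
```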

To cover all cases, scan backwards from the end of the array and drop only the trailing zeros:

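public static String convertFromByteArray2(byte[] barray) throws UnsupportedEncodingException {
    // Walk back from the end until a non-zero byte (or the start of the array) is reached.
    // This also handles empty and all-zero arrays, which yield an empty string.
    int end = barray.length;
    while (end > 0 && barray[end - 1] == 0) {
        end--;
    }
    return new String(barray, 0, end, "UTF-8");
}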

Tested with http://ideone.com/mg2U23

About 3x faster than the trim() version.
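
One difference worth keeping in mind: String.trim() strips every leading and trailing character up to U+0020, so it also removes spaces and newlines at both ends, while the backward scan removes only trailing zero bytes. The sketch below contrasts the two on a made-up input; the class and method names are hypothetical, and StandardCharsets.UTF_8 replaces the "UTF-8" string for brevity.

```java
import java.nio.charset.StandardCharsets;

final class TrimVsStripZeros {
    // Drops only trailing zero bytes, then decodes what is left.
    static String stripTrailingZeros(byte[] bytes) {
        int end = bytes.length;
        while (end > 0 && bytes[end - 1] == 0) {
            end--;
        }
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }

    // The approach from the question: decode everything, then trim().
    static String decodeThenTrim(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).trim();
    }

    public static void main(String[] args) {
        // A space, "ok", a newline, then zero padding.
        byte[] data = {32, 111, 107, 10, 0, 0, 0};
        System.out.println(stripTrailingZeros(data).length()); // prints 4: space and newline kept
        System.out.println(decodeThenTrim(data).length());     // prints 2: only "ok" remains
    }
}
```

Which behaviour is wanted depends on whether leading and trailing whitespace is part of the data.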

edited May 23, 2017 at 12:21

CommunityBot

answered Mar 16, 2014 at 18:58

rpax

2 Comments

Phate Over a year ago

Very nice! I would use a Charset object in place of the "UTF-8" string, but I think this is the best too.

user2889419 Over a year ago

So what about reading two bytes at a time for each character, and returning when there are two consecutive \0 bytes? That way the problem of a 0 in the middle may get resolved.
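
The Charset suggestion in the first comment could look roughly like the sketch below. Passing a java.nio.charset.Charset instead of a charset name removes the need to handle UnsupportedEncodingException. The class and method names are hypothetical. Note the assumption behind byte-wise stripping: it is only safe for encodings in which a 0x00 byte can only mean U+0000, which holds for UTF-8 and ISO-8859-1 but not for UTF-16, where ordinary characters contain zero bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetDecode {
    // Strips trailing zero bytes, then decodes with the given charset.
    // Assumes an encoding where 0x00 never appears inside another character.
    static String decodeZeroPadded(byte[] bytes, Charset charset) {
        int end = bytes.length;
        while (end > 0 && bytes[end - 1] == 0) {
            end--;
        }
        return new String(bytes, 0, end, charset);
    }

    public static void main(String[] args) {
        // The same character (e with acute accent, U+00E9) in two encodings, zero-padded.
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9, 0, 0};
        byte[] latin1 = {(byte) 0xE9, 0, 0};

        String fromUtf8 = decodeZeroPadded(utf8, StandardCharsets.UTF_8);
        String fromLatin1 = decodeZeroPadded(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(fromUtf8.equals(fromLatin1)); // prints true
    }
}
```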
