
I am reading a large file from disk. That file just contains numbers, encoded as plain old ASCII. At the moment, I am reading in chunks, and then doing something like this:

    byte[] token;  // ASCII digit bytes for a single number
    int n = Integer.parseInt(new String(token));

In other words, I am converting the bytes to a String and then parsing that String into an int. Is there a way to use fast operations such as shifting and binary arithmetic instead?

I suspect this could be made faster. For example, the ASCII bytes for the digits 1, 2, 3 are 49, 50, 51. Any ideas for hacks?
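A minimal sketch of the bit-level facts behind this, assuming every byte is an ASCII digit: the codes for '0' to '9' are 0x30 to 0x39, so either subtracting '0' or masking with 0x0F yields the digit value, and multiplying by 10 can be written as (n << 3) + (n << 1). The class and method names are hypothetical. A modern JIT may already emit similar instructions for a plain `10 * n`, so treat this as an illustration rather than a guaranteed speedup.

```java
// Sketch: digit-value and multiply-by-10 tricks for ASCII digit bytes.
// Assumes every byte is one of '0'..'9' (0x30..0x39); no validation is done.
class AsciiDigitTricks {

    // '0' is 48 (0x30), so subtracting it yields the digit value 0..9.
    static int digitBySubtraction(byte b) {
        return b - '0';
    }

    // Because '0'..'9' are 0x30..0x39, the low four bits already hold the digit.
    static int digitByMask(byte b) {
        return b & 0x0F;
    }

    // 10 * n rewritten as 8 * n + 2 * n using shifts.
    static int timesTen(int n) {
        return (n << 3) + (n << 1);
    }

    // Left-to-right accumulation using the two tricks above.
    static int parse(byte[] token) {
        int n = 0;
        for (byte b : token) {
            n = timesTen(n) + digitByMask(b);
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] token = {49, 50, 51}; // the ASCII bytes for "123"
        System.out.println(digitBySubtraction(token[0])); // 1
        System.out.println(digitByMask(token[2]));        // 3
        System.out.println(parse(token));                 // 123
    }
}
```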

  • How are the numbers delimited in the input file? Commented Mar 6, 2013 at 1:56
  • Does it have any negative integers? Commented Mar 6, 2013 at 2:39
  • @Perception: no need to worry about that; I've already handled the delimiting. I'm splitting out byte[] chunks as fast as possible. Commented Mar 6, 2013 at 9:53
  • @kuriouscoder: good question, no negatives. Commented Mar 6, 2013 at 9:54

3 Answers

    int n = 0;
    // assumes token holds only the ASCII digits '0'..'9' (no sign, no overflow check)
    for (byte b : token)
        n = 10 * n + (b - '0');
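A self-contained version of the loop above, as a sketch. The class name and the `offset`/`length` overload are hypothetical additions, meant for parsing a token in place inside a larger chunk without copying it. It assumes unsigned ASCII digits and does no overflow checking: unlike `Integer.parseInt`, which throws `NumberFormatException` for out-of-range or non-digit input, this silently wraps or returns a meaningless value.

```java
import java.nio.charset.StandardCharsets;

// Sketch of the loop above as a reusable method.
// Assumptions: only ASCII '0'..'9', no sign, value fits in an int (no overflow check).
class ByteDigitParser {

    // Parses buf[offset .. offset + length - 1]; the range parameters are a
    // hypothetical convenience so a token can be parsed without copying it.
    static int parseDigits(byte[] buf, int offset, int length) {
        int n = 0;
        for (int i = offset; i < offset + length; i++) {
            n = 10 * n + (buf[i] - '0');
        }
        return n;
    }

    // Parses the whole array.
    static int parseDigits(byte[] token) {
        return parseDigits(token, 0, token.length);
    }

    public static void main(String[] args) {
        byte[] token = "40213".getBytes(StandardCharsets.US_ASCII);
        int fast = parseDigits(token);
        int slow = Integer.parseInt(new String(token, StandardCharsets.US_ASCII));
        System.out.println(fast + " " + slow + " " + (fast == slow)); // 40213 40213 true

        byte[] line = "12 345 6789".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseDigits(line, 3, 3)); // 345
    }
}
```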

5 Comments

If you read a stream of ASCII characters, the order is already taken care of, right? Or am I missing something?
There's no endianness issue here. However, do some cultures write the lowest digit first? In Arabic, for example, we see things like عام 2013 هو عام جيد ("2013 is a good year"). Arabic is written right to left, so is the number here written lowest digit first? No idea.
This looks quite promising! Order isn't an issue: it's left to right, as expected. There are also no foreign-language tricks to worry about. I'll test it in a bit and let you know :)
Thanks! For interest, this resulted in a roughly 28% speedup.
How do I convert this back to a String, or to a byte array?

You can't map base-10 digits onto pure binary operations exactly, but you can do decimal arithmetic. Assuming the higher-order digits come first:

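    byte[] token;  // ASCII digit bytes, most significant digit first
    long n = 0;
    long pow = 1;
    // walk from the least significant (rightmost) digit to the most significant
    for (int i = token.length - 1; i >= 0; i--) {
        n += (token[i] - '0') * pow;  // '0' == 48
        pow *= 10;
    }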
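For comparison, a runnable sketch (hypothetical class name, ASCII digits assumed) placing this right-to-left accumulation next to the left-to-right loop from the other answer. Both compute the same value; the left-to-right form (Horner's rule) needs no separate `pow` variable, so it does one fewer multiplication per digit.

```java
// Sketch comparing right-to-left power accumulation with the
// left-to-right (Horner) loop. Assumes tokens contain only ASCII '0'..'9'.
class PowerVsHorner {

    // Right-to-left: each digit is weighted by an explicit power of ten.
    static long rightToLeft(byte[] token) {
        long n = 0;
        long pow = 1;
        for (int i = token.length - 1; i >= 0; i--) {
            n += (token[i] - '0') * pow;
            pow *= 10;
        }
        return n;
    }

    // Left-to-right (Horner's rule): one multiply per digit, no pow variable.
    static long leftToRight(byte[] token) {
        long n = 0;
        for (byte b : token) {
            n = 10 * n + (b - '0');
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] token = {'9', '0', '0', '7'};
        System.out.println(rightToLeft(token)); // 9007
        System.out.println(leftToRight(token)); // 9007
    }
}
```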



Try this:

    byte[] a = { 1, 2, 3 };  // digit values, not ASCII codes
    for (int i = 0; i < a.length; i++) {
        a[i] += '0';  // map each value into the ASCII range: 1 -> '1' (49)
    }
    int n = Integer.parseInt(new String(a));
    System.out.println(n);

Output:

123
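Note that the array above holds the digit values 1, 2, 3, not the ASCII codes 49, 50, 51 that the question reads from disk. If the bytes really do hold values 0 to 9, a sketch like the following (hypothetical class name) accumulates them directly, with no String at all; for ASCII input, subtract '0' from each byte first.

```java
// Sketch: when the bytes already hold digit values 0..9 (not ASCII codes),
// they can be accumulated directly without building a String.
class RawDigitValues {

    static int fromDigitValues(byte[] digits) {
        int n = 0;
        for (byte d : digits) {
            n = 10 * n + d; // no '0' offset: d is already 0..9
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        System.out.println(fromDigitValues(a)); // 123
    }
}
```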

1 Comment

This is essentially the same as my original approach. I didn't want the extra overhead of creating new Strings and then parsing them to ints.
