4

I'm trying to read a binary file in Java. I need methods to read unsigned 8-bit, unsigned 16-bit and unsigned 32-bit values. What would be the best way (fastest, nicest-looking code) to do this? I've done this in C++ with something like this:

uint8_t *buffer;
uint32_t value = buffer[0] | buffer[1] << 8 | buffer[2] << 16 | buffer[3] << 24;

But in Java this causes a problem if, for example, buffer[1] contains a value that has its sign bit set, since the result of a left shift is an int (?). Instead of OR-ing in only 0xA5 at the specific place, it ORs in 0xFFFFA500 or something like that, which "damages" the two top bytes.

My code right now looks like this:

public long getUInt32() throws EOFException, IOException {
    byte[] bytes = getBytes(4);
    long value = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
    return value & 0x00000000FFFFFFFFL;
}

If I convert the four bytes 0x67 0xA5 0x72 0x50, the result is 0xFFFFA567 instead of 0x5072A567.

Edit: This works great:

public long getUInt32() throws EOFException, IOException {
    byte[] bytes = getBytes(4);
    long value = bytes[0] & 0xFF;
    value |= (bytes[1] << 8) & 0xFFFF;
    value |= (bytes[2] << 16) & 0xFFFFFF;
    value |= (bytes[3] << 24) & 0xFFFFFFFF;
    return value;
}

But isn't there a better way to do this? Ten bit operations seem a "bit" much for a simple thing like this. (See what I did there?) =)

3
  • If the variable you are using is a long, the ALU performs the operation on 64 bits. If the variable is an int, the ALU operates on 32 bits (and leaves the other 32 bits of its capability unused). Operations on a byte most likely leave 56 bits of the ALU unused. These operations typically take a single clock cycle each, so there is not a "bit" of point in saying that 10 bit operations are too many. Commented Nov 2, 2012 at 22:11
  • Nope, your working implementation is exactly the right approach. Commented Nov 2, 2012 at 22:48
  • You don't need the last bitwise AND operation in your code above: value |= (bytes[3] << 24) & 0xFFFFFFFF; Commented Mar 13, 2014 at 13:11

2 Answers

6

A more regular version converts the bytes to their unsigned values as integers first:

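public long getUInt32() throws EOFException, IOException {
    byte[] bytes = getBytes(4);
    // Mask each byte with 0xFF first so it is widened to an int without sign extension,
    // then shift it into its little-endian position.
    long value =
        ((bytes[0] & 0xFF) <<  0) |
        ((bytes[1] & 0xFF) <<  8) |
        ((bytes[2] & 0xFF) << 16) |
        // Cast to long before shifting so bit 31 does not become an int sign bit.
        ((long) (bytes[3] & 0xFF) << 24);
    return value;
}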

Don't get hung up on the number of bit operations; most likely the compiler will optimize them down to byte operations.

Also, you shouldn't use long for 32-bit values just to avoid the sign. You can use int and, most of the time, ignore the fact that it is signed. See this answer.
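To illustrate the point about keeping the value in an int: the sketch below assumes Java 8 or later, where java.lang.Integer offers static helpers that treat an int as unsigned. The class name and the sample value are made up for illustration.

```java
// Hypothetical demo class; assumes Java 8+ for the Integer.*Unsigned helpers.
final class UnsignedIntDemo {
    /** Widens an int to a long while interpreting its 32 bits as unsigned. */
    static long asUnsignedLong(int raw) {
        return Integer.toUnsignedLong(raw);
    }

    public static void main(String[] args) {
        // The top bit is set, so Java shows this int as negative by default.
        int raw = 0xF072A567;

        System.out.println(raw);                                 // -260922009
        System.out.println(Integer.toUnsignedString(raw));       // 4034045287
        System.out.println(Integer.toHexString(raw));            // f072a567
        System.out.println(asUnsignedLong(raw));                 // 4034045287

        // Comparisons need the unsigned variant; a plain > would say raw < 1.
        System.out.println(Integer.compareUnsigned(raw, 1) > 0); // true
    }
}
```

The bit pattern is the same either way, so addition, subtraction, multiplication and the bitwise operators already behave correctly on the int. Only printing, comparison, division and widening need the unsigned helpers.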

Update: The cast to long for the most significant byte is needed, because its most significant bit would otherwise be shifted into the sign bit of a 32-bit integer, potentially making it negative.
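The same masking pattern covers all three widths asked about in the question. The following is a minimal sketch that assumes little-endian data already sitting in a byte array; the class name, method names and offset parameter are hypothetical and not part of the original code. The main method also shows what goes wrong without the cast to long.

```java
// Hypothetical helper class; assumes little-endian input in a byte[].
final class LittleEndianReader {
    private LittleEndianReader() {}

    /** Unsigned 8-bit value at offset, returned as an int in the range 0..255. */
    static int readUInt8(byte[] buf, int offset) {
        return buf[offset] & 0xFF;
    }

    /** Unsigned 16-bit value at offset, returned as an int in the range 0..65535. */
    static int readUInt16(byte[] buf, int offset) {
        return (buf[offset] & 0xFF) | ((buf[offset + 1] & 0xFF) << 8);
    }

    /** Unsigned 32-bit value at offset, returned as a non-negative long. */
    static long readUInt32(byte[] buf, int offset) {
        return (buf[offset] & 0xFF)
                | ((buf[offset + 1] & 0xFF) << 8)
                | ((buf[offset + 2] & 0xFF) << 16)
                | ((long) (buf[offset + 3] & 0xFF) << 24); // widen before shifting
    }

    public static void main(String[] args) {
        byte[] data = {0x67, (byte) 0xA5, 0x72, 0x50};
        System.out.println(Long.toHexString(readUInt32(data, 0))); // 5072a567

        // Without the cast, the shift happens in 32-bit int arithmetic and the
        // negative result is sign-extended when it is widened to long.
        byte[] high = {0x00, 0x00, 0x00, (byte) 0x80};
        long wrong = (high[3] & 0xFF) << 24;
        System.out.println(Long.toHexString(wrong));               // ffffffff80000000
        System.out.println(Long.toHexString(readUInt32(high, 0))); // 80000000
    }
}
```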


1 Comment

You need to use ((long) (bytes[3] & 0xFF) << 24) to get the right sign.
3

You've got the right idea, and I don't think there's any obvious improvement. If you look at the java.io.DataInput.readInt spec, it contains code for the same thing. It switches the order of << and &, but is otherwise the standard approach.

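There is no way to reinterpret part of a byte array as an int directly. Short of wrapping the array in a java.nio.ByteBuffer or using a memory-mapped region (which is way overkill for this), you have to assemble the value from the individual bytes.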
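java.nio.ByteBuffer can wrap an existing byte[] and decode a little-endian int from it in a single call, without any memory mapping. A minimal sketch (the class and method names are made up for illustration):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class built on the standard java.nio.ByteBuffer API.
final class ByteBufferUInt32 {
    private ByteBufferUInt32() {}

    /** Reads a little-endian unsigned 32-bit value starting at the given offset. */
    static long readUInt32LE(byte[] bytes, int offset) {
        int signed = ByteBuffer.wrap(bytes)         // shares the array, no copy
                .order(ByteOrder.LITTLE_ENDIAN)     // ByteBuffer defaults to big-endian
                .getInt(offset);                    // absolute read of 4 bytes
        return signed & 0xFFFFFFFFL;                // long mask drops the sign extension
    }

    public static void main(String[] args) {
        byte[] data = {0x67, (byte) 0xA5, 0x72, 0x50};
        System.out.println(Long.toHexString(readUInt32LE(data, 0))); // 5072a567
    }
}
```

Note the L suffix on the mask: 0xFFFFFFFF without it is the int value -1, which would leave the sign-extended upper bits in place.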

Of course, you could use a DataInputStream directly instead of reading into a byte[] first:

DataInputStream d = new DataInputStream(new FileInputStream("myfile"));
d.readInt();

DataInputStream reads big-endian values, the opposite of the byte order you are using, so you'll also need some Integer.reverseBytes calls. It won't be any faster, but it's cleaner.
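A sketch of how the byte swapping could look for the three widths from the question. The class and method names are hypothetical, and a ByteArrayInputStream stands in for the file so that the example is self-contained:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helpers layered on java.io.DataInputStream, which reads big-endian.
final class DataInputLittleEndian {
    private DataInputLittleEndian() {}

    /** A single byte has no byte order; readUnsignedByte already returns 0..255. */
    static int readUInt8(DataInputStream in) throws IOException {
        return in.readUnsignedByte();
    }

    /** Swaps the two bytes, then masks to undo sign extension of the short. */
    static int readUInt16LE(DataInputStream in) throws IOException {
        return Short.reverseBytes(in.readShort()) & 0xFFFF;
    }

    /** Swaps the four bytes, then widens to a non-negative long. */
    static long readUInt32LE(DataInputStream in) throws IOException {
        return Integer.reverseBytes(in.readInt()) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x67, (byte) 0xA5, 0x72, 0x50, 0x34, (byte) 0x92, (byte) 0xFE};
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            System.out.println(Long.toHexString(readUInt32LE(in)));    // 5072a567
            System.out.println(Integer.toHexString(readUInt16LE(in))); // 9234
            System.out.println(Integer.toHexString(readUInt8(in)));    // fe
        }
    }
}
```

When reading from a real file, wrapping the FileInputStream in a BufferedInputStream before the DataInputStream avoids one system call per small read.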
