I have a binary file which consists of Delphi records. The record looks like:

TRMapFileHeader = record
    FileType: String[8];
    Points: Int64;
    Objects: Int64;
    Text: Int64;
    ObjLayers: byte;
    TextLayers: byte;
  end;

I want to read this file in Java. I opened the file:

DataInputStream file = new DataInputStream(new FileInputStream(filename));

and then I tried to read the data:

for(int i = 0; i<8; i++)
    System.out.print((char)file.readByte());
System.out.println();
System.out.println(file.readLong());
System.out.println(file.readLong());
System.out.println(file.readLong());
System.out.println(file.readByte());
System.out.println(file.readByte());

and I got

[image: Eclipse output]

instead of the correct data, which is:

RMF
441434
80457
14186
11
4

I experimented with different ways of reading and found the following:

System.out.println(file.readByte());
for(int i = 0; i<3; i++)
    System.out.print((char)file.readByte());

for(int i = 0; i<36; i++)
    file.readByte();

System.out.println();
System.out.println(file.readByte());
System.out.println(file.readByte());

gives the following output: [image: Eclipse output]. The first byte equals 3, then come 3 characters, then 36 bytes, and then the last 2 fields of the record.

So I'm wondering how to read this kind of record.

  • Consider using packed records in Delphi so you don't have to deal with alignment. Commented Oct 27, 2013 at 21:30
  • Why would you use packed records? That will cause breakage elsewhere if you reuse the record somewhere else. Commented Oct 28, 2013 at 0:32
  • @MarcusAdams Hmm, not sure about that. Using records to binary blit data is so 1970s! BinaryWriter/BinaryReader, for example, would make more sense these days. Packing records just makes the performance suck. Commented Oct 28, 2013 at 13:03
  • I wonder why not just take any hex editor/viewer and parse the file by trial and error, then recreate the parsing in Java? Commented Oct 29, 2013 at 11:41
  • @Arioch'The Well, I guess trial and error is what you might resort to if you could not work it out from first principles. But how would you know for sure that you had got it right? If you tossed a coin and got H,T,H,T,H,T you might conclude that coin tossing results in an alternating sequence. Commented Oct 29, 2013 at 17:01

1 Answer

The Delphi type String[8] is a short string. Its implementation contains an extra lead byte containing the length of the string. So, the size of String[8] is 9 bytes.

You'll need to read the first byte to find the length, and then the next 8 bytes for the payload. Remember that the first byte tells you how many of the subsequent 8 bytes carry meaning.
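A minimal Java sketch of that read. The helper name readShortString is made up for illustration, and the ISO-8859-1 charset is an assumption (the question does not say which single-byte encoding the file uses):

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class ShortStringReader {
    // Reads a Delphi String[capacity]: 1 length byte followed by `capacity` payload bytes.
    static String readShortString(DataInputStream in, int capacity) throws IOException {
        int length = in.readUnsignedByte();   // how many payload bytes carry meaning
        byte[] payload = new byte[capacity];
        in.readFully(payload);                // always consume the whole payload
        // Guard against a corrupt length byte that exceeds the declared capacity.
        int used = Math.min(length, capacity);
        // Assumption: one byte per character; adjust the charset to match the file.
        return new String(payload, 0, used, StandardCharsets.ISO_8859_1);
    }
}
```

For the record in the question this would be called as readShortString(file, 8), which consumes 9 bytes regardless of the stored length.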

The other thing to watch out for is alignment. As described in the question, the record would appear to be aligned. Whether or not it is depends upon the Delphi compiler settings. It's possible that the Delphi compiler was instructed to pack the records.

Let's assume not. In other words, let us assume that the record is aligned. In order for the fields to be aligned correctly, the Int64 fields will be placed on 8 byte boundaries. This means that the layout of the record will look like this:

Offset  Length  Field
 0      9       FileType, 1 byte length, 8 bytes payload
 9      7       <padding>
16      8       Points
24      8       Objects
32      8       Text
40      1       ObjLayers
41      1       TextLayers
42      6       <padding>

The total length of the record is 48 due to the padding at the end of the record. This will be important because if you don't skip over the padding at the end of the record, you'll be at the wrong place to read whatever comes next in the file.
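The offsets in the table can be reproduced with the usual natural-alignment rule: round each field's offset up to a multiple of its alignment, then round the total size up to the largest alignment. The sketch below is only an illustration of that rule, not Delphi's actual implementation; the class and method names are hypothetical, and it assumes the short string and the byte fields have alignment 1 and Int64 has alignment 8.

```java
import java.util.Arrays;

class RecordLayout {
    // Rounds offset up to the next multiple of alignment (alignment > 0).
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) / alignment * alignment;
    }

    // Returns each field's offset, followed by the total record size as the last element.
    static int[] layout(int[] sizes, int[] alignments) {
        int[] result = new int[sizes.length + 1];
        int offset = 0;
        int maxAlignment = 1;
        for (int i = 0; i < sizes.length; i++) {
            offset = alignUp(offset, alignments[i]); // insert padding before the field
            result[i] = offset;
            offset += sizes[i];
            maxAlignment = Math.max(maxAlignment, alignments[i]);
        }
        result[sizes.length] = alignUp(offset, maxAlignment); // trailing padding
        return result;
    }

    public static void main(String[] args) {
        // FileType, Points, Objects, Text, ObjLayers, TextLayers
        int[] sizes      = {9, 8, 8, 8, 1, 1};
        int[] alignments = {1, 8, 8, 8, 1, 1};
        System.out.println(Arrays.toString(layout(sizes, alignments)));
        // prints [0, 16, 24, 32, 40, 41, 48]
    }
}
```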

A cursory examination of your output would indicate that the record is indeed aligned rather than packed. Your second block of code reads 40 bytes (1 + 3 + 36), and then the next two bytes (at offsets 40 and 41) are 11 and 4, which matches my table above.

One final point to note is that it is likely that the Delphi program that generated these files wrote little endian integers. Java's DataInputStream reads multi-byte values as big endian, so you'll need to perform a little to big endian conversion on the integer fields, for example using java.nio.ByteBuffer.
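As a rough sketch of the ByteBuffer approach, the class below decodes one record from a 48 byte array using absolute offsets. It assumes the aligned layout described in this answer, little endian integers, and a single-byte charset (ISO-8859-1 is a guess); the class name MapFileHeader and its field names are invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class MapFileHeader {
    static final int RECORD_SIZE = 48; // aligned size, including trailing padding

    final String fileType;
    final long points;
    final long objects;
    final long text;
    final int objLayers;
    final int textLayers;

    private MapFileHeader(String fileType, long points, long objects, long text,
                          int objLayers, int textLayers) {
        this.fileType = fileType;
        this.points = points;
        this.objects = objects;
        this.text = text;
        this.objLayers = objLayers;
        this.textLayers = textLayers;
    }

    // Decodes one record; offsets assume the aligned (non-packed) layout.
    static MapFileHeader parse(byte[] record) {
        if (record.length < RECORD_SIZE) {
            throw new IllegalArgumentException("need at least " + RECORD_SIZE + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(record).order(ByteOrder.LITTLE_ENDIAN);
        int length = Math.min(buf.get(0) & 0xFF, 8);   // short string length byte
        String fileType = new String(record, 1, length, StandardCharsets.ISO_8859_1);
        return new MapFileHeader(
                fileType,
                buf.getLong(16),       // Points
                buf.getLong(24),       // Objects
                buf.getLong(32),       // Text
                buf.get(40) & 0xFF,    // ObjLayers (unsigned byte)
                buf.get(41) & 0xFF);   // TextLayers (unsigned byte)
    }
}
```

To feed it from the stream in the question, read exactly MapFileHeader.RECORD_SIZE bytes with file.readFully(new byte[48]) so that the trailing padding is consumed as well.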

Let's check out this hypothesis. You state that the three longs that you read have these values:

6538107356104884224
5276531012929585152
7653586091739447296

And converted to hex we have:

5ABC060000000000
493A010000000000
6A37000000000000

Let's reverse the bytes (skipping the leading zero bytes):

6BC5A
13A49
376A

which in decimal are

441434
80457
14186

And those are your desired values. Phew, we got there in the end!
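The same arithmetic can be checked in Java with Long.reverseBytes, which is also a simple way to fix up a value that readLong() has already returned. The class and method names here are made up for illustration.

```java
class EndianCheck {
    // Converts a value that was read as big endian back to the
    // little endian value that was originally written.
    static long fromLittleEndian(long readAsBigEndian) {
        return Long.reverseBytes(readAsBigEndian);
    }

    public static void main(String[] args) {
        long[] raw = {6538107356104884224L, 5276531012929585152L, 7653586091739447296L};
        for (long value : raw) {
            System.out.println(Long.toHexString(value) + " -> " + fromLittleEndian(value));
        }
        // prints:
        // 5abc060000000000 -> 441434
        // 493a010000000000 -> 80457
        // 6a37000000000000 -> 14186
    }
}
```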

5 Comments

System.out.println(file.readByte()); for(int i = 0; i<8; i++) System.out.print((char)file.readByte()); for(int i = 0; i<7; i++) file.readByte(); System.out.println(); System.out.println(file.readLong()); System.out.println(file.readLong()); System.out.println(file.readLong()); System.out.println(file.readByte()); System.out.println(file.readByte()); Didn't help. The last two parameters are fine but 'Points', 'Objects' and 'Text' are 6538107356104884224, 5276531012929585152, 7653586091739447296
I read 1 byte, then 8, then 7(padding) then 3 times LongInt(3 x 8 bytes) and last 2 bytes (which are correct '11' and '4') but those 3 LongInt values are wrong.
OK, answer updated. Only plausible explanation is endianness. I guess Java is big endian.
Can you please explain about padding? Why 7 and 6 and why are they where they are? Or can you share the link to the literature?
In simple terms, padding is added so that fields start at an offset that is an exact multiple of the field's type's alignment. An Int64 has alignment 8 and so needs to be placed at offset 0 or 8 or 16 and so on. Read the Wikipedia topic on padding and alignment.
