
I have a question about binary I/O and the portability of the resulting binary file. Let's say the PC running my software uses 8 bytes to store a double variable. The generated binary file will then contain 8 bytes for each double. Now suppose the file is opened on a PC that uses 6 bytes for a double (just as an assumption). The application would read only 6 bytes from the file and store them in the double variable in memory. Not only can this underflow/overflow the value, but all data read after the double will definitely be incorrect because of the 2-byte offset created by under-reading. I want my application to support not only 32-bit and 64-bit systems, but also both Windows and Ubuntu PCs. So how do you make sure that the data read from the same file on any PC is the same?

  • There are more incompatibilities, endianness for example! Best is to use text or other standard formats. Commented Jan 10, 2014 at 14:02
  • Look at serialization/de-serialization. Commented Jan 10, 2014 at 14:02
  • The data is too large for a text file, typically gigabytes. Reading and writing that many double variables as text takes a huge amount of time, hence the move to binary files. Commented Jan 10, 2014 at 17:49
  • Doubles are an IEEE standard, so they're guaranteed to always be 64 bits. Size is not an issue here, just endianness (a much less likely problem to run into). Commented Jan 10, 2014 at 19:41
  • @Cool_Coder Only a few types have fixed sizes, perhaps because they didn't originate in C. The integer types and most of the other basic ones have, as you've seen, specified ranges they can fall into. Floats and doubles, however, are IEEE standards and have very specific layouts and sizes, which should never change (particularly since processors often have registers for them now, which follow the standard). They are something of an exception, but may be a useful one to you in this case. Commented Jan 12, 2014 at 2:55
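Where that assumption matters, it can be checked at compile time; a minimal sketch (assuming C++11 static_assert):

#include <limits>

// Compilation fails on any platform where double is not a 64-bit
// IEEE 754 type, so a mismatched binary layout is caught early.
static_assert(std::numeric_limits<double>::is_iec559,
              "double is not IEEE 754 on this platform");
static_assert(sizeof(double) == 8, "double is not 8 bytes");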

3 Answers


In general, you should wrap the data to be stored in your own data structures and implement platform-independent read/write operations for those structures. Basically, the size of a binary data structure written to disk should be the same on all platforms: the maximum possible size of the elementary data across all supported platforms.

When writing on a platform with a smaller data size, the data should be padded with extra 0 bytes so that the size of the recorded data stays the same.

When reading, the whole record can be read in fixed blocks of known size, and a conversion performed depending on the platform it was written on and the platform it is being read on. This takes care of endianness too. You may want to include a header indicating the data sizes, to distinguish between files recorded on different platforms when reading them.
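For instance, the header and the platform check might look like the following sketch (the FileHeader layout and field names are illustrative assumptions, not prescribed here):

#include <cstdint>
#include <cstring>

// Illustrative header written at the start of the file so the reader
// knows how the data was recorded.
struct FileHeader
{
    uint8_t doubleSize;   // sizeof(double) on the writing platform
    uint8_t littleEndian; // 1 if the writer was little-endian
};

// Detect the endianness of the current platform at run time.
bool isLittleEndian()
{
    uint16_t probe = 1;
    uint8_t firstByte;
    std::memcpy(&firstByte, &probe, sizeof(firstByte));
    return firstByte == 1;
}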

This gives truly platform-independent serialization for the binary file.

Example for doubles

#include <fstream>

class CustomDouble
{
public:
    double val;
    static const int DISK_SIZE; // fixed on-disk size, independent of sizeof(double)

    void toFile(std::ofstream &file)
    {
        // Write the raw bytes of the double, then pad with zero bytes
        // until exactly DISK_SIZE bytes have been written.
        int bytesWritten(0);
        file.write(reinterpret_cast<const char*>(&val), sizeof(val));
        bytesWritten += sizeof(val);
        while (bytesWritten < CustomDouble::DISK_SIZE)
        {
            char byte(0);
            file.write(&byte, sizeof(byte));
            bytesWritten += sizeof(byte);
        }
    }
};

const int CustomDouble::DISK_SIZE = 8;

This ensures you always write 8 bytes regardless of the size of double on your platform. When you read the file, you always read those 8 bytes, still as binary, and do conversions if necessary depending on which platform it was written on and which it is being read on (you will probably add a small header to the file to identify the platform it was recorded on).
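The matching read could look like this sketch (fromFile is an illustrative counterpart, assuming the native double is no larger than DISK_SIZE):

#include <fstream>

// Read sizeof(double) bytes into the value, then skip the padding so
// the stream position advances by exactly DISK_SIZE bytes.
void fromFile(std::ifstream &file, CustomDouble &d)
{
    file.read(reinterpret_cast<char*>(&d.val), sizeof(d.val));
    int padding = CustomDouble::DISK_SIZE - static_cast<int>(sizeof(d.val));
    if (padding > 0)
        file.ignore(padding); // discard the zero padding
}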

While the custom conversion adds some overhead, it is far less than that of storing the values as text, and normally you will only perform conversions for incompatible platforms; on the same platform there is no overhead.


3 Comments

I am storing a 2D array of double variables in my application, so the I/O is with standard double variables. So you mean I should hard-code a standard size for double in my application, and depending on the environment the file I/O can be adjusted to fit this hard-coded size? But this can result in underflow and overflow of the data. The whole point of using binary files was to avoid conversion, otherwise I would have been happy with text files.
But you are assuming that the maximum size a double can have is 8 bytes. Is this correct?
@Cool_Coder Yes, but just as an example. It should be the maximum possible size across the supported platforms. It is probably a good idea to also add a static_assert comparing sizeof(double) with CustomDouble::DISK_SIZE, to prevent the project from compiling on a platform with a larger double.
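That guard can be a single line (a sketch, assuming C++11):

static_assert(sizeof(double) <= CustomDouble::DISK_SIZE,
              "double does not fit in the fixed on-disk size");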

cstdint provides type definitions that have a fixed size, so int32_t will always be 4 bytes long. You can use these in place of the regular types whenever the size of the type matters to you.
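For example (a minimal sketch; the record layout is made up for illustration):

#include <cstdint>
#include <fstream>

// Fixed-width fields have the same on-disk size on every platform.
void writeRecord(std::ofstream &file, int32_t id, int64_t count)
{
    file.write(reinterpret_cast<const char*>(&id), sizeof(id));       // always 4 bytes
    file.write(reinterpret_cast<const char*>(&count), sizeof(count)); // always 8 bytes
}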

2 Comments

I am using double variables to store data. Any solution for this?
Doubles have a fixed size, so you're already doing this (in essence). For any other types, specifying the size will help.

Use Google Protocol Buffers or any other cross-platform serialization library. You can also roll your own solution, based on the fact that sizeof(char) is guaranteed to be 1 (i.e. serialize everything into char arrays).
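A sketch of the roll-your-own idea for doubles (assuming IEEE 754 doubles and 8-bit chars; packDouble is a made-up name):

#include <cstdint>
#include <cstring>

// Copy the double's bit pattern into a char array in a fixed
// (little-endian) byte order, so the file layout does not depend
// on the writing platform.
void packDouble(double value, unsigned char out[8])
{
    uint64_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<unsigned char>(bits >> (8 * i));
}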

2 Comments

char is not guaranteed to be 1 byte IMHO. char is the unit for measuring the size of all other data types, i.e. sizeof(double) == 8 * sizeof(char), but a char may not equal 1 byte.
And exactly how does Google Protocol Buffers answer my question? I don't think that's what I am asking.
