I have a question about binary I/O and the portability of binary files. Let's say the PC running my software uses 8 bytes to store a double variable. The binary file it generates will then contain 8 bytes for each double. Now say the file is opened on a PC that uses 6 bytes for a double (just as an assumption). The application will read only 6 bytes from the file and store them in the double variable in memory. Not only can this overflow or truncate the value, but the data read after the double will definitely be incorrect because of the 2-byte offset created by the under-read. I want my application to support not only 32/64-bit systems, but also Windows and Ubuntu PCs. So how do you make sure that the data read from the same file on any PC is the same?
-
There are more incompatibilities, like endianness, for example! Best is to use text or other standard formats. – πάντα ῥεῖ, Jan 10, 2014 at 14:02
-
Look at serialization/de-serialization. – Jarod42, Jan 10, 2014 at 14:02
-
The data is too large to use a text file, typically gigabytes. Reading and writing that many double variables as text takes a humongous amount of time, hence the move to binary files. – Cool_Coder, Jan 10, 2014 at 17:49
-
Doubles are an IEEE standard, so they're guaranteed to always be 64 bits. Size is not an issue here, just endianness (a much less likely problem to run into). – ssube, Jan 10, 2014 at 19:41
-
@Cool_Coder Only a few types have fixed sizes, perhaps because they didn't originate in C. The integer types and most of the other basic ones have, as you've seen, specified ranges they can fall into. Floats and doubles, however, are IEEE standards and have very specific layouts and sizes, which should never change (particularly since processors often have registers for them now, which follow the standard). They are something of an exception, but may be a useful one to you in this case. – ssube, Jan 12, 2014 at 2:55
3 Answers
In general, you should wrap the data to be stored in binary files in your own data structures and implement platform-independent read/write operations for those structures. The size of a data structure written to disk should be the same on all platforms: the maximum possible size of the elementary data across all supported platforms.
When writing on a platform with a smaller data size, the data should be padded with extra zero bytes so that the size of the recorded data stays the same.
When reading, the whole record can be read in fixed blocks of known size, and conversion performed depending on the platform it was written on and the platform it is being read on. This takes care of endianness too. You may want to include a header indicating the data sizes, so that files recorded on different platforms can be distinguished when reading them.
This gives truly platform-independent serialization for the binary file.
Example for doubles
#include <fstream>

class CustomDouble
{
public:
    double val;
    static const int DISK_SIZE;

    void toFile(std::ofstream &file)
    {
        int bytesWritten(0);
        // Write the raw bytes of the double.
        file.write(reinterpret_cast<const char*>(&val), sizeof(val));
        bytesWritten += sizeof(val);
        // Pad with zero bytes up to the fixed on-disk size.
        while (bytesWritten < CustomDouble::DISK_SIZE)
        {
            char byte(0);
            file.write(&byte, sizeof(byte));
            bytesWritten += sizeof(byte);
        }
    }
};

const int CustomDouble::DISK_SIZE = 8;
This ensures you always write 8 bytes regardless of the size of double on your platform. When you read the file, you always read those 8 bytes as binary and perform conversions if necessary, depending on which platform it was written on and which platform it is being read on (you will probably add a small header to the file to identify the platform it was recorded on).
While the custom conversion adds some overhead, it is far less than that of storing the values as text, and normally you will only perform conversions for incompatible platforms; for the same platform there is no overhead.
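For the read side, a minimal sketch might look like the following; doubleFromFile is an illustrative name and not part of the answer above, and it assumes the file was written with 8-byte IEEE 754 doubles in an agreed little-endian byte order.

#include <cstdint>
#include <cstring>
#include <fstream>

// Hypothetical read-side counterpart to CustomDouble::toFile.
// Assumes the file stores an 8-byte IEEE 754 double in little-endian order,
// and that double shares the integer byte order on the reading platform.
double doubleFromFile(std::ifstream &file)
{
    char buffer[8] = {0};
    file.read(buffer, sizeof(buffer));      // always consume the fixed 8 bytes

    // Rebuild the bit pattern from the agreed little-endian wire order.
    std::uint64_t bits = 0;
    for (int i = 7; i >= 0; --i)
        bits = (bits << 8) | static_cast<unsigned char>(buffer[i]);

    double val;
    std::memcpy(&val, &bits, sizeof(val));
    return val;
}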
cstdint includes type definitions that are a fixed size, so int32_t will always be 4 bytes long. You can use these in place of regular types when the size of the type is important to you.
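A small sketch of how those fixed-width types might be used when writing a record; the Record struct and writeRecord are illustrative names only, and endianness still has to be agreed on separately.

#include <cstdint>
#include <fstream>

// Hypothetical record built only from fixed-width types, so its on-disk
// layout does not depend on the platform's int/long sizes.
struct Record
{
    std::int32_t id;      // always 4 bytes
    std::int64_t count;   // always 8 bytes
};

void writeRecord(std::ofstream &file, const Record &r)
{
    // Write each field separately to avoid struct padding differences.
    file.write(reinterpret_cast<const char*>(&r.id), sizeof(r.id));
    file.write(reinterpret_cast<const char*>(&r.count), sizeof(r.count));
}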
Use Google Protocol Buffers or any other cross-platform serialization library. You can also roll your own solution, based on the fact that char is guaranteed to be 1 byte (i.e. serialize everything into char arrays).
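A rough sketch of the roll-your-own approach, assuming 8-byte IEEE 754 doubles and a chosen little-endian wire order; packDouble and unpackDouble are illustrative names, not part of any library.

#include <cstdint>
#include <cstring>

// Serialize a double into 8 chars in a fixed (little-endian) byte order.
// Assumes double is 8 bytes and shares the platform's integer byte order.
void packDouble(double value, char out[8])
{
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<char>((bits >> (8 * i)) & 0xFF);
}

double unpackDouble(const char in[8])
{
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits |= static_cast<std::uint64_t>(static_cast<unsigned char>(in[i])) << (8 * i);
    double value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

Using the same pair of functions on both the writing and reading side keeps the on-disk format independent of either host's endianness.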