3

My problem is that I need to load a binary file and work with individual bits from it. Afterwards I need to write it back out as bytes, of course.

My main question is which data type to work in: char or long int? Can I somehow work with chars?

8
  • 2
    Btw, how long is your file? Is it really necessary to think about optimization already? And do you have to change single bytes, or are the 'single bits' chunks of bytes? Commented Mar 9, 2012 at 14:30
  • 5
    @Deepak: Using ints to parse binary data is just asking for endianness problems. Commented Mar 9, 2012 at 14:30
  • It depends on what operations he wants to do; ANDing 8 chars is equal to one int operation (x64). Commented Mar 9, 2012 at 14:34
  • Deepak: sizeof(long int) is not always the same as sizeof(int). It's certainly not on the setup I'm typing this on. Commented Mar 9, 2012 at 14:38
  • @Deepak: when it's the same, then why is sizeof(long int) != sizeof(int) here? Commented Mar 9, 2012 at 14:41

6 Answers

6

Unless performance is mission-critical here, use whatever makes your code easiest to understand and maintain.


4 Comments

Disregard my answer, this is rule #1
+1 And do not reinvent the wheel if possible, if you do not have to work with a predefined serialization format, don't go invent one.
Agree, even though it is so fun reinventing the wheel. "Look, mine is squared"
It's possible that a clarified question could invite a more detailed recommendation. Not clear to me that this needs to be overthought from the info to hand, though.
5

Before beginning to code anything, make sure you understand endianness, C++ type sizes, and how strange they might be.

unsigned char is the only type whose size is fixed: sizeof(unsigned char) is always 1, the natural byte of the machine (normally 8 bits). So if you design for portability, it is a safe bet. It isn't hard to use unsigned int or even long long instead to speed up the process, using sizeof (multiplied by CHAR_BIT) to find out how many bits you get in each read, although the code gets more complex that way.

You should know that, for true portability, none of the built-in C++ types has a fixed width. An unsigned char might have 9 bits, and an unsigned int might cover no more than the range 0 to 65535, as noted in this and this answer.
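The size guarantees described above can be checked at compile time. This is a minimal sketch; bit_width_of is a hypothetical helper name, not something from the standard library or from the original answer, and it counts storage bits (sizeof times CHAR_BIT), which may in theory include padding bits.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t

// Hypothetical helper: number of storage bits in an object of type T
// on the platform this code is compiled for.
template <typename T>
constexpr std::size_t bit_width_of()
{
    return sizeof(T) * CHAR_BIT;
}

// These checks hold on every conforming implementation.
static_assert(sizeof(unsigned char) == 1, "sizeof(unsigned char) is 1 by definition");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(bit_width_of<unsigned int>() >= 16, "unsigned int covers at least 0..65535");
```

If the program only makes sense with 8-bit bytes, a static_assert(CHAR_BIT == 8, "...") states that assumption explicitly instead of leaving it implicit.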

Another alternative, as user1200129 suggests, is to use the Boost integer library to reduce all these uncertainties, provided Boost is available on your platform. That said, if you are going to use external libraries anyway, there are many serialization libraries to choose from.

But first and foremost, before you even start optimizing, make something simple that works. You can start profiling once you actually run into timing issues.

2 Comments

Yeah, the world of programming gets strange as soon as you start exploring alien platforms ;)
You can use boost integer.hpp for portable int types. For example, if you need to ensure you get 64 signed bits, you can use boost::int64_t across different compilers and operating systems and you'll always get the type you expect. This is especially important when you need to reinterpret_cast data.
3

It really depends on what you want to do, but in general I would say the best speed comes from sticking with the integer size your program is compiled for. So for a 32-bit program choose 32-bit integers, and for a 64-bit program choose 64-bit ones.

This could change depending on whether your file holds individual bytes or whole integers. Without knowing the exact structure of your file, it's difficult to determine what the optimal choice is.
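As a rough illustration of working in word-sized chunks, the sketch below ANDs one byte buffer into another 64 bits at a time and handles the leftover bytes individually. The name and_in_place is hypothetical, and the choice of a 64-bit word is an assumption about the target. Whether this is actually faster than a plain byte loop has to be measured, since compilers often vectorize the simple loop on their own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: data[i] &= mask[i] for every byte, processed one
// 64-bit word at a time. Both buffers must have the same size.
void and_in_place(std::vector<unsigned char>& data,
                  const std::vector<unsigned char>& mask)
{
    assert(data.size() == mask.size());
    const std::size_t word = sizeof(std::uint64_t);
    std::size_t i = 0;

    // Whole words: memcpy avoids alignment and aliasing problems.
    for (; i + word <= data.size(); i += word) {
        std::uint64_t a, b;
        std::memcpy(&a, &data[i], word);
        std::memcpy(&b, &mask[i], word);
        a &= b;
        std::memcpy(&data[i], &a, word);
    }

    // Remaining tail bytes, one at a time.
    for (; i < data.size(); ++i)
        data[i] &= mask[i];
}
```

Bitwise AND treats every bit independently, so the result does not depend on byte order. Operations that do depend on it, such as shifts across byte boundaries, need more care in word-sized chunks.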


1

Your sentences are not really correct English, but as far as I can interpret the question, you are better off using the unsigned char type (which is a byte) so that you can modify each byte separately.
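To make the "single bits" part concrete, here is a minimal sketch of reading and changing individual bits in a buffer of unsigned char. The helper names get_bit and set_bit are hypothetical, and the bit numbering (bit 0 is the least significant bit of the first byte) is an assumption; a real file format may number its bits the other way round.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical helper: read bit number `index` of the buffer.
// Bit 0 is the least significant bit of data[0].
bool get_bit(const std::vector<unsigned char>& data, std::size_t index)
{
    assert(index / CHAR_BIT < data.size());
    return (data[index / CHAR_BIT] >> (index % CHAR_BIT)) & 1u;
}

// Hypothetical helper: set or clear bit number `index` of the buffer.
void set_bit(std::vector<unsigned char>& data, std::size_t index, bool value)
{
    assert(index / CHAR_BIT < data.size());
    const unsigned char mask =
        static_cast<unsigned char>(1u << (index % CHAR_BIT));
    if (value)
        data[index / CHAR_BIT] |= mask;                              // set
    else
        data[index / CHAR_BIT] &= static_cast<unsigned char>(~mask); // clear
}
```

Because the buffer stays a sequence of bytes throughout, it can be written back to the file unchanged in layout once the bits have been modified.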

Edit: changed according to comment.

6 Comments

What's an unsigned byte? byte is an unsigned char.
Now it is somewhat proper English. :)
Since there is no definition for byte in C, you can't say if it's signed or not.
@Michel you edited it the wrong way round. you were looking for unsigned char.
Fixed (Friday Afternoon Syndrome)
1

If you are dealing with bytes, the best way to do this is to use a fixed-width type.

#include <algorithm>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

int main()
{
     std::ifstream file("file_name", std::ios::binary);

     // read: istreambuf_iterator copies the raw bytes; istream_iterator
     // would use operator>> and silently skip whitespace bytes
     std::vector<std::int8_t> file_data((std::istreambuf_iterator<char>(file)),
                                        std::istreambuf_iterator<char>());

     // write: open in binary mode so that no newline translation happens
     std::ofstream out("outfile", std::ios::binary);
     std::copy(file_data.begin(), file_data.end(),
               std::ostreambuf_iterator<char>(out));

}

EDIT fixed bug

2 Comments

The uint8_t types are not guaranteed to be defined on all systems, but they state the intent much more clearly.
The C99 standard has been around a long time, and almost all systems have <stdint.h>. (I can't think of one that doesn't, honestly. It's one of the easiest headers ever to provide.) The C++ equivalent might not be there, but that's easily worked around.
1

If you need to enforce how many bits are in an integer type, you need to be using the <stdint.h> header. It is present in both C and C++. It defines types such as uint8_t (8-bit unsigned integer), which are guaranteed to resolve to the proper type on the platform. It also tells other programmers who read your code that the number of bits is important.

If you're worried about performance, you might want to use types larger than 8 bits, such as uint32_t. However, when reading and writing files, you will need to pay attention to the endianness of your system. Notably, if you have a little-endian system (e.g. x86, almost all ARM), then the 32-bit value 0x12345678 will be written to the file as the four bytes 0x78 0x56 0x34 0x12, while if you have a big-endian system (e.g. SPARC, PowerPC, Cell, some ARM, and the Internet), it will be written as 0x12 0x34 0x56 0x78. (The same goes for reading.) You can, of course, work with 8-bit types and avoid this issue entirely.
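One common way to sidestep the host's byte order is to convert between a 32-bit value and its bytes explicitly, using shifts, and to read and write only the bytes. The sketch below assumes the file format stores values most significant byte first; the function names to_big_endian and from_big_endian are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: split a 32-bit value into four bytes, most
// significant byte first, regardless of the host's endianness.
std::array<std::uint8_t, 4> to_big_endian(std::uint32_t value)
{
    return {{
        static_cast<std::uint8_t>(value >> 24),
        static_cast<std::uint8_t>(value >> 16),
        static_cast<std::uint8_t>(value >> 8),
        static_cast<std::uint8_t>(value)
    }};
}

// Hypothetical helper: rebuild the value from bytes stored in that order.
std::uint32_t from_big_endian(const std::array<std::uint8_t, 4>& bytes)
{
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8)  |
            static_cast<std::uint32_t>(bytes[3]);
}
```

Shifts operate on the value rather than on its memory layout, so the same code yields 0x12 0x34 0x56 0x78 for 0x12345678 on both little-endian and big-endian machines.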
