Performance reading binary files

Question

I have a program that reads from a really big binary file (48 MB) and then passes the data to a matrix of custom structs named pixel:

struct pixel {
    int r;
    int g;
    int b;
};

Opening the file:

ifstream myFile(inputPath, ios::binary);
pixel **matrixPixel;

The read of the file is done this way:

int position = 0;

for (int i = 0; i < HEIGHT; ++i) {
    for (int j = 0; j < WIDTH; ++j) {
        if (!myFile.eof()) {
            myFile.seekg(position, ios::beg);
            myFile.read((char *) &matrixPixel[i][j].r, 1); // red byte
            myFile.seekg(position + HEIGHT * WIDTH, ios::beg);
            myFile.read((char *) &matrixPixel[i][j].g, 1); // green byte
            myFile.seekg(position + HEIGHT * WIDTH * 2, ios::beg);
            myFile.read((char *) &matrixPixel[i][j].b, 1); // blue byte
            ++position;
        }
    }
}
myFile.close();

The thing is that, for a big file like the one at the beginning, it takes a lot of time (~7 min) and it's supposed to be optimized. How could I read from the file in less time?

How did you come up with that seekg business? No wonder that's slow.
– Baum mit Augen ♦, Commented Nov 14, 2016 at 17:16
Did you try just bit blitting it: seeking once per RGB triplet and reading all 3 in one I/O? 3 ints are probably aligned OK.
– pm100, Commented Nov 14, 2016 at 17:18
Anyway, you don't have to seekg, as @BaummitAugen said. It makes much, much, much more sense to access the file sequentially and jump around your matrixPixel instead of trying to jump around your file.
– Marcus Müller, Commented Nov 14, 2016 at 17:19
Really what you should do is store all of the pixels in a flat array/vector and then you can read them all in at one time with a read call.
– NathanOliver, Commented Nov 14, 2016 at 17:19

Xirema · Accepted Answer · 2016-11-14 18:45:35Z

7

So, the structure of the data you're storing in memory looks like this:

rgbrgbrgbrgbrgbrgbrgbrgbrgbrgb..............rgb

But the structure of the file you're reading looks like this (assuming your code's logic is correct):

rrrrrrrrrrrrrrrrrrrrrrrrrrr....
ggggggggggggggggggggggggggg....
bbbbbbbbbbbbbbbbbbbbbbbbbbb....

And in your code, you're translating between the two. Fundamentally, that's going to be slow. And what's more, you've chosen to read the file by making manual seeks to arbitrary points in the file. That's going to slow things down even more.
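To make the two layouts concrete, here is a minimal sketch of the index arithmetic, assuming the planar layout described above (all red bytes, then all green, then all blue). The helper names `planarOffset` and `interleavedOffset` are hypothetical and do not appear in the question.

```cpp
#include <cstddef>

// Channel indices used below: 0 = red, 1 = green, 2 = blue.

// Byte offset of one channel of pixel (row, col) in a planar file:
// all reds first, then all greens, then all blues.
constexpr std::size_t planarOffset(std::size_t row, std::size_t col,
                                   std::size_t channel,
                                   std::size_t width, std::size_t height) {
    return channel * width * height + row * width + col;
}

// Byte offset of the same value in an interleaved (rgbrgb...) layout.
constexpr std::size_t interleavedOffset(std::size_t row, std::size_t col,
                                        std::size_t channel,
                                        std::size_t width) {
    return (row * width + col) * 3 + channel;
}
```

For a 4x2 image, the green byte of pixel (1, 2) sits at offset 14 in the planar layout and at offset 19 in the interleaved one. In the planar file, the three channels of a single pixel are `width * height` bytes apart, which is why the original loop has to seek before every one-byte read.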

The first thing you can do is streamline the hard disk reads by dropping the seeks and reading the file sequentially, one channel at a time:

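for (int channel = 0; channel < 3; ++channel) {
    for (int i = 0; i < HEIGHT; ++i) {
        for (int j = 0; j < WIDTH; ++j) {
            // Read into a byte first: writing 1 byte into an int leaves its other bytes untouched.
            unsigned char value = 0;
            if (!myFile.read(reinterpret_cast<char *>(&value), 1)) {
                break; // EOF or read error; the stream stays failed, so later reads fail too
            }
            switch (channel) {
                case 0: matrixPixel[i][j].r = value; break;
                case 1: matrixPixel[i][j].g = value; break;
                case 2: matrixPixel[i][j].b = value; break;
            }
        }
    }
}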

That requires the fewest changes to your code, and will speed up your code, but the code will probably still be slow.

A better approach, which increases CPU use but dramatically reduces hard disk use (which, in the vast majority of applications, will result in a speed-up), would be to read each channel in bulk and assemble the pixels in memory:

std::vector<unsigned char> reds(WIDTH * HEIGHT);
std::vector<unsigned char> greens(WIDTH * HEIGHT);
std::vector<unsigned char> blues(WIDTH * HEIGHT);

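// read() expects a char*, so the unsigned char buffers need a cast.
// The stream can be checked after each call for errors resulting from EOF or other issues.
myFile.read(reinterpret_cast<char*>(reds.data()), WIDTH * HEIGHT);
myFile.read(reinterpret_cast<char*>(greens.data()), WIDTH * HEIGHT);
myFile.read(reinterpret_cast<char*>(blues.data()), WIDTH * HEIGHT);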

std::vector<pixel> pixels(WIDTH * HEIGHT);

for(size_t index = 0; index < WIDTH * HEIGHT; index++) {
    pixels[index].r = reds[index];
    pixels[index].g = greens[index];
    pixels[index].b = blues[index];
}
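The comment about checking the stream can be made concrete. What follows is a minimal sketch, assuming the file holds exactly three planes of `width * height` bytes each. The function names `readPlane` and `readPlanar`, and the choice to throw `std::runtime_error` on a short read, are illustrative assumptions rather than part of the answer.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

struct pixel {
    int r;
    int g;
    int b;
};

// Reads one plane of `count` bytes and fails loudly on a short read.
std::vector<unsigned char> readPlane(std::istream& in, std::size_t count) {
    std::vector<unsigned char> plane(count);
    in.read(reinterpret_cast<char*>(plane.data()),
            static_cast<std::streamsize>(count));
    if (static_cast<std::size_t>(in.gcount()) != count) {
        throw std::runtime_error("unexpected end of file");
    }
    return plane;
}

// Reads a planar file (reds, then greens, then blues) with three bulk reads
// and combines the planes into pixels in memory.
std::vector<pixel> readPlanar(const std::string& path,
                              std::size_t width, std::size_t height) {
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }

    const std::size_t count = width * height;
    const std::vector<unsigned char> reds = readPlane(file, count);
    const std::vector<unsigned char> greens = readPlane(file, count);
    const std::vector<unsigned char> blues = readPlane(file, count);

    std::vector<pixel> pixels(count);
    for (std::size_t i = 0; i < count; ++i) {
        pixels[i] = {reds[i], greens[i], blues[i]};
    }
    return pixels;
}
```

Comparing `gcount()` with the requested size catches a truncated file immediately, instead of silently leaving part of a plane zero-filled.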

The final, best approach is to change how the binary file is formatted, because the way it appears to be formatted is insane (from a performance perspective). If the file is reformatted to the rgbrgbrgbrgbrgb style (which is far more standard in the industry), your code simply becomes this:

struct pixel {
    unsigned char red, green, blue;
}; //You'll never read values above 255 when doing byte-length color values.
std::vector<pixel> pixels(WIDTH * HEIGHT);
myFile.read(reinterpret_cast<char*>(pixels.data()), WIDTH * HEIGHT * 3);

This is extremely short, and is probably going to outperform all the other methods. But of course, that may not be an option for you.
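The single `read` above relies on `pixel` occupying exactly three bytes with no padding. That is what mainstream compilers produce for three `unsigned char` members, but a `static_assert` makes the assumption explicit. A minimal sketch, where the function name `readInterleaved` and the exception type are assumptions for illustration:

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

struct pixel {
    unsigned char red, green, blue;
};

// The bulk read below is only valid if the struct has no padding.
static_assert(sizeof(pixel) == 3, "pixel must be exactly 3 bytes");

// Reads an interleaved (rgbrgb...) file straight into a vector of pixels.
std::vector<pixel> readInterleaved(const std::string& path,
                                   std::size_t width, std::size_t height) {
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }

    std::vector<pixel> pixels(width * height);
    const std::streamsize bytes =
        static_cast<std::streamsize>(pixels.size() * sizeof(pixel));

    file.read(reinterpret_cast<char*>(pixels.data()), bytes);
    if (file.gcount() != bytes) {
        throw std::runtime_error("unexpected end of file");
    }
    return pixels;
}
```

If the assertion ever fired on some platform, the fallback would be to read into a plain byte buffer and copy the values into the structs one by one.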

I haven't tested any of these methods (and there may be a typo or two) but all of these methods should be faster than what you're currently doing.

edited Nov 14, 2016 at 18:45

answered Nov 14, 2016 at 17:34


4 Comments

Martin Bonner supports Monica Over a year ago

The format is sane, if it is (say) an astronomical picture taken through three filters, and the full image has been formed by concatenating the "red", "green", and "blue" images.

Martin Bonner supports Monica Over a year ago

The first thing will probably reduce the time to read to pretty much the minimum.

Xirema Over a year ago

@MartinBonner Reading in bulk, like the second and third examples do, will dramatically reduce read times. Reading one character at a time, even sequentially, is slower than reading in bulk.

danielsto Over a year ago

@MartinBonner thanks, the first one is way faster. I'm still having some issues with the second version, but would it be the same for writing?

Thomas Matthews · Accepted Answer · 2016-11-14 17:35:10Z

0

A faster method would be to read the bitmap into a buffer:

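static uint8_t buffer[HEIGHT][WIDTH]; // static storage: an array this large could overflow the stack
const unsigned int bitmap_size_in_bytes = sizeof(buffer);
myFile.read(reinterpret_cast<char*>(buffer), bitmap_size_in_bytes); // read() expects a char*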

An even faster method is to read more than one bitmap into memory.
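One way to read more than one bitmap at once: the three colour planes are contiguous in the file, so a single call can pull in all of them. This is a sketch that assumes the planar layout from the question. `readAllPlanes` is a hypothetical name, and a `std::vector` is used instead of a fixed-size array so that the buffer lives on the heap.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads the red, green and blue planes with one read() call.
// Plane c (0 = red, 1 = green, 2 = blue) starts at offset
// c * width * height in the returned buffer.
std::vector<std::uint8_t> readAllPlanes(const std::string& path,
                                        std::size_t width, std::size_t height) {
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }

    std::vector<std::uint8_t> buffer(width * height * 3);
    const std::streamsize bytes = static_cast<std::streamsize>(buffer.size());

    file.read(reinterpret_cast<char*>(buffer.data()), bytes);
    if (file.gcount() != bytes) {
        throw std::runtime_error("unexpected end of file");
    }
    return buffer;
}
```

With the result in `buffer`, the green value of pixel `i` is `buffer[width * height + i]` and the blue value is `buffer[2 * width * height + i]`.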

answered Nov 14, 2016 at 17:35

