0

I have a program that reads from a really big binary file (48 MB) and then passes the data to a matrix of custom structs named pixel:

struct pixel {
    int r;
    int g;
    int b;
};

Opening the file:

ifstream myFile(inputPath, ios::binary);
pixel **matrixPixel;

The read of the file is done this way:

int position = 0;

for (int i = 0; i < HEIGHT; ++i) {
        for (int j = 0; j < WIDTH; ++j) {
            if (!myFile.eof()) {
                myFile.seekg(position, ios::beg);
                myFile.read((char *) &matrixPixel[i][j].r, 1); // red byte
                myFile.seekg(position + HEIGHT * WIDTH, ios::beg);
                myFile.read((char *) &matrixPixel[i][j].g, 1); // green byte
                myFile.seekg(position + HEIGHT * WIDTH * 2, ios::beg);
                myFile.read((char *) &matrixPixel[i][j].b, 1); // blue byte
                ++position;
            }
        }
    }
myFile.close();

The thing is that, for a big file like the one at the beginning, it takes a lot of time (~7 min) and it's supposed to be optimized. How could I read from the file in less time?

8
  • 1
    How did you come up with that seekg business? No wonder that's slow. Commented Nov 14, 2016 at 17:16
  • did u try just bit blitting it, seeking one per rgb triplet and reading all 3 in one IO. 3 int probably aligned OK Commented Nov 14, 2016 at 17:18
  • 4
    Anyway, you don't have to seekg, as @BaummitAugen said. It makes much, much,much more sense to access the file sequentially and jump around your matrixPixel instead of trying to jump around your file. Commented Nov 14, 2016 at 17:19
  • Really what you should do is store all of the pixles in a flat array/vector and then you can read them all in at one time with a read call. Commented Nov 14, 2016 at 17:19
  • 1
    if (!myFile.eof()) isn't a so clever idea. Commented Nov 14, 2016 at 17:19

2 Answers 2

7

So, the structure of the data you're storing in memory looks like this:

rgbrgbrgbrgbrgbrgbrgbrgbrgbrgb..............rgb

But the structure of the file you're reading looks like this (assuming your code's logic is correct):

rrrrrrrrrrrrrrrrrrrrrrrrrrr....
ggggggggggggggggggggggggggg....
bbbbbbbbbbbbbbbbbbbbbbbbbbb....

And in your code, you're translating between the two. Fundamentally, that's going to be slow. And what's more, you've chosen to read the file by making manual seeks to arbitrary points in the file. That's going to slow things down even more.

The first thing you can do is streamline the Hard Disk reads:

for(int channel = 0; channel < 3; channel++) {
    for (int i = 0; i < HEIGHT; ++i) {
        for (int j = 0; j < WIDTH; ++j) {
            if (!myFile.eof()) {
                switch(channel) {
                    case 0: myFile.read((char *) &matrixPixel[i][j].r, 1); break;
                    case 1: myFile.read((char *) &matrixPixel[i][j].g, 1); break;
                    case 2: myFile.read((char *) &matrixPixel[i][j].b, 1); break;
                }
            }
        }
    }
}

That requires the fewest changes to your code, and will speed up your code, but the code will probably still be slow.

A better approach, which increases CPU use but dramatically reduces Hard Disk use (which, in the vast majority of applications, will result in a speed-up), would be to store the data like so:

std::vector<unsigned char> reds(WIDTH * HEIGHT);
std::vector<unsigned char> greens(WIDTH * HEIGHT);
std::vector<unsigned char> blues(WIDTH * HEIGHT);

myFile.read(reds.data(), WIDTH * HEIGHT); //Stream can be checked for errors resulting from EOF or other issues.
myFile.read(greens.data(), WIDTH * HEIGHT);
myFile.read(blues.data(), WIDTH * HEIGHT);

std::vector<pixel> pixels(WIDTH * HEIGHT);

for(size_t index = 0; index < WIDTH * HEIGHT; index++) {
    pixels[index].r = reds[index];
    pixels[index].g = greens[index];
    pixels[index].b = blues[index];
}

The final, best approach, is to change how the binary file is formatted, because the way it appears to be formatted is insane (from a performance perspective). If the file is reformatted to the rgbrgbrgbrgbrgb style (which is far more standard in the industry), your code simply becomes this:

struct pixel {
    unsigned char red, green, blue;
}; //You'll never read values above 255 when doing byte-length color values.
std::vector<pixel> pixels(WIDTH * HEIGHT);
myFile.read(reinterpret_cast<char*>(pixels.data()), WIDTH * HEIGHT * 3);

This is extremely short, and is probably going to outperform all the other methods. But of course, that may not be an option for you.

I haven't tested any of these methods (and there may be a typo or two) but all of these methods should be faster than what you're currently doing.

Sign up to request clarification or add additional context in comments.

4 Comments

The format is sane, if it is (say) an astronomical picture taken through three filters, and the full image has been formed by concatenating the "red", "green", and "blue" images.
The first thing will probably reduce the time to read to pretty much the minimum.
@MartinBonner Reading in bulk, like the second and third examples do, will dramatically reduce read speeds. Reading one character at a time, even sequentially, is slower than reading in bulk.
@MartinBonner thanks, the first one is way faster. I'm still having some issues with the second version, but would it be the same for writing?
0

A faster method would be to read the bitmap into a buffer:

uint8_t buffer[HEIGHT][WIDTH];
const unsigned int bitmap_size_in_bytes = sizeof(buffer);
myFile.read(buffer, bitmap_size_in_bytes);

An even faster method is to read more than one bitmap into memory.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.