
I have a binary file (not a text file), about 20M in size, and I have a string which may or may not exist in that file. Normally (for a text file), I would use getline() to read the file line by line and then use find to detect it, something like:

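#include <fstream>
#include <string>

bool found = false;
{
    std::string stringToLookFor("string to look for");
    std::ifstream ifs("myBinaryFile.bin");
    std::string line;
    // Read one '\n'-terminated line at a time and search within each line.
    while (!found && std::getline(ifs, line)) {
        found = (line.find(stringToLookFor, 0) != std::string::npos);
    }
    ifs.close();  // optional: the destructor closes the file anyway
}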

However, I'm unsure if that's a wise thing to do for a binary file. My main concern is that the "lines" for such a file may be large. It may be that the entire 20M file contains no newlines so I may end up reading in a rather large string to search (there may well be other problems with this approach as well, hence my question).

Is this considered a viable approach or am I likely to run into problems? Is there a better way to search binary files than the normal textual line-by-line?

  • You could iterate over the characters in the file and advance the iterator after reading enough successive characters to disambiguate the string you are looking for. This is how compilers tokenise source code. Commented Nov 2, 2019 at 10:10
  • Does this answer your question? C++ searching text file for a particular string and returning the line number where that string is on Commented Nov 2, 2019 at 10:30
  • @JHBonarius: not really, no. I had actually looked at that one, but it asks about how to search text files. I specifically mentioned in this question the concerns I had about that. Commented Nov 2, 2019 at 11:20
  • 20M is not that much. Why not load the entire file? Commented Nov 2, 2019 at 11:50
  • @AndriyTylychko, despite my reputation, I'm pretty certain there are others here who know more than me, at least in many areas. Well, I damn well hope so :-) Commented Nov 3, 2019 at 2:47

2 Answers


I'll bite the bait and try an answer. You are looking for this:

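#include <algorithm>   // std::search
#include <fstream>
#include <iterator>    // std::istreambuf_iterator
//...
std::ifstream is(file_name, std::ios::binary);
if (!is)
  return -1;
// istreambuf_iterator reads raw bytes; istream_iterator<char> would skip whitespace bytes.
// Caveat: std::search is specified for forward iterators, and stream iterators are single-pass.
auto res = std::search(std::istreambuf_iterator<char>(is), std::istreambuf_iterator<char>(), pattern.begin(), pattern.end());
//...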

It is fast and it does not load the whole file into memory at once. I do not know which algorithm it is based on. The faster boyer_moore_searcher and boyer_moore_horspool_searcher cannot be used, since they require random-access iterators.


1 Comment

Will check out on Monday. I assume you meant "not loading the file all into memory at once" since I can't figure out how it would search the file if it's never read from the disk at all :-)

The simplest and fastest approach is, as @ZDF suggested in the comments, to read the entire file into memory and then search its content for your string:

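#include <algorithm>
#include <fstream>
#include <string>
#include <vector>

std::ifstream ifs(filename, std::ios::binary);
ifs.seekg(0, std::ios::end);               // jump to the end to learn the file size
const std::streamoff size = ifs.tellg();   // -1 if the file could not be opened
if (!ifs || size < 0)
  return -1;                               // e.g. report failure to the caller
ifs.seekg(0);                              // rewind to the first byte
std::vector<char> content(static_cast<std::size_t>(size));
ifs.read(content.data(), size);            // one bulk read of the whole file
auto res = std::search(content.begin(), content.end(), str.begin(), str.end());
bool found = (res != content.end());       // res - content.begin() is the byte offset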

2 Comments

Thanks, I'll check this out Monday. I wasn't keen on loading the entire file at once (this is an embedded platform) but it may be okay.
I'm actually working on this problem, and this approach is very slow.
