1

I need to check whether the content of one binary file occurs inside another binary file.

I've tried copying both files' contents into char arrays with fread and searching with strstr, but strstr always returns NULL, even when the content should be found in the other file.

Any ideas?

Thanks.

4
  • You can't use the str*() functions on binary data: binary data will naturally contain null bytes, which terminate the string operations. Commented May 15, 2015 at 16:52
  • strstr works only if you provide null-terminated strings. Commented May 15, 2015 at 16:52
  • You apparently misunderstand what strstr() does: it expects a NUL-terminated sequence of bytes, which yours may or may not be, so you can't use strstr() in this case. Commented May 15, 2015 at 16:53
  • @user3121023 using memcmp will end up with O(kn) time complexity, where k and n are the file sizes. Commented May 15, 2015 at 17:00

2 Answers

2

Since strstr won't work on arbitrary binary data (it only works on strings terminated by \0), I can see three approaches here:
1) Naive approach: iterate over one array of bytes and use memcmp against the other array at each starting position. Easy, but takes O(k*n) time in the worst case (k and n are the sizes of the data).
2) Use the KMP (Knuth-Morris-Pratt) algorithm. It takes some work to understand and code, but gives the best time complexity, O(k+n).
3) If performance is not important and you don't want to deal with ANY non-trivial algorithm:
-- Convert your binary data to strings, representing each byte by its two-digit hex value.
-- Use strstr.
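
A minimal sketch of the second approach (KMP) on raw bytes, in case it helps. The name kmp_search and its signature are made up for this illustration, and for simplicity it returns -1 both when the pattern is absent and when malloc fails:

```c
#include <stdlib.h>     /* malloc, free */
#include <stddef.h>     /* size_t */

/* Hypothetical helper: returns the offset of the first occurrence of
 * pat[0..m) in text[0..n), or -1 if it is absent (or malloc fails). */
long kmp_search(const unsigned char *text, size_t n,
                const unsigned char *pat, size_t m)
{
    if (m == 0) return 0;       /* empty pattern matches at offset 0 */
    if (m > n) return -1;

    /* fail[i] = length of the longest proper prefix of pat[0..i]
     * that is also a suffix of pat[0..i]. */
    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL) return -1;

    fail[0] = 0;
    for (size_t i = 1, k = 0; i < m; ++i) {
        while (k > 0 && pat[i] != pat[k])
            k = fail[k - 1];
        if (pat[i] == pat[k])
            ++k;
        fail[i] = k;
    }

    long result = -1;
    for (size_t i = 0, k = 0; i < n; ++i) {
        /* On a mismatch, fall back in the pattern without
         * moving backwards in the text. */
        while (k > 0 && text[i] != pat[k])
            k = fail[k - 1];
        if (text[i] == pat[k])
            ++k;
        if (k == m) {           /* whole pattern matched, ending at i */
            result = (long)(i + 1 - m);
            break;
        }
    }

    free(fail);
    return result;
}
```

Because the lengths are passed explicitly, zero bytes inside either buffer are treated like any other byte.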

Update: After thinking a little more about the third approach, there is a case where it won't work correctly. Say you want to find the bytes AA AA inside 1A AA A1. They should not be found, because they are not there. But if you represent the data as concatenated characters without delimiters, you end up searching for AAAA in 1AAAA1, which succeeds because the match starts in the middle of a byte. So adding a delimiter between bytes is a good idea here.
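
A rough sketch of the third approach with a delimiter. The names to_hex and hex_contains are invented for this example, and writing a space after every byte is just one possible delimiter scheme. Since every encoded byte is exactly two hex digits plus a space, a match can only start on a byte boundary:

```c
#include <stdio.h>      /* sprintf */
#include <stdlib.h>     /* malloc, free */
#include <string.h>     /* strstr */

/* Hypothetical helper: encodes each byte as two hex digits followed by
 * a space, e.g. {0x1A, 0xAA} -> "1A AA ". The caller frees the result. */
char *to_hex(const unsigned char *data, size_t len)
{
    char *out = malloc(len * 3 + 1);
    if (out == NULL) return NULL;
    for (size_t i = 0; i < len; ++i)
        sprintf(out + i * 3, "%02X ", data[i]);  /* 3 chars + '\0' */
    out[len * 3] = '\0';    /* also covers the len == 0 case */
    return out;
}

/* Returns 1 if sub occurs in data, 0 if not, -1 if malloc failed. */
int hex_contains(const unsigned char *data, size_t len,
                 const unsigned char *sub, size_t sublen)
{
    char *hay = to_hex(data, len);
    char *needle = to_hex(sub, sublen);
    int found = -1;
    if (hay != NULL && needle != NULL)
        found = strstr(hay, needle) != NULL;
    free(hay);
    free(needle);
    return found;
}
```

With this encoding the example above becomes a search for "AA AA " in "1A AA A1 ", which fails as it should. The cost is roughly three times the memory of the original data.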

1

Do it yourself (notify me if there's a bug):

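#include <string.h>     /* memcmp */
#include <sys/types.h>  /* ssize_t (POSIX) */

/* Returns the offset of the first occurrence of subdata in data,
 * or -1 if it is not found. ssize_t is defined by POSIX. */
ssize_t bin_strstr(const void *data, size_t len,
                   const void *subdata, size_t sublen) {
    const unsigned char *p = data;  /* no arithmetic on void* in ISO C */
    if (sublen > len)               /* also avoids size_t underflow below */
        return -1;
    for (size_t i = 0; i <= len - sublen; ++i)
        if (memcmp(p + i, subdata, sublen) == 0)
            return (ssize_t)i;
    return -1;
}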
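
Any length-based search like this needs each file's bytes together with their exact length, because strlen stops at the first zero byte and so cannot be used on the buffer. A possible way to load a file that way is sketched below; read_file is a made-up helper name, and the 4096-byte starting capacity is an arbitrary choice:

```c
#include <stdio.h>      /* fopen, fread, ferror, fclose */
#include <stdlib.h>     /* malloc, realloc, free */

/* Hypothetical helper: reads the whole file at path into a malloc'd
 * buffer and stores its size in *out_len. Returns NULL on failure.
 * The caller frees the buffer. */
unsigned char *read_file(const char *path, size_t *out_len)
{
    FILE *f = fopen(path, "rb");    /* "rb": no newline translation */
    if (f == NULL) return NULL;

    size_t cap = 4096, len = 0;
    unsigned char *buf = malloc(cap);
    if (buf == NULL) { fclose(f); return NULL; }

    for (;;) {
        len += fread(buf + len, 1, cap - len, f);
        if (len < cap)              /* short read: EOF or error */
            break;
        unsigned char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) { free(buf); fclose(f); return NULL; }
        buf = tmp;
        cap *= 2;
    }

    if (ferror(f)) { free(buf); fclose(f); return NULL; }
    fclose(f);
    *out_len = len;
    return buf;
}
```

The buffers and lengths returned for the two files can then be passed straight to a search function that takes (data, len, subdata, sublen).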
