Basically I want a "multiline grep that takes binary strings as patterns".
For example:
printf '\x00\x01\n\x02\x03' > big.bin
printf '\x01\n\x02' > small.bin
printf '\x00\n\x02' > small2.bin
Then the following should hold:
small.bin is contained in big.bin
small2.bin is not contained in big.bin
I don't want to have to convert the files to ASCII hex representation with xxd as shown e.g. at: https://unix.stackexchange.com/questions/217936/equivalent-command-to-grep-binary-files because that feels wasteful.
Ideally, the tool should handle large files that don't fit into memory.
Note that the following attempts don't work.
grep -f matches where it shouldn't because it must be splitting newlines:
grep -F -f small.bin big.bin
# Correct: Binary file big.bin matches
grep -F -f small2.bin big.bin
# Wrong: Binary file big.bin matches
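To illustrate why I suspect newline splitting, here is a rough Python model of the "one pattern per line" behaviour. This is only a sketch of my hypothesis, not of how grep is actually implemented; the function name any_line_pattern_matches is made up for the illustration.

```python
# Rough model (an assumption, not grep's real implementation): every
# newline-separated piece of the pattern file is an independent fixed string,
# and the searched file is examined line by line as well.
big = b"\x00\x01\n\x02\x03"
small = b"\x01\n\x02"
small2 = b"\x00\n\x02"


def any_line_pattern_matches(pattern_file: bytes, haystack: bytes) -> bool:
    """Return True if any single-line pattern occurs in any haystack line."""
    patterns = pattern_file.split(b"\n")
    lines = haystack.split(b"\n")
    return any(p in line for p in patterns for line in lines)


# What I actually want: the whole byte string, newline included.
wanted = small2 in big
# What the line-splitting model reports for the same input.
modelled = any_line_pattern_matches(small2, big)
```

Under this model small2.bin "matches" because the single byte 0x00 occurs on the first line of big.bin, even though the full three-byte sequence does not occur anywhere.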
Shell substitution as in $(cat) fails because it is impossible to handle null characters in Bash AFAIK, so the string just gets truncated at the first 0 I believe:
grep -F "$(cat small.bin)" big.bin
# Correct: Binary file big.bin matches
grep -F "$(cat small2.bin)" big.bin
# Wrong: Binary file big.bin matches
A C question has been asked at: How can i check if binary file's content is found in other binary file? but is it possible with any widely available CLI (hopefully POSIX, or GNU coreutils) tools?
Notably, implementing a non-naive algorithm such as Boyer-Moore is not entirely trivial.
I can hack up a working Python one liner as follows, but it won't work for files that don't fit into memory:
grepbin() ( python -c 'import sys;sys.exit(open(sys.argv[1],"rb").read() not in open(sys.argv[2],"rb").read())' "$1" "$2" )
grepbin small.bin big.bin && echo 1
grepbin small2.bin big.bin && echo 2
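One possible way around the memory limit while staying in Python is to memory-map the large file, so the operating system pages it in on demand instead of reading it all at once. The following is a minimal sketch under a few assumptions: the pattern file itself still fits in memory, the platform has enough address space for the mapping (a 64-bit system in practice), and the function name contains is just something I picked.

```python
import mmap


def contains(needle_path: str, haystack_path: str) -> bool:
    """Return True if the bytes of needle_path occur inside haystack_path."""
    with open(needle_path, "rb") as f:
        needle = f.read()  # assumption: the pattern fits in memory
    if not needle:
        return True  # an empty pattern is trivially contained
    with open(haystack_path, "rb") as f:
        try:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            return False  # an empty file cannot be mapped, so nothing matches
        with mm:
            # mmap.find does a plain byte search and returns -1 when absent.
            return mm.find(needle) != -1
```

With the example files above, contains("small.bin", "big.bin") should be true and contains("small2.bin", "big.bin") false. I have not measured how this behaves on really large inputs.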
I could also find the following two tools on GitHub:
https://github.com/tmbinc/bgrep in C, installable with (amazing :-)):
curl -L 'https://github.com/tmbinc/bgrep/raw/master/bgrep.c' | gcc -O2 -x c -o /usr/local/bin/bgrep -
https://github.com/gahag/bgrep in Rust, installable with:
cargo install bgrep
but they don't seem to support taking the pattern from a file; you provide the pattern as hex ASCII on the command line. I could use:
bgrep $(xxd -p small.bin | tr -d '\n') big.bin
since it does not matter as much if the small file gets converted with xxd, but it's not really nice.
In any case, if I were to implement the feature, I'd likely add it to the Rust library above.
bgrep is also mentioned at: How does bgrep work?
Tested on Ubuntu 20.10.
"grep -f matches where it shouldn't because it must be splitting newlines:" and also it's parsing regex. grep "$(cat small.bin)" fails not only for zero bytes. grep expects a regex. Note that your "is it possible with any widely available CLI" is falling into "seeking recommendation of tools" bin.
-F for non regex. If it closes, I post in other places, the usual procedure.
"$(cat file-with-nulls)" is setting yourself up for failure, since NULs can't be stored in C strings, and all strings in bash are NUL-delimited C strings. For that matter, I'd be very surprised -- nay, astonished -- if grep used strings capable of containing NUL literals.
grep loads the whole regex pattern into memory. I don't think there's a tool that searches a binary byte stream (not byte buffer...) within a binary file. If the small file fits in memory and the large file is huge, you can write a simple C program that reads the small file, uses its size as a block size, reads blocks from the large file, and tries to find the small file with memmem().
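The block-reading idea from the last comment can be sketched in Python rather than C, with the in operator on bytes standing in for memmem(). One detail is my own addition and not part of the comment: the last len(needle) - 1 bytes of each window are carried over to the next read, since otherwise a match that straddles two blocks would be missed. The name contains_chunked and the default chunk size are arbitrary choices for the sketch.

```python
def contains_chunked(needle_path: str, haystack_path: str,
                     chunk_size: int = 1 << 20) -> bool:
    """Stream haystack_path in chunks and look for the bytes of needle_path."""
    with open(needle_path, "rb") as f:
        needle = f.read()  # assumption: the small file fits in memory
    if not needle:
        return True  # an empty pattern is trivially contained
    overlap = len(needle) - 1  # bytes to keep so boundary matches are seen
    tail = b""
    with open(haystack_path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False  # reached EOF without finding the pattern
            window = tail + chunk
            if needle in window:  # stand-in for memmem()
                return True
            # Keep only the end of the window for the next iteration.
            tail = window[-overlap:] if overlap else b""
```

Memory use stays around chunk_size plus the size of the small file, regardless of how large the big file is.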