I am looking at compressing some very large strings (text fields) in Ruby before inserting them into database blob fields. Compression by itself is easy: I can just use Zlib.
However, I am also looking at cases where I may have similar copies of strings. E.g. I might have something already stored in the database, stringA. A modification gives me stringB. I want to store a compressed version of the difference between stringA and stringB, so that if I have stringA and the compressed diff, I can get stringB back.
Is there a suitable library for this?
Ideally, it would be a single-step binary diff compression. I don't really want a human-readable text diff (which might waste more space); it only needs to be machine readable. So please don't suggest that I compress the output of diff -u oldFile newFile > mods.diff and restore with patch < mods.diff.
Answer
Edit: Thank you Mark Adler for part of the answer (I didn't know there was a set_dictionary method). I want to do this in Ruby, so the relevant method name is set_dictionary. However, getting this to work is noticeably more involved than compressing without a dictionary.
Without using a dictionary, we can do:
require 'zlib'

A = "My super string to be compressed. Compress me now to " \
"save the space used to store this super string."
cA = Zlib::Deflate.deflate(A)
# => "x\234U\214\301\r\200 \020\004[\331\nh\302\267E n\224\a\034\271;4v..."
Zlib::Inflate.inflate(cA)
# => "My super string to be compressed. Compress me now to save the..."
But to use a dictionary, you need to pass Zlib::FINISH to deflate so that the output is flushed. When inflating, the first call to inflate raises a Zlib::NeedDict exception; only after that can you set the dictionary and call inflate again:
B = "A super string with differences, let's see how much " \
"extra space the differences will take in this super string!"
zlib_deflate = Zlib::Deflate.new
zlib_deflate.set_dictionary(A)
dB = zlib_deflate.deflate(B, Zlib::FINISH)
# => "x\2733\324$\230sD\265\242<\263$C!%3--\265(5/9\265XG!'\265D\035\250..."
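zlib_inflate = Zlib::Inflate.new
begin
  zlib_inflate.inflate(dB) # raises Zlib::NeedDict: need dictionary
rescue Zlib::NeedDict
  # The dictionary can only be set once the stream has asked for it.
  zlib_inflate.set_dictionary(A)
end
# The rest of dB is already buffered, so pass an empty string to continue.
zlib_inflate.inflate("")
# => "A super string with differences, let's see how much extra space the..."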
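The two halves above can be wrapped into a pair of helpers. This is only a minimal sketch: the names delta_deflate and delta_inflate are made up for illustration and are not part of Zlib. It assumes a current Ruby, where the input left over after Zlib::NeedDict stays buffered so that inflate("") can continue. Keep in mind that deflate only looks back over a 32 KB window, so for very large strings only the tail of stringA is actually used as the dictionary.

```ruby
require "zlib"

# Hypothetical helper: compress `target` using `base` as a preset dictionary.
def delta_deflate(base, target)
  deflater = Zlib::Deflate.new
  deflater.set_dictionary(base)
  # Zlib::FINISH flushes everything and terminates the stream.
  deflater.deflate(target, Zlib::FINISH)
ensure
  deflater.close if deflater
end

# Hypothetical helper: restore the target from `base` and the compressed delta.
def delta_inflate(base, delta)
  inflater = Zlib::Inflate.new
  begin
    inflater.inflate(delta)
  rescue Zlib::NeedDict
    # The stream asked for its dictionary; supply it and resume
    # with the input that is already buffered.
    inflater.set_dictionary(base)
    inflater.inflate("")
  end
ensure
  inflater.close if inflater
end

string_a = "My super string to be compressed. Compress me now to " \
           "save the space used to store this super string."
string_b = "A super string with differences, let's see how much " \
           "extra space the differences will take in this super string!"

delta    = delta_deflate(string_a, string_b)
plain    = Zlib::Deflate.deflate(string_b)
restored = delta_inflate(string_a, delta)

puts restored == string_b
puts "with dictionary: #{delta.bytesize} bytes, without: #{plain.bytesize} bytes"
```

The inflated string comes back with binary encoding, so for non-ASCII text you would probably want to call force_encoding on the result before comparing or displaying it.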