
I was wondering if git (at least in theory) would allow for a given file to be composed of multiple blobs.

This would be useful in situations such as:

commit-1: composed of big file F.
commit-2: edit on F, one line in its contents was edited.

If this were possible, git could break the original blob of F into three blobs (let's call them A, B and C), make commit-1 point to those three blobs, and make commit-2 point to blobs A, B' and C. In certain pathological scenarios this could save gigabytes of memory or disk space.

From my understanding of git trees and blobs, git was not designed in such a way. Am I missing something?

Thanks

2 Answers

1

You are not missing anything. Git was not really designed for dealing with large files, and its storage mechanism shows it. Even Git LFS will not help here, regrettably.

Initially your new file will be written as a loose object, which is a zlib-compressed full blob, even if it’s only a 1-byte change to an existing blob.
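To make the loose-object point concrete, here is a minimal Python sketch of how a blob is hashed and compressed. It assumes the default SHA-1 object format; the helper name `loose_object` and the sample data are made up for illustration, and it only builds the bytes in memory rather than writing into `.git/objects`.

```python
import hashlib
import zlib
from typing import Tuple


def loose_object(content: bytes) -> Tuple[str, bytes]:
    """Return (object id, zlib-compressed bytes) for a blob with this content."""
    # A blob is hashed and stored as a small header followed by the raw content:
    # b"blob <size in bytes>\0<content>"
    store = b"blob " + str(len(content)).encode() + b"\0" + content
    oid = hashlib.sha1(store).hexdigest()  # assumes a SHA-1 repository
    return oid, zlib.compress(store)


# Hypothetical "big file": 1000 lines, then the same file with one line edited.
original = b"".join(b"line %d\n" % i for i in range(1000))
edited = original.replace(b"line 500\n", b"line 5OO\n")

oid_a, packed_a = loose_object(original)
oid_b, packed_b = loose_object(edited)

print(oid_a)
print(oid_b)
print(len(packed_a), len(packed_b))
```

The two object ids are unrelated, and each compressed object contains the whole file. Nothing in the second object refers back to the first.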

Eventually this file will be stored in a packfile where it may be delta compressed with adjacent blobs, but there is no guarantee.
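As a rough illustration of why delta compression helps when it does happen, the sketch below encodes an edited file as "copy from the base" and "insert new bytes" instructions. This is a deliberately simplified stand-in, not git's actual pack format or delta search: it only looks for a common prefix and suffix, and the names `make_delta` and `apply_delta` are hypothetical.

```python
from typing import List, Tuple

# An op is either ("copy", offset, length), which copies bytes from the base,
# or ("insert", data), which carries literal new bytes.
Op = Tuple


def make_delta(base: bytes, target: bytes) -> List[Op]:
    """Describe target relative to base using a shared prefix and suffix."""
    limit = min(len(base), len(target))
    prefix = 0
    while prefix < limit and base[prefix] == target[prefix]:
        prefix += 1
    suffix = 0
    # The suffix must not overlap the prefix in either buffer.
    while suffix < limit - prefix and base[-1 - suffix] == target[-1 - suffix]:
        suffix += 1

    ops: List[Op] = []
    if prefix:
        ops.append(("copy", 0, prefix))
    middle = target[prefix:len(target) - suffix]
    if middle:
        ops.append(("insert", middle))
    if suffix:
        ops.append(("copy", len(base) - suffix, suffix))
    return ops


def apply_delta(base: bytes, ops: List[Op]) -> bytes:
    """Rebuild the target from the base and the delta instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


base = b"".join(b"line %d\n" % i for i in range(1000))
target = base.replace(b"line 500\n", b"line 5OO\n")

delta = make_delta(base, target)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "insert")
print(delta[1], literal_bytes, len(target))
```

Only the changed bytes need to be stored literally; the rest is expressed as references into the base object. Whether git actually picks the previous version of your file as the base is up to its packing heuristics, which is the "no guarantee" part.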

You could build a custom storage backend in libgit2, adding your own mechanism that’s efficient for your own known data format. But you will not have any compatibility with command line git, so this would be an unfortunate situation for most uses.


0

Git uses delta compression to pack blobs effectively. https://en.wikipedia.org/wiki/Delta_encoding#Git

3 Comments

That seems to only encompass data transfer from server to client if I understood it correctly. It does not affect in any way the way blobs are stored locally. Am I correct?
Your files will eventually be packed locally. You could force the creation of a packfile from local content, but a) you’d first need to create a loose object, which would be the full size of the file, before it could be delta-compressed into a packfile, b) that’s a lot of unnecessary CPU, and c) there’s no guarantee your blobs would even be deltified.
@devouredelysium No, not only data transfer. Read the link in my answer and the pages it links to. The repack operation is typically performed as part of the "git gc" process, and gc is triggered automatically by default.
