After reading through:

How to remove a too large file in a commit when my branch is ahead of master by 5 commits

https://help.github.com/en/articles/working-with-large-files

https://rtyley.github.io/bfg-repo-cleaner/

https://help.github.com/en/articles/removing-sensitive-data-from-a-repository

Show commit size in git log

Git - get all commits and blobs they created

I couldn't find an elegant way to remove commits that exceed a given size (on disk). These commits do not necessarily contain large files, but are large in and of themselves (they contain many ~200 KB dependencies).

How can such commits be removed from the repository?

    The answer that you linked to starts with: "The 'size' of a commit can mean different things. If you mean how much disk storage it takes up... that's very tricky to tell in Git and probably unproductive." In short, there is no elegant way to calculate commit size. Commented May 24, 2019 at 10:34

1 Answer

First, a note:

git compresses files when it stores them in its .git/ structure, and tries to store similar files using only their diffs;

as a result, it is difficult to tell which commit uses up the most space in your .git/ folder.


If you want to measure how much space the files in a commit take up when checked out:

git ls-tree -r -l <commitid>

will list every file in that commit along with its individual size in bytes, and

git ls-tree -r -l <commitid> | awk '{ sum += $4 } END { print sum }'

will print the total size of these files.
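
The per-commit total above can be wrapped in a loop over the history. What follows is a minimal sketch rather than a finished tool: big_commits, ref and limit are made-up names, and the number it reports is the uncompressed size of every file in the commit's tree, not the space the commit occupies in .git/ and not what that commit alone added.

```shell
#!/bin/sh
# Print "<total bytes> <commit id>" for every commit reachable from a ref
# whose checked-out files add up to more than a threshold.
# big_commits, ref and limit are illustrative names, not git features.
big_commits() {
    ref=$1
    limit=$2
    git rev-list "$ref" | while read -r commit; do
        # Column 4 of `git ls-tree -r -l` is the blob size in bytes
        # ("-" for submodule entries, which awk counts as 0).
        total=$(git ls-tree -r -l "$commit" |
            awk '{ sum += $4 } END { printf "%.0f\n", sum }')
        if [ "$total" -gt "$limit" ]; then
            echo "$total $commit"
        fi
    done
}
```

For example, `big_commits branchE 50000000` would list the commits on branchE whose tree exceeds roughly 50 MB. Because the whole tree is measured, every commit made after large files were added will also show up until those files are removed again.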


You can put the above one-liner in a script to see which commits take up more than a given number of bytes. The next question is: can you get rid of those commits?

You can tell git to drop the end of a branch.

If every 'B' marks a big commit:

               +-- create a new branch here
               v
*--*--*--*--*--*--B--B--B--B <- branchA
    \              \
     \              \-B--B <- branchB
      \
       *--*--*--* <- branchC
              \
               \--B <- branchD

In the above diagram, you can tell git to forget branchA, branchB and branchD (and possibly create a new reference to keep the first "not so big" commits),

but when a big commit appears in the middle of a branch:
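
Forgetting the end of a branch could look roughly like the sketch below. truncate_branch is a made-up helper name, and the commit to keep is something you have to pick yourself.

```shell
#!/bin/sh
# Point an existing branch at an earlier commit so that the commits after it
# are no longer reachable from that branch.
# truncate_branch is an illustrative name, not a git command.
truncate_branch() {
    branch=$1      # e.g. branchA
    last_good=$2   # last "not so big" commit to keep
    # -f moves an existing branch; git refuses if the branch is checked out
    # (use `git reset --hard "$last_good"` on the branch itself in that case).
    git branch -f "$branch" "$last_good"
}

# Branches you want to forget entirely can be deleted instead:
#   git branch -D branchB branchD
# Space is only reclaimed once nothing references the commits any more:
#   git reflog expire --expire=now --all && git gc --prune=now
```

Until the reflog entries expire and git gc has run, the dropped commits still sit in .git/ and can be recovered by their ids.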

*--*--B--B--*--* <- branchE

what "delete the two Bs" means depends heavily on what is stored in your git repo, and on how you can remove these commits from the branch's history: the commits that come after them have to be rewritten.
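
One common way to cut a contiguous run of commits out of the middle of a branch is `git rebase --onto`. The sketch below is only an illustration under that assumption: drop_range is a made-up name, and the rebase stops with a conflict if the later commits change files that the dropped commits introduced.

```shell
#!/bin/sh
# Replay everything after <last_big> onto <before>, leaving out the commits
# in between. drop_range is an illustrative name, not a git command.
drop_range() {
    before=$1     # last commit to keep before the big ones
    last_big=$2   # last big commit to drop
    branch=$3     # e.g. branchE
    git rebase --onto "$before" "$last_big" "$branch"
}
```

The commits after the dropped range get new ids, so a branch that was already pushed has to be force-pushed afterwards. When big commits are interleaved with normal ones, the same call has to be repeated for each run of big commits.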

The general advice is: do not delete commits.

3 Comments

Thanks. In my case the "B" commits are interleaved with normal-sized commits and are not located at the end of the branch (I added a filter in .gitignore for dependencies, with a few exceptions). I agree that commits generally shouldn't be deleted, but in this case they pollute the repo (and are not critical to the project).
@Sebi: OK. Do you see how to edit the history of your repo?
Yes. I can remove the commits individually (I'll write a bash script for this).
