-3

I want to reduce the size of my repository by removing all commits older than the 5th commit ago.

This question is different than other questions because I am looking for answers only for that very specific way to reduce the size.

I have read the other similar questions, and the answers are confusing because there are so many options. I am hoping that by making my request very specific I can get a very specific answer that will be easy to execute.

I am hoping that this can be a specific enumerated list of instructions starting with a git clone myrepo and ending with a git push --force myrepo or something like that.

3
  • 3
    You don’t mention which questions are not like this one. But this looks like the questions about how to squash all history into one commit, except that you keep 5 commits instead of 1 (and we could just as well generalize to N). The answer is only a little more involved than the case of N=1. Commented Jan 16 at 9:46
  • More info in OP's Reddit question on the use-case, although the use-case might not be relevant for the Q/A framing here. Commented Jan 16 at 10:12
  • Repository where? Locally? Because it's incredibly unlikely that you care about local size: if you do, you're going to have to edit your post to explain the incredibly particular situation in which this makes sense. Remotely? Remotely where? GitHub? GitLab? Some other remote? Commented Jan 16 at 16:27

5 Answers

4

The concept "by removing all commits older than the 5th commit ago" is not really coherent. However, let us presume the simplest possible case, that what you mean is: you have a main branch main, and you want the repo to consist only of commits main~5 thru main, and none of these is a merge commit.

That in itself is not really coherent either, because by removing any earlier commits, we will be changing history — and a commit's history can never be changed. So what we really want are new commits that look like commits main~5 thru main, and they should be, in effect, the only commits in the history of main.

Very well. You will be working in the local repo (which we will presume matches the remote repo, because, as you have said, you cloned it, or because you fetched it and updated your local main). As an illustration I will presume the history initially looks something like this:

* 1cebb20 (HEAD -> main) h
* 44285d5 g
* ab95e96 f
* 7896607 e
* 8d0a11a d
* 214a6c5 c
* dd7cb4c b
* 769dfff a

There may be commits before a but never mind that. The point then is that we want, in effect, the first commit with any content in the history to be just like c (because it is 5 before h; it is main~5).

That part of the task is the hardest. We need a commit that is not c (because we are going to change parentage) but whose contents look like those of c. This commit will need a parent that is itself parentless — what Git calls an "orphan". To effect this, you would first say:

% git switch --orphan temp
% git commit --allow-empty -mnewroot 
% git cat-file commit main~5
tree 04a59185a0c5f4047e4fd3fa87b0c84e671b00ee
parent ...
author ...
committer ...

Okay, we have made an empty parentless commit, pointed to by temp. And Git has told us how to get at the content of the commit 5 before main. We want to take that content and pour it into a new commit whose parent is the temp branch we have just created. We do this by using the tree SHA that we were just given, like this:

% git commit-tree -p temp -m 'c' 04a59185a0c5f4047e4fd3fa87b0c84e671b00ee
b1fa80953a368fa6cc7f58b2018be19d2adf2b69

(Naturally, your numbers here will be different.)

Okay, so we have made a new commit that looks like c and has the empty parentless (orphan) temp commit as its parent. Git has also told us the SHA of this new commit — the new c. The rest is easy: we simply rebase the remainder of main onto that commit:

% git rebase --onto b1fa80953a main~5 main

Done! The history now looks like this:

* 3581893 (HEAD -> main) h
* 95227d0 g
* df95fd3 f
* f2f1edf e
* a910be2 d
* b1fa809 c
* 8f41473 (temp) newroot

We can now delete temp, as its job is done.

% git branch -D temp

And then of course, as you rightly suggest, you would need to push with force in order to update an existing remote repo; but it would be much simpler and more efficient at this point just to delete the remote repo and make a new one (and GitHub will then give you instructions for associating your local repo with this new repo and pushing main).
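The steps above can be sketched end to end in a throwaway repository. This is only an illustration under assumptions: the temporary directory, the file name file.txt, and the demo identity are made up, and the tree SHA is captured with git rev-parse 'main~5^{tree}' rather than copied by hand from the git cat-file output.

```shell
#!/bin/sh
# Sketch: replay the procedure in a scratch repository (all names are illustrative).
set -eu

# Fixed identity so commits can be created without a global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main

# Build the linear history a..h used in the illustration.
for msg in a b c d e f g h; do
    echo "$msg" > file.txt
    git add file.txt
    git commit -q -m "$msg"
done
old_tip=$(git rev-parse main)

# Empty parentless commit on a helper branch.
git switch -q --orphan temp
git commit -q --allow-empty -m newroot

# New commit with the content of main~5 and temp as its parent.
tree=$(git rev-parse 'main~5^{tree}')
new_base=$(git commit-tree -p temp -m c "$tree")

# Replay main~5..main onto the new commit, then drop the helper branch.
git rebase -q --onto "$new_base" main~5 main
git branch -q -D temp

count=$(git rev-list --count main)   # newroot plus c..h
git log --oneline main
```

Running it on a scratch copy first makes it easy to check that the tip still has the same content as before the rewrite (compare "$old_tip^{tree}" with "main^{tree}") before touching a real repository.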


2 Comments

Observe that this whole procedure is essentially undoable, so try it and, if it doesn't give you the local repo you want, just use the reflog to reset main hard back to where it was before the rebase.
git-commit-tree(1) + git-rebase(1) for the four remaining commits seems like the best solution for this problem.
2

I'll skip over the full hour-long "this is a very bad idea" exhortatory sermon except for its last two sentences: Don't say you weren't warned. This is a very bad idea.

removing all commits older than the 5th commit ago

Assuming by "ago" you're referring to commit date, here's a starter kit. The monkey on my back made me fix it up to run pretty well; I tested running the commands this prints on the Git history, and it takes about two seconds:

since=$(git log -1 --skip 4 --branches --pretty=%cI);
git rev-list --parents --branches --since=$since --reverse \
| awk '{ ++keep[$1]
         for (f=NF;f>1;--f) if (!keep[$f]) {++drop[$f];$f=""}
         print "git replace --graft", $0
       }
       END { print "FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f -- --branches \\"
             for (k in keep) if (keep[k]) print "--ancestry-path="k" \\"
             for (d in drop) if (drop[d]) print "^"d" \\"
             print ";"
       }'

and that will print the commands to replace your existing history with one lacking any commits made before the fifth commit ago -- except any branches that would be entirely deleted haven't been, yet. When you run them they will just temporarily rewire ancestry of existing commits.

If you don't like what you see with git log --oneline --branches --since=$since after that, you can git fetch -u . +refs/original/*:*; git replace -d $(git replace) to undo the truncations, no harm done.

If instead you follow the rewrites it prints with the commands below, you'll finish baking in the rewrites, delete any completely outdated branches, and compact the repository:


# this batch makes backout in this clone impossible:
git replace -d $(git replace)
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --setup exit
`git config core.bare` || git checkout --detach
git log --no-walk --branches --before=$since --pretty='git branch -D %S' | sh
`git config core.bare` || git checkout -
git reflog expire --all --expire-unreachable=now
git for-each-ref refs/remotes --format='delete %(refname)' | git update-ref --stdin
git repack -ad

# this puts the upstream on the path to unrecoverability too:
git push -f --branches --prune

and your upstream repo will lose the old history too, once it gets garbage-collected after the pruning interval expires.

This does not attempt to account for tags, it ignores their existence. If you're using tags on a five-commit history, I'll leave extending this to rewrite tags that should also be re-hung and delete tags that shouldn't to you.


1

Another approach: cherry-picking into an orphan branch. 1st, create an orphan branch:

git switch --orphan=new-master

Clean up the directory:

git clean -fdx

Copy an old commit completely to create a basis for further cherry-picking:

git restore -s master~4 .
git add -A .
git commit -C master~4

Now cherry-pick 4 commits back to the tip of master:

git cherry-pick master~4..master

Delete branch master and rename the current branch to master:

git branch -D master
git branch -m master

PS. Preserve the full repo backup for some time.
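Before deleting master, it may be worth confirming that the new branch has the expected number of commits and ends with exactly the same content. A minimal sketch in a throwaway repository, assuming eight made-up commits to a hypothetical file.txt and a demo identity:

```shell
#!/bin/sh
# Sketch: run the cherry-pick approach in a scratch repository and verify the result.
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

# Eight illustrative commits on master.
for i in 1 2 3 4 5 6 7 8; do
    echo "$i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
done

# The steps from the answer.
git switch -q --orphan new-master
git clean -q -fdx
git restore -s master~4 .
git add -A .
git commit -q -C master~4
git cherry-pick master~4..master >/dev/null

# Checks to run before `git branch -D master`.
kept=$(git rev-list --count new-master)
if git diff --quiet master new-master; then
    same_content=yes
else
    same_content=no
fi
echo "kept=$kept same_content=$same_content"
```

If same_content is not yes, the old master is still intact at this point and the new branch can simply be discarded.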


-1

Make a backup and try this shell script on a freshly cloned repository:

FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
    --prune-empty --setup "COMMIT_COUNT=`git rev-list --count HEAD`" \
    --index-filter '
        if [ "$COMMIT_COUNT" -gt 5 ]; then
            git rm -r --quiet .
            COMMIT_COUNT=`expr $COMMIT_COUNT - 1`
        fi
    ' HEAD
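git filter-branch keeps the pre-rewrite branch tip under refs/original/, so one way to sanity-check the result is to count the remaining commits and diff the new tip against that backup. The following is a sketch in a throwaway repository; the eight commits, the file name file.txt, and the demo identity are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: run the filter-branch script in a scratch repository and check the outcome.
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

# Eight illustrative commits on main.
for i in 1 2 3 4 5 6 7 8; do
    echo "$i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
done

# Empty the tree of every commit except the last five; --prune-empty drops them.
git filter-branch --prune-empty \
    --setup "COMMIT_COUNT=$(git rev-list --count HEAD)" \
    --index-filter '
        if [ "$COMMIT_COUNT" -gt 5 ]; then
            git rm -r --quiet .
            COMMIT_COUNT=`expr $COMMIT_COUNT - 1`
        fi
    ' HEAD >/dev/null

# The rewritten branch should have five commits and the same final content.
kept=$(git rev-list --count HEAD)
if git diff --quiet refs/original/refs/heads/main HEAD; then
    same_content=yes
else
    same_content=no
fi
echo "kept=$kept same_content=$same_content"
```

Note that the backup ref keeps the old history reachable, so the repository does not shrink until refs/original/ and the reflogs are cleared and the repository is repacked.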

5 Comments

Readers, note the FILTER_BRANCH_SQUELCH_WARNING=1 ;)
@Guildenstern A stupid warning, I dare say. For many tasks git filter-branch still works better than git filter-repo. Slower but better.
Tested and works on 23,563 commits. It took 1318 seconds.
@Guildenstern I couldn't solve the task using git filter-repo --commit-callback. I tried and got some results but commit.skip() skipped too much. Though I admit it skipped everything quite quick. ;-)
@Guildenstern "For many tasks git filter-branch still works better than git filter-repo." I want to take that back, at least partially. Today I discovered git filter-repo --file-info-callback and it's quite powerful. Some of the problems that could only be solved using git filter-branch can now be solved with git filter-repo.
-1

Disable your internet connection and clone the repository you want to vacuum locally without checking out a branch.

Enter the new work tree and create a new, empty branch.

Now point git to the previous work tree, archive the earliest revision you would like to keep, tar-pipe it into your new work tree, and commit it as the base.

Then cherry-pick the five commits you want to have, using the remote branch references.

Then remove the remote.

Rename the branch.

Run the garbage collection.

Now compare how many bytes this has saved. This may vary greatly, but at least you can directly compare between the local clones.

If it's good, delete the old location so that you can move the new clone into the old place.

Add the online remote you want to push to and go online again.

Now delete all references on the remote.

When the remote repository is completely empty, push the new history.
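The local part of these steps could look roughly like the sketch below. It is an interpretation, not the only reading of the instructions: it assumes the base is main~5 with the five newest commits cherry-picked on top, uses a scratch repository with eight made-up commits as the "previous work tree", and the names old, new, trimmed, and file.txt are hypothetical.

```shell
#!/bin/sh
# Sketch: rebuild a short history in a fresh local clone (all paths are illustrative).
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Stand-in for the existing repository.
git init -q "$work/old"
cd "$work/old"
git symbolic-ref HEAD refs/heads/main
for i in 1 2 3 4 5 6 7 8; do
    echo "$i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
done

# Clone locally without checking out a branch, then start an empty branch.
git clone -q --no-checkout "$work/old" "$work/new"
cd "$work/new"
git checkout -q --orphan trimmed

# Unpack the base revision from the old repository and commit it.
git -C "$work/old" archive main~5 | tar -xf -
git add -A
git commit -q -m "base: snapshot of main~5"

# Cherry-pick the five newest commits via the remote-tracking branch.
git cherry-pick origin/main~5..origin/main >/dev/null

# Remove the remote, take over the branch name, and collect garbage.
git remote remove origin
git branch -M main
git reflog expire --expire=now --all
git gc -q --prune=now

# Compare the two clones.
kept=$(git rev-list --count main)
echo "commits kept: $kept"
du -sk "$work/old/.git" "$work/new/.git"
```

On a toy repository like this one the size difference is meaningless; the du line only shows where the comparison would go. Adding the real remote and pushing is left out because it depends on the hosting service.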
