
We are trying to shrink our git repository to under 500MB due to deployment issues.

To achieve that, we have created a new branch where we have moved all old images, videos and fonts to AWS S3.

I can easily get the list of files with git diff --name-only --diff-filter=D master -- public/assets/.

I have tried running BFG-repo-cleaner 1.14.0 once per file. But I have 400 files, and deleting each file separately is taking ages (it is still running as I write this).

git diff --name-only --diff-filter=D master -- public/assets/ | xargs -i basename '{}' | xargs -i bfg --delete-files '{}'

Since each file name is distinct, I cannot really use a glob pattern, as suggested in "Delete multiple files from multiple branch using bfg repo cleaner".

I tried to separate each file with a comma but that resulted in BFG-repo-cleaner telling me:

BFG aborting: No refs to update - no dirty commits found??

Is there a way to provide multiple files to BFG-repo-cleaner without a glob pattern?

PS. The command I tried with multiple files is: git diff --name-only --diff-filter=D master -- public/assets/ | xargs -i basename '{}' | sed -z 's/\n/,/g;s/,$/\n/' | xargs -i bfg --delete-files '{}' && git reflog expire --expire=now --all && git gc --prune=now --aggressive

PPS. The bfg command is on my PATH as a simple bash script with java -jar /tools/BFG-repo-cleaner/bfg-1.14.0.jar "$@"

1 Answer


But I have 400 files and it is taking ages to delete each files separately

That is why the tool to use here is newren/git-filter-repo, which is Python-based (see its installation instructions).

With it, you can pass a single file that lists all the paths to remove:

git filter-repo --paths-from-file <filename> --invert-paths

From the documentation:

Similarly, you could use --paths-from-file to delete many files.

For example, you could run git filter-repo --analyze to get reports, look in one such as .git/filter-repo/analysis/path-deleted-sizes.txt and copy all the filenames into a file such as /tmp/files-i-dont-want-anymore.txt, and then run:

git filter-repo --invert-paths \
                --paths-from-file /tmp/files-i-dont-want-anymore.txt

to delete them all.
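
Applied to the question, the list file can be built straight from the git diff output rather than copied by hand. The following is a minimal sketch, not a definitive recipe: the helper names list_deleted_assets and write_paths_file are hypothetical, and the paths in the usage comments are placeholders. Note that git filter-repo matches full paths relative to the repository root, so the basename step used for BFG is not needed here.

```shell
#!/usr/bin/env bash
# Sketch: turn the list of deleted assets into a file for --paths-from-file.
# Helper names are hypothetical; adjust branch and directory to your repo.

# list_deleted_assets BASE_BRANCH DIRECTORY
# Prints, one per line, the paths under DIRECTORY that exist on BASE_BRANCH
# but are deleted in the current working tree. core.quotePath=false keeps
# non-ASCII file names unescaped in the output.
list_deleted_assets() {
  git -c core.quotePath=false diff --name-only --diff-filter=D "$1" -- "$2"
}

# write_paths_file LISTFILE
# Reads paths from stdin, drops blank lines and duplicates,
# and writes the sorted result to LISTFILE.
write_paths_file() {
  grep -v '^$' | sort -u > "$1"
}

# Usage sketch (run the first step in the working clone, on the cleanup branch):
#   list_deleted_assets master public/assets/ \
#     | write_paths_file /tmp/files-i-dont-want-anymore.txt
#
# Then, inside a fresh clone of the repository:
#   git filter-repo --invert-paths \
#                   --paths-from-file /tmp/files-i-dont-want-anymore.txt
```

Building the list in the working clone and rewriting history in a separate fresh clone keeps the second clone's reflog untouched, which is what git filter-repo checks for before it agrees to rewrite history without --force.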


5 Comments

Thanks @VonC! Looks like a much better tool. However, I get an error from a fresh git clone (possibly due to switching from main branch) that I don't understand. > Aborting: Refusing to destructively overwrite repo history since this does not look like a fresh clone. (expected at most one entry in the reflog for HEAD) Please operate on a fresh clone instead. If you want to proceed anyway, use --force.
@dotnetCarpenter Make sure your git status is clean before launching that command.
Our .git directory shrank from 454MB to 338MB after using git filter-repo, git reflog expire and git gc. :) Does git filter-repo also take care of other branches and tags?
Nice! I see that git filter-repo also fixed our branches. And it is super fast!
@dotnetCarpenter Well done. It can operate on all branches.
