As explained in "How do I clone a subdirectory only of a Git repository?", the best way I've found so far to download only the files in a single Git subdirectory is:
git clone --depth 1 --filter=blob:none --sparse \
https://github.com/cirosantilli/test-git-partial-clone-big-small
cd test-git-partial-clone-big-small
git sparse-checkout set small
which is my best attempt so far at downloading only the small/ directory.
However, as soon as I run:
git clone --depth 1 --filter=blob:none --sparse \
https://github.com/cirosantilli/test-git-partial-clone-big-small
any files (but not directories) in the root directory are downloaded and appear in the working tree. In the case of that test repo, I get the unwanted file:
generate.sh
How can I prevent that from happening, so that I obtain only the subdirectories I'm interested in, without any root directory files?
I've checked other repositories, e.g. https://github.com/torvalds/linux, and having a large number of small files at the top level does not significantly slow down the download (i.e. they don't seem to be downloaded one by one in separate requests), so this would only be a problem if there are large files at the top level.
Tested on Git 2.37.2, Ubuntu 22.10, February 2023.
git clone --sparse does exactly that: "Employ a sparse-checkout, with only files in the toplevel directory initially being present." (Emphasis mine, phd.) That is, you cannot do what you want with git clone. You can set up a sparse checkout and then use git fetch, but fetch doesn't allow filters AFAIK. So you have to choose one way or the other.