diff options
| author | Derrick Stolee <stolee@gmail.com> | 2025-05-16 18:11:57 +0000 |
|---|---|---|
| committer | Junio C Hamano <gitster@pobox.com> | 2025-05-16 12:15:39 -0700 |
| commit | 5f711504d9540ef5c8ca4c9d43e6f5a1d0259d3c (patch) | |
| tree | f1271c8225a00f782250a6c4bab5e434030f3a15 /builtin | |
| parent | 6e95bf80b5066abca85277bd8c142bf36c5fbe0d (diff) | |
| download | git-5f711504d9540ef5c8ca4c9d43e6f5a1d0259d3c.tar.gz | |
repack: add --path-walk option
Since 'git pack-objects' supports a --path-walk option, allow passing it
through in 'git repack'. This presents interesting testing opportunities for
comparing the different repacking strategies against each other.
Add the --path-walk option to the performance tests in p5313.
For the microsoft/fluentui repo [1] checked out at a specific commit [2],
the --path-walk tests in p5313 look like this:
Test this tree
-------------------------------------------------------------------------
5313.18: thin pack with --path-walk 0.08(0.06+0.02)
5313.19: thin pack size with --path-walk 18.4K
5313.20: big pack with --path-walk 2.10(7.80+0.26)
5313.21: big pack size with --path-walk 19.8M
5313.22: shallow fetch pack with --path-walk 1.62(3.38+0.17)
5313.23: shallow pack size with --path-walk 33.6M
5313.24: repack with --path-walk 81.29(96.08+0.71)
5313.25: repack size with --path-walk 142.5M
[1] https://github.com/microsoft/fluentui
[2] e70848ebac1cd720875bccaa3026f4a9ed700e08
Along with the earlier tests in p5313, I'll instead reformat the
comparison as follows:
Repack Method Pack Size Time
---------------------------------------
Hash v1 439.4M 87.24s
Hash v2 161.7M 21.51s
Path Walk 142.5M 81.29s
There are a few things to notice here:
1. The benefits of --name-hash-version=2 over --name-hash-version=1 are
significant, but --path-walk still compresses better than that
option.
2. The --path-walk command is still using --name-hash-version=1 for the
second pass of delta computation, using the increased name hash
collisions as a potential method for opportunistic compression on
top of the path-focused compression.
3. The --path-walk algorithm is currently sequential and does not use
multiple threads for delta compression. Threading will be
implemented in a future change so the computation time will improve
to better compete in this metric.
There are small benefits in size for my copy of the Git repository:
Repack Method Pack Size Time
---------------------------------------
Hash v1 248.8M 30.44s
Hash v2 249.0M 30.15s
Path Walk 213.2M 142.50s
As well as in the nodejs/node repository [3]:
Repack Method Pack Size Time
---------------------------------------
Hash v1 739.9M 71.18s
Hash v2 764.6M 67.82s
Path Walk 698.1M 208.10s
[3] https://github.com/nodejs/node
This benefit also repeats in my copy of the Linux kernel repository:
Repack Method Pack Size Time
---------------------------------------
Hash v1 2.5G 554.41s
Hash v2 2.5G 549.62s
Path Walk 2.2G 1562.36s
It is important to see that even when the repository shape does not have
many name-hash collisions, there is a slight space boost to be found
using this method.
As this repacking strategy was released in Git for Windows 2.47.0, some
users have reported cases where the --path-walk compression is slightly
worse than the --name-hash-version=2 option. In those cases, it may be
beneficial to combine the two options. However, there has not been a
released version of Git that has both options and I don't have access to
these repos for testing.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'builtin')
| -rw-r--r-- | builtin/repack.c | 7 |
1 files changed, 6 insertions, 1 deletions
diff --git a/builtin/repack.c b/builtin/repack.c index 75e3752353..d7f798280c 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -43,7 +43,7 @@ static char *packdir, *packtmp_name, *packtmp; static const char *const git_repack_usage[] = { N_("git repack [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]\n" "[--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]\n" - "[--write-midx] [--name-hash-version=<n>]"), + "[--write-midx] [--name-hash-version=<n>] [--path-walk]"), NULL }; @@ -63,6 +63,7 @@ struct pack_objects_args { int quiet; int local; int name_hash_version; + int path_walk; struct list_objects_filter_options filter_options; }; @@ -313,6 +314,8 @@ static void prepare_pack_objects(struct child_process *cmd, strvec_pushf(&cmd->args, "--no-reuse-object"); if (args->name_hash_version) strvec_pushf(&cmd->args, "--name-hash-version=%d", args->name_hash_version); + if (args->path_walk) + strvec_pushf(&cmd->args, "--path-walk"); if (args->local) strvec_push(&cmd->args, "--local"); if (args->quiet) @@ -1212,6 +1215,8 @@ int cmd_repack(int argc, N_("pass --no-reuse-object to git-pack-objects")), OPT_INTEGER(0, "name-hash-version", &po_args.name_hash_version, N_("specify the name hash version to use for grouping similar objects by path")), + OPT_BOOL(0, "path-walk", &po_args.path_walk, + N_("pass --path-walk to git-pack-objects")), OPT_NEGBIT('n', NULL, &run_update_server_info, N_("do not run git-update-server-info"), 1), OPT__QUIET(&po_args.quiet, N_("be quiet")), |
