| author | Junio C Hamano <gitster@pobox.com> | 2025-02-12 10:08:51 -0800 |
|---|---|---|
| committer | Junio C Hamano <gitster@pobox.com> | 2025-02-12 10:08:51 -0800 |
| commit | aae91a86fb2a71ff89a71b63ccec3a947b26ca51 (patch) | |
| tree | 3bfc421ac1b1f445d22bae71a3b3524ec993fb4c /Documentation/git-pack-objects.txt | |
| parent | 388218fac77d0405a5083cd4b4ee20f6694609c3 (diff) | |
| parent | b4cf68476a983ff063846b43cd46ee9805f2c0bb (diff) | |
Merge branch 'ds/name-hash-tweaks'

"git pack-objects" and its wrapper "git repack" learned an option
to use an alternative path-hash function to improve delta-base
selection to produce a packfile with deeper history than window
size.

* ds/name-hash-tweaks:
  pack-objects: prevent name hash version change
  test-tool: add helper for name-hash values
  p5313: add size comparison test
  pack-objects: add GIT_TEST_NAME_HASH_VERSION
  repack: add --name-hash-version option
  pack-objects: add --name-hash-version option
  pack-objects: create new name-hash function version

Diffstat (limited to 'Documentation/git-pack-objects.txt')
| -rw-r--r-- | Documentation/git-pack-objects.txt | 32 |
1 files changed, 31 insertions, 1 deletions
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index e32404c6aa..7f69ae4855 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -15,7 +15,8 @@ SYNOPSIS
 	[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
 	[--cruft] [--cruft-expiration=<time>]
 	[--stdout [--filter=<filter-spec>] | <base-name>]
-	[--shallow] [--keep-true-parents] [--[no-]sparse] < <object-list>
+	[--shallow] [--keep-true-parents] [--[no-]sparse]
+	[--name-hash-version=<n>] < <object-list>
 
 
 DESCRIPTION
@@ -345,6 +346,35 @@ raise an error.
 	Restrict delta matches based on "islands". See DELTA ISLANDS
 	below.
 
+--name-hash-version=<n>::
+	While performing delta compression, Git groups objects that may be
+	similar based on heuristics using the path to that object. While
+	grouping objects by an exact path match is good for paths with
+	many versions, there are benefits for finding delta pairs across
+	different full paths. Git collects objects by type and then by a
+	"name hash" of the path and then by size, hoping to group objects
+	that will compress well together.
++
+The default name hash version is `1`, which prioritizes hash locality by
+considering the final bytes of the path as providing the maximum magnitude
+to the hash function. This version excels at distinguishing short paths
+and finding renames across directories. However, the hash function depends
+primarily on the final 16 bytes of the path. If there are many paths in
+the repo that have the same final 16 bytes and differ only by parent
+directory, then this name-hash may lead to too many collisions and cause
+poor results. At the moment, this version is required when writing
+reachability bitmap files with `--write-bitmap-index`.
++
+The name hash version `2` has similar locality features as version `1`,
+except it considers each path component separately and overlays the hashes
+with a shift. This still prioritizes the final bytes of the path, but also
+"salts" the lower bits of the hash using the parent directory names. This
+method allows for some of the locality benefits of version `1` while
+breaking most of the collisions from a similarly-named file appearing in
+many different directories. At the moment, this version is not allowed
+when writing reachability bitmap files with `--write-bitmap-index` and it
+will be automatically changed to version `1`.
+
 DELTA ISLANDS
 -------------
 
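
The difference between the two name-hash versions described in the patch can be illustrated with a small Python sketch. It is not Git's C implementation. The function names (`name_hash_v1`, `name_hash_v2`, `reverse_bits8`), the shift widths (2 and 6), the per-byte bit reversal, and the skipping of whitespace are all assumptions chosen to match the behaviour the documentation describes: a 32-bit hash dominated by the final bytes of the path, and a variant that hashes each path component separately and overlays the results with a shift.

```python
# Illustrative sketch only: shift widths and bit reversal are assumptions,
# not values stated in the documentation text above.
WHITESPACE = b" \t\n\v\f\r"
MASK32 = 0xFFFFFFFF


def reverse_bits8(c: int) -> int:
    """Reverse the bit order of a single byte (0..255)."""
    return int(f"{c:08b}"[::-1], 2)


def name_hash_v1(path: str) -> int:
    """Version-1 style hash: each new byte lands in the top bits and
    older bytes are pushed toward the bottom, so the final bytes of
    the path dominate the value."""
    h = 0
    for c in path.encode("utf-8"):
        if c in WHITESPACE:
            continue
        h = ((h >> 2) + (c << 24)) & MASK32
    return h


def name_hash_v2(path: str) -> int:
    """Version-2 style hash: hash each path component on its own, then
    fold the directory hashes into the result with a shift."""
    base = 0  # accumulated hash of the parent directories
    h = 0     # hash of the component currently being read
    for c in path.encode("utf-8"):
        if c in WHITESPACE:
            continue
        if c == ord("/"):
            # Component finished: fold it into the directory hash.
            base = (base >> 6) ^ h
            h = 0
        else:
            h = ((h >> 2) + (reverse_bits8(c) << 24)) & MASK32
    return (base >> 6) ^ h


if __name__ == "__main__":
    # Same final 16 bytes, different parent directory.
    for path in ("a/configuration.yaml", "b/configuration.yaml"):
        print(f"{path}: v1={name_hash_v1(path):08x} v2={name_hash_v2(path):08x}")
```

The two sample paths share their final 16 bytes, so under this sketch their version-1 style values are equal or differ by at most one, which would sort them next to each other. Their version-2 style values differ because the parent directory name is folded into the result.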
