aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/git-pack-objects.txt
diff options
context:
space:
mode:
authorJunio C Hamano <gitster@pobox.com>2025-02-12 10:08:51 -0800
committerJunio C Hamano <gitster@pobox.com>2025-02-12 10:08:51 -0800
commitaae91a86fb2a71ff89a71b63ccec3a947b26ca51 (patch)
tree3bfc421ac1b1f445d22bae71a3b3524ec993fb4c /Documentation/git-pack-objects.txt
parent388218fac77d0405a5083cd4b4ee20f6694609c3 (diff)
parentb4cf68476a983ff063846b43cd46ee9805f2c0bb (diff)
downloadgit-aae91a86fb2a71ff89a71b63ccec3a947b26ca51.tar.gz
Merge branch 'ds/name-hash-tweaks'
"git pack-objects" and its wrapper "git repack" learned an option to use an alternative path-hash function to improve delta-base selection to produce a packfile with deeper history than window size. * ds/name-hash-tweaks: pack-objects: prevent name hash version change test-tool: add helper for name-hash values p5313: add size comparison test pack-objects: add GIT_TEST_NAME_HASH_VERSION repack: add --name-hash-version option pack-objects: add --name-hash-version option pack-objects: create new name-hash function version
Diffstat (limited to 'Documentation/git-pack-objects.txt')
-rw-r--r--Documentation/git-pack-objects.txt32
1 files changed, 31 insertions, 1 deletions
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index e32404c6aa..7f69ae4855 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -15,7 +15,8 @@ SYNOPSIS
[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
[--cruft] [--cruft-expiration=<time>]
[--stdout [--filter=<filter-spec>] | <base-name>]
- [--shallow] [--keep-true-parents] [--[no-]sparse] < <object-list>
+ [--shallow] [--keep-true-parents] [--[no-]sparse]
+ [--name-hash-version=<n>] < <object-list>
DESCRIPTION
@@ -345,6 +346,35 @@ raise an error.
Restrict delta matches based on "islands". See DELTA ISLANDS
below.
+--name-hash-version=<n>::
+ While performing delta compression, Git groups objects that may be
+ similar based on heuristics using the path to that object. While
+ grouping objects by an exact path match is good for paths with
+ many versions, there are benefits for finding delta pairs across
+ different full paths. Git collects objects by type and then by a
+ "name hash" of the path and then by size, hoping to group objects
+ that will compress well together.
++
+The default name hash version is `1`, which prioritizes hash locality by
+considering the final bytes of the path as providing the maximum magnitude
+to the hash function. This version excels at distinguishing short paths
+and finding renames across directories. However, the hash function depends
+primarily on the final 16 bytes of the path. If there are many paths in
+the repo that have the same final 16 bytes and differ only by parent
+directory, then this name-hash may lead to too many collisions and cause
+poor results. At the moment, this version is required when writing
+reachability bitmap files with `--write-bitmap-index`.
++
+The name hash version `2` has similar locality features as version `1`,
+except it considers each path component separately and overlays the hashes
+with a shift. This still prioritizes the final bytes of the path, but also
+"salts" the lower bits of the hash using the parent directory names. This
+method allows for some of the locality benefits of version `1` while
+breaking most of the collisions from a similarly-named file appearing in
+many different directories. At the moment, this version is not allowed
+when writing reachability bitmap files with `--write-bitmap-index` and it
+will be automatically changed to version `1`.
+
DELTA ISLANDS
-------------