git.git - The core git plumbing

Age	Commit message (Collapse)	Author	Files	Lines
2025-11-30	Merge branch 'jk/asan-bonanza'	Junio C Hamano	1	-4/+25
	Various issues detected by Asan have been corrected. * jk/asan-bonanza: t: enable ASan's strict_string_checks option fsck: avoid parse_timestamp() on buffer that isn't NUL-terminated fsck: remove redundant date timestamp check fsck: avoid strcspn() in fsck_ident() fsck: assert newline presence in fsck_ident() cache-tree: avoid strtol() on non-string buffer Makefile: turn on NO_MMAP when building with ASan pack-bitmap: handle name-hash lookups in incremental bitmaps compat/mmap: mark unused argument in git_munmap()
2025-11-18	pack-bitmap: handle name-hash lookups in incremental bitmaps	Jeff King	1	-4/+25
	If a bitmap has a name-hash cache, it is an array of 32-bit integers, one per entry in the bitmap, which we've mmap'd from the .bitmap file. We access it directly like this: if (bitmap_git->hashes) hash = get_be32(bitmap_git->hashes + index_pos); That works for both regular pack bitmaps and for non-incremental midx bitmaps. There is one bitmap_index with one "hashes" array, and index_pos is within its bounds (we do the bounds-checking when we load the bitmap). But for an incremental midx bitmap, we have a linked list of bitmap_index structs, and each one has only its own small slice of the name-hash array. If index_pos refers to an object that is not in the first bitmap_git of the chain, then we'll access memory outside of the bounds of its "hashes" array, and often outside of the mmap. Instead, we should walk through the list until we find the bitmap_index which serves our index_pos, and use its hash (after adjusting index_pos to make it relative to the slice we found). This is exactly what we do elsewhere for incremental midx lookups (like the pack_pos_to_midx() call a few lines above). But we can't use existing helpers like midx_for_object() here, because we're walking through the chain of bitmap_index structs (each of which refers to a midx), not the chain of incremental multi_pack_index structs themselves. The problem is triggered in the test suite, but we don't get a segfault because the out-of-bounds index is too small. The OS typically rounds our mmap up to the nearest page size, so we just end up accessing some extra zero'd memory. Nor do we catch it with ASan, since it doesn't seem to instrument mmaps at all. But if we build with NO_MMAP, then our maps are replaced with heap allocations, which ASan does check. And so: make NO_MMAP=1 SANITIZE=address cd t ./t5334-incremental-multi-pack-index.sh does show the problem (and this patch makes it go away). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16	packfile: introduce macro to iterate through packs	Patrick Steinhardt	1	-3/+3
	We have a bunch of different sites that want to iterate through all packs of a given `struct packfile_store`. This pattern is somewhat verbose and repetitive, which makes it somewhat cumbersome. Introduce a new macro `repo_for_each_pack()` that removes some of the boilerplate. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-24	packfile: refactor `get_all_packs()` to work on packfile store	Patrick Steinhardt	1	-2/+2
	The `get_all_packs()` function prepares the packfile store and then returns its packfiles. Refactor it to accept a packfile store instead of a repository to clarify its scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-11	midx: compute paths via their source	Patrick Steinhardt	1	-6/+4
	With the preceding commits we started to always have the object database source available when we load, write or access multi-pack indices. With this in place we can change how MIDX paths are computed so that we don't have to pass in the combination of a hash algorithm and object directory anymore, but only the object database source. Refactor the code accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-11	midx: stop duplicating info redundant with its owning source	Patrick Steinhardt	1	-6/+7
	Multi-pack indices store some information that is redundant with their owning source: - The locality bit that tracks whether the source is the primary object source or an alternate. - The object directory path the multi-pack index is located in. - The pointer to the owning parent directory. All of this information is already contained in `struct odb_source`. So now that we always have that struct available when loading a multi-pack index we have it readily accessible. Drop the redundant information and instead store a pointer to the object source. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-11	midx: drop redundant `struct repository` parameter	Patrick Steinhardt	1	-2/+2
	There are a couple of functions that take both a `struct repository` and a `struct multi_pack_index`. This provides redundant information though without much benefit given that the multi-pack index already has a pointer to its owning repository. Drop the `struct repository` parameter from such functions. While at it, reorder the list of parameters of `fill_midx_entry()` so that the MIDX comes first to better align with our coding guidelines. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-29	Merge branch 'ps/object-store-midx' into ps/object-store-midx-dedup-info	Junio C Hamano	1	-6/+15
	* ps/object-store-midx: midx: remove now-unused linked list of multi-pack indices packfile: stop using linked MIDX list in `get_all_packs()` packfile: stop using linked MIDX list in `find_pack_entry()` packfile: refactor `get_multi_pack_index()` to work on sources midx: stop using linked list when closing MIDX packfile: refactor `prepare_packed_git_one()` to work on sources midx: start tracking per object database source
2025-07-15	Merge branch 'ly/load-bitmap-leakfix'	Junio C Hamano	1	-24/+64
	Leakfix with a new and a bit invasive test. * ly/load-bitmap-leakfix: pack-bitmap: add load corrupt bitmap test pack-bitmap: reword comments in test_bitmap_commits() pack-bitmap: fix memory leak if load_bitmap() failed
2025-07-15	Merge branch 'ps/object-store'	Junio C Hamano	1	-5/+5
	Code clean-up around object access API. * ps/object-store: odb: rename `read_object_with_reference()` odb: rename `pretend_object_file()` odb: rename `has_object()` odb: rename `repo_read_object_file()` odb: rename `oid_object_info()` odb: trivial refactorings to get rid of `the_repository` odb: get rid of `the_repository` when handling submodule sources odb: get rid of `the_repository` when handling the primary source odb: get rid of `the_repository` in `for_each()` functions odb: get rid of `the_repository` when handling alternates odb: get rid of `the_repository` in `odb_mkstemp()` odb: get rid of `the_repository` in `assert_oid_type()` odb: get rid of `the_repository` in `find_odb()` odb: introduce parent pointers object-store: rename files to "odb.{c,h}" object-store: rename `object_directory` to `odb_source` object-store: rename `raw_object_store` to `object_database`
2025-07-15	packfile: refactor `get_multi_pack_index()` to work on sources	Patrick Steinhardt	1	-6/+15
	The function `get_multi_pack_index()` loads multi-pack indices via `prepare_packed_git()` and then returns the linked list of multi-pack indices that is stored in `struct object_database`. That list is in the process of being removed though in favor of storing the MIDX as part of the object database source it belongs to. Refactor `get_multi_pack_index()` so that it returns the multi-pack index for a single object source. Callers are now expected to call this function for each source they are interested in. This requires them to iterate through alternates, so we have to prepare alternate object sources before doing so. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-09	Merge branch 'ps/object-store' into ps/object-store-midx	Junio C Hamano	1	-5/+5
	* ps/object-store: odb: rename `read_object_with_reference()` odb: rename `pretend_object_file()` odb: rename `has_object()` odb: rename `repo_read_object_file()` odb: rename `oid_object_info()` odb: trivial refactorings to get rid of `the_repository` odb: get rid of `the_repository` when handling submodule sources odb: get rid of `the_repository` when handling the primary source odb: get rid of `the_repository` in `for_each()` functions odb: get rid of `the_repository` when handling alternates odb: get rid of `the_repository` in `odb_mkstemp()` odb: get rid of `the_repository` in `assert_oid_type()` odb: get rid of `the_repository` in `find_odb()` odb: introduce parent pointers object-store: rename files to "odb.{c,h}" object-store: rename `object_directory` to `odb_source` object-store: rename `raw_object_store` to `object_database`
2025-07-01	odb: rename `oid_object_info()`	Patrick Steinhardt	1	-4/+4
	Rename `oid_object_info()` to `odb_read_object_info()` as well as their `_extended()` variant to match other functions related to the object database and our modern coding guidelines. Introduce compatibility wrappers so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01	object-store: rename files to "odb.{c,h}"	Patrick Steinhardt	1	-1/+1
	In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01	pack-bitmap: add load corrupt bitmap test	Lidong Yan	1	-5/+57
	t5310 lacks a test to ensure git works correctly when commit bitmap data is corrupted. So this patch add test helper in pack-bitmap.c to list each commit bitmap position in bitmap file and `load corrupt bitmap` test case in t/t5310 to corrupt a commit bitmap before loading it. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01	pack-bitmap: reword comments in test_bitmap_commits()	Lidong Yan	1	-2/+3
	The comment in pack-bitmap.c:test_bitmap_commits(), suggests that we can avoid reading the commit table altogether. However, this comment is misleading. The reason we load bitmap entries here is because test_bitmap_commits() needs to print the commit IDs from the bitmap, and we must read the bitmap entries to obtain those commit IDs. So reword this comment. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01	pack-bitmap: fix memory leak if load_bitmap() failed	Taylor Blau	1	-17/+4
	After going through the "failed" label, load_bitmap() will return -1, and its caller (either prepare_bitmap_walk() or prepare_bitmap_git()) will then call free_bitmap_index(). That function would have done: struct stored_bitmap *sb; kh_foreach_value(b->bitmaps, sb { ewah_pool_free(sb->root); free(sb); }); , but won't since load_bitmap() already called kh_destroy_oid_map() and NULL'd the "bitmaps" pointer from within its "failed" label. Thus if you got part of the way through loading bitmap entries and then failed, you would leak all of the previous entries that you were able to load successfully. The solution is to remove the error handling code in load_bitmap(), because its caller will always call free_bitmap_index() in case of an error. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-09	pack-bitmap: remove checks before bitmap_free	Lidong Yan	1	-2/+2
	In pack-bitmap.c:find_boundary_objects(), the roots_bitmap is only freed if cascade_pseudo_merges_1() fails. However, cascade_pseudo_merges_1() uses roots_bitmap as a mutable reference without taking ownership of it. As a result, if cascade_pseudo_merges_1() succeeds, roots_bitmap is leaked. And this leak currently lacks a dedicated test to detect it. To fix this leak, remove if cascade_pseudo_merges_1() succeed check and always calling bitmap_free(roots_bitmap); To trigger this leak, we need roots_bitmap that contains at least one pseudo merge. So that we can use pseudo merge bitmap when we compute roots reachable bitmap. Here we create two commits: first A then B. Add A to the pseudo-merge and perform a traversal over the range A..B. In this scenario, the "haves" set will be {A}, and cascade_pseudo_merges_1 will succeed, thereby exposing the leak due to the missing roots_bitmap cleanup. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-12	pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed	Lidong Yan	1	-4/+4
	In pack-bitmap.c:load_bitmap_entries_v1, the function `read_bitmap_1` allocates a bitmap and reads index data into it. However, if any of the validation checks following the allocation fail, the allocated bitmap is not freed, resulting in a memory leak. To avoid this, the validation checks should be performed before the bitmap is allocated. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-24	Merge branch 'ps/object-file-cleanup'	Junio C Hamano	1	-2/+1
	Code clean-up. * ps/object-file-cleanup: object-store: merge "object-store-ll.h" and "object-store.h" object-store: remove global array of cached objects object: split out functions relating to object store subsystem object-file: drop `index_blob_stream()` object-file: split up concerns of `HASH_*` flags object-file: split out functions relating to object store subsystem object-file: move `xmmap()` into "wrapper.c" object-file: move `git_open_cloexec()` to "compat/open.c" object-file: move `safe_create_leading_directories()` into "path.c" object-file: move `mkdir_in_gitdir()` into "path.c"
2025-04-16	Merge branch 'ps/cat-file-filter-batch'	Junio C Hamano	1	-9/+72
	"git cat-file --batch" and friends learned to allow "--filter=" to omit certain objects, just like the transport layer does. * ps/cat-file-filter-batch: builtin/cat-file: use bitmaps to efficiently filter by object type builtin/cat-file: deduplicate logic to iterate over all objects pack-bitmap: introduce function to check whether a pack is bitmapped pack-bitmap: add function to iterate over filtered bitmapped objects pack-bitmap: allow passing payloads to `show_reachable_fn()` builtin/cat-file: support "object:type=" objects filter builtin/cat-file: support "blob:limit=" objects filter builtin/cat-file: support "blob:none" objects filter builtin/cat-file: wire up an option to filter objects builtin/cat-file: introduce function to report object status builtin/cat-file: rename variable that tracks usage
2025-04-15	Merge branch 'ps/object-wo-the-repository'	Junio C Hamano	1	-7/+8
	The object layer has been updated to take an explicit repository instance as a parameter in more code paths. * ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`
2025-04-15	object-store: merge "object-store-ll.h" and "object-store.h"	Patrick Steinhardt	1	-1/+1
	The "object-store-ll.h" header has been introduced to keep transitive header dependendcies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15	object-file: move `git_open_cloexec()` to "compat/open.c"	Patrick Steinhardt	1	-1/+0
	The `git_open_cloexec()` wrapper function provides the ability to open a file with `O_CLOEXEC` in a platform-agnostic way. This function is provided by "object-file.c" even though it is not specific to the object subsystem at all. Move the file into "compat/open.c". This file already exists before this commit, but has only been compiled conditionally depending on whether or not open(3p) may return EINTR. With this change we now unconditionally compile the object, but wrap `git_open_with_retry()` in an ifdef. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07	pack-bitmap: introduce function to check whether a pack is bitmapped	Patrick Steinhardt	1	-0/+15
	Introduce a function that allows us to verify whether a pack is bitmapped or not. This functionality will be used in a subsequent commit. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07	pack-bitmap: add function to iterate over filtered bitmapped objects	Patrick Steinhardt	1	-6/+53
	Introduce a function that allows the caller to iterate over all bitmapped objects that match a given filter. This mechanism will be used in a subsequent commit to optimize object filters in git-cat-file(1). Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07	pack-bitmap: allow passing payloads to `show_reachable_fn()`	Patrick Steinhardt	1	-7/+8
	The `show_reachable_fn` callback is used by a couple of functions to present reachable objects to the caller. The function does not provide a way for the caller to pass a payload though, which is functionality that we'll require in a subsequent commit. Change the callback type to accept a payload and adapt all callsites accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21	pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators	Taylor Blau	1	-15/+27
	Now that we have initialized arrays for each bitmap layer's type bitmaps in the previous commit, adjust existing callers to use them in preparation for multi-layered bitmaps. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21	pack-bitmap.c: keep track of each layer's type bitmaps	Taylor Blau	1	-4/+53
	Prepare for reading the type-level bitmaps from previous bitmap layers by maintaining an array for each type, where each element in that type's array corresponds to one layer's bitmap for that type. These fields will be used in a later commit to instantiate the 'struct ewah_or_iterator' for each type. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21	pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs	Taylor Blau	1	-3/+8
	Prepare for using pseudo-merges with incremental MIDX bitmaps by attempting to apply pseudo-merges from each layer when encountering a given commit during a walk. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21	pack-bitmap.c: compute disk-usage with incremental MIDXs	Taylor Blau	1	-2/+2
	In a similar fashion as previous commits, use nth_midxed_pack() instead of accessing the MIDX's ->packs array directly to support incremental MIDXs. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21	pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs	Taylor Blau	1	-21/+86
	Implement support for the special `--test-bitmap` mode of `git rev-list` when using incremental MIDXs. The bitmap_test_data structure is extended to contain a "base" pointer that mirrors the structure of the bitmap chain that it is being used to test. When we find a commit to test, we first chase down the ->base pointer to find the appropriate bitmap_test_data for the bitmap layer that the given commit is contained within, and then perform the test on that bitmap. In order to implement this, light modifications are made to bitmap_for_commit() to reimplement it in terms of a new function, find_bitmap_for_commit(), which fills out a pointer which indicates the bitmap layer which contains the given commit. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21	pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs	Taylor Blau	1	-3/+8
	In a similar fashion as previous commits in the first phase of incremental MIDXs, enumerate not just the packs in the current incremental MIDX layer, but previous ones as well. Likewise, in reuse_partial_packfile_from_bitmap(), when reusing only a single pack from a MIDX, use the oldest layer's preferred pack as it is likely to contain the largest number of reusable sections. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21	pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs	Taylor Blau	1	-1/+1
	Since we may ask for a pack_id that is in an earlier MIDX layer relative to the one corresponding to our bitmap, use nth_midxed_pack() instead of accessing the ->packs array directly. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21	pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs	Taylor Blau	1	-4/+7
	The pack-bitmap machinery uses `bitmap_for_commit()` to locate the EWAH-compressed bitmap corresponding to some given commit object. Teach this function about incremental MIDX bitmaps by teaching it to recur on earlier bitmap layers when it fails to find a given commit in the current layer. The changes to do so are as follows: - Avoid initializing hash_pos at its declaration, since bitmap_for_commit() is now a recursive function and may receive a NULL bitmap_index pointer as its first argument. - In cases where we would previously return NULL (to indicate that a lookup failed and the given bitmap_index does not contain an entry corresponding to the given commit), recursively call the function on the previous bitmap layer. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21	pack-bitmap.c: open and store incremental bitmap layers	Taylor Blau	1	-14/+48
	Prepare the pack-bitmap machinery to work with incremental MIDXs by adding a new "base" field to keep track of the bitmap index associated with the previous MIDX layer. The changes in this commit are mostly boilerplate to open the correct bitmap(s), add them to the chain of bitmap layers along the "base" pointer, ensure that the correct packs and their reverse indexes are loaded across MIDX layers, etc. While we're at it, keep track of a base_nr field to indicate how many bitmap layers (including the current bitmap) exist. This will be used in a future commit to allocate an array of 'struct ewah_bitmap' pointers to collect all of the respective type bitmaps among all layers to initialize a multi-EWAH iterator. Subsequent commits will teach the functions within the pack-bitmap machinery how to interact with these new fields. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21	pack-revindex: prepare for incremental MIDX bitmaps	Taylor Blau	1	-12/+31
	Prepare the reverse index machinery to handle object lookups in an incremental MIDX bitmap. These changes are broken out across a few functions: - load_midx_revindex() learns to use the appropriate MIDX filename depending on whether the given 'struct multi_pack_index ' is incremental or not. - pack_pos_to_midx() and midx_to_pack_pos() now both take in a global object position in the MIDX pseudo-pack order, and find the earliest containing MIDX (similar to midx.c::midx_for_object(). - midx_pack_order_cmp() adjusts its call to pack_pos_to_midx() by the number of objects in the base (since 'vb - midx->revindx_data' is relative to the containing MIDX, and pack_pos_to_midx() expects a global position). Likewise, this function adjusts its output by adding m->num_objects_in_base to return a global position out through the `pos` pointer. Together, these changes are sufficient to use the multi-pack index's reverse index format for incremental multi-pack reachability bitmaps. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10	object: stop depending on `the_repository`	Patrick Steinhardt	1	-3/+3
	There are a couple of functions exposed by "object.c" that implicitly depend on `the_repository`. Remove this dependency by injecting the repository via a parameter. Adapt callers accordingly by simply using `the_repository`, except in cases where the subsystem is already free of the repository. In that case, we instead pass the repository provided by the caller's context. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10	csum-file: stop depending on `the_repository`	Patrick Steinhardt	1	-4/+5
	There are multiple sites in "csum-file.c" where we use the global `the_repository` variable, either explicitly or implicitly by using `the_hash_algo`. Refactor the code to stop using `the_repository` by adapting functions to receive required data as parameters. Adapt callsites accordingly by either using `the_repository->hash_algo`, or by using a context-provided hash algorithm in case the subsystem already got rid of its dependency on `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21	Merge branch 'ps/the-repository'	Junio C Hamano	1	-1/+3
	More code paths have a repository passed through the callchain, instead of assuming the primary the_repository object. * ps/the-repository: match-trees: stop using `the_repository` graph: stop using `the_repository` add-interactive: stop using `the_repository` tmp-objdir: stop using `the_repository` resolve-undo: stop using `the_repository` credential: stop using `the_repository` mailinfo: stop using `the_repository` diagnose: stop using `the_repository` server-info: stop using `the_repository` send-pack: stop using `the_repository` serve: stop using `the_repository` trace: stop using `the_repository` pager: stop using `the_repository` progress: stop using `the_repository`
2024-12-23	Merge branch 'tb/bitmap-fix-pack-reuse'	Junio C Hamano	1	-23/+18
	Code to reuse objects based on bitmap contents have been tightened to avoid race condition even when multiple packs are involved. * tb/bitmap-fix-pack-reuse: pack-bitmap.c: ensure pack validity for all reuse packs
2024-12-23	Merge branch 'ps/build-sign-compare'	Junio C Hamano	1	-0/+2
	Start working to make the codebase buildable with -Wsign-compare. * ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-18	progress: stop using `the_repository`	Patrick Steinhardt	1	-1/+3
	Stop using `the_repository` in the "progress" subsystem by passing in a repository when initializing `struct progress`. Furthermore, store a pointer to the repository in that struct so that we can pass it to the trace2 API when logging information. Adjust callers accordingly by using `the_repository`. While there may be some callers that have a repository available in their context, this trivial conversion allows for easier verification and bubbles up the use of `the_repository` by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18	Merge branch 'ps/build-sign-compare' into ps/the-repository	Junio C Hamano	1	-0/+2
	* ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-18	pack-bitmap.c: ensure pack validity for all reuse packs	Taylor Blau	1	-23/+18
	Commit 44f9fd6496 (pack-bitmap.c: check preferred pack validity when opening MIDX bitmap, 2022-05-24) prevents a race condition whereby the preferred pack disappears between opening the MIDX bitmap and attempting verbatim reuse out of its packs. That commit forces open_midx_bitmap_1() to ensure the validity of the MIDX's preferred pack, meaning that we have an open file handle on the .pack, ensuring that we can reuse bytes out of verbatim later on in the process[^1]. But 44f9fd6496 was not extended to cover multi-pack reuse, meaning that this same race condition exists for non-preferred packs during verbatim reuse. Work around that race in the same way by only marking valid packs as reuse-able. For packs that aren't reusable, skip over them but include the number of objects they have to ensure we allocate a large enough 'reuse' bitmap (e.g. if a pack in the middle of the MIDX disappeared but we still want to reuse later packs). Since we're ensuring the validity of these packs within the verbatim reuse code, we no longer have to special-case the preferred pack and open it within the open_midx_bitmap_1() function. An alternative approach to the one taken here would be to open all MIDX'd packs from within open_midx_bitmap_1(). But that would be both slower and make the bitmaps less useful, since we can still perform some pack reuse among the packs that still exist when the .bitmap is opened. After applying this patch, we can simulate the new behavior after instrumenting Git like so: diff --git a/packfile.c b/packfile.c index 9560f0a33c..aedce72524 100644 --- a/packfile.c +++ b/packfile.c @@ -557,6 +557,11 @@ static int open_packed_git_1(struct packed_git p) ; / nothing / p->pack_fd = git_open(p->pack_name); + { + const char delete = getenv("GIT_RACILY_DELETE"); + if (delete && !strcmp(delete, pack_basename(p))) + return -1; + } if (p->pack_fd < 0 \|\| fstat(p->pack_fd, &st)) return -1; pack_open_fds++; and adding the following test: test_expect_success 'disappearing packs' ' git init disappearing-packs && ( cd disappearing-packs && git config pack.allowPackReuse multi && test_commit A && test_commit B && test_commit C && A="$(echo "A" \| git pack-objects --revs $packdir/pack-A)" && B="$(echo "A..B" \| git pack-objects --revs $packdir/pack-B)" && C="$(echo "B..C" \| git pack-objects --revs $packdir/pack-C)" && git multi-pack-index write --bitmap --preferred-pack=pack-A-$A.idx && test_pack_objects_reused_all 9 3 && test_env GIT_RACILY_DELETE=pack-A-$A.pack \ test_pack_objects_reused_all 6 2 && test_env GIT_RACILY_DELETE=pack-B-$B.pack \ test_pack_objects_reused_all 6 2 && test_env GIT_RACILY_DELETE=pack-C-$C.pack \ test_pack_objects_reused_all 6 2 ) ' Note that we could relax the single-pack version of this which was most recently addressed in dc1daacdcc (pack-bitmap: check pack validity when opening bitmap, 2021-07-23), but only partially. Because we still need to know the object count in the pack, we'd still have to open the pack's *.idx, so the savings there are marginal. Note likewise that we add a new "if (!packs_nr)" early return in the pack reuse code to avoid a potentially expensive allocation on the 'reuse' bitmap in the case that no packs are available for reuse. [^1]: Unless we run out of open file handles. If that happens and we are forced to close the only open file handle of a file that has been removed from underneath us, there is nothing we can do. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-13	Merge branch 'kn/midx-wo-the-repository'	Junio C Hamano	1	-38/+58
	Yet another "pass the repository through the callchain" topic. * kn/midx-wo-the-repository: midx: inline the `MIDX_MIN_SIZE` definition midx: pass down `hash_algo` to functions using global variables midx: pass `repository` to `load_multi_pack_index` midx: cleanup internal usage of `the_repository` and `the_hash_algo` midx-write: pass down repository to `write_midx_file[_only]` write-midx: add repository field to `write_midx_context` midx-write: use `revs->repo` inside `read_refs_snapshot` midx-write: pass down repository to static functions packfile.c: remove unnecessary prepare_packed_git() call midx: add repository to `multi_pack_index` struct config: make `packed_git_(limit\|window_size)` non-global variables config: make `delta_base_cache_limit` a non-global variable packfile: pass down repository to `for_each_packed_object` packfile: pass down repository to `has_object[_kept]_pack` packfile: pass down repository to `odb_pack_name` packfile: pass `repository` to static function in the file packfile: use `repository` from `packed_git` directly packfile: add repository to struct `packed_git`
2024-12-06	global: mark code units that generate warnings with `-Wsign-compare`	Patrick Steinhardt	1	-0/+1
	Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04	midx: pass down `hash_algo` to functions using global variables	Karthik Nayak	1	-3/+3
	The functions `get_split_midx_filename_ext()`, `get_midx_filename()` and `get_midx_filename_ext()` use `hash_to_hex()` which internally uses the `the_hash_algo` global variable. Remove this dependency on global variables by passing down the `hash_algo` through to the functions mentioned and instead calling `hash_to_hex_algop()` along with the obtained `hash_algo`. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04	Merge branch 'tb/boundary-traversal-fix'	Junio C Hamano	1	-1/+1
	A trivial "correctness" fix that does not yet matter in practice. * tb/boundary-traversal-fix: pack-bitmap.c: typofix in `find_boundary_objects()`
2024-12-04	midx: add repository to `multi_pack_index` struct	Karthik Nayak	1	-35/+55
	The `multi_pack_index` struct represents the MIDX for a repository. Here, we add a pointer to the repository in this struct, allowing direct use of the repository variable without relying on the global `the_repository` struct. With this addition, we can determine the repository associated with a `bitmap_index` struct. A `bitmap_index` points to either a `packed_git` or a `multi_pack_index`, both of which have direct repository references. To support this, we introduce a static helper function, `bitmap_repo`, in `pack-bitmap.c`, which retrieves a repository given a `bitmap_index`. With this, we clear up all usages of `the_repository` within `pack-bitmap.c` and also remove the `USE_THE_REPOSITORY_VARIABLE` definition. Bringing us another step closer to remove all global variable usage. Although this change also opens up the potential to clean up `midx.c`, doing so would require additional refactoring to pass the repository struct to functions where the MIDX struct is created: a task better suited for future patches. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04	packfile: pass down repository to `has_object[_kept]_pack`	Karthik Nayak	1	-1/+1
	The functions `has_object[_kept]_pack` currently rely on the global variable `the_repository`. To eliminate global variable usage in `packfile.c`, we should progressively shift the dependency on the_repository to higher layers. Let's remove its usage from these functions and any related ones. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-22	pack-bitmap.c: typofix in `find_boundary_objects()`	Taylor Blau	1	-1/+1
	In the boundary-based bitmap traversal, we use the given 'rev_info' structure to first do a commit-only walk in order to determine the boundary between interesting and uninteresting objects. That walk only looks at commit objects, regardless of the state of revs->blob_objects, revs->tree_objects, and so on. In order to do this, we store the state of these variables in temporary fields before setting them back to zero, performing the traversal, and then setting them back. But there is a typo here that dates back to b0afdce5da (pack-bitmap.c: use commit boundary during bitmap traversal, 2023-05-08), where we incorrectly store the value of the "tags" field as "revs->blob_objects". This could lead to problems later on if, say, the caller wants tag objects but not blob objects. In the pre-image behavior, we'd set revs->tag_objects back to the old value of revs->blob_objects, thus emitting fewer objects than expected back to the caller. Fix that by correctly assigning the value of 'revs->tag_objects' to the 'tmp_tags' field. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-25	packfile: use object_id in find_pack_entry_one()	Jeff King	1	-2/+2
	The main function we use to search a pack index for an object is find_pack_entry_one(). That function still takes a bare pointer to the hash, despite the fact that its underlying bsearch_pack() function needs an object_id struct. And so we end up making an extra copy of the hash into the struct just to do a lookup. As it turns out, all callers but one already have such an object_id. So we can just take a pointer to that struct and use it directly. This avoids the extra copy and provides a more type-safe interface. The one exception is get_delta_base() in packfile.c, when we are chasing a REF_DELTA from inside the pack (and thus we have a pointer directly to the mmap'd pack memory, not a struct). We can just bump the hashcpy() from inside find_pack_entry_one() to this one caller that needs it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-09-30	pseudo-merge: fix various memory leaks	Patrick Steinhardt	1	-2/+2
	Fix various memory leaks hit by the pseudo-merge machinery. These leaks are exposed by t5333, but plugging them does not yet make the whole test suite pass. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-27	pack-bitmap.c: avoid repeated `pack_pos_to_offset()` during reuse	Taylor Blau	1	-4/+7
	When calling `try_partial_reuse()`, the (sole) caller from the function `reuse_partial_packfile_from_bitmap_1()` has to translate its bit position to a pack position. In the MIDX bitmap case, the caller translates from the bit position, to a position in the MIDX's pseudo-pack order (with `pack_pos_to_midx()`), then get a pack offset (with `nth_midxed_offset()`) before finally working backwards to get the pack position in the source pack by calling `offset_to_pack_pos()`. In the non-MIDX bitmap case, we can use the bit position as the pack position directly (see the comment at the beginning of the `reuse_partial_packfile_from_bitmap_1()` function for why). In either case, the first thing that `try_partial_reuse()` does after being called is determine the offset of the object at the given pack position by calling `pack_pos_to_offset()`. But we already have that information in the MIDX case! Avoid re-computing that information by instead passing it in. In the MIDX case, we already have that information stored. In the non-MIDX case, the call to `pack_pos_to_offset()` moves from the function `try_partial_reuse()` to its caller. In total, we'll save one call to `pack_pos_to_offset()` when processing MIDX bitmaps. (On my machine, there is a slight speed-up on the order of ~2ms, but it is within the margin of error over 10 runs, so I think you'd have to have a truly gigantic repository to confidently measure any significant improvement here). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-27	pack-bitmap: tag bitmapped packs with their corresponding MIDX	Taylor Blau	1	-0/+1
	The next commit will need to use the bitmap's MIDX (if one exists) to translate bit positions into pack-relative positions in the source pack. Ordinarily, we'd use the "midx" field of the bitmap_index struct. But since that struct is defined within pack-bitmap.c, and our caller is in a separate compilation unit, we do not have access to the MIDX field. Instead, add a "from_midx" field to the bitmapped_pack structure so that we can use that piece of data from outside of pack-bitmap.c. The caller that uses this new piece of information will be added in the following commit. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-02	Merge branch 'ps/use-the-repository'	Junio C Hamano	1	-1/+4
	A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help transition the codebase to rely less on the availability of the singleton the_repository instance. * ps/use-the-repository: hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE` t/helper: remove dependency on `the_repository` in "proc-receive" t/helper: fix segfault in "oid-array" command without repository t/helper: use correct object hash in partial-clone helper compat/fsmonitor: fix socket path in networked SHA256 repos replace-object: use hash algorithm from passed-in repository protocol-caps: use hash algorithm from passed-in repository oidset: pass hash algorithm when parsing file http-fetch: don't crash when parsing packfile without a repo hash-ll: merge with "hash.h" refs: avoid include cycle with "repository.h" global: introduce `USE_THE_REPOSITORY_VARIABLE` macro hash: require hash algorithm in `empty_tree_oid_hex()` hash: require hash algorithm in `is_empty_{blob,tree}_oid()` hash: make `is_null_oid()` independent of `the_repository` hash: convert `oidcmp()` and `oideq()` to compare whole hash global: ensure that object IDs are always padded hash: require hash algorithm in `oidread()` and `oidclr()` hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()` hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
2024-06-24	Merge branch 'tb/pseudo-merge-reachability-bitmap'	Junio C Hamano	1	-11/+353
	The pseudo-merge reachability bitmap to help more efficient storage of the reachability bitmap in a repository with too many refs has been added. * tb/pseudo-merge-reachability-bitmap: (26 commits) pack-bitmap.c: ensure pseudo-merge offset reads are bounded Documentation/technical/bitmap-format.txt: add missing position table t/perf: implement performance tests for pseudo-merge bitmaps pseudo-merge: implement support for finding existing merges ewah: `bitmap_equals_ewah()` pack-bitmap: extra trace2 information pack-bitmap.c: use pseudo-merges during traversal t/test-lib-functions.sh: support `--notick` in `test_commit_bulk()` pack-bitmap: implement test helpers for pseudo-merge ewah: implement `ewah_bitmap_popcount()` pseudo-merge: implement support for reading pseudo-merge commits pack-bitmap.c: read pseudo-merge extension pseudo-merge: scaffolding for reads pack-bitmap: extract `read_bitmap()` function pack-bitmap-write.c: write pseudo-merge table pseudo-merge: implement support for selecting pseudo-merge commits config: introduce `git_config_double()` pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` pack-bitmap-write: support storing pseudo-merge commits ...
2024-06-14	pack-bitmap.c: ensure pseudo-merge offset reads are bounded	Taylor Blau	1	-0/+5
	After reading the pseudo-merge extension's metadata table, we allocate an array to store information about each pseudo-merge, including its byte offset within the .bitmap file itself. This is done like so: pseudo_merge_ofs = index_end - 24 - (index->pseudo_merges.nr * sizeof(uint64_t)); for (i = 0; i < index->pseudo_merges.nr; i++) { index->pseudo_merges.v[i].at = get_be64(pseudo_merge_ofs); pseudo_merge_ofs += sizeof(uint64_t); } But if the pseudo-merge table is corrupt, we'll keep calling get_be64() past the end of the pseudo-merge extension, potentially reading off the end of the mmap'd region. Prevent this by ensuring that we have at least `table_size - 24` many bytes available to read (adding 24 to the left-hand side of our inequality to account for the length of the metadata component). This is sufficient to prevent us from reading off the end of the pseudo-merge extension, and ensures that all of the get_be64() calls below are in bounds. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14	global: introduce `USE_THE_REPOSITORY_VARIABLE` macro	Patrick Steinhardt	1	-0/+2
	Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14	hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`	Patrick Steinhardt	1	-1/+2
	Many of our hash functions have two variants, one receiving a `struct git_hash_algo` and one that derives it via `the_repository`. Adapt all of those functions to always require the hash algorithm as input and drop the variants that do not accept one. As those functions are now independent of `the_repository`, we can move them from "hash.h" to "hash-ll.h". Note that both in this and subsequent commits in this series we always just pass `the_repository->hash_algo` as input even if it is obvious that there is a repository in the context that we should be using the hash from instead. This is done to be on the safe side and not introduce any regressions. All callsites should eventually be amended to use a repo passed via parameters, but this is outside the scope of this patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11	pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse	Taylor Blau	1	-0/+13
	When performing multi-pack reuse, reuse_partial_packfile_from_bitmap() is responsible for generating an array of bitmapped_pack structs from which to perform reuse. In the multi-pack case, we loop over the MIDXs packs and copy the result of calling `nth_bitmapped_pack()` to construct the list of reusable paths. But we may also want to do pack-reuse over a single pack, either because we only had one pack to perform reuse over (in the case of single-pack bitmaps), or because we explicitly asked to do single pack reuse even with a MIDX[^1]. When this is the case, the array we generate of reusable packs contains only a single element, which is either (a) the pack attached to the single-pack bitmap, or (b) the MIDX's preferred pack. In 795006fff4 (pack-bitmap: gracefully handle missing BTMP chunks, 2024-04-15), we refactored the reuse_partial_packfile_from_bitmap() function and stopped assigning the pack_int_id field when reusing only the MIDX's preferred pack. This results in an uninitialized read down in try_partial_reuse() like so: ==7474==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x55c5cd191dde in try_partial_reuse pack-bitmap.c:1887:8 #1 0x55c5cd191dde in reuse_partial_packfile_from_bitmap_1 pack-bitmap.c:2001:8 #2 0x55c5cd191dde in reuse_partial_packfile_from_bitmap pack-bitmap.c:2105:3 #3 0x55c5cce0bd0e in get_object_list_from_bitmap builtin/pack-objects.c:4043:3 #4 0x55c5cce0bd0e in get_object_list builtin/pack-objects.c:4156:27 #5 0x55c5cce0bd0e in cmd_pack_objects builtin/pack-objects.c:4596:3 #6 0x55c5ccc8fac8 in run_builtin git.c:474:11 which happens when try_partial_reuse() tries to call midx_pair_to_pack_pos() when it tries to reject cross-pack deltas. Avoid the uninitialized read by ensuring that the pack_int_id field is set in the single-pack reuse case by setting it to either the MIDX preferred pack's pack_int_id, or '-1', in the case of single-pack bitmaps. In the latter case, we never read the pack_int_id field, so the choice of '-1' is intentional as a "garbage in, garbage out" measure. Guard against further regressions in this area by adding a test which ensures that we do not throw out deltas from the preferred pack as "cross-pack" due to an uninitialized pack_int_id. [^1]: This can happen for a couple of reasons, either because the repository is configured with 'pack.allowPackReuse=(true\|single)', or because the MIDX was generated prior to the introduction of the BTMP chunk, which contains information necessary to perform multi-pack reuse. Reported-by: Kyle Lippincott <spectral@google.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-30	pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper	Taylor Blau	1	-3/+2
	Now that we have the `get_midx_filename_ext()` helper, we can reimplement the `midx_bitmap_filename()` function in terms of it. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-24	pseudo-merge: implement support for finding existing merges	Taylor Blau	1	-0/+32
	This patch implements support for reusing existing pseudo-merge commits when writing bitmaps when there is an existing pseudo-merge bitmap which has exactly the same set of parents as one that we are about to write. Note that unstable pseudo-merges are likely to change between consecutive repacks, and so are generally poor candidates for reuse. However, stable pseudo-merges (see the configuration option 'bitmapPseudoMerge.<name>.stableThreshold') are by definition unlikely to change between runs (as they represent long-running branches). Because there is no index from a set of pseudo-merge parents to a matching pseudo-merge bitmap, we have to construct the bitmap corresponding to the set of parents for each pending pseudo-merge commit and see if a matching bitmap exists. This is technically quadratic in the number of pseudo-merges, but is OK in practice for a couple of reasons: - non-matching pseudo-merge bitmaps are rejected quickly as soon as they differ in a single bit - already-matched pseudo-merge bitmaps are discarded from subsequent rounds of search - the number of pseudo-merges is generally small, even for large repositories In order to do this, implement (a) a function that finds a matching pseudo-merge given some uncompressed bitset describing its parents, (b) a function that computes the bitset of parents for a given pseudo-merge commit, and (c) call that function before computing the set of reachable objects for some pending pseudo-merge. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-24	pack-bitmap: extra trace2 information	Taylor Blau	1	-1/+25
	Add some extra trace2 lines to capture the number of bitmap lookups that are hits versus misses, as well as the number of reachability roots that have bitmap coverage (versus those that do not). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-24	pack-bitmap.c: use pseudo-merges during traversal	Taylor Blau	1	-1/+111
	Now that all of the groundwork has been laid to support reading and using pseudo-merges, make use of that work in this commit by teaching the pack-bitmap machinery to use pseudo-merge(s) when available during traversal. The basic operation is as follows: - When enumerating objects on either side of a reachability query, first see if any subset of the roots satisfies some pseudo-merge bitmap. If it does, apply that pseudo-merge bitmap. - If any pseudo-merge bitmap(s) were applied in the previous step, OR them into the result[^1]. Then repeat the process over all pseudo-merge bitmaps (we'll refer to this as "cascading" pseudo-merges). Once this is done, OR in the resulting bitmap. - If there is no fill-in traversal to be done, return the bitmap for that side of the reachability query. If there is fill-in traversal, then for each commit we encounter via show_commit(), check to see if any unsatisfied pseudo-merges containing that commit as one of its parents has been made satisfied by the presence of that commit. If so, OR in the object set from that pseudo-merge bitmap, and then cascade. If not, continue traversal. A similar implementation is present in the boundary-based bitmap traversal routines. [^1]: Importantly, we cannot OR in the entire set of roots along with the objects reachable from whatever pseudo-merge bitmaps were satisfied. This may leave some dangling bits corresponding to any unsatisfied root(s) getting OR'd into the resulting bitmap, tricking other parts of the traversal into thinking we already have a reachability closure over those commit(s) when we do not. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-24	pack-bitmap: implement test helpers for pseudo-merge	Taylor Blau	1	-0/+126
	Implement three new sub-commands for the "bitmap" test-helper: - t/helper test-tool bitmap dump-pseudo-merges - t/helper test-tool bitmap dump-pseudo-merge-commits <n> - t/helper test-tool bitmap dump-pseudo-merge-objects <n> These three helpers dump the list of pseudo merges, the "parents" of the nth pseudo-merges, and the set of objects reachable from those parents, respectively. These helpers will be useful in subsequent patches when we add test coverage for pseudo-merge bitmaps. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-24	pack-bitmap.c: read pseudo-merge extension	Taylor Blau	1	-0/+39
	Now that the scaffolding for reading the pseudo-merge extension has been laid, teach the pack-bitmap machinery to read the pseudo-merge extension when present. Note that pseudo-merges themselves are not yet used during traversal, this step will be taken by a future commit. In the meantime, read the table and initialize the pseudo_merge_map structure introduced by a previous commit. When the pseudo-merge extension is present, `load_bitmap_header()` performs basic sanity checks to make sure that the table is well-formed. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-24	pack-bitmap: extract `read_bitmap()` function	Taylor Blau	1	-9/+15
	The pack-bitmap machinery uses the `read_bitmap_1()` function to read a bitmap from within the mmap'd region corresponding to the .bitmap file. As as side-effect of calling this function, `read_bitmap_1()` increments the `index->map_pos` variable to reflect the number of bytes read. Extract the core of this routine to a separate function (that operates over a `const unsigned char `, a `size_t` and a `size_t ` pointer) instead of a `struct bitmap_index *` pointer. This function (called `read_bitmap()`) is part of the pack-bitmap.h API so that it can be used within the upcoming portion of the implementation in pseduo-merge.ch. Rewrite the existing function, `read_bitmap_1()`, in terms of its more generic counterpart. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-23	Merge branch 'ps/missing-btmp-fix'	Junio C Hamano	1	-20/+21
	GIt 2.44 introduced a regression that makes the updated code to barf in repositories with multi-pack index written by older versions of Git, which has been corrected. * ps/missing-btmp-fix: pack-bitmap: gracefully handle missing BTMP chunks
2024-04-15	pack-bitmap: gracefully handle missing BTMP chunks	Patrick Steinhardt	1	-20/+21
	In 0fea6b73f1 (Merge branch 'tb/multi-pack-verbatim-reuse', 2024-01-12) we have introduced multi-pack verbatim reuse of objects. This series has introduced a new BTMP chunk, which encodes information about bitmapped objects in the multi-pack index. Starting with dab60934e3 (pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions, 2023-12-14) we use this information to figure out objects which we can reuse from each of the packfiles. One thing that we glossed over though is backwards compatibility with repositories that do not yet have BTMP chunks in their multi-pack index. In that case, `nth_bitmapped_pack()` would return an error, which causes us to emit a warning followed by another error message. These warnings are visible to users that fetch from a repository: ``` $ git fetch ... remote: error: MIDX does not contain the BTMP chunk remote: warning: unable to load pack: 'pack-f6bb7bd71d345ea9fe604b60cab9ba9ece54ffbe.idx', disabling pack-reuse remote: Enumerating objects: 40, done. remote: Counting objects: 100% (40/40), done. remote: Compressing objects: 100% (39/39), done. remote: Total 40 (delta 5), reused 0 (delta 0), pack-reused 0 (from 0) ... ``` While the fetch succeeds the user is left wondering what they did wrong. Furthermore, as visible both from the warning and from the reuse stats, pack-reuse is completely disabled in such repositories. What is quite interesting is that this issue can even be triggered in case `pack.allowPackReuse=single` is set, which is the default value. One could have expected that in this case we fall back to the old logic, which is to use the preferred packfile without consulting BTMP chunks at all. But either we fail with the above error in case they are missing, or we use the first pack in the multi-pack-index. The former case disables pack-reuse altogether, whereas the latter case may result in reusing objects from a suboptimal packfile. Fix this issue by partially reverting the logic back to what we had before this patch series landed. Namely, in the case where we have no BTMP chunks or when `pack.allowPackReuse=single` are set, we use the preferred pack instead of consulting the BTMP chunks. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-06	Merge branch 'tb/pack-bitmap-drop-unused-struct-member'	Junio C Hamano	1	-7/+0
	Code clean-up. * tb/pack-bitmap-drop-unused-struct-member: pack-bitmap: drop unused `reuse_objects`
2024-01-29	pack-bitmap: drop unused `reuse_objects`	Taylor Blau	1	-7/+0
	This variable is no longer used for doing verbatim pack-reuse (or anywhere within pack-bitmap.c) since d2ea031046 (pack-bitmap: don't rely on bitmap_git->reuse_objects, 2019-12-18). Remove it to avoid an unused struct member. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14	pack-bitmap: enable reuse from all bitmapped packs	Taylor Blau	1	-10/+24
	Now that both the pack-bitmap and pack-objects code are prepared to handle marking and using objects from multiple bitmapped packs for verbatim reuse, allow marking objects from all bitmapped packs as eligible for reuse. Within the `reuse_partial_packfile_from_bitmap()` function, we no longer only mark the pack whose first object is at bit position zero for reuse, and instead mark any pack contained in the MIDX as a reuse candidate. Provide a handful of test cases in a new script (t5332) exercising interesting behavior for multi-pack reuse to ensure that we performed all of the previous steps correctly. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14	pack-bitmap: prepare to mark objects from multiple packs for reuse	Taylor Blau	1	-66/+106
	Now that the pack-objects code is equipped to handle reusing objects from multiple packs, prepare the pack-bitmap code to mark objects from multiple packs as reuse candidates. In order to prepare the pack-bitmap code for this change, remove the same set of assumptions we unwound in previous commits from the helper function `reuse_partial_packfile_from_bitmap_1()`, in preparation for it to be called in a loop over the set of bitmapped packs in a following commit. Most importantly, we can no longer assume that the bit position corresponding to the first object in a given reuse pack candidate is at the beginning of the bitmap itself. For the single pack that this assumption is still true for (in MIDX bitmaps, this is the preferred pack, in single-pack bitmaps it is the pack the bitmap is tied to), we can still use our whole-words optimization. But for all subsequent packs, we can not make use of this optimization, since it assumes that all delta bases are being sent from the same pack, which would break if we are sending OFS_DELTAs down to the client. To understand why, consider two packs, P1 and P2 where: - P1 has object A which is a delta on base B - P2 has its own copy of B, in addition to other objects Suppose that the MIDX which covers P1 and P2 selected its copy of A from P1, but selected its copy of B from P2. Since A is a delta of B, but the base was selected from a different pack, sending the bytes corresponding to A as an OFS_DELTA verbatim from P1 would be incorrect, since we don't guarantee that B is in the same place relative to A in the generated pack as in P1. For now, we detect and reject these cross-pack deltas by searching for the (pack_id, offset) pair for the delta's base object (using the same pack_id as the pack containing the delta'd object) in the MIDX. If we find a match, that means that the MIDX did indeed pick the base object from the same pack, and we are OK to reuse the delta. If we don't find a match, however, that means that the base object was selected from a different pack in the MIDX, and we can let the slower path handle re-delta'ing our candidate object. In the future, there are a couple of other things we could do, namely: - Turn any cross-pack deltas (which are stored as OFS_DELTAs) into REF_DELTAs. We already do this today when reusing an OFS_DELTA without `--delta-base-offset` enabled, so it's not a huge stretch to do the same for cross-pack deltas even when `--delta-base-offset` is enabled. This would work, but would obviously result in larger-than-necessary packs, as we in theory could represent these cross-pack deltas by patching an existing OFS_DELTA. But it's not clear how much that would matter in practice. I suspect it would have a lot to do with how you pack your repository in the first place. - Finally, we could patch OFS_DELTAs across packs in a similar fashion as we do today for OFS_DELTAs within a single pack on either side of a gap. This would result in the smallest packs of the three options here, but implementing this would be more involved. At minimum, you'd have to keep the reusable chunks list for all reused packs, not just the one we're currently processing. And you'd have to ensure that any bases which are a part of cross-pack deltas appear before the delta. I think this is possible to do, but would require assembling the reusable chunks list potentially in a different order than they appear in the source packs. For now, let's pursue the simplest approach and reject any cross-pack deltas. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14	midx: implement `midx_preferred_pack()`	Taylor Blau	1	-10/+7
	When performing a binary search over the objects in a MIDX's bitmap (i.e. in pseudo-pack order), the reader reconstructs the pseudo-pack ordering using a combination of (a) the preferred pack, (b) the pack's lexical position in the MIDX based on pack names, and (c) the object offset within the pack. In order to perform this binary search, the reader must know the identity of the preferred pack. This could be stored in the MIDX, but isn't for historical reasons, mostly because it can easily be inferred at read-time by looking at the object in the first bit position and finding out which pack it was selected from in the MIDX, like so: nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0)); In midx_to_pack_pos() which performs this binary search, we look up the identity of the preferred pack before each search. This is relatively quick, since it involves two table-driven lookups (one in the MIDX's revindex for `pack_pos_to_midx()`, and another in the MIDX's object table for `nth_midxed_pack_int_id()`). But since the preferred pack does not change after the MIDX is written, it is safe to cache this value on the MIDX itself. Write a helper to do just that, and rewrite all of the existing call-sites that care about the identity of the preferred pack in terms of this new helper. This will prepare us for a subsequent patch where we will need to binary search through the MIDX's pseudo-pack order multiple times. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14	pack-bitmap: return multiple packs via `reuse_partial_packfile_from_bitmap()`	Taylor Blau	1	-2/+4
	Further prepare for enabling verbatim pack-reuse over multiple packfiles by changing the signature of reuse_partial_packfile_from_bitmap() to populate an array of `struct bitmapped_pack `'s instead of a pointer to a single packfile. Since the array we're filling out is sized dynamically[^1], add an additional `size_t ` parameter which will hold the number of reusable packs (equal to the number of elements in the array). Note that since we still have not implemented true multi-pack reuse, these changes aren't propagated out to the rest of the caller in builtin/pack-objects.c. In the interim state, we expect that the array has a single element, and we use that element to fill out the static `reuse_packfile` variable (which is a bog-standard `struct packed_git *`). Future commits will continue to push this change further out through the pack-objects code. [^1]: That is, even though we know the number of packs which are candidates for pack-reuse, we do not know how many of those candidates we can actually reuse. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14	pack-bitmap: simplify `reuse_partial_packfile_from_bitmap()` signature	Taylor Blau	1	-9/+7
	The signature of `reuse_partial_packfile_from_bitmap()` currently takes in a bitmap, as well as three output parameters (filled through pointers, and passed as arguments), and also returns an integer result. The output parameters are filled out with: (a) the packfile used for pack-reuse, (b) the number of objects from that pack that we can reuse, and (c) a bitmap indicating which objects we can reuse. The return value is either -1 (when there are no objects to reuse), or 0 (when there is at least one object to reuse). Some of these parameters are redundant. Notably, we can infer from the bitmap how many objects are reused by calling bitmap_popcount(). And we can similar compute the return value based on that number as well. As such, clean up the signature of this function to drop the "*entries" parameter, as well as the int return value, since the single caller of this function can infer these values themself. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14	pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions	Taylor Blau	1	-31/+87
	When trying to assemble a pack with bitmaps using `--use-bitmap-index`, `pack-objects` asks the pack-bitmap machinery for a bitmap which indicates the set of objects we can "reuse" verbatim from on-disk. This set is roughly comprised of: a prefix of objects in the bitmapped pack (or preferred pack, in the case of a multi-pack reachability bitmap), plus any other objects not included in the prefix, excluding any deltas whose base we are not sending in the resulting pack. The pack-bitmap machinery is responsible for computing this bitmap, and does so with the following functions: - reuse_partial_packfile_from_bitmap() - try_partial_reuse() In the existing implementation, the first function is responsible for (a) marking the prefix of objects in the reusable pack, and then (b) calling try_partial_reuse() on any remaining objects to ensure that they are also reusable (and removing them from the bitmapped set if they are not). Likewise, the `try_partial_reuse()` function is responsible for checking whether an isolated object (that is, an object from the bitmapped pack/preferred pack not contained in the prefix from earlier) may be reused, i.e. that it isn't a delta of an object that we are not sending in the resulting pack. These functions are based on two core assumptions, which we will unwind in this and the following commits: 1. There is only a single pack from the bitmap which is eligible for verbatim pack-reuse. For single-pack bitmaps, this is trivially the bitmapped pack. For multi-pack bitmaps, this is (currently) the MIDX's preferred pack. 2. The pack eligible for reuse has its first object in bit position 0, and all objects from that pack follow in pack-order from that first bit position. In order to perform verbatim pack reuse over multiple packs, we must unwind these two assumptions. Most notably, in order to reuse bits from a given packfile, we need to know the first bit position occupied by an object form that packfile. To propagate this information around, pass a `struct bitmapped_pack ` anywhere we previously passed a `struct packed_git `, since the former contains the bitmap position we're interested in (as well as a pointer to the latter). As an additional step, factor out a sub-routine from the main `reuse_partial_packfile_from_bitmap()` function, called `reuse_partial_packfile_from_bitmap_1()`. This new function will be responsible for figuring out which objects may be reused from a single pack, and the existing function will dispatch multiple calls to its new helper function for each reusable pack. Consequently, `reuse_partial_packfile_from_bitmap()` will now maintain an array of reusable packs instead of a single such pack. We currently expect that array to have only a single element, so this awkward state is short-lived. It will serve as useful scaffolding in subsequent commits as we begin to work towards enabling multi-pack reuse. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14	pack-bitmap: plug leak in find_objects()	Taylor Blau	1	-0/+2
	The `find_objects()` function creates an object_list for any tips of the reachability query which do not have corresponding bitmaps. The object_list is not used outside of `find_objects()`, but we never free it with `object_list_free()`, resulting in a leak. Let's plug that leak by calling `object_list_free()`, which results in t6113 becoming leak-free. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-08	Merge branch 'tb/rev-list-unpacked-fix'	Junio C Hamano	1	-0/+27
	"git rev-list --unpacked --objects" failed to exclude packed non-commit objects, which has been corrected. * tb/rev-list-unpacked-fix: pack-bitmap: drop --unpacked non-commit objects from results list-objects: drop --unpacked non-commit objects from results
2023-11-07	pack-bitmap: drop --unpacked non-commit objects from results	Taylor Blau	1	-0/+27
	When performing revision queries with `--objects` and `--use-bitmap-index`, the output may incorrectly contain objects which are packed, even when the `--unpacked` option is given. This affects traversals, but also other querying operations, like `--count`, `--disk-usage`, etc. Like in the previous commit, the fix is to exclude those objects from the result set before they are shown to the user (or, in this case, before the bitmap containing the result of the traversal is enumerated and its objects listed). This is performed by a new function in pack-bitmap.c, called `filter_packed_objects_from_bitmap()`. Note that we do not have to inspect individual bits in the result bitmap, since we know that the first N (where N is the number of objects in the bitmap's pack/MIDX) bits correspond to objects which packed by definition. In other words, for an object to have a bitmap position (not in the extended index), it must appear in either the bitmap's pack or one of the packs in its MIDX. This presents an appealing optimization to us, which is that we can simply memset() the corresponding number of `eword_t`'s to zero, provided that we handle any objects which spill into the next word (but don't occupy all 64 bits of the word itself). We only have to handle objects in the bitmap's extended index. These objects may (or may not) appear in one or more pack(s). Since these objects are known to not appear in either the bitmap's MIDX or pack, they may be stored as loose, appear in other pack(s), or both. Before returning a bitmap containing the result of the traversal back to the caller, drop any bits from the extended index which appear in one or more packs. This implements the correct behavior for rev-list operations which use the bitmap index to compute their result. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-29	pack-bitmap: mark unused parameters in show_object callback	Jeff King	1	-2/+3
	This is similar to the cases in c50dca2a18 (list-objects: mark unused callback parameters, 2023-02-24), but was added after that commit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-25	Merge branch 'tb/object-access-overflow-protection'	Junio C Hamano	1	-5/+7
	Various offset computation in the code that accesses the packfiles and other data in the object layer has been hardened against arithmetic overflow, especially on 32-bit systems. * tb/object-access-overflow-protection: commit-graph.c: prevent overflow in `verify_commit_graph()` commit-graph.c: prevent overflow in `write_commit_graph()` commit-graph.c: prevent overflow in `merge_commit_graph()` commit-graph.c: prevent overflow in `split_graph_merge_strategy()` commit-graph.c: prevent overflow in `load_tree_for_commit()` commit-graph.c: prevent overflow in `fill_commit_in_graph()` commit-graph.c: prevent overflow in `fill_commit_graph_info()` commit-graph.c: prevent overflow in `load_oid_from_graph()` commit-graph.c: prevent overflow in add_graph_to_chain() commit-graph.c: prevent overflow in `write_commit_graph_file()` pack-bitmap.c: ensure that eindex lookups don't overflow midx.c: prevent overflow in `fill_included_packs_batch()` midx.c: prevent overflow in `write_midx_internal()` midx.c: store `nr`, `alloc` variables as `size_t`'s midx.c: prevent overflow in `nth_midxed_offset()` midx.c: prevent overflow in `nth_midxed_object_oid()` midx.c: use `size_t`'s for fanout nr and alloc packfile.c: use checked arithmetic in `nth_packed_object_offset()` packfile.c: prevent overflow in `load_idx()` packfile.c: prevent overflow in `nth_packed_object_id()`
2023-07-14	pack-bitmap.c: ensure that eindex lookups don't overflow	Taylor Blau	1	-5/+7
	When a bitmap is used to answer some reachability query, it creates a pseudo-bitmap called the "extended index" on top of any existing bitmaps to store objects that are relevant to the query, but not mentioned in the bitmap. When looking up the ith object in the extended index in a bitmap, it is common to write something like: bitmap_get(result, i + bitmap_num_objects(bitmap_git)) , indicating that we want the ith object following all other objects mentioned in the bitmap_git. Since the type of `i` and the return type of `bitmap_num_objects()` are both `uint32_t`s, But if there are either a large number of objects in the bitmap, or a large number of objects in the extended index (or both), this addition can overflow when the sum is greater than 2^32-1. Having that large of a bitmap position is entirely acceptable, but we need to ensure that the computed bitmap position for that object is performed using 64-bits and doesn't overflow. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-05	git-compat-util: move alloc macros to git-compat-util.h	Calvin Wan	1	-1/+0
	alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for dynamic array allocation. Moving these macros to git-compat-util.h with the other alloc macros focuses alloc.[ch] to allocation for Git objects and additionally allows us to remove inclusions to alloc.h from files that solely used the above macros. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-29	Merge branch 'en/header-split-cache-h-part-3'	Junio C Hamano	1	-1/+1
	Header files cleanup. * en/header-split-cache-h-part-3: (28 commits) fsmonitor-ll.h: split this header out of fsmonitor.h hash-ll, hashmap: move oidhash() to hash-ll object-store-ll.h: split this header out of object-store.h khash: name the structs that khash declares merge-ll: rename from ll-merge git-compat-util.h: remove unneccessary include of wildmatch.h builtin.h: remove unneccessary includes list-objects-filter-options.h: remove unneccessary include diff.h: remove unnecessary include of oidset.h repository: remove unnecessary include of path.h log-tree: replace include of revision.h with simple forward declaration cache.h: remove this no-longer-used header read-cache*.h: move declarations for read-cache.c functions from cache.h repository.h: move declaration of the_index from cache.h merge.h: move declarations for merge.c from cache.h diff.h: move declaration for global in diff.c from cache.h preload-index.h: move declarations for preload-index.c from elsewhere sparse-index.h: move declarations for sparse-index.c from cache.h name-hash.h: move declarations for name-hash.c from cache.h run-command.h: move declarations for run-command.c from cache.h ...
2023-06-23	Merge branch 'tb/open-midx-bitmap-fallback'	Junio C Hamano	1	-3/+5
	Gracefully deal with a stale MIDX file that lists a packfile that no longer exists. * tb/open-midx-bitmap-fallback: pack-bitmap.c: gracefully degrade on failure to load MIDX'd pack
2023-06-22	Merge branch 'tb/pack-bitmap-traversal-with-boundary'	Junio C Hamano	1	-39/+202
	The object traversal using reachability bitmap done by "pack-object" has been tweaked to take advantage of the fact that using "boundary" commits as representative of all the uninteresting ones can save quite a lot of object enumeration. * tb/pack-bitmap-traversal-with-boundary: pack-bitmap.c: use commit boundary during bitmap traversal pack-bitmap.c: extract `fill_in_bitmap()` object: add object_array initializer helper function
2023-06-21	object-store-ll.h: split this header out of object-store.h	Elijah Newren	1	-1/+1
	The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store \| sort \| uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-12	pack-bitmap.c: gracefully degrade on failure to load MIDX'd pack	Taylor Blau	1	-3/+5
	When opening a MIDX bitmap, we the pack-bitmap machinery eagerly calls `prepare_midx_pack()` on each of the packs contained in the MIDX. This is done in order to populate the array of `struct packed_git `s held by the MIDX, which we need later on in `load_reverse_index()`, since it calls `load_pack_revindex()` on each of the MIDX'd packs, and requires that the caller provide a pointer to a `struct packed_git`. When opening one of these packs fails, the pack-bitmap code will `die()` indicating that it can't open one of the packs in the MIDX. This indicates that the MIDX is somehow broken with respect to the current state of the repository. When this is the case, we indeed cannot make use of the MIDX bitmap to speed up reachability traversals. However, it does not mean that we can't perform reachability traversals at all. In other failure modes, that same function calls `warning()` and then returns -1, indicating to its caller (`open_bitmap()`) that we should either look for a pack bitmap if one is available, or perform normal object traversal without using bitmaps at all. There is no reason why this case should cause us to die. If we instead continued (by jumping to `cleanup` as this patch does) and avoid using bitmaps altogether, we may again try and query the MIDX, which will also fail. But when trying to call `fill_midx_entry()` fails, it also returns a signal of its failure, and prompts the caller to try and locate the object elsewhere. In other words, the normal object traversal machinery works fine in the presence of a corrupt MIDX, so there is no reason that the MIDX bitmap machinery should abort in that case when we could easily continue. Note that we could* in theory try again to load a MIDX bitmap after calling `reprepare_packed_git()`. Even though the `prepare_packed_git()` code is careful to avoid adding a pack that we already have, `prepare_midx_pack()` is not. So if we got part of the way through calling `prepare_midx_pack()` on a stale MIDX, and then tried again on a fresh MIDX that contains some of the same packs, we would end up with a loop through the `->next` pointer. For now, let's do the simplest thing possible and fallback to the non-bitmap code when we detect a stale MIDX so that the complete fix as above can be implemented carefully. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-08	pack-bitmap.c: use commit boundary during bitmap traversal	Taylor Blau	1	-13/+169
	When reachability bitmap coverage exists in a repository, Git will use a different (and hopefully faster) traversal to compute revision walks. Consider a set of positive and negative tips (which we'll refer to with their standard bitmap parlance by "wants", and "haves"). In order to figure out what objects exist between the tips, the existing traversal in `prepare_bitmap_walk()` does something like: 1. Consider if we can even compute the set of objects with bitmaps, and fall back to the usual traversal if we cannot. For example, pathspec limiting traversals can't be computed using bitmaps (since they don't know which objects are at which paths). The same is true of certain kinds of non-trivial object filters. 2. If we can compute the traversal with bitmaps, partition the (dereferenced) tips into two object lists, "haves", and "wants", based on whether or not the objects have the UNINTERESTING flag, respectively. 3. Fall back to the ordinary object traversal if either (a) there are more than zero haves, none of which are in the bitmapped pack or MIDX, or (b) there are no wants. 4. Construct a reachability bitmap for the "haves" side by walking from the revision tips down to any existing bitmaps, OR-ing in any bitmaps as they are found. 5. Then do the same for the "wants" side, stopping at any objects that appear in the "haves" bitmap. 6. Filter the results if any object filter (that can be easily computed with bitmaps alone) was given, and then return back to the caller. When there is good bitmap coverage relative to the traversal tips, this walk is often significantly faster than an ordinary object traversal because it can visit far fewer objects. But in certain cases, it can be significantly slower than the usual object traversal. Why? Because we need to compute complete bitmaps on either side of the walk. If either one (or both) of the sides require walking many (or all!) objects before they get to an existing bitmap, the extra bitmap machinery is mostly or all overhead. One of the benefits, however, is that even if the walk is slower, bitmap traversals are guaranteed to provide an exact answer. Unlike the traditional object traversal algorithm, which can over-count the results by not opening trees for older commits, the bitmap walk builds an exact reachability bitmap for either side, meaning the results are never over-counted. But producing non-exact results is OK for our traversal here (both in the bitmap case and not), as long as the results are over-counted, not under. Relaxing the bitmap traversal to allow it to produce over-counted results gives us the opportunity to make some significant improvements. Instead of the above, the new algorithm only has to walk from the boundary down to the nearest bitmap, instead of from each of the UNINTERESTING tips. The boundary-based approach still has degenerate cases, but we'll show in a moment that it is often a significant improvement. The new algorithm works as follows: 1. Build a (partial) bitmap of the haves side by first OR-ing any bitmap(s) that already exist for UNINTERESTING commits between the haves and the boundary. 2. For each commit along the boundary, add it as a fill-in traversal tip (where the traversal terminates once an existing bitmap is found), and perform fill-in traversal. 3. Build up a complete bitmap of the wants side as usual, stopping any time we intersect the (partial) haves side. 4. Return the results. And is more-or-less equivalent to using the old algorithm with this invocation: $ git rev-list --objects --use-bitmap-index $WANTS --not \ $(git rev-list --objects --boundary $WANTS --not $HAVES \| perl -lne 'print $1 if /^-(.)/') The new result performs significantly better in many cases, particularly when the distance from the boundary commit(s) to an existing bitmap is shorter than the distance from (all of) the have tips to the nearest bitmapped commit. Note that when using the old bitmap traversal algorithm, the results can be slower* than without bitmaps! Under the new algorithm, the result is computed faster with bitmaps than without (at the cost of over-counting the true number of objects in a similar fashion as the non-bitmap traversal): # (Computing the number of tagged objects not on any branches # without bitmaps). $ time git rev-list --count --objects --tags --not --branches 20 real 0m1.388s user 0m1.092s sys 0m0.296s # (Computing the same query using the old bitmap traversal). $ time git rev-list --count --objects --tags --not --branches --use-bitmap-index 19 real 0m22.709s user 0m21.628s sys 0m1.076s # (this commit) $ time git.compile rev-list --count --objects --tags --not --branches --use-bitmap-index 19 real 0m1.518s user 0m1.234s sys 0m0.284s The new algorithm is still slower than not using bitmaps at all, but it is nearly a 15-fold improvement over the existing traversal. In a more realistic setting (using my local copy of git.git), I can observe a similar (if more modest) speed-up: $ argv="--count --objects --branches --not --tags" hyperfine \ -n 'no bitmaps' "git.compile rev-list $argv" \ -n 'existing traversal' "git.compile rev-list --use-bitmap-index $argv" \ -n 'boundary traversal' "git.compile -c pack.useBitmapBoundaryTraversal=true rev-list --use-bitmap-index $argv" Benchmark 1: no bitmaps Time (mean ± σ): 124.6 ms ± 2.1 ms [User: 103.7 ms, System: 20.8 ms] Range (min … max): 122.6 ms … 133.1 ms 22 runs Benchmark 2: existing traversal Time (mean ± σ): 368.6 ms ± 3.0 ms [User: 325.3 ms, System: 43.1 ms] Range (min … max): 365.1 ms … 374.8 ms 10 runs Benchmark 3: boundary traversal Time (mean ± σ): 167.6 ms ± 0.9 ms [User: 139.5 ms, System: 27.9 ms] Range (min … max): 166.1 ms … 169.2 ms 17 runs Summary 'no bitmaps' ran 1.34 ± 0.02 times faster than 'boundary traversal' 2.96 ± 0.05 times faster than 'existing traversal' Here, the new algorithm is also still slower than not using bitmaps, but represents a more than 2-fold improvement over the existing traversal in a more modest example. Since this algorithm was originally written (nearly a year and a half ago, at the time of writing), the bitmap lookup table shipped, making the new algorithm's result more competitive. A few other future directions for improving bitmap traversal times beyond not using bitmaps at all: - Decrease the cost to decompress and OR together many bitmaps together (particularly when enumerating the uninteresting side of the walk). Here we could explore more efficient bitmap storage techniques, like Roaring+Run and/or use SIMD instructions to speed up ORing them together. - Store pseudo-merge bitmaps, which could allow us to OR together fewer "summary" bitmaps (which would also help with the above). Helped-by: Jeff King <peff@peff.net> Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-08	pack-bitmap.c: extract `fill_in_bitmap()`	Taylor Blau	1	-29/+36
	To prepare for the boundary-based bitmap walk to perform a fill-in traversal using the boundary of either side as the tips, extract routine used to perform fill-in traversal by `find_objects()` so that it can be used in both places. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-02	fsck: verify checksums of all .bitmap files	Derrick Stolee	1	-0/+45
	If a filesystem-level corruption occurs in a .bitmap file, Git can react poorly. This could take the form of a run-time error due to failing to parse an EWAH bitmap or be more subtle such as returning the wrong set of objects to a fetch or clone. A natural first response to either of these kinds of errors is to run 'git fsck' to see if any files are corrupt. This currently ignores all .bitmap files. Add checks to 'git fsck' for all .bitmap files that are currently associated with a multi-pack-index or pack file. Verify their checksums using the hashfile API. We iterate through all multi-pack-indexes and pack-files to be sure to check all .bitmap files, not just the one that would be read by the process. For example, a multi-pack-index bitmap overrules a pack-bitmap. However, if the multi-pack-index is removed, the pack-bitmap may be selected instead. Be thorough to include every file that could become active in such a way. This includes checking files in alternates. There is potential that we could extend this effort to check the structure of the reachability bitmaps themselves, but it is very expensive to do so. At minimum, it's as expensive as generating the bitmaps in the first place, and that's assuming that we don't use the trivial algorithm of verifying each bitmap individually. The trivial algorithm will result in quadratic behavior (number of objects times number of bitmapped commits) while the bitmap building operation constructs a lattice of commits to build bitmaps incrementally and then generate the final bitmaps from a subset of those commits. If we were to extend 'git fsck' to check .bitmap file contents more closely like this, then we would likely want to hide it behind an option that signals the user is more willing to do expensive operations such as this. For testing, set up a repository with a pack-bitmap _and_ a multi-pack-index bitmap. This requires some file movement to avoid deleting the pack-bitmap during the repack that creates the multi-pack-index bitmap. We can then verify that 'git fsck' is checking all files, not just the "active" bitmap. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-27	Merge branch 'ds/fsck-pack-revindex'	Junio C Hamano	1	-2/+2
	"git fsck" learned to validate the on-disk pack reverse index files. * ds/fsck-pack-revindex: fsck: validate .rev file header fsck: check rev-index position values fsck: check rev-index checksums fsck: create scaffolding for rev-index checks
2023-04-27	Merge branch 'tb/pack-revindex-on-disk'	Junio C Hamano	1	-10/+13
	The on-disk reverse index that allows mapping from the pack offset to the object name for the object stored at the offset has been enabled by default. * tb/pack-revindex-on-disk: t: invert `GIT_TEST_WRITE_REV_INDEX` config: enable `pack.writeReverseIndex` by default pack-revindex: introduce `pack.readReverseIndex` pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK pack-revindex: make `load_pack_revindex` take a repository t5325: mark as leak-free pack-write.c: plug a leak in stage_tmp_packfiles()
2023-04-25	Merge branch 'en/header-split-cache-h'	Junio C Hamano	1	-1/+3
	Header clean-up. * en/header-split-cache-h: (24 commits) protocol.h: move definition of DEFAULT_GIT_PORT from cache.h mailmap, quote: move declarations of global vars to correct unit treewide: reduce includes of cache.h in other headers treewide: remove double forward declaration of read_in_full cache.h: remove unnecessary includes treewide: remove cache.h inclusion due to pager.h changes pager.h: move declarations for pager.c functions from cache.h treewide: remove cache.h inclusion due to editor.h changes editor: move editor-related functions and declarations into common file treewide: remove cache.h inclusion due to object.h changes object.h: move some inline functions and defines from cache.h treewide: remove cache.h inclusion due to object-file.h changes object-file.h: move declarations for object-file.c functions from cache.h treewide: remove cache.h inclusion due to git-zlib changes git-zlib: move declarations for git-zlib functions from cache.h treewide: remove cache.h inclusion due to object-name.h changes object-name.h: move declarations for object-name.c functions from cache.h treewide: remove unnecessary cache.h inclusion treewide: be explicit about dependence on mem-pool.h treewide: be explicit about dependence on oid-array.h ...
2023-04-17	fsck: validate .rev file header	Derrick Stolee	1	-2/+2
	While parsing a .rev file, we check the header information to be sure it makes sense. This happens before doing any additional validation such as a checksum or value check. In order to differentiate between a bad header and a non-existent file, we need to update the API for loading a reverse index. Make load_pack_revindex_from_disk() non-static and specify that a positive value means "the file does not exist" while other errors during parsing are negative values. Since an invalid header prevents setting up the structures we would use for further validations, we can stop at that point. The place where we can distinguish between a missing file and a corrupt file is inside load_revindex_from_disk(), which is used both by pack rev-indexes and multi-pack-index rev-indexes. Some tests in t5326 demonstrate that it is critical to take some conditions to allow positive error signals. Add tests that check the three header values. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-13	pack-revindex: make `load_pack_revindex` take a repository	Taylor Blau	1	-10/+13
	In a future commit, we will introduce a `pack.readReverseIndex` configuration, which forces Git to generate the reverse index from scratch instead of loading it from disk. In order to avoid reading this configuration value more than once, we'll use the `repo_settings` struct to lazily load this value. In order to access the `struct repo_settings`, add a repository argument to `load_pack_revindex`, and update all callers to pass the correct instance (in all cases, `the_repository`). In certain instances, a new function-local variable is introduced to take the place of a `struct repository *` argument to the function itself to avoid propagating the new parameter even further throughout the tree. Co-authored-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11	treewide: remove cache.h inclusion due to object-file.h changes	Elijah Newren	1	-1/+1
	Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11	object-file.h: move declarations for object-file.c functions from cache.h	Elijah Newren	1	-0/+1
	Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11	treewide: be explicit about dependence on trace.h & trace2.h	Elijah Newren	1	-0/+1
	Dozens of files made use of trace and trace2 functions, without explicitly including trace.h or trace2.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include trace.h or trace2.h if they are using them. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-06	Merge branch 'en/header-split-cleanup'	Junio C Hamano	1	-1/+2
	Split key function and data structure definitions out of cache.h to new header files and adjust the users. * en/header-split-cleanup: csum-file.h: remove unnecessary inclusion of cache.h write-or-die.h: move declarations for write-or-die.c functions from cache.h treewide: remove cache.h inclusion due to setup.h changes setup.h: move declarations for setup.c functions from cache.h treewide: remove cache.h inclusion due to environment.h changes environment.h: move declarations for environment.c functions from cache.h treewide: remove unnecessary includes of cache.h wrapper.h: move declarations for wrapper.c functions from cache.h path.h: move function declarations for path.c functions from cache.h cache.h: remove expand_user_path() abspath.h: move absolute path functions from cache.h environment: move comment_line_char from cache.h treewide: remove unnecessary cache.h inclusion from several sources treewide: remove unnecessary inclusion of gettext.h treewide: be explicit about dependence on gettext.h treewide: remove unnecessary cache.h inclusion from a few headers
2023-04-06	Merge branch 'ab/config-multi-and-nonbool'	Junio C Hamano	1	-1/+5
	Assorted config API updates. * ab/config-multi-and-nonbool: for-each-repo: with bad config, don't conflate <path> and <cmd> config API: add "string" version of _value_multi(), fix segfaults config API users: test for _get_value_multi() segfaults for-each-repo: error on bad --config config API: have _multi() return an "int" and take a "dest" versioncmp.c: refactor config reading next commit config API: add and use a "git_config_get()" family of functions config tests: add "NULL" tests for _get_value_multi() config tests: cover blind spots in git_die_config() tests
2023-03-28	config API: add "string" version of *_value_multi(), fix segfaults	Ævar Arnfjörð Bjarmason	1	-1/+1
	Fix numerous and mostly long-standing segfaults in consumers of the _config_value_multi() API. As discussed in the preceding commit an empty key in the config syntax yields a "NULL" string, which these users would give to strcmp() (or similar), resulting in segfaults. As this change shows, most users users of the _config_value_multi() API didn't really want such an an unsafe and low-level API, let's give them something with the safety of git_config_get_string() instead. This fix is similar to what the _string() functions and others acquired in[1] and [2]. Namely introducing and using a safer "_get_string_multi()" variant of the low-level "_value_multi()" function. This fixes segfaults in code introduced in: - d811c8e17c6 (versionsort: support reorder prerelease suffixes, 2015-02-26) - c026557a373 (versioncmp: generalize version sort suffix reordering, 2016-12-08) - a086f921a72 (submodule: decouple url and submodule interest, 2017-03-17) - a6be5e6764a (log: add log.excludeDecoration config option, 2020-04-16) - 92156291ca8 (log: add default decoration filter, 2022-08-05) - 50a044f1e40 (gc: replace config subprocesses with API calls, 2022-09-27) There are now two users ofthe low-level API: - One in "builtin/for-each-repo.c", which we'll convert in a subsequent commit. - The "t/helper/test-config.c" code added in [3]. As seen in the preceding commit we need to give the "t/helper/test-config.c" caller these "NULL" entries. We could also alter the underlying git_configset_get_value_multi() function to be "string safe", but doing so would leave no room for other variants of "_get_value_multi()" that coerce to other types. Such coercion can't be built on the string version, since as we've established "NULL" is a true value in the boolean context, but if we coerced it to "" for use in a list of strings it'll be subsequently coerced to "false" as a boolean. The callback pattern being used here will make it easy to introduce e.g. a "multi" variant which coerces its values to "bool", "int", "path" etc. 1. 40ea4ed9032 (Add config_error_nonbool() helper function, 2008-02-11) 2. 6c47d0e8f39 (config.c: guard config parser from value=NULL, 2008-02-11). 3. 4c715ebb96a (test-config: add tests for the config_set API, 2014-07-28) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28	config API: have *_multi() return an "int" and take a "dest"	Ævar Arnfjörð Bjarmason	1	-1/+5
	Have the "git_configset_get_value_multi()" function and its siblings return an "int" and populate a "*dest" parameter like every other git_configset_get_()" in the API. As we'll take advantage of in subsequent commits, this fixes a blind spot in the API where it wasn't possible to tell whether a list was empty from whether a config key existed. For now we don't make use of those new return values, but faithfully convert existing API users. Most of this is straightforward, commentary on cases that stand out: - To ensure that we'll properly use the return values of this function in the future we're using the "RESULT_MUST_BE_USED" macro introduced in [1]. As git_die_config() now has to handle this return value let's have it BUG() if it can't find the config entry. As tested for in a preceding commit we can rely on getting the config list in git_die_config(). - The loops after getting the "list" value in "builtin/gc.c" could also make use of "unsorted_string_list_has_string()" instead of using that loop, but let's leave that for now. - In "versioncmp.c" we now use the return value of the functions, instead of checking if the lists are still non-NULL. 1. 1e8697b5c4e (submodule--helper: check repo{_submodule,}_init() return values, 2022-09-01), Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21	csum-file.h: remove unnecessary inclusion of cache.h	Elijah Newren	1	-1/+1
	With the change in the last commit to move several functions to write-or-die.h, csum-file.h no longer needs to include cache.h. However, removing that include forces several other C files, which directly or indirectly dependend upon csum-file.h's inclusion of cache.h, to now be more explicit about their dependencies. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21	treewide: be explicit about dependence on gettext.h	Elijah Newren	1	-0/+1
	Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-17	Merge branch 'jk/unused-post-2.39-part2'	Junio C Hamano	1	-2/+4
	More work towards -Wunused. * jk/unused-post-2.39-part2: (21 commits) help: mark unused parameter in git_unknown_cmd_config() run_processes_parallel: mark unused callback parameters userformat_want_item(): mark unused parameter for_each_commit_graft(): mark unused callback parameter rewrite_parents(): mark unused callback parameter fetch-pack: mark unused parameter in callback function notes: mark unused callback parameters prio-queue: mark unused parameters in comparison functions for_each_object: mark unused callback parameters list-objects: mark unused callback parameters mark unused parameters in signal handlers run-command: mark error routine parameters as unused mark "pointless" data pointers in callbacks ref-filter: mark unused callback parameters http-backend: mark unused parameters in virtual functions http-backend: mark argc/argv unused object-name: mark unused parameters in disambiguate callbacks serve: mark unused parameters in virtual functions serve: use repository pointer to get config ls-refs: drop config caching ...
2023-02-24	list-objects: mark unused callback parameters	Jeff King	1	-2/+4
	Our graph-traversal functions take callbacks for showing commits and objects, but not all callbacks need each parameter. Likewise for the similar traverse_bitmap_commit_list(), which has a different interface but serves the same purpose. And the include_check mechanism, which passes along a void pointer which is not always used. Mark the unused ones to to make -Wunused-parameter happy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23	cache.h: remove dependence on hex.h; make other files include it explicitly	Elijah Newren	1	-0/+1
	Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23	alloc.h: move ALLOC_GROW() functions from cache.h	Elijah Newren	1	-1/+2
	This allows us to replace includes of cache.h with includes of the much smaller alloc.h in many places. It does mean that we also need to add includes of alloc.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-29	pack-bitmap.c: trace bitmap ignore logs when midx-bitmap is found	Jeff King	1	-5/+12
	When we find a midx bitmap, we do not bother checking for pack bitmaps, since we can use only one. But since we will warn of unused bitmaps via trace2, let's continue looking for pack bitmaps when tracing is enabled. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-29	pack-bitmap.c: break out of the bitmap loop early if not tracing	Jeff King	1	-1/+8
	After opening a bitmap successfully, we try opening others only because we want to report that other bitmap files are ignored in the trace2 log. When trace2 is not enabled, we do not have to do any of that. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-14	pack-bitmap.c: avoid exposing absolute paths	Teng Long	1	-4/+6
	In "open_midx_bitmap_1()" and "open_pack_bitmap_1()", when we find that there are multiple bitmaps, we will only open the first one and then leave warnings about the remaining pack information, the information will contain the absolute path of the repository, for example in a alternates usage scenario. So let's hide this kind of potentially sensitive information in this commit. Found-by: XingXin <moweng.xx@antgroup.com> Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-14	pack-bitmap.c: remove unnecessary "open_pack_index()" calls	Teng Long	1	-3/+0
	When trying to open a pack bitmap, we call open_pack_bitmap_1() in a loop, during which it tries to open up the pack index corresponding with each available pack. It's likely that we'll end up relying on objects in that pack later in the process (in which case we're doing the work of opening the pack index optimistically), but not guaranteed. For instance, consider a repository with a large number of small packs, and one large pack with a bitmap. If we see that bitmap pack last in our loop which calls open_pack_bitmap_1(), the current code will have opened all pack index files in the repository. If the request can be served out of the bitmapped pack alone, then the time spent opening these idx files was wasted.S Since open_pack_bitmap_1() calls is_pack_valid() later on (which in turns calls open_pack_index() itself), we can just drop the earlier call altogether. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-09-26	Merge branch 'ds/bitmap-lookup-remove-tracing'	Junio C Hamano	1	-2/+1
	Perf-fix. * ds/bitmap-lookup-remove-tracing: pack-bitmap: remove trace2 region from hot path
2022-09-26	pack-bitmap: remove trace2 region from hot path	Derrick Stolee	1	-2/+1
	The trace2 region around the call to lazy_bitmap_for_commit() in bitmap_for_commit() was added in 28cd730680d (pack-bitmap: prepare to read lookup table extension, 2022-08-14). While adding trace2 regions is typically helpful for tracking performance, this method is called possibly thousands of times as a commit walk explores commit history looking for a matching bitmap. When trace2 output is enabled, this region is emitted many times and performance is throttled by that output. For now, remove these regions entirely. This is a critical path, and it would be valuable to measure that the time spent in bitmap_for_commit() does not increase when using the commit lookup table. The best way to do that would be to use a mechanism that sums the time spent in a region and reports a single value at the end of the process. This technique was introduced but not merged by [1] so maybe this example presents some justification to revisit that approach. [1] https://lore.kernel.org/git/pull.1099.v2.git.1640720202.gitgitgadget@gmail.com/ To help with the 'git blame' output in this region, add a comment that warns against adding a trace2 region. Delete a test from t5310 that used that trace output to check that this lookup optimization was activated. To create this kind of test again in the future, the stopwatch traces mentioned earlier could be used as a signal that we activated this code path. Helpedy-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-23	pack-bitmap: improve grammar of "xor chain" error message	Alex Henrie	1	-1/+1
	Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-26	pack-bitmap: prepare to read lookup table extension	Abhradeep Chakraborty	1	-9/+281
	Earlier change teaches Git to write bitmap lookup table. But Git does not know how to parse them. Teach Git to parse the existing bitmap lookup table. The older versions of Git are not affected by it. Those versions ignore the lookup table. Mentored-by: Taylor Blau <me@ttaylorr.com> Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-27	Merge branch 'tl/pack-bitmap-error-messages'	Junio C Hamano	1	-45/+58
	Tweak various messages that come from the pack-bitmap codepaths. * tl/pack-bitmap-error-messages: pack-bitmap.c: continue looping when first MIDX bitmap is found pack-bitmap.c: using error() instead of silently returning -1 pack-bitmap.c: do not ignore error when opening a bitmap file pack-bitmap.c: rename "idx_name" to "bitmap_name" pack-bitmap.c: mark more strings for translations pack-bitmap.c: fix formatting of error messages
2022-07-18	pack-bitmap.c: continue looping when first MIDX bitmap is found	Teng Long	1	-2/+3
	In "open_midx_bitmap()", we do a loop with the MIDX(es) in repo, when the first one has been found, then will break out by a "return" directly. But actually, it's better to continue the loop until we have visited both the MIDX in our repository, as well as any alternates (along with _their_ alternates, recursively). The reason for this is, there may exist more than one MIDX file in a repo. The "multi_pack_index" struct is actually designed as a singly linked list, and if a MIDX file has been already opened successfully, then the other MIDX files will be skipped and left with a warning "ignoring extra bitmap file." to the output. The discussion link of community: https://public-inbox.org/git/YjzCTLLDCby+kJrZ@nand.local/ Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18	pack-bitmap.c: using error() instead of silently returning -1	Teng Long	1	-1/+5
	In "open_pack_bitmap_1()" and "open_midx_bitmap_1()", it's better to return error() instead of "-1" when some unexpected error occurs like "stat bitmap file failed", "bitmap header is invalid" or "checksum mismatch", etc. There are places where we do not replace, such as when the bitmap does not exist (no bitmap in repository is allowed) or when another bitmap has already been opened (in which case it should be a warning rather than an error). Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18	pack-bitmap.c: do not ignore error when opening a bitmap file	Teng Long	1	-5/+12
	Calls to git_open() to open the pack bitmap file and multi-pack bitmap file do not report any error when they fail. These files are optional and it is not an error if open failed due to ENOENT, but we shouldn't be ignoring other kinds of errors. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18	pack-bitmap.c: rename "idx_name" to "bitmap_name"	Teng Long	1	-7/+7
	In "open_pack_bitmap_1()" and "open_midx_bitmap_1()" we use a var named "idx_name" to represent the bitmap filename which is computed by "midx_bitmap_filename()" or "pack_bitmap_filename()" before we open it. There may bring some confusion in this "idx_name" naming, which might lead us to think of ".idx "or" multi-pack-index" files, although bitmap is essentially can be understood as a kind of index, let's define this name a little more accurate here. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18	pack-bitmap.c: mark more strings for translations	Teng Long	1	-24/+24
	In pack-bitmap.c, some printed texts are translated, some are not. Let's support the translations of the bitmap related output. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18	pack-bitmap.c: fix formatting of error messages	Teng Long	1	-23/+24
	There are some text output issues in 'pack-bitmap.c', they exist in die(), error() etc. This includes issues with capitalization the first letter, newlines, error() instead of BUG(), and substitution that don't have quotes around them. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-03	Merge branch 'tb/midx-race-in-pack-objects'	Junio C Hamano	1	-2/+16
	The multi-pack-index code did not protect the packfile it is going to depend on from getting removed while in use, which has been corrected. * tb/midx-race-in-pack-objects: builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects builtin/pack-objects.c: ensure included `--stdin-packs` exist builtin/pack-objects.c: avoid redundant NULL check pack-bitmap.c: check preferred pack validity when opening MIDX bitmap
2022-05-24	pack-bitmap.c: check preferred pack validity when opening MIDX bitmap	Taylor Blau	1	-2/+16
	When pack-objects adds an entry to its packing list, it marks the packfile and offset containing the object, which we may later use during verbatim reuse (c.f., `write_reused_pack_verbatim()`). If the packfile in question is deleted in the background (e.g., due to a concurrent `git repack`), we'll die() as a result of calling use_pack(), unless we have an open file descriptor on the pack itself. 4c08018204 (pack-objects: protect against disappearing packs, 2011-10-14) worked around this by opening the pack ahead of time before recording it as a valid source for reuse. 4c08018204's treatment meant that we could tolerate disappearing packs, since it ensures we always have an open file descriptor on any pack that we mark as a valid source for reuse. This tightens the race to only happen when we need to close an open pack's file descriptor (c.f., the caller of `packfile.c::get_max_fd_limit()`) _and_ that pack was deleted, in which case we'll complain that a pack could not be accessed and die(). The pack bitmap code does this, too, since prior to dc1daacdcc (pack-bitmap: check pack validity when opening bitmap, 2021-07-23) it was vulnerable to the same race. The MIDX bitmap code does not do this, and is vulnerable to the same race. Apply the same treatment as dc1daacdcc to the routine responsible for opening the multi-pack bitmap's preferred pack to close this race. This patch handles the "preferred" pack (c.f., the section "multi-pack-index reverse indexes" in Documentation/technical/pack-format.txt) specially, since pack-objects depends on reusing exact chunks of that pack verbatim in reuse_partial_packfile_from_bitmap(). So if that pack cannot be loaded, the utility of a bitmap is significantly diminished. Similar to dc1daacdcc, we could technically just add this check in reuse_partial_packfile_from_bitmap(), since it's possible to use a MIDX .bitmap without needing to open any of its packs. But it's simpler to do the check as early as possible, covering all direct uses of the preferred pack. Note that doing this check early requires us to call prepare_midx_pack() early, too, so move the relevant part of that loop from load_reverse_index() into open_midx_bitmap_1(). Subsequent patches handle the non-preferred packs in a slightly different fashion. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-20	Merge branch 'ep/maint-equals-null-cocci'	Junio C Hamano	1	-7/+7
	Introduce and apply coccinelle rule to discourage an explicit comparison between a pointer and NULL, and applies the clean-up to the maintenance track. * ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci
2022-05-02	Merge branch 'ep/maint-equals-null-cocci' for maint-2.35	Junio C Hamano	1	-7/+7
	* ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci
2022-05-02	tree-wide: apply equals-null.cocci	Junio C Hamano	1	-7/+7
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-09	list-objects: consolidate traverse_commit_list[_filtered]	Derrick Stolee	1	-3/+3
	Now that all consumers of traverse_commit_list_filtered() populate the 'filter' member of 'struct rev_info', we can drop that parameter from the method prototype to simplify things. In addition, the only thing different now between traverse_commit_list_filtered() and traverse_commit_list() is the presence of the 'omitted' parameter, which is only non-NULL for one caller. We can consolidate these two methods by having one call the other and use the simpler form everywhere the 'omitted' parameter would be NULL. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-09	pack-bitmap: drop filter in prepare_bitmap_walk()	Derrick Stolee	1	-11/+9
	Now that all consumers of prepare_bitmap_walk() have populated the 'filter' member of 'struct rev_info', we can drop that extra parameter from the method and access it directly from the 'struct rev_info'. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-27	pack-bitmap.c: gracefully fallback after opening pack/MIDX	Taylor Blau	1	-0/+4
	When opening a MIDX/pack-bitmap, we call open_midx_bitmap_1() or open_pack_bitmap_1() respectively in a loop over the set of MIDXs/packs. By design, these functions are supposed to be called over every pack and MIDX, since only one of them should have a valid bitmap. Ordinarily we return '0' from these two functions in order to indicate that we successfully loaded a bitmap To signal that we couldn't load a bitmap corresponding to the MIDX/pack (either because one doesn't exist, or because there was an error with loading it), we can return '-1'. In either case, the callers each enumerate all MIDXs/packs to ensure that at most one bitmap per-kind is present. But when we fail to load a bitmap that does exist (for example, loading a MIDX bitmap without finding a corresponding reverse index), we'll return -1 but leave the 'midx' field non-NULL. So when we fallback to loading a pack bitmap, we'll complain that the bitmap we're trying to populate already is "opened", even though it isn't. Rectify this by setting the '->pack' and '->midx' field back to NULL as appropriate. Two tests are added: one to ensure that the MIDX-to-pack bitmap fallback works, and another to ensure we still complain when there are multiple pack bitmaps in a repository. Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-10	Merge branch 'jk/test-bitmap-fix'	Junio C Hamano	1	-1/+1
	Tighten code for testing pack-bitmap. * jk/test-bitmap-fix: test_bitmap_hashes(): handle repository without bitmaps
2021-11-05	test_bitmap_hashes(): handle repository without bitmaps	Jeff King	1	-1/+1
	If prepare_bitmap_git() returns NULL (one easy-to-trigger cause being that the repository does not have bitmaps at all), then we'll segfault accessing bitmap_git->hashes: $ t/helper/test-tool bitmap dump-hashes Segmentation fault We should treat this the same as a repository with bitmaps but no name-hashes, and quietly produce an empty output. The later call to free_bitmap_index() in the cleanup label is OK, as it treats a NULL pointer as a noop. This isn't a big deal in practice, as this function is intended for and used only by test-tool. It's probably worth fixing to avoid confusion, but not worth adding coverage for this to the test suite. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-28	pack-bitmap.c: more aggressively free in free_bitmap_index()	Taylor Blau	1	-0/+8
	The function free_bitmap_index() is somewhat lax in what it frees. There are two notable examples: - While it does call kh_destroy_oid_map on the "bitmaps" map, which maps commit OIDs to their corresponding bitmaps, the bitmaps themselves are not freed. Note here that we recycle already-freed ewah_bitmaps into a pool, but these are handled correctly by ewah_pool_free(). - We never bother to free the extended index's "positions" map, which we always allocate in load_bitmap(). Fix both of these. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-28	pack-bitmap.c: don't leak type-level bitmaps	Taylor Blau	1	-0/+6
	test_bitmap_walk() is used to implement `git rev-list --test-bitmap`, which compares the result of the on-disk bitmaps with ones generated on-the-fly during a revision walk. In fa95666a40 (pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps, 2021-08-24), we hardened those tests to also check the four special type-level bitmaps, but never freed those bitmaps. We should have, since each required an allocation when we EWAH-decompressed them. Free those, plugging that leak, and also free the base (the scratch-pad bitmap), too. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-28	midx.c: write MIDX filenames to strbuf	Taylor Blau	1	-5/+10
	To ask for the name of a MIDX and its corresponding .rev file, callers invoke get_midx_filename() and get_midx_rev_filename(), respectively. These both invoke xstrfmt(), allocating a chunk of memory which must be freed later on. This makes callers in pack-bitmap.c somewhat awkward. Specifically, midx_bitmap_filename(), which is implemented like: return xstrfmt("%s-%s.bitmap", get_midx_filename(midx->object_dir), hash_to_hex(get_midx_checksum(midx))); this leaks the second argument to xstrfmt(), which itself was allocated with xstrfmt(). This caller could assign both the result of get_midx_filename() and the outer xstrfmt() to a temporary variable, remembering to free() the former before returning. But that involves a wasteful copy. Instead, get_midx_filename() and get_midx_rev_filename() take a strbuf as an output parameter. This way midx_bitmap_filename() can manipulate and pass around a temporary buffer which it detaches back to its caller. That allows us to implement the function without copying or open-coding get_midx_filename() in a way that doesn't leak. Update the other callers of get_midx_filename() and get_midx_rev_filename() accordingly. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-18	Merge branch 'tb/repack-write-midx'	Junio C Hamano	1	-1/+1
	"git repack" has been taught to generate multi-pack reachability bitmaps. * tb/repack-write-midx: test-read-midx: fix leak of bitmap_index struct builtin/repack.c: pass `--refs-snapshot` when writing bitmaps builtin/repack.c: make largest pack preferred builtin/repack.c: support writing a MIDX while repacking builtin/repack.c: extract showing progress to a variable builtin/repack.c: rename variables that deal with non-kept packs builtin/repack.c: keep track of existing packs unconditionally midx: preliminary support for `--refs-snapshot` builtin/multi-pack-index.c: support `--stdin-packs` mode midx: expose `write_midx_file_only()` publicly
2021-09-28	builtin/repack.c: make largest pack preferred	Taylor Blau	1	-1/+1
	When repacking into a geometric series and writing a multi-pack bitmap, it is beneficial to have the largest resulting pack be the preferred object source in the bitmap's MIDX, since selecting the large packs can lead to fewer broken delta chains and better compression. Teach 'git repack' to identify this pack and pass it to the MIDX write machinery in order to mark it as preferred. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-14	pack-bitmap.c: propagate namehash values from existing bitmaps	Taylor Blau	1	-6/+8
	When an old bitmap exists while writing a new one, we load it and build a "reposition" table which maps bit positions of objects from the old bitmap to their respective positions in the new bitmap. This can help when we encounter a commit which was selected in both the old and new bitmap, since we only need to permute its bit (not recompute it from scratch). We do not, however, repurpose existing namehash values in the case of the hash-cache extension. There has been thus far no good reason to do so, since all of the namehash values for objects in the new bitmap would be populated during the traversal that was just performed by pack-objects when generating single-pack reachability bitmaps. But this isn't the case for multi-pack bitmaps, which are written via `git multi-pack-index write --bitmap` and do not perform any traversal. In this case all namehash values are set to zero, but we don't even bother to check the `pack.writeBitmapHashcache` option anyway, so it fails to matter. There are two approaches we could take to fill in non-zero hash-cache values: - have either the multi-pack-index builtin run its own traversal to attempt to fill in some values, or let a hypothetical caller (like `pack-objects` when `repack` eventually drives the `multi-pack-index` builtin) fill in the values they found during their traversal - or copy any existing namehash values that were stored in an existing bitmap to their corresponding positions in the new bitmap In a system where a repository is generally repacked with `git repack --geometric=<d>` and occasionally repacked with `git repack -a`, the hash-cache coverage will tend towards all objects. Since populating the hash-cache is additive (i.e., doing so only helps our delta search), any intermediate lack of full coverage is just fine. So let's start by just propagating any values from the existing hash-cache if we see one. The next patch will respect the `pack.writeBitmapHashcache` option while writing MIDX bitmaps, and then test this new behavior. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-14	t/helper/test-bitmap.c: add 'dump-hashes' mode	Taylor Blau	1	-0/+27
	The pack-bitmap writer code is about to learn how to propagate values from an existing hash-cache. To prepare, teach the test-bitmap helper to dump the values from a bitmap's hash-cache extension in order to test those changes. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09	pack-bitmap: drop bitmap_index argument from try_partial_reuse()	Jeff King	1	-3/+2
	Starting in commit 0f533c7284 (pack-bitmap: read multi-pack bitmaps, 2021-08-31), we no longer look at the "struct bitmap_index" passed to try_partial_reuse(). This is because we only handle verbatim reuse from a single pack: either the pack whose bitmap we're looking at, or the "preferred" pack of a midx bitmap. And thus the primary item we look at is the "pack" parameter added by that same commit, and not the bitmap_git->pack parameter (which would be NULL for a midx bitmap). It's our caller, reuse_partial_packfile_from_bitmap(), which decides which pack to use and passes it in to us. Drop the unused parameter to prevent confusion. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09	pack-bitmap: drop repository argument from prepare_midx_bitmap_git()	Jeff King	1	-2/+1
	We never look at the repository argument which is passed. This makes sense, since the multi_pack_index struct already tells us everything we need to access the files in its associated object directory. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01	pack-bitmap: read multi-pack bitmaps	Taylor Blau	1	-38/+319
	This prepares the code in pack-bitmap to interpret the new multi-pack bitmaps described in Documentation/technical/bitmap-format.txt, which mostly involves converting bit positions to accommodate looking them up in a MIDX. Note that there are currently no writers who write multi-pack bitmaps, and that this will be implemented in the subsequent commit. Note also that get_midx_checksum() and get_midx_filename() are made non-static so they can be called from pack-bitmap.c. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01	pack-bitmap.c: avoid redundant calls to try_partial_reuse	Taylor Blau	1	-11/+29
	try_partial_reuse() is used to mark any bits in the beginning of a bitmap whose objects can be reused verbatim from the pack they came from. Currently this function returns void, and signals nothing to the caller when bits could not be reused. But multi-pack bitmaps would benefit from having such a signal, because they may try to pass objects which are in bounds, but from a pack other than the preferred one. Any extra calls are noops because of a conditional in reuse_partial_packfile_from_bitmap(), but those loop iterations can be avoided by letting try_partial_reuse() indicate when it can't accept any more bits for reuse, and then listening to that signal. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01	pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'	Taylor Blau	1	-0/+16
	In a recent commit, pack-objects learned support for the 'pack.preferBitmapTips' configuration. This patch prepares the multi-pack bitmap code to respect this configuration, too. The yet-to-be implemented code will find that it is more efficient to check whether each reference contains a prefix found in the configured set of values rather than doing an additional traversal. Implement a function 'bitmap_is_preferred_refname()' which will perform that check. Its caller will be added in a subsequent patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01	pack-bitmap.c: introduce 'nth_bitmap_object_oid()'	Taylor Blau	1	-3/+10
	A subsequent patch to support reading MIDX bitmaps will be less noisy after extracting a generic function to fetch the nth OID contained in the bitmap. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01	pack-bitmap.c: introduce 'bitmap_num_objects()'	Taylor Blau	1	-16/+21
	A subsequent patch to support reading MIDX bitmaps will be less noisy after extracting a generic function to return how many objects are contained in a bitmap. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-24	pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps	Taylor Blau	1	-0/+48
	The special `--test-bitmap` mode of `git rev-list` is used to compare the result of an object traversal with a bitmap to check its integrity. This mode does not, however, assert that the types of reachable objects are stored correctly. Harden this mode by teaching it to also check that each time an object's bit is marked, the corresponding bit should be set in exactly one of the type bitmaps (whose type matches the object's true type). Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-02	Merge branch 'jk/check-pack-valid-before-opening-bitmap'	Junio C Hamano	1	-0/+5
	A race between repacking and using pack bitmaps has been corrected. * jk/check-pack-valid-before-opening-bitmap: pack-bitmap: check pack validity when opening bitmap
2021-08-02	Merge branch 'tb/bitmap-type-filter-comment-fix'	Junio C Hamano	1	-5/+6
	In-code comment update. * tb/bitmap-type-filter-comment-fix: pack-bitmap: clarify comment in filter_bitmap_exclude_type()
2021-07-23	pack-bitmap: check pack validity when opening bitmap	Jeff King	1	-0/+5
	When pack-objects adds an entry to its list of objects to pack, it may mark the packfile and offset that contains the file, which we can later use to output the object verbatim. If the packfile is deleted while we are running (e.g., by another process running "git repack"), we may die in use_pack() if the pack file cannot be opened. We worked around this in 4c08018204 (pack-objects: protect against disappearing packs, 2011-10-14) by making sure we can open the pack before recording it as a source. This detects a pack which has already disappeared while generating the packing list, and because we keep the pack's file descriptor (or an mmap window) open, it means we can access it later (unless you exceed core.packedgitlimit). The bitmap code that was added later does not do this; it adds entries to the packlist without checking that the packfile is still valid, and is vulnerable to this race. It needs the same treatment as 4c08018204. However, rather than add it in just that one spot, it makes more sense to simply open and check the packfile when we open the bitmap. Technically you can use the .bitmap without even looking in the .pack file (e.g., if you are just printing a list of objects without accessing them), but it's much simpler to do it early. That covers all later direct uses of the pack (due to the cached descriptor) without having to check each one directly. For example, in pack-objects we need to protect the packlist entries, but we also access the pack directly as part of the reuse_partial_pack_from_bitmap() feature. This patch covers both cases. There's no test here, because the problem is inherently racy. I reproduced and verified the fix with this script: rm -rf parent.git push.git fetch.git push() { ( cd push.git && echo content >>file && git add file && git commit -qm "change $1" && git push -q origin HEAD && echo "push $1..." ) && ( cd parent.git && git repack -ad -q && echo "repack $1..." ) } fetch() { rm -rf fetch.git && git clone -q file://$PWD/parent.git fetch.git && echo "fetch $1..." } git init --bare parent.git && git --git-dir=parent.git config transfer.unpacklimit 1 && git clone parent.git push.git && (for i in `seq 1 1000`; do push $i \|\| break; done) & pusher=$! (for i in `seq 1 1000`; do fetch $i \|\| break; done) & fetcher=$! wait $fetcher kill $pusher That simulates a race between a client cloning and a push triggering a repack on the server. Without this patch, it generally fails within a couple hundred iterations with: remote: fatal: packfile ./objects/pack/.tmp-1377349-pack-498afdec371232bdb99d1757872f5569331da61e.pack cannot be accessed error: git upload-pack: git-pack-objects died with error. fatal: git upload-pack: aborting due to possible repository corruption on the remote side. remote: aborting due to possible repository corruption on the remote side. fatal: early EOF fatal: fetch-pack: invalid index-pack output With this patch, it reliably runs through all thousand attempts. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20	pack-bitmap: clarify comment in filter_bitmap_exclude_type()	Taylor Blau	1	-5/+6
	The code that eventually became filter_bitmap_exclude_type() was originally introduced in 4f3bd5606a (pack-bitmap: implement BLOB_NONE filtering, 2020-02-14) to accelerate BLOB_NONE filters with bitmaps. In 856e12c18a (pack-bitmap.c: make object filtering functions generic, 2020-05-04), it became filter_bitmap_exclude_type(). But not all of the comments were updated to be agnostic to the provided type. Remove the remaining comments which should have been updated in 856e12c18a to reflect the type-agnostic nature of the function. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-15	bitmaps: don't recurse into trees already in the bitmap	Jeff King	1	-0/+18
	If an object is already mentioned in a reachability bitmap we are building, then by definition so are all of the objects it can reach. We have an optimization to stop traversing commits when we see they are already in the bitmap, but we don't do the same for trees. It's generally unavoidable to recurse into trees for commits not yet covered by bitmaps (since most commits generally do have unique top-level trees). But they usually have subtrees that are shared with other commits (i.e., all of the subtrees the commit _didn't_ touch). And some of those commits (and their trees) may be covered by the bitmap. Usually this isn't _too_ big a deal, because we'll visit those subtrees only once in total for the whole walk. But if you have a large number of unbitmapped commits, and if your tree is big, then you may end up opening a lot of sub-trees for no good reason. We can use the same optimization we do for commits here: when we are about to open a tree, see if it's in the bitmap (either the one we are building, or the "seen" bitmap which covers the UNINTERESTING side of the bitmap when doing a set-difference). This works especially well because we'll visit all commits before hitting any trees. So even in a history like: A -- B if "A" has a bitmap on disk but "B" doesn't, we'll already have OR-ed in the results from A before looking at B's tree (so we really will only look at trees touched by B). For most repositories, the timings produced by p5310 are unspectacular. Here's linux.git: Test HEAD^ HEAD -------------------------------------------------------------------- 5310.4: simulated clone 6.00(5.90+0.10) 5.98(5.90+0.08) -0.3% 5310.5: simulated fetch 2.98(5.45+0.18) 2.85(5.31+0.18) -4.4% 5310.7: rev-list (commits) 0.32(0.29+0.03) 0.33(0.30+0.03) +3.1% 5310.8: rev-list (objects) 1.48(1.44+0.03) 1.49(1.44+0.05) +0.7% Any improvement there is within the noise (the +3.1% on test 7 has to be noise, since we are not recursing into trees, and thus the new code isn't even run). The results for git.git are likewise uninteresting. But here are numbers from some other real-world repositories (that are not public). This one's tree is comparable in size to linux.git, but has ~16k refs (and so less complete bitmap coverage): Test HEAD^ HEAD ------------------------------------------------------------------------- 5310.4: simulated clone 38.34(39.86+0.74) 33.95(35.53+0.76) -11.5% 5310.5: simulated fetch 2.29(6.31+0.35) 2.20(5.97+0.41) -3.9% 5310.7: rev-list (commits) 0.99(0.86+0.13) 0.96(0.85+0.11) -3.0% 5310.8: rev-list (objects) 11.32(11.04+0.27) 6.59(6.37+0.21) -41.8% And here's another with a very large tree (~340k entries), and a fairly large number of refs (~10k): Test HEAD^ HEAD ------------------------------------------------------------------------- 5310.3: simulated clone 53.83(54.71+1.54) 39.77(40.76+1.50) -26.1% 5310.4: simulated fetch 19.91(20.11+0.56) 19.79(19.98+0.67) -0.6% 5310.6: rev-list (commits) 0.54(0.44+0.11) 0.51(0.43+0.07) -5.6% 5310.7: rev-list (objects) 24.32(23.59+0.73) 9.85(9.49+0.36) -59.5% This patch provides substantial improvements in these larger cases, and have any drawbacks for smaller ones (the cost of the bitmap check is quite small compared to an actual tree traversal). Note that we have to add a version of revision.c's include_check callback which handles non-commits. We could possibly consolidate this into a single callback for all objects types, as there's only one user of the feature which would need converted (pack-bitmap.c:should_include). That would in theory let us avoid duplicating any logic. But when I tried it, the code ended up much worse to read, with lots of repeated "if it's a commit do this, otherwise do that". Having two separate callbacks splits that naturally, and matches the existing split of show_commit/show_object callbacks. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-07	Merge branch 'ps/rev-list-object-type-filter'	Junio C Hamano	1	-5/+40
	"git rev-list" learns the "--filter=object:type=<type>" option, which can be used to exclude objects of the given kind from the packfile generated by pack-objects. * ps/rev-list-object-type-filter: rev-list: allow filtering of provided items pack-bitmap: implement combined filter pack-bitmap: implement object type filter list-objects: implement object type filter list-objects: support filtering by tag and commit list-objects: move tag processing into its own function revision: mark commit parents as NOT_USER_GIVEN uploadpack.txt: document implication of `uploadpackfilter.allow`
2021-05-07	Merge branch 'jk/prune-with-bitmap-fix'	Junio C Hamano	1	-0/+3
	When the reachability bitmap is in effect, the "do not lose recently created objects and those that are reachable from them" safety to protect us from races were disabled by mistake, which has been corrected. * jk/prune-with-bitmap-fix: prune: save reachable-from-recent objects with bitmaps pack-bitmap: clean up include_check after use
2021-04-29	pack-bitmap: clean up include_check after use	Jeff King	1	-0/+3
	When a bitmap walk has to traverse (to fill in non-bitmapped objects), we use rev_info's include_check mechanism to let us stop the traversal early. But after setting the function and its data parameter, we never clean it up. This means that if the rev_info is used for a subsequent traversal without bitmaps, it will unexpectedly call into our include_check function (worse, it will do so pointing to a now-defunct stack variable in include_check_data, likely resulting in a segfault). There's no code which does this now, but it's an accident waiting to happen. Let's clean up after ourselves in the bitmap code. Reported-by: David Emett <dave@sp4m.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-19	rev-list: allow filtering of provided items	Patrick Steinhardt	1	-2/+4
	When providing an object filter, it is currently impossible to also filter provided items. E.g. when executing `git rev-list HEAD` , the commit this reference points to will be treated as user-provided and is thus excluded from the filtering mechanism. This makes it harder than necessary to properly use the new `--filter=object:type` filter given that even if the user wants to only see blobs, he'll still see commits of provided references. Improve this by introducing a new `--filter-provided-objects` option to the git-rev-parse(1) command. If given, then all user-provided references will be subject to filtering. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-19	pack-bitmap: implement combined filter	Patrick Steinhardt	1	-0/+10
	When the user has multiple objects filters specified, then this is internally represented by having a "combined" filter. These combined filters aren't yet supported by bitmap indices and can thus not be accelerated. Fix this by implementing support for these combined filters. The implementation is quite trivial: when there's a combined filter, we simply recurse into `filter_bitmap()` for all of the sub-filters. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-19	pack-bitmap: implement object type filter	Patrick Steinhardt	1	-3/+26
	The preceding commit has added a new object filter for git-rev-list(1) which allows to filter objects by type. Implement the equivalent filter for packfile bitmaps so that we can answer these queries fast. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-13	Merge branch 'tb/pack-preferred-tips-to-give-bitmap'	Junio C Hamano	1	-0/+24
	A configuration variable has been added to force tips of certain refs to be given a reachability bitmap. * tb/pack-preferred-tips-to-give-bitmap: builtin/pack-objects.c: respect 'pack.preferBitmapTips' t/helper/test-bitmap.c: initial commit pack-bitmap: add 'test_bitmap_commits()' helper
2021-04-07	Merge branch 'ps/pack-bitmap-optim'	Junio C Hamano	1	-0/+1
	Optimize "rev-list --use-bitmap-index --objects" corner case that uses negative tags as the stopping points. * ps/pack-bitmap-optim: pack-bitmap: avoid traversal of objects referenced by uninteresting tag
2021-03-31	builtin/pack-objects.c: respect 'pack.preferBitmapTips'	Taylor Blau	1	-0/+6
	When writing a new pack with a bitmap, it is sometimes convenient to indicate some reference prefixes which should receive priority when selecting which commits to receive bitmaps. A truly motivated caller could accomplish this by setting 'pack.islandCore', (since all commits in the core island are similarly marked as preferred) but this requires callers to opt into using delta islands, which they may or may not want to do. Introduce a new multi-valued configuration, 'pack.preferBitmapTips' to allow callers to specify a list of reference prefixes. All references which have a prefix contained in 'pack.preferBitmapTips' will mark their tips as "preferred" in the same way as commits are marked as preferred for selection by 'pack.islandCore'. The choice of the verb "prefer" is intentional: marking the NEEDS_BITMAP flag on an object does not guarantee that that object will receive a bitmap. It merely guarantees that that commit will receive a bitmap over any other commit in the same window by bitmap_writer_select_commits(). The test this patch adds reflects this quirk, too. It only tests that a commit (which didn't receive bitmaps by default) is selected for bitmaps after changing the value of 'pack.preferBitmapTips' to include it. Other commits may lose their bitmaps as a byproduct of how the selection process works (bitmap_writer_select_commits() ignores the remainder of a window after seeing a commit with the NEEDS_BITMAP flag). This configuration will aide in selecting important references for multi-pack bitmaps, since they do not respect the same pack.islandCore configuration. (They could, but doing so may be confusing, since it is packs--not bitmaps--which are influenced by the delta-islands configuration). In a fork network repository (one which lists all forks of a given repository as remotes), for example, it is useful to set pack.preferBitmapTips to 'refs/remotes/<root>/heads' and 'refs/remotes/<root>/tags', where '<root>' is an opaque identifier referring to the repository which is at the base of the fork chain. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-31	pack-bitmap: add 'test_bitmap_commits()' helper	Taylor Blau	1	-0/+18
	The next patch will add a 'bitmap' test-tool which prints the list of commits that have bitmaps computed. The test helper could implement this itself, but it would need access to the 'bitmaps' field of the 'pack_bitmap' struct. To avoid exposing this private detail, implement the entirety of the helper behind a test_bitmap_commits() function in pack-bitmap.c. There is some precedence for this with test_bitmap_walk() which is used to implement the '--test-bitmap' flag in 'git rev-list' (and is also implemented in pack-bitmap.c). A caller will be added in the next patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-22	pack-bitmap: avoid traversal of objects referenced by uninteresting tag	Patrick Steinhardt	1	-0/+1
	When preparing the bitmap walk, we first establish the set of of have and want objects by iterating over the set of pending objects: if an object is marked as uninteresting, it's declared as an object we already have, otherwise as an object we want. These two sets are then used to compute which transitively referenced objects we need to obtain. One special case here are tag objects: when a tag is requested, we resolve it to its first not-tag object and add both resolved objects as well as the tag itself into either the have or want set. Given that the uninteresting-property always propagates to referenced objects, it is clear that if the tag is uninteresting, so are its children and vice versa. But we fail to propagate the flag, which effectively means that referenced objects will always be interesting except for the case where they have already been marked as uninteresting explicitly. This mislabeling does not impact correctness: we now have it in our "wants" set, and given that we later do an `AND NOT` of the bitmaps of "wants" and "haves" sets it is clear that the result must be the same. But we now start to needlessly traverse the tag's referenced objects in case it is uninteresting, even though we know that each referenced object will be uninteresting anyway. In the worst case, this can lead to a complete graph walk just to establish that we do not care for any object. Fix the issue by propagating the `UNINTERESTING` flag to pointees of tag objects and add a benchmark with negative revisions to p5310. This shows some nice performance benefits, tested with linux.git: Test HEAD~ HEAD --------------------------------------------------------------------------------------------------------------- 5310.3: repack to disk 193.18(181.46+16.42) 194.61(183.41+15.83) +0.7% 5310.4: simulated clone 25.93(24.88+1.05) 25.81(24.73+1.08) -0.5% 5310.5: simulated fetch 2.64(5.30+0.69) 2.59(5.16+0.65) -1.9% 5310.6: pack to file (bitmap) 58.75(57.56+6.30) 58.29(57.61+5.73) -0.8% 5310.7: rev-list (commits) 1.45(1.18+0.26) 1.46(1.22+0.24) +0.7% 5310.8: rev-list (objects) 15.35(14.22+1.13) 15.30(14.23+1.07) -0.3% 5310.9: rev-list with tag negated via --not --all (objects) 22.49(20.93+1.56) 0.11(0.09+0.01) -99.5% 5310.10: rev-list with negative tag (objects) 0.61(0.44+0.16) 0.51(0.35+0.16) -16.4% 5310.11: rev-list count with blob:none 12.15(11.19+0.96) 12.18(11.19+0.99) +0.2% 5310.12: rev-list count with blob:limit=1k 17.77(15.71+2.06) 17.75(15.63+2.12) -0.1% 5310.13: rev-list count with tree:0 1.69(1.31+0.38) 1.68(1.28+0.39) -0.6% 5310.14: simulated partial clone 20.14(19.15+0.98) 19.98(18.93+1.05) -0.8% 5310.16: clone (partial bitmap) 12.78(13.89+1.07) 12.72(13.99+1.01) -0.5% 5310.17: pack to file (partial bitmap) 42.07(45.44+2.72) 41.44(44.66+2.80) -1.5% 5310.18: rev-list with tree filter (partial bitmap) 0.44(0.29+0.15) 0.46(0.32+0.14) +4.5% While most benchmarks are probably in the range of noise, the newly added 5310.9 and 5310.10 benchmarks consistenly perform better. Signed-off-by: Patrick Steinhardt <ps@pks.im>. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-13	use CALLOC_ARRAY	René Scharfe	1	-2/+2
	Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-11	rev-list: add --disk-usage option for calculating disk usage	Jeff King	1	-0/+81
	It can sometimes be useful to see which refs are contributing to the overall repository size (e.g., does some branch have a bunch of objects not found elsewhere in history, which indicates that deleting it would shrink the size of a clone). You can find that out by generating a list of objects, getting their sizes from cat-file, and then summing them, like: git rev-list --objects --no-object-names main..branch git cat-file --batch-check='%(objectsize:disk)' \| perl -lne '$total += $_; END { print $total }' Though note that the caveats from git-cat-file(1) apply here. We "blame" base objects more than their deltas, even though the relationship could easily be flipped. Still, it can be a useful rough measure. But one problem is that it's slow to run. Teaching rev-list to sum up the sizes can be much faster for two reasons: 1. It skips all of the piping of object names and sizes. 2. If bitmaps are in use, for objects that are in the bitmapped packfile we can skip the oid_object_info() lookup entirely, and just ask the revindex for the on-disk size. This patch implements a --disk-usage option which produces the same answer in a fraction of the time. Here are some timings using a clone of torvalds/linux: [rev-list piped to cat-file, no bitmaps] $ time git rev-list --objects --no-object-names --all \| git cat-file --buffer --batch-check='%(objectsize:disk)' \| perl -lne '$total += $_; END { print $total }' 1459938510 real 0m29.635s user 0m38.003s sys 0m1.093s [internal, no bitmaps] $ time git rev-list --disk-usage --objects --all 1459938510 real 0m31.262s user 0m30.885s sys 0m0.376s Even though the wall-clock time is slightly worse due to parallelism, notice the CPU savings between the two. We saved 21% of the CPU just by avoiding the pipes. But the real win is with bitmaps. If we use them without the new option: [rev-list piped to cat-file, bitmaps] $ time git rev-list --objects --no-object-names --all --use-bitmap-index \| git cat-file --batch-check='%(objectsize:disk)' \| perl -lne '$total += $_; END { print $total }' 1459938510 real 0m6.244s user 0m8.452s sys 0m0.311s then we're faster to generate the list of objects, but we still spend a lot of time piping and looking things up. But if we do both together: [internal, bitmaps] $ time git rev-list --disk-usage --objects --all --use-bitmap-index 1459938510 real 0m0.219s user 0m0.169s sys 0m0.049s then we get the same answer much faster. For "--all", that answer will correspond closely to "du objects/pack", of course. But we're actually checking reachability here, so we're still fast when we ask for more interesting things: $ time git rev-list --disk-usage --use-bitmap-index v5.0..v5.10 374798628 real 0m0.429s user 0m0.356s sys 0m0.072s Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-13	rebuild_existing_bitmaps(): convert to new revindex API	Taylor Blau	1	-3/+2
	Remove another instance of looking at the revindex directly by instead calling 'pack_pos_to_index()'. Unlike other patches, this caller only cares about the index position of each object in the loop. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-13	try_partial_reuse(): convert to new revindex API	Taylor Blau	1	-8/+5
	Remove another instance of direct revindex manipulation by calling 'pack_pos_to_offset()' instead (the caller here does not care about the index position of the object at position 'pos'). Note that we cannot just use the existing "offset" variable to store the value we get from pack_pos_to_offset(). It is incremented by unpack_object_header(), but we later need the original value. Since we'll no longer have revindex->offset to read it from, we'll store that in a separate variable ("header" since it points to the entry's header bytes). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-13	get_size_by_pos(): convert to new revindex API	Taylor Blau	1	-4/+4
	Remove another caller that holds onto a 'struct revindex_entry' by replacing the direct indexing with calls to 'pack_pos_to_offset()' and 'pack_pos_to_index()'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-13	show_objects_for_type(): convert to new revindex API	Taylor Blau	1	-6/+7
	Avoid storing the revindex entry directly, since this structure will soon be removed from the public interface. Instead, store the offset and index position by calling 'pack_pos_to_offset()' and 'pack_pos_to_index()', respectively. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-13	bitmap_position_packfile(): convert to new revindex API	Taylor Blau	1	-1/+4
	Replace find_revindex_position() with its counterpart in the new API, offset_to_pack_pos(). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08	pack-bitmap: factor out 'add_commit_to_bitmap()'	Taylor Blau	1	-15/+21
	'find_objects()' currently needs to interact with the bitmaps khash pretty closely. To make 'find_objects()' read a little more straightforwardly, remove some of the khash-level details into a new function that describes what it does: 'add_commit_to_bitmap()'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08	pack-bitmap: factor out 'bitmap_for_commit()'	Taylor Blau	1	-14/+19
	A couple of callers within pack-bitmap.c duplicate logic to lookup a given object id in the bitamps khash. Factor this out into a new function, 'bitmap_for_commit()' to reduce some code duplication. Make this new function non-static, since it will be used in later commits from outside of pack-bitmap.c. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08	pack-bitmap-write: ignore BITMAP_FLAG_REUSE	Jeff King	1	-40/+6
	The on-disk bitmap format has a flag to mark a bitmap to be "reused". This is a rather curious feature, and works like this: - a run of pack-objects would decide to mark the last 80% of the bitmaps it generates with the reuse flag - the next time we generate bitmaps, we'd see those reuse flags from the last run, and mark those commits as special: - we'd be more likely to select those commits to get bitmaps in the new output - when generating the bitmap for a selected commit, we'd reuse the old bitmap as-is (rearranging the bits to match the new pack, of course) However, neither of these behaviors particularly makes sense. Just because a commit happened to be bitmapped last time does not make it a good candidate for having a bitmap this time. In particular, we may choose bitmaps based on how recent they are in history, or whether a ref tip points to them, and those things will change. We're better off re-considering fresh which commits are good candidates. Reusing the existing bitmap _is_ a reasonable thing to do to save computation. But only reusing exact bitmaps is a weak form of this. If we have an old bitmap for A and now want a new bitmap for its child, we should be able to compute that only by looking at trees and that are new to the child. But this code would consider only exact reuse (which is perhaps why it was eager to select those commits in the first place). Furthermore, the recent switch to the reverse-edge algorithm for generating bitmaps dropped this optimization entirely (and yet still performs better). So let's do a few cleanups: - drop the whole "reusing bitmaps" phase of generating bitmaps. It's not helping anything, and is mostly unused code (or worse, code that is using CPU but not doing anything useful) - drop the use of the on-disk reuse flag to select commits to bitmap - stop setting the on-disk reuse flag in bitmaps we generate (since nothing respects it anymore) We will keep a few innards of the reuse code, which will help us implement a more capable version of the "reuse" optimization: - simplify rebuild_existing_bitmaps() into a function that only builds the mapping of bits between the old and new orders, but doesn't actually convert any bitmaps - make rebuild_bitmap() public; we'll call it lazily to convert bitmaps as we traverse (using the mapping created above) Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08	pack-bitmap.c: check reads more aggressively when loading	Taylor Blau	1	-1/+6
	Before 'load_bitmap_entries_v1()' reads an actual EWAH bitmap, it should check that it can safely do so by ensuring that there are at least 6 bytes available to be read (four for the commit's index position, and then two more for the xor offset and flags, respectively). Likewise, it should check that the commit index it read refers to a legitimate object in the pack. The first fix catches a truncation bug that was exposed when testing, and the second is purely precautionary. There are some possible future improvements, not pursued here. They are: - Computing the correct boundary of the bitmap itself in the caller and ensuring that we don't read past it. This may or may not be worth it, since in a truncation situation, all bets are off: (is the trailer still there and the bitmap entries malformed, or is the trailer truncated?). The best we can do is try to read what's there as if it's correct data (and protect ourselves when it's obviously bogus). - Avoid the magic "6" by teaching read_be32() and read_u8() (both of which are custom helpers for this function) to check sizes before advancing the pointers. - Adding more tests in this area. Testing these truncation situations are remarkably fragile to even subtle changes in the bitmap generation. So, the resulting tests are likely to be quite brittle. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08	rev-list: die when --test-bitmap detects a mismatch	Jeff King	1	-1/+1
	You can use "git rev-list --test-bitmap HEAD" to check that bitmaps produce the same answer we'd get from a regular traversal. But if we detect an error, we only print "mismatch", and still exit with a successful error code. That makes the uses of --test-bitmap in the test suite (e.g., in t5310) mostly pointless: even if we saw an error, the tests wouldn't notice. Let's instead call die(), which will let these tests work as designed, and alert us if the bitmaps are bogus. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08	pack-bitmap: bounds-check size of cache extension	Jeff King	1	-2/+6
	A .bitmap file may have a "name hash cache" extension, which puts a sequence of uint32_t values (one per object) at the end of the file. When we see a flag indicating this extension, we blindly subtract the appropriate number of bytes from our available length. However, if the .bitmap file is too short, we'll underflow our length variable and wrap around, thinking we have a very large length. This can lead to reading out-of-bounds bytes while loading individual ewah bitmaps. We can fix this by checking the number of available bytes when we parse the header. The existing "truncated bitmap" test is now split into two tests: one where we don't have this extension at all (and hence actually do try to read a truncated ewah bitmap) and one where we realize up-front that we can't even fit in the cache structure. We'll check stderr in each case to make sure we hit the error we're expecting. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08	pack-bitmap: fix header size check	Jeff King	1	-3/+4
	When we parse a .bitmap header, we first check that we have enough bytes to make a valid header. We do that based on sizeof(struct bitmap_disk_header). However, as of 0f4d6cada8 (pack-bitmap: make bitmap header handling hash agnostic, 2019-02-19), that struct oversizes its checksum member to GIT_MAX_RAWSZ. That means we need to adjust for the difference between that constant and the size of the actual hash we're using. That commit adjusted the code which moves our pointer forward, but forgot to update the size check. This meant we were overly strict about the header size (requiring room for a 32-byte worst-case hash, when sha1 is only 20 bytes). But in practice it didn't matter because bitmap files tend to have at least 12 bytes of actual data anyway, so it was unlikely for a valid file to be caught by this. Let's fix it by pulling the header size into a separate variable and using it in both spots. That fixes the bug and simplifies the code to make it harder to have a mismatch like this in the future. It will also come in handy in the next patch for more bounds checking. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-04	pack-bitmap: pass object filter to fill-in traversal	Jeff King	1	-5/+9
	Sometimes a bitmap traversal still has to walk some commits manually, because those commits aren't included in the bitmap packfile (e.g., due to a push or commit since the last full repack). If we're given an object filter, we don't pass it down to this traversal. It's not necessary for correctness because the bitmap code has its own filters to post-process the bitmap result (which it must, to filter out the objects that _are_ mentioned in the bitmapped packfile). And with blob filters, there was no performance reason to pass along those filters, either. The fill-in traversal could omit them from the result, but it wouldn't save us any time to do so, since we'd still have to walk each tree entry to see if it's a blob or not. But now that we support tree filters, there's opportunity for savings. A tree:depth=0 filter means we can avoid accessing trees entirely, since we know we won't them (or any of the subtrees or blobs they point to). The new test in p5310 shows this off (the "partial bitmap" state is one where HEAD~100 and its ancestors are all in a bitmapped pack, but HEAD~100..HEAD are not). Here are the results (run against linux.git): Test HEAD^ HEAD ------------------------------------------------------------------------------------------------- [...] 5310.16: rev-list with tree filter (partial bitmap) 0.19(0.17+0.02) 0.03(0.02+0.01) -84.2% The absolute number of savings isn't _huge_, but keep in mind that we only omitted 100 first-parent links (in the version of linux.git here, that's 894 actual commits). In a more pathological case, we might have a much larger proportion of non-bitmapped commits. I didn't bother creating such a case in the perf script because the setup is expensive, and this is plenty to show the savings as a percentage. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-04	pack-bitmap.c: support 'tree:0' filtering	Taylor Blau	1	-1/+24
	In the previous patch, we made it easy to define other filters that exclude all objects of a certain type. Use that in order to implement bitmap-level filtering for the '--filter=tree:<n>' filter when 'n' is equal to 0. The general case is not helped by bitmaps, since for values of 'n > 0', the object filtering machinery requires a full-blown tree traversal in order to determine the depth of a given tree. Caching this is non-obvious, too, since the same tree object can have a different depth depending on the context (e.g., a tree was moved up in the directory hierarchy between two commits). But, the 'n = 0' case can be helped, and this patch does so. Running p5310.11 in this tree and on master with the kernel, we can see that this case is helped substantially: Test master this tree -------------------------------------------------------------------------------- 5310.11: rev-list count with tree:0 10.68(10.39+0.27) 0.06(0.04+0.01) -99.4% Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-04	pack-bitmap.c: make object filtering functions generic	Taylor Blau	1	-11/+24
	In 4f3bd5606a (pack-bitmap: implement BLOB_NONE filtering, 2020-02-14), filtering support for bitmaps was added for the 'LOFC_BLOB_NONE' filter. In the future, we would like to add support for filters that behave as if they exclude a certain type of object, for e.g., the tree depth filter with depth 0. To prepare for this, make some of the functions used for filtering more generic, such as 'find_tip_blobs' and 'filter_bitmap_blob_none' so that they can work over arbitrary object types. To that end, create 'find_tip_objects' and 'filter_bitmap_exclude_type', and redefine the aforementioned functions in terms of those. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-05	Merge branch 'jk/nth-packed-object-id'	Junio C Hamano	1	-9/+9
	Code cleanup to use "struct object_id" more by replacing use of "char sha1" jk/nth-packed-object-id: packfile: drop nth_packed_object_sha1() packed_object_info(): use object_id internally for delta base packed_object_info(): use object_id for returning delta base pack-check: push oid lookup into loop pack-check: convert "internal error" die to a BUG() pack-bitmap: use object_id when loading on-disk bitmaps pack-objects: use object_id struct in pack-reuse code pack-objects: convert oe_set_delta_ext() to use object_id pack-objects: read delta base oid into object_id struct nth_packed_object_oid(): use customary integer return
2020-03-02	Merge branch 'jk/object-filter-with-bitmap'	Junio C Hamano	1	-33/+239
	The object reachability bitmap machinery and the partial cloning machinery were not prepared to work well together, because some object-filtering criteria that partial clones use inherently rely on object traversal, but the bitmap machinery is an optimization to bypass that object traversal. There however are some cases where they can work together, and they were taught about them. * jk/object-filter-with-bitmap: rev-list --count: comment on the use of count_right++ pack-objects: support filters with bitmaps pack-bitmap: implement BLOB_LIMIT filtering pack-bitmap: implement BLOB_NONE filtering bitmap: add bitmap_unset() function rev-list: use bitmap filters for traversal pack-bitmap: basic noop bitmap filter infrastructure rev-list: allow commit-only bitmap traversals t5310: factor out bitmap traversal comparison rev-list: allow bitmaps when counting objects rev-list: make --count work with --objects rev-list: factor out bitmap-optimized routines pack-bitmap: refuse to do a bitmap traversal with pathspecs rev-list: fallback to non-bitmap traversal when filtering pack-bitmap: fix leak of haves/wants object lists pack-bitmap: factor out type iterator initialization
2020-02-24	pack-bitmap: use object_id when loading on-disk bitmaps	Jeff King	1	-6/+6
	A pack bitmap file contains the index position of the commit for each bitmap, which we then translate into an object id via nth_packed_object_sha1(). In preparation for that function going away, we can switch to the more type-safe nth_packed_object_id(). Note that even though the result ends up in an object_id this does incur an extra copy of the hash (into our temporary object_id, and then into the final malloc'd stored_bitmap struct). This shouldn't make any measurable difference. If it did, we could avoid this copy _and_ the copy of the rest of the items by allocating the stored_bitmap struct beforehand and reading directly into it from the bitmap file. Or better still, if this is a bottleneck, we could introduce an on-disk index to the bitmap file so we don't have to read every single entry to use just one of them. So it's not worth worrying about micro-optimizing out this one hash copy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24	nth_packed_object_oid(): use customary integer return	Jeff King	1	-2/+2
	Our nth_packed_object_sha1() function returns NULL for error. So when we wrapped it with nth_packed_object_oid(), we kept the same semantics. But it's a bit funny, because the caller actually passes in an out parameter, and the pointer we return is just that same struct they passed to us (or NULL). It's not too terrible, but it does make the interface a little non-idiomatic. Let's switch to our usual "0 for success, negative for error" return value. Most callers either don't check it, or are trivially converted. The one that requires the biggest change is actually improved, as we can ditch an extra aliased pointer variable. Since we are changing the interface in a subtle way that the compiler wouldn't catch, let's also change the name to catch any topics in flight. We can drop the 'o' and make it nth_packed_object_id(). That's slightly shorter, but also less redundant since the 'o' stands for "object" already. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14	Merge branch 'jk/packfile-reuse-cleanup'	Junio C Hamano	1	-63/+129
	The way "git pack-objects" reuses objects stored in existing pack to generate its result has been improved. * jk/packfile-reuse-cleanup: pack-bitmap: don't rely on bitmap_git->reuse_objects pack-objects: add checks for duplicate objects pack-objects: improve partial packfile reuse builtin/pack-objects: introduce obj_is_packed() pack-objects: introduce pack.allowPackReuse csum-file: introduce hashfile_total() pack-bitmap: simplify bitmap_has_oid_in_uninteresting() pack-bitmap: uninteresting oid can be outside bitmapped packfile pack-bitmap: introduce bitmap_walk_contains() ewah/bitmap: introduce bitmap_word_alloc() packfile: expose get_delta_base() builtin/pack-objects: report reused packfile objects
2020-02-14	pack-bitmap: implement BLOB_LIMIT filtering	Jeff King	1	-0/+80
	Just as the previous commit implemented BLOB_NONE, we can support BLOB_LIMIT filters by looking at the sizes of any blobs in the result and unsetting their bits as appropriate. This is slightly more expensive than BLOB_NONE, but still produces a noticeable speedup (these results are on git.git): Test HEAD~2 HEAD ------------------------------------------------------------------------------------ 5310.9: rev-list count with blob:none 1.80(1.77+0.02) 0.22(0.20+0.02) -87.8% 5310.10: rev-list count with blob:limit=1k 1.99(1.96+0.03) 0.29(0.25+0.03) -85.4% The implementation is similar to the BLOB_NONE one, with the exception that we have to go object-by-object while walking the blob-type bitmap (since we can't mask out the matches, but must look up the size individually for each blob). The trick with using ctz64() is taken from show_objects_for_type(), which likewise needs to find individual bits (but wants to quickly skip over big chunks without blobs). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14	pack-bitmap: implement BLOB_NONE filtering	Jeff King	1	-0/+74
	We can easily support BLOB_NONE filters with bitmaps. Since we know the types of all of the objects, we just need to clear the result bits of any blobs. Note two subtleties in the implementation (which I also called out in comments): - we have to include any blobs that were specifically asked for (and not reached through graph traversal) to match the non-bitmap version - we have to handle in-pack and "ext_index" objects separately. Arguably prepare_bitmap_walk() could be adding these ext_index objects to the type bitmaps. But it doesn't for now, so let's match the rest of the bitmap code here (it probably wouldn't be an efficiency improvement to do so since the cost of extending those bitmaps is about the same as our loop here, but it might make the code a bit simpler). Here are perf results for the new test on git.git: Test HEAD^ HEAD -------------------------------------------------------------------------------- 5310.9: rev-list count with blob:none 1.67(1.62+0.05) 0.22(0.21+0.02) -86.8% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14	pack-bitmap: basic noop bitmap filter infrastructure	Jeff King	1	-1/+25
	Currently you can't use object filters with bitmaps, but we plan to support at least some filters with bitmaps. Let's introduce some infrastructure that will help us do that: - prepare_bitmap_walk() now accepts a list_objects_filter_options parameter (which can be NULL for no filtering; all the current callers pass this) - we'll bail early if the filter is incompatible with bitmaps (just as we would if there were no bitmaps at all). Currently all filters are incompatible. - we'll filter the resulting bitmap; since there are no supported filters yet, this is always a noop. There should be no behavior change yet, but we'll support some actual filters in a future patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14	rev-list: allow commit-only bitmap traversals	Jeff King	1	-5/+15
	Ever since we added reachability bitmap support, we've been able to use it with rev-list to get the full list of objects, like: git rev-list --objects --use-bitmap-index --all But you can't do so without --objects, since we weren't ready to just show the commits. However, the internals of the bitmap code are mostly ready for this: they avoid opening up trees when walking to fill in the bitmaps. We just need to actually pass in the rev_info to traverse_bitmap_commit_list() so it knows which types to bother triggering our callback for. For completeness, the perf test now covers both the existing --objects case, as well as the new commits-only behavior (the objects one got way faster when we introduced bitmaps, but obviously isn't improved now). Here are numbers for linux.git: Test HEAD^ HEAD ------------------------------------------------------------------------ 5310.7: rev-list (commits) 8.29(8.10+0.19) 1.76(1.72+0.04) -78.8% 5310.8: rev-list (objects) 8.06(7.94+0.12) 8.14(7.94+0.13) +1.0% That run was cheating a little, as I didn't have any commit-graph in the repository, and we'd built it by default these days when running git-gc. Here are numbers with a commit-graph: Test HEAD^ HEAD ------------------------------------------------------------------------ 5310.7: rev-list (commits) 0.70(0.58+0.12) 0.51(0.46+0.04) -27.1% 5310.8: rev-list (objects) 6.20(6.09+0.10) 6.27(6.16+0.11) +1.1% Still an improvement, but a lot less impressive. We could have the perf script remove any commit-graph to show the out-sized effect, but it probably makes sense to leave it in what would be a more typical setup. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14	pack-bitmap: refuse to do a bitmap traversal with pathspecs	Jeff King	1	-1/+11
	rev-list has refused to use bitmaps with pathspec limiting since c8a70d3509 (rev-list: disable --use-bitmap-index when pruning commits, 2015-07-01). But this is true not just for rev-list, but for anyone who calls prepare_bitmap_walk(); the code isn't equipped to handle this case. We never noticed because the only other callers would never pass a pathspec limiter. But let's push the check down into prepare_bitmap_walk() anyway. That's a more logical place for it to live, as callers shouldn't need to know the details (and must be prepared to fall back to a regular traversal anyway, since there might not be bitmaps in the repository). It would also prepare us for a day where this case _is_ handled, but that's pretty unlikely. E.g., we could use bitmaps to generate the set of commits, and then diff each commit to see if it matches the pathspec. That would be slightly faster than a naive traversal that actually walks the commits. But you'd probably do better still to make use of the newer commit-graph feature to make walking the commits very cheap. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-13	pack-bitmap: fix leak of haves/wants object lists	Jeff King	1	-0/+5
	When we do a bitmap-aware revision traversal, we create an object_list for each of the "haves" and "wants" tips. After creating the result bitmaps these are no longer needed or used, but we never free the list memory. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-13	pack-bitmap: factor out type iterator initialization	Jeff King	1	-30/+33
	When count_object_type() wants to iterate over the bitmap of all objects of a certain type, we have to pair up OBJ_COMMIT with bitmap->commits, and so forth. Since we're about to add more code to iterate over these bitmaps, let's pull the initialization into its own function. We can also use this to simplify traverse_bitmap_commit_list(). It accomplishes the same thing by manually passing the object type and the bitmap to show_objects_for_type(), but using our helper we just need the object type. Note there's one small code change here: previously we'd simply return zero when counting an unknown object type, and now we'll BUG(). This shouldn't matter in practice, as all of the callers pass in only usual commit/tree/blob/tag types. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-23	pack-bitmap: don't rely on bitmap_git->reuse_objects	Jeff King	1	-11/+7
	We no longer compute bitmap_git->reuse_objects, so we cannot rely on it anymore to terminate the loop early; we have to iterate to the end. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-23	pack-objects: improve partial packfile reuse	Jeff King	1	-41/+109
	The old code to reuse deltas from an existing packfile just tried to dump a whole segment of the pack verbatim. That's faster than the traditional way of actually adding objects to the packing list, but it didn't kick in very often. This new code is really going for a middle ground: do _some_ per-object work, but way less than we'd traditionally do. The general strategy of the new code is to make a bitmap of objects from the packfile we'll include, and then iterate over it, writing out each object exactly as it is in our on-disk pack, but _not_ adding it to our packlist (which costs memory, and increases the search space for deltas). One complication is that if we're omitting some objects, we can't set a delta against a base that we're not sending. So we have to check each object in try_partial_reuse() to make sure we have its delta. About performance, in the worst case we might have interleaved objects that we are sending or not sending, and we'd have as many chunks as objects. But in practice we send big chunks. For instance, packing torvalds/linux on GitHub servers now reused 6.5M objects, but only needed ~50k chunks. Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-23	pack-bitmap: simplify bitmap_has_oid_in_uninteresting()	Jeff King	1	-12/+2
	Let's refactor bitmap_has_oid_in_uninteresting() using bitmap_walk_contains(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>