aboutsummaryrefslogtreecommitdiffstats
path: root/t/helper
AgeCommit message (Collapse)AuthorFilesLines
2025-12-05Merge branch 'ps/object-source-management'Junio C Hamano1-14/+2
Code refactoring around object database sources. * ps/object-source-management: odb: handle recreation of quarantine directories odb: handle changing a repository's commondir chdir-notify: add function to unregister listeners odb: handle initialization of sources in `odb_new()` http-push: stop setting up `the_repository` for each reference t/helper: stop setting up `the_repository` repeatedly builtin/index-pack: fix deferred fsck outside repos oidset: introduce `oidset_equal()` odb: move logic to disable ref updates into repo odb: refactor `odb_clear()` to `odb_free()` odb: adopt logic to close object databases setup: convert `set_git_dir()` to have file scope path: move `enter_repo()` into "setup.c"
2025-11-26Merge branch 'jk/test-mktemp-leakfix'Junio C Hamano1-1/+7
Test leakfix. * jk/test-mktemp-leakfix: test-mktemp: plug memory and descriptor leaks
2025-11-25t/helper: stop setting up `the_repository` repeatedlyPatrick Steinhardt1-14/+2
The "repository" test helper sets up `the_repository` twice. In fact though, we don't even have to set it up even once: all we need is to set up its hash algorithm, because we still depend on some subsystems that aren't free of `the_repository`. Refactor the code accordingly. This prepares for a subsequent change, where setting up the repository repeatedly will lead to a `BUG()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-19Merge branch 'ps/ref-peeled-tags'Junio C Hamano2-4/+3
Some ref backend storage can hold not just the object name of an annotated tag, but the object name of the object the tag points at. The code to handle this information has been streamlined. * ps/ref-peeled-tags: t7004: do not chdir around in the main process ref-filter: fix stale parsed objects ref-filter: parse objects on demand ref-filter: detect broken tags when dereferencing them refs: don't store peeled object IDs for invalid tags object: add flag to `peel_object()` to verify object type refs: drop infrastructure to peel via iterators refs: drop `current_ref_iter` hack builtin/show-ref: convert to use `reference_get_peeled_oid()` ref-filter: propagate peeled object ID upload-pack: convert to use `reference_get_peeled_oid()` refs: expose peeled object ID via the iterator refs: refactor reference status flags refs: fully reset `struct ref_iterator::ref` on iteration refs: introduce `.ref` field for the base iterator refs: introduce wrapper struct for `each_ref_fn`
2025-11-18test-mktemp: plug memory and descriptor leaksJeff King1-1/+7
We test xmkstemp() in our helper by just calling: xmkstemp(xstrdup(argv[1])); This leaks both the copied string as well as the descriptor returned by the function. In practice this isn't a big deal, since we immediately exit the program, but: 1. LSan will complain about the memory leak. The only reason we did not notice this in our leak-checking builds is that both of the callers in the test suite (both in t0070) pass a broken template (and expect failure). So the function calls die() before we can actually leak. But it's an accident waiting to happen if anybody adds a call which succeeds. 2. Coverity complains about the descriptor leak. There's a long list of uninteresting or false positives in Coverity's results, but since we're here we might as well fix it, too. I didn't bother adding a new test that triggers the leak. It's not even in real production code, but just in the test-helper itself. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04Merge branch 'jk/test-delete-gpgsig-leakfix'Junio C Hamano1-3/+4
Leakfix. * jk/test-delete-gpgsig-leakfix: test-tool: fix leak in delete-gpgsig command
2025-11-04object: add flag to `peel_object()` to verify object typePatrick Steinhardt1-1/+1
When peeling a tag to a non-tag object we repeatedly call `parse_object()` on the tagged object until we find the first object that isn't a tag. While this feels sensible at first, there is a big catch here: `parse_object()` doesn't actually verify the type of the tagged object. The relevant code path here eventually ends up in `parse_tag_buffer()`. Here, we parse the various fields of the tag, including the "type". Once we've figured out the type and the tagged object ID, we call one of the `lookup_${type}()` functions for whatever type we have found. There is two possible outcomes in the successful case: 1. The object is already part of our cached objects. In that case we double-check whether the type we're trying to look up matches the type that was cached. 2. The object is _not_ part of our cached objects. In that case, we simply create a new object with the expected type, but we don't parse that object. In the first case we might notice type mismatches, but only in the case where our cache has the object with the correct type. In the second case, we'll blindly assume that the type is correct and then go with it. We'll only notice that the type might be wrong when we try to parse the object at a later point. Now arguably, we could change `parse_tag_buffer()` to verify the tagged object's type for us. But that would have the effect that such a tag cannot be parsed at all anymore, and we have a small bunch of tests for exactly this case that assert we still can open such tags. So this change does not feel like something we can retroactively tighten, even though one shouldn't ever hit such corrupted tags. Instead, add a new `flags` field to `peel_object()` that allows the caller to opt in to strict object verification. This will be wired up at a subset of callsites over the next few commits. Note that this change also inlines `deref_tag_noverify()`. There's only been two callsites of that function, the one we're changing and one in our test helpers. The latter callsite can trivially use `deref_tag()` instead, so by inlining the function we avoid having to pass down the flag. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: introduce wrapper struct for `each_ref_fn`Patrick Steinhardt1-3/+2
The `each_ref_fn` callback function type is used across our code base for several different functions that iterate through reference. There's a bunch of callbacks implementing this type, which makes any changes to the callback signature extremely noisy. An example of the required churn is e8207717f1 (refs: add referent to each_ref_fn, 2024-08-09): adding a single argument required us to change 48 files. It was already proposed back then [1] that we might want to introduce a wrapper structure to alleviate the pain going forward. While this of course requires the same kind of global refactoring as just introducing a new parameter, it at least allows us to more change the callback type afterwards by just extending the wrapper structure. One counterargument to this refactoring is that it makes the structure more opaque. While it is obvious which callsites need to be fixed up when we change the function type, it's not obvious anymore once we use a structure. That being said, we only have a handful of sites that actually need to populate this wrapper structure: our ref backends, "refs/iterator.c" as well as very few sites that invoke the iterator callback functions directly. Introduce this wrapper structure so that we can adapt the iterator interfaces more readily. [1]: <ZmarVcF5JjsZx0dl@tanuki> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30Merge branch 'ps/remove-packfile-store-get-packs'Junio C Hamano2-2/+3
Two slightly different ways to get at "all the packfiles" in API has been cleaned up. * ps/remove-packfile-store-get-packs: packfile: rename `packfile_store_get_all_packs()` packfile: introduce macro to iterate through packs packfile: drop `packfile_store_get_packs()` builtin/grep: simplify how we preload packs builtin/gc: convert to use `packfile_store_get_all_packs()` object-name: convert to use `packfile_store_get_all_packs()`
2025-10-29test-tool: fix leak in delete-gpgsig commandJeff King1-3/+4
We read the input into a strbuf, so we must free it. Without this, t1016 complains in SANITIZE=leak mode. The bug was introduced in 7673ecd2dc (t1016-compatObjectFormat: add tests to verify the conversion between objects, 2023-10-01). But nobody seems to have noticed, probably because CI did not run these tests until the fix in 6cd8369ef3 (t/lib-gpg: call prepare_gnupghome() in GPG2 prereq, 2024-07-03). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16packfile: introduce macro to iterate through packsPatrick Steinhardt2-2/+3
We have a bunch of different sites that want to iterate through all packs of a given `struct packfile_store`. This pattern is somewhat verbose and repetitive, which makes it somewhat cumbersome. Introduce a new macro `repo_for_each_pack()` that removes some of the boilerplate. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-07Merge branch 'ps/odb-clean-stale-wrappers'Junio C Hamano1-6/+4
Code clean-up. * ps/odb-clean-stale-wrappers: odb: drop deprecated wrapper functions
2025-09-24packfile: refactor `get_all_packs()` to work on packfile storePatrick Steinhardt2-2/+2
The `get_all_packs()` function prepares the packfile store and then returns its packfiles. Refactor it to accept a packfile store instead of a repository to clarify its scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-11odb: drop deprecated wrapper functionsPatrick Steinhardt1-6/+4
In the Git 2.51 release cycle we've refactored the object database layer to access objects via `struct object_database` directly. To make the transition a bit easier we have retained some of the old-style functions in case those were widely used. Now that Git 2.51 has been released it's time to clean up though and drop these old wrappers. Do so and adapt the small number of newly added users to use the new functions instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-02Merge branch 'ps/object-store-midx-dedup-info' into ps/packfile-storeJunio C Hamano1-13/+18
* ps/object-store-midx-dedup-info: midx: compute paths via their source midx: stop duplicating info redundant with its owning source midx: write multi-pack indices via their source midx: load multi-pack indices via their source midx: drop redundant `struct repository` parameter odb: simplify calling `link_alt_odb_entry()` odb: return newly created in-memory sources odb: consistently use "dir" to refer to alternate's directory odb: allow `odb_find_source()` to fail odb: store locality in object database sources
2025-08-25Merge branch 'ps/commit-graph-wo-globals'Junio C Hamano1-1/+1
Remove dependency on the_repository and other globals from the commit-graph code, and other changes unrelated to de-globaling. * ps/commit-graph-wo-globals: commit-graph: stop passing in redundant repository commit-graph: stop using `the_repository` commit-graph: stop using `the_hash_algo` commit-graph: refactor `parse_commit_graph()` to take a repository commit-graph: store the hash algorithm instead of its length commit-graph: stop using `the_hash_algo` via macros
2025-08-21Merge branch 'jc/string-list-split'Junio C Hamano4-6/+7
string_list_split*() family of functions have been extended to simplify common use cases. * jc/string-list-split: string-list: split-then-remove-empty can be done while splitting string-list: optionally omit empty string pieces in string_list_split*() diff: simplify parsing of diff.colormovedws string-list: optionally trim string pieces split by string_list_split*() string-list: unify string_list_split* functions string-list: align string_list_split() with its _in_place() counterpart string-list: report programming error with BUG
2025-08-21Merge branch 'ps/remote-rename-fix'Junio C Hamano1-1/+2
"git remote rename origin upstream" failed to move origin/HEAD to upstream/HEAD when origin/HEAD is unborn and performed other renames extremely inefficiently, which has been corrected. * ps/remote-rename-fix: builtin/remote: only iterate through refs that are to be renamed builtin/remote: rework how remote refs get renamed builtin/remote: determine whether refs need renaming early on builtin/remote: fix sign comparison warnings refs: simplify logic when migrating reflog entries refs: pass refname when invoking reflog entry callback
2025-08-15commit-graph: stop passing in redundant repositoryPatrick Steinhardt1-1/+1
Many of the commit-graph related functions take in both a repository and the object database source (directly or via `struct commit_graph`) for which we are supposed to load such a commit-graph. In the best case this information is simply redundant as the source already contains a reference to its owning object database, which in turn has a reference to its repository. In the worst case this information could even mismatch when passing in a source that doesn't belong to the same repository. Refactor the code so that we only pass in the object database source in those cases. There is one exception though, namely `load_commit_graph_chain_fd_st()`, which is responsible for loading a commit-graph chain. It is expected that parts of the commit-graph chain aren't located in the same object source as the chain file itself, but in a different one. Consequently, this function doesn't work on the source level but on the database level instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-11midx: stop duplicating info redundant with its owning sourcePatrick Steinhardt1-1/+1
Multi-pack indices store some information that is redundant with their owning source: - The locality bit that tracks whether the source is the primary object source or an alternate. - The object directory path the multi-pack index is located in. - The pointer to the owning parent directory. All of this information is already contained in `struct odb_source`. So now that we always have that struct available when loading a multi-pack index we have it readily accessible. Drop the redundant information and instead store a pointer to the object source. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-11midx: load multi-pack indices via their sourcePatrick Steinhardt1-10/+15
To load a multi-pack index the caller is expected to pass both the repository and the object directory where the multi-pack index is located. While this works, this layout has a couple of downsides: - We need to pass in information reduntant with the owning source, namely its object directory and whether the source is local or not. - We don't have access to the source when loading the multi-pack index. If we had that access, we could store a pointer to the owning source in the MIDX and thus deduplicate some information. - Multi-pack indices are inherently specific to the object source and its format. With the goal of pluggable object backends in mind we will eventually want the backends to own the logic of reading and writing multi-pack indices. Making the logic work on top of object sources is a step into that direction. Refactor loading of multi-pack indices accordingly. This surfaces one small problem though: git-multi-pack-index(1) and our MIDX test helper both know to read and write multi-pack-indices located in a different object directory. This issue is addressed by adding the user-provided object directory as an in-memory alternate. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-11midx: drop redundant `struct repository` parameterPatrick Steinhardt1-2/+2
There are a couple of functions that take both a `struct repository` and a `struct multi_pack_index`. This provides redundant information though without much benefit given that the multi-pack index already has a pointer to its owning repository. Drop the `struct repository` parameter from such functions. While at it, reorder the list of parameters of `fill_midx_entry()` so that the MIDX comes first to better align with our coding guidelines. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-06refs: pass refname when invoking reflog entry callbackPatrick Steinhardt1-1/+2
With `refs_for_each_reflog_ent()` callers can iterate through all the reflog entries for a given reference. The callback that is being invoked for each such entry does not receive the name of the reference that we are currently iterating through. This isn't really a limiting factor, as callers can simply pass the name via the callback data. But this layout sometimes does make for a bit of an awkward calling pattern. One example: when iterating through all reflogs, and for each reflog we iterate through all refnames, we have to do some extra book keeping to track which reference name we are currently yielding reflog entries for. Change the signature of the callback function so that the reference name of the reflog gets passed through to it. Adapt callers accordingly and start using the new parameter in trivial cases. The next commit will refactor the reference migration logic to make use of this parameter so that we can simplify its logic a bit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-04Merge branch 'jc/test-hashmap-is-still-here'Junio C Hamano1-0/+5
Comment fix. * jc/test-hashmap-is-still-here: test-hashmap: document why it is no longer used but still there
2025-08-04Merge branch 'ps/config-wo-the-repository'Junio C Hamano4-13/+15
The config API had a set of convenience wrapper functions that implicitly use the_repository instance; they have been removed and inlined at the calling sites. * ps/config-wo-the-repository: (21 commits) config: fix sign comparison warnings config: move Git config parsing into "environment.c" config: remove unused `the_repository` wrappers config: drop `git_config_set_multivar()` wrapper config: drop `git_config_get_multivar_gently()` wrapper config: drop `git_config_set_multivar_in_file_gently()` wrapper config: drop `git_config_set_in_file_gently()` wrapper config: drop `git_config_set()` wrapper config: drop `git_config_set_gently()` wrapper config: drop `git_config_set_in_file()` wrapper config: drop `git_config_get_bool()` wrapper config: drop `git_config_get_ulong()` wrapper config: drop `git_config_get_int()` wrapper config: drop `git_config_get_string()` wrapper config: drop `git_config_get_string()` wrapper config: drop `git_config_get_string_multi()` wrapper config: drop `git_config_get_value()` wrapper config: drop `git_config_get_value()` wrapper config: drop `git_config_get()` wrapper config: drop `git_config_clear()` wrapper ...
2025-08-03Merge branch 'hl/test-helper-fd-close'Junio C Hamano2-53/+27
A few file descriptors left unclosed upon program completion in a few test helper programs are now closed. * hl/test-helper-fd-close: test-delta: close output descriptor after use test-delta: use strbufs to hold input files test-delta: handle errors with die() t/helper/test-truncate: close file descriptor after truncation
2025-08-02string-list: split-then-remove-empty can be done while splittingJunio C Hamano2-4/+4
Thanks to the new STRING_LIST_SPLIT_NONEMPTY flag, a common pattern to split a string into a string list and then remove empty items in the resulting list is no longer needed. Instead, just tell the string_list_split*() to omit empty ones while splitting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-02string-list: align string_list_split() with its _in_place() counterpartJunio C Hamano2-2/+3
The string_list_split_in_place() function was updated by 52acddf3 (string-list: multi-delimiter `string_list_split_in_place()`, 2023-04-24) to take more than one delimiter characters, hoping that we can later use it to replace our uses of strtok(). We however did not make a matching change to the string_list_split() function, which is very similar. Before giving both functions more features in future commits, allow string_list_split() to also take more than one delimiter characters to make them closer to each other. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-30test-hashmap: document why it is no longer used but still thereJunio C Hamano1-0/+5
As I ended up wasting a few dozen minutes looking for the reason why this is still here, help future developers by saving them from wasting their time by documenting why this code that apparently is not used by anybody is still here. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-24test-delta: close output descriptor after useJeff King1-0/+2
After we write to the output file, the program exits. This naturally closes the descriptor. But we should do an explicit close for two reasons: 1. It's possible to hit an error on close(), which we should detect and report via our exit code. 2. Leaking descriptors is a bad practice in general. Even if it isn't meaningful here, it sets a bad example. It is tempting to write: if (write_in_full(fd, ...) < 0 || close(fd) < 0) die_errno(...); But that pattern contains a subtle problem that has resulted in descriptor leaks before. If write_in_full() fails, we'll short-circuit and never call close(), leaking the descriptor. That's not a problem here, since our error path dies instead of returning up the stack. But since we're trying to set a good example, let's write it out as two separate conditions. As a bonus, that lets us produce a slightly more specific error message. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-24test-delta: use strbufs to hold input filesJeff King1-26/+14
We want to read the whole contents of two files into memory. If we switch from raw ptr/len pairs to strbufs, we can use strbuf_read_file() to shorten the code. This incidentally fixes two small bugs: 1. We stat() the files and allocate our buffers based on st.st_size. But that is an off_t which may be larger than the size_t we'd use to allocate. We should use xsize_t() to do a checked conversion. Otherwise integer truncation (on a file >4GB) could cause us to under-allocate (though in practice this does not result in a buffer overflow because the same truncation happens when read_in_full() also takes a size_t). 2. We get the size from st.st_size, and then try to read_in_full() that many bytes. But it may return fewer bytes than expected (if the file changed racily and we get an early EOF), leading us to read uninitialized bytes in the allocated buffer. We don't notice because we only check the value for error, not that we got the expected number of bytes. The strbuf code doesn't run into this, because it just reads to EOF, expanding the buffer dynamically as necessary. Neither bug is a big deal for a test helper, but fixing them is a nice bonus on top of simplifying the code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-24test-delta: handle errors with die()Jeff King1-37/+18
This is a short test helper that does all of its work in the main function. When we encounter an error, we try to clean up memory and descriptors and then jump to an error return, which exits the program. We can get the same effect by just calling die(), which means we do not have to bother with cleaning up. This simplifies the code, and also removes some inconsistencies where a few code paths forgot to clean up descriptors (though in practice it was not a big deal since we were exiting anyway). In addition to die() and die_errno(), we'll also use a few of our usual helpers like xopen() and usage() that make things more ergonomic. This does change the exit code in these cases from 1 to 128, but I don't think it matters (and arguably is better, as we'd already exit 128 for other errors like xmalloc() failure). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23Merge branch 'ly/changed-paths-traversal'Junio C Hamano1-4/+4
Lift the limitation to use changed-path filter in "git log" so that it can be used for a pathspec with multiple literal paths. * ly/changed-paths-traversal: bloom: optimize multiple pathspec items in revision revision: make helper for pathspec to bloom keyvec bloom: replace struct bloom_key * with struct bloom_keyvec bloom: rename function operates on bloom_key bloom: add test helper to return murmur3 hash
2025-07-23config: move Git config parsing into "environment.c"Patrick Steinhardt2-0/+2
In "config.c" we host both the business logic to read and write config files as well as the logic to parse specific Git-related variables. On the one hand this is mixing concerns, but even more importantly it means that we cannot easily remove the dependency on `the_repository` in our config parsing logic. Move the logic into "environment.c". This file is a grab bag of all kinds of global state already, so it is quite a good fit. Furthermore, it also hosts most of the global variables that we're parsing the config values into, making this an even better fit. Note that there is one hidden change: in `parse_fsync_components()` we use an `int` to iterate through `ARRAY_SIZE(fsync_component_names)`. But as -Wsign-compare warnings are enabled in this file this causes a compiler warning. The issue is fixed by using a `size_t` instead. This change allows us to drop the `USE_THE_REPOSITORY_VARIABLE` declaration. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23config: drop `git_config_get_bool()` wrapperPatrick Steinhardt1-1/+1
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get_bool()`. All callsites are adjusted so that they use `repo_config_get_bool(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23config: drop `git_config_get_int()` wrapperPatrick Steinhardt1-1/+1
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get_int()`. All callsites are adjusted so that they use `repo_config_get_int(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23config: drop `git_config_get_string()` wrapperPatrick Steinhardt1-1/+1
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get_string()`. All callsites are adjusted so that they use `repo_config_get_string(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23config: drop `git_config_get_value()` wrapperPatrick Steinhardt1-1/+1
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get_value()`. All callsites are adjusted so that they use `repo_config_get_value(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23config: drop `git_config_get_value()` wrapperPatrick Steinhardt1-1/+1
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get_value()`. All callsites are adjusted so that they use `repo_config_get_value(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23config: drop `git_config_get()` wrapperPatrick Steinhardt1-1/+1
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get()`. All callsites are adjusted so that they use `repo_config_get(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23config: drop `git_config()` wrapperPatrick Steinhardt4-7/+7
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config()`. All callsites are adjusted so that they use `repo_config(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-22t/helper/test-truncate: close file descriptor after truncationHoyoung Lee1-0/+3
Fix a resource leak where the file descriptor was not closed after truncating a file in t/helper/test-truncate.c. Signed-off-by: Hoyoung Lee <lhywkd22@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16Merge branch 'rs/parse-options-precision'Junio C Hamano1-3/+14
Define .precision to more canned parse-options type to avoid bugs coming from using a variable with a wrong type to capture the parsed values. * rs/parse-options-precision: parse-options: add precision handling for OPTION_COUNTUP parse-options: add precision handling for OPTION_BITOP parse-options: add precision handling for OPTION_NEGBIT parse-options: add precision handling for OPTION_BIT parse-options: add precision handling for OPTION_SET_INT parse-options: add precision handling for PARSE_OPT_CMDMODE parse-options: require PARSE_OPT_NOARG for OPTION_BITOP
2025-07-15Merge branch 'ly/load-bitmap-leakfix'Junio C Hamano1-0/+8
Leakfix with a new and a bit invasive test. * ly/load-bitmap-leakfix: pack-bitmap: add load corrupt bitmap test pack-bitmap: reword comments in test_bitmap_commits() pack-bitmap: fix memory leak if load_bitmap() failed
2025-07-15Merge branch 'ps/object-store'Junio C Hamano6-11/+11
Code clean-up around object access API. * ps/object-store: odb: rename `read_object_with_reference()` odb: rename `pretend_object_file()` odb: rename `has_object()` odb: rename `repo_read_object_file()` odb: rename `oid_object_info()` odb: trivial refactorings to get rid of `the_repository` odb: get rid of `the_repository` when handling submodule sources odb: get rid of `the_repository` when handling the primary source odb: get rid of `the_repository` in `for_each()` functions odb: get rid of `the_repository` when handling alternates odb: get rid of `the_repository` in `odb_mkstemp()` odb: get rid of `the_repository` in `assert_oid_type()` odb: get rid of `the_repository` in `find_odb()` odb: introduce parent pointers object-store: rename files to "odb.{c,h}" object-store: rename `object_directory` to `odb_source` object-store: rename `raw_object_store` to `object_database`
2025-07-14Merge branch 'sj/string-list'Junio C Hamano1-96/+0
Code and test clean-up around string-list API. * sj/string-list: u-string-list: move "remove duplicates" test to "u-string-list.c" u-string-list: move "filter string" test to "u-string-list.c" u-string-list: move "test_split_in_place" to "u-string-list.c" u-string-list: move "test_split" into "u-string-list.c" string-list: enable sign compare warnings check string-list: return index directly when inserting an existing element string-list: remove unused "insert_at" parameter from add_entry string-list: fix sign compare warnings for loop iterator
2025-07-14bloom: rename function operates on bloom_keyLidong Yan1-2/+2
git code style requires that functions operating on a struct S should be named in the form S_verb. However, the functions operating on struct bloom_key do not follow this convention. Therefore, fill_bloom_key() and clear_bloom_key() are renamed to bloom_key_fill() and bloom_key_clear(), respectively. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-14bloom: add test helper to return murmur3 hashLidong Yan1-2/+2
In bloom.h, murmur3_seeded_v2() is exported for the use of test murmur3 hash. To clarify that murmur3_seeded_v2() is exported solely for testing purposes, a new helper function test_murmur3_seeded() was added instead of exporting murmur3_seeded_v2() directly. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-09parse-options: add precision handling for OPTION_COUNTUPRené Scharfe1-0/+3
Similar to 09705696f7 (parse-options: introduce precision handling for `OPTION_INTEGER`, 2025-04-17) support value variables of different sizes for OPTION_COUNTUP. Do that by requiring their "precision" to be set, casting their "value" pointer accordingly and checking whether the value fits. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-09parse-options: add precision handling for OPTION_SET_INTRené Scharfe1-0/+1
Similar to 09705696f7 (parse-options: introduce precision handling for `OPTION_INTEGER`, 2025-04-17) support value variables of different sizes for OPTION_SET_INT. Do that by requiring their "precision" to be set, casting their "value" pointer accordingly and checking whether the value fits. Factor out the casting code from the part of do_get_value() that handles OPTION_INTEGER to avoid code duplication. We're going to use it in the next patches as well. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-09parse-options: add precision handling for PARSE_OPT_CMDMODERené Scharfe1-3/+10
Build on 09705696f7 (parse-options: introduce precision handling for `OPTION_INTEGER`, 2025-04-17) to support value variables of different sizes for PARSE_OPT_CMDMODE options. Do that by requiring their "precision" to be set and casting their "value" pointer accordingly. Call the function that does the raw casting do_get_int_value() to reserve the name get_int_value() for a more friendly wrapper we're going to introduce in one of the next patches. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-07u-string-list: move "remove duplicates" test to "u-string-list.c"shejialuo1-39/+0
We use "test-tool string-list remove_duplicates" to test the "string_list_remove_duplicates" function. As we have introduced the unit test, we'd better remove the logic from shell script to C program to improve test speed and readability. As all the tests in shell script are removed, let's just delete the "t0063-string-list.sh" and update the "meson.build" file to align with this change. Also we could simply remove "DISABLE_SIGN_COMPARE_WARNINGS" due to we have already deleted related code. Unfortunately, we cannot totally remove "test-string-list.c" due to that we would test the performance of sorting about string list by executing "test-tool string-list sort" in "p0071-sort.sh". Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-07u-string-list: move "filter string" test to "u-string-list.c"shejialuo1-21/+0
We use "test-tool string-list filter" to test the "filter_string_list" function. As we have introduced the unit test, we'd better remove the logic from shell script to C program to improve test speed and readability. Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-07u-string-list: move "test_split_in_place" to "u-string-list.c"shejialuo1-22/+0
We use "test-tool string-list split_in_place" to test the "string_list_split_in_place" function. As we have introduced the unit test, we'd better remove the logic from shell script to C program to improve test speed and readability. Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-07u-string-list: move "test_split" into "u-string-list.c"shejialuo1-14/+0
We rely on "test-tool string-list" command to test the functionality of the "string-list". However, as we have introduced clar test framework, we'd better move the shell script into C program to improve speed and readability. Create a new file "u-string-list.c" under "t/unit-tests", then update the Makefile and "meson.build" to build the file. And let's first move "test_split" into unit test and gradually convert the shell script into C program. In order to create `string_list` easily by simply specifying strings in the function call, create "t_vcreate_string_list_dup" function to do this. Then port the shell script tests to C program and remove unused "test-tool" code and tests. Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: rename `oid_object_info()`Patrick Steinhardt1-1/+1
Rename `oid_object_info()` to `odb_read_object_info()` as well as their `_extended()` variant to match other functions related to the object database and our modern coding guidelines. Introduce compatibility wrappers so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: get rid of `the_repository` when handling alternatesPatrick Steinhardt1-1/+1
The functions to manage alternates all depend on `the_repository`. Refactor them to accept an object database as a parameter and adjust all callers. The functions are renamed accordingly. Note that right now the situation is still somewhat weird because we end up using the object store path provided by the object store's repository anyway. Consequently, we could have instead passed in a pointer to the repository instead of passing in the pointer to the object store. This will be addressed in subsequent commits though, where we will start to use the path owned by the object store itself. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01object-store: rename files to "odb.{c,h}"Patrick Steinhardt6-6/+6
In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01object-store: rename `object_directory` to `odb_source`Patrick Steinhardt1-3/+3
The `object_directory` structure is used as an access point for a single object directory like ".git/objects". While the structure isn't yet fully self-contained, the intent is for it to eventually contain all information required to access objects in one specific location. While the name "object directory" is a good fit for now, this will change over time as we continue with the agenda to make pluggable object databases a thing. Eventually, objects may not be accessed via any kind of directory at all anymore, but they could instead be backed by any kind of durable storage mechanism. While it seems quite far-fetched for now, it is thinkable that eventually this might even be some form of a database, for example. As such, the current name of this structure will become worse over time as we evolve into the direction of pluggable ODBs. Immediate next steps will start to carve out proper self-contained object directories, which requires us to pass in these object directories as parameters. Based on our modern naming schema this means that those functions should then be named after their subsystem, which means that we would start to bake the current name into the codebase more and more. Let's preempt this by renaming the structure. There have been a couple alternatives that were discussed: - `odb_backend` was discarded because it led to the association that one object database has a single backend, but the model is that one alternate has one backend. Furthermore, "backend" is more about the actual backing implementation and less about the high-level concept. - `odb_alternate` was discarded because it is a bit of a stretch to also call the main object directory an "alternate". Instead, pick `odb_source` as the new name. It makes it sufficiently clear that there can be multiple sources and does not cause confusion when mixed with the already-existing "alternate" terminology. In the future, this change allows us to easily introduce for example a `odb_files_source` and other format-specific implementations. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01pack-bitmap: add load corrupt bitmap testLidong Yan1-0/+8
t5310 lacks a test to ensure git works correctly when commit bitmap data is corrupted. So this patch add test helper in pack-bitmap.c to list each commit bitmap position in bitmap file and `load corrupt bitmap` test case in t/t5310 to corrupt a commit bitmap before loading it. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-17Merge branch 'ds/path-walk-2'Junio C Hamano1-0/+2
"git pack-objects" learns to find delta bases from blobs at the same path, using the --path-walk API. * ds/path-walk-2: pack-objects: allow --shallow and --path-walk path-walk: add new 'edge_aggressive' option pack-objects: thread the path-based compression pack-objects: refactor path-walk delta phase scalar: enable path-walk during push via config pack-objects: enable --path-walk via config repack: add --path-walk option t5538: add tests to confirm deltas in shallow pushes pack-objects: introduce GIT_TEST_PACK_PATH_WALK p5313: add performance tests for --path-walk pack-objects: update usage to match docs pack-objects: add --path-walk option pack-objects: extract should_attempt_deltas()
2025-05-16path-walk: add new 'edge_aggressive' optionDerrick Stolee1-0/+2
In preparation for allowing both the --shallow and --path-walk options in the 'git pack-objects' builtin, create a new 'edge_aggressive' option in the path-walk API. This option will help walk the boundary more thoroughly and help avoid sending extra objects during fetches and pushes. The only use of the 'edge_hint_aggressive' option in the revision API is within mark_edges_uninteresting(), which is usually called before between prepare_revision_walk() and before visiting commits with get_revision(). In prepare_revision_walk(), the UNINTERESTING commits are walked until a boundary is found. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16t/helper: add zlib test-toolJeff King4-0/+65
It's occasionally useful when testing or debugging to be able to do raw zlib inflate/deflate operations (e.g., to check the bytes of a specific loose or packed object). Even though zlib's deflate algorithm is used by many other programs, this is surprisingly hard to do in a portable way. E.g., gzip can do this if you manually munge some header bytes. But the result is somewhat arcane, and we don't assume gzip is available anyway. Likewise, pigz will handle raw zlib, but we can't assume it is available. So let's introduce a short test helper for just doing zlib operations. We'll use it in subsequent patches to add some new tests, but it would also have come in handy a few times in the past: - The hard-coded pack data from 3b910d0c5e (add tests for indexing packs with delta cycles, 2013-08-23) could probably be generated on the fly. - Likewise we could avoid the hard-coded data from 0b1493c2d4 (git_inflate(): skip zlib_post_call() sanity check on Z_NEED_DICT, 2025-02-25). Though note this would require support for more zlib options. - It would have helped with the debugging documented in 41dfbb2dbe (howto: add article on recovering a corrupted object, 2013-10-25). I'll leave refactoring existing tests for another day, but I hope the examples above show the general utility. I aimed for simplicity in the code. In particular, it will read all input into a memory buffer, rather than streaming. That makes the zlib loops harder to get wrong (which has been a source of subtle bugs in the past). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-12Merge branch 'ds/fix-thin-fix'Junio C Hamano4-0/+151
"git index-pack --fix-thin" used to abort to prevent a cycle in delta chains from forming in a corner case even when there is no such cycle. * ds/fix-thin-fix: index-pack: allow revisiting REF_DELTA chains t5309: create failing test for 'git index-pack' test-tool: add pack-deltas helper
2025-04-29Merge branch 'ps/reftable-api-revamp'Junio C Hamano1-7/+74
Overhaul of the reftable API. * ps/reftable-api-revamp: reftable/table: move printing logic into test helper reftable/constants: make block types part of the public interface reftable/table: introduce iterator for table blocks reftable/table: add `reftable_table` to the public interface reftable/block: expose a generic iterator over reftable records reftable/block: make block iterators reseekable reftable/block: store block pointer in the block iterator reftable/block: create public interface for reading blocks git-zlib: use `struct z_stream_s` instead of typedef reftable/block: rename `block_reader` to `reftable_block` reftable/block: rename `block` to `block_data` reftable/table: move reading block into block reader reftable/block: simplify how we track restart points reftable/blocksource: consolidate code into a single file reftable/reader: rename data structure to "table" reftable: fix formatting of the license header
2025-04-29Merge branch 'az/tighten-string-array-constness'Junio C Hamano9-13/+13
Code clean-up. * az/tighten-string-array-constness: global: mark usage strings and string tables const
2025-04-28test-tool: add pack-deltas helperDerrick Stolee4-0/+151
When trying to demonstrate certain behavior in tests, it can be helpful to create packfiles that have specific delta structures. 'git pack-objects' uses various algorithms to select deltas based on their compression rates, but that does not always demonstrate all possible packfile shapes. This becomes especially important when wanting to test 'git index-pack' and its ability to parse certain pack shapes. We have prior art in t/lib-pack.sh, where certain delta structures are produced by manually writing certain opaque pack contents. However, producing these script updates is cumbersome and difficult to do as a contributor. Instead, create a new test-tool, 'test-tool pack-deltas', that reads a list of instructions for which objects to include in a packfile and how those objects should be written in delta form. At the moment, this only supports REF_DELTAs as those are the kinds of deltas needed to exercise a bug in 'git index-pack'. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-24Merge branch 'ps/parse-options-integers'Junio C Hamano1-11/+39
Update parse-options API to catch mistakes to pass address of an integral variable of a wrong type/size. * ps/parse-options-integers: parse-options: detect mismatches in integer signedness parse-options: introduce precision handling for `OPTION_UNSIGNED` parse-options: introduce precision handling for `OPTION_INTEGER` parse-options: rename `OPT_MAGNITUDE()` to `OPT_UNSIGNED()` parse-options: support unit factors in `OPT_INTEGER()` global: use designated initializers for options parse: fix off-by-one for minimum signed values
2025-04-24Merge branch 'ps/object-file-cleanup'Junio C Hamano5-5/+5
Code clean-up. * ps/object-file-cleanup: object-store: merge "object-store-ll.h" and "object-store.h" object-store: remove global array of cached objects object: split out functions relating to object store subsystem object-file: drop `index_blob_stream()` object-file: split up concerns of `HASH_*` flags object-file: split out functions relating to object store subsystem object-file: move `xmmap()` into "wrapper.c" object-file: move `git_open_cloexec()` to "compat/open.c" object-file: move `safe_create_leading_directories()` into "path.c" object-file: move `mkdir_in_gitdir()` into "path.c"
2025-04-21global: mark usage strings and string tables constAhelenia Ziemiańska9-13/+13
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-17parse-options: introduce precision handling for `OPTION_UNSIGNED`Patrick Steinhardt1-0/+3
This commit is the equivalent to the preceding commit, but instead of introducing precision handling for `OPTION_INTEGER` we introduce it for `OPTION_UNSIGNED`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-17parse-options: introduce precision handling for `OPTION_INTEGER`Patrick Steinhardt1-0/+3
The `OPTION_INTEGER` option type accepts a signed integer. The type of the underlying integer is a simple `int`, which restricts the range of values accepted by such options. But there is a catch: because the caller provides a pointer to the value via the `.value` field, which is a simple void pointer. This has two consequences: - There is no check whether the passed value is sufficiently long to store the entire range of `int`. This can lead to integer wraparound in the best case and out-of-bounds writes in the worst case. - Even when a caller knows that they want to store a value larger than `INT_MAX` they don't have a way to do so. In practice this doesn't tend to be a huge issue because users typically don't end up passing huge values to most commands. But the parsing logic is demonstrably broken, and it is too easy to get the calling convention wrong. Improve the situation by introducing a new `precision` field into the structure. This field gets assigned automatically by `OPT_INTEGER_F()` and tracks the size of the passed value. Like this it becomes possible for the caller to pass arbitrarily-sized integers and the underlying logic knows to handle it correctly by doing range checks. Furthermore, convert the code to use `strtoimax()` intstead of `strtol()` so that we can also parse values larger than `LONG_MAX`. Note that we do not yet assert signedness of the passed variable, which is another source of bugs. This will be handled in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-17parse-options: rename `OPT_MAGNITUDE()` to `OPT_UNSIGNED()`Patrick Steinhardt1-3/+3
With the preceding commit, `OPT_INTEGER()` has learned to support unit factors. Consequently, the major differencen between `OPT_INTEGER()` and `OPT_MAGNITUDE()` isn't the support of unit factors anymore, as both of them do support them now. Instead, the difference is that one handles signed and the other handles unsigned integers. Adapt the name of `OPT_MAGNITUDE()` accordingly by renaming it to `OPT_UNSIGNED()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-17global: use designated initializers for optionsPatrick Steinhardt1-8/+30
While we expose macros for most of our different option types understood by the "parse-options" subsystem, not every combination of fields that has one as that would otherwise quickly lead to an explosion of macros. Instead, we just initialize structures manually for those variants of fields that don't have a macro. Callsites that open-code these structure initialization don't use designated initializers though and instead just provide values for each of the fields that they want to initialize. This has three significant downsides: - Callsites need to specify all values up to the last field that they care about. This often includes fields that should simply be left at their default zero-initialized state, which adds distraction. - Any reader not deeply familiar with the layout of the structure has a hard time figuring out what the respective initializers mean. - Reordering or introducing new fields in the middle of the structure is impossible without adapting all callsites. Convert all sites to instead use designated initializers, which we have started using in our codebase quite a while ago. This allows us to skip any default-initialized fields, gives the reader context by specifying the field names and allows us to reorder or introduce new fields where we want to. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-16Merge branch 'ps/test-wo-perl-prereq'Junio C Hamano2-2/+15
"make test" used to have a hard dependency on (basic) Perl; tests have been rewritten help environment with NO_PERL test the build as much as possible. * ps/test-wo-perl-prereq: t5703: refactor test to not depend on Perl t5316: refactor `max_chain()` to not depend on Perl t0210: refactor trace2 scrubbing to not use Perl t0021: refactor `generate_random_characters()` to not depend on Perl t/lib-httpd: refactor "one-time-perl" CGI script to not depend on Perl t/lib-t6000: refactor `name_from_description()` to not depend on Perl t/lib-gpg: refactor `sanitize_pgp()` to not depend on Perl t: refactor tests depending on Perl for textconv scripts t: refactor tests depending on Perl to print data t: refactor tests depending on Perl substitution operator t: refactor tests depending on Perl transliteration operator Makefile: stop requiring Perl when running tests meson: stop requiring Perl when tests are enabled t: adapt existing PERL prerequisites t: introduce PERL_TEST_HELPERS prerequisite t: adapt `test_readlink()` to not use Perl t: adapt `test_copy_bytes()` to not use Perl t: adapt character translation helpers to not use Perl t: refactor environment sanitization to not use Perl t: skip chain lint when PERL_PATH is unset
2025-04-15Merge branch 'ps/object-wo-the-repository'Junio C Hamano2-2/+2
The object layer has been updated to take an explicit repository instance as a parameter in more code paths. * ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`
2025-04-15object-store: merge "object-store-ll.h" and "object-store.h"Patrick Steinhardt5-5/+5
The "object-store-ll.h" header has been introduced to keep transitive header dependendcies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08Merge branch 'ps/object-wo-the-repository' into ps/object-file-cleanupJunio C Hamano2-2/+2
* ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`
2025-04-07reftable/table: move printing logic into test helperPatrick Steinhardt1-1/+68
The logic to print individual blocks in a table is hosted in the reftable library. This is only the case due to historical reasons though because users of the library had no interfaces to read blocks one by one. Otherwise, printing individual blocks has no place in the reftable library given that the format will not be generic in the first place. We have now grown a public interface to iterate through blocks contained in a table, and thus we can finally move the logic to print them into the test helper. Move over the logic and refactor it accordingly. Note that the iterator also trivially allows us to access index sections, which we previously didn't print at all. This omission wasn't intentional though, so start dumping those sections as well so that we can assert that indices are written as expected. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07reftable/reader: rename data structure to "table"Patrick Steinhardt1-7/+7
The `struct reftable_reader` subsystem encapsulates a table that has been read from the disk. As such, the current name of that structure is somewhat hard to understand as it only talks about the fact that we read something from disk, without really giving an indicator _what_ that is. Furthermore, this naming schema doesn't really fit well into how the other structures are named: `reftable_merged_table`, `reftable_stack`, `reftable_block` and `reftable_record` are all named after what they encapsulate. Rename the subsystem to `reftable_table`, which directly gives a hint that the data structure is about handling the individual tables part of the stack. While this change results in a lot of churn, it prepares for us exposing the APIs to third-party callers now that the reftable library is a standalone library that can be linked against by other projects. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07t: refactor tests depending on Perl transliteration operatorPatrick Steinhardt1-2/+2
We have a bunch of tests that use Perl to perform character transliteration via the "y/" or "tr/" operator. These usecases can be trivially replaced with tr(1). Refactor the tests accordingly so that we can drop a couple of PERL_TEST_HELPERS prerequisites. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07t: adapt `test_readlink()` to not use PerlPatrick Steinhardt1-0/+13
The `test_readlink()` helper function reads a symbolic link and returns the path it is pointing to. It is thus equivalent to the readlink(1) utility, which isn't available on all supported platforms. As such, it is implemented using Perl so that we can use it even on platforms where the shell utility isn't available. While using readlink(1) is not an option, what we can do is to implement the logic ourselves in our test-tool. Do so, which allows a bunch of tests to pass when Perl is not available. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07Merge branch 'js/mingw-admins-are-special'Junio C Hamano1-0/+19
"Dubious ownership" checks on Windows has been tightened up. * js/mingw-admins-are-special: test-tool path-utils: support debugging "dubious ownership" issues mingw: special-case administrators even more
2025-03-29Merge branch 'ps/refname-avail-check-optim'Junio C Hamano1-0/+1
The code paths to check whether a refname X is available (by seeing if another ref X/Y exists, etc.) have been optimized. * ps/refname-avail-check-optim: refs: reuse iterators when determining refname availability refs/iterator: implement seeking for files iterators refs/iterator: implement seeking for packed-ref iterators refs/iterator: implement seeking for ref-cache iterators refs/iterator: implement seeking for reftable iterators refs/iterator: implement seeking for merged iterators refs/iterator: provide infrastructure to re-seek iterators refs/iterator: separate lifecycle from iteration refs: stop re-verifying common prefixes for availability refs/files: batch refname availability checks for initial transactions refs/files: batch refname availability checks for normal transactions refs/reftable: batch refname availability checks refs: introduce function to batch refname availability checks builtin/update-ref: skip ambiguity checks when parsing object IDs object-name: allow skipping ambiguity checks in `get_oid()` family object-name: introduce `repo_get_oid_with_flags()`
2025-03-25test-tool path-utils: support debugging "dubious ownership" issuesJohannes Schindelin1-0/+19
This adds a new sub-sub-command for `test-tool`, simply passing through the command-line arguments to the `is_path_owned_by_current_user()` function. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-12refs/iterator: separate lifecycle from iterationPatrick Steinhardt1-0/+1
The ref and reflog iterators have their lifecycle attached to iteration: once the iterator reaches its end, it is automatically released and the caller doesn't have to care about that anymore. When the iterator should be released before it has been exhausted, callers must explicitly abort the iterator via `ref_iterator_abort()`. This lifecycle is somewhat unusual in the Git codebase and creates two problems: - Callsites need to be very careful about when exactly they call `ref_iterator_abort()`, as calling the function is only valid when the iterator itself still is. This leads to somewhat awkward calling patterns in some situations. - It is impossible to reuse iterators and re-seek them to a different prefix. This feature isn't supported by any iterator implementation except for the reftable iterators anyway, but if it was implemented it would allow us to optimize cases where we need to search for specific references repeatedly by reusing internal state. Detangle the lifecycle from iteration so that we don't deallocate the iterator anymore once it is exhausted. Instead, callers are now expected to always call a newly introduce `ref_iterator_free()` function that deallocates the iterator and its internal state. Note that the `dir_iterator` is somewhat special because it does not implement the `ref_iterator` interface, but is only used to implement other iterators. Consequently, we have to provide `dir_iterator_free()` instead of `dir_iterator_release()` as the allocated structure itself is managed by the `dir_iterator` interfaces, as well, and not freed by `ref_iterator_free()` like in all the other cases. While at it, drop the return value of `ref_iterator_abort()`, which wasn't really required by any of the iterator implementations anyway. Furthermore, stop calling `base_ref_iterator_free()` in any of the backends, but instead call it in `ref_iterator_free()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10hash: stop depending on `the_repository` in `null_oid()`Patrick Steinhardt2-2/+2
The `null_oid()` function returns the object ID that only consists of zeroes. Naturally, this ID also depends on the hash algorithm used, as the number of zeroes is different between SHA1 and SHA256. Consequently, the function returns the hash-algorithm-specific null object ID. This is currently done by depending on `the_hash_algo`, which implicitly makes us depend on `the_repository`. Refactor the function to instead pass in the hash algorithm for which we want to retrieve the null object ID. Adapt callsites accordingly by passing in `the_repository`, thus bubbling up the dependency on that global variable by one layer. There are a couple of trivial exceptions for subsystems that already got rid of `the_repository`. These subsystems instead use the repository that is available via the calling context: - "builtin/grep.c" - "grep.c" - "refs/debug.c" There are also two non-trivial exceptions: - "diff-no-index.c": Here we know that we may not have a repository initialized at all, so we cannot rely on `the_repository`. Instead, we adapt `diff_no_index()` to get a `struct git_hash_algo` as parameter. The only caller is located in "builtin/diff.c", where we know to call `repo_set_hash_algo()` in case we're running outside of a Git repository. Consequently, it is fine to continue passing `the_repository->hash_algo` even in this case. - "builtin/ls-files.c": There is an in-flight patch series that drops `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic conflict because we use `null_oid()` in `show_submodule()`. The value is passed to `repo_submodule_init()`, which may use the object ID to resolve a tree-ish in the superproject from which we want to read the submodule config. As such, the object ID should refer to an object in the superproject, and consequently we need to use its hash algorithm. This means that we could in theory just not bother about this edge case at all and just use `the_repository` in "diff-no-index.c". But doing so would feel misdesigned. Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in "hash.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-06Merge branch 'tz/doc-txt-to-adoc-fixes'Junio C Hamano1-1/+1
Fallouts from recent renaming of documentation files from .txt suffix to the new .adoc suffix have been corrected. * tz/doc-txt-to-adoc-fixes: (38 commits) xdiff: *.txt -> *.adoc fixes unpack-trees.c: *.txt -> *.adoc fixes transport.h: *.txt -> *.adoc fixes trace2/tr2_sysenv.c: *.txt -> *.adoc fixes trace2.h: *.txt -> *.adoc fixes t6434: *.txt -> *.adoc fixes t6012: *.txt -> *.adoc fixes t/helper/test-rot13-filter.c: *.txt -> *.adoc fixes simple-ipc.h: *.txt -> *.adoc fixes setup.c: *.txt -> *.adoc fixes refs.h: *.txt -> *.adoc fixes pseudo-merge.h: *.txt -> *.adoc fixes parse-options.h: *.txt -> *.adoc fixes object-name.c: *.txt -> *.adoc fixes list-objects-filter-options.h: *.txt -> *.adoc fixes fsck.h: *.txt -> *.adoc fixes diffcore.h: *.txt -> *.adoc fixes diff.h: *.txt -> *.adoc fixes contrib/long-running-filter: *.txt -> *.adoc fixes config.c: *.txt -> *.adoc fixes ...
2025-03-05Merge branch 'ps/path-sans-the-repository'Junio C Hamano1-4/+3
The path.[ch] API takes an explicit repository parameter passed throughout the callchain, instead of relying on the_repository singleton instance. * ps/path-sans-the-repository: path: adjust last remaining users of `the_repository` environment: move access to "core.sharedRepository" into repo settings environment: move access to "core.hooksPath" into repo settings repo-settings: introduce function to clear struct path: drop `git_path()` in favor of `repo_git_path()` rerere: let `rerere_path()` write paths into a caller-provided buffer path: drop `git_common_path()` in favor of `repo_common_path()` worktree: return allocated string from `get_worktree_git_dir()` path: drop `git_path_buf()` in favor of `repo_git_path_replace()` path: drop `git_pathdup()` in favor of `repo_git_path()` path: drop unused `strbuf_git_path()` function path: refactor `repo_submodule_path()` family of functions submodule: refactor `submodule_to_gitdir()` to accept a repo path: refactor `repo_worktree_path()` family of functions path: refactor `repo_git_path()` family of functions path: refactor `repo_common_path()` family of functions
2025-03-03t/helper/test-rot13-filter.c: *.txt -> *.adoc fixesTodd Zullinger1-1/+1
Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-03Merge branch 'ps/build-meson-fixes-0130'Junio C Hamano1-2/+2
Assorted fixes and improvements to the build procedure based on meson. * ps/build-meson-fixes-0130: gitlab-ci: restrict maximum number of link jobs on Windows meson: consistently use custom program paths to resolve programs meson: fix overwritten `git` variable meson: prevent finding sed(1) in a loop meson: improve handling of `sane_tool_path` option meson: improve PATH handling meson: drop separate version library meson: stop linking libcurl into all executables meson: introduce `libgit_curl` dependency meson: simplify use of the common-main library meson: inline the static 'git' library meson: fix OpenSSL fallback when not explicitly required meson: fix exec path with enabled runtime prefix
2025-02-26meson: simplify use of the common-main libraryPatrick Steinhardt1-2/+2
The "common-main.c" file is used by multiple executables. In order to make it easy to set it up we have created a separate library that these executables can link against. All of these executables also want to link against `libgit.a` though, which makes it necessary to specify both of these as dependencies for every executable. Simplify this a bit by declaring the library as a source dependency: instead of creating a static library, we now instead compile the common set of files into each executable separately. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18Merge branch 'ds/backfill'Junio C Hamano1-1/+21
Lazy-loading missing files in a blobless clone on demand is costly as it tends to be one-blob-at-a-time. "git backfill" is introduced to help bulk-download necessary files beforehand. * ds/backfill: backfill: assume --sparse when sparse-checkout is enabled backfill: add --sparse option backfill: add --min-batch-size=<n> option backfill: basic functionality and tests backfill: add builtin boilerplate
2025-02-12Merge branch 'ds/name-hash-tweaks'Junio C Hamano4-0/+26
"git pack-objects" and its wrapper "git repack" learned an option to use an alternative path-hash function to improve delta-base selection to produce a packfile with deeper history than window size. * ds/name-hash-tweaks: pack-objects: prevent name hash version change test-tool: add helper for name-hash values p5313: add size comparison test pack-objects: add GIT_TEST_NAME_HASH_VERSION repack: add --name-hash-version option pack-objects: add --name-hash-version option pack-objects: create new name-hash function version
2025-02-07path: refactor `repo_submodule_path()` family of functionsPatrick Steinhardt1-4/+3
As explained in an earlier commit, we're refactoring path-related functions to provide a consistent interface for computing paths into the commondir, gitdir and worktree. Refactor the "submodule" family of functions accordingly. Note that in contrast to the other `repo_*_path()` families, we have to pass in the repository as a non-constant pointer. This is because we end up calling `repo_read_gitmodules()` deep down in the callstack, which may end up modifying the repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03backfill: add --sparse optionDerrick Stolee1-1/+21
One way to significantly reduce the cost of a Git clone and later fetches is to use a blobless partial clone and combine that with a sparse-checkout that reduces the paths that need to be populated in the working directory. Not only does this reduce the cost of clones and fetches, the sparse-checkout reduces the number of objects needed to download from a promisor remote. However, history investigations can be expensive as computing blob diffs will trigger promisor remote requests for one object at a time. This can be avoided by downloading the blobs needed for the given sparse-checkout using 'git backfill' and its new '--sparse' mode, at a time that the user is willing to pay that extra cost. Note that this is distinctly different from the '--filter=sparse:<oid>' option, as this assumes that the partial clone has all reachable trees and we are using client-side logic to avoid downloading blobs outside of the sparse-checkout cone. This avoids the server-side cost of walking trees while also achieving a similar goal. It also downloads in batches based on similar path names, presenting a resumable download if things are interrupted. This augments the path-walk API to have a possibly-NULL 'pl' member that may point to a 'struct pattern_list'. This could be more general than the sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently the only consumer. Be sure to test this in both cone mode and not cone mode. Cone mode has the benefit that the path-walk can skip certain paths once they would expand beyond the sparse-checkout. Non-cone mode can describe the included files using both positive and negative patterns, which changes the possible return values of path_matches_pattern_list(). Test both kinds of matches for increased coverage. To test this, we can create a blobless sparse clone, expand the sparse-checkout slightly, and then run 'git backfill --sparse' to see how much data is downloaded. The general steps are 1. git clone --filter=blob:none --sparse <url> 2. git sparse-checkout set <dir1> ... <dirN> 3. git backfill --sparse For the Git repository with the 'builtin' directory in the sparse-checkout, we get these results for various batch sizes: | Batch Size | Pack Count | Pack Size | Time | |-----------------|------------|-----------|-------| | (Initial clone) | 3 | 110 MB | | | 10K | 12 | 192 MB | 17.2s | | 15K | 9 | 192 MB | 15.5s | | 20K | 8 | 192 MB | 15.5s | | 25K | 7 | 192 MB | 14.7s | This case matters less because a full clone of the Git repository from GitHub is currently at 277 MB. Using a copy of the Linux repository with the 'kernel/' directory in the sparse-checkout, we get these results: | Batch Size | Pack Count | Pack Size | Time | |-----------------|------------|-----------|------| | (Initial clone) | 2 | 1,876 MB | | | 10K | 11 | 2,187 MB | 46s | | 25K | 7 | 2,188 MB | 43s | | 50K | 5 | 2,194 MB | 44s | | 100K | 4 | 2,194 MB | 48s | This case is more meaningful because a full clone of the Linux repository is currently over 6 GB, so this is a valuable way to download a fraction of the repository and no longer need network access for all reachable objects within the sparse-checkout. Choosing a batch size will depend on a lot of factors, including the user's network speed or reliability, the repository's file structure, and how many versions there are of the file within the sparse-checkout scope. There will not be a one-size-fits-all solution. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03Merge branch 'tb/unsafe-hash-cleanup'Junio C Hamano6-20/+35
The API around choosing to use unsafe variant of SHA-1 implementation has been updated in an attempt to make it harder to abuse. * tb/unsafe-hash-cleanup: hash.h: drop unsafe_ function variants csum-file: introduce hashfile_checkpoint_init() t/helper/test-hash.c: use unsafe_hash_algo() csum-file.c: use unsafe_hash_algo() hash.h: introduce `unsafe_hash_algo()` csum-file.c: extract algop from hashfile_checksum_valid() csum-file: store the hash algorithm as a struct field t/helper/test-tool: implement sha1-unsafe helper
2025-01-31global: adapt callers to use generic hash context helpersPatrick Steinhardt2-4/+4
Adapt callers to use generic hash context helpers instead of using the hash algorithm to update them. This makes the callsites easier to reason about and removes the possibility that the wrong hash algorithm is used to update the hash context's state. And as a nice side effect this also gets rid of a bunch of users of `the_hash_algo`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31hash: stop typedeffing the hash contextPatrick Steinhardt2-3/+3
We generally avoid using `typedef` in the Git codebase. One exception though is the `git_hash_ctx`, likely because it used to be a union rather than a struct until the preceding commit refactored it. But now that it is a normal `struct` there isn't really a need for a typedef anymore. Drop the typedef and adapt all callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31Merge branch 'tb/unsafe-hash-cleanup' into ps/hash-cleanupJunio C Hamano6-20/+35
* tb/unsafe-hash-cleanup: hash.h: drop unsafe_ function variants csum-file: introduce hashfile_checkpoint_init() t/helper/test-hash.c: use unsafe_hash_algo() csum-file.c: use unsafe_hash_algo() hash.h: introduce `unsafe_hash_algo()` csum-file.c: extract algop from hashfile_checksum_valid() csum-file: store the hash algorithm as a struct field t/helper/test-tool: implement sha1-unsafe helper
2025-01-29Merge branch 'ds/path-walk-1'Junio C Hamano4-0/+115
Introduce a new API to visit objects in batches based on a common path, or by type. * ds/path-walk-1: path-walk: drop redundant parse_tree() call path-walk: reorder object visits path-walk: mark trees and blobs as UNINTERESTING path-walk: visit tags and cached objects path-walk: allow consumer to specify object types t6601: add helper for testing path-walk API test-lib-functions: add test_cmp_sorted path-walk: introduce an object walk by path
2025-01-28Merge branch 'jc/show-usage-help'Junio C Hamano1-2/+2
The help text from "git $cmd -h" appear on the standard output for some $cmd and the standard error for others. The built-in commands have been fixed to show them on the standard output consistently. * jc/show-usage-help: builtin: send usage() help text to standard output oddballs: send usage() help text to standard output builtins: send usage_with_options() help text to standard output usage: add show_usage_if_asked() parse-options: add show_usage_with_options_if_asked() t0012: optionally check that "-h" output goes to stdout
2025-01-27test-tool: add helper for name-hash valuesDerrick Stolee3-0/+25
Add a new test-tool helper, name-hash, to output the value of the name-hash algorithms for the input list of strings, one per line. Since the name-hash values can be stored in the .bitmap files, it is important that these hash functions do not change across Git versions. Add a simple test to t5310-pack-bitmaps.sh to provide some testing of the current values. Due to how these functions are implemented, it would be difficult to change them without disturbing these values. The paths used for this test are carefully selected to demonstrate some of the behavior differences of the two current name hash versions, including which conditions will cause them to collide. Create a performance test that uses test_size to demonstrate how collisions occur for these hash algorithms. This test helps inform someone as to the behavior of the name-hash algorithms for their repo based on the paths at HEAD. My copy of the Git repository shows modest statistics around the collisions of the default name-hash algorithm: Test this tree -------------------------------------------------- 5314.1: paths at head 4.5K 5314.2: distinct hash value: v1 4.1K 5314.3: maximum multiplicity: v1 13 5314.4: distinct hash value: v2 4.2K 5314.5: maximum multiplicity: v2 9 Here, the maximum collision multiplicity is 13, but around 10% of paths have a collision with another path. In a more interesting example, the microsoft/fluentui [1] repo had these statistics at time of committing: Test this tree -------------------------------------------------- 5314.1: paths at head 19.5K 5314.2: distinct hash value: v1 8.2K 5314.3: maximum multiplicity: v1 279 5314.4: distinct hash value: v2 17.8K 5314.5: maximum multiplicity: v2 44 [1] https://github.com/microsoft/fluentui That demonstrates that of the nearly twenty thousand path names, they are assigned around eight thousand distinct values. 279 paths are assigned to a single value, leading the packing algorithm to sort objects from those paths together, by size. With the v2 name hash function, the maximum multiplicity lowers to 44, leaving some room for further improvement. In a more extreme example, an internal monorepo had a much worse collision rate: Test this tree -------------------------------------------------- 5314.1: paths at head 227.3K 5314.2: distinct hash value: v1 72.3K 5314.3: maximum multiplicity: v1 14.4K 5314.4: distinct hash value: v2 166.5K 5314.5: maximum multiplicity: v2 138 Here, we can see that the v2 name hash function provides somem improvements, but there are still a number of collisions that could lead to repacking problems at this scale. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23t/helper/test-hash.c: use unsafe_hash_algo()Taylor Blau1-12/+5
Remove a series of conditionals within the shared cmd_hash_impl() helper that powers the 'sha1' and 'sha1-unsafe' helpers. Instead, replace them with a single conditional that transforms the specified hash algorithm into its unsafe variant. Then all subsequent calls can directly use whatever function it wants to call without having to decide between the safe and unsafe variants. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23t/helper/test-tool: implement sha1-unsafe helperTaylor Blau6-23/+45
With the new "unsafe" SHA-1 build knob, it is convenient to have a test-tool that can exercise Git's unsafe SHA-1 wrappers for testing, similar to 't/helper/test-tool sha1'. Implement that helper by altering the implementation of that test-tool (in cmd_hash_impl(), which is generic and parameterized over different hash functions) to conditionally run the unsafe variants of the chosen hash function, and expose the new behavior via a new 'sha1-unsafe' test helper. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21Merge branch 'ps/the-repository'Junio C Hamano2-3/+10
More code paths have a repository passed through the callchain, instead of assuming the primary the_repository object. * ps/the-repository: match-trees: stop using `the_repository` graph: stop using `the_repository` add-interactive: stop using `the_repository` tmp-objdir: stop using `the_repository` resolve-undo: stop using `the_repository` credential: stop using `the_repository` mailinfo: stop using `the_repository` diagnose: stop using `the_repository` server-info: stop using `the_repository` send-pack: stop using `the_repository` serve: stop using `the_repository` trace: stop using `the_repository` pager: stop using `the_repository` progress: stop using `the_repository`
2025-01-21Merge branch 'ps/reftable-get-random-fix'Junio C Hamano1-1/+1
The code to compute "unique" name used git_rand() which can fail or get stuck; the callsite does not require cryptographic security. Introduce the "insecure" mode and use it appropriately. * ps/reftable-get-random-fix: reftable/stack: accept insecure random bytes wrapper: allow generating insecure random bytes
2025-01-17builtins: send usage_with_options() help text to standard outputJunio C Hamano1-2/+2
Using the show_usage_with_options_if_asked() helper we introduced earlier, fix callers of usage_with_options() that want to show the help text when explicitly asked by the end-user. The help text now goes to the standard output stream for them. The test in t7600 for "git merge -h" may want to be retired, as the same is covered by t0012 already, but it is specifically testing that the "-h" option gets a response even with a corrupt index file, so for now let's leave it there. Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-07wrapper: allow generating insecure random bytesPatrick Steinhardt1-1/+1
The `csprng_bytes()` function generates randomness and writes it into a caller-provided buffer. It abstracts over a couple of implementations, where the exact one that is used depends on the platform. These implementations have different guarantees: while some guarantee to never fail (arc4random(3)), others may fail. There are two significant failures to distinguish from one another: - Systemic failure, where e.g. opening "/dev/urandom" fails or when OpenSSL doesn't have a provider configured. - Entropy failure, where the entropy pool is exhausted, and thus the function cannot guarantee strong cryptographic randomness. While we cannot do anything about the former, the latter failure can be acceptable in some situations where we don't care whether or not the randomness can be predicted. Introduce a new `CSPRNG_BYTES_INSECURE` flag that allows callers to opt into weak cryptographic randomness. The exact behaviour of the flag depends on the underlying implementation: - `arc4random_buf()` never returns an error, so it doesn't change. - `getrandom()` pulls from "/dev/urandom" by default, which never blocks on modern systems even when the entropy pool is empty. - `getentropy()` seems to block when there is not enough randomness available, and there is no way of changing that behaviour. - `GtlGenRandom()` doesn't mention anything about its specific failure mode. - The fallback reads from "/dev/urandom", which also returns bytes in case the entropy pool is drained in modern Linux systems. That only leaves OpenSSL with `RAND_bytes()`, which returns an error in case the returned data wouldn't be cryptographically safe. This function is replaced with a call to `RAND_pseudo_bytes()`, which can indicate whether or not the returned data is cryptographically secure via its return value. If it is insecure, and if the `CSPRNG_BYTES_INSECURE` flag is set, then we ignore the insecurity and return the data regardless. It is somewhat questionable whether we really need the flag in the first place, or whether we wouldn't just ignore the potentially-insecure data. But the risk of doing that is that we might have or grow callsites that aren't aware of the potential insecureness of the data in places where it really matters. So using a flag to opt-in to that behaviour feels like the more secure choice. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-27commit-reach: use `size_t` to track indices when computing merge basesPatrick Steinhardt1-3/+3
The functions `repo_get_merge_bases_many()` and friends accepts an array of commits as well as a parameter that indicates how large that array is. This parameter is using a signed integer, which leads to a couple of warnings with -Wsign-compare. Refactor the code to use `size_t` to track indices instead and adapt callers accordingly. While most callers are trivial, there are two callers that require a bit more scrutiny: - builtin/merge-base.c:show_merge_base() subtracts `1` from the `rev_nr` before calling `repo_get_merge_bases_many_dirty()`, so if the variable was `0` it would wrap. This code is fine though because its only caller will execute that code only when `argc >= 2`, and it follows that `rev_nr >= 2`, as well. - bisect.ccheck_merge_bases() similarly subtracts `1` from `rev_nr`. Again, there is only a single caller that populates `rev_nr` with `good_revs.nr`. And because a bisection always requires at least one good revision it follws that `rev_nr >= 1`. Mark the file as -Wsign-compare-clean. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-23Merge branch 'ps/build-sign-compare'Junio C Hamano20-29/+30
Start working to make the codebase buildable with -Wsign-compare. * ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-20path-walk: mark trees and blobs as UNINTERESTINGDerrick Stolee1-3/+9
When the input rev_info has UNINTERESTING starting points, we want to be sure that the UNINTERESTING flag is passed appropriately through the objects. To match how this is done in places such as 'git pack-objects', we use the mark_edges_uninteresting() method. This method has an option for using the "sparse" walk, which is similar in spirit to the path-walk API's walk. To be sure to keep it independent, add a new 'prune_all_uninteresting' option to the path_walk_info struct. To check how the UNINTERSTING flag is spread through our objects, extend the 'test-tool path-walk' command to output whether or not an object has that flag. This changes our tests significantly, including the removal of some objects that were previously visited due to the incomplete implementation. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20path-walk: visit tags and cached objectsDerrick Stolee1-2/+13
The rev_info that is specified for a path-walk traversal may specify visiting tag refs (both lightweight and annotated) and also may specify indexed objects (blobs and trees). Update the path-walk API to walk these objects as well. When walking tags, we need to peel the annotated objects until reaching a non-tag object. If we reach a commit, then we can add it to the pending objects to make sure we visit in the commit walk portion. If we reach a tree, then we will assume that it is a root tree. If we reach a blob, then we have no good path name and so add it to a new list of "tagged blobs". When the rev_info includes the "--indexed-objects" flag, then the pending set includes blobs and trees found in the cache entries and cache-tree. The cache entries are usually blobs, though they could be trees in the case of a sparse index. The cache-tree stores previously-hashed tree objects but these are cleared out when staging objects below those paths. We add tests that demonstrate this. The indexed objects come with a non-NULL 'path' value in the pending item. This allows us to prepopulate the 'path_to_lists' strmap with lists for these paths. The tricky thing about this walk is that we will want to combine the indexed objects walk with the commit walk, especially in the future case of walking objects during a command like 'git repack'. Whenever possible, we want the objects from the index to be grouped with similar objects in history. We don't want to miss any paths that appear only in the index and not in the commit history. Thus, we need to be careful to let the path stack be populated initially with only the root tree path (and possibly tags and tagged blobs) and go through the normal depth-first search. Afterwards, if there are other paths that are remaining in the paths_to_lists strmap, we should then iterate through the stack and visit those objects recursively. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20path-walk: allow consumer to specify object typesDerrick Stolee1-2/+13
We add the ability to filter the object types in the path-walk API so the callback function is called fewer times. This adds the ability to ask for the commits in a list, as well. We re-use the empty string for this set of objects because these are passed directly to the callback function instead of being part of the 'path_stack'. Future changes will add the ability to visit annotated tags. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20t6601: add helper for testing path-walk APIDerrick Stolee3-0/+86
Add some tests based on the current behavior, doing interesting checks for different sets of branches, ranges, and the --boundary option. This sets a baseline for the behavior and we can extend it as new options are introduced. Store and output a 'batch_nr' value so we can demonstrate that the paths are grouped together in a batch and not following some other ordering. This allows us to test the depth-first behavior of the path-walk API. However, we purposefully do not test the order of the objects in the batch, so the output is compared to the expected output through a sort. It is important to mention that the behavior of the API will change soon as we start to handle UNINTERESTING objects differently, but these tests will demonstrate the change in behavior. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18serve: stop using `the_repository`Patrick Steinhardt1-2/+5
Stop using `the_repository` in the "serve" subsystem by passing in a repository when advertising capabilities or serving requests. Adjust callers accordingly by using `the_repository`. While there may be some callers that have a repository available in their context, this trivial conversion allows for easier verification and bubbles up the use of `the_repository` by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18progress: stop using `the_repository`Patrick Steinhardt1-1/+5
Stop using `the_repository` in the "progress" subsystem by passing in a repository when initializing `struct progress`. Furthermore, store a pointer to the repository in that struct so that we can pass it to the trace2 API when logging information. Adjust callers accordingly by using `the_repository`. While there may be some callers that have a repository available in their context, this trivial conversion allows for easier verification and bubbles up the use of `the_repository` by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18Merge branch 'ps/build-sign-compare' into ps/the-repositoryJunio C Hamano20-29/+30
* ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-15Merge branch 'ps/build'Junio C Hamano1-0/+91
Build procedure update plus introduction of Meson based builds. * ps/build: (24 commits) Introduce support for the Meson build system Documentation: add comparison of build systems t: allow overriding build dir t: better support for out-of-tree builds Documentation: extract script to generate a list of mergetools Documentation: teach "cmd-list.perl" about out-of-tree builds Documentation: allow sourcing generated includes from separate dir Makefile: simplify building of templates Makefile: write absolute program path into bin-wrappers Makefile: allow "bin-wrappers/" directory to exist Makefile: refactor generators to be PWD-independent Makefile: extract script to generate gitweb.js Makefile: extract script to generate gitweb.cgi Makefile: extract script to massage Python scripts Makefile: extract script to massage Shell scripts Makefile: use "generate-perl.sh" to massage Perl library Makefile: extract script to massage Perl scripts Makefile: consistently use PERL_PATH Makefile: generate doc versions via GIT-VERSION-GEN Makefile: generate "git.rc" via GIT-VERSION-GEN ...
2024-12-13Merge branch 'kn/midx-wo-the-repository'Junio C Hamano1-4/+4
Yet another "pass the repository through the callchain" topic. * kn/midx-wo-the-repository: midx: inline the `MIDX_MIN_SIZE` definition midx: pass down `hash_algo` to functions using global variables midx: pass `repository` to `load_multi_pack_index` midx: cleanup internal usage of `the_repository` and `the_hash_algo` midx-write: pass down repository to `write_midx_file[_only]` write-midx: add repository field to `write_midx_context` midx-write: use `revs->repo` inside `read_refs_snapshot` midx-write: pass down repository to static functions packfile.c: remove unnecessary prepare_packed_git() call midx: add repository to `multi_pack_index` struct config: make `packed_git_(limit|window_size)` non-global variables config: make `delta_base_cache_limit` a non-global variable packfile: pass down repository to `for_each_packed_object` packfile: pass down repository to `has_object[_kept]_pack` packfile: pass down repository to `odb_pack_name` packfile: pass `repository` to static function in the file packfile: use `repository` from `packed_git` directly packfile: add repository to struct `packed_git`
2024-12-10Merge branch 'ps/reftable-detach'Junio C Hamano1-2/+2
Isolates the reftable subsystem from the rest of Git's codebase by using fewer pieces of Git's infrastructure. * ps/reftable-detach: reftable/system: provide thin wrapper for lockfile subsystem reftable/stack: drop only use of `get_locked_file_path()` reftable/system: provide thin wrapper for tempfile subsystem reftable/stack: stop using `fsync_component()` directly reftable/system: stop depending on "hash.h" reftable: explicitly handle hash format IDs reftable/system: move "dir.h" to its only user
2024-12-07Introduce support for the Meson build systemPatrick Steinhardt1-0/+91
Introduce support for the Meson build system, a "modern" meta build system that supports many different platforms, including Linux, macOS, Windows and BSDs. Meson supports different backends, including Ninja, Xcode and Microsoft Visual Studio. Several common IDEs provide an integration with it. The biggest contender compared to Meson is probably CMake as outlined in our "Documentation/technical/build-systems.txt" file. Based on my own personal experience from working with both build systems extensively I strongly favor Meson over CMake. In my opinion, it feels significantly easier to use with a syntax that feels more like a "real" programming language. The second big reason is that Meson supports Rust natively, which may prove to be important given that the project may pick up Rust as another language eventually. Using Meson is rather straight-forward. An example: ``` # Meson uses out-of-tree builds. You can set up multiple build # directories, how you name them is completely up to you. $ mkdir build $ cd build $ meson setup .. -Dprefix=/tmp/git-installation # Build the project. This also provides several other targets like e.g. `install` or `test`. $ ninja # Meson has been wired up to support execution of our test suites. # Both our unit tests and our integration tests are supported. # Running `meson test` without any arguments will execute all tests, # but the syntax supports globbing to select only some tests. $ meson test 't-*' # Execute single test interactively to allow for debugging. $ meson test 't0000-*' --interactive --test-args=-ix ``` The build instructions have been successfully tested on the following systems, tests are passing: - Apple macOS 10.15. - FreeBSD 14.1. - NixOS 24.11. - OpenBSD 7.6. - Ubuntu 24.04. - Windows 10 with Cygwin. - Windows 10 with MinGW64, except for t9700, which is also broken with our Makefile. - Windows 10 with Visual Studio 2022 toolchain, using the Native Tools Command Prompt with `meson setup --vsenv`. Tests pass, except for t9700. - Windows 10 with Visual Studio 2022 solution, using the Native Tools Command Prompt with `meson setup --backend vs2022`. Tests pass, except for t9700. - Windows 10 with VS Code, using the Meson plug-in. It is expected that there will still be rough edges in the current version. If this patch lands the expectation is that it will coexist with our other build systems for a while. Like this, distributions can slowly migrate over to Meson and report any findings they have to us such that we can continue to iterate. A potential cutoff date for other build systems may be Git 3.0. Some notes: - The installed distribution is structured somewhat differently than how it used to be the case. All of our binaries are installed into `$libexec/git-core`, while all binaries part of `$bindir` are now symbolic links pointing to the former. This rule is consistent in itself and thus easier to reason about. - We do not install dashed binaries into `$libexec/git-core` anymore, so there won't e.g. be a symlink for git-add(1). These are not required by modern Git and there isn't really much of a use case for those anymore. By not installing those symlinks we thus start the deprecation of this layout. - We're targeting Meson 1.3.0, which has been released relatively recently November 2023. The only feature we use from that version is `fs.relative_to()`, which we could replace if necessary. If so, we could start to target Meson 1.0.0 and newer, released in December 2022. - The whole build instructions count around 3300 lines, half of which is listing all of our code and test files. Our Makefiles are around 5000 lines, autoconf adds another 1300 lines. CMake in comparison has only 1200 linescode, but it avoids listing individual files and does not wire up auto-configuration as extensively as the Meson instructions do. - We bundle a set of subproject wrappers for curl, expat, openssl, pcre2 and zlib. This allows developers to build Git without these dependencies preinstalled, and Meson will fetch and build them automatically. This is especially helpful on Windows. Helped-by: Eli Schwartz <eschwartz@gentoo.org> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06t/helper: don't depend on implicit wraparoundPatrick Steinhardt2-7/+2
In our test helpers we have two cases where we assign -1 to an `unsigned long`. The intent is to essentially mean "unbounded output", which is achieved via implicit wraparound of the value. This pattern causes warnings with -Wsign-compare though. Adapt it and instead use `ULONG_MAX` explicitly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06global: trivial conversions to fix `-Wsign-compare` warningsPatrick Steinhardt9-38/+14
We have a bunch of loops which iterate up to an unsigned boundary using a signed index, which generates warnigs because we compare a signed and unsigned value in the loop condition. Address these sites for trivial cases and enable `-Wsign-compare` warnings for these code units. This patch only adapts those code units where we can drop the `DISABLE_SIGN_COMPARE_WARNINGS` macro in the same step. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06diff.h: fix index used to loop through unsigned integerPatrick Steinhardt1-1/+0
The `struct diff_flags` structure is essentially an array of flags, all of which have the same type. We can thus use `sizeof()` to iterate through all of the flags, which we do in `diff_flags_or()`. But while the statement returns an unsigned integer, we used a signed integer to iterate through the flags, which generates a warning. Fix this by using `size_t` for the index instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06global: mark code units that generate warnings with `-Wsign-compare`Patrick Steinhardt21-0/+31
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04midx: pass `repository` to `load_multi_pack_index`Karthik Nayak1-4/+4
The `load_multi_pack_index` function in midx uses `the_repository` variable to access the `repository` struct. Modify the function and its callee's to send the `repository` field. This moves usage of `the_repository` to the `test-read-midx.c` file. While that is not optimal, it is okay, since the upcoming commits will slowly move the usage of `the_repository` up the layers and remove it eventually. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04Merge branch 'kn/pass-repo-to-builtin-sub-sub-commands'Junio C Hamano1-3/+5
Built-in Git subcommands are supplied the repository object to work with; they learned to do the same when they invoke sub-subcommands. * kn/pass-repo-to-builtin-sub-sub-commands: builtin: pass repository to sub commands
2024-12-04Merge branch 'ps/ref-backend-migration-optim'Junio C Hamano1-1/+1
The migration procedure between two ref backends has been optimized. * ps/ref-backend-migration-optim: reftable: rename scratch buffer refs: adapt `initial_transaction` flag to be unsigned reftable/block: optimize allocations by using scratch buffer reftable/block: rename `block_writer::buf` variable reftable/writer: optimize allocations by using a scratch buffer refs: don't normalize log messages with `REF_SKIP_CREATE_REFLOG` refs: skip collision checks in initial transactions refs: use "initial" transaction semantics to migrate refs refs/files: support symbolic and root refs in initial transaction refs: introduce "initial" transaction flag refs/files: move logic to commit initial transaction refs: allow passing flags when setting up a transaction
2024-11-26builtin: pass repository to sub commandsKarthik Nayak1-3/+5
In 9b1cb5070f (builtin: add a repository parameter for builtin functions, 2024-09-13) the repository was passed down to all builtin commands. This allowed the repository to be passed down to lower layers without depending on the global `the_repository` variable. Continue this work by also passing down the repository parameter from the command to sub-commands. This will help pass down the repository to other subsystems and cleanup usage of global variables like 'the_repository' and 'the_hash_algo'. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21t/helper: fix leaking commit graph in "read-graph" subcommandPatrick Steinhardt1-2/+1
We're leaking the commit-graph in the "test-helper read-graph" subcommand, but as the leak is annotated with `UNLEAK()` the leak sanitizer doesn't complain. Fix the leak by calling `free_commit_graph()`. Besides getting rid of the `UNLEAK()` annotation, it also increases code coverage because we properly release resources as Git would do it, as well. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21refs: skip collision checks in initial transactionsPatrick Steinhardt1-1/+1
Reference transactions use `refs_verify_refname_available()` to check for colliding references. This check consists of two parts: - Checks for whether multiple ref updates in the same transaction conflict with each other. - Checks for whether existing refs conflict with any refs part of the transaction. While we generally cannot avoid the first check, the second check is superfluous in cases where the transaction is an initial one in an otherwise empty ref store. The check results in multiple ref reads as well as the creation of a ref iterator for every ref we're checking, which adds up quite fast when performing the check for many refs. Introduce a new flag that allows us to skip this check and wire it up in such that the backends pass it when running an initial transaction. This leads to significant speedups when migrating ref storage backends. From "files" to "reftable": Benchmark 1: migrate files:reftable (refcount = 100000, revision = HEAD~) Time (mean ± σ): 472.4 ms ± 6.7 ms [User: 175.9 ms, System: 285.2 ms] Range (min … max): 463.5 ms … 483.2 ms 10 runs Benchmark 2: migrate files:reftable (refcount = 100000, revision = HEAD) Time (mean ± σ): 86.1 ms ± 1.9 ms [User: 67.9 ms, System: 16.0 ms] Range (min … max): 82.9 ms … 90.9 ms 29 runs Summary migrate files:reftable (refcount = 100000, revision = HEAD) ran 5.48 ± 0.15 times faster than migrate files:reftable (refcount = 100000, revision = HEAD~) And from "reftable" to "files": Benchmark 1: migrate reftable:files (refcount = 100000, revision = HEAD~) Time (mean ± σ): 452.7 ms ± 3.4 ms [User: 209.9 ms, System: 235.4 ms] Range (min … max): 445.9 ms … 457.5 ms 10 runs Benchmark 2: migrate reftable:files (refcount = 100000, revision = HEAD) Time (mean ± σ): 95.2 ms ± 2.2 ms [User: 73.6 ms, System: 20.6 ms] Range (min … max): 91.7 ms … 100.8 ms 28 runs Summary migrate reftable:files (refcount = 100000, revision = HEAD) ran 4.76 ± 0.11 times faster than migrate reftable:files (refcount = 100000, revision = HEAD~) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19reftable/system: stop depending on "hash.h"Patrick Steinhardt1-2/+2
We include "hash.h" in "reftable/system.h" such that we can use hash format IDs as well as the raw size of SHA1 and SHA256. As we are in the process of converting the reftable library to become standalone we of course cannot rely on those constants anymore. Introduce a new `enum reftable_hash` to replace internal uses of the hash format IDs and new constants that replace internal uses of the hash size. Adapt the reftable backend to set up the correct hash function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-13Merge branch 'ps/leakfixes-part-9'Junio C Hamano3-2/+12
More leakfixes. * ps/leakfixes-part-9: (22 commits) list-objects-filter-options: work around reported leak on error builtin/merge: release output buffer after performing merge dir: fix leak when parsing "status.showUntrackedFiles" t/helper: fix leaking buffer in "dump-untracked-cache" t/helper: stop re-initialization of `the_repository` sparse-index: correctly free EWAH contents dir: release untracked cache data combine-diff: fix leaking lost lines builtin/tag: fix leaking key ID on failure to sign transport-helper: fix leaking import/export marks builtin/commit: fix leaking cleanup config trailer: fix leaking strbufs when formatting trailers trailer: fix leaking trailer values builtin/commit: fix leaking change data contents upload-pack: fix leaking URI protocols pretty: clear signature check diff-lib: fix leaking diffopts in `do_diff_cache()` revision: fix leaking bloom filters builtin/grep: fix leak with `--max-count=0` grep: fix leak in `grep_splice_or()` ...
2024-11-07Merge branch 'ps/leakfixes-part-9' into ps/leakfixes-part-10Junio C Hamano3-2/+12
* ps/leakfixes-part-9: (22 commits) list-objects-filter-options: work around reported leak on error builtin/merge: release output buffer after performing merge dir: fix leak when parsing "status.showUntrackedFiles" t/helper: fix leaking buffer in "dump-untracked-cache" t/helper: stop re-initialization of `the_repository` sparse-index: correctly free EWAH contents dir: release untracked cache data combine-diff: fix leaking lost lines builtin/tag: fix leaking key ID on failure to sign transport-helper: fix leaking import/export marks builtin/commit: fix leaking cleanup config trailer: fix leaking strbufs when formatting trailers trailer: fix leaking trailer values builtin/commit: fix leaking change data contents upload-pack: fix leaking URI protocols pretty: clear signature check diff-lib: fix leaking diffopts in `do_diff_cache()` revision: fix leaking bloom filters builtin/grep: fix leak with `--max-count=0` grep: fix leak in `grep_splice_or()` ...
2024-11-04t/helper: fix leaking buffer in "dump-untracked-cache"Patrick Steinhardt1-0/+2
We never release the local `struct strbuf base` buffer, thus leaking memory. Fix this leak. This leak is exposed by t7063, but plugging it alone does not make the whole test suite pass. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04t/helper: stop re-initialization of `the_repository`Patrick Steinhardt1-2/+0
While "common-main.c" already initializes `the_repository` for us, we do so a second time in the "read-cache" test helper. This causes a memory leak because the old repository's contents isn't released. Stop calling `initialize_repository()` to plug this leak. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04t/helper: fix leaks in "reach" test toolPatrick Steinhardt1-0/+10
The "reach" test tool doesn't bother to clean up any of its allocated resources, causing various leaks. Plug them. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-25packfile: use object_id in find_pack_entry_one()Jeff King1-1/+1
The main function we use to search a pack index for an object is find_pack_entry_one(). That function still takes a bare pointer to the hash, despite the fact that its underlying bsearch_pack() function needs an object_id struct. And so we end up making an extra copy of the hash into the struct just to do a lookup. As it turns out, all callers but one already have such an object_id. So we can just take a pointer to that struct and use it directly. This avoids the extra copy and provides a more type-safe interface. The one exception is get_delta_base() in packfile.c, when we are chasing a REF_DELTA from inside the pack (and thus we have a pointer directly to the mmap'd pack memory, not a struct). We can just bump the hashcpy() from inside find_pack_entry_one() to this one caller that needs it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-25Merge branch 'ak/typofixes'Taylor Blau1-1/+1
Typofixes. * ak/typofixes: t: fix typos t/helper: fix a typo t/perf: fix typos t/unit-tests: fix typos contrib: fix typos compat: fix typos
2024-10-10Merge branch 'ps/leakfixes-part-8'Junio C Hamano2-0/+9
More leakfixes. * ps/leakfixes-part-8: (23 commits) builtin/send-pack: fix leaking list of push options remote: fix leaking push reports t/helper: fix leaks in proc-receive helper pack-write: fix return parameter of `write_rev_file_order()` revision: fix leaking saved parents revision: fix memory leaks when rewriting parents midx-write: fix leaking buffer pack-bitmap-write: fix leaking OID array pseudo-merge: fix leaking strmap keys pseudo-merge: fix various memory leaks line-log: fix several memory leaks diff: improve lifecycle management of diff queues builtin/revert: fix leaking `gpg_sign` and `strategy` config t/helper: fix leaking repository in partial-clone helper builtin/clone: fix leaking repo state when cloning with bundle URIs builtin/pack-redundant: fix various memory leaks builtin/stash: fix leaking `pathspec_from_file` submodule: fix leaking submodule entry list wt-status: fix leaking buffer with sparse directories shell: fix leaking strings ...
2024-10-10Merge branch 'ps/reftable-alloc-failures'Junio C Hamano1-2/+8
The reftable library is now prepared to expect that the memory allocation function given to it may fail to allocate and to deal with such an error. * ps/reftable-alloc-failures: (26 commits) reftable/basics: fix segfault when growing `names` array fails reftable/basics: ban standard allocator functions reftable: introduce `REFTABLE_FREE_AND_NULL()` reftable: fix calls to free(3P) reftable: handle trivial allocation failures reftable/tree: handle allocation failures reftable/pq: handle allocation failures when adding entries reftable/block: handle allocation failures reftable/blocksource: handle allocation failures reftable/iter: handle allocation failures when creating indexed table iter reftable/stack: handle allocation failures in auto compaction reftable/stack: handle allocation failures in `stack_compact_range()` reftable/stack: handle allocation failures in `reftable_new_stack()` reftable/stack: handle allocation failures on reload reftable/reader: handle allocation failures in `reader_init_iter()` reftable/reader: handle allocation failures for unindexed reader reftable/merged: handle allocation failures in `merged_table_init_iter()` reftable/writer: handle allocation failures in `reftable_new_writer()` reftable/writer: handle allocation failures in `writer_index_hash()` reftable/record: handle allocation failures when decoding records ...
2024-10-10t/helper: fix a typoAndrew Kreimer1-1/+1
Fix a typo in comments: bellow -> below. Signed-off-by: Andrew Kreimer <algonell@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-02reftable/merged: handle allocation failures in `merged_table_init_iter()`Patrick Steinhardt1-2/+8
Handle allocation failures in `merged_table_init_iter()`. While at it, merge `merged_iter_init()` into the function. It only has a single caller and merging them makes it easier to handle allocation failures consistently. This change also requires us to adapt `reftable_stack_init_*_iterator()` to bubble up the new error codes of `merged_table_iter_init()`. Adapt callsites accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-02Merge branch 'ps/leakfixes-part-7'Junio C Hamano1-1/+1
More leak-fixes. * ps/leakfixes-part-7: (23 commits) diffcore-break: fix leaking filespecs when merging broken pairs revision: fix leaking parents when simplifying commits builtin/maintenance: fix leak in `get_schedule_cmd()` builtin/maintenance: fix leaking config string promisor-remote: fix leaking partial clone filter grep: fix leaking grep pattern submodule: fix leaking submodule ODB paths trace2: destroy context stored in thread-local storage builtin/difftool: plug several trivial memory leaks builtin/repack: fix leaking configuration diffcore-order: fix leaking buffer when parsing orderfiles parse-options: free previous value of `OPTION_FILENAME` diff: fix leaking orderfile option builtin/pull: fix leaking "ff" option dir: fix off by one errors for ignored and untracked entries builtin/submodule--helper: fix leaking remote ref on errors t/helper: fix leaking subrepo in nested submodule config helper builtin/submodule--helper: fix leaking error buffer builtin/submodule--helper: clear child process when not running it submodule: fix leaking update strategy ...
2024-09-30t/helper: fix leaks in proc-receive helperPatrick Steinhardt1-0/+7
Fix trivial leaks in the proc-receive helpe. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30t/helper: fix leaking repository in partial-clone helperPatrick Steinhardt1-0/+2
We initialize but never clear a repository in the partial-clone test helper. Plug this leak. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30Merge branch 'ps/leakfixes-part-7' into ps/leakfixes-part-8Junio C Hamano1-1/+1
* ps/leakfixes-part-7: (23 commits) diffcore-break: fix leaking filespecs when merging broken pairs revision: fix leaking parents when simplifying commits builtin/maintenance: fix leak in `get_schedule_cmd()` builtin/maintenance: fix leaking config string promisor-remote: fix leaking partial clone filter grep: fix leaking grep pattern submodule: fix leaking submodule ODB paths trace2: destroy context stored in thread-local storage builtin/difftool: plug several trivial memory leaks builtin/repack: fix leaking configuration diffcore-order: fix leaking buffer when parsing orderfiles parse-options: free previous value of `OPTION_FILENAME` diff: fix leaking orderfile option builtin/pull: fix leaking "ff" option dir: fix off by one errors for ignored and untracked entries builtin/submodule--helper: fix leaking remote ref on errors t/helper: fix leaking subrepo in nested submodule config helper builtin/submodule--helper: fix leaking error buffer builtin/submodule--helper: clear child process when not running it submodule: fix leaking update strategy ...
2024-09-27t/helper: fix leaking subrepo in nested submodule config helperPatrick Steinhardt1-1/+1
In the "submodule-nested-repo-config" helper we create a submodule repository and print its configuration. We do not clear the repo, causing a memory leak. Plug it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-23Merge branch 'ps/environ-wo-the-repository'Junio C Hamano2-1/+4
Code clean-up. * ps/environ-wo-the-repository: (21 commits) environment: stop storing "core.notesRef" globally environment: stop storing "core.warnAmbiguousRefs" globally environment: stop storing "core.preferSymlinkRefs" globally environment: stop storing "core.logAllRefUpdates" globally refs: stop modifying global `log_all_ref_updates` variable branch: stop modifying `log_all_ref_updates` variable repo-settings: track defaults close to `struct repo_settings` repo-settings: split out declarations into a standalone header environment: guard state depending on a repository environment: reorder header to split out `the_repository`-free section environment: move `set_git_dir()` and related into setup layer environment: make `get_git_namespace()` self-contained environment: move object database functions into object layer config: make dependency on repo in `read_early_config()` explicit config: document `read_early_config()` and `read_very_early_config()` environment: make `get_git_work_tree()` accept a repository environment: make `get_graft_file()` accept a repository environment: make `get_index_file()` accept a repository environment: make `get_object_directory()` accept a repository environment: make `get_git_common_dir()` accept a repository ...
2024-09-16Merge branch 'cp/unit-test-reftable-stack'Junio C Hamano3-11/+1
Another reftable test migrated to the unit-test framework. * cp/unit-test-reftable-stack: t-reftable-stack: add test for stack iterators t-reftable-stack: add test for non-default compaction factor t-reftable-stack: use reftable_ref_record_equal() to compare ref records t-reftable-stack: use Git's tempfile API instead of mkstemp() t: harmonize t-reftable-stack.c with coding guidelines t: move reftable/stack_test.c to the unit testing framework
2024-09-16Merge branch 'cp/unit-test-reftable-stack' into ps/reftable-alloc-failuresJunio C Hamano3-11/+1
* cp/unit-test-reftable-stack: t-reftable-stack: add test for stack iterators t-reftable-stack: add test for non-default compaction factor t-reftable-stack: use reftable_ref_record_equal() to compare ref records t-reftable-stack: use Git's tempfile API instead of mkstemp() t: harmonize t-reftable-stack.c with coding guidelines t: move reftable/stack_test.c to the unit testing framework
2024-09-12Merge branch 'jk/messages-with-excess-lf-fix'Junio C Hamano3-10/+10
One-line messages to "die" and other helper functions will get LF added by these helper functions, but many existing messages had an unnecessary LF at the end, which have been corrected. * jk/messages-with-excess-lf-fix: drop trailing newline from warning/error/die messages
2024-09-12Merge branch 'gt/unit-test-oid-array'Junio C Hamano3-51/+0
Another unit-test. * gt/unit-test-oid-array: t: port helper/test-oid-array.c to unit-tests/t-oid-array.c
2024-09-12environment: guard state depending on a repositoryPatrick Steinhardt1-0/+2
In "environment.h" we have quite a lot of functions and variables that either explicitly or implicitly depend on `the_repository`. The implicit set of stateful declarations includes for example variables which get populated when parsing a repository's Git configuration. This set of variables is broken by design, as their state often depends on the last repository config that has been parsed. So they may or may not represent the state of `the_repository`. Fixing that is quite a big undertaking, and later patches in this series will demonstrate a solution for a first small set of those variables. So for now, let's guard these with `USE_THE_REPOSITORY_VARIABLE` so that callers are aware of the implicit dependency. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-12config: make dependency on repo in `read_early_config()` explicitPatrick Steinhardt1-1/+2
The `read_early_config()` function can be used to read configuration where a repository has not yet been set up. As such, it is optional whether or not `the_repository` has already been initialized. If it was initialized we use its commondir and gitdir. If not, the function will try to detect the Git directories by itself and, if found, also parse their config files. This means that we implicitly rely on `the_repository`. Make this dependency explicit by passing a `struct repository`. This allows us to again drop the `USE_THE_REPOSITORY_VARIABLE` define in "config.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-08t: move reftable/stack_test.c to the unit testing frameworkChandra Pratap3-11/+1
reftable/stack_test.c exercises the functions defined in reftable/stack.{c, h}. Migrate reftable/stack_test.c to the unit testing framework. Migration involves refactoring the tests to use the unit testing framework instead of reftable's test framework and renaming the tests to be in-line with unit-tests' standards. Since some of the tests use set_test_hash() defined by reftable/test_framework.{c, h} but these files are not '#included' in the test file, copy this function in the ported test file. With the migration of stack test to the unit-tests framework, "test-tool reftable" becomes a no-op. Hence, get rid of everything that uses "test-tool reftable" alongside everything that is used to implement it. While at it, alphabetically sort the cmds[] list in helper/test-tool.c by moving the entry for "dump-reftable". Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-05drop trailing newline from warning/error/die messagesJeff King3-10/+10
Our error reporting routines append a trailing newline, and the strings we pass to them should not include them (otherwise we get an extra blank line after the message). These cases were all found by looking at the results of: git grep -P '[^_](error|error_errno|warning|die|die_errno)\(.*\\n"[,)]' '*.c' Note that we _do_ sometimes include a newline in the middle of such messages, to create multiline output (hence our grep matching "," or ")" after we see the newline, so we know we're at the end of the string). It's possible that one or more of these cases could intentionally be including a blank line at the end, but having looked at them all manually, I think these are all just mistakes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-03Merge branch 'ps/reftable-concurrent-compaction'Junio C Hamano1-2/+2
The code path for compacting reftable files saw some bugfixes against concurrent operation. * ps/reftable-concurrent-compaction: reftable/stack: fix segfault when reload with reused readers fails reftable/stack: reorder swapping in the reloaded stack contents reftable/reader: keep readers alive during iteration reftable/reader: introduce refcounting reftable/stack: fix broken refnames in `write_n_ref_tables()` reftable/reader: inline `reader_close()` reftable/reader: inline `init_reader()` reftable/reader: rename `reftable_new_reader()` reftable/stack: inline `stack_compact_range_stats()` reftable/blocksource: drop malloc block source
2024-09-03Merge branch 'ps/leakfixes-part-5'Junio C Hamano1-1/+7
Even more leak fixes. * ps/leakfixes-part-5: transport: fix leaking negotiation tips transport: fix leaking arguments when fetching from bundle builtin/fetch: fix leaking transaction with `--atomic` remote: fix leaking peer ref when expanding refmap remote: fix leaks when matching refspecs remote: fix leaking config strings builtin/fetch-pack: fix leaking refs sideband: fix leaks when configuring sideband colors builtin/send-pack: fix leaking refspecs transport: fix leaking OID arrays in git:// transport data t/helper: fix leaking multi-pack-indices in "read-midx" builtin/repack: fix leaks when computing packs to repack midx-write: fix leaking hashfile on error cases builtin/archive: fix leaking `OPT_FILENAME()` value builtin/upload-archive: fix leaking args passed to `write_archive()` builtin/merge-tree: fix leaking `-X` strategy options pretty: fix leaking key/value separator buffer pretty: fix memory leaks when parsing pretty formats convert: fix leaks when resetting attributes mailinfo: fix leaking header data
2024-09-01t: port helper/test-oid-array.c to unit-tests/t-oid-array.cGhanshyam Thakkar3-51/+0
helper/test-oid-array.c along with t0064-oid-array.sh test the oid-array.h API, which provides storage and processing efficiency over large lists of object identifiers. Migrate them to the unit testing framework for better runtime performance and efficiency. As we don't initialize a repository in these tests, the hash algo that functions like oid_array_lookup() use is not initialized, therefore call repo_set_hash_algo() to initialize it. And init_hash_algo():lib-oid.c can aid in this process, so make it public. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com> Helped-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-29Merge branch 'cp/unit-test-reftable-block'Junio C Hamano1-1/+0
Another test for reftable library ported to the unit test framework. * cp/unit-test-reftable-block: t-reftable-block: mark unused argv/argc t-reftable-block: add tests for index blocks t-reftable-block: add tests for obj blocks t-reftable-block: add tests for log blocks t-reftable-block: remove unnecessary variable 'j' t-reftable-block: use xstrfmt() instead of xstrdup() t-reftable-block: use block_iter_reset() instead of block_iter_close() t-reftable-block: use reftable_record_key() instead of strbuf_addstr() t-reftable-block: use reftable_record_equal() instead of check_str() t-reftable-block: release used block reader t: harmonize t-reftable-block.c with coding guidelines t: move reftable/block_test.c to the unit testing framework
2024-08-29Merge branch 'ps/reftable-drop-generic'Junio C Hamano1-1/+188
The code in the reftable library has been cleaned up by discarding unused "generic" interface. * ps/reftable-drop-generic: reftable: mark unused parameters in empty iterator functions reftable/generic: drop interface t/helper: refactor to not use `struct reftable_table` t/helper: use `hash_to_hex_algop()` to print hashes t/helper: inline printing of reftable records t/helper: inline `reftable_table_print()` t/helper: inline `reftable_stack_print_directory()` t/helper: inline `reftable_reader_print_file()` t/helper: inline `reftable_dump_main()` reftable/dump: drop unused `compact_stack()` reftable/generic: move generic iterator code into iterator interface reftable/iter: drop double-checking logic reftable/stack: open-code reading refs reftable/merged: stop using generic tables in the merged table reftable/merged: rename `reftable_new_merged_table()` reftable/merged: expose functions to initialize iterators
2024-08-28Merge branch 'gt/unit-test-urlmatch-normalization'Junio C Hamano3-58/+0
Another rewrite of test. * gt/unit-test-urlmatch-normalization: t: migrate t0110-urlmatch-normalization to the new framework
2024-08-26Merge branch 'ds/for-each-ref-is-base'Junio C Hamano1-0/+2
'git for-each-ref' learned a new "--format" atom to find the branch that the history leading to a given commit "%(is-base:<commit>)" is likely based on. * ds/for-each-ref-is-base: p1500: add is-base performance tests for-each-ref: add 'is-base' token commit: add gentle reference lookup method commit-reach: add get_branch_base_for_tip
2024-08-26Merge branch 'jk/mark-unused-parameters'Junio C Hamano3-3/+3
Mark unused parameters as UNUSED to squelch -Wunused warnings. * jk/mark-unused-parameters: t-hashmap: stop calling setup() for t_intern() test scalar: mark unused parameters in dummy function daemon: mark unused parameters in non-posix fallbacks setup: mark unused parameter in config callback test-mergesort: mark unused parameters in trivial callback t-hashmap: mark unused parameters in callback function reftable: mark unused parameters in virtual functions reftable: drop obsolete test function declarations reftable: ignore unused argc/argv in test functions unit-tests: ignore unused argc/argv t/helper: mark more unused argv/argc arguments oss-fuzz: mark unused argv/argc argument refs: mark unused parameters in do_for_each_reflog_helper() refs: mark unused parameters in ref_store fsck callbacks update-ref: mark more unused parameters in parser callbacks imap-send: mark unused parameter in ssl_socket_connect() fallback
2024-08-23Merge branch 'cp/unit-test-reftable-readwrite'Junio C Hamano1-1/+0
* cp/unit-test-reftable-readwrite: t-reftable-readwrite: add test for known error t-reftable-readwrite: use 'for' in place of infinite 'while' loops t-reftable-readwrite: use free_names() instead of a for loop t: move reftable/readwrite_test.c to the unit testing framework
2024-08-23Merge branch 'ps/config-wo-the-repository'Junio C Hamano3-0/+6
Use of API functions that implicitly depend on the_repository object in the config subsystem has been rewritten to pass a repository object through the callchain. * ps/config-wo-the-repository: config: hide functions using `the_repository` by default global: prepare for hiding away repo-less config functions config: don't depend on `the_repository` with branch conditions config: don't have setters depend on `the_repository` config: pass repo to functions that rename or copy sections config: pass repo to `git_die_config()` config: pass repo to `git_config_get_expiry_in_days()` config: pass repo to `git_config_get_expiry()` config: pass repo to `git_config_get_max_percent_split_change()` config: pass repo to `git_config_get_split_index()` config: pass repo to `git_config_get_index_threads()` config: expose `repo_config_clear()` config: introduce missing setters that take repo as parameter path: hide functions using `the_repository` by default path: stop relying on `the_repository` in `worktree_git_path()` path: stop relying on `the_repository` when reporting garbage hooks: remove implicit dependency on `the_repository` editor: do not rely on `the_repository` for interactive edits path: expose `do_git_common_path()` as `repo_common_pathv()` path: expose `do_git_path()` as `repo_git_pathv()`
2024-08-23reftable/reader: introduce refcountingPatrick Steinhardt1-1/+1
It was recently reported that concurrent reads and writes may cause the reftable backend to segfault. The root cause of this is that we do not properly keep track of reftable readers across reloads. Suppose that you have a reftable iterator and then decide to reload the stack while iterating through the iterator. When the stack has been rewritten since we have created the iterator, then we would end up discarding a subset of readers that may still be in use by the iterator. The consequence is that we now try to reference deallocated memory, which of course segfaults. One way to trigger this is in t5616, where some background maintenance jobs have been leaking from one test into another. This leads to stack traces like the following one: + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin AddressSanitizer:DEADLYSIGNAL ================================================================= ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp 0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0) ==657994==The signal is caused by a READ memory access. #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29 #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170 #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194 #3 0x55f23e54e72e in block_iter_next reftable/block.c:398 #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240 #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355 #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339 #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69 #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123 #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172 #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175 #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464 #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13 #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452 #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623 #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659 #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133 #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432 #18 0x55f23dba7764 in run_builtin git.c:484 #19 0x55f23dba7764 in handle_builtin git.c:741 #20 0x55f23dbab61e in run_argv git.c:805 #21 0x55f23dbab61e in cmd_main git.c:1000 #22 0x55f23dba4781 in main common-main.c:64 #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360 #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27) While it is somewhat awkward that the maintenance processes survive tests in the first place, it is totally expected that reftables should work alright with concurrent writers. Seemingly they don't. The only underlying resource that we need to care about in this context is the reftable reader, which is responsible for reading a single table from disk. These readers get discarded immediately (unless reused) when calling `reftable_stack_reload()`, which is wrong. We can only close them once we know that there are no iterators using them anymore. Prepare for a fix by converting the reftable readers to be refcounted. Reported-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-23reftable/reader: rename `reftable_new_reader()`Patrick Steinhardt1-1/+1
Rename the `reftable_new_reader()` function to `reftable_reader_new()` to match our coding guidelines. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22t/helper: fix leaking multi-pack-indices in "read-midx"Patrick Steinhardt1-1/+7
Several of the subcommands of `test-helper read-midx` do not close the MIDX that they have opened, leading to memory leaks. Fix those. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22t/helper: refactor to not use `struct reftable_table`Patrick Steinhardt1-15/+18
The `struct reftable_table` interface in our "reftable" test helper gets used such that we can easily print either a single table, or a merged stack. This generic interface is about to go away. Prepare the code for this change by using merged tables instead. When printing the stack we've already got one. When using a single table, we can create a merged table from it to adapt. This removes the last user of the generic interface. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22t/helper: use `hash_to_hex_algop()` to print hashesPatrick Steinhardt1-37/+11
The "reftable" test helper uses a hand-crafted version to convert from a raw hash to its hex variant. This was done because this code used to be part of the reftable library, where we do not use most functions from the Git core. Now that the code is integrated into the "dump-reftable" helper though, that limitation went away. Let's thus use `hash_to_hex_algop()` instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22t/helper: inline printing of reftable recordsPatrick Steinhardt1-8/+69
Move printing of reftable records into the "dump-reftable" helper. This follows the same reasoning as the preceding commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22t/helper: inline `reftable_table_print()`Patrick Steinhardt1-2/+50
Move `reftable_table_print()` into the "dump-reftable" helper. This follows the same reasoning as the preceding commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22t/helper: inline `reftable_stack_print_directory()`Patrick Steinhardt1-1/+22
Move `reftable_stack_print_directory()` into the "dump-reftable" helper. This follows the same reasoning as the preceding commit. Note that this requires us to remove the tests for this functionality in `reftable/stack_test.c`. The test does not really add much anyway, because all it verifies is that we do not crash or run into an error, and it specifically doesn't check the outputted data. Also, as the code is now part of the test helper, it doesn't make much sense to have a unit test for it in the first place. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22t/helper: inline `reftable_reader_print_file()`Patrick Steinhardt1-1/+22
Move `reftable_reader_print_file()` into the "dump-reftable" helper. This follows the same reasoning as the preceding commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22t/helper: inline `reftable_dump_main()`Patrick Steinhardt1-1/+60
The printing functionality part of `reftable/dump.c` is really only used by our "dump-reftable" test helper. It is certainly not generic logic that is useful to anybody outside of Git, and the format it generates is quite specific. Still, parts of it are used in our test suite and the output may be useful to take a peek into reftable stacks, tables and blocks. So while it does not make sense to expose this as part of the reftable library, it does make sense to keep it around. Inline the `reftable_dump_main()` function into the "dump-reftable" test helper. This clarifies that its format is subject to change and not part of our public interface. Furthermore, this allows us to iterate on the implementation in subsequent patches. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-21t: move reftable/block_test.c to the unit testing frameworkChandra Pratap1-1/+0
reftable/block_test.c exercises the functions defined in reftable/block.{c, h}. Migrate reftable/block_test.c to the unit testing framework. Migration involves refactoring the tests to use the unit testing framework instead of reftable's test framework and renaming the tests to follow the unit-tests' naming conventions. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-20t: migrate t0110-urlmatch-normalization to the new frameworkGhanshyam Thakkar3-58/+0
helper/test-urlmatch-normalization along with t0110-urlmatch-normalization test the `url_normalize()` function from 'urlmatch.h'. Migrate them to the unit testing framework for better performance. And also add different test_msg()s for better debugging. In the migration, last two of the checks from `t_url_general_escape()` were slightly changed compared to the shell script. This involves changing '\'' -> ' '\!' -> ! in the urls of those checks. This is because in C strings, we don't need to escape "'" and "!". Other than these two, all the urls were pasted verbatim from the shell script. Another change is the removal of a MINGW prerequisite from one of the test. It was there because[1] on Windows, the command line is a Unicode string, it is not possible to pass arbitrary bytes to a program. But in unit tests we don't have this limitation. And since we can construct strings with arbitrary bytes in C, let's also remove the test files which contain URLs with arbitrary bytes in the 't/t0110' directory and instead embed those URLs in the unit test code itself. [1]: https://lore.kernel.org/git/53CAC8EF.6020707@gmail.com/ Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com> Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-19Merge branch 'tb/incremental-midx-part-1'Junio C Hamano1-7/+17
Incremental updates of multi-pack index files. * tb/incremental-midx-part-1: midx: implement support for writing incremental MIDX chains t/t5313-pack-bounds-checks.sh: prepare for sub-directories t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' midx: implement verification support for incremental MIDXs midx: support reading incremental MIDX chains midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs midx: teach `midx_preferred_pack()` about incremental MIDXs midx: teach `midx_contains_pack()` about incremental MIDXs midx: remove unused `midx_locate_pack()` midx: teach `fill_midx_entry()` about incremental MIDXs midx: teach `nth_midxed_offset()` about incremental MIDXs midx: teach `bsearch_midx()` about incremental MIDXs midx: introduce `bsearch_one_midx()` midx: teach `nth_bitmapped_pack()` about incremental MIDXs midx: teach `nth_midxed_object_oid()` about incremental MIDXs midx: teach `prepare_midx_pack()` about incremental MIDXs midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs midx: add new fields for incremental MIDX chains Documentation: describe incremental MIDX format
2024-08-19Merge branch 'rs/unit-tests-test-run'Junio C Hamano1-0/+35
Unit-test framework has learned a simple control structure to allow embedding test statements in-line instead of having to create a new function to contain them. * rs/unit-tests-test-run: t-strvec: use if_test t-reftable-basics: use if_test t-ctype: use if_test unit-tests: add if_test unit-tests: show location of checks outside of tests t0080: use here-doc test body
2024-08-17test-mergesort: mark unused parameters in trivial callbackJeff King1-1/+1
The mode_copy() function does nothing, but since it's used as a function pointer within "struct mode", it has to conform to the interface. Mark it to quiet -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-17t/helper: mark more unused argv/argc argumentsJeff King2-2/+2
This is a continuation of 126e3b3d2a (t/helper: mark unused argv/argc arguments, 2023-03-28) to cover a few new cases: - test-example-tap was added since that commit - test-hashmap used to accept the "ignorecase" argument on the command line. But since most of its logic was moved to a unit-test in 3469a23659 (t: port helper/test-hashmap.c to unit-tests/t-hashmap.c, 2024-08-03), it now ignores its argv entirely. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-15Merge branch 'jc/refs-symref-referent'Junio C Hamano1-1/+1
The refs API has been taught to give symref target information to the users of ref iterators, allowing for-each-ref and friends to avoid an extra ref_resolve_* API call per a symbolic ref. * jc/refs-symref-referent: ref-filter: populate symref from iterator refs: add referent to each_ref_fn refs: keep track of unresolved reference value in iterators
2024-08-15Merge branch 'gt/unit-test-hashmap'Junio C Hamano1-97/+1
An existing test of hashmap API has been rewritten with the unit-test framework. * gt/unit-test-hashmap: t: port helper/test-hashmap.c to unit-tests/t-hashmap.c
2024-08-14Merge branch 'cp/unit-test-reftable-tree'Junio C Hamano1-1/+0
A test in reftable library has been rewritten using the unit test framework. * cp/unit-test-reftable-tree: t-reftable-tree: improve the test for infix_walk() t-reftable-tree: add test for non-existent key t-reftable-tree: split test_tree() into two sub-test functions t: move reftable/tree_test.c to the unit testing framework reftable: remove unnecessary curly braces in reftable/tree.c
2024-08-14Merge branch 'kl/test-fixes'Junio C Hamano2-0/+3
A flakey test and incorrect calls to strtoX() functions have been fixed. * kl/test-fixes: t6421: fix test to work when repo dir contains d0 set errno=0 before strtoX calls
2024-08-14Merge branch 'cp/unit-test-reftable-pq'Junio C Hamano1-1/+0
The tests for "pq" part of reftable library got rewritten to use the unit test framework. * cp/unit-test-reftable-pq: t-reftable-pq: add tests for merged_iter_pqueue_top() t-reftable-pq: add test for index based comparison t-reftable-pq: make merged_iter_pqueue_check() callable by reference t-reftable-pq: make merged_iter_pqueue_check() static t: move reftable/pq_test.c to the unit testing framework reftable: change the type of array indices to 'size_t' in reftable/pq.c reftable: remove unnecessary curly braces in reftable/pq.c
2024-08-14Merge branch 'ps/leakfixes-part-3'Junio C Hamano1-2/+2
More leakfixes. * ps/leakfixes-part-3: (24 commits) commit-reach: fix trivial memory leak when computing reachability convert: fix leaking config strings entry: fix leaking pathnames during delayed checkout object-name: fix leaking commit list items t/test-repository: fix leaking repository builtin/credential-cache: fix trivial leaks builtin/worktree: fix leaking derived branch names builtin/shortlog: fix various trivial memory leaks builtin/rerere: fix various trivial memory leaks builtin/credential-store: fix leaking credential builtin/show-branch: fix several memory leaks builtin/rev-parse: fix memory leak with `--parseopt` builtin/stash: fix various trivial memory leaks builtin/remote: fix various trivial memory leaks builtin/remote: fix leaking strings in `branch_list` builtin/ls-remote: fix leaking `pattern` strings builtin/submodule--helper: fix leaking buffer in `is_tip_reachable` builtin/submodule--helper: fix leaking clone depth parameter builtin/name-rev: fix various trivial memory leaks builtin/describe: fix trivial memory leak when describing blob ...
2024-08-14commit-reach: add get_branch_base_for_tipDerrick Stolee1-0/+2
Add a new reachability algorithm that intends to discover (from a heuristic) which branch was used as the starting point for a given commit. Add focused tests using the 'test-tool reach' command. In repositories that use pull requests (or merge requests) to advance one or more "protected" branches, the history of that reference can be recovered by following the first-parent history in most cases. Most are completed using no-fast-forward merges, though squash merges are quite common. Less common is rebase-and-merge, which still validates this assumption. Finally, the case that breaks this assumption is the fast-forward update (with potential rebasing). Even in this case, the previous commit commonly appears in the first-parent history of the branch. Similar assumptions can be made for a topic branch created by a single user with the intention to merge back into another branch. Using 'git commit', 'git merge', and 'git cherry-pick' from HEAD will default to having the first-parent commit be the previous commit at HEAD. This history changes only with commands such as 'git reset' or 'git rebase', where the command names also imply that the branch is starting from a new location. With this movement of branches in mind, the following heuristic is proposed as a way to determine the base branch for a given source branch: Among a list of candidate base branches, select the candidate that minimizes the number of commits in the first-parent history of the source that are not in the first-parent history of the candidate. Prior third-party solutions to this problem have used this optimization criteria, but have relied upon extracting the first-parent history and comparing those lists as tables instead of using commit-graph walks. Given current command-line interface options, this optimization criteria is not easy to detect directly. Even using the command git rev-list --count --first-parent <base>..<source> does not measure this count, as it uses full reachability from <base> to determine which commits to remove from the range '<base>..<source>'. This may lead to one asking if we should instead be using the full reachability of the candidate and only the first-parent history of the source. This, unfortunately, does not work for repositories that use long-lived branches and automation to merge across those branches. In extremely large repositories, merging into a single trunk may not be feasible. This is usually due to the desired frequency of updates (thousands of engineers doing daily work) combined with the time required to perform a validation build. These factors combine to create significant risk of semantic merge conflicts, leading to build breaks on the trunk. In response, repository maintainers can create a single Level Zero (L0) trunk and multiple Level One (L1) branches. By partitioning the engineers by organization, these engineers may see lower risk of semantic merge conflicts as well as be protected against build breaks in other L1 branches. The key to making this system work is a semi-automated process of merging L1 branches into the L0 trunk and vice-versa. In a large enough organization, these L1 branches may further split into L2 or L3 branches, but the same principles apply for merging across deeper levels. If these automated merges use a typical merge with the second parent bringing in the "new" content, then each L0 and L1 branch can track its previous positions by following first-parent history, which appear as parallel paths (until reaching the first place where the branches diverged). If we also walk to second parents, then the histories overlap significantly and cannot be distinguished except for very-recent changes. For this reason, the first-parent condition should be symmetrical across the base and source branches. Another common case for desiring the result of this optimization method is the use of release branches. When releasing a version of a repository, a branch can be used to track that release. Any updates that are worth fixing in that release can be merged to the release branch and shipped with only the necessary fixes without any new features introduced in the trunk branch. The 'maint-2.<X>' branches represent this pattern in the Git project. The microsoft/git fork uses 'vfs-2.<X>.<Y>' branches to track the changes that are custom to that fork on top of each upstream Git release 2.<X>.<Y>. This application doesn't need the symmetrical first-parent condition, but the use of first-parent histories does not change the results for these branches. To determine the base branch from a list of candidates, create a new method in commit-reach.c that performs a single* commit-graph walk. The core concept is to walk first-parents starting at the candidate bases and the source, tracking the "best" base to reach a given commit. Use generation numbers to ensure that a commit is walked at most once and all children have been explored before visiting it. When reaching a commit that is reachable from both a base and the source, we will then have a guarantee that this is the closest intersection of first-parent histories. Track the best base to reach that commit and return it as a result. In rare cases involving multiple root commits, the first-parent history of the source may never intersect any of the candidates and thus a null result is returned. * There are up to two walks, since we require all commits to have a computed generation number in order to avoid incorrect results. This is similar to the need for computed generation numbers in ahead_behind() as implemented in fd67d149bde (commit-reach: implement ahead_behind() logic, 2023-03-20). In order to track the "best" base, use a new commit slab that stores an integer. This value defaults to zero upon initialization, so use -1 to track that the source commit can reach this commit and use 'i + 1' to track that the ith base can reach this commit. When multiple bases can reach a commit, minimize the index to break ties. This allows the caller to specify an order to the bases that determines some amount of preference when the heuristic does not result in a unique result. The trickiest part of the integer slab is what happens when reaching a collision among the histories of the bases and the history of the source. This is noticed when viewing the first parent and seeing that it has a slab value that differs in sign (negative or positive). In this case, the collision commit is stored in the method variable 'branch_point' and its slab value is set to -1. The index of the best base (so far) is stored in the method variable 'best_index'. It is possible that there are multiple commits that have the branch_point as its first parent, leading to multiple updates of best_index. The result is determined when 'branch_point' is visited in the commit walk, giving the guarantee that all commits that could reach 'branch_point' were visited. Several interesting cases of collisions and different results are tested in the t6600-test-reach.sh script. Recall that this script also tests the algorithm in three possible states involving the commit-graph file and how many commits are written in the file. This provides some coverage of the need (and lack of need) for the ensure_generations_valid() method. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-13t: move reftable/readwrite_test.c to the unit testing frameworkChandra Pratap1-1/+0
reftable/readwrite_test.c exercises the functions defined in reftable/reader.{c,h} and reftable/writer.{c,h}. Migrate reftable/readwrite_test.c to the unit testing framework. Migration involves refactoring the tests to use the unit testing framework instead of reftable's test framework and renaming the tests to align with unit-tests' naming conventions. Since some tests in reftable/readwrite_test.c use the functions set_test_hash(), noop_flush() and strbuf_add_void() defined in reftable/test_framework.{c,h} but these files are not #included in the ported unit test, copy these functions in the new test file. While at it, ensure structs are 0-initialized with '= { 0 }' instead of '= { NULL }'. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-13global: prepare for hiding away repo-less config functionsPatrick Steinhardt3-0/+6
We're about to hide config functions that implicitly depend on `the_repository` behind the `USE_THE_REPOSITORY_VARIABLE` macro. This will uncover a bunch of dependents that transitively relied on the global variable, but didn't define the macro yet. Adapt them such that we define the macro to prepare for this change. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-09refs: add referent to each_ref_fnJohn Cai1-1/+1
Add a parameter to each_ref_fn so that callers to the ref APIs that use this function as a callback can have acess to the unresolved value of a symbolic ref. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-06midx: support reading incremental MIDX chainsTaylor Blau1-7/+17
Now that the MIDX machinery's internals have been taught to understand incremental MIDXs over the previous handful of commits, the MIDX machinery itself can begin reading incremental MIDXs. (Note that while the on-disk format for incremental MIDXs has been defined, the writing end has not been implemented. This will take place in the commit after next.) The core of this change involves following the order specified in the MIDX chain in reverse and opening up MIDXs in the chain one-by-one, adding them to the previous layer's `->base_midx` pointer at each step. In order to implement this, the `load_multi_pack_index()` function is taught to call a new `load_multi_pack_index_chain()` function if loading a non-incremental MIDX failed via `load_multi_pack_index_one()`. When loading a MIDX chain, `load_midx_chain_fd_st()` reads each line in the file one-by-one and dispatches calls to `load_multi_pack_index_one()` to read each layer of the MIDX chain. When a layer was successfully read, it is added to the MIDX chain by calling `add_midx_to_chain()` which validates the contents of the `BASE` chunk, performs some bounds checks on the number of combined packs and objects, and attaches the new MIDX by assigning its `base_midx` pointer to the existing part of the chain. As a supplement to this, introduce a new mode in the test-read-midx test-tool which allows us to read the information for a specific MIDX in the chain by specifying its trailing checksum via the command-line arguments like so: $ test-tool read-midx .git/objects [checksum] Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-06t: port helper/test-hashmap.c to unit-tests/t-hashmap.cGhanshyam Thakkar1-97/+1
helper/test-hashmap.c along with t0011-hashmap.sh test the hashmap.h library. Migrate them to the unit testing framework for better debugging, runtime performance and concise code. Along with the migration, make 'add' tests from the shell script order agnostic in unit tests, since they iterate over entries with the same keys and we do not guarantee the order. This was already done for the 'iterate' tests[1]. The helper/test-hashmap.c is still not removed because it contains a performance test meant to be run by the user directly (not used in t/perf). And it makes sense for such a utility to be a helper. [1]: e1e7a77141 (t: sort output of hashmap iteration, 2019-07-30) Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com> Helped-by: Josh Steadmon <steadmon@google.com> Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-05set errno=0 before strtoX callsKyle Lippincott2-0/+3
To detect conversion failure after calls to functions like `strtod`, one can check `errno == ERANGE`. These functions are not guaranteed to set `errno` to `0` on successful conversion, however. Manual manipulation of `errno` can likely be avoided by checking that the output pointer differs from the input pointer, but that's not how other locations, such as parse.c:139, handle this issue; they set errno to 0 prior to executing the function. For every place I could find a strtoX function with an ERANGE check following it, set `errno = 0;` prior to executing the conversion function. Signed-off-by: Kyle Lippincott <spectral@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-04t: move reftable/tree_test.c to the unit testing frameworkChandra Pratap1-1/+0
reftable/tree_test.c exercises the functions defined in reftable/tree.{c, h}. Migrate reftable/tree_test.c to the unit testing framework. Migration involves refactoring the tests to use the unit testing framework instead of reftable's test framework and renaming the tests to align with unit-tests' standards. Also add a comment to help understand the test routine. Note that this commit mostly moves the test from reftable/ to t/unit-tests/ and most of the refactoring is performed by the trailing commits. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-01t: move reftable/pq_test.c to the unit testing frameworkChandra Pratap1-1/+0
reftable/pq_test.c exercises a priority queue defined by reftable/pq.{c, h}. Migrate reftable/pq_test.c to the unit testing framework. Migration involves refactoring the tests to use the unit testing framework instead of reftable's test framework, and renaming the tests to align with unit-tests' standards. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-01t/test-repository: fix leaking repositoryPatrick Steinhardt1-2/+2
The test-repository test helper zeroes out `the_repository` such that it can be sure that our codebase only ends up using the supplied repository that we initialize in the respective helper functions. This does cause memory leaks though as the data that `the_repository` has been holding onto is not referenced anymore. Fix this by calling `repo_clear()` instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>