aboutsummaryrefslogtreecommitdiffstats
path: root/revision.c
AgeCommit message (Collapse)AuthorFilesLines
2025-11-04refs: introduce wrapper struct for `each_ref_fn`Patrick Steinhardt1-7/+5
The `each_ref_fn` callback function type is used across our code base for several different functions that iterate through reference. There's a bunch of callbacks implementing this type, which makes any changes to the callback signature extremely noisy. An example of the required churn is e8207717f1 (refs: add referent to each_ref_fn, 2024-08-09): adding a single argument required us to change 48 files. It was already proposed back then [1] that we might want to introduce a wrapper structure to alleviate the pain going forward. While this of course requires the same kind of global refactoring as just introducing a new parameter, it at least allows us to more change the callback type afterwards by just extending the wrapper structure. One counterargument to this refactoring is that it makes the structure more opaque. While it is obvious which callsites need to be fixed up when we change the function type, it's not obvious anymore once we use a structure. That being said, we only have a handful of sites that actually need to populate this wrapper structure: our ref backends, "refs/iterator.c" as well as very few sites that invoke the iterator callback functions directly. Introduce this wrapper structure so that we can adapt the iterator interfaces more readily. [1]: <ZmarVcF5JjsZx0dl@tanuki> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-13Merge branch 'ps/commit-graph-per-object-source'Junio C Hamano1-3/+0
Code clean-up around commit-graph. * ps/commit-graph-per-object-source: commit-graph: pass graphs that are to be merged as parameter commit-graph: return commit graph from `repo_find_commit_pos_in_graph()` commit-graph: return the prepared commit graph from `prepare_commit_graph()` revision: drop explicit check for commit graph blame: drop explicit check for commit graph
2025-09-29Merge branch 'jk/setup-revisions-freefix'Junio C Hamano1-3/+46
There are double frees and leaks around setup_revisions() API used in "git stash show", which has been fixed, and setup_revisions() API gained a wrapper to make it more ergonomic when using it with strvec-manged argc/argv pairs. * jk/setup-revisions-freefix: revision: retain argv NULL invariant in setup_revisions() treewide: pass strvecs around for setup_revisions_from_strvec() treewide: use setup_revisions_from_strvec() when we have a strvec revision: add wrapper to setup_revisions() from a strvec revision: manage memory ownership of argv in setup_revisions() stash: tell setup_revisions() to free our allocated strings
2025-09-22revision: retain argv NULL invariant in setup_revisions()Jeff King1-0/+6
In an argc/argv pair, the entry for argv[argc] is generally NULL. You can iterate by counting up to argc, or by looking for the NULL entry in argv. When we pass such a pair to setup_revisions(), it shrinks argc to account for the options we consumed and returns the result to the caller. But it doesn't touch the entries after the reduced argc. So argv[argc] will be left pointing at some arbitrary entry rather than NULL. This isn't the source of any known bugs, since all callers are aware of the limitation and act accordingly. But it's a possible gotcha that may be easy to miss. Let's set the new argv[argc] to NULL, taking care to free it if the caller asked us to do so. It is tempting to do likewise for all of the entries afterwards, too, as some of them may also need to be freed (e.g., if coming from a strvec). But doing so isn't entirely trivial, as we munge argc in the function (e.g., when we find "--" and move all of the entries after it into the prune_data list). It would be possible with some light refactoring, but it's probably not worth it. Nobody should ever look at them (they are beyond the revised argc and past the NULL argv entry) outside of strvec cleanup, and setup_revisions_from_strvec() already handles this case. There's one other interesting gotcha: many callers which do not want to provide arguments just pass 0/NULL for argc/argv. We need to check for this case before assigning the final NULL. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-22revision: add wrapper to setup_revisions() from a strvecJeff King1-0/+19
The setup_revisions() function was designed to take the argc/argv pair from the operating system. But we sometimes construct our own argv using a strvec and pass that in. There are a few gotchas that callers need to deal with here: 1. You should always pass the free_removed_argv_elements option via setup_revision_opt. Otherwise, entries may be leaked if setup_revisions() re-shuffles options. 2. After setup_revisions() returns, the strvec state is odd. We get a reduced argc from setup_revisions() telling us how many unknown options were left in place. Entries after that in argv may be retained, or may be NULL (depending on how the reshuffling happened). But the strvec's "nr" field still represents the original value, and some of the entries it thinks it is still storing may be NULL. Callers must be careful with how they access it. Some callers deal with (1), but not all. In practice they are OK because they do not pass any options that would cause setup_revisions() to re-shuffle (namely unknown options which may be relayed from the user, and the use of the "--" separator). But it's probably a good idea to consistently pass this option anyway to future-proof ourselves against the details of setup_revisions() changing. No callers address (2), though I don't think there any visible bugs. Most of them simply call strvec_clear() and never otherwise look at the result. And in fact, if they naively set foo.nr to the argc returned by setup_revisions(), that would cause leaks! Because setup_revisions() does not free consumed options[1], we have to leave the "nr" field of the strvec at its original value to find and free them during strvec_clear(). So I don't think there are any bugs to fix here, but we can make things safer and simpler for callers. Let's introduce a helper function that sets the free_removed_argv_elements automatically and shrinks the strvec to represent the retained options afterwards (taking care to free the now-obsolete entries). We'll start by converting all of the call-sites which use the free_removed_argv_elements option. There should be no behavior change for them, except that their "shrunken" entries are cleaned up immediately, rather than waiting for a strvec_clear() call. [1] Arguably setup_revisions() should be doing this step for us if we told it to free removed options, but there are many existing callers which will be broken if it did. Introducing this helper is a possible first step towards that. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-22revision: manage memory ownership of argv in setup_revisions()Jeff King1-3/+21
The setup_revisions() function takes an argc/argv pair and consumes arguments from it, returning a reduced argc count to the caller. But it may also overwrite entries within the argv array, as it shifts unknown options to the front of argv (so they can be found in the range of 0..argc-1 after we return). For a normal argc/argv coming from the operating system, this is OK. We don't need to worry about memory ownership of the strings in those entries. But some callers pass in allocated strings from a strvec, and we do need to care about those. We faced a similar issue in f92dbdbc6a (revisions API: don't leak memory on argv elements that need free()-ing, 2022-08-02), which added an option for callers to tell us that elements need to be freed. But the implementation within setup_revisions() was incomplete. It only covered the case of dropping "--", but not the movement of unknown options. When we shift argv entries around, we should free the elements we are about to overwrite, so they are not leaked. For example, in: git stash show -p --invalid we will pass this to setup_revisions(): argc = 3, argv[] = { "show", "-p", "--invalid", NULL } which will then return: argc = 2, argv[] = { "show", "--invalid", "--invalid", NULL } overwriting the "-p" entry, which is leaked unless we free it at that moment. You can see in the output above another potential problem. We now have two copies of the "--invalid" string. If the caller does not respect the new argc when free-ing the strings via strvec_clear(), we'll get a double-free. And git-stash suffers from this, and will crash with the above command. So it seems at first glance that the solution is to just assign the reduced argc to the strvec.nr field in the caller. Then it would stop after freeing only any copied entries. But that's not always right either! Remember that we are reducing "argc" to account for elements we've consumed. So if there isn't an invalid option, we'd turn: argc = 2, argv[] = { "show", "-p", NULL } into: argc = 1, argv[] = { "show", "-p", NULL } In that case strvec_clear() must keep looking past the shortened argc we return to find the original "-p" to free. It needs to use the original argc to do that. We can solve this by turning our argv writes into strict moves, not copies. When we shuffle an unknown option to the front, we'll overwrite its old position with NULL. That leaves an argv array that may have NULL "holes" in it. So in the "--invalid" example above we get: argc = 2, argv[] = { "show", "--invalid", NULL, NULL } but something like "git stash -p --invalid -p" would yield: argc = 3, argv[] = { "show", "--invalid", NULL, "-p", NULL } because we move "--invalid" to overwrite the first "-p", but the second one is quietly consumed. But strvec_clear() can handle that fine (it iterates over the "nr" field, and passing NULL to free() is OK). To ease the implementation, I've introduced a helper function. It's a little hacky because it must take a double-pointer to set the old position to NULL. Which in turn means we cannot pass "&arg", our local alias for the current entry we're parsing, but instead "&argv[i]", the pointer in the original array. And to make it even more confusing, we delegate some of this work to handle_revision_opt(), which is passed a subset of the argv array, so is always working on "&argv[0]". Likewise, because handle_revision_opt() only receives the part of argv left to parse, it receives the array to accumulate unknown options as a separate unkc/unkv pair. But we're always working on the same argv array, so our strategy works fine. I suspect this would be a bit more obvious (and avoid some pointer cleverness) if all functions saw the full argv array and worked with positions within it (and our new helper would take two positions, a src and dst). But that would involve refactoring handle_revision_opt(). I punted on that, as what's here is not too ugly and is all contained within revision.c itself. The new test demonstrates that "git stash show -p --invalid" no longer crashes with a double-free (because we move instead of copy). And it passes with SANITIZE=leak because we free "-p" before overwriting. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-04revision: drop explicit check for commit graphPatrick Steinhardt1-3/+0
When filtering down revisions by paths we know to use bloom filters from the commit graph, if we have any. The entry point for this is in `check_maybe_different_in_bloom_filter()`, where we first verify that: - We do have a commit graph. - That the commit is contained therein by checking that we have a proper generation number. - And that the graph contains a bloom filter. The first check is somewhat redundant though: if we don't have a commit graph, then the second check would already tell us that we don't have a generation number for the specific commit. In theory this could be seen as a performance optimization to short-circuit for scenarios where there is no commit graph. But in practice this shouldn't matter: if there is no commit graph, then the commit graph data slab would also be unpopulated and thus a lookup of the commit should happen in constant time. Drop the unnecessary check. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-21Merge branch 'ly/changed-path-traversal-with-magic-pathspec'Junio C Hamano1-13/+29
Revision traversal limited with pathspec, like "git log dir/*", used to ignore changed-paths Bloom filter when the pathspec contained wildcards; now they take advantage of the filter when they can. * ly/changed-path-traversal-with-magic-pathspec: bloom: enable bloom filter with wildcard pathspec in revision traversal
2025-08-21Merge branch 'ps/remote-rename-fix'Junio C Hamano1-1/+2
"git remote rename origin upstream" failed to move origin/HEAD to upstream/HEAD when origin/HEAD is unborn and performed other renames extremely inefficiently, which has been corrected. * ps/remote-rename-fix: builtin/remote: only iterate through refs that are to be renamed builtin/remote: rework how remote refs get renamed builtin/remote: determine whether refs need renaming early on builtin/remote: fix sign comparison warnings refs: simplify logic when migrating reflog entries refs: pass refname when invoking reflog entry callback
2025-08-11bloom: enable bloom filter with wildcard pathspec in revision traversalLidong Yan1-13/+29
When traversing commits, a pathspec item can be used to limit the traversal to commits that modify the specified paths. And the commit-graph includes a Bloom filter to exclude commits that definitely did not modify a given pathspec item. During commit traversal, the Bloom filter can significantly improve performance. However, it is disabled if the specified pathspec item contains wildcard characters or magic signatures. For performance reason, enable Bloom filter even if a pathspec item contains wildcard characters by filtering only the non-wildcard part of the pathspec item. The function of pathspec magic signature is generally to narrow down the path specified by the pathspecs. So, enable Bloom filter when the magic signature is "top", "glob", "attr", "--depth" or "literal". "exclude" is used to select paths other than the specified path, rather than serving as a filtering function, so it cannot be used together with the Bloom filter. Since Bloom filter is not case insensitive even in case insensitive system (e.g. MacOS), it cannot be used together with "icase" magic. With this optimization, we get some improvements for pathspecs with wildcards or magic signatures. First, in the Git repository we see these modest results: git log -100 -- "t/*" Benchmark 1: new Time (mean ± σ): 20.4 ms ± 0.6 ms Range (min … max): 19.3 ms … 24.4 ms Benchmark 2: old Time (mean ± σ): 23.4 ms ± 0.5 ms Range (min … max): 22.5 ms … 24.7 ms git log -100 -- ":(top)t" Benchmark 1: new Time (mean ± σ): 16.2 ms ± 0.4 ms Range (min … max): 15.3 ms … 17.2 ms Benchmark 2: old Time (mean ± σ): 18.6 ms ± 0.5 ms Range (min … max): 17.6 ms … 20.4 ms But in a larger repo, such as the LLVM project repo below, we get even better results: git log -100 -- "libc/*" Benchmark 1: new Time (mean ± σ): 16.0 ms ± 0.6 ms Range (min … max): 14.7 ms … 17.8 ms Benchmark 2: old Time (mean ± σ): 26.7 ms ± 0.5 ms Range (min … max): 25.4 ms … 27.8 ms git log -100 -- ":(top)libc" Benchmark 1: new Time (mean ± σ): 15.6 ms ± 0.6 ms Range (min … max): 14.4 ms … 17.7 ms Benchmark 2: old Time (mean ± σ): 19.6 ms ± 0.5 ms Range (min … max): 18.6 ms … 20.6 ms Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Lidong Yan <yldhome2d2@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-06refs: pass refname when invoking reflog entry callbackPatrick Steinhardt1-1/+2
With `refs_for_each_reflog_ent()` callers can iterate through all the reflog entries for a given reference. The callback that is being invoked for each such entry does not receive the name of the reference that we are currently iterating through. This isn't really a limiting factor, as callers can simply pass the name via the callback data. But this layout sometimes does make for a bit of an awkward calling pattern. One example: when iterating through all reflogs, and for each reflog we iterate through all refnames, we have to do some extra book keeping to track which reference name we are currently yielding reflog entries for. Change the signature of the callback function so that the reference name of the reflog gets passed through to it. Adapt callers accordingly and start using the new parameter in trivial cases. The next commit will refactor the reference migration logic to make use of this parameter so that we can simplify its logic a bit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-04Merge branch 'ps/config-wo-the-repository'Junio C Hamano1-1/+1
The config API had a set of convenience wrapper functions that implicitly use the_repository instance; they have been removed and inlined at the calling sites. * ps/config-wo-the-repository: (21 commits) config: fix sign comparison warnings config: move Git config parsing into "environment.c" config: remove unused `the_repository` wrappers config: drop `git_config_set_multivar()` wrapper config: drop `git_config_get_multivar_gently()` wrapper config: drop `git_config_set_multivar_in_file_gently()` wrapper config: drop `git_config_set_in_file_gently()` wrapper config: drop `git_config_set()` wrapper config: drop `git_config_set_gently()` wrapper config: drop `git_config_set_in_file()` wrapper config: drop `git_config_get_bool()` wrapper config: drop `git_config_get_ulong()` wrapper config: drop `git_config_get_int()` wrapper config: drop `git_config_get_string()` wrapper config: drop `git_config_get_string()` wrapper config: drop `git_config_get_string_multi()` wrapper config: drop `git_config_get_value()` wrapper config: drop `git_config_get_value()` wrapper config: drop `git_config_get()` wrapper config: drop `git_config_clear()` wrapper ...
2025-08-01Merge branch 'jk/revision-no-early-output'Junio C Hamano1-17/+0
Remove unsupported, unused, and unsupportable old option from "git log". * jk/revision-no-early-output: revision: drop early output option
2025-07-23Merge branch 'ly/changed-paths-traversal'Junio C Hamano1-62/+60
Lift the limitation to use changed-path filter in "git log" so that it can be used for a pathspec with multiple literal paths. * ly/changed-paths-traversal: bloom: optimize multiple pathspec items in revision revision: make helper for pathspec to bloom keyvec bloom: replace struct bloom_key * with struct bloom_keyvec bloom: rename function operates on bloom_key bloom: add test helper to return murmur3 hash
2025-07-23config: drop `git_config()` wrapperPatrick Steinhardt1-1/+1
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config()`. All callsites are adjusted so that they use `repo_config(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-21revision: drop early output optionJeff King1-17/+0
We added the --early-output feature long ago in cdcefbc971 (Add "--early-output" log flag for interactive GUI use, 2007-11-03). The idea was that GUIs could use it to progressively render a history view, showing something quick-and-inaccurate at first and then enhancing it later. But we never documented it, and it appears never to have been used, even by the projects which initially expressed interest. There was an RFC patch for gitk to use it: http://public-inbox.org/git/18221.2285.259487.655684@cargo.ozlabs.ibm.com/ but it was never merged. Likewise QGit had a patch in: https://lore.kernel.org/git/e5bfff550711040225ne67c907r2023b1354c35f35@mail.gmail.com/ but it was never fully merged (to this day, QGit has a commented-out line to add "--early-output" to the "log" invocation). Searching for other mentions on the web or forges like github.com turns up nothing. Meanwhile, the feature has been broken off and on over the years without anybody noticing (and naturally, there are no tests, either). From 2011 to 2017 the option didn't even turn on via "--early-output"; this was fixed in e35b6ac56f (revision.h: turn rev_info.early_output back into an unsigned int, 2017-06-10). It worked for a while then, but it does not interact well at all with commit-graphs (which are turned on by default these days). The main logic to count early commits is triggered by limit_list(), which we traditionally invoked when showing output in topo-order (and --early-output always enables --topo-order). But that changed in f0d9cc4196 (revision.c: begin refactoring --topo-order logic, 2018-11-01). Now when we have generation numbers, we skip limit_list() entirely, and the early-output code shows no commits, and just the final header "Final output: 1 done". Which is syntactically OK, but semantically wrong: that message should give the total number of commits we're about to show. So let's drop the feature. It is extra code that is untested and undocumented, and makes working on the revision machinery more brittle. Given the history above, it seems unlikely that anybody is using it (or has used it), and we can drop it without the usual deprecation period. A gentler option might be to "soft" drop it: keep accepting the option, have it imply --topo-order as it does now, print "Final output: 1 done", and then do our regular traversal. That would keep any hypothetical caller working. But it doesn't seem worth the hassle to me. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-15Merge branch 'ps/object-store'Junio C Hamano1-2/+3
Code clean-up around object access API. * ps/object-store: odb: rename `read_object_with_reference()` odb: rename `pretend_object_file()` odb: rename `has_object()` odb: rename `repo_read_object_file()` odb: rename `oid_object_info()` odb: trivial refactorings to get rid of `the_repository` odb: get rid of `the_repository` when handling submodule sources odb: get rid of `the_repository` when handling the primary source odb: get rid of `the_repository` in `for_each()` functions odb: get rid of `the_repository` when handling alternates odb: get rid of `the_repository` in `odb_mkstemp()` odb: get rid of `the_repository` in `assert_oid_type()` odb: get rid of `the_repository` in `find_odb()` odb: introduce parent pointers object-store: rename files to "odb.{c,h}" object-store: rename `object_directory` to `odb_source` object-store: rename `raw_object_store` to `object_database`
2025-07-15bloom: optimize multiple pathspec items in revisionLidong Yan1-10/+11
To enable optimize multiple pathspec items in revision traversal, return 0 if all pathspec item is literal in forbid_bloom_filters(). Add for loops to initialize and check each pathspec item's bloom_keyvec when optimization is possible. Add new test cases in t/t4216-log-bloom.sh to ensure - consistent results between the optimization for multiple pathspec items using bloom filter and the case without bloom filter optimization. - does not use bloom filter if any pathspec item is not literal. With these optimizations, we get some improvements for multi-pathspec runs of 'git log'. First, in the Git repository we see these modest results: Benchmark 1: old Time (mean ± σ): 73.1 ms ± 2.9 ms Range (min … max): 69.9 ms … 84.5 ms 42 runs Benchmark 2: new Time (mean ± σ): 55.1 ms ± 2.9 ms Range (min … max): 51.1 ms … 61.2 ms 52 runs Summary 'new' ran 1.33 ± 0.09 times faster than 'old' But in a larger repo, such as the LLVM project repo below, we get even better results: Benchmark 1: old Time (mean ± σ): 1.974 s ± 0.006 s Range (min … max): 1.960 s … 1.983 s 10 runs Benchmark 2: new Time (mean ± σ): 262.9 ms ± 2.4 ms Range (min … max): 257.7 ms … 266.2 ms 11 runs Summary 'new' ran 7.51 ± 0.07 times faster than 'old' Signed-off-by: Derrick Stolee <stolee@gmail.com> [ly: rename convert_pathspec_to_filter() to convert_pathspec_to_bloom_keyvec()] Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-14Merge branch 'jk/all-negative-diff-filter-fix'Junio C Hamano1-1/+1
A diff-filter with negative-only specification like "git log --diff-filter=d" did not trigger correctly, which has been fixed. * jk/all-negative-diff-filter-fix: setup_revisions(): turn on diffs for all-negative diff filter
2025-07-14revision: make helper for pathspec to bloom keyvecLidong Yan1-16/+29
When preparing to use bloom filters in a revision walk, Git populates a boom_keyvec with an array of bloom keys for the components of a path. Before we create the ability to map multiple pathspecs to multiple bloom_keyvecs, extract the conversion from a pathspec to a bloom_keyvec into its own helper method. This simplifies the state that persists in prepare_to_use_bloom_filter() as well as makes the future change much simpler. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-14bloom: replace struct bloom_key * with struct bloom_keyvecLidong Yan1-46/+30
Previously, we stored bloom keys in a flat array and marked a commit as NOT TREESAME if any key reported "definitely not changed". To support multiple pathspec items, we now require that for each pathspec item, there exists a bloom key reporting "definitely not changed". This "for every" condition makes a flat array insufficient, so we introduce a new structure to group keys by a single pathspec item. `struct bloom_keyvec` is introduced to replace `struct bloom_key *` and `bloom_key_nr`. And because we want to support multiple pathspec items, we added a bloom_keyvec * and a bloom_keyvec_nr field to `struct rev_info` to represent an array of bloom_keyvecs. This commit still optimize only one pathspec item, thus bloom_keyvec_nr can only be 0 or 1. New bloom_keyvec_* functions are added to create and destroy a keyvec. bloom_filter_contains_vec() is added to check if all key in keyvec is contained in a bloom filter. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-14bloom: rename function operates on bloom_keyLidong Yan1-4/+4
git code style requires that functions operating on a struct S should be named in the form S_verb. However, the functions operating on struct bloom_key do not follow this convention. Therefore, fill_bloom_key() and clear_bloom_key() are renamed to bloom_key_fill() and bloom_key_clear(), respectively. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-07setup_revisions(): turn on diffs for all-negative diff filterJeff King1-1/+1
When the user gives us a diff filter like --diff-filter=D, we need to do a tree diff even if we're not planning to show the diff result itself, in order to decide whether to show the commit at all. So there's an explicit check of revs->diffopt.filter in setup_revisions(), and we set revs->diff if any bits are set. Originally that "filter" field covered both positive capital-letter filters (like "D") and also negative lowercase filters (like "d"), so it was sufficient for both cases. But later, 75408ca949 (diff-filter: be more careful when looking for negative bits, 2022-01-28) split the negative bits out into a "filter_not" field. We eventually fold those into "filter", but not until diff_setup_done() is called, which happens after our explicit check. As a result, a purely negative filter like: git log --diff-filter=d failed to turn on diffs at all. But rather than fail to filter by diff, because the filter variable is eventually set, we mistakenly show no commits at all, thinking that the empty diffs were cases where nothing passed through the filter. The smallest fix here is to just have our check look for any bits in either "filter" or "filter_not". I suspect it would also be OK to reorder the function a bit to call diff_setup_done() earlier, but that risks violating some other subtle ordering dependency. So I went with the simple and safe solution here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: get rid of `the_repository` in `for_each()` functionsPatrick Steinhardt1-1/+2
There are a couple of iterator-style functions that execute a callback for each instance of a given set, all of which currently depend on `the_repository`. Refactor them to instead take an object database as parameter so that we can get rid of this dependency. Rename the functions accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01object-store: rename files to "odb.{c,h}"Patrick Steinhardt1-1/+1
In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-24Merge branch 'ly/prepare-show-merge-leakfix'Junio C Hamano1-0/+1
Leakfix. * ly/prepare-show-merge-leakfix: revision: fix memory leak in prepare_show_merge()
2025-06-09revision: fix memory leak in prepare_show_merge()Lidong Yan1-0/+1
In revision.c:prepare_show_merge(), we allocated an array in prune but forget to free it. Since parse_pathspec is not responsible to free prune, we should add `free(prune)` in the end of prepare_show_merge(). Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-24Merge branch 'ps/object-file-cleanup'Junio C Hamano1-1/+1
Code clean-up. * ps/object-file-cleanup: object-store: merge "object-store-ll.h" and "object-store.h" object-store: remove global array of cached objects object: split out functions relating to object store subsystem object-file: drop `index_blob_stream()` object-file: split up concerns of `HASH_*` flags object-file: split out functions relating to object store subsystem object-file: move `xmmap()` into "wrapper.c" object-file: move `git_open_cloexec()` to "compat/open.c" object-file: move `safe_create_leading_directories()` into "path.c" object-file: move `mkdir_in_gitdir()` into "path.c"
2025-04-23Merge branch 'mh/left-right-limited'Junio C Hamano1-0/+2
"git log --{left,right}-only A...B", when A and B does not share any common ancestor, now behaves as expected. * mh/left-right-limited: revision: fix --left/right-only use with unrelated histories
2025-04-16Merge branch 'jt/rev-list-z'Junio C Hamano1-8/+0
"git rev-list" learns machine-parsable output format that delimits each field with NUL. * jt/rev-list-z: rev-list: support NUL-delimited --missing option rev-list: support NUL-delimited --boundary option rev-list: support delimiting objects with NUL bytes rev-list: refactor early option parsing rev-list: inline `show_object_with_name()` in `show_object()`
2025-04-15object-store: merge "object-store-ll.h" and "object-store.h"Patrick Steinhardt1-1/+1
The "object-store-ll.h" header has been introduced to keep transitive header dependendcies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-01revision: fix --left/right-only use with unrelated historiesMatt Hunter1-0/+2
This is a similar fix as 023756f4eb (revision walker: --cherry-pick is a limited operation), but for the --left-only and --right-only options. When computing a symmetric difference between two unrelated histories, no suitable merge base exists, and so no boundary commit is flagged as UNINTERESTING. Previously, we relied on the presence of such boundary to trigger limiting and thus consideration of either "revs->left_only" or "revs->right_only". A number of other entries in the option parser have started including overrides for "revs->limited = 1". Do the same for these options. Signed-off-by: Matt Hunter <m@lfurio.us> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21rev-list: inline `show_object_with_name()` in `show_object()`Justin Tobler1-8/+0
The `show_object_with_name()` function only has a single call site. Inline call to `show_object_with_name()` in `show_object()` so the explicit function can be cleaned up and live closer to where it is used. While at it, factor out the code that prints the OID and newline for both objects with and without a name. In a subsequent commit, `show_object()` is modified to support printing object information in a NUL-delimited format. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10object: stop depending on `the_repository`Patrick Steinhardt1-1/+2
There are a couple of functions exposed by "object.c" that implicitly depend on `the_repository`. Remove this dependency by injecting the repository via a parameter. Adapt callers accordingly by simply using `the_repository`, except in cases where the subsystem is already free of the repository. In that case, we instead pass the repository provided by the caller's context. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07worktree: return allocated string from `get_worktree_git_dir()`Patrick Steinhardt1-1/+6
The `get_worktree_git_dir()` function returns a string constant that does not need to be free'd by the caller. This string is computed for three different cases: - If we don't have a worktree we return a path into the Git directory. The returned string is owned by `the_repository`, so there is no need for the caller to free it. - If we have a worktree, but no worktree ID then the caller requests the main worktree. In this case we return a path into the common directory, which again is owned by `the_repository` and thus does not need to be free'd. - In the third case, where we have an actual worktree, we compute the path relative to "$GIT_COMMON_DIR/worktrees/". This string does not need to be released either, even though `git_common_path()` ends up allocating memory. But this doesn't result in a memory leak either because we write into a buffer returned by `get_pathname()`, which returns one out of four static buffers. We're about to drop `git_common_path()` in favor of `repo_common_path()`, which doesn't use the same mechanism but instead returns an allocated string owned by the caller. While we could adapt `get_worktree_git_dir()` to also use `get_pathname()` and print the derived common path into that buffer, the whole schema feels a lot like premature optimization in this context. There are some callsites where we call `get_worktree_git_dir()` in a loop that iterates through all worktrees. But none of these loops seem to be even remotely in the hot path, so saving a single allocation there does not feel worth it. Refactor the function to instead consistently return an allocated path so that we can start using `repo_common_path()` in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-23Merge branch 'ps/build-sign-compare'Junio C Hamano1-0/+1
Start working to make the codebase buildable with -Wsign-compare. * ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-13Merge branch 'kn/midx-wo-the-repository'Junio C Hamano1-6/+7
Yet another "pass the repository through the callchain" topic. * kn/midx-wo-the-repository: midx: inline the `MIDX_MIN_SIZE` definition midx: pass down `hash_algo` to functions using global variables midx: pass `repository` to `load_multi_pack_index` midx: cleanup internal usage of `the_repository` and `the_hash_algo` midx-write: pass down repository to `write_midx_file[_only]` write-midx: add repository field to `write_midx_context` midx-write: use `revs->repo` inside `read_refs_snapshot` midx-write: pass down repository to static functions packfile.c: remove unnecessary prepare_packed_git() call midx: add repository to `multi_pack_index` struct config: make `packed_git_(limit|window_size)` non-global variables config: make `delta_base_cache_limit` a non-global variable packfile: pass down repository to `for_each_packed_object` packfile: pass down repository to `has_object[_kept]_pack` packfile: pass down repository to `odb_pack_name` packfile: pass `repository` to static function in the file packfile: use `repository` from `packed_git` directly packfile: add repository to struct `packed_git`
2024-12-06global: mark code units that generate warnings with `-Wsign-compare`Patrick Steinhardt1-0/+1
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04Merge branch 'kn/the-repository' into kn/midx-wo-the-repositoryJunio C Hamano1-6/+7
* kn/the-repository: packfile.c: remove unnecessary prepare_packed_git() call midx: add repository to `multi_pack_index` struct config: make `packed_git_(limit|window_size)` non-global variables config: make `delta_base_cache_limit` a non-global variable packfile: pass down repository to `for_each_packed_object` packfile: pass down repository to `has_object[_kept]_pack` packfile: pass down repository to `odb_pack_name` packfile: pass `repository` to static function in the file packfile: use `repository` from `packed_git` directly packfile: add repository to struct `packed_git`
2024-12-04packfile: pass down repository to `for_each_packed_object`Karthik Nayak1-4/+5
The function `for_each_packed_object` currently relies on the global variable `the_repository`. To eliminate global variable usage in `packfile.c`, we should progressively shift the dependency on the_repository to higher layers. Let's remove its usage from this function and closely related function `is_promisor_object`. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04packfile: pass down repository to `has_object[_kept]_pack`Karthik Nayak1-2/+2
The functions `has_object[_kept]_pack` currently rely on the global variable `the_repository`. To eliminate global variable usage in `packfile.c`, we should progressively shift the dependency on the_repository to higher layers. Let's remove its usage from these functions and any related ones. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21bisect: fix leaking good/bad terms when reading multipe timesPatrick Steinhardt1-2/+2
Even though `read_bisect_terms()` is declared as assigning string constants, it in fact assigns allocated strings to the `read_bad` and `read_good` out parameters. The only callers of this function assign the result to global variables and thus don't have to free them in order to be leak-free. But that changes when executing the function multiple times because we'd then overwrite the previous value and thus make it unreachable. Fix the function signature and free the previous values. This leak is exposed by t0630, but plugging it does not make the whole test suite pass. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04revision: fix leaking bloom filtersPatrick Steinhardt1-0/+5
The memory allocated by `prepare_to_use_bloom_filter()` is not released by `release_revisions()`, causing a memory leak. Plug it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30revision: fix leaking saved parentsPatrick Steinhardt1-2/+10
The `saved_parents` slab is used by `--full-diff` to save parents of a commit which we are about to rewrite. We do not release its contents once it's not used anymore, causing a memory leak. Plug it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30revision: fix memory leaks when rewriting parentsPatrick Steinhardt1-0/+2
Both `rewrite_parents()` and `remove_duplicate_parents()` may end up dropping some parents from a commit without freeing the respective `struct commit_list` items. This causes a bunch of memory leaks. Plug these. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-27revision: fix leaking parents when simplifying commitsPatrick Steinhardt1-0/+5
When simplifying commits, e.g. because they are treesame with their parents, we unset the commit's parent pointers but never free them. Plug the resulting memory leaks. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-10Merge branch 'jk/free-commit-buffer-of-skipped-commits'Junio C Hamano1-0/+1
The code forgot to discard unnecessary in-core commit buffer data for commits that "git log --skip=<number>" traversed but omitted from the output, which has been corrected. * jk/free-commit-buffer-of-skipped-commits: revision: free commit buffers for skipped commits
2024-08-30revision: free commit buffers for skipped commitsJeff King1-0/+1
In git-log we leave the save_commit_buffer flag set to "1", which tells the commit parsing code to store the object content after it has parsed it to find parents, tree, etc. That lets us reuse the contents for pretty-printing the commit in the output. And then after printing each commit, we call free_commit_buffer(), since we don't need it anymore. But some options may cause us to traverse commits which are not part of the output. And so git-log does not see them at all, and doesn't free them. One such case is something like: git log -n 1000 --skip=1000000 which will churn through a million commits, before showing only a thousand. We loop through these inside get_revision(), without freeing the contents. As a result, we end up storing the object data for those million commits simultaneously. We should free the stored buffers (if any) for those commits as we skip over them, which is what this patch does. Running the above command in linux.git drops the peak heap usage from ~1.1GB to ~200MB, according to valgrind/massif. (I thought we might get an even bigger improvement, but the remaining memory is going to commit/tree structs, which we do hold on to forever). Note that this problem doesn't occur if: - you're running a git-rev-list without a --format parameter; it turns off save_commit_buffer by default, since it only output the object id - you've built a commit-graph file, since in that case we'd use the optimized graph data instead of the initial parse, and then do a lazy parse for commits we're actually going to output There are probably some other option combinations that can likewise end up with useless stored commit buffers. For example, if you ask for "foo..bar", then we'll have to walk down to the merge base, and everything on the "foo" side won't be shown. Tuning the "save" behavior to handle that might be tricky (I guess maybe drop buffers for anything we mark as UNINTERESTING?). And in the long run, the right solution here is probably to make sure the commit-graph is built (since it fixes the memory problem _and_ drastically reduces CPU usage). But since this "--skip" case is an easy one-liner, it's worth fixing in the meantime. It should be OK to make this call even if there is no saved buffer (e.g., because save_commit_buffer=0, or because a commit-graph was used), since it's O(1) to look up the buffer and is a noop if it isn't present. I verified by running the above command after "git commit-graph write --reachable", and it takes the same time with and without this patch. Reported-by: Yuri Karnilaev <karnilaev@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-23Merge branch 'ps/config-wo-the-repository'Junio C Hamano1-1/+1
Use of API functions that implicitly depend on the_repository object in the config subsystem has been rewritten to pass a repository object through the callchain. * ps/config-wo-the-repository: config: hide functions using `the_repository` by default global: prepare for hiding away repo-less config functions config: don't depend on `the_repository` with branch conditions config: don't have setters depend on `the_repository` config: pass repo to functions that rename or copy sections config: pass repo to `git_die_config()` config: pass repo to `git_config_get_expiry_in_days()` config: pass repo to `git_config_get_expiry()` config: pass repo to `git_config_get_max_percent_split_change()` config: pass repo to `git_config_get_split_index()` config: pass repo to `git_config_get_index_threads()` config: expose `repo_config_clear()` config: introduce missing setters that take repo as parameter path: hide functions using `the_repository` by default path: stop relying on `the_repository` in `worktree_git_path()` path: stop relying on `the_repository` when reporting garbage hooks: remove implicit dependency on `the_repository` editor: do not rely on `the_repository` for interactive edits path: expose `do_git_common_path()` as `repo_common_pathv()` path: expose `do_git_path()` as `repo_git_pathv()`
2024-08-13path: stop relying on `the_repository` in `worktree_git_path()`Patrick Steinhardt1-1/+1
When not provided a worktree, then `worktree_git_path()` will fall back to returning a path relative to the main repository. In this case, we implicitly rely on `the_repository` to derive the path. Remove this dependency by passing a `struct repository` as parameter. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-09refs: add referent to each_ref_fnJohn Cai1-1/+1
Add a parameter to each_ref_fn so that callers to the ref APIs that use this function as a callback can have acess to the unresolved value of a symbolic ref. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-08Merge branch 'ps/leakfixes-more'Junio C Hamano1-22/+37
More memory leaks have been plugged. * ps/leakfixes-more: (29 commits) builtin/blame: fix leaking ignore revs files builtin/blame: fix leaking prefixed paths blame: fix leaking data for blame scoreboards line-range: plug leaking find functions merge: fix leaking merge bases builtin/merge: fix leaking `struct cmdnames` in `get_strategy()` sequencer: fix memory leaks in `make_script_with_merges()` builtin/clone: plug leaking HEAD ref in `wanted_peer_refs()` apply: fix leaking string in `match_fragment()` sequencer: fix leaking string buffer in `commit_staged_changes()` commit: fix leaking parents when calling `commit_tree_extended()` config: fix leaking "core.notesref" variable rerere: fix various trivial leaks builtin/stash: fix leak in `show_stash()` revision: free diff options builtin/log: fix leaking commit list in git-cherry(1) merge-recursive: fix memory leak when finalizing merge builtin/merge-recursive: fix leaking object ID bases builtin/difftool: plug memory leaks in `run_dir_diff()` object-name: free leaking object contexts ...
2024-07-08Merge branch 'tb/path-filter-fix'Junio C Hamano1-4/+22
The Bloom filter used for path limited history traversal was broken on systems whose "char" is unsigned; update the implementation and bump the format version to 2. * tb/path-filter-fix: bloom: introduce `deinit_bloom_filters()` commit-graph: reuse existing Bloom filters where possible object.h: fix mis-aligned flag bits table commit-graph: new Bloom filter version that fixes murmur3 commit-graph: unconditionally load Bloom filters bloom: prepare to discard incompatible Bloom filters bloom: annotate filters with hash version repo-settings: introduce commitgraph.changedPathsVersion t4216: test changed path filters with high bit paths t/helper/test-read-graph: implement `bloom-filters` mode bloom.h: make `load_bloom_filter_from_graph()` public t/helper/test-read-graph.c: extract `dump_graph_info()` gitformat-commit-graph: describe version 2 of BDAT commit-graph: ensure Bloom filters are read with consistent settings revision.c: consult Bloom filters for root commits t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
2024-07-02Merge branch 'ps/use-the-repository'Junio C Hamano1-0/+2
A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help transition the codebase to rely less on the availability of the singleton the_repository instance. * ps/use-the-repository: hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE` t/helper: remove dependency on `the_repository` in "proc-receive" t/helper: fix segfault in "oid-array" command without repository t/helper: use correct object hash in partial-clone helper compat/fsmonitor: fix socket path in networked SHA256 repos replace-object: use hash algorithm from passed-in repository protocol-caps: use hash algorithm from passed-in repository oidset: pass hash algorithm when parsing file http-fetch: don't crash when parsing packfile without a repo hash-ll: merge with "hash.h" refs: avoid include cycle with "repository.h" global: introduce `USE_THE_REPOSITORY_VARIABLE` macro hash: require hash algorithm in `empty_tree_oid_hex()` hash: require hash algorithm in `is_empty_{blob,tree}_oid()` hash: make `is_null_oid()` independent of `the_repository` hash: convert `oidcmp()` and `oideq()` to compare whole hash global: ensure that object IDs are always padded hash: require hash algorithm in `oidread()` and `oidclr()` hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()` hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
2024-06-25revision.c: consult Bloom filters for root commitsTaylor Blau1-4/+22
The commit-graph stores changed-path Bloom filters which represent the set of paths included in a tree-level diff between a commit's root tree and that of its parent. When a commit has no parents, the tree-diff is computed against that commit's root tree and the empty tree. In other words, every path in that commit's tree is stored in the Bloom filter (since they all appear in the diff). Consult these filters during pathspec-limited traversals in the function `rev_same_tree_as_empty()`. Doing so yields a performance improvement where we can avoid enumerating the full set of paths in a parentless commit's root tree when we know that the path(s) of interest were not listed in that commit's changed-path Bloom filter. Suggested-by: SZEDER Gábor <szeder.dev@gmail.com> Original-patch-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-24Merge branch 'jc/worktree-git-path'Junio C Hamano1-0/+1
Code cleanup. * jc/worktree-git-path: worktree_git_path(): move the declaration to path.h
2024-06-14global: introduce `USE_THE_REPOSITORY_VARIABLE` macroPatrick Steinhardt1-0/+2
Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11revision: free diff optionsPatrick Steinhardt1-1/+1
There is a todo comment in `release_revisions()` that mentions that we need to free the diff options, which was added via 54c8a7c379 (revisions API: add a TODO for diff_free(&revs->diffopt), 2022-04-14). Releasing the diff options wasn't quite feasible at that time because some call sites rely on its contents to remain even after the revisions have been released. In fact, there really only are a couple of callsites that misbehave here: - `cmd_shortlog()` releases the revisions, but continues to access its file pointer. - `do_diff_cache()` creates a shallow copy of `struct diff_options`, but does not set the `no_free` member. Consequently, we end up releasing resources of the caller-provided diff options. - `diff_free()` and friends do not play nice when being called multiple times as they don't unset data structures that they have just released. Fix all of those cases and enable the call to `diff_free()`, which plugs a bunch of memory leaks. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11object-name: free leaking object contextsPatrick Steinhardt1-21/+34
While it is documented in `struct object_context::path` that this variable needs to be released by the caller, this fact is rather easy to miss given that we do not ever provide a function to release the object context. And of course, while some callers dutifully release the path, many others don't. Introduce a new `object_context_release()` function that releases the path. Convert callsites that used to free the path to use that new function and add missing calls to callsites that were leaking memory. Refactor those callsites as required to have a single return path, only. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11revision: fix leaking display notesPatrick Steinhardt1-0/+1
We never free the display notes options embedded into `struct revision`. Implement a new function `release_display_notes()` that we can call in `release_revisions()` to fix this. There is another gotcha here though: we play some games with the string list used to track extra notes refs, where we sometimes set the bit that indicates that strings should be strdup'd and sometimes unset it. This dance is done to avoid a copy of an already-allocated string when we call `enable_ref_display_notes()`. But this dance is rather pointless as we can instead call `string_list_append_nodup()` to transfer ownership of the allocated string to the list. Refactor the code to do so and drop the `strdup_strings` dance. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11revision: fix memory leak when reversing revisionsPatrick Steinhardt1-0/+1
When reversing revisions in a rev walk, `get_revision()` will allocate a new commit list and assign it to `revs->commits`. It does not free the old list though, which makes it leak. Fix this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-08worktree_git_path(): move the declaration to path.hJunio C Hamano1-0/+1
The definition of this function is in path.c but its declaration is in worktree.h, which is something unexpected. The function is explained as "Similar to git_path()"; declaring it next to where git_path() is declared would make more sense. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07revision: always store allocated strings in output encodingPatrick Steinhardt1-1/+2
The `git_log_output_encoding` variable can be set via the `--encoding=` option. When doing so, we conditionally either assign it to the passed value, or if the value is "none" we assign it the empty string. Depending on which of the both code paths we pick though, the variable may end up being assigned either an allocated string or a string constant. This is somewhat risky and may easily lead to bugs when a different code path may want to reassign a new value to it, freeing the previous value. We already to this when parsing the "i18n.logoutputencoding" config in `git_default_i18n_config()`. But because the config is typically parsed before we parse command line options this has been fine so far. Regardless of that, safeguard the code such that the variable always contains an allocated string. While at it, also free the old value in case there was any to plug a potential memory leak. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07cocci: apply rules to rewrite callers of "refs" interfacesPatrick Steinhardt1-9/+18
Apply the rules that rewrite callers of "refs" interfaces to explicitly pass `struct ref_store`. The resulting patch has been applied with the `--whitespace=fix` option. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-28Merge branch 'eb/hash-transition'Junio C Hamano1-2/+2
Work to support a repository that work with both SHA-1 and SHA-256 hash algorithms has started. * eb/hash-transition: (30 commits) t1016-compatObjectFormat: add tests to verify the conversion between objects t1006: test oid compatibility with cat-file t1006: rename sha1 to oid test-lib: compute the compatibility hash so tests may use it builtin/ls-tree: let the oid determine the output algorithm object-file: handle compat objects in check_object_signature tree-walk: init_tree_desc take an oid to get the hash algorithm builtin/cat-file: let the oid determine the output algorithm rev-parse: add an --output-object-format parameter repository: implement extensions.compatObjectFormat object-file: update object_info_extended to reencode objects object-file-convert: convert commits that embed signed tags object-file-convert: convert commit objects when writing object-file-convert: don't leak when converting tag objects object-file-convert: convert tag objects when writing object-file-convert: add a function to convert trees between algorithms object: factor out parse_mode out of fast-import and tree-walk into in object.h cache: add a function to read an OID of a specific algorithm tag: sign both hashes commit: export add_header_signature to support handling signatures on tags ...
2024-03-15Merge branch 'as/option-names-in-messages'Junio C Hamano1-1/+1
Error message updates. * as/option-names-in-messages: revision.c: trivial fix to message builtin/clone.c: trivial fix of message builtin/remote.c: trivial fix of error message transport-helper.c: trivial fix of error message
2024-03-11Merge branch 'js/merge-base-with-missing-commit'Junio C Hamano1-4/+8
Make sure failure return from merge_bases_many() is properly caught. * js/merge-base-with-missing-commit: merge-ort/merge-recursive: do report errors in `merge_submodule()` merge-recursive: prepare for `merge_submodule()` to report errors commit-reach(repo_get_merge_bases_many_dirty): pass on errors commit-reach(repo_get_merge_bases_many): pass on "missing commits" errors commit-reach(get_octopus_merge_bases): pass on "missing commits" errors commit-reach(repo_get_merge_bases): pass on "missing commits" errors commit-reach(get_merge_bases_many_0): pass on "missing commits" errors commit-reach(merge_bases_many): pass on "missing commits" errors commit-reach(paint_down_to_common): start reporting errors commit-reach(paint_down_to_common): prepare for handling shallow commits commit-reach(repo_in_merge_bases_many): report missing commits commit-reach(repo_in_merge_bases_many): optionally expect missing commits commit-reach(paint_down_to_common): plug two memory leaks
2024-03-07Merge branch 'jk/upload-pack-bounded-resources'Junio C Hamano1-1/+2
Various parts of upload-pack has been updated to bound the resource consumption relative to the size of the repository to protect from abusive clients. * jk/upload-pack-bounded-resources: upload-pack: free tree buffers after parsing upload-pack: use PARSE_OBJECT_SKIP_HASH_CHECK in more places upload-pack: always turn off save_commit_buffer upload-pack: disallow object-info capability by default upload-pack: accept only a single packfile-uri line upload-pack: use a strmap for want-ref lines upload-pack: use oidset for deepen_not list upload-pack: switch deepen-not list to an oid_array upload-pack: drop separate v2 "haves" array
2024-03-07Merge branch 'ml/log-merge-with-cherry-pick-and-other-pseudo-heads'Junio C Hamano1-4/+23
"git log --merge" learned to pay attention to CHERRY_PICK_HEAD and other kinds of *_HEAD pseudorefs. * ml/log-merge-with-cherry-pick-and-other-pseudo-heads: revision: implement `git log --merge` also for rebase/cherry-pick/revert revision: ensure MERGE_HEAD is a ref in prepare_show_merge
2024-03-07Merge branch 'cc/rev-list-allow-missing-tips'Junio C Hamano1-5/+19
"git rev-list --missing=print" has learned to optionally take "--allow-missing-tips", which allows the objects at the starting points to be missing. * cc/rev-list-allow-missing-tips: revision: fix --missing=[print|allow*] for annotated tags rev-list: allow missing tips with --missing=[print|allow*] t6022: fix 'test' style and 'even though' typo oidset: refactor oidset_insert_from_set() revision: clarify a 'return NULL' in get_reference()
2024-03-05revision.c: trivial fix to messageAlexander Shopov1-1/+1
ancestry-path is an option, not a command - mark it as such. This brings it in sync with the rest of usages in the file Signed-off-by: Alexander Shopov <ash@kambanaria.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-29commit-reach(repo_get_merge_bases): pass on "missing commits" errorsJohannes Schindelin1-4/+8
The `merge_bases_many()` function was just taught to indicate parsing errors, and now the `repo_get_merge_bases()` function (which is also surfaced via the `repo_get_merge_bases()` macro) is aware of that, too. Naturally, there are a lot of callers that need to be adjusted now, too. Next step: adjust the callers of `get_octopus_merge_bases()`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28upload-pack: free tree buffers after parsingJeff King1-1/+2
When a client sends us a "want" or "have" line, we call parse_object() to get an object struct. If the object is a tree, then the parsed state means that tree->buffer points to the uncompressed contents of the tree. But we don't really care about it. We only really need to parse commits and tags; for trees and blobs, the important output is just a "struct object" with the correct type. But much worse, we do not ever free that tree buffer. It's not leaked in the traditional sense, in that we still have a pointer to it from the global object hash. But if the client requests many trees, we'll hold all of their contents in memory at the same time. Nobody really noticed because it's rare for clients to directly request a tree. It might happen for a lightweight tag pointing straight at a tree, or it might happen for a "tree:depth" partial clone filling in missing trees. But it's also possible for a malicious client to request a lot of trees, causing upload-pack's memory to balloon. For example, without this patch, requesting every tree in git.git like: pktline() { local msg="$*" printf "%04x%s\n" $((1+4+${#msg})) "$msg" } want_trees() { pktline command=fetch printf 0001 git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' | while read oid type; do test "$type" = "tree" || continue pktline want $oid done pktline done printf 0000 } want_trees | GIT_PROTOCOL=version=2 valgrind --tool=massif ./git upload-pack . >/dev/null shows a peak heap usage of ~3.7GB. Which is just about the sum of the sizes of all of the uncompressed trees. For linux.git, it's closer to 17GB. So the obvious thing to do is to call free_tree_buffer() after we realize that we've parsed a tree. We know that upload-pack won't need it later. But let's push the logic into parse_object_with_flags(), telling it to discard the tree buffer immediately. There are two reasons for this. One, all of the relevant call-sites already call the with_options variant to pass the SKIP_HASH flag. So it actually ends up as less code than manually free-ing in each spot. And two, it enables an extra optimization that I'll discuss below. I've touched all of the sites that currently use SKIP_HASH in upload-pack. That drops the peak heap of the upload-pack invocation above from 3.7GB to ~24MB. I've also modified the caller in get_reference(); a partial clone benefits from its use in pack-objects for the reasons given in 0bc2557951 (upload-pack: skip parse-object re-hashing of "want" objects, 2022-09-06), where we were measuring blob requests. But note that the results of get_reference() are used for traversing, as well; so we really would _eventually_ use the tree contents. That makes this at first glance a space/time tradeoff: we won't hold all of the trees in memory at once, but we'll have to reload them each when it comes time to traverse. And here's where our extra optimization comes in. If the caller is not going to immediately look at the tree contents, and it doesn't care about checking the hash, then parse_object() can simply skip loading the tree entirely, just like we do for blobs! And now it's not a space/time tradeoff in get_reference() anymore. It's just a lazy-load: we're delaying reading the tree contents until it's time to actually traverse them one by one. And of course for upload-pack, this optimization means we never load the trees at all, saving lots of CPU time. Timing the "every tree from git.git" request above shows upload-pack dropping from 32 seconds of CPU to 19 (the remainder is mostly due to pack-objects actually sending the pack; timing just the upload-pack portion shows we go from 13s to ~0.28s). These are all highly gamed numbers, of course. For real-world partial-clone requests we're saving only a small bit of time in practice. But it does help harden upload-pack against malicious denial-of-service attacks. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28revision: implement `git log --merge` also for rebase/cherry-pick/revertMichael Lohmann1-8/+23
'git log' learned in ae3e5e1ef2 (git log -p --merge [[--] paths...], 2006-07-03) to show commits touching conflicted files in the range HEAD...MERGE_HEAD, an addition documented in d249b45547 (Document rev-list's option --merge, 2006-08-04). It can be useful to look at the commit history to understand what lead to merge conflicts also for other mergy operations besides merges, like cherry-pick, revert and rebase. For rebases and cherry-picks, an interesting range to look at is HEAD...{REBASE_HEAD,CHERRY_PICK_HEAD}, since even if all the commits included in that range are not directly part of the 3-way merge, conflicts encountered during these operations can indeed be caused by changes introduced in preceding commits on both sides of the history. For revert, as we are (most likely) reversing changes from a previous commit, an appropriate range is REVERT_HEAD..HEAD, which is equivalent to REVERT_HEAD...HEAD and to HEAD...REVERT_HEAD, if we keep HEAD and its parents on the left side of the range. As such, adjust the code in prepare_show_merge so it constructs the range HEAD...$OTHER for OTHER={MERGE_HEAD, CHERRY_PICK_HEAD, REVERT_HEAD or REBASE_HEAD}. Note that we try these pseudorefs in order, so keep REBASE_HEAD last since the three other operations can be performed during a rebase. Note also that in the uncommon case where $OTHER and HEAD do not share a common ancestor, this will show the complete histories of both sides since their root commits, which is the same behaviour as currently happens in that case for HEAD and MERGE_HEAD. Adjust the documentation of this option accordingly. Co-authored-by: Johannes Sixt <j6t@kdbg.org> Co-authored-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Michael Lohmann <mi.al.lohmann@gmail.com> [jc: tweaked in j6t's precedence fix that tries REBASE_HEAD last] Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28revision: ensure MERGE_HEAD is a ref in prepare_show_mergeMichael Lohmann1-1/+5
This is done to (1) ensure MERGE_HEAD is a ref, (2) obtain the oid without any prefixing by refs.c:repo_dwim_ref() (3) error out when MERGE_HEAD is a symref. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Michael Lohmann <mi.al.lohmann@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28revision: fix --missing=[print|allow*] for annotated tagsChristian Couder1-1/+7
In 9830926c7d (rev-list: add commit object support in `--missing` option, 2023-10-27) we fixed the `--missing` option in `git rev-list` so that it works with missing commits, not just blobs/trees. Unfortunately, such a command was still failing with a "fatal: bad object <oid>" if it was passed a missing commit, blob or tree as an argument (before the rev walking even begins). This was fixed in a recent commit. That fix still doesn't work when an argument passed to the command is an annotated tag pointing to a missing commit though. In that case `git rev-list --missing=...` still errors out with a "fatal: bad object <oid>" error where <oid> is the object ID of the missing commit. Let's fix this issue, and also, while at it, let's add tests not just for annotated tags but also for regular tags and branches. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-21refs: drop unused params from the reflog iterator callbackPatrick Steinhardt1-3/+1
The ref and reflog iterators share much of the same underlying code to iterate over the corresponding entries. This results in some weird code because the reflog iterator also exposes an object ID as well as a flag to the callback function. Neither of these fields do refer to the reflog though -- they refer to the corresponding ref with the same name. This is quite misleading. In practice at least the object ID cannot really be implemented in any other way as a reflog does not have a specific object ID in the first place. This is further stressed by the fact that none of the callbacks except for our test helper make use of these fields. Split up the infrastucture so that ref and reflog iterators use separate callback signatures. This allows us to drop the nonsensical fields from the reflog iterator. Note that internally, the backends still use the same shared infra to iterate over both types. As the backends should never end up being called directly anyway, this is not much of a problem and thus kept as-is for simplicity's sake. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-14rev-list: allow missing tips with --missing=[print|allow*]Christian Couder1-3/+11
In 9830926c7d (rev-list: add commit object support in `--missing` option, 2023-10-27) we fixed the `--missing` option in `git rev-list` so that it works with with missing commits, not just blobs/trees. Unfortunately, such a command would still fail with a "fatal: bad object <oid>" if it is passed a missing commit, blob or tree as an argument (before the rev walking even begins). When such a command is used to find the dependencies of some objects, for example the dependencies of quarantined objects (see the "QUARANTINE ENVIRONMENT" section in the git-receive-pack(1) documentation), it would be better if the command would instead consider such missing objects, especially commits, in the same way as other missing objects. If, for example `--missing=print` is used, it would be nice for some use cases if the missing tips passed as arguments were reported in the same way as other missing objects instead of the command just failing. We could introduce a new option to make it work like this, but most users are likely to prefer the command to have this behavior as the default one. Introducing a new option would require another dumb loop to look for that option early, which isn't nice. Also we made `git rev-list` work with missing commits very recently and the command is most often passed commits as arguments. So let's consider this as a bug fix related to these recent changes. While at it let's add a NEEDSWORK comment to say that we should get rid of the existing ugly dumb loops that parse the `--exclude-promisor-objects` and `--missing=...` options early. Helped-by: Linus Arver <linusa@google.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-14revision: clarify a 'return NULL' in get_reference()Christian Couder1-1/+1
When we know a pointer variable is NULL, it's clearer to explicitly return NULL than to return that variable. In get_reference(), when 'object' is NULL, we already return NULL when 'revs->exclude_promisor_objects && is_promisor_object(oid)' is true, but we return 'object' when 'revs->ignore_missing' is true. Let's make the code clearer and more uniform by also explicitly returning NULL when 'revs->ignore_missing' is true. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-08Merge branch 'en/header-cleanup' into maint-2.43Junio C Hamano1-2/+0
Remove unused header "#include". * en/header-cleanup: treewide: remove unnecessary includes in source files treewide: add direct includes currently only pulled in transitively trace2/tr2_tls.h: remove unnecessary include submodule-config.h: remove unnecessary include pkt-line.h: remove unnecessary include line-log.h: remove unnecessary include http.h: remove unnecessary include fsmonitor--daemon.h: remove unnecessary includes blame.h: remove unnecessary includes archive.h: remove unnecessary include treewide: remove unnecessary includes in source files treewide: remove unnecessary includes from header files
2024-02-08Merge branch 'rs/incompatible-options-messages' into maint-2.43Junio C Hamano1-12/+15
Clean-up code that handles combinations of incompatible options. * rs/incompatible-options-messages: worktree: simplify incompatibility message for --orphan and commit-ish worktree: standardize incompatibility messages clean: factorize incompatibility message revision, rev-parse: factorize incompatibility messages about - -exclude-hidden revision: use die_for_incompatible_opt3() for - -graph/--reverse/--walk-reflogs repack: use die_for_incompatible_opt3() for -A/-k/--cruft push: use die_for_incompatible_opt4() for - -delete/--tags/--all/--mirror
2024-02-08Merge branch 'rs/i18n-cannot-be-used-together' into maint-2.43Junio C Hamano1-2/+2
Clean-up code that handles combinations of incompatible options. * rs/i18n-cannot-be-used-together: i18n: factorize even more 'incompatible options' messages
2024-02-08Merge branch 'jc/revision-parse-int' into maint-2.43Junio C Hamano1-11/+30
The command line parser for the "log" family of commands was too loose when parsing certain numbers, e.g., silently ignoring the extra 'q' in "git log -n 1q" without complaining, which has been tightened up. * jc/revision-parse-int: revision: parse integer arguments to --max-count, --skip, etc., more carefully
2024-01-08Merge branch 'en/header-cleanup'Junio C Hamano1-2/+0
Remove unused header "#include". * en/header-cleanup: treewide: remove unnecessary includes in source files treewide: add direct includes currently only pulled in transitively trace2/tr2_tls.h: remove unnecessary include submodule-config.h: remove unnecessary include pkt-line.h: remove unnecessary include line-log.h: remove unnecessary include http.h: remove unnecessary include fsmonitor--daemon.h: remove unnecessary includes blame.h: remove unnecessary includes archive.h: remove unnecessary include treewide: remove unnecessary includes in source files treewide: remove unnecessary includes from header files
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren1-2/+0
Each of these were checked with gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-20Merge branch 'rs/incompatible-options-messages'Junio C Hamano1-12/+15
Clean-up code that handles combinations of incompatible options. * rs/incompatible-options-messages: worktree: simplify incompatibility message for --orphan and commit-ish worktree: standardize incompatibility messages clean: factorize incompatibility message revision, rev-parse: factorize incompatibility messages about - -exclude-hidden revision: use die_for_incompatible_opt3() for - -graph/--reverse/--walk-reflogs repack: use die_for_incompatible_opt3() for -A/-k/--cruft push: use die_for_incompatible_opt4() for - -delete/--tags/--all/--mirror
2023-12-20Merge branch 'jc/revision-parse-int'Junio C Hamano1-11/+30
The command line parser for the "log" family of commands was too loose when parsing certain numbers, e.g., silently ignoring the extra 'q' in "git log -n 1q" without complaining, which has been tightened up. * jc/revision-parse-int: revision: parse integer arguments to --max-count, --skip, etc., more carefully
2023-12-09revision: parse integer arguments to --max-count, --skip, etc., more carefullyJunio C Hamano1-11/+30
The "rev-list" and other commands in the "log" family, being the oldest part of the system, use their own custom argument parsers, and integer values of some options are parsed with atoi(), which allows a non-digit after the number (e.g., "1q") to be silently ignored. As a natural consequence, an argument that does not begin with a digit (e.g., "q") silently becomes zero, too. Switch to use strtol_i() and parse_timestamp() appropriately to catch bogus input. Note that one may naïvely expect that --max-count, --skip, etc., to only take non-negative values, but we must allow them to also take negative values, as an escape hatch to countermand a limit set by an earlier option on the command line; the underlying variables are initialized to (-1) and "--max-count=-1", for example, is a legitimate way to reinitialize the limit. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-09revision, rev-parse: factorize incompatibility messages about - -exclude-hiddenRené Scharfe1-6/+12
Use the standard parameterized message for reporting incompatible options to report options that are not accepted in combination with --exclude-hidden. This reduces the number of strings to translate and makes the UI a bit more consistent. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-09revision: use die_for_incompatible_opt3() for - -graph/--reverse/--walk-reflogsRené Scharfe1-6/+3
The revision option --reverse is incompatible with --walk-reflogs and --graph is incompatible with both --reverse and --walk-reflogs. So they are all incompatible with each other. Use the function for checking three mutually incompatible options, die_for_incompatible_opt3(), to perform this check in one place and without repetition. This is shorter and clearer. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-27i18n: factorize even more 'incompatible options' messagesRené Scharfe1-2/+2
Continue the work of 12909b6b8a (i18n: turn "options are incompatible" into "cannot be used together", 2022-01-05) and a699367bb8 (i18n: factorize more 'incompatible options' messages, 2022-01-31) to use the same parameterized error message for reporting incompatible command line options. This reduces the number of strings to translate and makes the UI slightly more consistent. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-01rev-list: add commit object support in `--missing` optionKarthik Nayak1-2/+14
The `--missing` object option in rev-list currently works only with missing blobs/trees. For missing commits the revision walker fails with a fatal error. Let's extend the functionality of `--missing` option to also support commit objects. This is done by adding a `missing_objects` field to `rev_info`. This field is an `oidset` to which we'll add the missing commits as we encounter them. The revision walker will now continue the traversal and call `show_commit()` even for missing commits. In rev-list we can then check if the commit is a missing commit and call the existing code for parsing `--missing` objects. A scenario where this option would be used is to find the boundary objects between different object directories. Consider a repository with a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a repository, using the `--missing=print` option while disabling the alternate object directory allows us to find the boundary objects between the main and alternate object directory. Helped-by: Patrick Steinhardt <ps@pks.im> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-13Merge branch 'jk/decoration-and-other-leak-fixes'Junio C Hamano1-0/+9
Leakfix. * jk/decoration-and-other-leak-fixes: daemon: free listen_addr before returning revision: clear decoration structs during release_revisions() decorate: add clear_decoration() function
2023-10-05revision: clear decoration structs during release_revisions()Jeff King1-0/+9
The point of release_revisions() is to free memory associated with the rev_info struct, but we have several "struct decoration" members that are left untouched. Since the previous commit introduced a function to do that, we can just call it. We do have to provide some specialized callbacks to map the void pointers onto real ones (the alternative would be casting the existing function pointers; this generally works because "void *" is usually interchangeable with a struct pointer, but it is technically forbidden by the standard). Since the line-log code does not expose the type it stores in the decoration (nor of course the function to free it), I put this behind a generic line_log_free() entry point. It's possible we may need to add more line-log specific bits anyway (running t4211 shows a number of other leaks in the line-log code). While this doubtless cleans up many leaks triggered by the test suite, the only script which becomes leak-free is t4217, as it does very little beyond a simple traversal (its existing leak was from the use of --children, which is now fixed). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-04Merge branch 'ps/revision-cmdline-stdin-not'Junio C Hamano1-5/+5
"git rev-list --stdin" learned to take non-revisions (like "--not") recently from the standard input, but the way such a "--not" was handled was quite confusing, which has been rethought. This is potentially a change that breaks backward compatibility. * ps/revision-cmdline-stdin-not: revision: make pseudo-opt flags read via stdin behave consistently
2023-10-02tree-walk: init_tree_desc take an oid to get the hash algorithmEric W. Biederman1-2/+2
To make it possible for git ls-tree to display the tree encoded in the hash algorithm of the oid specified to git ls-tree, update init_tree_desc to take as a parameter the oid of the tree object. Update all callers of init_tree_desc and init_tree_desc_gently to pass the oid of the tree object. Use the oid of the tree object to discover the hash algorithm of the oid and store that hash algorithm in struct tree_desc. Use the hash algorithm in decode_tree_entry and update_tree_entry_internal to handle reading a tree object encoded in a hash algorithm that differs from the repositories hash algorithm. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-25revision: make pseudo-opt flags read via stdin behave consistentlyPatrick Steinhardt1-5/+5
When reading revisions from stdin via git-rev-list(1)'s `--stdin` option then these revisions never honor flags like `--not` which have been passed on the command line. Thus, an invocation like e.g. `git rev-list --all --not --stdin` will not treat all revisions read from stdin as uninteresting. While this behaviour may be surprising to a user, it's been this way ever since it has been introduced via 42cabc341c4 (Teach rev-list an option to read revs from the standard input., 2006-09-05). With that said, in c40f0b7877 (revision: handle pseudo-opts in `--stdin` mode, 2023-06-15) we have introduced a new mode to read pseudo opts from standard input where this behaviour is a lot more confusing. If you pass `--not` via stdin, it will: - Influence subsequent revisions or pseudo-options passed on the command line. - Influence pseudo-options passed via standard input. - _Not_ influence normal revisions passed via standard input. This behaviour is extremely inconsistent and bound to cause confusion. While it would be nice to retroactively change the behaviour for how `--not` and `--stdin` behave together, chances are quite high that this would break existing scripts that expect the current behaviour that has been around for many years by now. This is thus not really a viable option to explore to fix the inconsistency. Instead, we change the behaviour of how pseudo-opts read via standard input influence the flags such that the effect is fully localized. With this change, when reading `--not` via standard input, it will: - _Not_ influence subsequent revisions or pseudo-options passed on the command line, which is a change in behaviour. - Influence pseudo-options passed via standard input. - Influence normal revisions passed via standard input, which is a change in behaviour. Thus, all flags read via standard input are fully self-contained to that standard input, only. While this is a breaking change as well, the behaviour has only been recently introduced with Git v2.42.0. Furthermore, the current behaviour can be regarded as a simple bug. With that in mind it feels like the right thing to retroactively change it and make the behaviour sane. Signed-off-by: Patrick Steinhardt <ps@pks.im> Reported-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-19range-diff: treat notes like `log`Kristoffer Haugsbakk1-0/+7
Currently, `range-diff` shows the default notes if no notes-related arguments are given. This is also how `log` behaves. But unlike `range-diff`, `log` does *not* show the default notes if `--notes=<custom>` are given. In other words, this: git log --notes=custom is equivalent to this: git log --no-notes --notes=custom While: git range-diff --notes=custom acts like this: git log --notes --notes-custom This can’t be how the user expects `range-diff` to behave given that the man page for `range-diff` under `--[no-]notes[=<ref>]` says: > This flag is passed to the `git log` program (see git-log(1)) that > generates the patches. This behavior also affects `format-patch` since it uses `range-diff` for the cover letter. Unlike `log`, though, `format-patch` is not supposed to show the default notes if no notes-related arguments are given.[1] But this promise is broken when the range-diff happens to have something to say about the changes to the default notes, since that will be shown in the cover letter. Remedy this by introducing `--show-notes-by-default` that `range-diff` can use to tell the `log` subprocess what to do. § Authors • Fix by Johannes • Tests by Kristoffer † 1: See e.g. 66b2ed09c2 (Fix "log" family not to be too agressive about showing notes, 2010-01-20). Co-authored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-25Merge branch 'jk/unused-parameter'Junio C Hamano1-1/+1
Mark-up unused parameters in the code so that we can eventually enable -Wunused-parameter by default. * jk/unused-parameter: t/helper: mark unused callback void data parameters tag: mark unused parameters in each_tag_name_fn callbacks rev-parse: mark unused parameter in for_each_abbrev callback replace: mark unused parameter in each_mergetag_fn callback replace: mark unused parameter in ref callback merge-tree: mark unused parameter in traverse callback fsck: mark unused parameters in various fsck callbacks revisions: drop unused "opt" parameter in "tweak" callbacks count-objects: mark unused parameter in alternates callback am: mark unused keep_cr parameters http-push: mark unused parameter in xml callback http: mark unused parameters in curl callbacks do_for_each_ref_helper(): mark unused repository parameter test-ref-store: drop unimplemented reflog-expire command
2023-07-21Merge branch 'tb/refs-exclusion-and-packed-refs'Junio C Hamano1-2/+2
Enumerating refs in the packed-refs file, while excluding refs that match certain patterns, has been optimized. * tb/refs-exclusion-and-packed-refs: ls-refs.c: avoid enumerating hidden refs where possible upload-pack.c: avoid enumerating hidden refs where possible builtin/receive-pack.c: avoid enumerating hidden references refs.h: implement `hidden_refs_to_excludes()` refs.h: let `for_each_namespaced_ref()` take excluded patterns revision.h: store hidden refs in a `strvec` refs/packed-backend.c: add trace2 counters for jump list refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) refs/packed-backend.c: refactor `find_reference_location()` refs: plumb `exclude_patterns` argument throughout builtin/for-each-ref.c: add `--exclude` option ref-filter.c: parameterize match functions over patterns ref-filter: add `ref_filter_clear()` ref-filter: clear reachable list pointers after freeing ref-filter.h: provide `REF_FILTER_INIT` refs.c: rename `ref_filter`
2023-07-17Merge branch 'cw/compat-util-header-cleanup'Junio C Hamano1-1/+0
Further shuffling of declarations across header files to streamline file dependencies. * cw/compat-util-header-cleanup: git-compat-util: move alloc macros to git-compat-util.h treewide: remove unnecessary includes for wrapper.h kwset: move translation table from ctype sane-ctype.h: create header for sane-ctype macros git-compat-util: move wrapper.c funcs to its header git-compat-util: move strbuf.c funcs to its header
2023-07-13revisions: drop unused "opt" parameter in "tweak" callbacksJeff King1-1/+1
The setup_revision_opt struct has a "tweak" function pointer, which can be used to adjust parameters after setup_revisions() parses arguments, but before it finalizes setup. In addition to the rev_info struct, the callback receives a pointer to the setup_revision_opt, as well. But none of the existing callbacks looks at the extra "opt" parameter, leading to -Wunused-parameter warnings. We could mark it as UNUSED, but instead let's remove it entirely. It's conceivable that it could be useful for a callback to have access to the "opt" struct. But in the 13 years that this mechanism has existed, nobody has used it. So let's just drop it in the name of simplifying. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10revision.h: store hidden refs in a `strvec`Taylor Blau1-1/+1
In subsequent commits, it will be convenient to have a 'const char **' of hidden refs (matching `transfer.hiderefs`, `uploadpack.hideRefs`, etc.), instead of a `string_list`. Convert spots throughout the tree that store the list of hidden refs from a `string_list` to a `strvec`. Note that in `parse_hide_refs_config()` there is an ugly const-cast used to avoid an extra copy of each value before trimming any trailing slash characters. This could instead be written as: ref = xstrdup(value); len = strlen(ref); while (len && ref[len - 1] == '/') ref[--len] = '\0'; strvec_push(hide_refs, ref); free(ref); but the double-copy (once when calling `xstrdup()`, and another via `strvec_push()`) is wasteful. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10refs: plumb `exclude_patterns` argument throughoutTaylor Blau1-1/+1
The subsequent patch will want to access an optional `excluded_patterns` array within `refs/packed-backend.c` that will cull out certain references matching any of the given patterns on a best-effort basis. To do so, the refs subsystem needs to be updated to pass this value across a number of different locations. Prepare for a future patch by introducing this plumbing now, passing NULLs at top-level APIs in order to make that patch less noisy and more easily readable. Signed-off-by: Taylor Blau <me@ttaylorr.co> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-06Merge branch 'gc/config-context'Junio C Hamano1-1/+3
Reduce reliance on a global state in the config reading API. * gc/config-context: config: pass source to config_parser_event_fn_t config: add kvi.path, use it to evaluate includes config.c: remove config_reader from configsets config: pass kvi to die_bad_number() trace2: plumb config kvi config.c: pass ctx with CLI config config: pass ctx with config files config.c: pass ctx in configsets config: add ctx arg to config_fn_t urlmatch.h: use config_fn_t type config: inline git_color_default_config
2023-07-05git-compat-util: move alloc macros to git-compat-util.hCalvin Wan1-1/+0
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for dynamic array allocation. Moving these macros to git-compat-util.h with the other alloc macros focuses alloc.[ch] to allocation for Git objects and additionally allows us to remove inclusions to alloc.h from files that solely used the above macros. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-04Merge branch 'ps/revision-stdin-with-options'Junio C Hamano1-34/+48
The set-up code for the get_revision() API now allows feeding options like --all and --not in the --stdin mode. * ps/revision-stdin-with-options: revision: handle pseudo-opts in `--stdin` mode revision: small readability improvement for reading from stdin revision: reorder `read_revisions_from_stdin()`
2023-06-28config: add ctx arg to config_fn_tGlen Choo1-1/+3
Add a new "const struct config_context *ctx" arg to config_fn_t to hold additional information about the config iteration operation. config_context has a "struct key_value_info kvi" member that holds metadata about the config source being read (e.g. what kind of config source it is, the filename, etc). In this series, we're only interested in .kvi, so we could have just used "struct key_value_info" as an arg, but config_context makes it possible to add/adjust members in the future without changing the config_fn_t signature. We could also consider other ways of organizing the args (e.g. moving the config name and value into config_context or key_value_info), but in my experiments, the incremental benefit doesn't justify the added complexity (e.g. a config_fn_t will sometimes invoke another config_fn_t but with a different config value). In subsequent commits, the .kvi member will replace the global "struct config_reader" in config.c, making config iteration a global-free operation. It requires much more work for the machinery to provide meaningful values of .kvi, so for now, merely change the signature and call sites, pass NULL as a placeholder value, and don't rely on the arg in any meaningful way. Most of the changes are performed by contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every config_fn_t: - Modifies the signature to accept "const struct config_context *ctx" - Passes "ctx" to any inner config_fn_t, if needed - Adds UNUSED attributes to "ctx", if needed Most config_fn_t instances are easily identified by seeing if they are called by the various config functions. Most of the remaining ones are manually named in the .cocci patch. Manual cleanups are still needed, but the majority of it is trivial; it's either adjusting config_fn_t that the .cocci patch didn't catch, or adding forward declarations of "struct config_context ctx" to make the signatures make sense. The non-trivial changes are in cases where we are invoking a config_fn_t outside of config machinery, and we now need to decide what value of "ctx" to pass. These cases are: - trace2/tr2_cfg.c:tr2_cfg_set_fl() This is indirectly called by git_config_set() so that the trace2 machinery can notice the new config values and update its settings using the tr2 config parsing function, i.e. tr2_cfg_cb(). - builtin/checkout.c:checkout_main() This calls git_xmerge_config() as a shorthand for parsing a CLI arg. This might be worth refactoring away in the future, since git_xmerge_config() can call git_default_config(), which can do much more than just parsing. Handle them by creating a KVI_INIT macro that initializes "struct key_value_info" to a reasonable default, and use that to construct the "ctx" arg. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21object-store-ll.h: split this header out of object-store.hElijah Newren1-1/+1
The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store | sort | uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21git-compat-util.h: remove unneccessary include of wildmatch.hElijah Newren1-0/+1
The include of wildmatch.h in git-compat-util.h was added in cebcab189aa (Makefile: add USE_WILDMATCH to use wildmatch as fnmatch, 2013-01-01) as a way to be able to compile-time force any calls to fnmatch() to instead invoke wildmatch(). The defines and inline function were removed in 70a8fc999d9 (stop using fnmatch (either native or compat), 2014-02-15), and this include in git-compat-util.h has been unnecessary ever since. Remove the include from git-compat-util.h, but add it to the .c files that had omitted the direct #include they needed. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21cache.h: remove this no-longer-used headerElijah Newren1-1/+1
Since this header showed up in some places besides just #include statements, update/clean-up/remove those other places as well. Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got away with violating the rule that all files must start with an include of git-compat-util.h (or a short-list of alternate headers that happen to include it first). This change exposed the violation and caused it to stop building correctly; fix it by having it include git-compat-util.h first, as per policy. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21read-cache*.h: move declarations for read-cache.c functions from cache.hElijah Newren1-0/+1
For the functions defined in read-cache.c, move their declarations from cache.h to a new header, read-cache-ll.h. Also move some related inline functions from cache.h to read-cache.h. The purpose of the read-cache-ll.h/read-cache.h split is that about 70% of the sites don't need the inline functions and the extra headers they include. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21sparse-index.h: move declarations for sparse-index.c from cache.hElijah Newren1-0/+1
Note in particular that this reverses the decision made in 118a2e8bde0 ("cache: move ensure_full_index() to cache.h", 2021-04-01). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-15revision: handle pseudo-opts in `--stdin` modePatrick Steinhardt1-4/+17
While both git-rev-list(1) and git-log(1) support `--stdin`, it only accepts commits and files. Most notably, it is impossible to pass any of the pseudo-opts like `--all`, `--glob=` or others via stdin. This makes it hard to use this function in certain scripted scenarios, like when one wants to support queries against specific revisions, but also against reference patterns. While this is theoretically possible by using arguments, this may run into issues once we hit platform limits with sufficiently large queries. And because `--stdin` cannot handle pseudo-opts, the only alternative would be to use a mixture of arguments and standard input, which is cumbersome. Implement support for handling pseudo-opts in both commands to support this usecase better. One notable restriction here is that `--stdin` only supports "stuck" arguments in the form of `--glob=foo`. This is because "unstuck" arguments would also require us to read the next line, which would add quite some complexity to the code. This restriction should be fine for scripted usage though. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-15revision: small readability improvement for reading from stdinPatrick Steinhardt1-8/+9
The code that reads lines from standard input manually compares whether the read line matches "--", which is a bit awkward to read. Furthermore, we're about to extend the code to also support reading pseudo-options via standard input, requiring more elaborate handling of lines with a leading dash. Refactor the code by hoisting out the check for "--" outside of the block that checks for a leading dash. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-15revision: reorder `read_revisions_from_stdin()`Patrick Steinhardt1-33/+33
Reorder `read_revisions_from_stdin()` so that we can start using `handle_revision_pseudo_opt()` without a forward declaration in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-25Merge branch 'en/header-split-cache-h'Junio C Hamano1-1/+4
Header clean-up. * en/header-split-cache-h: (24 commits) protocol.h: move definition of DEFAULT_GIT_PORT from cache.h mailmap, quote: move declarations of global vars to correct unit treewide: reduce includes of cache.h in other headers treewide: remove double forward declaration of read_in_full cache.h: remove unnecessary includes treewide: remove cache.h inclusion due to pager.h changes pager.h: move declarations for pager.c functions from cache.h treewide: remove cache.h inclusion due to editor.h changes editor: move editor-related functions and declarations into common file treewide: remove cache.h inclusion due to object.h changes object.h: move some inline functions and defines from cache.h treewide: remove cache.h inclusion due to object-file.h changes object-file.h: move declarations for object-file.c functions from cache.h treewide: remove cache.h inclusion due to git-zlib changes git-zlib: move declarations for git-zlib functions from cache.h treewide: remove cache.h inclusion due to object-name.h changes object-name.h: move declarations for object-name.c functions from cache.h treewide: remove unnecessary cache.h inclusion treewide: be explicit about dependence on mem-pool.h treewide: be explicit about dependence on oid-array.h ...
2023-04-11treewide: reduce includes of cache.h in other headersElijah Newren1-1/+1
We had a handful of headers including cache.h that didn't need to anymore. Drop those includes and replace them with includes of smaller files, or forward declarations. However, note that two .c files now need to directly include cache.h, though they should have been including it all along given they are directly using structs defined in it. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11object-file.h: move declarations for object-file.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11object-name.h: move declarations for object-name.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on trace.h & trace2.hElijah Newren1-0/+1
Dozens of files made use of trace and trace2 functions, without explicitly including trace.h or trace2.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include trace.h or trace2.h if they are using them. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-06Merge branch 'en/header-split-cleanup'Junio C Hamano1-0/+3
Split key function and data structure definitions out of cache.h to new header files and adjust the users. * en/header-split-cleanup: csum-file.h: remove unnecessary inclusion of cache.h write-or-die.h: move declarations for write-or-die.c functions from cache.h treewide: remove cache.h inclusion due to setup.h changes setup.h: move declarations for setup.c functions from cache.h treewide: remove cache.h inclusion due to environment.h changes environment.h: move declarations for environment.c functions from cache.h treewide: remove unnecessary includes of cache.h wrapper.h: move declarations for wrapper.c functions from cache.h path.h: move function declarations for path.c functions from cache.h cache.h: remove expand_user_path() abspath.h: move absolute path functions from cache.h environment: move comment_line_char from cache.h treewide: remove unnecessary cache.h inclusion from several sources treewide: remove unnecessary inclusion of gettext.h treewide: be explicit about dependence on gettext.h treewide: remove unnecessary cache.h inclusion from a few headers
2023-04-06Merge branch 'ab/remove-implicit-use-of-the-repository'Junio C Hamano1-13/+14
Code clean-up around the use of the_repository. * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-04-04Merge branch 'ab/remove-implicit-use-of-the-repository' into ↵Junio C Hamano1-13/+14
en/header-split-cache-h * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-03-30Merge branch 'sg/parse-options-h-users'Junio C Hamano1-0/+1
Code clean-up to include and/or uninclude parse-options.h file as needed. * sg/parse-options-h-users: treewide: remove unnecessary inclusions of parse-options.h from headers treewide: include parse-options.h in source files
2023-03-28cocci: apply the "commit.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason1-6/+6
Apply the part of "the_repository.pending.cocci" pertaining to "commit.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28cocci: apply the "commit-reach.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason1-2/+2
Apply the part of "the_repository.pending.cocci" pertaining to "commit-reach.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28cocci: apply the "cache.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason1-5/+6
Apply the part of "the_repository.pending.cocci" pertaining to "cache.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21setup.h: move declarations for setup.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21environment.h: move declarations for environment.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren1-0/+1
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20treewide: include parse-options.h in source filesSZEDER Gábor1-0/+1
The builtins 'ls-remote', 'pack-objects', 'receive-pack', 'reflog' and 'send-pack' use parse_options(), but their source files don't directly include 'parse-options.h'. Furthermore, the source files 'diagnose.c', 'list-objects-filter-options.c', 'remote.c' and 'send-pack.c' define option parsing callback functions, while 'revision.c' defines an option parsing helper function, and thus need access to various fields in 'struct option' and 'struct parse_opt_ctx_t', but they don't directly include 'parse-options.h' either. They all can still be built, of course, because they include one of the header files that does include 'parse-options.h' (though unnecessarily, see the next commit). Add those missing includes to these files, as our general rule is that "a C file must directly include the header files that declare the functions and the types it uses". Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-17Merge branch 'ew/fetch-hiderefs'Junio C Hamano1-1/+2
A new "fetch.hideRefs" option can be used to exclude specified refs from "rev-list --objects --stdin --not --all" traversal for checking object connectivity, most useful when there are many unrelated histories in a single repository. * ew/fetch-hiderefs: fetch: support hideRefs to speed up connectivity checks
2023-03-17Merge branch 'jk/unused-post-2.39-part2'Junio C Hamano1-3/+3
More work towards -Wunused. * jk/unused-post-2.39-part2: (21 commits) help: mark unused parameter in git_unknown_cmd_config() run_processes_parallel: mark unused callback parameters userformat_want_item(): mark unused parameter for_each_commit_graft(): mark unused callback parameter rewrite_parents(): mark unused callback parameter fetch-pack: mark unused parameter in callback function notes: mark unused callback parameters prio-queue: mark unused parameters in comparison functions for_each_object: mark unused callback parameters list-objects: mark unused callback parameters mark unused parameters in signal handlers run-command: mark error routine parameters as unused mark "pointless" data pointers in callbacks ref-filter: mark unused callback parameters http-backend: mark unused parameters in virtual functions http-backend: mark argc/argv unused object-name: mark unused parameters in disambiguate callbacks serve: mark unused parameters in virtual functions serve: use repository pointer to get config ls-refs: drop config caching ...
2023-02-27fetch: support hideRefs to speed up connectivity checksEric Wong1-1/+2
With roughly 800 remotes all fetching into their own refs/remotes/$REMOTE/* island, the connectivity check[1] gets expensive for each fetch on systems which lack sufficient RAM to cache objects. To do a no-op fetch on one $REMOTE out of hundreds, hideRefs now allows the no-op fetch to take ~30 seconds instead of ~20 minutes on a noisy, RAM-constrained machine (localhost, so no network latency): git -c fetch.hideRefs=refs \ -c fetch.hideRefs='!refs/remotes/$REMOTE/' \ fetch $REMOTE [1] `git rev-list --objects --stdin --not --all --quiet --alternate-refs' Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24for_each_object: mark unused callback parametersJeff King1-2/+2
The for_each_{loose,packed}_object interface uses callback functions, but not every callback needs all of the parameters. Mark the unused ones to satisfy -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24mark "pointless" data pointers in callbacksJeff King1-1/+1
Both the object_array_filter() and trie_find() functions use callback functions that let the caller specify which elements match. These callbacks take a void pointer in case the caller wants to pass in extra data. But in each case, the single user of these functions just passes NULL, and the callback ignores the extra pointer. We could just remove these unused parameters from the callback interface entirely. But it's good practice to provide such a pointer, as it guides future callers of the function in the right direction (rather than tempting them to access global data). Plus it's consistent with other generic callback interfaces. So let's instead annotate the unused parameters, in order to silence the compiler's -Wunused-parameter warning. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23alloc.h: move ALLOC_GROW() functions from cache.hElijah Newren1-1/+2
This allows us to replace includes of cache.h with includes of the much smaller alloc.h in many places. It does mean that we also need to add includes of alloc.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-17treewide: always have a valid "index_state.repo" memberÆvar Arnfjörð Bjarmason1-1/+1
When the "repo" member was added to "the_index" in [1] the repo_read_index() was made to populate it, but the unpopulated "the_index" variable didn't get the same treatment. Let's do that in initialize_the_repository() when we set it up, and likewise for all of the current callers initialized an empty "struct index_state". This simplifies code that needs to deal with "the_index" or a custom "struct index_state", we no longer need to second-guess this part of the "index_state" deep in the stack. A recent example of such second-guessing is the "istate->repo ? istate->repo : the_repository" code in [2]. We can now simply use "istate->repo". We're doing this by making use of the INDEX_STATE_INIT() macro (and corresponding function) added in [3], which now have mandatory "repo" arguments. Because we now call index_state_init() in repository.c's initialize_the_repository() we don't need to handle the case where we have a "repo->index" whose "repo" member doesn't match the "repo" we're setting up, i.e. the "Complete the double-reference" code in repo_read_index() being altered here. That logic was originally added in [1], and was working around the lack of what we now have in initialize_the_repository(). For "fsmonitor-settings.c" we can remove the initialization of a NULL "r" argument to "the_repository". This was added back in [4], and was needed at the time for callers that would pass us the "r" from an "istate->repo". Before this change such a change to "fsmonitor-settings.c" would segfault all over the test suite (e.g. in t0002-gitfile.sh). This change has wider eventual implications for "fsmonitor-settings.c". The reason the other lazy loading behavior in it is required (starting with "if (!r->settings.fsmonitor) ..." is because of the previously passed "r" being "NULL". I have other local changes on top of this which move its configuration reading to "prepare_repo_settings()" in "repo-settings.c", as we could now start to rely on it being called for our "r". But let's leave all of that for now, and narrowly remove this particular part of the lazy-loading. 1. 1fd9ae517c4 (repository: add repo reference to index_state, 2021-01-23) 2. ee1f0c242ef (read-cache: add index.skipHash config option, 2023-01-06) 3. 2f6b1eb794e (cache API: add a "INDEX_STATE_INIT" macro/function, add release_index(), 2023-01-12) 4. 1e0ea5c4316 (fsmonitor: config settings are repository-specific, 2022-03-25) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-16cache API: add a "INDEX_STATE_INIT" macro/function, add release_index()Ævar Arnfjörð Bjarmason1-1/+1
Hopefully in some not so distant future, we'll get advantages from always initializing the "repo" member of the "struct index_state". To make that easier let's introduce an initialization macro & function. The various ad-hoc initialization of the structure can then be changed over to it, and we can remove the various "0" assignments in discard_index() in favor of calling index_state_init() at the end. While not strictly necessary, let's also change the CALLOC_ARRAY() of various "struct index_state *" to use an ALLOC_ARRAY() followed by index_state_init() instead. We're then adding the release_index() function and converting some callers (including some of these allocations) over to it if they either won't need to use their "struct index_state" again, or are just about to call index_state_init(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-26Merge branch 'jk/unused-post-2.39'Junio C Hamano1-10/+15
Code clean-up around unused function parameters. * jk/unused-post-2.39: userdiff: mark unused parameter in internal callback list-objects-filter: mark unused parameters in virtual functions diff: mark unused parameters in callbacks xdiff: mark unused parameter in xdl_call_hunk_func() xdiff: drop unused parameter in def_ff() ws: drop unused parameter from ws_blank_line() list-objects: drop process_gitlink() function blob: drop unused parts of parse_blob_buffer() ls-refs: use repository parameter to iterate refs
2022-12-14Merge branch 'ab/various-leak-fixes'Junio C Hamano1-0/+1
Various leak fixes. * ab/various-leak-fixes: built-ins: use free() not UNLEAK() if trivial, rm dead code revert: fix parse_options_concat() leak cherry-pick: free "struct replay_opts" members rebase: don't leak on "--abort" connected.c: free the "struct packed_git" sequencer.c: fix "opts->strategy" leak in read_strategy_opts() ls-files: fix a --with-tree memory leak revision API: call graph_clear() in release_revisions() unpack-file: fix ancient leak in create_temp_file() built-ins & libs & helpers: add/move destructors, fix leaks dir.c: free "ident" and "exclude_per_dir" in "struct untracked_cache" read-cache.c: clear and free "sparse_checkout_patterns" commit: discard partial cache before (re-)reading it {reset,merge}: call discard_index() before returning tests: mark tests as passing with SANITIZE=leak
2022-12-13diff: mark unused parameters in callbacksJeff King1-10/+15
The diff code provides a format_callback interface, but not every callback needs each parameter (e.g., the "opt" and "data" parameters are frequently left unused). Likewise for the output_prefix callback, the low-level change/add_remove interfaces, the callbacks used by xdi_diff(), etc. Mark unused arguments in the callback implementations to quiet -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-23Merge branch 'ps/receive-use-only-advertised'Junio C Hamano1-40/+89
"git receive-pack" used to use all the local refs as the boundary for checking connectivity of the data "git push" sent, but now it uses only the refs that it advertised to the pusher. In a repository with the .hideRefs configuration, this reduces the resources needed to perform the check. cf. <221028.86bkpw805n.gmgdl@evledraar.gmail.com> cf. <xmqqr0yrizqm.fsf@gitster.g> * ps/receive-use-only-advertised: receive-pack: only use visible refs for connectivity check rev-parse: add `--exclude-hidden=` option revision: add new parameter to exclude hidden refs revision: introduce struct to handle exclusions revision: move together exclusion-related functions refs: get rid of global list of hidden refs refs: fix memory leak when parsing hideRefs config
2022-11-21revision API: call graph_clear() in release_revisions()Ævar Arnfjörð Bjarmason1-0/+1
Call graph_clear() in release_revisions(), this will free memory allocated by e.g. this command, which will now run without memory leaks: git -P log -1 --graph --no-graph --graph Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-17revision: add new parameter to exclude hidden refsPatrick Steinhardt1-1/+54
Users can optionally hide refs from remote users in git-upload-pack(1), git-receive-pack(1) and others via the `transfer.hideRefs`, but there is not an easy way to obtain the list of all visible or hidden refs right now. We'll require just that though for a performance improvement in our connectivity check. Add a new option `--exclude-hidden=` that excludes any hidden refs from the next pseudo-ref like `--all` or `--branches`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-17revision: introduce struct to handle exclusionsPatrick Steinhardt1-26/+22
The functions that handle exclusion of refs work on a single string list. We're about to add a second mechanism for excluding refs though, and it makes sense to reuse much of the same architecture for both kinds of exclusion. Introduce a new `struct ref_exclusions` that encapsulates all the logic related to excluding refs and move the `struct string_list` that holds all wildmatch patterns of excluded refs into it. Rename functions that operate on this struct to match its name. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-17revision: move together exclusion-related functionsPatrick Steinhardt1-26/+26
Move together the definitions of functions that handle exclusions of refs so that related functionality sits in a single place, only. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-08revisions API: extend the nascent REV_INFO_INIT macroÆvar Arnfjörð Bjarmason1-17/+2
Have the REV_INFO_INIT macro added in [1] declare more members of "struct rev_info" that we can initialize statically, and have repo_init_revisions() do so with the memcpy(..., &blank) idiom introduced in [2]. As the comment for the "REV_INFO_INIT" macro notes this still isn't sufficient to initialize a "struct rev_info" for use yet. But we are getting closer to that eventual goal. Even though we can't fully initialize a "struct rev_info" with REV_INFO_INIT it's useful for readability to clearly separate those things that we can statically initialize, and those that we can't. This change could replace the: list_objects_filter_init(&revs->filter); In the repo_init_revisions() with this line, at the end of the REV_INFO_INIT deceleration in revisions.h: .filter = LIST_OBJECTS_FILTER_INIT, \ But doing so would produce a minor conflict with an outstanding topic[3]. Let's skip that for now. I have follow-ups to initialize more of this statically, e.g. changes to get rid of grep_init(). We can initialize more members with the macro in a future series. 1. f196c1e908d (revisions API users: use release_revisions() needing REV_INFO_INIT, 2022-04-13) 2. 5726a6b4012 (*.c *_init(): define in terms of corresponding *_INIT macro, 2021-07-01) 3. https://lore.kernel.org/git/265b292ed5c2de19b7118dfe046d3d9d932e2e89.1667901510.git.ps@pks.im/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-25Merge branch 'rs/diff-caret-bang-with-parents'Junio C Hamano1-3/+2
"git diff rev^!" did not show combined diff to go to the rev from its parents. * rs/diff-caret-bang-with-parents: diff: support ^! for merges revisions.txt: unspecify order of resolved parts of ^! revision: use strtol_i() for exclude_parent
2022-10-10CodingGuidelines: allow declaring variables in for loopsÆvar Arnfjörð Bjarmason1-7/+0
Since 44ba10d6712 (revision: use C99 declaration of variable in for() loop, 2021-11-14) released with v2.35.0 we've had a variable declared with in a for loop. Since then we've had inadvertent follow-ups to that with at least cb2607759e2 (merge-ort: store more specific conflict information, 2022-06-18) released with v2.38.0. As November 2022 is within the window of this upcoming release, let's update the guideline to allow this. We can have the promised "revisit" discussion while this patch cooks, and drop it if it turns out that it is still premature, which is not expected to happen at this moment. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-01revision: use strtol_i() for exclude_parentRené Scharfe1-3/+2
Avoid silent overflow of the int exclude_parent by using the appropriate function, strtol_i(), to parse its value. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-19Merge branch 'jk/list-objects-filter-cleanup'Junio C Hamano1-0/+1
A couple of bugfixes with code clean-up. * jk/list-objects-filter-cleanup: list-objects-filter: convert filter_spec to a strbuf list-objects-filter: add and use initializers list-objects-filter: handle null default filter spec list-objects-filter: don't memset after releasing filter struct
2022-09-14Merge branch 'ab/unused-annotation'Junio C Hamano1-9/+9
Undoes 'jk/unused-annotation' topic and redoes it to work around Coccinelle rules misfiring false positives in unrelated codepaths. * ab/unused-annotation: git-compat-util.h: use "deprecated" for UNUSED variables git-compat-util.h: use "UNUSED", not "UNUSED(var)"
2022-09-14Merge branch 'jk/unused-annotation'Junio C Hamano1-7/+11
Annotate function parameters that are not used (but cannot be removed for structural reasons), to prepare us to later compile with -Wunused warning turned on. * jk/unused-annotation: is_path_owned_by_current_uid(): mark "report" parameter as unused run-command: mark unused async callback parameters mark unused read_tree_recursive() callback parameters hashmap: mark unused callback parameters config: mark unused callback parameters streaming: mark unused virtual method parameters transport: mark bundle transport_options as unused refs: mark unused virtual method parameters refs: mark unused reflog callback parameters refs: mark unused each_ref_fn parameters git-compat-util: add UNUSED macro
2022-09-13Merge branch 'jk/rev-list-verify-objects-fix'Junio C Hamano1-0/+1
"git rev-list --verify-objects" ought to inspect the contents of objects and notice corrupted ones, but it didn't when the commit graph is in use, which has been corrected. * jk/rev-list-verify-objects-fix: rev-list: disable commit graph with --verify-objects lookup_commit_in_graph(): use prepare_commit_graph() to check for graph
2022-09-12list-objects-filter: add and use initializersJeff King1-0/+1
In 7e2619d8ff (list_objects_filter_options: plug leak of filter_spec strings, 2022-09-08), we noted that the filter_spec string_list was inconsistent in how it handled memory ownership of strings stored in the list. The fix there was a bit of a band-aid to set the "strdup_strings" variable right before adding anything. That works OK, and it lets the users of the API continue to zero-initialize the struct. But it makes the code a bit hard to follow and accident-prone, as any other spots appending the filter_spec need to think about whether to set the strdup_strings value, too (there's one such spot in partial_clone_get_default_filter_spec(), which is probably a possible memory leak). So let's do that full cleanup now. We'll introduce a LIST_OBJECTS_FILTER_INIT macro and matching function, and use them as appropriate (though it is for the "_options" struct, this matches the corresponding list_objects_filter_release() function). This is harder than it seems! Many other structs, like git_transport_data, embed the filter struct. So they need to initialize it themselves even if the rest of the enclosing struct is OK with zero-initialization. I found all of the relevant spots by grepping manually for declarations of list_objects_filter_options. And then doing so recursively for structs which embed it, and ones which embed those, and so on. I'm pretty sure I got everything, but there's no change that would alert the compiler if any topics in flight added new declarations. To catch this case, we now double-check in the parsing function that things were initialized as expected and BUG() if appropriate. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-07parse_object(): check commit-graph when skip_hash setJeff King1-13/+3
If the caller told us that they don't care about us checking the object hash, then we're free to implement any optimizations that get us the parsed value more quickly. An obvious one is to check the commit graph before loading an object from disk. And in fact, both of the callers who pass in this flag are already doing so before they call parse_object()! So we can simplify those callers, as well as any possible future ones, by moving the logic into parse_object(). There are two subtle things to note in the diff, but neither has any impact in practice: - it seems least-surprising here to do the graph lookup on the git-replace'd oid, rather than the original. This is in theory a change of behavior from the earlier code, as neither caller did a replace lookup itself. But in practice it doesn't matter, as we disable the commit graph entirely if there are any replace refs. - the caller in get_reference() passes the skip_hash flag only if revs->verify_objects isn't set, whereas it would look in the commit graph unconditionally. In practice this should not matter as we should disable the commit graph entirely when using verify_objects (and that was done recently in another patch). So this should be a pure cleanup with no behavior change. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-07upload-pack: skip parse-object re-hashing of "want" objectsJeff King1-1/+3
Imagine we have a history with commit C pointing to a large blob B. If a client asks us for C, we can generally serve both objects to them without accessing the uncompressed contents of B. In upload-pack, we figure out which commits we have and what the client has, and feed those tips to pack-objects. In pack-objects, we traverse the commits and trees (or use bitmaps!) to find the set of objects needed, but we never open up B. When we serve it to the client, we can often pass the compressed bytes directly from the on-disk packfile over the wire. But if a client asks us directly for B, perhaps because they are doing an on-demand fetch to fill in the missing blob of a partial clone, we end up much slower. Upload-pack calls parse_object() on the oid we receive, which opens up the object and re-checks its hash (even though if it were a commit, we might skip this parse entirely in favor of the commit graph!). And then we feed the oid directly to pack-objects, which again calls parse_object() and opens the object. And then finally, when we write out the result, we may send bytes straight from disk, but only after having unnecessarily uncompressed and computed the sha1 of the object twice! This patch teaches both code paths to use the new SKIP_HASH_CHECK flag for parse_object(). You can see the speed-up in p5600, which does a blob:none clone followed by a checkout. The savings for git.git are modest: Test HEAD^ HEAD ---------------------------------------------------------------------- 5600.3: checkout of result 2.23(4.19+0.24) 1.72(3.79+0.18) -22.9% But the savings scale with the number of bytes. So on a repository like linux.git with more files, we see more improvement (in both absolute and relative numbers): Test HEAD^ HEAD ---------------------------------------------------------------------------- 5600.3: checkout of result 51.62(77.26+2.76) 34.86(61.41+2.63) -32.5% And here's an even more extreme case. This is the android gradle-plugin repository, whose tip checkout has ~3.7GB of files: Test HEAD^ HEAD -------------------------------------------------------------------------- 5600.3: checkout of result 79.51(90.84+5.55) 40.28(51.88+5.67) -49.3% Keep in mind that these timings are of the whole checkout operation. So they count the client indexing the pack and actually writing out the files. If we want to see just the server's view, we can hack up the GIT_TRACE_PACKET output from those operations and replay it via upload-pack. For the gradle example, that gives me: Benchmark 1: GIT_PROTOCOL=version=2 git.old upload-pack ../gradle-plugin <input Time (mean ± σ): 50.884 s ± 0.239 s [User: 51.450 s, System: 1.726 s] Range (min … max): 50.608 s … 51.025 s 3 runs Benchmark 2: GIT_PROTOCOL=version=2 git.new upload-pack ../gradle-plugin <input Time (mean ± σ): 9.728 s ± 0.112 s [User: 10.466 s, System: 1.535 s] Range (min … max): 9.618 s … 9.842 s 3 runs Summary 'GIT_PROTOCOL=version=2 git.new upload-pack ../gradle-plugin <input' ran 5.23 ± 0.07 times faster than 'GIT_PROTOCOL=version=2 git.old upload-pack ../gradle-plugin <input' So a server would see an 80% reduction in CPU serving the initial checkout of a partial clone for this repository. Or possibly even more depending on the packing; most of the time spent in the faster one were objects we had to open during the write phase. In both cases skipping the extra hashing on the server should be pretty safe. The client doesn't trust the server anyway, so it will re-hash all of the objects via index-pack. There is one thing to note, though: the change in get_reference() affects not just pack-objects, but rev-list, git-log, etc. We could use a flag to limit to index-pack here, but we may already skip hash checks in this instance. For commits, we'd skip anything we load via the commit-graph. And while before this commit we would check a blob fed directly to rev-list on the command-line, we'd skip checking that same blob if we found it by traversing a tree. The exception for both is if --verify-objects is used. In that case, we'll skip this optimization, and the new test makes sure we do this correctly. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-07rev-list: disable commit graph with --verify-objectsJeff King1-0/+1
Since the point of --verify-objects is to actually load and checksum the bytes of each object, optimizing out reads using the commit graph runs contrary to our goal. The most targeted way to implement this would be for the revision traversal code to check revs->verify_objects and avoid using the commit graph. But it's difficult to be sure we've hit all of the correct spots. For instance, I started this patch by writing the first of the included test cases, where the corrupted commit is directly on rev-list's command line. And that is easy to fix by teaching get_reference() to check revs->verify_objects before calling lookup_commit_in_graph(). But that doesn't cover the second test case: when we traverse to a corrupted commit, we'd parse the parent in process_parents(). So we'd need to check there, too. And it keeps going. In handle_commit() we sometimes parses commits, too, though I couldn't figure out a way to trigger it that did not already parse via get_reference() or tag peeling. And try_to_simplify_commit() has its own parse call, and so on. So it seems like the safest thing is to just disable the commit graph for the whole process when we see the --verify-objects option. We can do that either in builtin/rev-list.c, where we use the option, or in revision.c, where we parse it. There are some subtleties: - putting it in rev-list.c is less surprising in some ways, because there we know we are just doing a single traversal. In a command which does multiple traversals in a single process, it's rather unexpected to globally disable the commit graph. - putting it in revision.c is less surprising in some ways, because the caller does not have to remember to disable the graph themselves. But this is already tricky! The verify_objects flag in rev_info doesn't do anything by itself. The caller has to provide an object callback which does the right thing. - for that reason, in practice nobody but rev-list uses this option in the first place. So the distinction is probably not important either way. Arguably it should just be an option of rev-list, and not the general revision machinery; right now you can run "git log --verify-objects", but it does not actually do anything useful. - checking for a parsed revs.verify_objects flag in rev-list.c is too late. By that time we've already passed the arguments to setup_revisions(), which will have parsed the commits using the graph. So this commit disables the graph as soon as we see the option in revision.c. That's a pretty broad hammer, but it does what we want, and in practice nobody but rev-list is using this flag anyway. The tests cover both the "tip" and "parent" cases. Obviously our hammer hits them both in this case, but it's good to check both in case somebody later tries the more focused approach. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01git-compat-util.h: use "UNUSED", not "UNUSED(var)"Ævar Arnfjörð Bjarmason1-9/+9
As reported in [1] the "UNUSED(var)" macro introduced in 2174b8c75de (Merge branch 'jk/unused-annotation' into next, 2022-08-24) breaks coccinelle's parsing of our sources in files where it occurs. Let's instead partially go with the approach suggested in [2] of making this not take an argument. As noted in [1] "coccinelle" will ignore such tokens in argument lists that it doesn't know about, and it's less of a surprise to syntax highlighters. This undoes the "help us notice when a parameter marked as unused is actually use" part of 9b240347543 (git-compat-util: add UNUSED macro, 2022-08-19), a subsequent commit will further tweak the macro to implement a replacement for that functionality. 1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19hashmap: mark unused callback parametersJeff King1-2/+2
Hashmap comparison functions must conform to a particular callback interface, but many don't use all of their parameters. Especially the void cmp_data pointer, but some do not use keydata either (because they can easily form a full struct to pass when doing lookups). Let's mark these to make -Wunused-parameter happy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19refs: mark unused reflog callback parametersJeff King1-2/+5
Functions used with for_each_reflog_ent() need to conform to a particular interface, but not every function needs all of the parameters. Mark the unused ones to make -Wunused-parameter happy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19refs: mark unused each_ref_fn parametersJeff King1-3/+4
Functions used with for_each_ref(), etc, need to conform to the each_ref_fn interface. But most of them don't need every parameter; let's annotate the unused ones to quiet -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19revision: allow --ancestry-path to take an argumentElijah Newren1-30/+59
We have long allowed users to run e.g. git log --ancestry-path master..seen which shows all commits which satisfy all three of these criteria: * are an ancestor of seen * are not an ancestor of master * have master as an ancestor This commit allows another variant: git log --ancestry-path=$TOPIC master..seen which shows all commits which satisfy all of these criteria: * are an ancestor of seen * are not an ancestor of master * have $TOPIC in their ancestry-path that last bullet can be defined as commits meeting any of these criteria: * are an ancestor of $TOPIC * have $TOPIC as an ancestor * are $TOPIC This also allows multiple --ancestry-path arguments, which can be used to find commits with any of the given topics in their ancestry path. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-12Merge branch 'ab/plug-revisions-leak'Junio C Hamano1-0/+2
Plug a bit more leaks in the revisions API. * ab/plug-revisions-leak: revisions API: don't leak memory on argv elements that need free()-ing bisect.c: partially fix bisect_rev_setup() memory leak log: refactor "rev.pending" code in cmd_show() log: fix a memory leak in "git show <revision>..." test-fast-rebase helper: use release_revisions() (again) bisect.c: add missing "goto" for release_revisions()
2022-08-10Merge branch 'jc/resolve-undo' into maintJunio C Hamano1-0/+36
The resolve-undo information in the index was not protected against GC, which has been corrected. source: <xmqq35f7kzad.fsf@gitster.g> * jc/resolve-undo: fsck: do not dereference NULL while checking resolve-undo data revision: mark blobs needed for resolve-undo as reachable
2022-08-03Merge branch 'sa/cat-file-mailmap'Junio C Hamano1-47/+3
"git cat-file" learned an option to use the mailmap when showing commit and tag objects. * sa/cat-file-mailmap: cat-file: add mailmap support ident: rename commit_rewrite_person() to apply_mailmap_to_header() ident: move commit_rewrite_person() to ident.c revision: improve commit_rewrite_person()
2022-08-03revisions API: don't leak memory on argv elements that need free()-ingÆvar Arnfjörð Bjarmason1-0/+2
Add a "free_removed_argv_elements" member to "struct setup_revision_opt", and use it to fix several memory leaks. We have various memory leaks in APIs that take and munge "const char **argv", e.g. parse_options(). Sometimes these APIs are given the "argv" we get to the "main" function, in which case we don't leak memory, but other times we're giving it the "v" member of a "struct strvec" we created. There's several potential ways to fix those sort of leaks, we could add a "nodup" mode to "struct strvec", which would work for the cases where we push constant strings to it. But that wouldn't work as soon as we used strvec_pushf(), or otherwise needed to duplicate or create a string for that "struct strvec". Let's instead make it the responsibility of the revisions API. If it's going to clobber elements of argv it can also free() them, which it will now do if instructed to do so via "free_removed_argv_elements". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-19Merge branch 'jc/resolve-undo'Junio C Hamano1-0/+36
The resolve-undo information in the index was not protected against GC, which has been corrected. * jc/resolve-undo: fsck: do not dereference NULL while checking resolve-undo data revision: mark blobs needed for resolve-undo as reachable
2022-07-18ident: rename commit_rewrite_person() to apply_mailmap_to_header()Siddharth Asthana1-1/+1
commit_rewrite_person() takes a commit buffer and replaces the idents in the header with their canonical versions using the mailmap mechanism. The name "commit_rewrite_person()" is misleading as it doesn't convey what kind of rewrite are we going to do to the buffer. It also doesn't clearly mention that the function will limit itself to the header part of the buffer. The new name, "apply_mailmap_to_header()", expresses the functionality of the function pretty clearly. We intend to use apply_mailmap_to_header() in git-cat-file to replace idents in the headers of commit and tag object buffers. So, we will be extending this function to take tag objects buffer as well and replace idents on the tagger header using the mailmap mechanism. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: John Cai <johncai86@gmail.com> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18ident: move commit_rewrite_person() to ident.cSiddharth Asthana1-74/+0
commit_rewrite_person() and rewrite_ident_line() are static functions defined in revision.c. Their usages are as follows: - commit_rewrite_person() takes a commit buffer and replaces the author and committer idents with their canonical versions using the mailmap mechanism - rewrite_ident_line() takes author/committer header lines from the commit buffer and replaces the idents with their canonical versions using the mailmap mechanism. This patch moves commit_rewrite_person() and rewrite_ident_line() to ident.c which contains many other functions related to idents like split_ident_line(). By moving commit_rewrite_person() to ident.c, we also intend to use it in git-cat-file to replace committer and author idents from the headers to their canonical versions using the mailmap mechanism. The function is moved as is for now to make it clear that there are no other changes, but it will be renamed in a following commit. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: John Cai <johncai86@gmail.com> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18revision: improve commit_rewrite_person()Siddharth Asthana1-17/+47
The function, commit_rewrite_person(), is designed to find and replace an ident string in the header part, and the way it avoids a random occurrence of "author A U Thor <author@example.com" in the text is by insisting "author" to appear at the beginning of line by passing "\nauthor " as "what". The implementation also doesn't make any effort to limit itself to the commit header by locating the blank line that appears after the header part and stopping the search there. Also, the interface forces the caller to make multiple calls if it wants to rewrite idents on multiple headers. It shouldn't be the case. To support the existing caller better, update commit_rewrite_person() to: - Make a single pass in the input buffer to locate headers named "author" and "committer" and replace idents on them. - Stop at the end of the header, ensuring that nothing in the body of the commit object is modified. The return type of the function commit_rewrite_person() has also been changed from int to void. This has been done because the caller of the function doesn't do anything with the return value of the function. By simplifying the interface of the commit_rewrite_person(), we also intend to expose it as a public function. We will also be renaming the function in a future commit to a different name which clearly tells that the function replaces idents in the header of the commit buffer. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: John Cai <johncai86@gmail.com> Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-09revision: mark blobs needed for resolve-undo as reachableJunio C Hamano1-0/+36
The resolve-undo extension was added to the index in cfc5789a (resolve-undo: record resolved conflicts in a new index extension section, 2009-12-25). This extension records the blob object names and their modes of conflicted paths when the path gets resolved (e.g. with "git add"), to allow "undoing" the resolution with "checkout -m path". These blob objects should be guarded from garbage-collection while we have the resolve-undo information in the index (otherwise unresolve operation may try to use a blob object that has already been pruned away). But the code called from mark_reachable_objects() for the index forgets to do so. Teach add_index_objects_to_pending() helper to also add objects referred to by the resolve-undo extension. Also make matching changes to "fsck", which has code that is fairly similar to the reachability stuff, but have parallel implementations for all these stuff, which may (or may not) someday want to be unified. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-07Merge branch 'ab/plug-leak-in-revisions'Junio C Hamano1-15/+55
Plug the memory leaks from the trickiest API of all, the revision walker. * ab/plug-leak-in-revisions: (27 commits) revisions API: add a TODO for diff_free(&revs->diffopt) revisions API: have release_revisions() release "topo_walk_info" revisions API: have release_revisions() release "date_mode" revisions API: call diff_free(&revs->pruning) in revisions_release() revisions API: release "reflog_info" in release revisions() revisions API: clear "boundary_commits" in release_revisions() revisions API: have release_revisions() release "prune_data" revisions API: have release_revisions() release "grep_filter" revisions API: have release_revisions() release "filter" revisions API: have release_revisions() release "cmdline" revisions API: have release_revisions() release "mailmap" revisions API: have release_revisions() release "commits" revisions API users: use release_revisions() for "prune_data" users revisions API users: use release_revisions() with UNLEAK() revisions API users: use release_revisions() in builtin/log.c revisions API users: use release_revisions() in http-push.c revisions API users: add "goto cleanup" for release_revisions() stash: always have the owner of "stash_info" free it revisions API users: use release_revisions() needing REV_INFO_INIT revision.[ch]: document and move code declared around "init" ...
2022-05-20Merge branch 'ep/maint-equals-null-cocci'Junio C Hamano1-2/+2
Introduce and apply coccinelle rule to discourage an explicit comparison between a pointer and NULL, and applies the clean-up to the maintenance track. * ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci
2022-05-02Merge branch 'ep/maint-equals-null-cocci' for maint-2.35Junio C Hamano1-2/+2
* ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci
2022-05-02tree-wide: apply equals-null.cocciJunio C Hamano1-2/+2
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-23log: "--since-as-filter" option is a non-terminating "--since" variantMiklos Vajna1-0/+10
The "--since=<time>" option of "git log" limits the commits displayed by the command by stopping the traversal once it sees a commit whose timestamp is older than the given time and not digging further into its parents. This is OK in a history where a commit always has a newer timestamp than any of its parents'. Once you see a commit older than the given <time>, all ancestor commits of it are even older than the time anyway. It poses, however, a problem when there is a commit with a wrong timestamp that makes it appear older than its parents. Stopping traversal at the "incorrectly old" commit will hide its ancestors that are newer than that wrong commit and are newer than the cut-off time given with the --since option. --max-age and --after being the synonyms to --since, they share the same issue. Add a new "--since-as-filter" option that is a variant of "--since=<time>". Instead of stopping the traversal to hide an old enough commit and its all ancestors, exclude commits with an old timestamp from the output but still keep digging the history. Without other traversal stopping options, this will force the command in "git log" family to dig down the history to the root. It may be an acceptable cost for a small project with short history and many commits with screwy timestamps. It is quite unlikely for us to add traversal stopper other than since, so have this as a --since-as-filter option, rather than a separate --as-filter, that would be probably more confusing. Signed-off-by: Miklos Vajna <vmiklos@vmiklos.hu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revisions API: add a TODO for diff_free(&revs->diffopt)Ævar Arnfjörð Bjarmason1-0/+1
Add a TODO comment indicating that we should release "diffopt" in release_revisions(). In a preceding commit we started releasing the "pruning" member of the same type, but handling "diffopt" will require us to untangle the "no_free" conditions I added in e900d494dcf (diff: add an API for deferred freeing, 2021-02-11). Let's leave a TODO comment to that effect, and so that we don't forget refactor code that was changed to use release_revisions() in earlier commits to stop using the "diffopt" member after a call to release_revisions(). This works currently, but would become a logic error as soon as we started freeing "diffopt". Doing that change now doesn't harm anything, and future-proofs us against a later change to release_revisions(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revisions API: have release_revisions() release "topo_walk_info"Ævar Arnfjörð Bjarmason1-4/+12
Refactor the existing reset_topo_walk() into a thin wrapper for a release_revisions_topo_walk_info() + resetting the member to "NULL", and call release_revisions_topo_walk_info() from release_revisions(). This fixes memory leaks that have been with us ever since "topo_walk_info" was added to revision.[ch] in f0d9cc4196a (revision.c: begin refactoring --topo-order logic, 2018-11-01). Due to various other leaks this makes no tests pass in their entirety, but e.g. before this running this on git.git: ./git -P log --pretty=tformat:"%P %H | %s" --parents --full-history --topo-order -3 -- README.md Would report under SANITIZE=leak: SUMMARY: LeakSanitizer: 531064 byte(s) leaked in 6 allocation(s). Now we'll free all of that memory. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revisions API: have release_revisions() release "date_mode"Ævar Arnfjörð Bjarmason1-0/+1
Extend the the release_revisions() function so that it frees the "date_mode" in the "struct ref_info". This uses the date_mode_release() function added in 974c919d36d (date API: add and use a date_mode_release(), 2022-02-16). As that commit notes "t7004-tag.sh" tests for the leaks that are being fixed here. That test now fails "only" 44 tests, instead of the 46 it failed before this change. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revisions API: call diff_free(&revs->pruning) in revisions_release()Ævar Arnfjörð Bjarmason1-0/+1
Call diff_free() on the "pruning" member of "struct rev_info". Doing so makes several tests pass under SANITIZE=leak. This was also the last missing piece that allows us to remove the UNLEAK() in "cmd_diff" and "cmd_diff_index", which allows us to use those commands as a canary for general leaks in the revisions API. See [1] for further rationale, and 886e1084d78 (builtin/: add UNLEAKs, 2017-10-01) for the commit that added the UNLEAK() there. 1. https://lore.kernel.org/git/220218.861r00ib86.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revisions API: release "reflog_info" in release revisions()Ævar Arnfjörð Bjarmason1-0/+1
Add a missing reflog_walk_info_release() to "reflog-walk.c" and use it in release_revisions(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revisions API: clear "boundary_commits" in release_revisions()Ævar Arnfjörð Bjarmason1-0/+1
Clear the "boundary_commits" object_array in release_revisions(). This makes a few more tests pass under SANITIZE=leak, including "t/t4126-apply-empty.sh" which started failed as an UNLEAK() in cmd_format_patch() was removed in a preceding commit. This also re-marks the various tests relying on "git format-patch" as passing under "SANITIZE=leak", in the preceding "revisions API users: use release_revisions() in builtin/log.c" commit those were marked as failing as we removed the UNLEAK(rev) from cmd_format_patch() in "builtin/log.c". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revisions API: have release_revisions() release "prune_data"Ævar Arnfjörð Bjarmason1-0/+1
Extend the the release_revisions() function so that it frees the "prune_data" in the "struct rev_info". This means that any code that calls "release_revisions()" already can get rid of adjacent calls to clear_pathspec(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revisions API: have release_revisions() release "grep_filter"Ævar Arnfjörð Bjarmason1-0/+1
Extend the the release_revisions() function so that it frees the "grep_filter" in the "struct rev_info".This allows us to mark a test as passing under "TEST_PASSES_SANITIZE_LEAK=true". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revisions API: have release_revisions() release "filter"Ævar Arnfjörð Bjarmason1-0/+1
Extend the the release_revisions() function so that it frees the "filter" in the "struct rev_info". This in combination with a preceding change to free "cmdline" means that we can mark another set of tests as passing under "TEST_PASSES_SANITIZE_LEAK=true". The "filter" member was added recently in ffaa137f646 (revision: put object filter into struct rev_info, 2022-03-09), and this fixes leaks intruded in the subsequent leak 7940941de1f (pack-objects: use rev.filter when possible, 2022-03-09) and 105c6f14ad3 (bundle: parse filter capability, 2022-03-09). The "builtin/pack-objects.c" leak in 7940941de1f was effectively with us already, but the variable was referred to by a "static" file-scoped variable. The "bundle.c " leak in 105c6f14ad3 was newly introduced with the new "filter" feature for bundles. The "t5600-clone-fail-cleanup.sh" change here to add "TEST_PASSES_SANITIZE_LEAK=true" is one of the cases where run-command.c in not carrying the abort() exit code upwards would have had that test passing before, but now it *actually* passes[1]. We should fix the lack of 1=1 mapping of SANITIZE=leak testing to actual leaks some other time, but it's an existing edge case, let's just mark the really-passing test as passing for now. 1. https://lore.kernel.org/git/220303.86fsnz5o9w.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revisions API: have release_revisions() release "cmdline"Ævar Arnfjörð Bjarmason1-0/+10
Extend the the release_revisions() function so that it frees the "cmdline" in the "struct rev_info". This in combination with a preceding change to free "commits" and "mailmap" means that we can whitelist another test under "TEST_PASSES_SANITIZE_LEAK=true". There was a proposal in [1] to do away with xstrdup()-ing this add_rev_cmdline(), perhaps that would be worthwhile, but for now let's just free() it. We could also make that a "char *" in "struct rev_cmdline_entry" itself, but since we own it let's expose it as a constant to outside callers. I proposed that in [2] but have since changed my mind. See 14d30cdfc04 (ref-filter: fix memory leak in `free_array_item()`, 2019-07-10), c514c62a4fd (checkout: fix leak of non-existent branch names, 2020-08-14) and other log history hits for "free((char *)" for prior art. This includes the tests we had false-positive passes on before my 6798b08e848 (perl Git.pm: don't ignore signalled failure in _cmd_close(), 2022-02-01), now they pass for real. Since there are 66 tests matching t/t[0-9]*git-svn*.sh it's easier to list those that don't pass than to touch most of those 66. So let's introduce a "TEST_FAILS_SANITIZE_LEAK=true", which if set in the tests won't cause lib-git-svn.sh to set "TEST_PASSES_SANITIZE_LEAK=true. This change also marks all the tests that we removed "TEST_FAILS_SANITIZE_LEAK=true" from in an earlier commit due to removing the UNLEAK() from cmd_format_patch(), we can now assert that its API use doesn't leak any "struct rev_info" memory. This change also made commit "t5503-tagfollow.sh" pass on current master, but that would regress when combined with ps/fetch-atomic-fixup's de004e848a9 (t5503: simplify setup of test which exercises failure of backfill, 2022-03-03) (through no fault of that topic, that change started using "git clone" in the test, which has an outstanding leak). Let's leave that test out for now to avoid in-flight semantic conflicts. 1. https://lore.kernel.org/git/YUj%2FgFRh6pwrZalY@carlos-mbp.lan/ 2. https://lore.kernel.org/git/87o88obkb1.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revisions API: have release_revisions() release "mailmap"Ævar Arnfjörð Bjarmason1-0/+9
Extend the the release_revisions() function so that it frees the "mailmap" in the "struct rev_info". The log family of functions now calls the clear_mailmap() function added in fa8afd18e5a (revisions API: provide and use a release_revisions(), 2021-09-19), allowing us to whitelist some tests with "TEST_PASSES_SANITIZE_LEAK=true". Unfortunately having a pointer to a mailmap in "struct rev_info" instead of an embedded member that we "own" get a bit messy, as can be seen in the change to builtin/commit.c. When we free() this data we won't be able to tell apart a pointer to a "mailmap" on the heap from one on the stack. As seen in ea57bc0d41b (log: add --use-mailmap option, 2013-01-05) the "log" family allocates it on the heap, but in the find_author_by_nickname() code added in ea16794e430 (commit: search author pattern against mailmap, 2013-08-23) we allocated it on the stack instead. Ideally we'd simply change that member to a "struct string_list mailmap" and never free() the "mailmap" itself, but that would be a much larger change to the revisions API. We have code that needs to hand an existing "mailmap" to a "struct rev_info", while we could change all of that, let's not go there now. The complexity isn't in the ownership of the "mailmap" per-se, but that various things assume a "rev_info.mailmap == NULL" means "doesn't want mailmap", if we changed that to an init'd "struct string_list we'd need to carefully refactor things to change those assumptions. Let's instead always free() it, and simply declare that if you add such a "mailmap" it must be allocated on the heap. Any modern libc will correctly panic if we free() a stack variable, so this should be safe going forward. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revisions API: have release_revisions() release "commits"Ævar Arnfjörð Bjarmason1-0/+1
Extend the the release_revisions() function so that it frees the "commits" in the "struct rev_info". We don't expect to use this "struct rev_info" again, so there's no reason to NULL out revs->commits, as e.g. simplify_merges() and create_boundary_commit_list() do. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revision.[ch]: document and move code declared around "init"Ævar Arnfjörð Bjarmason1-0/+4
A subsequent commit will add "REV_INFO_INIT" macro adjacent to repo_init_revisions(), unfortunately between the "struct rev_info" itself and that function we've added various miscellaneous code between the two over the years. Let's move that code either lower in revision.h, giving it API docs while we're at it, or in cases where it wasn't public API at all move it into revision.c No lines of code are changed here, only moved around. The only changes are the addition of new API comments. The "tree_difference" variable could also be declared like this, which I think would be a lot clearer, but let's leave that for now to keep this a move-only change: static enum { REV_TREE_SAME, REV_TREE_NEW, /* Only new files */ REV_TREE_OLD, /* Only files removed */ REV_TREE_DIFFERENT, /* Mixed changes */ } tree_difference = REV_TREE_SAME; Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revision.[ch]: provide and start using a release_revisions()Ævar Arnfjörð Bjarmason1-0/+5
The users of the revision.[ch] API's "struct rev_info" are a major source of memory leaks in the test suite under SANITIZE=leak, which in turn adds a lot of noise when trying to mark up tests with "TEST_PASSES_SANITIZE_LEAK=true". The users of that API are largely one-shot, e.g. "git rev-list" or "git log", or the "git checkout" and "git stash" being modified here For these callers freeing the memory is arguably a waste of time, but in many cases they've actually been trying to free the memory, and just doing that in a buggy manner. Let's provide a release_revisions() function for these users, and start migrating them over per the plan outlined in [1]. Right now this only handles the "pending" member of the struct, but more will be added in subsequent commits. Even though we only clear the "pending" member now, let's not leave a trap in code like the pre-image of index_differs_from(), where we'd start doing the wrong thing as soon as the release_revisions() learned to clear its "diffopt". I.e. we need to call release_revisions() after we've inspected any state in "struct rev_info". This leaves in place e.g. clear_pathspec(&rev.prune_data) in stash_working_tree() in builtin/stash.c, subsequent commits will teach release_revisions() to free "prune_data" and other members that in some cases are individually cleared by users of "struct rev_info" by reaching into its members. Those subsequent commits will remove the relevant calls to e.g. clear_pathspec(). We avoid amending code in index_differs_from() in diff-lib.c as well as wt_status_collect_changes_index(), has_unstaged_changes() and has_uncommitted_changes() in wt-status.c in a way that assumes that we are already clearing the "diffopt" member. That will be handled in a subsequent commit. 1. https://lore.kernel.org/git/87a6k8daeu.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13cocci: add and apply free_commit_list() rulesÆvar Arnfjörð Bjarmason1-11/+6
Add and apply coccinelle rules to remove "if (E)" before "free_commit_list(E)", the function can accept NULL, and further change cases where "E = NULL" followed to also be unconditionally. The code changes in this commit were entirely made by the coccinelle rule being added here, and applied with: make contrib/coccinelle/free.cocci.patch patch -p1 <contrib/coccinelle/free.cocci.patch The only manual intervention here is that the the relevant code in commit.c has been manually re-indented. Suggested-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-23list-objects-filter: remove CL_ARG__FILTERDerrick Stolee1-2/+2
We have established the command-line interface for the --[no-]filter options for a while now, so we do not need a helper to make this editable in the future. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-21Merge branch 'ds/partial-bundles'Junio C Hamano1-0/+7
Bundle file format gets extended to allow a partial bundle, filtered by similar criteria you would give when making a partial/lazy clone. * ds/partial-bundles: clone: fail gracefully when cloning filtered bundle bundle: unbundle promisor packs bundle: create filtered bundles rev-list: move --filter parsing into revision.c bundle: parse filter capability list-objects: handle NULL function pointers MyFirstObjectWalk: update recommended usage list-objects: consolidate traverse_commit_list[_filtered] pack-bitmap: drop filter in prepare_bitmap_walk() pack-objects: use rev.filter when possible revision: put object filter into struct rev_info list-objects-filter-options: create copy helper index-pack: document and test the --promisor option
2022-03-09rev-list: move --filter parsing into revision.cDerrick Stolee1-0/+7
Now that 'struct rev_info' has a 'filter' member and most consumers of object filtering are using that member instead of an external struct, move the parsing of the '--filter' option out of builtin/rev-list.c and into revision.c. This use within handle_revision_pseudo_opt() allows us to find the option within setup_revisions() if the arguments are passed directly. In the case of a command such as 'git blame', the arguments are first scanned and checked with parse_revision_opt(), which complains about the option, so 'git blame --filter=blob:none <file>' does not become valid with this change. Some commands, such as 'git diff' gain this option without having it make an effect. And 'git diff --objects' was already possible, but does not actually make sense in that builtin. The key addition that is coming is 'git bundle create --filter=<X>' so we can create bundles containing promisor packs. More work is required to make them fully functional, but that will follow. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25Merge branch 'ab/grep-patterntype'Junio C Hamano1-3/+1
Some code clean-up in the "git grep" machinery. * ab/grep-patterntype: grep: simplify config parsing and option parsing grep.c: do "if (bool && memchr())" not "if (memchr() && bool)" grep.h: make "grep_opt.pattern_type_option" use its enum grep API: call grep_config() after grep_init() grep.c: don't pass along NULL callback value built-ins: trust the "prefix" from run_builtin() grep tests: add missing "grep.patternType" config tests grep tests: create a helper function for "BRE" or "ERE" log tests: check if grep_config() is called by "log"-like cmds grep.h: remove unused "regex_t regexp" from grep_opt
2022-02-23Merge branch 'ah/log-no-graph'Junio C Hamano1-4/+16
"git log --graph --graph" used to leak a graph structure, and there was no way to countermand "--graph" that appear earlier on the command line. A "--no-graph" option has been added and resource leakage has been plugged. * ah/log-no-graph: log: add a --no-graph option log: fix memory leak if --graph is passed multiple times