path: root/range-diff.c
Age | Commit message | Author | Files | Lines
2025-09-25 | range-diff: rename other_arg to log_arg | Kristoffer Haugsbakk | 1 | -5/+5
Rename `other_arg` to `log_arg` in `range_diff_options` and related places. “Other argument” comes from bd361918 (range-diff: pass through --notes to `git log`, 2019-11-20) which introduced Git notes handling to git-range-diff(1) by passing that option on to git-log(1). And that kind of name might be fine in a local context. However, it was initially spread among multiple files, and is now[1] part of the `range_diff_options` struct. It is, prima facie, difficult to guess what “other” means, especially when just looking at the struct. But with a little reading we find out that it is used for `--[no-]notes` and `--diff-merges`, which are both passed on to git-log(1). We should just rename it to reflect this role; `log_arg` suggests, along with the `strvec` type, that it is used to pass extra arguments to git-log(1). † 1: since f1ce6c19 (range-diff: combine all options in a single data structure, 2021-02-05) Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-29 | range-diff: add configurable memory limit for cost matrix | Paulo Casaretto | 1 | -4/+16
When comparing large commit ranges (e.g., 250,000+ commits), range-diff attempts to allocate an n×n cost matrix that can exhaust available memory. For example, with 256,784 commits (n = 513,568), the matrix would require approximately 256GB of memory (513,568² × 4 bytes), causing either immediate segmentation faults due to integer overflow or system hangs. Add a memory limit check in get_correspondences() before allocating the cost matrix. This check uses the total size in bytes (n² × sizeof(int)) and compares it against a configurable maximum, preventing both excessive memory usage and integer overflow issues. The limit is configurable via a new --max-memory option that accepts human-readable sizes (e.g., "1G", "500M"). The default is 4GB for 64 bit systems and 2GB for 32 bit systems. This allows comparing ranges of approximately 32,000 (16,000) commits - generous for real-world use cases while preventing impractical operations. When the limit is exceeded, range-diff now displays a clear error message showing both the requested memory size and the maximum allowed, formatted in human-readable units for better user experience. Example usage: git range-diff --max-memory=1G branch1...branch2 git range-diff --max-memory=500M base..topic1 base..topic2 This approach was chosen over alternatives: - Pre-counting commits: Would require spawning additional git processes and reading all commits twice - Limiting by commit count: Less precise than actual memory usage - Streaming approach: Would require significant refactoring of the current algorithm This issue was previously discussed in: https://lore.kernel.org/git/RFC-cover-v2-0.5-00000000000-20211210T122901Z-avarab@gmail.com/ Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Paulo Casaretto <pcasaretto@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
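A minimal sketch of such a pre-allocation check, assuming the matrix holds n × n `int`s as described above. The helper names (`cost_matrix_bytes()`, `cost_matrix_fits()`, `max_cost_matrix_n()`) are hypothetical, not Git's actual code. Each multiplication is checked against `SIZE_MAX` before it happens, so an oversized n is rejected instead of silently wrapping around.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: stores n * n * sizeof(int) in *out and returns 1,
 * or returns 0 if that product would overflow size_t.
 */
static int cost_matrix_bytes(size_t n, size_t *out)
{
	if (n && n > SIZE_MAX / n)
		return 0;                       /* n * n overflows */
	if (n * n > SIZE_MAX / sizeof(int))
		return 0;                       /* n * n * sizeof(int) overflows */
	*out = n * n * sizeof(int);
	return 1;
}

/* Returns 1 if an n x n int matrix fits within max_memory bytes. */
static int cost_matrix_fits(size_t n, size_t max_memory)
{
	size_t bytes;

	if (!cost_matrix_bytes(n, &bytes))
		return 0;
	return bytes <= max_memory;
}

/* Largest n whose matrix fits in max_memory, found by binary search. */
static size_t max_cost_matrix_n(size_t max_memory)
{
	size_t lo = 0, hi = 1;

	while (cost_matrix_fits(hi, max_memory)) {
		lo = hi;                        /* lo always fits */
		hi *= 2;                        /* grow until hi no longer fits */
	}
	while (hi - lo > 1) {                   /* invariant: lo fits, hi does not */
		size_t mid = lo + (hi - lo) / 2;
		if (cost_matrix_fits(mid, max_memory))
			lo = mid;
		else
			hi = mid;
	}
	return lo;
}
```

On a 64-bit build, `max_cost_matrix_n((size_t)4 << 30)` returns 32,768 and `max_cost_matrix_n((size_t)2 << 30)` returns 23,170, where n is the matrix dimension (the n in the text above).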
2025-03-10 | hash: stop depending on `the_repository` in `null_oid()` | Patrick Steinhardt | 1 | -1/+1
The `null_oid()` function returns the object ID that only consists of zeroes. Naturally, this ID also depends on the hash algorithm used, as the number of zeroes is different between SHA1 and SHA256. Consequently, the function returns the hash-algorithm-specific null object ID. This is currently done by depending on `the_hash_algo`, which implicitly makes us depend on `the_repository`. Refactor the function to instead pass in the hash algorithm for which we want to retrieve the null object ID. Adapt callsites accordingly by passing in `the_repository`, thus bubbling up the dependency on that global variable by one layer. There are a couple of trivial exceptions for subsystems that already got rid of `the_repository`. These subsystems instead use the repository that is available via the calling context: - "builtin/grep.c" - "grep.c" - "refs/debug.c" There are also two non-trivial exceptions: - "diff-no-index.c": Here we know that we may not have a repository initialized at all, so we cannot rely on `the_repository`. Instead, we adapt `diff_no_index()` to get a `struct git_hash_algo` as parameter. The only caller is located in "builtin/diff.c", where we know to call `repo_set_hash_algo()` in case we're running outside of a Git repository. Consequently, it is fine to continue passing `the_repository->hash_algo` even in this case. - "builtin/ls-files.c": There is an in-flight patch series that drops `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic conflict because we use `null_oid()` in `show_submodule()`. The value is passed to `repo_submodule_init()`, which may use the object ID to resolve a tree-ish in the superproject from which we want to read the submodule config. As such, the object ID should refer to an object in the superproject, and consequently we need to use its hash algorithm. This means that we could in theory just not bother about this edge case at all and just use `the_repository` in "diff-no-index.c". 
But doing so would feel misdesigned. Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in "hash.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
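The shape of this refactoring can be sketched with simplified stand-in types: the caller hands in the hash algorithm, and the null ID is looked up from it rather than from a global. The struct layouts and the `null_oid_of()` / `is_null_oid_of()` names are hypothetical, not Git's definitions; only the digest sizes (20 bytes for SHA-1, 32 for SHA-256) are real.

```c
#include <stddef.h>

#define MAX_RAWSZ 32 /* large enough for a SHA-256 digest */

/* Hypothetical, simplified stand-ins for Git's types. */
struct object_id {
	unsigned char hash[MAX_RAWSZ];
};

struct hash_algo {
	const char *name;
	size_t rawsz;                    /* digest length in bytes */
	const struct object_id *null_oid;
};

struct repository {
	const struct hash_algo *hash_algo;
};

/* All-zero storage shared by both algorithms; rawsz says how much counts. */
static const struct object_id null_oid_storage = { { 0 } };

static const struct hash_algo sha1_algo = { "sha1", 20, &null_oid_storage };
static const struct hash_algo sha256_algo = { "sha256", 32, &null_oid_storage };

/* The algorithm is a parameter; no global repository is consulted. */
static const struct object_id *null_oid_of(const struct hash_algo *algop)
{
	return algop->null_oid;
}

/* Checks only the bytes that are meaningful for the given algorithm. */
static int is_null_oid_of(const struct object_id *oid,
			  const struct hash_algo *algop)
{
	for (size_t i = 0; i < algop->rawsz; i++)
		if (oid->hash[i])
			return 0;
	return 1;
}
```

A caller that still depends on the global would pass `the_repository->hash_algo`, as the text describes; one with a repository in its calling context passes `repo->hash_algo` instead.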
2024-12-23 | Merge branch 'js/range-diff-diff-merges' | Junio C Hamano | 1 | -4/+11
"git range-diff" learned to optionally show and compare merge commits in the ranges being compared, with the --diff-merges option. * js/range-diff-diff-merges: range-diff: introduce the convenience option `--remerge-diff` range-diff: optionally include merge commits' diffs in the analysis
2024-12-16 | range-diff: optionally include merge commits' diffs in the analysis | Johannes Schindelin | 1 | -4/+11
The `git log` command already offers support for including diffs for merges, via the `--diff-merges=<format>` option. Let's add corresponding support for `git range-diff`, too. This makes it more convenient to spot differences between commit ranges that contain merges. This is especially true in scenarios with non-trivial merges, i.e. merges introducing changes other than, or in addition to, what merge ORT would have produced. Merging a topic branch that changes a function signature into a branch that added a caller of that function, for example, would require the merge commit itself to adjust that caller to the modified signature. In my code reviews, I found the `--diff-merges=remerge` option particularly useful. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 | global: mark code units that generate warnings with `-Wsign-compare` | Patrick Steinhardt | 1 | -0/+1
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
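For context, `-Wsign-compare` flags comparisons between a signed and an unsigned operand, where the signed value is converted to unsigned before comparing. The sketch below makes that conversion explicit and contrasts it with a sign-aware version; both function names are hypothetical.

```c
#include <stddef.h>

/*
 * What `idx < nr` means when idx is int and nr is size_t: the usual
 * arithmetic conversions turn idx into a size_t, so -1 becomes SIZE_MAX.
 */
static int mixed_less(int idx, size_t nr)
{
	return (size_t)idx < nr;
}

/* A sign-aware comparison that gives the mathematically expected answer. */
static int safe_less(int idx, size_t nr)
{
	return idx < 0 || (size_t)idx < nr;
}
```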
2024-10-10 | Merge branch 'jk/output-prefix-cleanup' | Junio C Hamano | 1 | -2/+2
Code clean-up. * jk/output-prefix-cleanup: diff: store graph prefix buf in git_graph struct diff: return line_prefix directly when possible diff: return const char from output_prefix callback diff: drop line_prefix_length field line-log: use diff_line_prefix() instead of custom helper
2024-10-03 | diff: return const char from output_prefix callback | Jeff King | 1 | -2/+2
The diff_options structure has an output_prefix callback for returning a prefix string, but it does so by returning a pointer to a strbuf. This makes the interface awkward. There's no reason the callback should need to use a strbuf, and it creates questions about whether the ownership of the resulting buffer should be transferred to the caller (it should not be, but a recent attempt to clean up this code led to a double-free in some cases). The one advantage we get is that the strbuf contains a ptr/len pair, so we could in theory have a prefix with embedded NULs. But we can observe that none of the existing callbacks would ever produce such a NUL (they are usually just indentation or graph symbols, and even the "--line-prefix" option takes a NUL-terminated string). And anyway, only one caller (the one in log_tree_diff_flush) actually looks at the strbuf length. In every other case we use a helper function which discards the length and just returns the NUL-terminated string. So let's just have the callback return a "const char *" pointer. It's up to the callbacks themselves if they want to use a strbuf under the hood. And now the caller in log_tree_diff_flush() can just use the helper function along with everybody else. That lets us even simplify out the function pointer check, since the helper returns an empty string (technically this does mean we'll sometimes issue an empty fputs() call, but I don't think this code path is hot enough to care about that). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
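A minimal sketch of the resulting interface, assuming a simplified options struct: the callback returns a plain `const char *`, and a helper returns an empty string when no callback is set, so callers never need a NULL check. The `diff_opts` struct and all function names here are hypothetical stand-ins, not Git's `diff_options` API.

```c
#include <stdio.h>

/* Hypothetical, simplified options struct with a prefix callback. */
struct diff_opts {
	const char *(*output_prefix)(struct diff_opts *opt, void *data);
	void *output_prefix_data;
};

/* Never returns NULL: "" stands in for "no prefix". */
static const char *line_prefix(struct diff_opts *opt)
{
	if (!opt->output_prefix)
		return "";
	return opt->output_prefix(opt, opt->output_prefix_data);
}

/*
 * Example callback: returns a string owned by the callback's data.
 * The caller must not free it; it could just as well live in a strbuf
 * kept inside data.
 */
static const char *fixed_prefix(struct diff_opts *opt, void *data)
{
	(void)opt;
	return data;
}

/* Every caller goes through the helper; no prefix means an empty fputs(). */
static void emit_line(struct diff_opts *opt, FILE *out, const char *line)
{
	fputs(line_prefix(opt), out);
	fputs(line, out);
}
```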
2024-08-14 | userdiff: fix leaking memory for configured diff drivers | Patrick Steinhardt | 1 | -2/+4
The userdiff structures may be initialized either statically on the stack or dynamically via configuration keys. In the latter case we end up leaking memory because we didn't have any infrastructure to discern those strings which have been allocated statically and those which have been allocated dynamically. Refactor the code such that we have two pointers for each of these strings: one that holds the value as accessed by other subsystems, and one that points to the same string in case it has been allocated. Like this, we can safely free the second pointer and thus plug those memory leaks. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
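A minimal sketch of the two-pointer idea for a single string field: `textconv` is what other code reads, and `textconv_owned` is set only when the string was allocated, so releasing a driver can simply free the owned pointer. The struct and function names are hypothetical illustrations, not Git's `userdiff_driver`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical driver with one configurable string field. */
struct driver {
	const char *textconv;   /* value read by other subsystems */
	char *textconv_owned;   /* same string, but only if we allocated it */
};

/* Statically initialized: points at a literal and owns nothing. */
static const struct driver builtin_driver = { "cat", NULL };

/* Configured value: duplicate it and record that we own the copy. */
static int driver_set_textconv(struct driver *drv, const char *value)
{
	size_t len = strlen(value) + 1;
	char *copy = malloc(len);

	if (!copy)
		return -1;
	memcpy(copy, value, len);
	free(drv->textconv_owned);      /* drop a previously configured value */
	drv->textconv = drv->textconv_owned = copy;
	return 0;
}

/* Safe for both kinds of driver: free(NULL) is a no-op. */
static void driver_release(struct driver *drv)
{
	free(drv->textconv_owned);
	drv->textconv = drv->textconv_owned = NULL;
}
```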
2024-06-14 | global: introduce `USE_THE_REPOSITORY_VARIABLE` macro | Patrick Steinhardt | 1 | -0/+2
Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to let us demonstrate that a code unit which does not define it no longer uses this variable, and to keep that code unit from gaining new dependencies on it in future changes, be they explicit or implicit. For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "builtin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
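The opt-in guard can be sketched in a single file: the global is only visible to code units that define the macro before including the header, so a unit that compiles without the define demonstrably does not use it. The `repository` layout and `repo_gitdir()` below are hypothetical; in this single-file sketch the global is defined inline, whereas a real header would carry only an `extern` declaration.

```c
/* A code unit opts in before including the (sketched) header. */
#define USE_THE_REPOSITORY_VARIABLE

/* ---- sketch of a header ---- */
struct repository {
	const char *gitdir;
};

/* Explicit-parameter API: available to every code unit. */
static const char *repo_gitdir(const struct repository *repo)
{
	return repo->gitdir;
}

#ifdef USE_THE_REPOSITORY_VARIABLE
/* Only visible to code units that opted in (defined here for the sketch). */
static struct repository the_repo_storage = { ".git" };
static struct repository *the_repository = &the_repo_storage;
#endif
/* ---- end of header sketch ---- */

/*
 * Compiles only because of the define above; removing the define turns
 * this use into an "undeclared identifier" error.
 */
static const char *current_gitdir(void)
{
	return repo_gitdir(the_repository);
}
```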
2023-09-29 | Merge branch 'kh/range-diff-notes' | Junio C Hamano | 1 | -1/+1
"git range-diff --notes=foo" compared "log --notes=foo --notes" of the two ranges, instead of using just the specified notes tree. * kh/range-diff-notes: range-diff: treat notes like `log`
2023-09-19 | range-diff: treat notes like `log` | Kristoffer Haugsbakk | 1 | -1/+1
Currently, `range-diff` shows the default notes if no notes-related arguments are given. This is also how `log` behaves. But unlike `range-diff`, `log` does *not* show the default notes if `--notes=<custom>` is given. In other words, this: git log --notes=custom is equivalent to this: git log --no-notes --notes=custom Whereas this: git range-diff --notes=custom acts like this: git log --notes --notes=custom This can’t be how the user expects `range-diff` to behave given that the man page for `range-diff` under `--[no-]notes[=<ref>]` says: > This flag is passed to the `git log` program (see git-log(1)) that > generates the patches. This behavior also affects `format-patch` since it uses `range-diff` for the cover letter. Unlike `log`, though, `format-patch` is not supposed to show the default notes if no notes-related arguments are given.[1] But this promise is broken when the range-diff happens to have something to say about the changes to the default notes, since that will be shown in the cover letter. Remedy this by introducing `--show-notes-by-default` that `range-diff` can use to tell the `log` subprocess what to do. § Authors • Fix by Johannes • Tests by Kristoffer † 1: See e.g. 66b2ed09c2 (Fix "log" family not to be too agressive about showing notes, 2010-01-20). Co-authored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-19 | hashmap: use expected signatures for comparison functions | Jeff King | 1 | -4/+7
We prefer for callback functions to match the signature with which they'll be called, rather than casting them to the correct type when assigning function pointers. Even though casting often works in the real world, it is a violation of the standard. We did a mass conversion in 939af16eac (hashmap_cmp_fn takes hashmap_entry params, 2019-10-06), but have grown a few new cases since then. Because of the cast, the compiler does not complain. However, as of clang-18, UBSan will catch these at run-time, and the case in range-diff.c triggers when running t3206. After seeing that one, I scanned the results of: git grep '_fn)[^(]' '*.c' | grep -v typedef and found a similar case in compat/terminal.c (which presumably isn't called in the test suite, since it doesn't trigger UBSan). There might be other cases lurking if the cast is done using a typedef that doesn't end in "_fn", but loosening it finds too many false positives. I also looked for: git grep ' = ([a-z_]*) *[a-z]' '*.c' to find assignments that cast, but nothing looked like a function. The resulting code is unfortunately a little longer, but the bonus of using container_of() is that we are no longer restricted to the hashmap_entry being at the start of the struct. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
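A minimal sketch of the pattern, with a hypothetical callback type shaped like a hashmap comparison function: the callback is declared with exactly the parameter types it will be called with, and recovers the containing struct with `container_of()`, which is why the entry no longer needs to be the first member. The struct names and this `container_of()` definition are illustrative assumptions, not copied from Git.

```c
#include <stddef.h>
#include <string.h>

/* Recover a pointer to the enclosing struct from a pointer to a member. */
#define container_of(ptr, type, member) \
	((type *)((const char *)(ptr) - offsetof(type, member)))

struct hashmap_entry {
	struct hashmap_entry *next;
	unsigned int hash;
};

/* Hypothetical comparison callback type. */
typedef int (*entry_cmp_fn)(const void *cmp_data,
			    const struct hashmap_entry *a,
			    const struct hashmap_entry *b,
			    const void *keydata);

struct patch_entry {
	const char *diff;               /* the entry need not come first */
	struct hashmap_entry ent;
};

/* Exactly matches entry_cmp_fn, so no function-pointer cast is needed. */
static int patch_entry_cmp(const void *cmp_data,
			   const struct hashmap_entry *eptr,
			   const struct hashmap_entry *entry_or_key,
			   const void *keydata)
{
	const struct patch_entry *a = container_of(eptr, struct patch_entry, ent);
	const struct patch_entry *b =
		container_of(entry_or_key, struct patch_entry, ent);

	(void)cmp_data;
	(void)keydata;
	return strcmp(a->diff, b->diff);
}

static const entry_cmp_fn patch_cmp = patch_entry_cmp;
```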
2023-06-21 | diff.h: remove unnecessary include of oidset.h | Elijah Newren | 1 | -0/+1
This also made it clear that several .c files depended upon various things that oidset included, but had omitted the direct #include for those headers. Add those now. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 | treewide: remove cache.h inclusion due to previous changes | Elijah Newren | 1 | -1/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 | pager.h: move declarations for pager.c functions from cache.h | Elijah Newren | 1 | -0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 | object-name.h: move declarations for object-name.c functions from cache.h | Elijah Newren | 1 | -0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-04 | Merge branch 'ab/remove-implicit-use-of-the-repository' into en/header-split-cache-h | Junio C Hamano | 1 | -6/+6
* ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-03-28 | cocci: apply the "revision.h" part of "the_repository.pending" | Ævar Arnfjörð Bjarmason | 1 | -1/+1
Apply the part of "the_repository.pending.cocci" pertaining to "revision.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 | cocci: apply the "diff.h" part of "the_repository.pending" | Ævar Arnfjörð Bjarmason | 1 | -1/+1
Apply the part of "the_repository.pending.cocci" pertaining to "diff.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 | cocci: apply the "cache.h" part of "the_repository.pending" | Ævar Arnfjörð Bjarmason | 1 | -4/+4
Apply the part of "the_repository.pending.cocci" pertaining to "cache.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 | environment.h: move declarations for environment.c functions from cache.h | Elijah Newren | 1 | -0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 | treewide: be explicit about dependence on gettext.h | Elijah Newren | 1 | -0/+1
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-28 | range-diff: avoid compiler warning when char is unsigned | René Scharfe | 1 | -1/+1
Since 2b15969f61 (range-diff: let '--abbrev' option takes effect, 2023-02-20), GCC 11.3 on Ubuntu 22.04 on aarch64 warns (and errors out if the make variable DEVELOPER is set): range-diff.c: In function ‘output_pair_header’: range-diff.c:388:20: error: comparison is always false due to limited range of data type [-Werror=type-limits] 388 | if (abbrev < 0) | ^ cc1: all warnings being treated as errors That's because char is unsigned on that platform. Use int instead, just like in struct diff_options, to copy the value faithfully. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
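The underlying pitfall can be shown in isolation: storing a negative value in an unsigned 8-bit field loses the sign, so a later `< 0` test can never succeed, while an `int` copies the value faithfully. The function names are hypothetical; `unsigned char` stands in for plain `char` on platforms such as aarch64 Linux, where `char` is unsigned.

```c
#include <limits.h>

/* What happens when an int option is copied into an unsigned char field. */
static unsigned char copy_to_uchar(int value)
{
	return (unsigned char)value;    /* -1 wraps to UCHAR_MAX; the sign is lost */
}

/* Copying into an int, the type used in struct diff_options, keeps the sign. */
static int copy_to_int(int value)
{
	return value;
}

/* A "negative means unset" check only works on the int copy. */
static int is_unset(int value)
{
	return value < 0;
}
```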
2023-02-21 | range-diff: let '--abbrev' option takes effect | Teng Long | 1 | -4/+7
As mentioned in 'git-range-diff.txt': "`git range-diff` also accepts the regular diff options (see linkgit:git-diff[1])...", but '--abbrev' is not in the "regular" scope. In Git, the abbreviated length of an object name is not necessarily the same across repositories; it depends on the needs of the project (Linus mentioned in e6c587c7 in 2016 that "the Linux kernel project needs 11 to 12 hexdigits" at that time), which is why a user may want abbreviations of a specified length. A similar effect can be achieved through configuration (e.g. git -c core.abbrev=<abbrev>), but for ease of use (many users may not know that the -c option can be specified), and in line with the description in the existing documentation, letting users pass '--abbrev' directly seems like a good approach. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-13 | diff: mark unused parameters in callbacks | Jeff King | 1 | -4/+8
The diff code provides a format_callback interface, but not every callback needs each parameter (e.g., the "opt" and "data" parameters are frequently left unused). Likewise for the output_prefix callback, the low-level change/add_remove interfaces, the callbacks used by xdi_diff(), etc. Mark unused arguments in the callback implementations to quiet -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-14 | Merge branch 'ab/unused-annotation' | Junio C Hamano | 1 | -1/+1
Undoes 'jk/unused-annotation' topic and redoes it to work around Coccinelle rules misfiring false positives in unrelated codepaths. * ab/unused-annotation: git-compat-util.h: use "deprecated" for UNUSED variables git-compat-util.h: use "UNUSED", not "UNUSED(var)"
2022-09-14 | Merge branch 'jk/unused-annotation' | Junio C Hamano | 1 | -2/+4
Annotate function parameters that are not used (but cannot be removed for structural reasons), to prepare us to later compile with -Wunused warning turned on. * jk/unused-annotation: is_path_owned_by_current_uid(): mark "report" parameter as unused run-command: mark unused async callback parameters mark unused read_tree_recursive() callback parameters hashmap: mark unused callback parameters config: mark unused callback parameters streaming: mark unused virtual method parameters transport: mark bundle transport_options as unused refs: mark unused virtual method parameters refs: mark unused reflog callback parameters refs: mark unused each_ref_fn parameters git-compat-util: add UNUSED macro
2022-09-01 | git-compat-util.h: use "UNUSED", not "UNUSED(var)" | Ævar Arnfjörð Bjarmason | 1 | -1/+1
As reported in [1], the "UNUSED(var)" macro introduced in 2174b8c75de (Merge branch 'jk/unused-annotation' into next, 2022-08-24) breaks coccinelle's parsing of our sources in files where it occurs. Let's instead partially go with the approach suggested in [2] of making this not take an argument. As noted in [1] "coccinelle" will ignore such tokens in argument lists that it doesn't know about, and it's less of a surprise to syntax highlighters. This undoes the "help us notice when a parameter marked as unused is actually used" part of 9b240347543 (git-compat-util: add UNUSED macro, 2022-08-19); a subsequent commit will further tweak the macro to implement a replacement for that functionality. 1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
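A simplified sketch of an argument-less annotation in the style this commit moves to: the macro expands to a compiler attribute placed after the parameter name, so tools that do not know it can skip it as an ordinary token. This definition is a hypothetical minimal version that omits the "deprecated" trick named in the 'ab/unused-annotation' merge above, and the callback names are illustrative.

```c
#if defined(__GNUC__) || defined(__clang__)
#define UNUSED __attribute__((unused))
#else
#define UNUSED
#endif

/* Hypothetical iteration callback type. */
typedef int (*each_name_fn)(const char *name, void *cb_data);

/* Needs only cb_data; "name" is required by the callback signature. */
static int count_names(const char *name UNUSED, void *cb_data)
{
	int *count = cb_data;

	(*count)++;
	return 0;
}

static int for_each_name(const char **names, int nr,
			 each_name_fn fn, void *cb_data)
{
	for (int i = 0; i < nr; i++) {
		int ret = fn(names[i], cb_data);
		if (ret)
			return ret;         /* stop early on a non-zero result */
	}
	return 0;
}
```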
2022-08-26 | range-diff: optionally accept pathspecs | Johannes Schindelin | 1 | -1/+1
The `git range-diff` command can be quite expensive, which is not a surprise given that the underlying algorithm to match up pairs of commits between the provided two commit ranges has a cubic runtime. Therefore it makes sense to restrict the commit ranges as much as possible, to reduce the amount of input to that O(N^3) algorithm. In chatty repositories with wide trees, this is not necessarily possible merely by choosing commit ranges wisely. Let's give users another option to restrict the commit ranges: by providing a pathspec. That helps in repositories with wide trees because it is likely that the user has a good idea which subset of the tree they are actually interested in. Example: git range-diff upstream/main upstream/seen HEAD -- range-diff.c This shows commits that are either in the local branch or in `seen`, but not in `main`, skipping all commits that do not touch `range-diff.c`. Note: Since we piggy-back the pathspecs onto the `other_arg` mechanism that was introduced to be able to pass through the `--notes` option to the revision machinery, we must now ensure that the `other_arg` array is appended at the end (the revision range must come before the pathspecs, if any). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
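The ordering constraint can be sketched with a hypothetical fixed-size argument vector standing in for Git's `strvec`: the revision range is pushed first, and the pass-through arguments from the `other_arg` mechanism (which may end with `--` and pathspecs) are appended afterwards. `argv_buf`, `argv_push()` and `build_log_argv()` are illustrative names, and the command line range-diff really builds contains more options than shown.

```c
#include <stddef.h>

#define MAX_ARGS 32

/* Hypothetical NULL-terminated argument vector with a fixed capacity. */
struct argv_buf {
	const char *v[MAX_ARGS + 1];
	size_t nr;
};

static int argv_push(struct argv_buf *a, const char *arg)
{
	if (a->nr >= MAX_ARGS)
		return -1;
	a->v[a->nr++] = arg;
	a->v[a->nr] = NULL;
	return 0;
}

/* Range first, then extra arguments such as "--notes" or "--" <pathspec>. */
static int build_log_argv(struct argv_buf *a, const char *range,
			  const char **extra, size_t extra_nr)
{
	a->nr = 0;
	a->v[0] = NULL;
	if (argv_push(a, "log") || argv_push(a, range))
		return -1;
	for (size_t i = 0; i < extra_nr; i++)
		if (argv_push(a, extra[i]))
			return -1;
	return 0;
}
```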
2022-08-19 | hashmap: mark unused callback parameters | Jeff King | 1 | -2/+4
Hashmap comparison functions must conform to a particular callback interface, but many don't use all of their parameters. Especially the void cmp_data pointer, but some do not use keydata either (because they can easily form a full struct to pass when doing lookups). Let's mark these to make -Wunused-parameter happy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-13 | Merge branch 'pb/range-diff-with-submodule' | Junio C Hamano | 1 | -1/+1
"git -c diff.submodule=log range-diff" did not show anything for submodules that changed in the ranges being compared, and "git -c diff.submodule=diff range-diff" did not work correctly. Fix this by including the "--submodule=short" output unconditionally to be compared. * pb/range-diff-with-submodule: range-diff: show submodule changes irrespective of diff.submodule
2022-06-06 | range-diff: show submodule changes irrespective of diff.submodule | Philippe Blain | 1 | -1/+1
After generating diffs for each range to be compared using a 'git log' invocation, range-diff.c::read_patches looks for the "diff --git" header in those diffs to recognize the beginning of a new change. In a project with submodules, and with 'diff.submodule=log' set in the config, this header is missing for the diff of a changed submodule, so any submodule changes are quietly ignored in the range-diff. When 'diff.submodule=diff' is set in the config, the "diff --git" header is also missing for the submodule itself, but is shown for submodule content changes, which can easily confuse 'git range-diff' and lead to errors such as: error: git apply: bad git-diff - inconsistent old filename on line 1 error: could not parse git header 'diff --git path/to/submodule/and/some/file/within ' error: could not parse log for '@{u}..@{1}' Force the submodule diff format to its default ("short") when invoking 'git log' to generate the patches for each range, such that submodule changes are always detected. Add a test, including an invocation with '--creation-factor=100' to force the second commit in the range not to be considered a complete rewrite, in order to verify we do indeed get the "short" format. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 | revision.[ch]: provide and start using a release_revisions() | Ævar Arnfjörð Bjarmason | 1 | -1/+1
The users of the revision.[ch] API's "struct rev_info" are a major source of memory leaks in the test suite under SANITIZE=leak, which in turn adds a lot of noise when trying to mark up tests with "TEST_PASSES_SANITIZE_LEAK=true". The users of that API are largely one-shot, e.g. "git rev-list" or "git log", or the "git checkout" and "git stash" being modified here. For these callers freeing the memory is arguably a waste of time, but in many cases they've actually been trying to free the memory, and just doing that in a buggy manner. Let's provide a release_revisions() function for these users, and start migrating them over per the plan outlined in [1]. Right now this only handles the "pending" member of the struct, but more will be added in subsequent commits. Even though we only clear the "pending" member now, let's not leave a trap in code like the pre-image of index_differs_from(), where we'd start doing the wrong thing as soon as release_revisions() learned to clear its "diffopt". I.e. we need to call release_revisions() after we've inspected any state in "struct rev_info". This leaves in place e.g. clear_pathspec(&rev.prune_data) in stash_working_tree() in builtin/stash.c; subsequent commits will teach release_revisions() to free "prune_data" and other members that in some cases are individually cleared by users of "struct rev_info" by reaching into its members. Those subsequent commits will remove the relevant calls to e.g. clear_pathspec(). We avoid amending code in index_differs_from() in diff-lib.c as well as wt_status_collect_changes_index(), has_unstaged_changes() and has_uncommitted_changes() in wt-status.c in a way that assumes that we are already clearing the "diffopt" member. That will be handled in a subsequent commit. 1. https://lore.kernel.org/git/87a6k8daeu.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-04 | range-diff: plug memory leak in read_patches() | Ævar Arnfjörð Bjarmason | 1 | -17/+12
Amend code added in d9c66f0b5bf (range-diff: first rudimentary implementation, 2018-08-13) to use a "goto cleanup" pattern. This makes for less code, and frees memory that we'd previously leak. The reason for changing free(util) to FREE_AND_NULL(util) is because at the end of the function we append the contents of "util" to a "struct string_list" if it's non-NULL. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
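A minimal sketch of the "goto cleanup" pattern with `FREE_AND_NULL()`: every exit goes through one label, and since the pending buffer is handed over at the end only if it is non-NULL, it must be reset to NULL whenever it is freed earlier (a plain `free()` would leave a dangling pointer for the cleanup code to free again). `last_line()` is a hypothetical function and this `FREE_AND_NULL()` definition is a simplified stand-in.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in: free a pointer and reset it so it is not reused. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/*
 * Hypothetical parser: copies the last line of buf into *last.
 * An empty line is treated as an error. All exits go through "cleanup".
 */
static int last_line(const char *buf, char **last)
{
	char *util = NULL;
	int ret = -1;
	const char *p = buf;

	while (*p) {
		const char *eol = strchr(p, '\n');
		size_t len = eol ? (size_t)(eol - p) : strlen(p);

		FREE_AND_NULL(util);            /* drop the previous line */
		if (!len)
			goto cleanup;
		util = malloc(len + 1);
		if (!util)
			goto cleanup;
		memcpy(util, p, len);
		util[len] = '\0';
		p = eol ? eol + 1 : p + len;
	}
	ret = 0;

cleanup:
	if (!ret && util) {
		*last = util;                   /* ownership moves to the caller */
		util = NULL;
	}
	free(util);
	return ret;
}
```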
2022-03-04 | range-diff: plug memory leak in common invocation | Ævar Arnfjörð Bjarmason | 1 | -0/+1
Create a public release_patch() version of the private free_patch() function added in 13b5af22f39 (apply: move libified code from builtin/apply.c to apply.{c,h}, 2016-04-22). Unlike the existing function this one doesn't free() the "struct patch" itself, so we can use it for variables on the stack. Use it in range-diff.c to fix a memory leak in common range-diff invocations, e.g.: git -P range-diff origin/master origin/next origin/seen Would emit several errors when compiled with SANITIZE=leak, but now runs cleanly. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
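The split between releasing members and freeing the struct itself can be sketched as follows. The two-field struct is a hypothetical simplification; only the `release_patch()` / `free_patch()` naming follows the text above, and the bodies are illustrative rather than apply.c's code.

```c
#include <stdlib.h>

/* Hypothetical, simplified patch with two heap-allocated members. */
struct patch {
	char *old_name;
	char *new_name;
};

/* Frees members only, so it also works for variables on the stack. */
static void release_patch(struct patch *p)
{
	free(p->old_name);
	free(p->new_name);
	p->old_name = p->new_name = NULL;
}

/* Frees members and the struct itself; only for heap-allocated patches. */
static void free_patch(struct patch *p)
{
	if (!p)
		return;
	release_patch(p);
	free(p);
}
```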
2022-01-05 | i18n: refactor "foo and bar are mutually exclusive" | Jean-Noël Avila | 1 | -1/+1
Use static strings for constant parts of the sentences. They are all turned into "cannot be used together". Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-15 | Merge branch 'rs/range-diff-avoid-segfault-with-I' | Junio C Hamano | 1 | -0/+3
"git range-diff -I... <range> <range>" segfaulted, which has been corrected. * rs/range-diff-avoid-segfault-with-I: range-diff: avoid segfault with -I
2021-09-07 | range-diff: avoid segfault with -I | René Scharfe | 1 | -0/+3
output() reuses the same struct diff_options for multiple calls of diff_flush(). Set the option no_free to instruct it to keep the ignore regexes between calls and release them explicitly at the end. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
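The fix relies on a flag that tells the flush step not to release resources, so one options struct can be reused across several flushes and released once at the end. A minimal sketch, in which `out_opts`, its `ignore` field (standing in for the compiled `-I` regexes) and the functions are hypothetical rather than Git's `diff_options` code:

```c
#include <stdlib.h>

/* Hypothetical options struct; "ignore" stands in for the -I regexes. */
struct out_opts {
	int no_free;                /* keep resources across flushes */
	char *ignore;
};

static void out_opts_release(struct out_opts *o)
{
	free(o->ignore);
	o->ignore = NULL;
}

/*
 * Returns 1 if the options were still intact when this flush used them.
 * Without no_free, the first flush releases them, and code in a second
 * flush that dereferenced them would crash.
 */
static int out_flush(struct out_opts *o)
{
	int usable = o->ignore != NULL;

	if (!o->no_free)
		out_opts_release(o);
	return usable;
}
```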
2021-08-30 | Merge branch 'jk/range-diff-fixes' | Junio C Hamano | 1 | -16/+13
"git range-diff" code clean-up. * jk/range-diff-fixes: range-diff: use ssize_t for parsed "len" in read_patches() range-diff: handle unterminated lines in read_patches() range-diff: drop useless "offset" variable from read_patches()
2021-08-10 | range-diff: use ssize_t for parsed "len" in read_patches() | Jeff King | 1 | -1/+1
As we iterate through the buffer containing git-log output, parsing lines, we use an "int" to store the size of an individual line. This should be a size_t, as we have no guarantee that there is not a malicious 2GB+ commit-message line in the output. Overflowing this integer probably doesn't do anything _too_ terrible. We are not using the value to size a buffer, so the worst case is probably an out-of-bounds read from before the array. But it's easy enough to fix. Note that we have to use ssize_t here, since we also store the length result from parse_git_diff_header(), which may return a negative value for error. That function actually returns an int itself, which has a similar overflow problem, but I'll leave that for another day. Much of the apply.c code uses ints and should be converted as a whole; in the meantime, a negative return from parse_git_diff_header() will be interpreted as an error, and we'll bail (so we can't handle such a case, but given that it's likely to be malicious anyway, the important thing is we don't have any memory errors). Signed-off-by: Jeff King <peff@peff.net> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-10range-diff: handle unterminated lines in read_patches()Jeff King1-14/+11
When parsing our buffer of output from git-log, we have a find_end_of_line() helper that finds the next newline, and gives us the number of bytes to move past it, or the size of the whole remaining buffer if there is no newline. But trying to handle both those cases leads to some oddities: - we try to overwrite the newline with NUL in the caller, by writing over line[len-1]. This is at best redundant, since the helper will already have done so if it saw a newline. But if it didn't see a newline, it's actively wrong; we'll overwrite the byte at the end of the (unterminated) line. We could solve this by just dropping the extra NUL assignment in the caller and letting the helper do the right thing. But... - if we see a "diff --git" line, we'll restore the newline on top of the NUL byte, so we can pass the string to parse_git_diff_header(). But if there was no newline in the first place, we can't do this. There's no place to put it (the current code writes a newline over whatever byte we obliterated earlier). The best we can do is feed the complete remainder of the buffer to the function (which is, in fact, a string, by virtue of being a strbuf). To solve this, the caller needs to know whether we actually found a newline or not. We could modify find_end_of_line() to return that information, but we can further observe that it has only one caller. So let's just inline it in that caller. Nobody seems to have noticed this case, probably because git-log would never produce input that doesn't end with a newline. Arguably we could just return an error as soon as we see that the output does not end in a newline. But the code to do so actually ends up _longer_, mostly because of the cleanup we have to do in handling the error. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
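The sketch below shows one way to split lines so the caller learns whether a newline was found. `next_line()` is a hypothetical helper, not the code in read_patches(). It reports through `terminated` whether a newline was actually found. The caller needs that before it tries to restore a newline.

```c
#include <string.h>

/*
 * Split off the next line of buf[0..size). Returns the number of bytes
 * consumed. If a newline is found, it is replaced with NUL and
 * *terminated is set to 1. Otherwise the rest of the buffer is the line
 * (assumed to be NUL-terminated already, as a strbuf is), *terminated
 * is 0, and there is no newline for the caller to restore later.
 */
static size_t next_line(char *buf, size_t size, int *terminated)
{
	char *eol = memchr(buf, '\n', size);

	if (eol) {
		*eol = '\0';
		*terminated = 1;
		return (size_t)(eol - buf) + 1;
	}
	*terminated = 0;
	return size;
}
```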
2021-08-10range-diff: drop useless "offset" variable from read_patches()Jeff King1-2/+2
The "offset" variable was introduced in 44b67cb62b (range-diff: split lines manually, 2019-07-11), but it has never done anything useful. We use it to count up the number of bytes we've consumed, but we never look at the result. It was probably copied accidentally from an almost-identical loop in apply.c:find_header() (and the point of that commit was to make use of the parse_git_diff_header() function which underlies both). Because the variable was set but not used, most compilers didn't seem to notice, but the upcoming clang-14 does complain about it, via its -Wunused-but-set-variable warning. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-13Merge branch 'ab/pickaxe-pcre2'Junio C Hamano1-1/+2
Rewrite the backend for "diff -G/-S" to use pcre2 engine when available. * ab/pickaxe-pcre2: (22 commits) xdiff-interface: replace discard_hunk_line() with a flag xdiff users: use designated initializers for out_line pickaxe -G: don't special-case create/delete pickaxe -G: terminate early on matching lines xdiff-interface: allow early return from xdiff_emit_line_fn xdiff-interface: prepare for allowing early return pickaxe -S: slightly optimize contains() pickaxe: rename variables in has_changes() for brevity pickaxe -S: support content with NULs under --pickaxe-regex pickaxe: assert that we must have a needle under -G or -S pickaxe: refactor function selection in diffcore-pickaxe() perf: add performance test for pickaxe pickaxe/style: consolidate declarations and assignments diff.h: move pickaxe fields together again pickaxe: die when --find-object and --pickaxe-all are combined pickaxe: die when -G and --pickaxe-regex are combined pickaxe tests: add missing test for --no-pickaxe-regex being an error pickaxe tests: test for -G, -S and --find-object incompatibility pickaxe tests: add test for "log -S" not being a regex pickaxe tests: add test for diffgrep_consume() internals ...
2021-05-11xdiff-interface: prepare for allowing early returnÆvar Arnfjörð Bjarmason1-1/+2
Change the function prototype of xdiff_emit_line_fn to return an "int" instead of "void". Change all of those functions to "return 0"; nothing checks those return values yet, and no behavior is being changed. In subsequent commits the interface will be changed to allow early return via this new return value. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
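Later commits build on an early-return convention, sketched below. `emit_line_fn`, `emit_lines()` and `stop_at_addition()` are hypothetical names, and the real xdiff_emit_line_fn has a different signature. In this commit every callback simply returns 0, so the loop always runs to completion.

```c
#include <stddef.h>

/* Hypothetical callback type: return 0 to continue, non-zero to stop. */
typedef int (*emit_line_fn)(void *priv, const char *line);

/* Feed lines to fn, stopping as soon as it returns non-zero. */
static int emit_lines(const char **lines, size_t nr, emit_line_fn fn,
		      void *priv)
{
	for (size_t i = 0; i < nr; i++) {
		int ret = fn(priv, lines[i]);

		if (ret)
			return ret;	/* propagate the early-return signal */
	}
	return 0;
}

/* Example callback: counts lines seen, stops at the first '+' line. */
static int stop_at_addition(void *priv, const char *line)
{
	int *seen = priv;

	(*seen)++;
	return line[0] == '+';
}
```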
2021-04-27hash: provide per-algorithm null OIDsbrian m. carlson1-1/+1
Up until recently, object IDs did not have an algorithm member, only a hash. Consequently, it was possible to share one null (all-zeros) object ID among all hash algorithms. Now that we're going to be handling objects from multiple hash algorithms, it's important to make sure that all object IDs have a correct algorithm field. Introduce a per-algorithm null OID, and add it to struct hash_algo. Introduce a wrapper function as well, and use it everywhere we used to use the null_oid constant. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
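The sketch below shows per-algorithm null OIDs, assuming a global "current algorithm" pointer. The structs and `null_oid_sketch()` are hypothetical reductions of `struct object_id`, `struct hash_algo` and the wrapper function described above. Git's real structures carry more fields.

```c
#include <stddef.h>

#define MAX_RAWSZ 32	/* large enough for SHA-256 */

/* Hypothetical, simplified object ID that records its algorithm. */
struct oid_sketch {
	unsigned char hash[MAX_RAWSZ];
	int algo;	/* which hash algorithm this ID belongs to */
};

/* Hypothetical, simplified algorithm descriptor. */
struct algo_sketch {
	const char *name;
	size_t rawsz;
	const struct oid_sketch *null_oid;	/* per-algorithm all-zeros ID */
};

static const struct oid_sketch null_sha1 = { { 0 }, 1 };
static const struct oid_sketch null_sha256 = { { 0 }, 2 };

static const struct algo_sketch algos[] = {
	{ "sha1", 20, &null_sha1 },
	{ "sha256", 32, &null_sha256 },
};

/* The algorithm currently in use (a global, for this sketch only). */
static const struct algo_sketch *current_algo = &algos[0];

/* Wrapper returning the null OID of the current algorithm. */
static const struct oid_sketch *null_oid_sketch(void)
{
	return current_algo->null_oid;
}
```

Because each null OID carries the right `algo` value, code comparing against it no longer mixes algorithms.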
2021-03-13use CALLOC_ARRAYRené Scharfe1-1/+1
Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
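A CALLOC_ARRAY-style macro fits in one line. This version is named `CALLOC_ARRAY_SKETCH` so it is not mistaken for Git's definition. Git's macro is built on its die-on-failure allocator, whereas plain `calloc()` here can return NULL.

```c
#include <stdlib.h>

/*
 * Allocate `alloc` zeroed elements for pointer x. The element size is
 * inferred from *x, so it cannot drift out of sync with x's type.
 */
#define CALLOC_ARRAY_SKETCH(x, alloc) ((x) = calloc((alloc), sizeof(*(x))))
```

For example, `CALLOC_ARRAY_SKETCH(counts, 8)` replaces the open-coded `counts = calloc(8, sizeof(*counts))`. If `counts` later changes type, the allocation stays correct without editing the call.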
2021-02-17Merge branch 'js/range-diff-one-side-only'Junio C Hamano1-46/+55
The "git range-diff" command learned "--(left|right)-only" option to show only one side of the compared range. * js/range-diff-one-side-only: range-diff: offer --left-only/--right-only options range-diff: move the diffopt initialization down one layer range-diff: combine all options in a single data structure range-diff: simplify code spawning `git log` range-diff: libify the read_patches() function again range-diff: avoid leaking memory in two error code paths
2021-02-06range-diff/format-patch: handle commit ranges other than A..BJohannes Schindelin1-1/+25
In the `SPECIFYING RANGES` section of gitrevisions[7], two ways are described to specify commit ranges that `range-diff` does not yet accept: "<commit>^!" and "<commit>^-<n>". Let's accept them, by parsing them via the revision machinery and looking for at least one interesting and one uninteresting revision in the resulting `pending` array. This also finally lets us reject arguments that _do_ contain `..` but are not actually ranges, e.g. `HEAD^{/do.. match this}`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-06range-diff: offer --left-only/--right-only optionsJohannes Schindelin1-3/+8
When comparing commit ranges, one is frequently interested only in one side, such as asking the question "Has this patch that I submitted to the Git mailing list been applied?": one would only care about the part of the output that corresponds to the commits in a local branch. To make that possible, imitate the `git rev-list` options `--left-only` and `--right-only`. This addresses https://github.com/gitgitgadget/git/issues/206 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-06range-diff: move the diffopt initialization down one layerJohannes Schindelin1-33/+31
It is actually only the `output()` function that uses those diffopts. By moving the diffopt initialization down into that function, it is encapsulated better. Incidentally, it will also make it easier to implement the `--left-only` and `--right-only` options in `git range-diff` because the `output()` function is now receiving all range-diff options as a parameter, not just the diffopts. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-06range-diff: combine all options in a single data structureJohannes Schindelin1-9/+9
This will make it easier to implement the `--left-only` and `--right-only` options. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-04range-diff: simplify code spawning `git log`Johannes Schindelin1-5/+2
Previously, we waited for the child process to be finished in every failing code path as well as at the end of the function `show_range_diff()`. However, we do not need to wait that long. Directly after reading the output of the child process, we can wrap up the child process. This also has the advantage that we don't do a bunch of unnecessary work in case `finish_command()` returns with an error anyway. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-04range-diff: libify the read_patches() function againJohannes Schindelin1-3/+10
In library functions, we do want to avoid the (simple, but rather final) `die()` calls, instead returning with a value indicating an error. Let's do exactly that in the code introduced in b66885a30cb8 (range-diff: add section header instead of diff header, 2019-07-11) that wants to error out if a diff header could not be parsed. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-04range-diff: avoid leaking memory in two error code pathsJohannes Schindelin1-0/+2
In the code paths in question, we already release a lot of memory, but the `current_filename` variable was missed. Fix that. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-27range-diff/format-patch: refactor check for commit rangeJohannes Schindelin1-0/+5
Currently, when called with exactly two arguments, `git range-diff` tests for a literal `..` in each of the two. Likewise, the argument provided via `--range-diff` to `git format-patch` is checked in the same manner. However, `<commit>^!` is a perfectly valid commit range, equivalent to `<commit>^..<commit>` according to the `SPECIFYING RANGES` section of gitrevisions[7]. In preparation for allowing more sophisticated ways to specify commit ranges, let's refactor the check into its own function. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
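The literal check fits in a small helper. `looks_like_range()` is a hypothetical name. A purely textual test rejects the valid range `HEAD^!` and accepts a non-range such as `HEAD^{/do.. match this}`. That is why it is worth isolating behind one function that can later be replaced.

```c
#include <string.h>

/* Hypothetical helper: the purely textual "does it contain .." test. */
static int looks_like_range(const char *arg)
{
	return strstr(arg, "..") != NULL;
}
```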
2020-11-11Use new HASHMAP_INIT macro to simplify hashmap initializationElijah Newren1-3/+1
Now that hashmap has lazy initialization and a HASHMAP_INIT macro, hashmaps allocated on the stack can be initialized without a call to hashmap_init(), which in some cases makes the code a bit shorter. Convert some callsites over to take advantage of this. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-02hashmap: provide deallocation function namesElijah Newren1-1/+1
hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed for a while, but aren't necessarily the clearest names, especially with hashmap_partial_clear() being added to the mix and lazy-initialization now being supported. Peff suggested we adopt the following names[1]: - hashmap_clear() - remove all entries and de-allocate any hashmap-specific data, but be ready for reuse - hashmap_clear_and_free() - ditto, but free the entries themselves - hashmap_partial_clear() - remove all entries but don't deallocate table - hashmap_partial_clear_and_free() - ditto, but free the entries This patch provides the new names and converts all existing callers over to the new naming scheme. [1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/ Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-30strvec: rename struct fieldsJeff King1-1/+1
The "argc" and "argv" names made sense when the struct was argv_array, but now they're just confusing. Let's rename them to "nr" (which we use for counts elsewhere) and "v" (which is rather terse, but reads well when combined with typical variable names like "args.v"). Note that we have to update all of the callers immediately. Playing tricks with the preprocessor is hard here, because we wouldn't want to rewrite unrelated tokens. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
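The sketch below is a stripped-down strvec-like type that uses the new field names. `struct strvec_sketch` and its functions are hypothetical. They omit details of Git's real strvec, such as how an empty vector is represented. `v` is kept NULL-terminated so it could be passed to `execvp()`-style APIs, and `nr` counts the entries.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical strvec-like type: nr entries in v, with v[nr] == NULL. */
struct strvec_sketch {
	const char **v;
	size_t nr;
	size_t alloc;
};

static char *dup_str(const char *s)
{
	size_t len = strlen(s) + 1;
	char *p = malloc(len);

	if (p)
		memcpy(p, s, len);
	return p;
}

/* Append a copy of s; returns 0 on success, -1 on allocation failure. */
static int strvec_sketch_push(struct strvec_sketch *sv, const char *s)
{
	char *copy;

	if (sv->nr + 2 > sv->alloc) {	/* room for s and the NULL */
		size_t alloc = sv->alloc ? 2 * sv->alloc : 4;
		const char **v = realloc(sv->v, alloc * sizeof(*v));

		if (!v)
			return -1;
		sv->v = v;
		sv->alloc = alloc;
	}
	copy = dup_str(s);
	if (!copy)
		return -1;
	sv->v[sv->nr++] = copy;
	sv->v[sv->nr] = NULL;
	return 0;
}

/* Free all entries and reset the vector to empty. */
static void strvec_sketch_clear(struct strvec_sketch *sv)
{
	for (size_t i = 0; i < sv->nr; i++)
		free((char *)sv->v[i]);
	free(sv->v);
	sv->v = NULL;
	sv->nr = sv->alloc = 0;
}
```

Callers then read as `args.v[i]` for `i < args.nr`, which is the readability the rename aims for.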
2020-07-28strvec: fix indentation in renamed callsJeff King1-14/+14
Code which split an argv_array call across multiple lines, like: argv_array_pushl(&args, "one argument", "another argument", "and more", NULL); was recently mechanically renamed to use strvec, which results in mis-matched indentation like: strvec_pushl(&args, "one argument", "another argument", "and more", NULL); Let's fix these up to align the arguments with the opening paren. I did this manually by sifting through the results of: git jump grep 'strvec_.*,$' and liberally applying my editor's auto-format. Most of the changes are of the form shown above, though I also normalized a few that had originally used a single-tab indentation (rather than our usual style of aligning with the open paren). I also rewrapped a couple of obvious cases (e.g., where previously too-long lines became short enough to fit on one), but I wasn't aggressive about it. In cases broken to three or more lines, the grouping of arguments is sometimes meaningful, and it wasn't worth my time or reviewer time to ponder each case individually. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28strvec: convert remaining callers away from argv_array nameJeff King1-5/+5
We eventually want to drop the argv_array name and just use strvec consistently. There's no particular reason we have to do it all at once, or care about interactions between converted and unconverted bits. Because of our preprocessor compat layer, the names are interchangeable to the compiler (so even a definition and declaration using different names is OK). This patch converts all of the remaining files, as the resulting diff is reasonably sized. The conversion was done purely mechanically with: git ls-files '*.c' '*.h' | xargs perl -i -pe ' s/ARGV_ARRAY/STRVEC/g; s/argv_array/strvec/g; ' We'll deal with any indentation/style fallouts separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28strvec: rename files from argv-array to strvecJeff King1-1/+1
This requires updating #include lines across the code-base, but that's all fairly mechanical, and was done with: git ls-files '*.c' '*.h' | xargs perl -i -pe 's/argv-array.h/strvec.h/' Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-15range-diff: avoid negative string precisionVasil Dimov1-1/+4
If the supplied integer for "precision" is negative in `"%.*s", len, line` then it is ignored. So the current code is equivalent to just `"%s", line` because it is executed only if `len` is negative. Fix this by saving the value of `len` before overwriting it with the return value of `parse_git_diff_header()`. Signed-off-by: Vasil Dimov <vd@FreeBSD.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
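The precision pitfall is easy to reproduce with `snprintf()`. In this sketch, `parse_fails()` is a hypothetical parser that always returns -1, and `report_bad_header()` is a hypothetical error path. The sketch only shows the line length being saved before the variable is reused. The C standard specifies that a negative precision supplied through `*` is treated as if no precision were given.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parser that fails, like a negative parse result. */
static int parse_fails(const char *line)
{
	(void)line;
	return -1;
}

/*
 * Quote only the current line in the error message. If len were passed
 * to "%.*s" after being overwritten with -1, the precision would be
 * ignored and the whole remaining buffer would be printed.
 */
static int report_bad_header(char *out, size_t size, const char *line)
{
	int len = (int)strcspn(line, "\n");	/* length of this line */
	int orig_len = len;	/* the fix: remember it before reuse */

	len = parse_fails(line);	/* overwrites len, here with -1 */
	if (len < 0)
		return snprintf(out, size, "could not parse '%.*s'",
				orig_len, line);
	return 0;
}
```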
2020-04-15range-diff: fix a crash in parsing git-log outputVasil Dimov1-0/+13
`git range-diff` calls `git log` internally and tries to parse its output. But `git log` output can be customized by the user in their git config, and for certain configurations either an error will be returned by `git range-diff` or it will crash. To fix this, explicitly set the output format of the internally executed `git log` with `--pretty=medium`. Because that cancels `--notes`, explicitly add `--notes` at the end. Also, make sure we never crash in the same way - trying to dereference `util` which was never created and has remained NULL. It would happen if the first line of `git log` output does not begin with 'commit '. Alternative considered but discarded - somehow disable all git configs and behave as if no config is present in the internally executed `git log`, but that does not seem to be possible. GIT_CONFIG_NOSYSTEM is the closest to it, but even with that we would still read `.git/config`. Signed-off-by: Vasil Dimov <vd@FreeBSD.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
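The defensive check can be sketched as a small state machine. `struct commit_rec` and `feed_line()` are hypothetical. They refuse to attribute lines to a record that was never created, instead of dereferencing a NULL `util` pointer.

```c
#include <string.h>

/* Hypothetical per-commit record. */
struct commit_rec {
	int nr_lines;
};

/*
 * Feed one line of git-log output. A "commit " line starts a new record;
 * every other line belongs to the current one. If the output does not
 * begin with "commit " (say, because of a custom log format), there is
 * no current record yet, so return -1 instead of dereferencing NULL.
 */
static int feed_line(struct commit_rec **current, struct commit_rec *pool,
		     size_t alloc, size_t *nr, const char *line)
{
	if (!strncmp(line, "commit ", 7)) {
		if (*nr >= alloc)
			return -1;	/* no room for another record */
		*current = &pool[(*nr)++];
		(*current)->nr_lines = 0;
		return 0;
	}
	if (!*current)
		return -1;	/* line before any "commit ": bail out */
	(*current)->nr_lines++;
	return 0;
}
```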
2019-12-06range-diff: mark pointers as constDenton Liu1-3/+3
The contents pointed to by `diffopt` and `other_arg` should not be modified. Mark these as `const` to indicate this. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-21range-diff: pass through --notes to `git log`Denton Liu1-5/+10
When a commit being range-diff'd has a note attached to it, the note will be compared as well. However, if a user has multiple notes refs or if they want to suppress notes from being printed, there is currently no way to do this. Pass through `--[no-]notes[=<ref>]` to the `git log` call so that this option is customizable. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-21range-diff: output `## Notes ##` headerDenton Liu1-0/+6
When notes were included in the output of range-diff, they were just mashed together with the rest of the commit message. As a result, users wouldn't be able to clearly distinguish where the commit message ended and where the notes started. Output a `## Notes ##` header when notes are detected so that notes can be compared more clearly. Note that we handle the case of `Notes (<ref>): -> ## Notes (<ref>) ##` with this code as well. We can't test this in this patch, however, since there is currently no way to pass along different notes refs to `git log`. This will be fixed in a future patch. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
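The sketch below shows the header rewrite. It assumes the note marker arrives as a line that starts with `Notes` and ends with a colon, such as `Notes:` or `Notes (<ref>):`. `notes_header()` is a hypothetical helper, not the code in range-diff.c.

```c
#include <stdio.h>
#include <string.h>

/*
 * If line looks like "Notes:" or "Notes (<ref>):", write
 * "## Notes ##" or "## Notes (<ref>) ##" to out and return 1;
 * otherwise leave out untouched and return 0.
 */
static int notes_header(char *out, size_t size, const char *line)
{
	size_t len = strlen(line);

	if (strncmp(line, "Notes", 5) || len < 6 || line[len - 1] != ':')
		return 0;
	/* drop the trailing colon and wrap the rest in "## ... ##" */
	snprintf(out, size, "## %.*s ##", (int)(len - 1), line);
	return 1;
}
```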
2019-10-15Merge branch 'ew/hashmap'Junio C Hamano1-5/+5
Code clean-up of the hashmap API, both users and implementation. * ew/hashmap: hashmap_entry: remove first member requirement from docs hashmap: remove type arg from hashmap_{get,put,remove}_entry OFFSETOF_VAR macro to simplify hashmap iterators hashmap: introduce hashmap_free_entries hashmap: hashmap_{put,remove} return hashmap_entry * hashmap: use *_entry APIs for iteration hashmap_cmp_fn takes hashmap_entry params hashmap_get{,_from_hash} return "struct hashmap_entry *" hashmap: use *_entry APIs to wrap container_of hashmap_get_next returns "struct hashmap_entry *" introduce container_of macro hashmap_put takes "struct hashmap_entry *" hashmap_remove takes "const struct hashmap_entry *" hashmap_get takes "const struct hashmap_entry *" hashmap_add takes "struct hashmap_entry *" hashmap_get_next takes "const struct hashmap_entry *" hashmap_entry_init takes "struct hashmap_entry *" packfile: use hashmap_entry in delta_base_cache_entry coccicheck: detect hashmap_entry.hash assignment diff: use hashmap_entry_init on moved_entry.ent
2019-10-07hashmap: remove type arg from hashmap_{get,put,remove}_entryEric Wong1-3/+1
Since these macros already take a `keyvar' pointer of a known type, we can rely on OFFSETOF_VAR to get the correct offset without relying on non-portable `__typeof__' and `offsetof'. Argument order is also rearranged, so `keyvar' and `member' are sequential as they are used as: `keyvar->member' Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
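The offset trick can be sketched with plain pointer arithmetic. `OFFSETOF_VAR_SKETCH` and `CONTAINER_OF_VAR` are hypothetical macros modeled on the description. The offset is computed from a variable of the containing type, so no `__typeof__` is needed. In this sketch `var` must point at a valid object.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of member within *ptr, computed from a pointer variable. */
#define OFFSETOF_VAR_SKETCH(ptr, member) \
	((uintptr_t)&(ptr)->member - (uintptr_t)(ptr))

/* Recover the containing struct from a pointer to its embedded member. */
#define CONTAINER_OF_VAR(member_ptr, var, member) \
	((void *)((char *)(member_ptr) - OFFSETOF_VAR_SKETCH(var, member)))

struct entry_sketch {
	unsigned int hash;
};

struct item {
	const char *name;
	struct entry_sketch ent;	/* deliberately not the first member */
};
```

Because the member need not come first, the embedded entry can sit anywhere in the containing struct. The same trick lets a free routine find the containing struct.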
2019-10-07hashmap: introduce hashmap_free_entriesEric Wong1-1/+1
`hashmap_free_entries' behaves like `container_of' and passes the offset of the hashmap_entry struct to the internal `hashmap_free_' function, allowing the function to free any struct pointer regardless of where the hashmap_entry field is located. `hashmap_free' no longer takes any arguments aside from the hashmap itself. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap: hashmap_{put,remove} return hashmap_entry *Eric Wong1-1/+3
And add *_entry variants to perform container_of as necessary to simplify most callers. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap_remove takes "const struct hashmap_entry *"Eric Wong1-1/+1
This is less error-prone than "const void *" as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap_add takes "struct hashmap_entry *"Eric Wong1-1/+1
This is less error-prone than "void *" as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap_entry_init takes "struct hashmap_entry *"Eric Wong1-2/+2
C compilers do type checking to make life easier for us. So rely on that and update all hashmap_entry_init callers to take "struct hashmap_entry *" to avoid future bugs while improving safety and readability. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-03range-diff: internally force `diff.noprefix=true`Johannes Schindelin1-1/+2
When parsing the diffs, `range-diff` expects to see the prefixes `a/` and `b/` in the diff headers. These prefixes can be forced off via the config setting `diff.noprefix=true`. As `range-diff` is not prepared for that situation, this will cause a segmentation fault. Let's avoid that by passing the `--no-prefix` option to the `git log` process that generates the diffs that `range-diff` wants to parse. And of course expect the output to have no prefixes, then. Reported-by: Michal Suchánek <msuchanek@suse.de> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-11range-diff: add headers to the outer hunk headerThomas Gummerer1-3/+6
Add the section headers/hunk headers we introduced in the previous commits to the outer diff's hunk headers. This makes it easier to understand which change we are actually looking at. For example an outer hunk header might now look like: @@ Documentation/config/interactive.txt while previously it would have only been @@ which doesn't give a lot of context for the change that follows. For completeness also add section headers for the commit metadata and the commit message, although they are arguably less important. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-11range-diff: add filename to inner diffThomas Gummerer1-2/+13
In a range-diff it's not always clear which file a certain funcname of the inner diff belongs to, because the diff header (or section header as added in a previous commit) is not always visible in the range-diff. Add the filename to the inner diff's header, so it's always visible to users. This also allows us to add the filename + the funcname to the outer diff's hunk headers using a custom userdiff pattern, which will be done in the next commit. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-11range-diff: add section header instead of diff headerThomas Gummerer1-7/+27
Currently range-diff keeps the diff header of the inner diff intact (apart from stripping lines starting with index). This diff header is somewhat useful, especially when files get different names in different ranges. However, there is no real need to keep the whole diff header for that. The main reason we currently do that is probably because it is easy to do. Introduce a new range diff hunk header, which is enclosed by "##", similar to how line numbers in diff hunks are enclosed by "@@", and which gives human-readable information about what exactly happened to the file, including the file name. This improves the readability of the range-diff by giving more concise information to the users. For example, if a file was renamed in one iteration but not in another, the diff of the headers would be quite noisy. However, the diff of a single line is concise and should be easier to understand. Additionally, this allows us to add these range diff section headers to the outer diff's hunk headers using a custom userdiff pattern, which should help make the range-diff more readable. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
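The sketch below formats a section header from a simplified file-change record. `struct file_change` and `section_header()` are hypothetical. The labels `(new)`, `(deleted)` and `=>` are assumptions for illustration, not a specification of range-diff's output.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record: NULL old_name means added, NULL new_name deleted. */
struct file_change {
	const char *old_name;
	const char *new_name;
};

/* Write a one-line "## ... ##" summary of the change into out. */
static int section_header(char *out, size_t size, const struct file_change *c)
{
	if (!c->old_name)
		return snprintf(out, size, "## %s (new) ##", c->new_name);
	if (!c->new_name)
		return snprintf(out, size, "## %s (deleted) ##", c->old_name);
	if (strcmp(c->old_name, c->new_name))
		return snprintf(out, size, "## %s => %s ##",
				c->old_name, c->new_name);
	return snprintf(out, size, "## %s ##", c->new_name);
}
```

A rename that happens in only one iteration then shows up in the outer diff as a change to a single header line, not to several lines of the inner diff header.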
2019-07-11range-diff: suppress line count in outer diffThomas Gummerer1-0/+1
The line count in the outer diff's hunk headers of a range diff is not all that interesting. It merely shows how far along we are in the inner diff on both sides. That number is of no use for human readers, and range-diffs are not meant to be machine readable. In a subsequent commit we're going to add some more contextual information such as the filename corresponding to the diff to the hunk headers. Remove the unnecessary information, and just keep the "@@" to indicate that a new hunk of the outer diff is starting. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-11range-diff: don't remove funcname from inner diffThomas Gummerer1-3/+4
When postprocessing the inner diff in range-diff, we currently replace the whole hunk header line with just "@@". This matches how 'git tbdiff' used to handle hunk headers as well. Most likely this is being done because line numbers in the hunk header are not relevant without other changes. They can for example easily change if a range is rebased, and lines are added/removed before a change that we actually care about in our ranges. However it can still be useful to have the function name that 'git diff' extracts as additional context for the change. Note that it is not guaranteed that the hunk header actually shows up in the range-diff, and this change only aims to improve the case where a hunk header would already be included in the final output. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
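Keeping the function name while dropping the line numbers takes a small string rewrite. `strip_hunk_numbers()` is a hypothetical helper that assumes a well-formed `@@ -a,b +c,d @@ funcname` header. Lines that do not look like a hunk header are copied unchanged.

```c
#include <stdio.h>
#include <string.h>

/*
 * Rewrite "@@ -a,b +c,d @@ funcname" to "@@ funcname", or to just "@@"
 * when there is no function name. Other lines are copied as-is.
 */
static void strip_hunk_numbers(char *out, size_t size, const char *line)
{
	const char *p = NULL;

	if (!strncmp(line, "@@ ", 3))
		p = strstr(line + 3, "@@");	/* find the closing "@@" */
	if (!p) {
		snprintf(out, size, "%s", line);	/* not a hunk header */
		return;
	}
	p += 2;	/* p is now "" or " funcname" */
	snprintf(out, size, "@@%s", p);
}
```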
2019-07-11range-diff: split lines manuallyThomas Gummerer1-26/+42
Currently range-diff uses the 'strbuf_getline()' function for doing its line by line processing. In a future patch we want to do parts of that parsing using the 'parse_git_diff_header()' function. That function does its own line by line reading of the input, and doesn't use strbufs. This doesn't match with how we do the line-by-line processing in range-diff currently. Switch range-diff to do our own line by line parsing, so we can re-use the 'parse_git_diff_header()' function later. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-11range-diff: fix function parameter indentationThomas Gummerer1-2/+2
Fix the indentation of the function parameters for a couple of functions, to match the style in the rest of the file. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-30format-patch: do not let its diff-options affect --range-diffJunio C Hamano1-1/+5
Stop leaking how the primary output of format-patch is customized to the range-diff machinery and instead let the latter use its own "reasonable default", in order to correct the breakage introduced by a5170794 ("Merge branch 'ab/range-diff-no-patch'", 2018-11-18) on the 'master' front. "git format-patch --range-diff..." without any weird diff option started to include the "range-diff --stat" output, which is rather useless right now and made the whole thing unusable; this is probably the least disruptive way to whip the codebase into a shippable shape. We may want to later make the range-diff driven by format-patch more configurable, but that would have to wait until we have a good design. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-18Merge branch 'ab/range-diff-no-patch'Junio C Hamano1-1/+2
The "--no-patch" option, which can be used to get a high-level overview without the actual line-by-line patch difference shown, of the "range-diff" command was earlier broken, which has been corrected. * ab/range-diff-no-patch: range-diff: make diff option behavior (e.g. --stat) consistent range-diff: fix regression in passing along diff options range-diff doc: add a section about output stability
2018-11-14range-diff: make diff option behavior (e.g. --stat) consistentÆvar Arnfjörð Bjarmason1-1/+2
Make the behavior when diff options (e.g. "--stat") are passed consistent with how "diff" behaves. Before 73a834e9e2 ("range-diff: relieve callers of low-level configuration burden", 2018-07-22) running range-diff with "--stat" would produce stat output and the diff output, as opposed to how "diff" behaves where once "--stat" is specified "--patch" also needs to be provided to emit the patch output. As noted in a previous change ("range-diff doc: add a section about output stability", 2018-11-07) the "--stat" output with "range-diff" is useless at the moment. But we should behave consistently with "diff" in anticipation of such output being useful in the future, because it would make for confusing UI if "diff" and "range-diff" behaved differently when it came to how they interpret diff options. The new behavior is also consistent with the existing documentation added in ba931edd28 ("range-diff: populate the man page", 2018-08-13). See "[...]also accepts the regular diff options[...]" in git-range-diff(1). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-13Merge branch 'jk/xdiff-interface'Junio C Hamano1-1/+9
The interface into "xdiff" library used to discover the offset and size of a generated patch hunk by first formatting it into the textual hunk header "@@ -n,m +k,l @@" and then parsing the numbers out. A new interface has been introduced to allow callers a more direct access to them. * jk/xdiff-interface: xdiff-interface: drop parse_hunk_header() range-diff: use a hunk callback diff: convert --check to use a hunk callback combine-diff: use an xdiff hunk callback diff: use hunk callback for word-diff diff: discard hunk headers for patch-ids earlier diff: avoid generating unused hunk header lines xdiff-interface: provide a separate consume callback for hunks xdiff: provide a separate emit callback for hunks
2018-11-12range-diff: fix regression in passing along diff optionsÆvar Arnfjörð Bjarmason1-1/+1
In 73a834e9e2 ("range-diff: relieve callers of low-level configuration burden", 2018-07-22) we broke passing down options like --no-patch, --stat etc. Fix that regression, and add a test asserting the pre-73a834e9e2 behavior for some of these diff options. As noted in a change leading up to this ("range-diff doc: add a section about output stability", 2018-11-07) the output is not meant to be stable. So this regression test will likely need to be tweaked once we get a "proper" --stat option. See https://public-inbox.org/git/nycvar.QRO.7.76.6.1811071202480.39@tvgsbejvaqbjf.bet/ for a further explanation of the regression. The fix here is not the same as in Johannes's on-list patch, for reasons that'll be explained in a follow-up commit. The quoting of "EOF" here mirrors that of an earlier test. Perhaps that should be fixed, but let's leave that up to a later cleanup change. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-05range-diff: use a hunk callbackJeff King1-1/+9
When we count the lines in a diff, we don't actually care about the contents of each line. By using a hunk callback, we tell xdiff that it does not need to even bother generating a hunk header line, saving a small amount of work. Arguably we could even ignore the hunk headers completely, since we're just computing a cost function between patches. But doing it this way maintains the exact same behavior before and after. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-02xdiff-interface: provide a separate consume callback for hunksJeff King1-1/+1
The previous commit taught xdiff to optionally provide the hunk header data to a specialized callback. But most users of xdiff actually use our more convenient xdi_diff_outf() helper, which ensures that our callbacks are always fed whole lines. Let's plumb the special hunk-callback through this interface, too. It will follow the same rule as xdiff when the hunk callback is NULL (i.e., continue to pass a stringified hunk header to the line callback). Since we add NULL to each caller, there should be no behavior change yet. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
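The NULL-fallback rule described above can be sketched as a small dispatcher. If a hunk callback is registered, it receives the numbers and no header string is ever built. Otherwise the header is stringified and passed to the line callback as before. The callback typedefs, emit_hunk() and the counting callbacks are hypothetical and only loosely modelled on this description; they are not xdiff's real signatures. The counter also mirrors the range-diff use case, where only lines are counted and hunk headers are irrelevant.

```c
#include <stdio.h>

/* Hypothetical callback types; not xdiff's actual signatures. */
typedef void (*line_fn)(void *data, const char *line);
typedef void (*hunk_fn)(void *data, long old_begin, long old_nr,
			long new_begin, long new_nr);

static void emit_hunk(void *data, line_fn line_cb, hunk_fn hunk_cb,
		      long ob, long on, long nb, long nn)
{
	if (hunk_cb) {
		/* Structured data: no header string is generated. */
		hunk_cb(data, ob, on, nb, nn);
	} else {
		/* Fallback: stringify the header and feed it as a line. */
		char buf[128];

		snprintf(buf, sizeof(buf), "@@ -%ld,%ld +%ld,%ld @@\n",
			 ob, on, nb, nn);
		line_cb(data, buf);
	}
}

/* Example consumer: counts what it receives. */
struct counter {
	int lines;
	int hunks;
	long last_new_nr;
};

static void count_line(void *data, const char *line)
{
	struct counter *c = data;

	(void)line;
	c->lines++;
}

static void count_hunk(void *data, long ob, long on, long nb, long nn)
{
	struct counter *c = data;

	(void)ob; (void)on; (void)nb;
	c->hunks++;
	c->last_new_nr = nn;
}
```

Passing NULL as hunk_cb keeps the old behaviour, which is why adding NULL at every call site causes no behaviour change.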
2018-10-25range-diff: allow to diff files regardless of submodule configLucas De Marchi1-1/+1
If we have `submodule.diff = log' in the configuration file or `--submodule=log' is given as an argument, range-diff fails to compare both diffs and we only get the following output: Submodule a 0000000...0000000 (new submodule) This happens even if the repository doesn't have any submodule. That's because the mode in diff_filespec is not correct and when flushing the diff, down in builtin_diff() we will enter the condition: if (o->submodule_format == DIFF_SUBMODULE_LOG && (!one->mode || S_ISGITLINK(one->mode)) && (!two->mode || S_ISGITLINK(two->mode))) { show_submodule_summary(o, one->path ? one->path : two->path, &one->oid, &two->oid, two->dirty_submodule); return; It turns out that S_ISGITLINK will return true (mode == 0160000 here). The same thing happens if submodule.diff is "diff". Do as grep.c does when calling fill_filespec() and force it to be recognized as a file by adding S_IFREG to the mode. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
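The message does not say how a plain permission mode ends up as 0160000. The following minimal sketch shows one plausible mechanism, which is an assumption here: a canonicalization step that maps any mode without recognised file-type bits to the gitlink type. Under that assumption, a bare 0644 is classified as a submodule, and 0644 | S_IFREG stays a regular file. The constants are redefined locally with MODE_ prefixes so the sketch does not depend on the platform's <sys/stat.h>. canonicalize_mode() is hypothetical.

```c
/* Octal file-type constants, defined locally for portability. */
#define MODE_IFMT      0170000
#define MODE_IFREG     0100000
#define MODE_IFLNK     0120000
#define MODE_IFDIR     0040000
#define MODE_IFGITLINK 0160000

#define MODE_ISGITLINK(m) (((m) & MODE_IFMT) == MODE_IFGITLINK)

/* Hypothetical canonicalization: unknown file types become gitlinks. */
static unsigned int canonicalize_mode(unsigned int mode)
{
	switch (mode & MODE_IFMT) {
	case MODE_IFREG:
		/* Keep only the executable-or-not distinction. */
		return MODE_IFREG | ((mode & 0100) ? 0755 : 0644);
	case MODE_IFLNK:
		return MODE_IFLNK;
	case MODE_IFDIR:
		return MODE_IFDIR;
	default:
		/* 0644 has no type bits at all, so it lands here. */
		return MODE_IFGITLINK;
	}
}
```

With this model, canonicalize_mode(0644) yields 0160000, so a check like S_ISGITLINK succeeds. canonicalize_mode(MODE_IFREG | 0644) yields 0100644, which is why adding S_IFREG avoids the submodule code path.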
2018-09-17Merge branch 'es/format-patch-rangediff'Junio C Hamano1-3/+23
"git format-patch" learned a new "--range-diff" option to explain the difference between this version and the previous attempt in the cover letter (or after the three-dash line as a comment). * es/format-patch-rangediff: format-patch: allow --range-diff to apply to a lone-patch format-patch: add --creation-factor tweak for --range-diff format-patch: teach --range-diff to respect -v/--reroll-count format-patch: extend --range-diff to accept revision range format-patch: add --range-diff option to embed diff in cover letter range-diff: relieve callers of low-level configuration burden range-diff: publish default creation factor range-diff: respect diff_option.file rather than assuming 'stdout'
2018-08-20range-diff: indent special lines as contextStefan Beller1-0/+2
The range-diff coloring is a bit fuzzy when it comes to special lines of a diff, such as indicating new and old files with +++ and ---, as it would pick up the first character and interpret it for its coloring, which is annoying because in regular diffs these lines are colored bold via DIFF_METAINFO. By indenting these lines by a single space, they will be treated as context, which is much more useful; an example [1] on the range-diff series itself: [...] + diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt + new file mode 100644 + --- /dev/null + +++ b/Documentation/git-range-diff.txt +@@ ++git-range-diff(1) [...] + diff --git a/Makefile b/Makefile --- a/Makefile +++ b/Makefile [...] The first lines that introduce the new file for the man page will have the '+' sign colored and the rest of the line will be bold. The later lines that indicate a change to the Makefile will be treated as context both in the outer and inner diff, such that those lines stay regular color. [1] ./git-range-diff pr-1/dscho/branch-diff-v3...pr-1/dscho/branch-diff-v4 These tags are found at https://github.com/gitgitgadget/git Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20range-diff: make use of different output indicatorsStefan Beller1-1/+19
This change itself only changes the internal communication and should have no visible effect on the user. We instruct the diff code that produces the inner diffs to use other markers instead of the usual markers for new, old and context lines. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
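One way to picture "use other markers" is a per-options table of indicator characters that the line emitter consults instead of hard-coding '+', '-' and ' '. The minimal sketch below illustrates that idea. The enum, struct emit_opts and format_line() are hypothetical, and the alternative markers '>', '<' and '#' are an assumption chosen for illustration; the commit message does not say which characters are used.

```c
#include <stdio.h>

/* Hypothetical indicator slots for the three kinds of diff lines. */
enum indicator { IND_NEW, IND_OLD, IND_CONTEXT, IND_MAX };

struct emit_opts {
	char indicators[IND_MAX];
};

/* Usual unified-diff markers. */
static const struct emit_opts default_opts = { { '+', '-', ' ' } };

/* Format one diff line using whatever marker the options specify. */
static int format_line(char *buf, size_t len, const struct emit_opts *o,
		       enum indicator kind, const char *text)
{
	return snprintf(buf, len, "%c%s", o->indicators[kind], text);
}
```

An inner diff configured with { '>', '<', '#' } can then be told apart from the outer diff's '+' and '-' markers when both are processed by the same coloring code.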
2018-08-14range-diff: relieve callers of low-level configuration burdenEric Sunshine1-2/+22
There are a number of very low-level configuration details which need to be managed precisely to generate a proper range-diff. In particular, 'diff_options' output format, header suppression, indentation, and dual-color mode must all be set appropriately to ensure proper behavior. Handle these details locally in the libified range-diff back-end rather than forcing each caller to have specialized knowledge of these implementation details, and to avoid duplication as new callers are added. While at it, localize these tweaks to be active only while generating the range-diff, so they don't clobber the caller-provided 'diff_options', which might be used beyond range-diff generation. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
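"Localize these tweaks" can be sketched as working on a private copy of the caller's options. The range-diff code applies its own output format, header suppression, indentation and dual-color settings to the copy and never writes to the caller's struct. struct opts, FMT_PATCH and localize_opts() are hypothetical stand-ins and are far smaller than Git's real 'diff_options'.

```c
/* Hypothetical, pared-down stand-in for 'diff_options'. */
struct opts {
	int output_format;
	int suppress_headers;
	const char *line_prefix;
	int dual_color;
};

#define FMT_PATCH 1

/* Return tweaked options for range-diff; the caller's copy is untouched. */
static struct opts localize_opts(const struct opts *caller)
{
	/* Shallow copy: pointer members are shared, not duplicated. */
	struct opts local = *caller;

	local.output_format = FMT_PATCH;
	local.suppress_headers = 1;
	local.line_prefix = "    ";
	local.dual_color = 1;
	return local;
}
```

A plain struct copy only works here because the sketch never modifies anything a pointer member refers to. A real options struct with owned buffers would need a deliberate copy and cleanup strategy.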
2018-08-14range-diff: respect diff_option.file rather than assuming 'stdout'Eric Sunshine1-1/+1
The actual diffs output by range-diff respect diff_option.file, which range-diff passes down the call-chain, thus are destination-agnostic. However, output_pair_header() is hard-coded to emit to 'stdout'. Fix this by making output_pair_header() respect diff_option.file, as well. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
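The fix amounts to writing to the stream stored in the options rather than to stdout. In the minimal sketch below, struct out_opts and output_pair_header_sketch() are hypothetical and the header format is made up for illustration. Any FILE *, such as a temporary file or a cover-letter stream, becomes a valid destination.

```c
#include <stdio.h>

/* Hypothetical options carrying the output stream. */
struct out_opts {
	FILE *file;
};

/* Emit a pair header to the caller-chosen stream, never to stdout. */
static void output_pair_header_sketch(const struct out_opts *opt,
				      int a_nr, int b_nr, const char *subject)
{
	fprintf(opt->file, "%d <-> %d %s\n", a_nr, b_nr, subject);
}
```

Because every line now goes through opt->file, the headers and the diffs end up in the same place, which is what a caller like format-patch needs.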
2018-08-13range-diff: left-pad patch numbersJohannes Schindelin1-7/+9
As pointed out by Elijah Newren, tbdiff has this neat little alignment trick where it outputs the commit pairs with patch numbers that are padded to the maximal patch number's width: 1: cafedead = 1: acefade first patch [...] 314: beefeada < 314: facecab up to PI! Let's do the same in range-diff, too. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
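The alignment trick only requires knowing how many digits the largest patch number has and passing that as a printf field width. The minimal sketch below uses hypothetical helper names. The "%*d" width specifier is standard C.

```c
#include <stdio.h>

/* Number of decimal digits needed to print n (n >= 0). */
static int decimal_width(int n)
{
	int w = 1;

	while (n >= 10) {
		n /= 10;
		w++;
	}
	return w;
}

/* Right-align patch number 'nr' to the width of 'max_nr'. */
static int format_patch_nr(char *buf, size_t len, int nr, int max_nr)
{
	return snprintf(buf, len, "%*d:", decimal_width(max_nr), nr);
}
```

With 314 patches, patch 1 is printed as "  1:", so it lines up with "314:".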
2018-08-13range-diff: use color for the commit pairsJohannes Schindelin1-13/+38
Arguably the most important part of `git range-diff`'s output is the list of commits in the two branches, together with their relationships. For that reason, tbdiff introduced color-coding that is pretty intuitive, especially for unchanged patches (all dim yellow, like the first line in `git show`'s output) vs modified patches (old commit is red, new commit is green). Let's imitate that color scheme. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13range-diff: do not show "function names" in hunk headersJohannes Schindelin1-0/+6
We are comparing complete, formatted commit messages with patches. There are no function names here, so stop looking for them. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13range-diff: adjust the output of the commit pairsJohannes Schindelin1-9/+50
This not only uses "dashed stand-ins" for "pairs" where one side is missing (i.e. unmatched commits that are present only in one of the two commit ranges), but also adds onelines for the reader's pleasure. This change brings `git range-diff` yet another step closer to feature parity with tbdiff: it now shows the oneline, too, and indicates with `=` when the commits have identical diffs. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
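The two ideas in this change can be sketched as two small helpers. One picks the relation character for a pair, and the other prints either "<nr>: <abbrev>" or a dashed stand-in of the same width for a missing side. Only `=` is named in the message. The use of '!' for a modified pair and '<' / '>' for commits present in only one range is an assumption made for this illustration. pair_status() and format_side() are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical relation marker for a commit pair. */
static char pair_status(int has_left, int has_right, int same_diff)
{
	if (!has_right)
		return '<';  /* only in the first range (assumed marker) */
	if (!has_left)
		return '>';  /* only in the second range (assumed marker) */
	return same_diff ? '=' : '!';
}

/* "<nr>: <abbrev>" or, when abbrev is NULL, a dashed stand-in. */
static void format_side(char *buf, size_t len, int nr,
			const char *abbrev, int abbrev_len)
{
	if (abbrev_len < 0)
		abbrev_len = 0;
	if (abbrev_len > 40)
		abbrev_len = 40;
	if (abbrev) {
		snprintf(buf, len, "%d: %.*s", nr, abbrev_len, abbrev);
	} else {
		char dashes[41];

		memset(dashes, '-', (size_t)abbrev_len);
		dashes[abbrev_len] = '\0';
		snprintf(buf, len, "-: %s", dashes);
	}
}
```

Sizing the dashes to the abbreviation length keeps the columns aligned whether or not a side is present.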
2018-08-13range-diff: right-trim commit messagesJohannes Schindelin1-0/+1
When comparing commit messages, we need to keep in mind that they are indented by four spaces. That is, empty lines are no longer empty, but have "trailing whitespace". When displaying them in color, that results in those nagging red lines. Let's just right-trim the lines in the commit message; it's not like trailing white-space in the commit messages is important enough to care about in `git range-diff`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
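Right-trimming is a short loop. The minimal sketch below trims in place and returns the new length. An indented empty line such as "    " becomes truly empty, so nothing is left to be flagged as trailing whitespace. rtrim() is a hypothetical name, and the sketch makes no claim about which helper the commit actually uses.

```c
#include <ctype.h>
#include <string.h>

/* Strip trailing whitespace in place; return the new length. */
static size_t rtrim(char *s)
{
	size_t len = strlen(s);

	/* Cast to unsigned char: isspace() is undefined for negative chars. */
	while (len > 0 && isspace((unsigned char)s[len - 1]))
		len--;
	s[len] = '\0';
	return len;
}
```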
2018-08-13range-diff: also show the diff between patchesJohannes Schindelin1-3/+31
Just like tbdiff, we now show the diff between matching patches. This is a "diff of two diffs", so it can be a bit daunting to read for the beginner. An alternative would be to display an interdiff, i.e. the hypothetical diff which is the result of first reverting the old diff and then applying the new diff. Especially when rebasing frequently, an interdiff is often not feasible, though: if the old diff cannot be applied in reverse (due to a moving upstream), an interdiff simply cannot be inferred. This commit brings `range-diff` closer to feature parity with regard to tbdiff. To make `git range-diff` respect e.g. color.diff.* settings, we have to adjust git_branch_config() accordingly. Note: while we now parse diff options such as --color, the effect is not yet the same as in tbdiff, where the commit pairs would also be colored. This is left for a later commit. Note also: while tbdiff accepts the `--no-patches` option to suppress these diffs between patches, we prefer the `-s` (or `--no-patch`) option that is automatically supported via our use of diff_opt_parse(). And finally note: to support diff options, we have to call `parse_options()` such that it keeps unknown options, and then loop over those and let `diff_opt_parse()` handle them. After that loop, we have to call `parse_options()` again, to make sure that no unknown options are left. Helped-by: Thomas Gummerer <t.gummerer@gmail.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13range-diff: improve the order of the shown commitsJohannes Schindelin1-19/+40
This patch lets `git range-diff` use the same order as tbdiff. The idea is simple: for left-to-right readers, it is natural to assume that the `git range-diff` is performed between an older and a newer version of the branch. As such, the user is probably more interested in the question "where did this come from?" rather than "where did that one go?". To that end, we list the commits in the order of the second commit range ("the newer version"), inserting the unmatched commits of the first commit range as soon as all their predecessors have been shown. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
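The ordering rule can be sketched as a merge-like walk over both ranges. Pairs are emitted in second-range order. A first-range commit with no match is emitted as soon as every earlier first-range commit has been shown, either on its own or as part of a pair. The inputs a_match[i] and b_match[j] are hypothetical: each holds the index of the matching commit in the other range, or -1 if there is no match, and the two arrays are assumed to describe the same one-to-one matching. order_pairs() and struct pair are illustrative names only.

```c
#include <stdlib.h>

struct pair { int left, right; };  /* -1 marks a missing side */

/* Fill 'out' (room for a_nr + b_nr entries) in display order.
 * Assumes a_match and b_match describe a consistent matching. */
static int order_pairs(const int *a_match, int a_nr,
		       const int *b_match, int b_nr, struct pair *out)
{
	int i = 0, j = 0, n = 0;
	char *shown = calloc(a_nr > 0 ? a_nr : 1, 1);

	if (!shown)
		return -1;
	while (i < a_nr || j < b_nr) {
		/* Skip first-range commits already shown in a pair. */
		if (i < a_nr && shown[i]) {
			i++;
			continue;
		}
		/* Unmatched first-range commit: its predecessors are shown. */
		if (i < a_nr && a_match[i] < 0) {
			out[n++] = (struct pair){ i, -1 };
			shown[i] = 1;
			i++;
			continue;
		}
		/* Unmatched second-range commits, in their own order. */
		while (j < b_nr && b_match[j] < 0) {
			out[n++] = (struct pair){ -1, j };
			j++;
		}
		/* Next matched pair, in second-range order. */
		if (j < b_nr) {
			out[n++] = (struct pair){ b_match[j], j };
			shown[b_match[j]] = 1;
			j++;
		}
	}
	free(shown);
	return n;
}
```

For first-range commits A0, A1, A2 and second-range commits B0, B1, where A0 matches B0, A2 matches B1 and A1 is unmatched, the walk yields A0/B0, then A1 alone, then A2/B1.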
2018-08-13range-diff: first rudimentary implementationJohannes Schindelin1-0/+311
At this stage, `git range-diff` can determine corresponding commits of two related commit ranges. This makes use of the recently introduced implementation of the linear assignment algorithm. The core of this patch is a straight port of the ideas of tbdiff, the apparently dormant project at https://github.com/trast/tbdiff. The output does not at all match `tbdiff`'s output yet, as this patch really concentrates on getting the patch matching part right. Note: due to differences in the diff algorithm (`tbdiff` uses the Python module `difflib`, Git uses its xdiff fork), the cost matrix calculated by `range-diff` differs from (but is very similar to) the one calculated by `tbdiff`. Therefore, it is possible that they find different matching commits in corner cases (e.g. when a patch was split into two patches of roughly equal length). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
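The matching step can be illustrated on a toy scale. The minimal sketch below rests on three assumptions that go beyond the message. First, the square cost matrix has one row per old commit plus one dummy row per new commit, and one column per new commit plus one dummy column per old commit. Second, pairing two real commits costs the size of the diff between their patches, and leaving a commit unmatched costs its own diff size scaled by a "creation factor" percentage. Third, instead of the linear assignment algorithm Git uses, the sketch swaps in an exhaustive search that is only viable for a handful of commits. match_commits(), search() and MAX_N are hypothetical.

```c
#include <limits.h>

#define MAX_N 8  /* brute force: tiny inputs only */

/* Try every free column for 'row'; remember the cheapest full assignment. */
static void search(int n, const int *cost, int row, int used_mask,
		   int so_far, int *cur, int *best, int *best_cost)
{
	if (so_far >= *best_cost)
		return;  /* prune: cannot beat the best found so far */
	if (row == n) {
		*best_cost = so_far;
		for (int r = 0; r < n; r++)
			best[r] = cur[r];
		return;
	}
	for (int col = 0; col < n; col++) {
		if (used_mask & (1 << col))
			continue;
		cur[row] = col;
		search(n, cost, row + 1, used_mask | (1 << col),
		       so_far + cost[row * n + col], cur, best, best_cost);
	}
}

/* Build the (a_nr + b_nr)^2 cost matrix, solve it, and fill a2b
 * (-1 = unmatched). Returns the total cost, or -1 if too large. */
static int match_commits(const int *diffsize_a, int a_nr,
			 const int *diffsize_b, int b_nr,
			 const int *pair_cost,   /* a_nr x b_nr, row-major */
			 int creation_factor,    /* percent */
			 int *a2b)
{
	int n = a_nr + b_nr;
	int cost[MAX_N * MAX_N], cur[MAX_N], best[MAX_N];
	int best_cost = INT_MAX;

	if (n > MAX_N)
		return -1;
	for (int i = 0; i < n; i++)
		for (int j = 0; j < n; j++) {
			int c;

			if (i < a_nr && j < b_nr)
				c = pair_cost[i * b_nr + j];  /* real pair */
			else if (i < a_nr)
				c = diffsize_a[i] * creation_factor / 100;  /* a[i] unmatched */
			else if (j < b_nr)
				c = diffsize_b[j] * creation_factor / 100;  /* b[j] unmatched */
			else
				c = 0;  /* dummy row meets dummy column */
			cost[i * n + j] = c;
		}
	search(n, cost, 0, 0, 0, cur, best, &best_cost);
	for (int i = 0; i < a_nr; i++)
		a2b[i] = best[i] < b_nr ? best[i] : -1;
	return best_cost;
}
```

When two patches are similar, pairing them is cheaper than marking both as unmatched. When they differ a lot, the creation-factor cost wins and both sides are reported as unmatched. This trade-off explains why small differences between two diff algorithms can change the result in borderline cases such as a patch split into two halves.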