aboutsummaryrefslogtreecommitdiffstats
path: root/merge-ort.c
AgeCommit message (Collapse)AuthorFilesLines
2025-11-26Merge branch 'ad/blame-diff-algorithm'Junio C Hamano1-1/+0
"git blame" learns "--diff-algorithm=<algo>" option. * ad/blame-diff-algorithm: blame: make diff algorithm configurable xdiff: add 'minimal' to XDF_DIFF_ALGORITHM_MASK
2025-11-17merge-ort: fix failing merges in special corner caseElijah Newren1-1/+28
At GitHub, we had a repository that was triggering git: merge-ort.c:3032: process_renames: Assertion `newinfo && !newinfo->merged.clean` failed. during git replay. This sounds similar to the somewhat recent f6ecb603ff8a (merge-ort: fix directory rename on top of source of other rename/delete, 2025-08-06), but the cause is different. Unlike that case, there are no rename-to-self situations arising in this case, and new to this case it can only be triggered during a replay operation on the 2nd or later commit being replayed, never on the first merge in the sequence. To trigger, the repository needs: * an upstream which: * renames a file to a different directory, e.g. old/file -> new/file * leaves other files remaining in the original directory (so that e.g. "old/" still exists upstream even though file has been removed from it and placed elsewhere) * a topic branch being rebased where: * a commit in the sequence: * modifies old/file * a subsequent commit in the sequence being replayed: * does NOT touch *anything* under new/ * does NOT touch old/file * DOES modify other paths under old/ * does NOT have any relevant renames that we need to detect _anywhere_ elsewhere in the tree (meaning this interacts interestingly with both directory renames and cached renames) In such a case, the assertion will trigger. The fix turns out to be surprisingly simple. I have a very vague recollection that I actually considered whether to add such an if-check years ago when I added the very similar one for oldinfo in 1b6b902d95a5 (merge-ort: process_renames() now needs more defensiveness, 2021-01-19), but I think I couldn't figure out a possible way to trigger it and was worried at the time that if I didn't know how to trigger it then I wasn't so sure that simply skipping it was correct. Waiting did give me a chance to put more thorough tests and checks into place for the rename-to-self cases a few months back, which I might not have found as easily otherwise. Anyway, put the check in place now and add a test that demonstrates the fix. Note that this bug, as demonstrated by the conditions listed above, runs at the intersection of relevant renames, trivial directory resolutions, and cached renames. All three of those optimizations are ones that unfortunately make the code (and testcases!) a bit more complex, and threading all three makes it a bit more so. However, the testcase isn't crazy enough that I'd expect no one to ever hit it in practice, and was confused why we didn't see it before. After some digging, I discovered that merge.directoryRenames=false is a workaround to this bug, and GitHub used that setting until recently (it was a "temporary" match-what-libgit2-does piece of code that lasted years longer than intended). Since the conditions I gave above for triggering this bug rule out the possibility of there being directory renames, one might assume that it shouldn't matter whether you try to detect such renames if there aren't any. However, due to commit a16e8efe5c2b (merge-ort: fix merge.directoryRenames=false, 2025-03-13), the heavy hammer used there means that merge.directoryRenames=false ALSO turns off rename caching, which is critical to triggering the bug. This becomes a bit more than an aside since... Re-reading that old commit, a16e8efe5c2b (merge-ort: fix merge.directoryRenames=false, 2025-03-13), it appears that the solution to this latest bug might have been at least a partial alternative solution to that old commit. And it may have been an improved alternative (or at least help implement one), since it may be able to avoid the heavy-handed disabling of rename cache. That might be an interesting future thing to investigate, but is not critical for the current fix. However, since I spent time digging it all up, at least leave a small comment tweak breadcrumb to help some future reader (myself or others) who wants to dig further to connect the dots a little quicker. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-17merge-ort: remove debugging crudElijah Newren1-1/+1
While developing commit a16e8efe5c2b (merge-ort: fix merge.directoryRenames=false, 2025-03-13), I was testing things out and had an extra condition on one of the if-blocks that I occasionally swapped between '&& 0' and '&& 1' to see the effects of the changes. I forgot to remove it before submitting and it wasn't caught in review. Remove it now. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-17xdiff: add 'minimal' to XDF_DIFF_ALGORITHM_MASKAntonin Delpeuch1-1/+0
The XDF_DIFF_ALGORITHM_MASK bit mask only includes bits for the patience and histogram diffs, not for the minimal one. This means that when reseting the diff algorithm to the default one, one needs to separately clear the bit for the minimal diff. There are places in the code that fail to do that: merge-ort.c and builtin/merge-file.c. Add the XDF_NEED_MINIMAL bit to the bit mask, and remove the separate clearing of this bit in the places where it hasn't been forgotten. Signed-off-by: Antonin Delpeuch <antonin@delpeuch.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-15Merge branch 'en/ort-rename-fixes' into maint-2.51Junio C Hamano1-15/+40
Various bugs about rename handling in "ort" merge strategy have been fixed. * en/ort-rename-fixes: merge-ort: fix directory rename on top of source of other rename/delete merge-ort: fix incorrect file handling merge-ort: clarify the interning of strings in opt->priv->path t6423: fix missed staging of file in testcases 12i,12j,12k t6423: document two bugs with rename-to-self testcases merge-ort: drop unnecessary temporary in check_for_directory_rename() merge-ort: update comments to modern testfile location
2025-08-21Merge branch 'en/ort-rename-fixes'Junio C Hamano1-15/+40
Various bugs about rename handling in "ort" merge strategy have been fixed. * en/ort-rename-fixes: merge-ort: fix directory rename on top of source of other rename/delete merge-ort: fix incorrect file handling merge-ort: clarify the interning of strings in opt->priv->path t6423: fix missed staging of file in testcases 12i,12j,12k t6423: document two bugs with rename-to-self testcases merge-ort: drop unnecessary temporary in check_for_directory_rename() merge-ort: update comments to modern testfile location
2025-08-07merge-ort: fix directory rename on top of source of other rename/deleteElijah Newren1-5/+13
At GitHub, we've got a real-world repository that has been triggering failures of the form: git: merge-ort.c:3007: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed. which comes from the line: VERIFY_CI(newinfo); Unfortunately, this one has been quite complex to unravel, and is a bit complex to explain. So, I'm going to carefully try to explain each relevant piece needed to understand the fix, then carefully build up from a simple testcase to some of the relevant testcases. == New special case we need to consider == Rename pairs in the diffcore machinery connect the source path of a rename with the destination path of a rename. Since we have rename pairs to consider on both sides of history since the merge base, merging has to consider a few special cases of possible overlap: A) two rename pairs having the same target path B) two rename pairs having the same source path C) the source path of one rename pair being the target path of a different rename pair Some of these came up often enough that we gave them names: A) a rename/rename(2to1) conflict (looks similar to an add/add conflict) B) a rename/rename(1to2) conflict, which represents the same path being renamed differently on the two sides of history C) not yet named merge-ort is well-prepared to handle cases (A) and (B), as was merge-recursive (which was merge-ort's predecessor). Case (C) was briefly considered during the years of merge-recursive maintenance, but the full extent of support it got was a few FIXME/TODO comments littered around the code highlighting some of the places that would probably need to be fixed to support it. When I wrote merge-ort I ignored case (C) entirely, since I believed that case (C) was only possible if we were to support break detection during merges. Not only had break detection never been supported by any merge algorithm, I thought break detection wasn't worth the effort to support in a merge algorithm. However, it turns out that case (C) can be triggered without break detection, if there's enough moving pieces. Before I dive into how to trigger case (C) with directory renames plus other renames, it might be helpful to use a simpler example with break detection first. And before we get to that it may help to explain some more basics of handling renames in the merge algorithm. So, let me first backup and provide a quick refresher on each of * handling renames * what break detection would mean, if supported in merging * handling directory renames From there, I'll build up from a basic directory rename detection case to one that triggers a failure currently. == Handling renames == In the merge machinery when we have a rename of a path A -> B, processing that rename needs to remove path A, and make sure that path B has the relevant information. Note that if the content was also modified on both sides, this may mean that we have 3 different stages that need to be stored at path B instead of having some stored at path A. Having all stages stored at path B makes it much easier for users to investigate and resolve the content conflict associated with a renamed path. For example: * "git status" doesn't have to figure out how to list paths A & B and attempt to connect them for users; it can just list path B. * Users can use "git ls-files -u B" (instead of trying to find the previous name of the file so they can list both, i.e. "git ls-files -u A B") * Users can resolve via "git add B" (without needing to "git rm A") == What break detection would mean == If break detection were supported, we might have cases where A -> B *and* C -> A, meaning that both rename pairs might believe they need to update A. In particular, the processing of A -> B would need to be careful to not clear out all stages of A and mark it resolved, while both renames would need to figure out which stages of A belong with A and which belong with B, so that both paths have the right stages associated with them. merge-ort (like merge-recursive before it) makes no attempt to handle break detection; it runs with break detection turned off. It would need to be retrofitted to handle such cases. == Directory rename detection == If one side of history renames directory D/ -> E/, and the other side of history adds new files to D/, then directory rename detection notices and suggests moving those new files to E/. A similar thing is done for paths renamed into D/, causing them to be transitively renamed into E/. The default in the merge machinery is to report a conflict whenever a directory rename might modify the location of a path, so that users can decide whether they wanted the original path or the directory-rename-induced location. However, that means the default codepath still runs through all the directory rename detection logic, it just supplements it with providing conflict notices when it is done. == Building up increasingly complex testcases == I'll start with a really simple directory rename example, and then slowly add twists that explain new pieces until we get to the problematic cases: === Testcase 1 === Let's start with a concrete example, where particular files/directories of interest that exist or are changed on each side are called out: Original: <nothing of note> our side: rename B/file -> C/file their side: rename C/ -> A/ For this case, we'd expect to see the original B/file appear not at C/file but at A/file. (We would also expect a conflict notice that the user will want to choose between C/file and A/file, but I'm going to ignore conflict notices from here on by assuming merge.directoryRenames is set to `true` rather than `conflict`; the only difference that assumption makes is whether that makes the merge be considered to be conflicted and whether it prints a conflict notice; what is written to the index or working directory is unchanged.) === Testcase 2 === Modify testcase 1 by having A/file exist from the start: Original: A/file exists our side: rename B/file -> C/file their side: rename C/ -> A/ In such a case, to avoid user confusion at what looks kind of like an add/add conflict (even though the original path at A/file was not added by either side of the merge), we turn off directory rename detection for this path and print a "in the way" warning to the user: CONFLICT (implicit dir rename): Existing file/dir ... in the way ... The testcases in section 5 of t6423 explore these in more detail. === Testcase 3 === Let's modify testcase 1 in a slightly different way: have A/file be added by their side rather than it already existing. Original: <nothing of note> our side: rename B/file -> C/file their side: rename C/ -> A/ add A/file In this case, the directory rename detection basically transforms our side's original B/file -> C/file into a B/file -> A/file, and so we get a rename/add conflict, with one version of A/file coming from the renamed file, and another coming from the new A/file, each stored as stages 2 and 3 in conflicts. This kind of add/add conflict is perhaps slightly more complex than a regular add/add conflict, but with the printed messages it makes sense where it came from and we have different stages of the file to work with to resolve the conflict. === Testcase 4 === Let's do something similar to testcase 3, but have the opposite side of history add A/file: Original: <nothing of note> our side: rename B/file -> C/file add A/file their side: rename C/ -> A/ Now if we allow directory rename detection to modify C/file to A/file, then we also get a rename/add conflict, but in this case we'd need both higher order stages being recorded on side 2, which makes no sense. The index can't store multiple stage 2 entries, and even if we could, it would probably be confusing for users to work with. So, similar to what we do when there was an A/file in the original version, we simply turn off directory rename detection for cases like this and provide the "in the way" CONFLICT notice to the user. === Testcase 5 === We're slowly getting closer. Let's mix it up by having A/file exist at the beginning but not exist on their side: original: A/file exists our side: rename B/file -> C/file their side: rename C/ -> A/ rename A/file -> D/file For this case, you could say that since A/file -> D/file, it's no longer in the way of C/file being moved by directory rename detection to A/file. But that would give us a case where A/file is both the source and the target of a rename, similar to break detection, which the code isn't currently equipped to handle. This is not yet the case that causes current failures; to the current code, this kind of looks like testcase 4 in that A/file is in the way on our side (since A/file was in the original and was umodified by our side). So, it results in a "in the way" notification with directory rename detection being turned off for A/file so that B/file ends up at C/file. Perhaps the resolution could be improved in the future, but our "in the way" checks prevented such problems by noticing that A/file exists on our side and thus turns off directory rename detection from affecting C/file's location. So, while the merge result could be perhaps improved, the fact that this is currently handled by giving the user an "in the way" message gives the user a chance to resolve and prevents the code from tripping itself up. === Testcase 6 === Let's modify testcase 5 a bit more, to also delete A/file on our side: original: A/file exists our side: rename B/file -> C/file delete A/file their side: rename C/ -> A/ rename A/file -> D/file Now the "in the way" logic doesn't detect that there's an A/file in the way (neither side has an A/file anymore), so it's fine to transitively rename C/file further to A/file...except that we end up with A/file being both the source of one rename, and the target of a different rename. Each rename pair tries to handle the resolution of the source and target paths of its own rename. But when we go to process the second rename pair in process_renames(), we do not expect either the source or the destination to be marked as handled already; so, when we hit the sanity checks that these are not handled: VERIFY_CI(oldinfo); VERIFY_CI(newinfo); then one of these is going to throw an assertion failure since the previous rename pair already marked both of its paths as handled. This will give us an error of the form: git: merge-ort.c:3007: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed. This is the failure we're currently triggering, and it fundamentally depends on: * a path existing in the original * that original path being removed or renamed on *both* sides * some kind of directory rename moving some *other* path into that original path This was added as testcase 12q in t6423. === Testcase 7 === Bonus bug found while investigating! Let's go back to the comparison between testcases 2 & 3, and set up a file present on their side that we need to consider: Original: A/file exists our side: rename B/file -> C/file rename A/file -> D/file their side: rename C/ -> A/ Here, there is no A/file in the way on our side like testcase 4. There is an A/file present on their side like testcase 3, which was an add/add conflict, but that's associated with the file be renamed to D/file. So, that really shouldn't be an add/add conflict because we instead want all modes of the original A/file to be transported to D/file. Unfortunately, the current code kind of treats it like an add/add conflict instead...but even worse. There is also a valid mode for A/file in the original, which normally goes to stage 1. However, an add/add conflict should be represented in the index with no mode at stage 1 (for the original side), only modes at stages 2 and 3 (for our and their side), so for an add/add we'd expect that mode for A/file in the original version to be cleared out (or be transported to D/file). Unfortunately, the code currently leaves not only the stage 3 entry for A/file intact, it also leaves the stage 1 entry for A/file. This results in `git ls-files -u A/file` output of the form: 100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 1 A/file 100644 0cfbf08886fca9a91cb753ec8734c84fcbe52c9f 2 A/file 100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 3 A/file This would likely cause users to believe this isn't an add/add conflict; rather, this would lead them to believe that A/file was only modified on our side and that therefore it should not have been a conflict in the first place. And while resolving the conflict in favor of our side is the correct resolution (because stages 1 and 3 should have been cleared out in the first place), this is certainly likely to cause confusion for anyone attempting to investigate why this path was marked as conflicted. This was added as testcase 12p in t6423. == Attempted solutions that I discarded == 1) For each side of history, create a strset of the sources of each rename on the other side of history. Then when using directory renames to modify existing renames, verify that we aren't renaming to a source of another rename. Unfortunately, the "relevant renames" optimization in merge-ort means we often don't detect renames -- we just see a delete and an add -- which is easy to forget and makes debugging testcases harder, but it also turns out that this solution in insufficient to solve the related problems in the area (more on that below). 2) Modify the code to be aware of the possibility of renaming to the source of another side's rename, and make all the conflict resolution logic for each case (including existing rename/rename(2to1) and rename/rename(1to2) cases) handle the additional complexity. It turns out there was much more code to audit than I wanted, for a really niche case. I didn't like how many changes were needed, and aborted. == Solution == We do not want the stages of unrelated files appearing at the same path in the index except when dealing with an add/add conflict. While we previously handled this for stages 2 & 3, we also need to worry about stage 1. So check for a stage 1 index entry being in the way of a directory rename. However, if we can detect that the stage 1 index entry is actually from a related file due to a directory-rename-causes-rename-to-self situation, then we can allow the stage 1 entry to remain. From this wording, you may note that it's not just rename cases that are a problem; bugs could be triggered with directory renames vs simple adds. That leads us to... == Testcases 8+ == Another bonus bug, found via understanding our final solutions (and the failure of our first attempted solution)! Let's tweak testcase 7 a bit: Original: A/file exists our side: delete A/file add -> C/file their side: delete A/file rename C/ -> A/ Here, there doesn't seem to be a big problem. Sure C/file gets modified via the directory rename of C/ -> A/ so that it becomes A/file, but there's no file in the way, right? Actually, here we have a problem that the stage 1 entry of A/file would be combined with the stage 2 entry of C/file, and make it look like a modify/delete conflict. Perhaps there is some extra checking that could be added to the code to make it attempt to clear out the stage 1 entry of A/file, but the various rename-to-self-via-directory-rename testcases make that a bit more difficult. For now, it's easier to just treat this as a path-in-the-way situation and not allow the directory rename to modify C/file. That sounds all well and good, but it does have an interesting side effect. Due to the "relevant renames" optimizations in merge-ort (i.e. only detect the renames you need), 100% renames whose files weren't modified on the other side often go undetected. This means that if we modify this testcase slightly to: Original: A/file exists our side: A/file -> C/file their side: rename C/ -> A/ Then although this looks like where the directory rename just moves C/file back to A/file and there's no problem, we may not detect the A/file -> C/file rename. Instead it will look like a deletion of A/file and an addition of C/file. The directory rename then appears to be moving C/file to A/file, which is on top of an "unrelated" file (or at least a file it doesn't know is related). So, we will report path-in-the-way conflicts now in cases where we didn't before. That's better than silently and accidentally combining stages of unrelated files and making them look like a modify/delete; users can investigate the reported conflict and simply resolve it. This means we tweak the expected solution for testcases 12i, 12j, and 12k. (Those three tests are basically the same test repeated three times, but I was worried when I added those that subtle differences in parent/child, sibling/sibling, and toplevel directories might mess up how rename-to-self testcases actually get handled.) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-07merge-ort: fix incorrect file handlingElijah Newren1-0/+14
We have multiple bugs here -- accidental silent file deletion, accidental silent file retention for files that should be deleted, and incorrect number of entries left in the index. The series merged at commit d3b88be1b450 (Merge branch 'en/merge-dir-rename-corner-case-fix', 2021-07-16) introduced testcase 12i-12k in t6423 which checked for rename-to-self cases, and fixed bugs that merge-ort and merge-recursive had with these testcases. At the time, I noted that merge-ort had one bug for these cases, while merge-recursive had two. It turns out that merge-ort did in fact have another bug, but the "relevant renames" optimizations were masking it. If we modify testcase 12i from t6423 to modify the file in the commit that renames it (but only modify it enough that it can still be detected as a rename), then we can trigger silent deletion of the file. Tweak testcase 12i slightly to make the file in question have more than one line in it. This leaves the testcase intact other than changing the initial contents of this one file. The purpose of this tweak is to minimize the changes between this testcase and a new one that we want to add. Then duplicate testcase 12i as 12i2, changing it so that it adds a single line to the file in question when it is renamed; testcase 12i2 then serves as a testcase for this merge-ort bug that I previously overlooked. Further, commit 98a1a00d5301 (t6423: add a testcase causing a failed assertion in process_renames, 2025-03-06), fixed an issue with rename-to-self but added a new testcase, 12n, that only checked for whether the merge ran to completion. A few commits ago, we modified this test to check for the number of entries in the index -- but noted that the number was wrong. And we also noted a silently-keep-instead-of-delete bug at the same time in the new testcase 12n2. In summary, we have the following bugs with rename-to-self cases: * silent deletion of file expected to be kept (t6423 testcase 12i2) * silent retention of file expected to be removed (t6423 testcase 12n2) * wrong number of extries left in the index (t6423 testcase 12n) All of these bugs arise because in a rename-to-self case, when we have a rename A->B, both A and B name the same file. The code in process_renames() assumes A & B are different, and tries to move the higher order stages and file contents so that they are associated just with the new path, but the assumptions of A & B being different can cause A to be deleted when it's not supposed to be or mark B as resolved and kept in place when it's supposed to be deleted. Since A & B are already the same path in the rename-to-self case, simply skip the steps in process_renames() for such files to fix these bugs. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-07merge-ort: clarify the interning of strings in opt->priv->pathElijah Newren1-3/+8
Because merge-ort is dealing with potentially all the pathnames in the repository, it sometimes needs to do an awful lot of string comparisons. Because of this, struct merge_options_internal's path member was envisioned from the beginning to contain an interned value for every path in order to allow us to compare strings via pointer comparison instead of using strcmp. See * 5b59c3db059d (merge-ort: setup basic internal data structures, 2020-12-13) * f591c4724615 (merge-ort: copy and adapt merge_3way() from merge-recursive.c, 2021-01-01) for some of the early comments. However, the original comment was slightly misleading when it switched from mentioning paths to only mentioning directories. Fix that, and while at it also point to an example in the code which applies the extra needed care to permit the pointer comparison optimization. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-07merge-ort: drop unnecessary temporary in check_for_directory_rename()Elijah Newren1-4/+2
check_for_directory_rename() had a weirdly coded check for whether a strmap contained a certain key. Replace the temporary variable and call to strmap_get_entry() with the more natural strmap_contains() call. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-07merge-ort: update comments to modern testfile locationElijah Newren1-3/+3
In commit 919df3195553 (Collect merge-related tests to t64xx, 2020-08-10), merge related tests were moved from t60xx to t64xx. Some comments in merge-ort relating to some tricky code referenced specific testcases within certain testfiles for additional information, but referred to their historical testfile names; update the testfile names to mention their modern location. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-05Merge branch 'ps/object-file-wo-the-repository'Junio C Hamano1-3/+4
Reduce implicit assumption and dependence on the_repository in the object-file subsystem. * ps/object-file-wo-the-repository: object-file: get rid of `the_repository` in index-related functions object-file: get rid of `the_repository` in `force_object_loose()` object-file: get rid of `the_repository` in `read_loose_object()` object-file: get rid of `the_repository` in loose object iterators object-file: remove declaration for `for_each_file_in_obj_subdir()` object-file: inline `for_each_loose_file_in_objdir_buf()` object-file: get rid of `the_repository` when writing objects odb: introduce `odb_write_object()` loose: write loose objects map via their source object-file: get rid of `the_repository` in `finalize_object_file()` object-file: get rid of `the_repository` in `loose_object_info()` object-file: get rid of `the_repository` when freshening objects object-file: inline `check_and_freshen()` functions object-file: get rid of `the_repository` in `has_loose_object()` object-file: stop using `the_hash_algo` object-file: fix -Wsign-compare warnings
2025-07-23config: drop `git_config_get_bool()` wrapperPatrick Steinhardt1-1/+1
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get_bool()`. All callsites are adjusted so that they use `repo_config_get_bool(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23config: drop `git_config_get_int()` wrapperPatrick Steinhardt1-3/+3
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get_int()`. All callsites are adjusted so that they use `repo_config_get_int(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23config: drop `git_config_get_string()` wrapperPatrick Steinhardt1-4/+4
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get_string()`. All callsites are adjusted so that they use `repo_config_get_string(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23config: drop `git_config()` wrapperPatrick Steinhardt1-1/+1
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config()`. All callsites are adjusted so that they use `repo_config(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16odb: introduce `odb_write_object()`Patrick Steinhardt1-3/+4
We do not have a backend-agnostic way to write objects into an object database. While there is `write_object_file()`, this function is rather specific to the loose object format. Introduce `odb_write_object()` to plug this gap. For now, this function is a simple wrapper around `write_object_file()` and doesn't even use the passed-in object database yet. This will change in subsequent commits, where `write_object_file()` is converted so that it works on top of an `odb_source`. `odb_write_object()` will then become responsible for deciding which source an object shall be written to. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: rename `repo_read_object_file()`Patrick Steinhardt1-1/+1
Rename `repo_read_object_file()` to `odb_read_object()` to match other functions related to the object database and our modern coding guidelines. Introduce a compatibility wrapper so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: rename `oid_object_info()`Patrick Steinhardt1-2/+2
Rename `oid_object_info()` to `odb_read_object_info()` as well as their `_extended()` variant to match other functions related to the object database and our modern coding guidelines. Introduce compatibility wrappers so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01object-store: rename files to "odb.{c,h}"Patrick Steinhardt1-1/+1
In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16merge-ort: add a new mergeability_only optionElijah Newren1-7/+31
Git Forges may be interested in whether two branches can be merged while not being interested in what the resulting merge tree is nor which files conflicted. For such cases, add a new mergeability_only option. This option allows the merge machinery to, in the "outer layer" of the merge: * exit upon first[-ish] conflict * avoid (not prevent) writing merged blobs/trees to the object store I have a number of qualifiers there, so let me explain each: "outer layer": Note that since the recursive merge of merge bases (corresponding to call_depth > 0) can conflict without the outer final merge (corresponding to call_depth == 0) conflicting, we can't short-circuit nor avoid writing merged blobs/trees to the object store during those inner merges. "first-ish conflict": The current patch only exits early from process_entries() on the first conflict it detects, but conflicts could have been detected in a previous function call, namely detect_and_process_renames(). However: * conflicts detected by detect_and_process_renames() are quite rare conflict types * the detection would still come after regular rename detection (which is the expensive part of detect_and_process_renames()), so it is not saving us much in computation time given that process_entries() directly follows detect_and_process_renames() * [this overlaps with the next bullet point] process_entries() is the place where virtually all object writing occurs (object writing is sometimes more of a concern for Forges than computation time), so exiting early here isn't saving us much in object writes either * the code changes needed to handle an earlier exit are slightly more invasive in detect_and_process_renames() than for process_entries(). Given the rareness of the even earlier conflicts, the limited savings we'd get from exiting even earlier, and in an attempt to keep this patch simpler, we don't guarantee that we actually exit on the first conflict detected. We can always revisit this decision later if we decide that a further micro-optimization to exit slightly earlier in rare cases is worthwhile. "avoid (not prevent) writing objects": The detect_and_process_renames() call can also write objects to the object store, when rename/rename conflicts involve one (or more) files that have also been modified on both sides. Because of this alternate call path leading to handle_content_merges(), our "early exit" does not prevent writing objects entirely, even within the "outer layer" (i.e. even within call_depth == 0). I figure that's fine though, since we're already writing objects for the inner merges (i.e. for call_depth > 0), which are likely going to represent vastly more objects than files involved in rename/rename+modify/modify cases in the outer merge, on average. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-24Merge branch 'ps/object-file-cleanup'Junio C Hamano1-1/+2
Code clean-up. * ps/object-file-cleanup: object-store: merge "object-store-ll.h" and "object-store.h" object-store: remove global array of cached objects object: split out functions relating to object store subsystem object-file: drop `index_blob_stream()` object-file: split up concerns of `HASH_*` flags object-file: split out functions relating to object store subsystem object-file: move `xmmap()` into "wrapper.c" object-file: move `git_open_cloexec()` to "compat/open.c" object-file: move `safe_create_leading_directories()` into "path.c" object-file: move `mkdir_in_gitdir()` into "path.c"
2025-04-17Merge branch 'en/merge-recursive-debug'Junio C Hamano1-3/+159
Remove remnants of the recursive merge strategy backend, which was superseded by the ort merge strategy. * en/merge-recursive-debug: builtin/{merge,rebase,revert}: remove GIT_TEST_MERGE_ALGORITHM tests: remove GIT_TEST_MERGE_ALGORITHM and test_expect_merge_algorithm merge-recursive.[ch]: thoroughly debug these merge, sequencer: switch recursive merges over to ort sequencer: switch non-recursive merges over to ort merge-ort: enable diff-algorithms other than histogram builtin/merge-recursive: switch to using merge_ort_generic() checkout: replace merge_trees() with merge_ort_nonrecursive()
2025-04-15Merge branch 'ps/object-wo-the-repository'Junio C Hamano1-13/+13
The object layer has been updated to take an explicit repository instance as a parameter in more code paths. * ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`
2025-04-15object-file: split out functions relating to object store subsystemPatrick Steinhardt1-1/+2
While we have the "object-store.h" header, most of the functionality for object stores is actually hosted in "object-file.c". This makes it hard to find relevant functions and causes us to mix up concerns. Split out functions relating to the object store subsystem into a new "object-store.c" file. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08Merge branch 'ps/object-wo-the-repository' into ps/object-file-cleanupJunio C Hamano1-13/+13
* ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`
2025-04-08merge-recursive.[ch]: thoroughly debug theseElijah Newren1-0/+159
As a wise man once told me, "Deleted code is debugged code!" So, move the functions that are shared between merge-recursive and merge-ort from the former to the latter, and then debug the remainder of merge-recursive.[ch]. Joking aside, merge-ort was always intended to replace merge-recursive. It has numerous advantages over merge-recursive (operates much faster, can operate without a worktree or index, and fixes a number of known bugs and suboptimal merges). Since we have now replaced all callers of merge-recursive with equivalent functions from merge-ort, move the shared functions from the former to the latter, and delete the remainder of merge-recursive.[ch]. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08merge-ort: enable diff-algorithms other than histogramElijah Newren1-3/+0
The ort merge strategy has always used the histogram diff algorithm. The recursive merge strategy, in contrast, defaults to the myers diff algorithm, while allowing it to be changed. Change the ort merge strategy to allow different diff algorithms, by removing the hard coded value in merge_start() and instead just making it a default in init_merge_options(). Technically, this also changes the default diff algorithm for the recursive backend too, but we're going to remove the final callers of the recursive backend in the next two commits. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08Merge branch 'en/assert-wo-side-effects'Junio C Hamano1-2/+2
Ensure what we write in assert() does not have side effects, and introduce ASSERT() macro to mark those that cannot be mechanically checked for lack of side effects. * en/assert-wo-side-effects: treewide: replace assert() with ASSERT() in special cases ci: add build checking for side-effects in assert() calls git-compat-util: introduce ASSERT() macro
2025-03-29Merge branch 'en/random-cleanups'Junio C Hamano1-5/+5
Miscellaneous code clean-ups. * en/random-cleanups: merge-ort: remove extraneous word in comment merge-ort: fix accidental strset<->strintmap t7615: be more explicit about diff algorithm used t6423: fix a comment that accidentally reversed two commits stash: remove merge-recursive.h include
2025-03-29Merge branch 'en/merge-ort-prepare-to-remove-recursive'Junio C Hamano1-7/+46
First step of deprecating and removing merge-recursive. * en/merge-ort-prepare-to-remove-recursive: am: switch from merge_recursive_generic() to merge_ort_generic() merge-ort: fix merge.directoryRenames=false t3650: document bug when directory renames are turned off merge-ort: support having merge verbosity be set to 0 merge-ort: allow rename detection to be disabled merge-ort: add new merge_ort_generic() function
2025-03-21treewide: replace assert() with ASSERT() in special casesElijah Newren1-2/+2
When the compiler/linker cannot verify that an assert() invocation is free of side effects for us (e.g. because the assertion includes some kind of function call), replace the use of assert() with ASSERT(). Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-18merge-ort: fix merge.directoryRenames=falseElijah Newren1-2/+29
There are two issues here. First, when merge.directoryRenames is set to false, there are a few code paths that should be turned off. I missed one; collect_renames() was still doing some directory rename detection logic unconditionally. It ended up not having much effect because get_provisional_directory_renames() was skipped earlier and not setting up renames->dir_renames, but the code should still be skipped. Second, the larger issue is that sometimes we get a cached_pair rename from a previous commit being replayed mapping A->B, but in a subsequent commit but collect_merge_info() doesn't even recurse into the directory containing B because there are no source pairings for that rename that are relevant; we can merge that commit fine without knowing the rename. But since the cached renames are added to the normal renames, when we go to process it and find that B is not part of opt->priv->paths, we hit the assertion error process_renames: Assertion `newinfo && ~newinfo->merged.clean` failed. I think we could fix this at the beginning of detect_regular_renames() by pruning from cached_pairs any entry whose destination isn't in opt->priv->paths, but it's suboptimal in that we'd kind of like the cached_pair to be restored afterwards so that it can help the subsequent commit, but more importantly since it sits at the intersection of the caching renames optimization and the relevant renames optimization, and the trivial directory resolution optimization, and I don't currently have Documentation/technical/remembering-renames.txt fully paged in, I'm not sure if that's a full solution or a bandaid for the current testcase. However, since the remembering renames optimization was the weakest of the set, and the optimization is far less important when directory rename detection is off (as that implies far fewer potential renames), let's just use a bigger hammer to ensure this special case is fixed: turn off the rename caching. We do the same thing already when we encounter rename/rename(1to1) cases (as per `git grep -3 disabling.the.optimization`, though it uses a slightly different triggering mechanism since it's trying to affect the next time that merge_check_renames_reusable() is called), and I think it makes sense to do the same here. Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-18merge-ort: allow rename detection to be disabledElijah Newren1-0/+5
When merge-ort was written, I did not at first allow rename detection to be disabled, because I suspected that most folks disabling rename detection were doing so solely for performance reasons. Since I put a lot of working into providing dramatic speedups for rename detection performance as used by the merge machinery, I wanted to know if there were still real world repositories where rename detection was problematic from a performance perspective. We have had years now to collect such information, and while we never received one, waiting longer with the option disabled seems unlikely to help surface such issues at this point. Also, there has been at least one request to allow rename detection to be disabled for behavioral rather than performance reasons (see the thread including https://lore.kernel.org/git/CABPp-BG-Nx6SCxxkGXn_Fwd2wseifMFND8eddvWxiZVZk0zRaA@mail.gmail.com/ ), so let's start heeding the config and command line settings. Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-18merge-ort: add new merge_ort_generic() functionElijah Newren1-5/+12
merge-recursive.[ch] have three entry points: * merge_trees() * merge_recursive() * merge_recursive_generic() merge-ort*.[ch] only has equivalents for the first two. Add an equivalent for the final entry point, so we can switch callers to use it and remove merge-recursive.[ch]. While porting it over, finally fix the issue with the label for the ancestor (used when merge.conflictStyle=diff3 as a conflict label). merge-recursive.c has traditionally not allowed callers to set that label, but I have found that problematic for years. (Side note: This function was initially part of the merge-ort rewrite, but reviewers questioned the ancestor label funnyness which I was never really happy with anyway. It resulted in me jettisoning it and hoping at the time that I would eventually be able to force the existing callers to use some other API. That worked with `git stash`, as per 874cf2a60444 (stash: apply stash using 'merge_ort_nonrecursive()', 2022-05-10), but this API is the most reasonable one for `git am` and `git merge-recursive`, if we can just allow them some freedom over the ancestor label.) The merge_recursive_generic() function did not know whether it was being invoked by `git stash`, `git merge-recursive`, or `git am`, and the choice of meaningful ancestor label, when there is a unique ancestor, varies for these different callers: * git am: ancestor is a constructed "fake ancestor" that user knows nothing about and has no access to. (And is different than the normal thing we mean by a "virtual merge base" which is the merging of merge bases.) * git merge-recursive: ancestor might be a tree, but at least it was one specified by the user (if they invoked merge-recursive directly) * git stash: ancestor was the commit serving as the stash base Thus, using a label like "constructed merge base" (as merge_recursive_generic() does) presupposes that `git am` is the only caller; it is incorrect for other callers. This label has thrown me off more than once. Allow the caller to override when there is a unique merge base. Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-17merge-ort: remove extraneous word in commentElijah Newren1-3/+3
"is was" -> "was" Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-17merge-ort: fix accidental strset<->strintmapElijah Newren1-2/+2
Both strset_for_each_entry and strintmap_for_each_entry are macros that evaluate to the same thing, so they are technically interchangeable. However, the intent is that we use the one matching the variable type we are passing. Unfortunately, I somehow mistakenly got one of these wrong in 7bee6c100431 (merge-ort: avoid recursing into directories when we don't need to, 2021-07-16) -- possibly related to the fact that relevant_sources was initially a strset and later refactored into a strintmap. Correct which macro we use. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10hash: stop depending on `the_repository` in `null_oid()`Patrick Steinhardt1-13/+13
The `null_oid()` function returns the object ID that only consists of zeroes. Naturally, this ID also depends on the hash algorithm used, as the number of zeroes is different between SHA1 and SHA256. Consequently, the function returns the hash-algorithm-specific null object ID. This is currently done by depending on `the_hash_algo`, which implicitly makes us depend on `the_repository`. Refactor the function to instead pass in the hash algorithm for which we want to retrieve the null object ID. Adapt callsites accordingly by passing in `the_repository`, thus bubbling up the dependency on that global variable by one layer. There are a couple of trivial exceptions for subsystems that already got rid of `the_repository`. These subsystems instead use the repository that is available via the calling context: - "builtin/grep.c" - "grep.c" - "refs/debug.c" There are also two non-trivial exceptions: - "diff-no-index.c": Here we know that we may not have a repository initialized at all, so we cannot rely on `the_repository`. Instead, we adapt `diff_no_index()` to get a `struct git_hash_algo` as parameter. The only caller is located in "builtin/diff.c", where we know to call `repo_set_hash_algo()` in case we're running outside of a Git repository. Consequently, it is fine to continue passing `the_repository->hash_algo` even in this case. - "builtin/ls-files.c": There is an in-flight patch series that drops `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic conflict because we use `null_oid()` in `show_submodule()`. The value is passed to `repo_submodule_init()`, which may use the object ID to resolve a tree-ish in the superproject from which we want to read the submodule config. As such, the object ID should refer to an object in the superproject, and consequently we need to use its hash algorithm. This means that we could in theory just not bother about this edge case at all and just use `the_repository` in "diff-no-index.c". But doing so would feel misdesigned. Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in "hash.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-06merge-ort: fix slightly overzealous assertion for rename-to-selfElijah Newren1-1/+2
merge-ort has a number of sanity checks on the file it is processing in process_renames(). One of these sanity checks was slightly overzealous because it indirectly assumed that a renamed file always ended up at a different path than where it started. That is normally an entirely fair assumption, but directory rename detection can make things interesting. As a quick refresher, if one side of history renames directory A/ -> B/, and the other side of history adds new files to A/, then directory rename detection notices and suggests moving those new files to B/. A similar thing is done for paths renamed into A/, causing them to be transitively renamed into B/. But, if the file originally came from B/, then this can end up causing a file to be renamed back to itself. It turns out the rest of the code following this assertion handled the case fine; the assertion was just an extra sanity check, not a rigid precondition. Therefore, simply adjust the assertion to pass under this special case as well. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06global: mark code units that generate warnings with `-Wsign-compare`Patrick Steinhardt1-0/+1
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30diff: improve lifecycle management of diff queuesPatrick Steinhardt1-1/+1
The lifecycle management of diff queues is somewhat confusing: - For most of the part this can be attributed to `DIFF_QUEUE_CLEAR()`, which does not release any memory but rather initializes the queue, only. This is in contrast to our common naming schema, where "clearing" means that we release underlying memory and then re-initialize the data structure such that it is ready to use. - A second offender is `diff_free_queue()`, which does not free the queue structure itself. It is rather a release-style function. Refactor the code to make things less confusing. `DIFF_QUEUE_CLEAR()` is replaced by `DIFF_QUEUE_INIT` and `diff_queue_init()`, while `diff_free_queue()` is replaced by `diff_queue_release()`. While on it, adapt callsites where we call `DIFF_QUEUE_CLEAR()` with the intent to release underlying memory to instead call `diff_queue_clear()` to fix memory leaks. This memory leak is exposed by t4211, but plugging it alone does not make the whole test suite pass. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-25Merge branch 'ak/typofix-2.46-maint'Junio C Hamano1-2/+2
Typofix. * ak/typofix-2.46-maint: upload-pack: fix a typo sideband: fix a typo setup: fix a typo run-command: fix a typo revision: fix a typo refs: fix typos rebase: fix a typo read-cache-ll: fix a typo pretty: fix a typo object-file: fix a typo merge-ort: fix typos merge-ll: fix a typo http: fix a typo gpg-interface: fix a typo git-p4: fix typos git-instaweb: fix a typo fsmonitor-settings: fix a typo diffcore-rename: fix typos config.mak.dev: fix a typo
2024-09-19merge-ort: fix typosAndrew Kreimer1-2/+2
Fix typos in comments. Signed-off-by: Andrew Kreimer <algonell@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-05merge-ort: fix two leaks when handling directory rename modificationsPatrick Steinhardt1-1/+3
There are two leaks in `apply_directory_rename_modifications()`: - We do not release the `dirs_to_insert` string list. - We do not release some `conflict_info` we put into the `opt->priv->paths` string map. The former is trivial to fix. The latter is a bit less straight forward: the `util` pointer of the string map may sometimes point to data that has been allocated via `CALLOC()`, while at other times it may point to data that has been allocated via a `mem_pool`. It very much seems like an oversight that we didn't also allocate the conflict info in this code path via the memory pool, though. So let's fix that, which will also plug the memory leak for us. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14merge-ort: unconditionally release attributes indexPatrick Steinhardt1-2/+1
We conditionally release the index used for reading gitattributes in merge-ort based on whether or the index has been populated. This check uses `cache_nr` as a condition. This isn't sufficient though, as the variable may be zero even when some other parts of the index have been populated. This leads to memory leaks when sparse checkouts are in use, as we may not end up releasing the sparse checkout patterns. Fix this issue by unconditionally releasing the index. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-16Merge branch 'en/ort-inner-merge-error-fix'Junio C Hamano1-45/+124
The "ort" merge backend saw one bugfix for a crash that happens when inner merge gets killed, and assorted code clean-ups. * en/ort-inner-merge-error-fix: merge-ort: fix missing early return merge-ort: convert more error() cases to path_msg() merge-ort: upon merge abort, only show messages causing the abort merge-ort: loosen commented requirements merge-ort: clearer propagation of failure-to-function from merge_submodule merge-ort: fix type of local 'clean' var in handle_content_merge () merge-ort: maintain expected invariant for priv member merge-ort: extract handling of priv member into reusable function
2024-07-08Merge branch 'ps/leakfixes-more'Junio C Hamano1-4/+8
More memory leaks have been plugged. * ps/leakfixes-more: (29 commits) builtin/blame: fix leaking ignore revs files builtin/blame: fix leaking prefixed paths blame: fix leaking data for blame scoreboards line-range: plug leaking find functions merge: fix leaking merge bases builtin/merge: fix leaking `struct cmdnames` in `get_strategy()` sequencer: fix memory leaks in `make_script_with_merges()` builtin/clone: plug leaking HEAD ref in `wanted_peer_refs()` apply: fix leaking string in `match_fragment()` sequencer: fix leaking string buffer in `commit_staged_changes()` commit: fix leaking parents when calling `commit_tree_extended()` config: fix leaking "core.notesref" variable rerere: fix various trivial leaks builtin/stash: fix leak in `show_stash()` revision: free diff options builtin/log: fix leaking commit list in git-cherry(1) merge-recursive: fix memory leak when finalizing merge builtin/merge-recursive: fix leaking object ID bases builtin/difftool: plug memory leaks in `run_dir_diff()` object-name: free leaking object contexts ...
2024-07-06merge-ort: fix missing early returnElijah Newren1-0/+1
One of the conversions in f19b9165 (merge-ort: convert more error() cases to path_msg(), 2024-06-19) accidentally lost the early return. Restore it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20merge-ort: convert more error() cases to path_msg()Elijah Newren1-12/+41
merge_submodule() stores errors using path_msg(), whereas other call sites make use of the error() function. This is inconsistent, and moving towards path_msg() seems more friendly for libification efforts since it will allow the caller to determine whether the error messages need to be printed. Note that this deferred handling of error messages changes the error message in a recursive merge from error: failed to execute internal merge to From inner merge: error: failed to execute internal merge which provides a little more information about the error which may be useful. Since the recursive merge strategy still only shows the older error, we had to adjust the new testcase introduced a few commits ago to just search for the older message somewhere in the output. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20merge-ort: upon merge abort, only show messages causing the abortElijah Newren1-25/+53
When something goes wrong enough that we need to abort early and not even attempt merging the remaining files, it probably does not make sense to report conflicts messages for the subset of files we processed before hitting the fatal error. Instead, only show the messages associated with paths where we hit the fatal error. Also, print these messages to stderr rather than stdout. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20merge-ort: loosen commented requirementsElijah Newren1-1/+1
The comment above type_short_descriptions claimed that the order had to match what was found in the conflict_info_and_types enum. Since type_short_descriptions uses designated initializers, the order should not actually matter; I am guessing that positional initializers may have been under consideration when that comment was added, but the comment was not updated when designated initializers were chosen. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20merge-ort: clearer propagation of failure-to-function from merge_submoduleElijah Newren1-0/+2
The 'clean' member variable is somewhat of a tri-state (1 = clean, 0 = conflicted, -1 = failure-to-determine), but we often like to think of it as binary (ignoring the possibility of a negative value) and use constructs like '!clean' to reflect this. However, these constructs can make codepaths more difficult to understand, unless we handle the negative case early and return pre-emptively; do that in handle_content_merge() to make the code a bit easier to read. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20merge-ort: fix type of local 'clean' var in handle_content_merge ()Elijah Newren1-2/+3
handle_content_merge() returns an int. Every caller of handle_content_merge() expects an int. However, we declare a local variable 'clean' that we use for the return value to be unsigned. To make matters worse, we also assign 'clean' the return value of merge_submodule() in one codepath, which is defined to return an int. It seems that the only reason to have 'clean' be unsigned was to allow a cutesy bit manipulation operation to be well-defined. Fix the type of the 'clean' local in handle_content_merge(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20merge-ort: maintain expected invariant for priv memberElijah Newren1-1/+2
The calling convention for the merge machinery is One call to init_merge_options() One or more calls to merge_incore_[non]recursive() One call to merge_finalize() (possibly indirectly via merge_switch_to_result()) Both merge_switch_to_result() and merge_finalize() expect opt->priv == NULL && result->priv != NULL which is supposed to be set up by our move_opt_priv_to_result_priv() function. However, two codepaths dealing with error cases did not execute this necessary logic, which could result in assertion failures (or, if assertions were compiled out, could result in segfaults). Fix the oversight and add a test that would have caught one of these problems. While at it, also tighten an existing test for a non-recursive merge to verify that it fails with appropriate status. Most merge tests in the testsuite check either for success or conflicts; those testing for neither are rare and it is good to ensure they support the invariant assumed by builtin/merge.c in this comment: /* * The backend exits with 1 when conflicts are * left to be resolved, with 2 when it does not * handle the given merge at all. */ So, explicitly check for the exit status of 2 in these cases. Reported-by: Matt Cree <matt.cree@gearset.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20merge-ort: extract handling of priv member into reusable functionElijah Newren1-5/+22
In preparation for a subsequent commit which will ensure we do not forget to maintain our invariants for the priv member in error codepaths, extract the necessary functionality out into a separate function. This change is cosmetic at this point, and introduces no changes beyond an extra assertion sanity check. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14global: introduce `USE_THE_REPOSITORY_VARIABLE` macroPatrick Steinhardt1-0/+2
Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11merge: fix leaking merge basesPatrick Steinhardt1-4/+8
When calling either the recursive or the ORT merge machineries we need to provide a list of merge bases. The ownership of that parameter is then implicitly transferred to the callee, which is somewhat fishy. Furthermore, that list may leak in some cases where the merge machinery runs into an error, thus causing a memory leak. Refactor the code such that we stop transferring ownership. Instead, the merge machinery will now create its own local copies of the passed in list as required if they need to modify the list. Free the list at the callsites as required. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-01Merge branch 'pw/checkout-conflict-errorfix'Junio C Hamano1-1/+2
"git checkout --conflict=bad" reported a bad conflictStyle as if it were given to a configuration variable; it has been corrected to report that the command line option is bad. * pw/checkout-conflict-errorfix: checkout: fix interaction between --conflict and --merge checkout: cleanup --conflict=<style> parsing merge options: add a conflict style member merge-ll: introduce LL_MERGE_OPTIONS_INIT xdiff-interface: refactor parsing of merge.conflictstyle
2024-03-28Merge branch 'eb/hash-transition'Junio C Hamano1-5/+6
Work to support a repository that work with both SHA-1 and SHA-256 hash algorithms has started. * eb/hash-transition: (30 commits) t1016-compatObjectFormat: add tests to verify the conversion between objects t1006: test oid compatibility with cat-file t1006: rename sha1 to oid test-lib: compute the compatibility hash so tests may use it builtin/ls-tree: let the oid determine the output algorithm object-file: handle compat objects in check_object_signature tree-walk: init_tree_desc take an oid to get the hash algorithm builtin/cat-file: let the oid determine the output algorithm rev-parse: add an --output-object-format parameter repository: implement extensions.compatObjectFormat object-file: update object_info_extended to reencode objects object-file-convert: convert commits that embed signed tags object-file-convert: convert commit objects when writing object-file-convert: don't leak when converting tag objects object-file-convert: convert tag objects when writing object-file-convert: add a function to convert trees between algorithms object: factor out parse_mode out of fast-import and tree-walk into in object.h cache: add a function to read an OID of a specific algorithm tag: sign both hashes commit: export add_header_signature to support handling signatures on tags ...
2024-03-14merge options: add a conflict style memberPhillip Wood1-0/+1
Add a conflict_style member to `struct merge_options` and `struct ll_merge_options` to allow callers to override the default conflict style. This will be used in the next commit. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-14merge-ll: introduce LL_MERGE_OPTIONS_INITPhillip Wood1-1/+1
Introduce a macro to initialize `struct ll_merge_options` in preparation for the next commit that will add a new member that needs to be initialized to a non-zero value. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-11Merge branch 'js/merge-base-with-missing-commit'Junio C Hamano1-11/+81
Make sure failure return from merge_bases_many() is properly caught. * js/merge-base-with-missing-commit: merge-ort/merge-recursive: do report errors in `merge_submodule()` merge-recursive: prepare for `merge_submodule()` to report errors commit-reach(repo_get_merge_bases_many_dirty): pass on errors commit-reach(repo_get_merge_bases_many): pass on "missing commits" errors commit-reach(get_octopus_merge_bases): pass on "missing commits" errors commit-reach(repo_get_merge_bases): pass on "missing commits" errors commit-reach(get_merge_bases_many_0): pass on "missing commits" errors commit-reach(merge_bases_many): pass on "missing commits" errors commit-reach(paint_down_to_common): start reporting errors commit-reach(paint_down_to_common): prepare for handling shallow commits commit-reach(repo_in_merge_bases_many): report missing commits commit-reach(repo_in_merge_bases_many): optionally expect missing commits commit-reach(paint_down_to_common): plug two memory leaks
2024-03-09merge-ort/merge-recursive: do report errors in `merge_submodule()`Johannes Schindelin1-0/+5
In 24876ebf68b (commit-reach(repo_in_merge_bases_many): report missing commits, 2024-02-28), I taught `merge_submodule()` to handle errors reported by `repo_in_merge_bases_many()`. However, those errors were not passed through to the callers. That was unintentional, and this commit remedies that. Note that `find_first_merges()` can now also return -1 (because it passes through that return value from `repo_in_merge_bases()`), and this commit also adds the forgotten handling for that scenario. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-07Merge branch 'js/merge-tree-3-trees'Junio C Hamano1-5/+11
"git merge-tree" has learned that the three trees involved in the 3-way merge only need to be trees, not necessarily commits. * js/merge-tree-3-trees: fill_tree_descriptor(): mark error message for translation cache-tree: avoid an unnecessary check Always check `parse_tree*()`'s return value t4301: verify that merge-tree fails on missing blob objects merge-ort: do check `parse_tree()`'s return value merge-tree: fail with a non-zero exit code on missing tree objects merge-tree: accept 3 trees as arguments
2024-02-29commit-reach(repo_get_merge_bases): pass on "missing commits" errorsJohannes Schindelin1-1/+5
The `merge_bases_many()` function was just taught to indicate parsing errors, and now the `repo_get_merge_bases()` function (which is also surfaced via the `repo_get_merge_bases()` macro) is aware of that, too. Naturally, there are a lot of callers that need to be adjusted now, too. Next step: adjust the callers of `get_octopus_merge_bases()`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28commit-reach(repo_in_merge_bases_many): report missing commitsJohannes Schindelin1-10/+71
Some functions in Git's source code follow the convention that returning a negative value indicates a fatal error, e.g. repository corruption. Let's use this convention in `repo_in_merge_bases()` to report when one of the specified commits is missing (i.e. when `repo_parse_commit()` reports an error). Also adjust the callers of `repo_in_merge_bases()` to handle such negative return values. Note: As of this patch, errors are returned only if any of the specified merge heads is missing. Over the course of the next patches, missing commits will also be reported by the `paint_down_to_common()` function, which is called by `repo_in_merge_bases_many()`, and those errors will be properly propagated back to the caller at that stage. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-26merge-ort: turn submodule conflict suggestions into an advicePhilippe Blain1-1/+2
Add a new advice type 'submoduleMergeConflict' for the error message shown when a non-trivial submodule conflict is encountered, which was added in 4057523a40 (submodule merge: update conflict error message, 2022-08-04). That commit mentions making this message an advice as possible future work. The message can now be disabled with the advice mechanism. Update the tests as the expected message now appears on stderr instead of stdout. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-23Always check `parse_tree*()`'s return valueJohannes Schindelin1-0/+3
Otherwise we may easily run into serious crashes: For example, if we run `init_tree_desc()` directly after a failed `parse_tree()`, we are accessing uninitialized data or trying to dereference `NULL`. Note that the `parse_tree()` function already takes care of showing an error message. The `parse_tree_indirectly()` and `repo_get_commit_tree()` functions do not, therefore those latter call sites need to show a useful error message while the former do not. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-23merge-ort: do check `parse_tree()`'s return valueJohannes Schindelin1-2/+4
The previous commit fixed a bug where a missing tree was reported, but not treated as an error. This patch addresses the same issue for the remaining two callers of `parse_tree()`. This change is not accompanied by a regression test because the code in question is only reached at the `checkout` stage, i.e. after the merge has happened (and therefore the tree objects could only be missing if the disk had gone bad in that short time window, or something similarly tricky to recreate in the test suite). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-23merge-tree: fail with a non-zero exit code on missing tree objectsJohannes Schindelin1-3/+4
When `git merge-tree` encounters a missing tree object, it should error out and not continue quietly as if nothing had happened. However, as of time of writing, `git merge-tree` _does_ continue, and then offers the empty tree as result. Let's fix this. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-06Merge branch 'jc/comment-style-fixes'Junio C Hamano1-1/+1
Rewrite //-comments to /* comments */ in files whose comments prevalently use the latter. * jc/comment-style-fixes: reftable/pq_test: comment style fix merge-ort.c: comment style fix builtin/worktree: comment style fixes
2024-01-29merge-ort.c: comment style fixJunio C Hamano1-1/+1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-19refs: convert AUTO_MERGE to become a normal pseudo-refPatrick Steinhardt1-7/+12
In 70c70de616 (refs: complete list of special refs, 2023-12-14) we have inrtoduced a new `is_special_ref()` function that classifies some refs as being special. The rule is that special refs are exclusively read and written via the filesystem directly, whereas normal refs exclucsively go via the refs API. The intent of that commit was to record the status quo so that we know to route reads of such special refs consistently. Eventually, the list should be reduced to its bare minimum of refs which really are special, namely FETCH_HEAD and MERGE_HEAD. Follow up on this promise and convert the AUTO_MERGE ref to become a normal pseudo-ref by using the refs API to both read and write it instead of accessing the filesystem directly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren1-1/+0
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren1-2/+0
Each of these were checked with gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-30Merge branch 'wx/merge-ort-comment-typofix'Junio C Hamano1-1/+1
Typofix. * wx/merge-ort-comment-typofix: merge-ort.c: fix typo 'neeed' to 'needed'
2023-10-21merge-ort.c: fix typo 'neeed' to 'needed'王常新1-1/+1
Signed-off-by: 王常新 <wchangxin824@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-18Merge branch 'jc/merge-ort-attr-index-fix'Junio C Hamano1-0/+1
Fix "git merge-tree" to stop segfaulting when the --attr-source option is used. * jc/merge-ort-attr-index-fix: merge-ort: initialize repo in index state
2023-10-09merge-ort: initialize repo in index stateJohn Cai1-0/+1
initialize_attr_index() does not initialize the repo member of attr_index. Starting in 44451a2e5e (attr: teach "--attr-source=<tree>" global option to "git", 2023-05-06), this became a problem because istate->repo gets passed down the call chain starting in git_check_attr(). This gets passed all the way down to replace_refs_enabled(), which segfaults when accessing r->gitdir. Fix this by initializing the repository in the index state. Signed-off-by: John Cai <johncai86@gmail.com> Helped-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02tree-walk: init_tree_desc take an oid to get the hash algorithmEric W. Biederman1-5/+6
To make it possible for git ls-tree to display the tree encoded in the hash algorithm of the oid specified to git ls-tree, update init_tree_desc to take as a parameter the oid of the tree object. Update all callers of init_tree_desc and init_tree_desc_gently to pass the oid of the tree object. Use the oid of the tree object to discover the hash algorithm of the oid and store that hash algorithm in struct tree_desc. Use the hash algorithm in decode_tree_entry and update_tree_entry_internal to handle reading a tree object encoded in a hash algorithm that differs from the repositories hash algorithm. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-16merge-ort: lowercase a few error messagesJeff King1-2/+2
As noted in CodingGuidelines, error messages should not be capitalized. Fix up a few of these that were copied verbatim from merge-recursive to match our modern style. We'll likewise fix up the matching ones from merge-recursive. We care a bit less there, since the hope is that it will eventually go away. But besides being the right thing to do in the meantime, it is necessary for t6406 to pass both with and without GIT_TEST_MERGE_ALGORITHM set (one of our CI jobs sets it to "recursive", which will use the merge-recursive.c code). An alternative would be to use "grep -i" in the test to check the message, but it's nice for the test suite to be be more exact (we'd notice if the capitalization fix regressed). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-14merge-ort: drop unused "opt" parameter from merge_check_renames_reusable()Jeff King1-3/+2
The merge_options parameter has never been used since the function was introduced in 64aceb6d73 (merge-ort: add code to check for whether cached renames can be reused, 2021-05-20). In theory some merge options might impact our decisions here, but that has never been the case so far. Let's drop it to appease -Wunused-parameter; it would be easy to add back later if we need to (there is only one caller). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-14merge-ort: drop unused parameters from detect_and_process_renames()Jeff King1-6/+2
This function takes three trees representing the merge base and both sides of the merge, but never looks at any of them. This is due to f78cf97617 (merge-ort: call diffcore_rename() directly, 2021-02-14). Prior to that commit, we passed pairs of trees to diff_tree_oid(). But after that commit, we collect a custom diff_queue for each pair in the merge_options struct, and just run diffcore_rename() on the result. So the function does not need to know about the original trees at all anymore. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-14merge-ort: stop passing "opt" to read_oid_strbuf()Jeff King1-4/+3
This function doesn't look at its merge_options parameter. It used to pass it down to err(), but that function no longer exists (and didn't look at "opt" anyway). We can drop it here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-14merge-ort: drop custom err() functionJeff King1-23/+5
The merge-ort code has an err() function, but it's really just error() in disguise. It differs in two ways: 1. It takes a "struct merge_options" argument. But the function completely ignores it! We can simply remove it. 2. It formats the error string into a strbuf, prepending "error: ", and then feeds the result into error(). But this is wrong! The error() function already adds the prefix, so we end up with: error: error: Failed to execute internal merge So let's just drop this function entirely and call error() directly, as the functions are otherwise identical (note that they both always return -1). Presumably nobody noticed the bogus messages because they are quite hard to trigger (they are mostly internal errors reading and writing objects). However, one easy trigger is a custom merge driver which dies by signal; we have a test already here, but we were not checking the contents of stderr. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21object-store-ll.h: split this header out of object-store.hElijah Newren1-1/+1
The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store | sort | uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21merge-ll: rename from ll-mergeElijah Newren1-1/+1
A long term (but rather minor) pet-peeve of mine was the name ll-merge.[ch]. I thought it made it harder to realize what stuff was related to merging when I was working on the merge machinery and trying to improve it. Further, back in d1cbe1e6d8a ("hash-ll.h: split out of hash.h to remove dependency on repository.h", 2023-04-22), we have split the portions of hash.h that do not depend upon repository.h into a "hash-ll.h" (due to the recommendation to use "ll" for "low-level" in its name[1], but which I used as a suffix precisely because of my distaste for "ll-merge"). When we discussed adding additional "*-ll.h" files, a request was made that we use "ll" consistently as either a prefix or a suffix. Since it is already in use as both a prefix and a suffix, the only way to do so is to rename some files. Besides my distaste for the ll-merge.[ch] name, let me also note that the files ll-fsmonitor.h, ll-hash.h, ll-merge.h, ll-object-store.h, ll-read-cache.h would have essentially nothing to do with each other and make no sense to group. But giving them the common "ll-" prefix would group them. Using "-ll" as a suffix thus seems just much more logical to me. Rename ll-merge.[ch] to merge-ll.[ch] to achieve this consistency, and to ensure we get a more logical grouping of files. [1] https://lore.kernel.org/git/kl6lsfcu1g8w.fsf@chooglen-macbookpro.roam.corp.google.com/ Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21repository: remove unnecessary include of path.hElijah Newren1-0/+1
This also made it clear that several .c files that depended upon path.h were missing a #include for it; add the missing includes while at it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21cache.h: remove this no-longer-used headerElijah Newren1-1/+1
Since this header showed up in some places besides just #include statements, update/clean-up/remove those other places as well. Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got away with violating the rule that all files must start with an include of git-compat-util.h (or a short-list of alternate headers that happen to include it first). This change exposed the violation and caused it to stop building correctly; fix it by having it include git-compat-util.h first, as per policy. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21read-cache*.h: move declarations for read-cache.c functions from cache.hElijah Newren1-0/+1
For the functions defined in read-cache.c, move their declarations from cache.h to a new header, read-cache-ll.h. Also move some related inline functions from cache.h to read-cache.h. The purpose of the read-cache-ll.h/read-cache.h split is that about 70% of the sites don't need the inline functions and the extra headers they include. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21sparse-index.h: move declarations for sparse-index.c from cache.hElijah Newren1-0/+1
Note in particular that this reverses the decision made in 118a2e8bde0 ("cache: move ensure_full_index() to cache.h", 2021-04-01). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-09Merge branch 'en/header-split-cache-h-part-2'Junio C Hamano1-0/+1
More header clean-up. * en/header-split-cache-h-part-2: (22 commits) reftable: ensure git-compat-util.h is the first (indirect) include diff.h: reduce unnecessary includes object-store.h: reduce unnecessary includes commit.h: reduce unnecessary includes fsmonitor: reduce includes of cache.h cache.h: remove unnecessary headers treewide: remove cache.h inclusion due to previous changes cache,tree: move basic name compare functions from read-cache to tree cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c hash-ll.h: split out of hash.h to remove dependency on repository.h tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h dir.h: move DTYPE defines from cache.h versioncmp.h: move declarations for versioncmp.c functions from cache.h ws.h: move declarations for ws.c functions from cache.h match-trees.h: move declarations for match-trees.c functions from cache.h pkt-line.h: move declarations for pkt-line.c functions from cache.h base85.h: move declarations for base85.c functions from cache.h copy.h: move declarations for copy.c functions from cache.h server-info.h: move declarations for server-info.c functions from cache.h packfile.h: move pack_window and pack_entry from cache.h ...
2023-05-02Merge branch 'en/ort-finalize-after-0-merges-fix'Junio C Hamano1-4/+4
A small API fix to the ort merge strategy backend. * en/ort-finalize-after-0-merges-fix: merge-ort: fix calling merge_finalize() with no intermediate merge
2023-04-24merge-ort: fix calling merge_finalize() with no intermediate mergeElijah Newren1-4/+4
If some code sets up the data structures for a merge, but then never actually performs one before calling merge_finalize(), then merge_finalize() wouldn't notice that result->priv was NULL and return early, resulting in following that NULL pointer and getting a segfault. There is currently no code in the git codebase that does this, but this issue was found during testing of some proposed patches that had the following structure: struct merge_options merge_opt; struct merge_result result; init_merge_options(&merge_opt, the_repository); memset(&result, 0, sizeof(result)); <do N merges, for some value of N> merge_finalize(&merge_opt, &result); where some flags could cause the code to have N=0, i.e. doing no merges. Add a check for result->priv being NULL and return early to avoid a segfault in these kinds of cases. While at it, ensure the FREE_AND_NULL() in the function does something useful with the nulling aspect, namely sets result->priv to NULL rather than a mere temporary. Reported-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24match-trees.h: move declarations for match-trees.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11object-name.h: move declarations for object-name.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on mem-pool.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on oid-array.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on trace.h & trace2.hElijah Newren1-0/+1
Dozens of files made use of trace and trace2 functions, without explicitly including trace.h or trace2.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include trace.h or trace2.h if they are using them. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-04Merge branch 'ab/remove-implicit-use-of-the-repository' into ↵Junio C Hamano1-3/+3
en/header-split-cache-h * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-03-28cocci: apply the "promisor-remote.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason1-1/+1
Apply the part of "the_repository.pending.cocci" pertaining to "promisor-remote.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28cocci: apply the "object-store.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason1-1/+1
Apply the part of "the_repository.pending.cocci" pertaining to "object-store.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28cocci: apply the "commit-reach.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason1-1/+1
Apply the part of "the_repository.pending.cocci" pertaining to "commit-reach.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21environment.h: move declarations for environment.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren1-0/+1
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-30Merge branch 'en/ort-dir-rename-and-symlink-fix'Taylor Blau1-2/+34
Merging a branch with directory renames into a branch that changes the directory to a symlink was mishandled by the ort merge strategy, which has been corrected. * en/ort-dir-rename-and-symlink-fix: merge-ort: fix bug with dir rename vs change dir to symlink
2022-10-27Merge branch 'jk/unused-anno-more'Junio C Hamano1-1/+1
More UNUSED annotation to help using -Wunused option with the compiler. * jk/unused-anno-more: ll-merge: mark unused parameters in callbacks diffcore-pickaxe: mark unused parameters in pickaxe functions convert: mark unused parameter in null stream filter apply: mark unused parameters in noop error/warning routine apply: mark unused parameters in handlers date: mark unused parameters in handler functions string-list: mark unused callback parameters object-file: mark unused parameters in hash_unknown functions mark unused parameters in trivial compat functions update-index: drop unused argc from do_reupdate() submodule--helper: drop unused argc from module_list_compute() diffstat_consume(): assert non-zero length
2022-10-22merge-ort: fix bug with dir rename vs change dir to symlinkElijah Newren1-2/+34
When changing a directory to a symlink on one side of history, and renaming the parent of that directory to a different directory name on the other side, e.g. with this kind of setup: Base commit: Has a file named dir/subdir/file Side1: Rename dir/ -> renamed-dir/ Side2: delete dir/subdir/file, add dir/subdir as symlink Then merge-ort was running into an assertion failure: git: merge-ort.c:2622: apply_directory_rename_modifications: Assertion `ci->dirmask == 0' failed merge-recursive did not have as obvious an issue handling this case, likely because we never fixed it to handle the case from commit 902c521a35 ("t6423: more involved directory rename test", 2020-10-15) where we need to be careful about nested renames when a directory rename occurs (dir/ -> renamed-dir/ implies dir/subdir/ -> renamed-dir/subdir/). However, merge-recursive does have multiple problems with this testcase: * Incorrect stages for the file: merge-recursive omits the stage in the index corresponding to the base stage, making `git status` report "added by us" for renamed-dir/subdir/file instead of the expected "deleted by them". * Poor directory/file conflict handling: For the renamed-dir/subdir symlink, instead of reporting a file/directory conflict as expected, it reports "Error: Refusing to lose untracked file at renamed-dir/subdir". This is a lie because there is no untracked file at that location. It then does the normal suboptimal merge-recursive thing of having the symlink be tracked in the index at a location where it can't be written due to D/F conflicts (namely, renamed-dir/subdir), but writes it to the working tree at a different location as a new untracked file (namely, renamed-dir/subdir~B^0) Technically, these problems don't prevent the user from resolving the merge if they can figure out to ignore the confusion, but because both pieces of output are quite confusing I don't want to modify the test to claim the recursive also passes it even if it doesn't have the bug that ort did. So, fix the bug in ort by splitting the conflict_info for "dir/subdir" into two, one for the directory part, one for the file (i.e. symlink) part, since the symlink is being renamed by directory rename detection. The directory part is needed for proper nesting, since there are still conflict_info fields for files underneath it (though those are marked as is_null, they are still present until the entries are processed, and the entry processing wants every non-toplevel entry to have a parent directory). Reported-by: Stefano Rivera <stefano@rivera.za.net> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-17string-list: mark unused callback parametersJeff King1-1/+1
String-lists may be used with callbacks for clearing or iteration. These callbacks need to conform to a particular interface, even though not every callback needs all of its parameters. Mark the unused ones to make -Wunused-parameter happy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-28merge-ort: return early when failing to write a blobJohannes Schindelin1-9/+19
In the previous commit, we fixed a segmentation fault when a tree object could not be written. However, before the tree object is written, `merge-ort` wants to write out a blob object (except in cases where the merge results in a blob that already exists in the database). And this can fail, too, but we ignore that write failure so far. Let's pay close attention and error out early if the blob could not be written. This reduces the error output of t4301.25 ("merge-ort fails gracefully in a read-only repository") from: error: insufficient permission for adding an object to repository database ./objects error: error: Unable to add numbers to database error: insufficient permission for adding an object to repository database ./objects error: error: Unable to add greeting to database error: insufficient permission for adding an object to repository database ./objects fatal: failure to merge to: error: insufficient permission for adding an object to repository database ./objects error: error: Unable to add numbers to database fatal: failure to merge This is _not_ just a cosmetic change: Even though one might assume that the operation would have failed anyway at the point when the new tree object is written (and the corresponding tree object _will_ be new if it contains a blob that is new), but that is not so: As pointed out by Elijah Newren, when Git has previously been allowed to add loose objects via `sudo` calls, it is very possible that the blob object cannot be written (because the corresponding `.git/objects/??/` directory may be owned by `root`) but the tree object can be written (because the corresponding objects directory is owned by the current user). This would result in a corrupt repository because it is missing the blob object, and with this here patch we prevent that. Note: This patch adjusts two variable declarations from `unsigned` to `int` because their purpose is to hold the return value of `handle_content_merge()`, which is of type `int`. The existing users of those variables are only interested whether that variable is zero or non-zero, therefore this type change does not affect the existing code. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-28merge-ort: fix segmentation fault in read-only repositoriesJohannes Schindelin1-25/+41
If the blob/tree objects cannot be written, we really need the merge operations to fail, and not to continue (and then try to access the tree object which is however still set to `NULL`). Let's stop ignoring the return value of `write_object_file()` and `write_tree()` and set `clean = -1` in the error case. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-29Merge branch 'en/ort-unused-code-removal'Junio C Hamano1-20/+2
Code clean-up. * en/ort-unused-code-removal: merge-ort: remove code obsoleted by other changes
2022-08-25Merge branch 'en/submodule-merge-messages-fixes'Junio C Hamano1-9/+116
Further update the help messages given while merging submodules. * en/submodule-merge-messages-fixes: merge-ort: provide helpful submodule update message when possible merge-ort: avoid surprise with new sub_flag variable merge-ort: remove translator lego in new "submodule conflict suggestion" submodule merge: update conflict error message
2022-08-19merge-ort: remove code obsoleted by other changesElijah Newren1-20/+2
Commit 66b209b86a ("merge-ort: implement CE_SKIP_WORKTREE handling with conflicted entries", 2021-03-20) added some code for merge-ort to handle conflicted and skip_worktree entries in general. Included in this was an ugly hack for dealing with present-despite-skipped entries and a testcase (t6428.2) specific to that hack, since at that time users could accidentally get files into that state when using a sparse checkout. However, with the merging of 82386b4496 ("Merge branch 'en/present-despite-skipped'", 2022-03-09), that class of problems was addressed globally and in a much cleaner way. As such, the present-despite-skipped hack in merge-ort is no longer needed and can simply be removed. No additional testcase is needed here; t6428.2 was written to test the necessary functionality and is being kept. The fact that this test continues to pass despite the code being removed shows that the extra code is no longer necessary. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-18merge-ort: provide helpful submodule update message when possibleElijah Newren1-11/+7
In commit 4057523a40 ("submodule merge: update conflict error message", 2022-08-04), a more detailed message was provided when submodules conflict, in order to help users know how to resolve those conflicts. There were a couple situations for which a different message would be more appropriate, but that commit left handling those for future work. Unfortunately, that commit would check if any submodules were of the type that it didn't know how to explain, and, if so, would avoid providing the more detailed explanation even for the submodules it did know how to explain. Change this to have the code print the helpful messages for the subset of submodules it knows how to explain. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-18merge-ort: avoid surprise with new sub_flag variableElijah Newren1-1/+1
Commit 4057523a40 ("submodule merge: update conflict error message", 2022-08-04) added a sub_flag variable that is used to store a value from enum conflict_and_info_types, but initializes it with a value of -1 that does not correspond to any of the conflict_and_info_types. The code may never set it to a valid value and yet still use it, which can be surprising when reading over the code at first. Initialize it instead to the generic CONFLICT_SUBMODULE_FAILED_TO_MERGE value, which is still distinct from the two values we need to special case. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-18merge-ort: remove translator lego in new "submodule conflict suggestion"Elijah Newren1-60/+28
In commit 4057523a40 ("submodule merge: update conflict error message", 2022-08-04), the new "submodule conflict suggestion" code was translating 6 different pieces of the new message and then used carefully crafted logic to allow stitching it back together with special formatting. Keep the components of the message together as much as possible, so that: * we reduce the number of things translators have to translate * translators have more control over the format of the output * the code is much easier for developers to understand too Also, reformat some comments running beyond the 80th column while at it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-08Merge branch 'js/ort-clean-up-after-failed-merge'Junio C Hamano1-0/+5
Plug memory leaks in the failure code path in the "merge-ort" merge strategy backend. * js/ort-clean-up-after-failed-merge: merge-ort: do leave trace2 region even if checkout fails merge-ort: clean up after failed merge
2022-08-04submodule merge: update conflict error messageCalvin Wan1-9/+152
When attempting to merge in a superproject with conflicting submodule pointers that cannot be fast-forwarded or trivially resolved, the merge fails and Git prints an error message that accurately describes the failure, but does not provide steps for the user to resolve the error. Git is left in a conflicted state, which requires the user to: 1. merge submodules or update submodules to an already existing commit that reflects the merge 2. add submodules changes to the superproject 3. finish merging superproject These steps are non-obvious for newer submodule users to figure out based on the error message and neither `git submodule status` nor `git status` provide any useful pointers. Update error message to provide steps to resolve submodule merge conflict. Future work could involve adding an advice flag to the message. Although the message is long, it also has the id of the submodule commit that needs to be merged, which could be useful information for the user. Additionally, 5 merge failures that resulted in an early return have been updated to reflect the status of the merge. 1. Null merge base (null o): CONFLICT_SUBMODULE_NULL_MERGE_BASE added as a new conflict type and will print updated error message. 2. Null merge side a (null a): BUG(). See [1] for discussion 3. Null merge side b (null b): BUG(). See [1] for discussion 4. Submodule not checked out: added NEEDSWORK bit 5. Submodule commits not present: added NEEDSWORK bit The errors with a NEEDSWORK bit deserve a more detailed explanation of how to resolve them. See [2] for more context. [1] https://lore.kernel.org/git/CABPp-BE0qGwUy80dmVszkJQ+tcpfLRW0OZyErymzhZ9+HWY1mw@mail.gmail.com/ [2] https://lore.kernel.org/git/xmqqpmhjjwo9.fsf@gitster.g/ Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-31merge-ort: do leave trace2 region even if checkout failsJohannes Schindelin1-0/+3
In 557ac0350d9 (merge-ort: begin performance work; instrument with trace2_region_* calls, 2021-01-23), we added Trace2 instrumentation, but in the error path that returns early, we forgot to tell Trace2 that we're leaving the region. Let's fix that. Pointed-out-by: Elijah Newren <newren@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-31merge-ort: clean up after failed mergeJohannes Schindelin1-0/+2
In 9fefce68dc8 (merge-ort: basic outline for merge_switch_to_result(), 2020-12-13), we added functionality to lay down the result of a merge on disk. But we forgot to release the data structures in case `unpack_trees()` failed to run properly. This was pointed out by the `linux-leaks` job in our CI runs. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18Merge branch 'en/merge-dual-dir-renames-fix'Junio C Hamano1-26/+48
Fixes a long-standing corner case bug around directory renames in the merge-ort strategy. * en/merge-dual-dir-renames-fix: merge-ort: fix issue with dual rename and add/add conflict merge-ort: shuffle the computation and cleanup of potential collisions merge-ort: make a separate function for freeing struct collisions merge-ort: small cleanups of check_for_directory_rename t6423: add tests of dual directory rename plus add/add conflict
2022-07-06merge-ort: fix issue with dual rename and add/add conflictElijah Newren1-1/+7
There is code in both merge-recursive and merge-ort for avoiding doubly transitive renames (i.e. one side renames directory A/ -> B/, and the other side renames directory B/ -> C/), because this combination would otherwise make a mess for new files added to A/ on the first side and wondering which directory they end up in -- especially if there were even more renames such as the first side renaming C/ -> D/. In such cases, it just turns "off" directory rename detection for the higher order transitive cases. The testcases added in t6423 a couple commits ago are slightly different but similar in principle. They involve a similar case of paired renaming but instead of A/ -> B/ and B/ -> C/, the second side renames a leading directory of B/ to C/. And both sides add a new file somewhere under the directory that the other side will rename. While the new files added start within different directories and thus could logically end up within different directories, it is weird for a file on one side to end up where the other one started and not move along with it. So, let's just turn off directory rename detection in this case as well. Another way to look at this is that if the source name involved in a directory rename on one side is the target name of a directory rename operation for a file from the other side, then we avoid the doubly transitive rename. (More concretely, if a directory rename on side D wants to rename a file on side E from OLD_NAME -> NEW_NAME, and side D already had a file named NEW_NAME, and a directory rename on side E wants to rename side D's NEW_NAME -> NEWER_NAME, then we turn off the directory rename detection for NEW_NAME to prevent the NEW_NAME -> NEWER_NAME rename, and instead end up with an add/add conflict on NEW_NAME.) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-06merge-ort: shuffle the computation and cleanup of potential collisionsElijah Newren1-5/+15
Run compute_collisions() for renames on both sides of history before any calls to collect_renames(), and do not free the computed collisions until after both calls to collect_renames(). This is just a code reorganization at this point that doesn't make sense on its own, but will permit us to use the computed collision info from both sides within each call to collect_renames() in a subsequent commit. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-06merge-ort: make a separate function for freeing struct collisionsElijah Newren1-16/+22
This commit makes no functional changes, it's just some code movement in preparation for later changes. Signed-off-by: Elijah Newren <newren@palantir.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-06merge-ort: small cleanups of check_for_directory_renameElijah Newren1-6/+6
No functional changes, just some preparatory cleanups. Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Elijah Newren <newren@palantir.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22merge-ort: optionally produce machine-readable outputElijah Newren1-2/+20
With the new `detailed` parameter, a new mode can be triggered when displaying the merge messages: The `detailed` mode prints NUL-delimited fields of the following form: <path-count> NUL <path>... NUL <conflict-type> NUL <message> The `<path-count>` field determines how many `<path>` fields there are. The intention of this mode is to support server-side operations, where worktree-less merges can lead to conflicts and depending on the type and/or path count, the caller might know how to handle said conflict. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22merge-ort: store more specific conflict informationElijah Newren1-55/+212
It is all fine and dandy for a regular Git command that is intended to be run interactively to produce a bunch of messages upon an error. However, in `merge-ort`'s case, we want to call the command e.g. in server-side software, where the actual error messages are not quite as interesting as machine-readable, immutable terms that describe the exact nature of any given conflict. With this patch, the `merge-ort` machinery records the exact type (as specified via an `enum` value) as well as the involved path(s) together with the conflict's message. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22merge-ort: make `path_messages` a strmap to a string_listJohannes Schindelin1-33/+1
This allows us once again to get away with less data copying. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22merge-ort: store messages in a list, not in a single strbufJohannes Schindelin1-43/+80
To prepare for using the `merge-ort` machinery in server operations, we cannot simply produce a free-form string that combines a variable-length list of messages. Instead, we need to list them one by one. The natural fit for this is a `string_list`. We will subsequently add even more information in the `util` attribute of the string list items. Based-on-a-patch-by: Elijah Newren <newren@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22merge-ort: remove command-line-centric submodule message from merge-ortElijah Newren1-8/+1
There was one case in merge-ort that would call path_msg() multiple times for the same logical conflict, and it was in order to give advice about how to resolve a conflict. This advice does not make as much sense with remerge-diff, or with merge-tree being invoked by a GitHub GUI for resolution of messages, and is making it hard to provide which-logical-conflict-affects-which-paths information in a machine parseable way to a higher level caller of merge-tree. Let's simply remove this informational message. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22merge-ort: provide a merge_get_conflicted_files() helper functionElijah Newren1-0/+31
After a merge, this function allows the user to extract the same information that would be printed by `ls-files -u`, which means files with their mode, oid, and stage. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22merge-ort: split out a separate display_update_messages() functionElijah Newren1-37/+41
This patch includes no new code; it simply moves a bunch of lines into a new function. As such, there are no functional changes. This is just a preparatory step to allow the printed messages to be handled differently by other callers, such as in `git merge-tree --write-tree`. (Patch best viewed with --color-moved --color-moved-ws=allow-indentation-change to see that it is a simple code movement.) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-07Merge branch 'ab/plug-leak-in-revisions'Junio C Hamano1-0/+1
Plug the memory leaks from the trickiest API of all, the revision walker. * ab/plug-leak-in-revisions: (27 commits) revisions API: add a TODO for diff_free(&revs->diffopt) revisions API: have release_revisions() release "topo_walk_info" revisions API: have release_revisions() release "date_mode" revisions API: call diff_free(&revs->pruning) in revisions_release() revisions API: release "reflog_info" in release revisions() revisions API: clear "boundary_commits" in release_revisions() revisions API: have release_revisions() release "prune_data" revisions API: have release_revisions() release "grep_filter" revisions API: have release_revisions() release "filter" revisions API: have release_revisions() release "cmdline" revisions API: have release_revisions() release "mailmap" revisions API: have release_revisions() release "commits" revisions API users: use release_revisions() for "prune_data" users revisions API users: use release_revisions() with UNLEAK() revisions API users: use release_revisions() in builtin/log.c revisions API users: use release_revisions() in http-push.c revisions API users: add "goto cleanup" for release_revisions() stash: always have the owner of "stash_info" free it revisions API users: use release_revisions() needing REV_INFO_INIT revision.[ch]: document and move code declared around "init" ...
2022-05-20Merge branch 'ep/maint-equals-null-cocci'Junio C Hamano1-2/+2
Introduce and apply coccinelle rule to discourage an explicit comparison between a pointer and NULL, and applies the clean-up to the maintenance track. * ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci
2022-05-02Merge branch 'ep/maint-equals-null-cocci' for maint-2.35Junio C Hamano1-2/+2
* ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci
2022-05-02tree-wide: apply equals-null.cocciJunio C Hamano1-2/+2
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revisions API users: add straightforward release_revisions()Ævar Arnfjörð Bjarmason1-0/+1
Add a release_revisions() to various users of "struct rev_list" in those straightforward cases where we only need to add the release_revisions() call to the end of a block, and don't need to e.g. refactor anything to use a "goto cleanup" pattern. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-16Merge branch 'ab/string-list-count-in-size-t'Junio C Hamano1-2/+2
Count string_list items in size_t, not "unsigned int". * ab/string-list-count-in-size-t: string-list API: change "nr" and "alloc" to "size_t" gettext API users: don't explicitly cast ngettext()'s "n"
2022-03-16Merge branch 'ab/object-file-api-updates'Junio C Hamano1-2/+2
Object-file API shuffling. * ab/object-file-api-updates: object-file API: pass an enum to read_object_with_reference() object-file.c: add a literal version of write_object_file_prepare() object-file API: have hash_object_file() take "enum object_type" object API: rename hash_object_file_literally() to write_*() object-file API: split up and simplify check_object_signature() object API users + docs: check <0, not !0 with check_object_signature() object API docs: move check_object_signature() docs to cache.h object API: correct "buf" v.s. "map" mismatch in *.c and *.h object-file API: have write_object_file() take "enum object_type" object-file API: add a format_object_header() function object-file API: return "void", not "int" from hash_object_file() object-file.c: split up declaration of unrelated variables
2022-03-13Merge branch 'en/merge-ort-align-verbosity-with-recursive'Junio C Hamano1-2/+3
Align the level of verbose output from the ort backend during inner merge to that of the recursive backend. * en/merge-ort-align-verbosity-with-recursive: merge-ort: exclude messages from inner merges by default
2022-03-07string-list API: change "nr" and "alloc" to "size_t"Ævar Arnfjörð Bjarmason1-2/+2
Change the "nr" and "alloc" members of "struct string_list" to use "size_t" instead of "nr". On some platforms the size of an "unsigned int" will be smaller than a "size_t", e.g. a 32 bit unsigned v.s. 64 bit unsigned. As "struct string_list" is a generic API we use in a lot of places this might cause overflows. As one example: code in "refs.c" keeps track of the number of refs with a "size_t", and auxiliary code in builtin/remote.c in get_ref_states() appends those to a "struct string_list". While we're at it split the "nr" and "alloc" in string-list.h across two lines, which is the case for most such struct member declarations (e.g. in "strbuf.h" and "strvec.h"). Changing e.g. "int i" to "size_t i" in run_and_feed_hook() isn't strictly necessary, and there are a lot more cases where we'll use a local "int", "unsigned int" etc. variable derived from the "nr" in the "struct string_list". But in that case as well as add_wrapped_shortlog_msg() in builtin/shortlog.c we need to adjust the printf format referring to "nr" anyway, so let's also change the other variables referring to it. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-06Merge branch 'en/merge-ort-plug-leaks'Junio C Hamano1-19/+17
Leakfix. * en/merge-ort-plug-leaks: merge-ort: fix small memory leak in unique_path() merge-ort: fix small memory leak in detect_and_process_renames()
2022-03-01merge-ort: exclude messages from inner merges by defaultElijah Newren1-2/+3
merge-recursive would only report messages from inner merges when the GIT_MERGE_VERBOSITY was set to 5. Do the same for merge-ort. Note that somewhat reverts 0d83d8240d ("merge-ort: mark conflict/warning messages from inner merges as omittable", 2022-02-02) based on two facts: * This commit basically removes the showing of messages from inner merges as well, at least by default. The only difference is that users can request to get them back by turning up the verbosity. * Messages from inner merges are specially annotated since 4a3d86e1bb ("merge-ort: make informational messages from recursive merges clearer", 2022-02-17). The ability to distinguish them from outer merge comments make them less problematic to include, and easier for humans to parse. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25object-file API: have write_object_file() take "enum object_type"Ævar Arnfjörð Bjarmason1-2/+2
Change the write_object_file() function to take an "enum object_type" instead of a "const char *type". Its callers either passed {commit,tree,blob,tag}_type and can pass the corresponding OBJ_* type instead, or were hardcoding strings like "blob". This avoids the back & forth fragility where the callers of write_object_file() would have the enum type, and convert it themselves via type_name(). We do have to now do that conversion ourselves before calling write_object_file_prepare(), but those codepaths will be similarly adjusted in subsequent commits. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-20merge-ort: fix small memory leak in unique_path()Elijah Newren1-9/+12
The struct strmap paths member of merge_options_internal is perhaps the most central data structure to all of merge-ort. Because all the paths involved in the merge need to be kept until the merge is complete, this "paths" data structure traditionally took responsibility for owning all the allocated paths. When the merge is over, those paths were free()d as part of free()ing this strmap. In commit 6697ee01b5d3 (merge-ort: switch our strmaps over to using memory pools, 2021-07-30), we changed the allocations for pathnames to come from a memory pool. That meant the ownership changed slightly; there were no individual free() calls to make, instead the memory pool owned all those paths and they were free()d all at once. Unfortunately unique_path() was written presuming the pre-memory-pool model, and allocated a path on the heap and left it in the strmap for later free()ing. Modify it to return a path allocated from the memory pool instead. Note that there's one instance -- in record_conflicted_index_entries() -- where the returned string from unique_path() was only used very temporarily and thus had been immediately free()'d. This codepath was associated with an ugly skip-worktree workaround that has since been better fixed by the in-flight en/present-despite-skipped topic. This workaround probably makes sense to excise once that topic merges down, but for now, just remove the immediate free() and allow the returned string to be free()d when the memory pool is released. This fixes the following memory leak as reported by valgrind: ==PID== 65 bytes in 1 blocks are definitely lost in loss record 79 of 134 ==PID== at 0xADDRESS: malloc ==PID== by 0xADDRESS: realloc ==PID== by 0xADDRESS: xrealloc (wrapper.c:126) ==PID== by 0xADDRESS: strbuf_grow (strbuf.c:98) ==PID== by 0xADDRESS: strbuf_vaddf (strbuf.c:394) ==PID== by 0xADDRESS: strbuf_addf (strbuf.c:335) ==PID== by 0xADDRESS: unique_path (merge-ort.c:733) ==PID== by 0xADDRESS: process_entry (merge-ort.c:3678) ==PID== by 0xADDRESS: process_entries (merge-ort.c:4037) ==PID== by 0xADDRESS: merge_ort_nonrecursive_internal (merge-ort.c:4621) ==PID== by 0xADDRESS: merge_ort_internal (merge-ort.c:4709) ==PID== by 0xADDRESS: merge_incore_recursive (merge-ort.c:4760) ==PID== by 0xADDRESS: merge_ort_recursive (merge-ort-wrappers.c:57) ==PID== by 0xADDRESS: try_merge_strategy (merge.c:753) Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-20merge-ort: fix small memory leak in detect_and_process_renames()Elijah Newren1-10/+5
detect_and_process_renames() detects renames on both sides of history and then combines these into a single diff_queue_struct. The combined diff_queue_struct needs to be able to hold the renames found on either side, and since it knows the (maximum) size it needs, it pre-emptively grows the array to the appropriate size: ALLOC_GROW(combined.queue, renames->pairs[1].nr + renames->pairs[2].nr, combined.alloc); It then collects the items from each side: collect_renames(opt, &combined, MERGE_SIDE1, ...) collect_renames(opt, &combined, MERGE_SIDE2, ...) Note, though, that collect_renames() sometimes determines that some pairs are unnecessary and does not include them in the combined array. When it is done, detect_and_process_renames() frees this memory: if (combined.nr) { ... free(combined.queue); } The problem is that sometimes even when there are pairs, none of them are necessary. Instead of checking combined.nr, just remove the if-check; free() knows to skip NULL pointers. This change fixes the following memory leak, as reported by valgrind: ==PID== 192 bytes in 1 blocks are definitely lost in loss record 107 of 134 ==PID== at 0xADDRESS: malloc ==PID== by 0xADDRESS: realloc ==PID== by 0xADDRESS: xrealloc (wrapper.c:126) ==PID== by 0xADDRESS: detect_and_process_renames (merge-ort.c:3134) ==PID== by 0xADDRESS: merge_ort_nonrecursive_internal (merge-ort.c:4610) ==PID== by 0xADDRESS: merge_ort_internal (merge-ort.c:4709) ==PID== by 0xADDRESS: merge_incore_recursive (merge-ort.c:4760) ==PID== by 0xADDRESS: merge_ort_recursive (merge-ort-wrappers.c:57) ==PID== by 0xADDRESS: try_merge_strategy (merge.c:753) ==PID== by 0xADDRESS: cmd_merge (merge.c:1676) ==PID== by 0xADDRESS: run_builtin (git.c:461) ==PID== by 0xADDRESS: handle_builtin (git.c:713) ==PID== by 0xADDRESS: run_argv (git.c:780) ==PID== by 0xADDRESS: cmd_main (git.c:911) ==PID== by 0xADDRESS: main (common-main.c:52) Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-17merge-ort: make informational messages from recursive merges clearerElijah Newren1-0/+5
This is another simple change with a long explanation... merge-recursive and merge-ort are both based on the same recursive idea: if there is more than one merge base, merge the merge bases (which may require first merging the merge bases of the merges bases, etc.). The depth of the inner merge is recorded via a variable called "call_depth", which we'll bring up again later. Naturally, the inner merges themselves can have conflicts and various messages generated about those files. merge-recursive immediately prints to stdout as it goes, at the risk of printing multiple conflict notices for the same path separated far apart from each other with many intervenining conflict notices for other paths between them. And this is true even if there are no inner merges involved. An example of this was given in [1] and apparently caused some confusion: CONFLICT (rename/add): Rename A->B in HEAD. B added in otherbranch ...dozens of conflicts for OTHER paths... CONFLICT (content): Merge conflicts in B In contrast, merge-ort collects messages and stores them by path so that it can print them grouped by path. Thus, the same case handled by merge-ort would have output of the form: CONFLICT (rename/add): Rename A->B in HEAD. B added in otherbranch CONFLICT (content): Merge conflicts in B ...dozens of conflicts for OTHER paths... This is generally helpful, but does make a separate bug more problematic. In particular, while merge-recursive might report the following for a recursive merge: Auto-merging dir.c Auto-merging midx.c CONFLICT (content): Merge conflict in midx.c Auto-merging diff.c Auto-merging dir.c CONFLICT (content): Merge conflict in dir.c merge-ort would instead report: Auto-merging diff.c Auto-merging dir.c Auto-merging dir.c CONFLICT (content): Merge conflict in dir.c Auto-merging midx.c CONFLICT (content): Merge conflict in midx.c The fact that messages for the same file are together is probably helpful in general, but with the indentation missing for the inner merge it unfortunately serves to confuse. This probably would lead users to wonder: * Why is Git reporting that "dir.c" is being merged twice? * If midx.c has conflicts, why do I not see any when I open up the file and why are no conflicts shown in the index? Fix this output confusion by changing the output to clearly differentiate the messages for outer merges from the ones for inner merges, changing the above output from merge-ort to: Auto-merging diff.c From inner merge: Auto-merging dir.c Auto-merging dir.c CONFLICT (content): Merge conflict in dir.c From inner merge: Auto-merging midx.c From inner merge: CONFLICT (content): Merge conflict in midx.c (Note: the number of spaces after the 'From inner merge:' is 2*call_depth). One other thing to note here, that I didn't notice until typing up this commit message, is that merge-recursive does not print any messages from the inner merges by default; the extra verbosity has to be requested. merge-ort currently has no verbosity controls and always prints these. We may also want to change that, but for now, just make the output clearer with these extra markings and indentation. [1] https://lore.kernel.org/git/CAGyf7-He4in8JWUh9dpAwvoPkQz9hr8nCBpxOxhZEd8+jtqTpg@mail.gmail.com/ Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-16Merge branch 'en/remerge-diff'Junio C Hamano1-5/+50
"git log --remerge-diff" shows the difference from mechanical merge result and the result that is actually recorded in a merge commit. * en/remerge-diff: diff-merges: avoid history simplifications when diffing merges merge-ort: mark conflict/warning messages from inner merges as omittable show, log: include conflict/warning messages in --remerge-diff headers diff: add ability to insert additional headers for paths merge-ort: format messages slightly different for use in headers merge-ort: mark a few more conflict messages as omittable merge-ort: capture and print ll-merge warnings in our preferred fashion ll-merge: make callers responsible for showing warnings log: clean unneeded objects during `log --remerge-diff` show, log: provide a --remerge-diff capability
2022-02-09Merge branch 'en/plug-leaks-in-merge'Junio C Hamano1-5/+5
Leakfix. * en/plug-leaks-in-merge: merge: fix memory leaks in cmd_merge() merge-ort: fix memory leak in merge_ort_internal()
2022-02-09Merge branch 'en/merge-ort-restart-optim-fix'Junio C Hamano1-0/+4
The merge-ort misbehaved when merge.renameLimit configuration is set too low and failed to find all renames. * en/merge-ort-restart-optim-fix: merge-ort: avoid assuming all renames detected
2022-02-02merge-ort: mark conflict/warning messages from inner merges as omittableElijah Newren1-1/+3
A recursive merge involves merging the merge bases of the two branches being merged. Such an inner merge can itself generate conflict notices. While such notices may be useful when initially trying to create a merge, they seem to just be noise when investigating merges later with --remerge-diff. (Especially when both sides of the outer merge resolved the conflict the same way leading to no overall conflict.) Remove them. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02show, log: include conflict/warning messages in --remerge-diff headersElijah Newren1-0/+1
Conflicts such as modify/delete, rename/rename, or file/directory are not representable via content conflict markers, and the normal output messages notifying users about these were dropped with --remerge-diff. While we don't want these messages randomly shown before the commit and diff headers, we do want them to still be shown; include them as part of the diff headers instead. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02merge-ort: format messages slightly different for use in headersElijah Newren1-2/+40
When users run git show --remerge-diff $MERGE_COMMIT or git log -p --remerge-diff ... stdout is not an appropriate location to dump conflict messages, but we do want to provide them to users. We will include them in the diff headers instead...but for that to work, we need for any multiline messages to replace newlines with both a newline and a space. Add a new flag to signal when we want these messages modified in such a fashion, and use it in path_msg() to modify these messages this way. Also, allow a special prefix to be specified for these headers. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02merge-ort: mark a few more conflict messages as omittableElijah Newren1-2/+2
path_msg() has the ability to mark messages as omittable, designed for remerge-diff where we'll instead be showing conflict messages as diff headers for a subsequent diff. While all these messages are very useful when trying to create a merge initially, early use with the --remerge-diff feature (the only user of this omittable conflict message capability), suggests that the particular messages marked in this commit are just noise when trying to see what changes users made to create a merge commit. Mark them as omittable. Note that there were already a few messages marked as omittable in merge-ort when doing a remerge-diff, because the development of --remerge-diff preceded the upstreaming of merge-ort and I was trying to ensure merge-ort could handle all the necessary requirements. See commit c5a6f65527 ("merge-ort: add modify/delete handling and delayed output processing", 2020-12-03) for the initial details. For some examples of already-marked-as-omittable messages, see either "Auto-merging <path>" or some of the submodule update hints. This commit just adds two more messages that should also be omittable. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02merge-ort: capture and print ll-merge warnings in our preferred fashionElijah Newren1-2/+3
Instead of immediately printing ll-merge warnings to stderr, we save them in our output strbuf. Besides allowing us to move these warnings to a special file for --remerge-diff, this has two other benefits for regular merges done by merge-ort: * The deferral of messages ensures we can print all messages about any given path together (merge-recursive was known to sometimes intersperse messages about other paths, particularly when renames were involved). * The deferral of messages means we can avoid printing spurious conflict messages when we just end up aborting due to local user modifications in the way. (In contrast to merge-recursive.c which prematurely checks for local modifications in the way via unpack_trees() and gets the check wrong both in terms of false positives and false negatives relative to renames, merge-ort does not perform the local modifications in the way check until the checkout() step after the full merge has been computed.) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02ll-merge: make callers responsible for showing warningsElijah Newren1-1/+4
Since some callers may want to send warning messages to somewhere other than stdout/stderr, stop printing "warning: Cannot merge binary files" from ll-merge and instead modify the return status of ll_merge() to indicate when a merge of binary files has occurred. Message printing probably does not belong in a "low-level merge" anyway. This commit continues printing the message as-is, just from the callers instead of within ll_merge(). Future changes will start handling the message differently in the merge-ort codepath. There was one special case here: the callers in rerere.c do NOT check for and print such a message; since those code paths explicitly skip over binary files, there is no reason to check for a return status of LL_MERGE_BINARY_CONFLICT or print the related message. Note that my methodology included first modifying ll_merge() to return a struct, so that the compiler would catch all the callers for me and ensure I had modified all of them. After modifying all of them, I then changed the struct to an enum. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-21merge-ort: fix memory leak in merge_ort_internal()Elijah Newren1-5/+5
The documentation for merge_incore_recursive(), modelled after merge_recursive(), notes that merge_bases will be consumed (emptied) so make a copy if you need it However, in merge_ort_internal() (which merge_incore_recursive() calls), it runs merged_merge_bases = pop_commit(&merge_bases); ... for (iter = merge_bases; iter; iter = iter->next) { ... } In other words, it only consumes the *first* entry of merge_bases, and the rest it iterates through. If it iterated through all of them, the caller could be responsible for free'ing the memory. If it consumed all of them, the current documentation would be correct and the callers would need to do nothing. The current middle ground makes it impossible for callers to avoid memory leaks, since any attempt to use the merge_bases it passes in would result in a use-after-free. It turns out this part of the code was copied from merge-recursive.c, which has had the same bug for 15.5 years. However, since we are trying to keep merge-recursive.c stable as we sunset it, let's just fix the leak in in merge_ort_internal() by having it actually consume all the elements of the merge_bases commit_list. Testing this commit against t6404 (the first testcase specifically about recursive merges) under valgrind shows that this patch fixes the following leak: 32 (16 direct, 16 indirect) bytes in 1 blocks are definitely lost \ in loss record 49 of 126 at 0x484086F: malloc (vg_replace_malloc.c:380) by 0x69FFEB: do_xmalloc (wrapper.c:41) by 0x6A0073: xmalloc (wrapper.c:62) by 0x52A72D: commit_list_insert (commit.c:556) by 0x47EC86: try_merge_strategy (merge.c:751) by 0x48143B: cmd_merge (merge.c:1679) by 0x40686E: run_builtin (git.c:464) by 0x406C51: handle_builtin (git.c:716) by 0x406E96: run_argv (git.c:783) by 0x40730A: cmd_main (git.c:914) by 0x4E7DFA: main (common-main.c:56) Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-17merge-ort: avoid assuming all renames detectedElijah Newren1-0/+4
In commit 8b09a900a1 ("merge-ort: restart merge with cached renames to reduce process entry cost", 2021-07-16), we noted that in the merge-ort steps of collect_merge_info() detect_and_process_renames() process_entries() that process_entries() was expensive, and we could often make it cheaper by changing this to collect_merge_info() detect_and_process_renames() <cache all the renames, and restart> collect_merge_info() detect_and_process_renames() process_entries() because the second collect_merge_info() would be cheaper (we could avoid traversing into some directories), the second detect_and_process_renames() would be free since we had already detected all renames, and then process_entries() has far fewer entries to handle. However, this was built on the assumption that the first detect_and_process_renames() actually detected all potential renames. If someone has merge.renameLimit set to some small value, that assumption is violated which manifests later with the following message: $ git -c merge.renameLimit=1 rebase upstream ... git: merge-ort.c:546: clear_or_reinit_internal_opts: Assertion `renames->cached_pairs_valid_side == 0' failed. Turn off this cache-renames-and-restart whenever we cannot detect all renames, and add a testcase that would have caught this problem. Reported-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Elijah Newren <newren@gmail.com> Tested-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-30merge-ort: fix bug with renormalization and rename/delete conflictsElijah Newren1-3/+16
Ever since commit a492d5331c ("merge-ort: ensure we consult df_conflict and path_conflicts", 2021-06-30), when renormalization is active AND a file is involved in a rename/delete conflict BUT the file is unmodified (either before or after renormalization), merge-ort was running into an assertion failure. Prior to that commit (or if assertions were compiled out), merge-ort would mis-merge instead, ignoring the rename/delete conflict and just deleting the file. Remove the assertions, fix the code appropriately, leave some good comments in the code, and add a testcase for this situation. Reported-by: Ralf Thielow <ralf.thielow@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-25Merge branch 'jt/no-abuse-alternate-odb-for-submodules'Junio C Hamano1-14/+4
Follow through the work to use the repo interface to access submodule objects in-process, instead of abusing the alternate object database interface. * jt/no-abuse-alternate-odb-for-submodules: submodule: trace adding submodule ODB as alternate submodule: pass repo to check_has_commit() object-file: only register submodule ODB if needed merge-{ort,recursive}: remove add_submodule_odb() refs: peeling non-the_repository iterators is BUG refs: teach arbitrary repo support to iterators refs: plumb repo into ref stores
2021-10-13Merge branch 'en/removing-untracked-fixes'Junio C Hamano1-7/+1
Various fixes in code paths that move untracked files away to make room. * en/removing-untracked-fixes: Documentation: call out commands that nuke untracked files/directories Comment important codepaths regarding nuking untracked files/dirs unpack-trees: avoid nuking untracked dir in way of locally deleted file unpack-trees: avoid nuking untracked dir in way of unmerged file Change unpack_trees' 'reset' flag into an enum Remove ignored files by default when they are in the way unpack-trees: make dir an internal-only struct unpack-trees: introduce preserve_ignored to unpack_trees_options read-tree, merge-recursive: overwrite ignored files by default checkout, read-tree: fix leak of unpack_trees_options.dir t2500: add various tests for nuking untracked files
2021-10-08merge-{ort,recursive}: remove add_submodule_odb()Jonathan Tan1-14/+4
After the parent commit and some of its ancestors, the only place commits are being accessed through alternates is in the user-facing message formatting code. Fix those, and remove the add_submodule_odb() calls. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-06Merge branch 'jt/add-submodule-odb-clean-up'Junio C Hamano1-18/+35
More code paths that use the hack to add submodule's object database to the set of alternate object store have been cleaned up. * jt/add-submodule-odb-clean-up: revision: remove "submodule" from opt struct repository: support unabsorbed in repo_submodule_init submodule: remove unnecessary unabsorbed fallback
2021-09-28Merge branch 'en/typofixes'Junio C Hamano1-1/+1
Typofixes. * en/typofixes: merge-ort: fix completely wrong comment trace2.h: fix trivial comment typo
2021-09-27Remove ignored files by default when they are in the wayElijah Newren1-1/+1
Change several commands to remove ignored files by default when they are in the way. Since some commands (checkout, merge) take a --no-overwrite-ignore option to allow the user to configure this, and it may make sense to add that option to more commands (and in the case of merge, actually plumb that configuration option through to more of the backends than just the fast-forwarding special case), add little comments about where such flags would be used. Incidentally, this fixes a test failure in t7112. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27unpack-trees: introduce preserve_ignored to unpack_trees_optionsElijah Newren1-7/+1
Currently, every caller of unpack_trees() that wants to ensure ignored files are overwritten by default needs to: * allocate unpack_trees_options.dir * flip the DIR_SHOW_IGNORED flag in unpack_trees_options.dir->flags * call setup_standard_excludes AND then after the call to unpack_trees() needs to * call dir_clear() * deallocate unpack_trees_options.dir That's a fair amount of boilerplate, and every caller uses identical code. Make this easier by instead introducing a new boolean value where the default value (0) does what we want so that new callers of unpack_trees() automatically get the appropriate behavior. And move all the handling of unpack_trees_options.dir into unpack_trees() itself. While preserve_ignored = 0 is the behavior we feel is the appropriate default, we defer fixing commands to use the appropriate default until a later commit. So, this commit introduces several locations where we manually set preserve_ignored=1. This makes it clear where code paths were previously preserving ignored files when they should not have been; a future commit will flip these to instead use a value of 0 to get the behavior we want. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-22Merge branch 'jt/add-submodule-odb-clean-up' into ↵Junio C Hamano1-18/+35
jt/no-abuse-alternate-odb-for-submodules * jt/add-submodule-odb-clean-up: revision: remove "submodule" from opt struct repository: support unabsorbed in repo_submodule_init submodule: remove unnecessary unabsorbed fallback
2021-09-20Merge branch 'ds/mergies-with-sparse-index'Junio C Hamano1-0/+15
Various mergy operations have been prepared to work efficiently with the sparse index. * ds/mergies-with-sparse-index: sparse-index: integrate with cherry-pick and rebase sequencer: ensure full index if not ORT strategy t1092: add cherry-pick, rebase tests merge-ort: expand only for out-of-cone conflicts merge: make sparse-aware with ORT diff: ignore sparse paths in diffstat
2021-09-20merge-ort: fix completely wrong commentElijah Newren1-1/+1
Not sure what happened, but the comment is describing code elsewhere in the file. Fix the comment to actually discuss the code that follows. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09merge-ort: expand only for out-of-cone conflictsDerrick Stolee1-3/+10
Merge conflicts happen often enough to want to avoid expanding a sparse index when they happen, as long as those conflicts are within the sparse-checkout cone. If a conflict exists outside of the sparse-checkout cone, then we still need to expand before iterating over the index entries. This is critical to do in advance because of how the original_cache_nr is tracked to allow inserting and replacing cache entries. Iterate over the conflicted files and check if any paths are outside of the sparse-checkout cone. If so, then expand the full index. Add a test that demonstrates that we do not expand the index, even when we hit a conflict within the sparse-checkout cone. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09merge: make sparse-aware with ORTDerrick Stolee1-0/+8
Allow 'git merge' to operate without expanding a sparse index, at least not immediately. The index still will be expanded in a few cases: 1. If the merge strategy is 'recursive', then we enable command_requires_full_index at the start of the merge_recursive() method. We expect sparse-index users to also have the 'ort' strategy enabled. 2. With the 'ort' strategy, if the merge results in a conflicted file, then we expand the index before updating the working tree. The loop that iterates over the worktree replaces index entries and tracks 'origintal_cache_nr' which can become completely wrong if the index expands in the middle of the operation. This safety valve is important before that loop starts. A later change will focus this to only expand if we indeed have a conflict outside of the sparse-checkout cone. 3. Other merge strategies are executed as a 'git merge-X' subcommand, and those strategies are currently protected with the 'command_requires_full_index' guard. Some test updates are required, including a mistaken 'git checkout -b' that did not specify the base branch, causing merges to be fast-forward merges. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09revision: remove "submodule" from opt structJonathan Tan1-18/+35
Clean up a TODO in revision.h by removing the "submodule" field from struct setup_revision_opt. This field is only used to specify the ref store to use, so use rev_info->repo to determine the ref store instead. The only users of this field are merge-ort.c and merge-recursive.c. However, both these files specify the superproject as rev_info->repo and the submodule as setup_revision_opt->submodule. In order to be able to pass the submodule as rev_info->repo, all commits must be parsed with the submodule explicitly specified; this patch does that as well. (An incremental solution in which only some commits are parsed with explicit submodule will not work, because if the same commit is parsed twice in different repositories, there will be 2 heap-allocated object structs corresponding to that commit, and any flag set by the revision walking mechanism on one of them will not be reflected onto the other.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-24Merge branch 'en/ort-perf-batch-15'Junio C Hamano1-85/+103
Final batch for "merge -sort" optimization. * en/ort-perf-batch-15: merge-ort: remove compile-time ability to turn off usage of memory pools merge-ort: reuse path strings in pool_alloc_filespec merge-ort: store filepairs and filespecs in our mem_pool diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc merge-ort: switch our strmaps over to using memory pools merge-ort: set up a memory pool merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers diffcore-rename: use a mem_pool for exact rename detection's hashmap merge-ort: rename str{map,intmap,set}_func()
2021-08-04Merge branch 'en/ort-perf-batch-14'Junio C Hamano1-11/+388
Further optimization on "merge -sort" backend. * en/ort-perf-batch-14: merge-ort: restart merge with cached renames to reduce process entry cost merge-ort: avoid recursing into directories when we don't need to merge-ort: defer recursing into directories when merge base is matched merge-ort: add a handle_deferred_entries() helper function merge-ort: add data structures for allowable trivial directory resolves merge-ort: add some more explanations in collect_merge_info_callback() merge-ort: resolve paths early when we have sufficient information
2021-08-03merge-ort: remove compile-time ability to turn off usage of memory poolsElijah Newren1-162/+47
Simplify code maintenance by removing the ability to toggle between usage of memory pools and direct allocations. This allows us to also remove paths_to_free since it was solely about bookkeeping to make sure we freed the necessary paths, and allows us to remove some auxiliary functions. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-30merge-ort: reuse path strings in pool_alloc_filespecElijah Newren1-7/+22
pool_alloc_filespec() was written so that the code when pool != NULL mimicked the code from alloc_filespec(), which including allocating enough extra space for the path and then copying it. However, the path passed to pool_alloc_filespec() is always going to already be in the same memory pool, so we may as well reuse it instead of copying it. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 198.5 ms ± 3.4 ms 198.3 ms ± 2.9 ms mega-renames: 679.1 ms ± 5.6 ms 661.8 ms ± 5.9 ms just-one-mega: 271.9 ms ± 2.8 ms 264.6 ms ± 2.5 ms Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-30merge-ort: store filepairs and filespecs in our mem_poolElijah Newren1-12/+14
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 198.1 ms ± 2.6 ms 198.5 ms ± 3.4 ms mega-renames: 715.8 ms ± 4.0 ms 679.1 ms ± 5.6 ms just-one-mega: 276.8 ms ± 4.2 ms 271.9 ms ± 2.8 ms Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-30diffcore-rename, merge-ort: add wrapper functions for filepair alloc/deallocElijah Newren1-0/+42
We want to be able to allocate filespecs and filepairs using a mem_pool. However, filespec data will still remain outside the pool (perhaps in the future we could plumb the pool through the various diff APIs to allocate the filespec data too, but for now we are limiting the scope). Add some extra functions to allocate these appropriately based on the non-NULL-ness of opt->priv->pool, as well as some extra functions to handle correctly deallocating the relevant parts of them. A future commit will make use of these new functions. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-30merge-ort: switch our strmaps over to using memory poolsElijah Newren1-50/+75
For all the strmaps (including strintmaps and strsets) whose memory is unconditionally freed as part of clear_or_reinit_internal_opts(), switch them over to using our new memory pool. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 202.5 ms ± 3.2 ms 198.1 ms ± 2.6 ms mega-renames: 1.072 s ± 0.012 s 715.8 ms ± 4.0 ms just-one-mega: 357.3 ms ± 3.9 ms 276.8 ms ± 4.2 ms Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-30merge-ort: set up a memory poolElijah Newren1-0/+25
merge-ort has a lot of data structures, and they all tend to be freed together in clear_or_reinit_internal_opts(). Set up a memory pool to allow us to make these allocations and deallocations faster. Future commits will adjust various callers to make use of this memory pool. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-30merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappersElijah Newren1-0/+24
Make the code more flexible so that it can handle both being run with or without a memory pool by adding utility functions which will either call xmalloc, xcalloc, xstrndup or mem_pool_alloc, mem_pool_calloc, mem_pool_strndup depending on whether we have a non-NULL memory pool. A subsequent commit will make use of these. (We will actually be dropping these functions soon and just assuming we always have a memory pool, but the flexibility was very useful during development of merge-ort so I want to be able to restore it if needed.) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-30merge-ort: rename str{map,intmap,set}_func()Elijah Newren1-13/+13
In order to make it clearer that these three variables holding a function refer to functions that will clear the strmap/strintmap/strset, rename them to str{map,intmap,set}_clear_func(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-28Merge branch 'en/rename-limits-doc'Junio C Hamano1-1/+1
Documentation on "git diff -l<n>" and diff.renameLimit have been updated, and the defaults for these limits have been raised. * en/rename-limits-doc: rename: bump limit defaults yet again diffcore-rename: treat a rename_limit of 0 as unlimited doc: clarify documentation for rename/copy limits diff: correct warning message when renameLimit exceeded
2021-07-28Merge branch 'ab/attribute-format'Junio C Hamano1-0/+1
Many "printf"-like helper functions we have have been annotated with __attribute__() to catch placeholder/parameter mismatches. * ab/attribute-format: advice.h: add missing __attribute__((format)) & fix usage *.h: add a few missing __attribute__((format)) *.c static functions: add missing __attribute__((format)) sequencer.c: move static function to avoid forward decl *.c static functions: don't forward-declare __attribute__
2021-07-20merge-ort: restart merge with cached renames to reduce process entry costElijah Newren1-6/+86
The merge algorithm mostly consists of the following three functions: collect_merge_info() detect_and_process_renames() process_entries() Prior to the trivial directory resolution optimization of the last half dozen commits, process_entries() was consistently the slowest, followed by collect_merge_info(), then detect_and_process_renames(). When the trivial directory resolution applies, it often dramatically decreases the amount of time spent in the two slower functions. Looking at the performance results in the previous commit, the trivial directory resolution optimization helps amazingly well when there are no relevant renames. It also helps really well when reapplying a long series of linear commits (such as in a rebase or cherry-pick), since the relevant renames may well be cached from the first reapplied commit. But when there are any relevant renames that are not cached (represented by the just-one-mega testcase), then the optimization does not help at all. Often, I noticed that when the optimization does not apply, it is because there are a handful of relevant sources -- maybe even only one. It felt frustrating to need to recurse into potentially hundreds or even thousands of directories just for a single rename, but it was needed for correctness. However, staring at this list of functions and noticing that process_entries() is the most expensive and knowing I could avoid it if I had cached renames suggested a simple idea: change collect_merge_info() detect_and_process_renames() process_entries() into collect_merge_info() detect_and_process_renames() <cache all the renames, and restart> collect_merge_info() detect_and_process_renames() process_entries() This may seem odd and look like more work. However, note that although we run collect_merge_info() twice, the second time we get to employ trivial directory resolves, which makes it much faster, so the increased time in collect_merge_info() is small. While we run detect_and_process_renames() again, all renames are cached so it's nearly a no-op (we don't call into diffcore_rename_extended() but we do have a little bit of data structure checking and fixing up). And the big payoff comes from the fact that process_entries(), will be much faster due to having far fewer entries to process. This restarting only makes sense if we can save recursing into enough directories to make it worth our while. Introduce a simple heuristic to guide this. Note that this heuristic uses a "wanted_factor" that I have virtually no actual real world data for, just some back-of-the-envelope quasi-scientific calculations that I included in some comments and then plucked a simple round number out of thin air. It could be that tweaking this number to make it either higher or lower improves the optimization. (There's slightly more here; when I first introduced this optimization, I used a factor of 10, because I was completely confident it was big enough to not cause slowdowns in special cases. I was certain it was higher than needed. Several months later, I added the rough calculations which make me think the optimal number is close to 2; but instead of pushing to the limit, I just bumped it to 3 to reduce the risk that there are special cases where this optimization can result in slowing down the code a little. If the ratio of path counts is below 3, we probably will only see minor performance improvements at best anyway.) Also, note that while the diffstat looks kind of long (nearly 100 lines), more than half of it is in two comments explaining how things work. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 205.1 ms ± 3.8 ms 204.2 ms ± 3.0 ms mega-renames: 1.564 s ± 0.010 s 1.076 s ± 0.015 s just-one-mega: 479.5 ms ± 3.9 ms 364.1 ms ± 7.0 ms Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20merge-ort: avoid recursing into directories when we don't need toElijah Newren1-3/+99
This combines the work of the last several patches, and implements the conditions when we don't need to recurse into directories. It's perhaps easiest to see the logic by separating the fact that a directory might have both rename sources and rename destinations: * rename sources: only files present in the merge base can serve as rename sources, and only when one side deletes that file. When the tree on one side matches the merge base, that means every file within the subtree matches the merge base. This means that the skip-irrelevant-rename-detection optimization from before kicks in and we don't need any of these files as rename sources. * rename destinations: the tree that does not match the merge base might have newly added and hence unmatched destination files. This is what usually prevents us from doing trivial directory resolutions in the merge machinery. However, the fact that we have deferred recursing into this directory until the end means we know whether there are any unmatched relevant potential rename sources elsewhere in this merge. If there are no unmatched such relevant sources anywhere, then there is no need to look for unmatched potential rename destinations to match them with. This informs our algorithm: * Search through relevant_sources; if we have entries, they better all be reflected in cached_pairs or cached_irrelevant, otherwise they represent an unmatched potential rename source (causing the optimization to be disallowed). * For any relevant_source represented in cached_pairs, we do need to to make sure to get the destination for each source, meaning we need to recurse into any ancestor directories of those destinations. * Once we've recursed into all the rename destinations for any relevant_sources in cached_pairs, we can then do the trivial directory resolution for the remaining directories. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.235 s ± 0.042 s 205.1 ms ± 3.8 ms mega-renames: 9.419 s ± 0.107 s 1.564 s ± 0.010 s just-one-mega: 480.1 ms ± 3.9 ms 479.5 ms ± 3.9 ms Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20merge-ort: defer recursing into directories when merge base is matchedElijah Newren1-2/+31
When one side of history matches the merge base (including when the merge base has no entry for the given directory), have collect_merge_info_callback() defer recursing into the directory. To ensure those entries are eventually handled, add a call to handled_deferred_entries() in collect_merge_info() after traverse_trees() returns. Note that the condition in collect_merge_info_callback() may look more complicated than necessary at first glance; renames->trivial_merges_okay[side] is always true until handle_deferred_entries() is called, and possible_trivial_merges[side] is always empty right now (and in the future won't be filled until handle_deferred_entries() is called). However, when handle_deferred_entries() calls traverse_trees() for the relevant deferred directories, those traverse_trees() calls will once again end up in collect_merge_info_callback() for all the entries under those subdirectories. The extra conditions are there for such deferred cases and will be used more as we do more with those variables. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20merge-ort: add a handle_deferred_entries() helper functionElijah Newren1-0/+64
In order to allow trivial directory resolution, we first need to be able to gather more information to determine if the optimization is safe. To enable that, we need a way of deferring the recursion into the directory until a later time. Naturally, deferring the entry into a subtree means that we need some function that will later recurse into the subdirectory exactly the same way that collect_merge_info_callback() would have done. Add a helper function that does this. For now this function is not used but a subsequent commit will change that. Future commits will also make the function sometimes resolve directories instead of traversing inside. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20merge-ort: add data structures for allowable trivial directory resolvesElijah Newren1-0/+61
As noted a few commits ago, we can resolve individual files early if all three sides of the merge have a file at the path and two of the three sides match. We would really like to do the same thing with directories, because being able to do a trivial directory resolve means we don't have to recurse into the directory, potentially saving us a huge amount of time in both collect_merge_info() and process_entries(). Unfortunately, resolving directories early would mean missing any renames whose source or destination is underneath that directory. If we somehow knew there weren't any renames under the directory in question, then we could resolve it early. Sadly, it is impossible to determine whether there are renames under the directory in question without recursing into it, and this has traditionally kept us from ever implementing such an optimization. In commit f89b4f2bee ("merge-ort: skip rename detection entirely if possible", 2021-03-11), we added an additional reason that rename detection could be skipped entirely -- namely, if no *relevant* sources were present. Without completing collect_merge_info_callback(), we do not yet know if there are no relevant sources. However, we do know that if the current directory on one side matches the merge base, then every source file within that directory will not be RELEVANT_CONTENT, and a few simple checks can often let us rule out RELEVANT_LOCATION as well. This suggests we can just defer recursing into such directories until the end of collect_merge_info. Since the deferred directories are known to not add any relevant sources due to the above properties, then if there are no relevant sources after we've traversed all paths other than the deferred ones, then we know there are not any relevant sources. Under those conditions, rename detection is unnecessary, and that means we can resolve the deferred directories without recursing into them. Note that the logic for skipping rename detection was also modified further in commit 76e253793c ("merge-ort, diffcore-rename: employ cached renames when possible", 2021-01-30); in particular rename detection can be skipped if we already have cached renames for each relevant source. We can take advantage of this information as well with our deferral of recursing into directories where one side matches the merge base. Add some data structures that we will use to do these deferrals, with some lengthy comments explaining their purpose. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20merge-ort: add some more explanations in collect_merge_info_callback()Elijah Newren1-5/+15
The previous patch possibly raises some questions about whether additional cases in collect_merge_info_callback() can be handled early. Add some explanations in the form of comments to help explain these better. While we're at it, add a few comments to denote what a few boolean '0' or '1' values stand for. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20merge-ort: resolve paths early when we have sufficient informationElijah Newren1-0/+37
When there are no directories involved at a given path, and all three sides have a file at that path, and two of the three sides of history match, we can immediately resolve the merge of that path in collect_merge_info() and do not need to wait until process_entries(). This is actually a very minor improvement: half the time when I run it, I see an improvement; the other half a slowdown. It seems to be in the range of noise. However, this idea serves as the beginning of some bigger optimizations coming in the following patches. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-16Merge branch 'ab/struct-init'Junio C Hamano1-2/+2
Code cleanup around struct_type_init() functions. * ab/struct-init: string-list.h users: change to use *_{nodup,dup}() string-list.[ch]: add a string_list_init_{nodup,dup}() dir.[ch]: replace dir_init() with DIR_INIT *.c *_init(): define in terms of corresponding *_INIT macro *.h: move some *_INIT to designated initializers
2021-07-16Merge branch 'en/merge-dir-rename-corner-case-fix'Junio C Hamano1-1/+5
The merge code had funny interactions between content based rename detection and directory rename detection. * en/merge-dir-rename-corner-case-fix: merge-recursive: handle rename-to-self case merge-ort: ensure we consult df_conflict and path_conflicts t6423: test directory renames causing rename-to-self
2021-07-16Merge branch 'en/ort-perf-batch-13'Junio C Hamano1-0/+50
Performance tweaks of "git merge -sort" around lazy fetching of objects. * en/ort-perf-batch-13: merge-ort: add prefetching for content merges diffcore-rename: use a different prefetch for basename comparisons diffcore-rename: allow different missing_object_cb functions t6421: add tests checking for excessive object downloads during merge promisor-remote: output trace2 statistics for number of objects fetched
2021-07-16Merge branch 'en/ort-perf-batch-12'Junio C Hamano1-23/+57
More fix-ups and optimization to "merge -sort". * en/ort-perf-batch-12: merge-ort: miscellaneous touch-ups Fix various issues found in comments diffcore-rename: avoid unnecessary strdup'ing in break_idx merge-ort: replace string_list_df_name_compare with faster alternative
2021-07-15rename: bump limit defaults yet againElijah Newren1-1/+1
These were last bumped in commit 92c57e5c1d29 (bump rename limit defaults (again), 2011-02-19), and were bumped both because processors had gotten faster, and because people were getting ugly merges that caused problems and reporting it to the mailing list (suggesting that folks were willing to spend more time waiting). Since that time: * Linus has continued recommending kernel folks to set diff.renameLimit=0 (maps to 32767, currently) * Folks with repositories with lots of renames were happy to set merge.renameLimit above 32767, once the code supported that, to get correct cherry-picks * Processors have gotten faster * It has been discovered that the timing methodology used last time probably used too large example files. The last point is probably worth explaining a bit more: * The "average" file size used appears to have been average blob size in the linux kernel history at the time (probably v2.6.25 or something close to it). * Since bigger files are modified more frequently, such a computation weights towards larger files. * Larger files may be more likely to be modified over time, but are not more likely to be renamed -- the mean and median blob size within a tree are a bit higher than the mean and median of blob sizes in the history leading up to that version for the linux kernel. * The mean blob size in v2.6.25 was half the average blob size in history leading to that point * The median blob size in v2.6.25 was about 40% of the mean blob size in v2.6.25. * Since the mean blob size is more than double the median blob size, any file as big as the mean will not be compared to any files of median size or less (because they'd be more than 50% dissimilar). * Since it is the number of files compared that provides the O(n^2) behavior, median-sized files should matter more than mean-sized ones. The combined effect of the above is that the file size used in past calculations was likely about 5x too large. Combine that with a CPU performance improvement of ~30%, and we can increase the limits by a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated time limits. Keeping the same approximate time limit probably makes sense for diff.renameLimit (there is no progress feedback in e.g. git log -p), but the experience above suggests merge.renameLimit could be extended significantly. In fact, it probably would make sense to have an unlimited default setting for merge.renameLimit, but that would likely need to be coupled with changes to how progress is displayed. (See https://lore.kernel.org/git/YOx+Ok%2FEYvLqRMzJ@coredump.intra.peff.net/ for details in that area.) For now, let's just bump the approximate time limit from 10s to 1m. (Note: We do not want to use actual time limits, because getting results that depend on how loaded your system is that day feels bad, and because we don't discover that we won't get all the renames until after we've put in a lot of work rather than just upfront telling the user there are too many files involved.) Using the original time limit of 2s for diff.renameLimit, and bumping merge.renameLimit from 10s to 60s, I found the following timings using the simple script at the end of this commit message (on an AWS c5.xlarge which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"): N Timing 1300 1.995s 7100 59.973s So let's round down to nice even numbers and bump the limits from 400->1000, and from 1000->7000. Here is the measure_rename_perf script (adapted from https://lore.kernel.org/git/20080211113516.GB6344@coredump.intra.peff.net/ in particular to avoid triggering the linear handling from basename-guided rename detection): #!/bin/bash n=$1; shift rm -rf repo mkdir repo && cd repo git init -q -b main mkdata() { mkdir $1 for i in `seq 1 $2`; do (sed "s/^/$i /" <../sample echo tag: $1 ) >$1/$i done } mkdata initial $n git add . git commit -q -m initial mkdata new $n git add . cd new for i in *; do git mv $i $i.renamed; done cd .. git rm -q -rf initial git commit -q -m new time git diff-tree -M -l0 --summary HEAD^ HEAD Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-13*.c static functions: add missing __attribute__((format))Ævar Arnfjörð Bjarmason1-0/+1
Add missing __attribute__((format)) function attributes to various "static" functions that take printf arguments. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-01string-list.h users: change to use *_{nodup,dup}()Ævar Arnfjörð Bjarmason1-2/+2
Change all in-tree users of the string_list_init(LIST, BOOL) API to use string_list_init_{nodup,dup}(LIST) instead. As noted in the preceding commit let's leave the now-unused string_list_init() wrapper in-place for any in-flight users, it can be removed at some later date. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>