path: root/t
Each entry below shows the commit date, subject, author, and diffstat (files changed, lines removed/added), followed by the full commit message.
2024-07-31  t98xx: mark Perforce tests as memory-leak free  (Patrick Steinhardt; 35 files, -0/+35)
All the Perforce tests are free of memory leaks. This went unnoticed because most folks do not have p4 and p4d installed on their computers. Consequently, given that the prerequisites for running those tests aren't fulfilled, `TEST_PASSES_SANITIZE_LEAK=check` won't notice that those tests are indeed memory leak free. Mark those tests accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
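For context, a minimal sketch of what such a marking typically looks like in one of the t98xx scripts (the test description and exact file are illustrative; the knob name comes from the message above and must be set before the test library is sourced):

    #!/bin/sh
    test_description='git p4 example (illustrative)'

    # Declare that this script runs leak-free under SANITIZE=leak, so that
    # TEST_PASSES_SANITIZE_LEAK=check treats any new leak as a failure.
    TEST_PASSES_SANITIZE_LEAK=true
    . ./lib-git-p4.sh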
2024-07-31  t98xx: fix Perforce tests with p4d r23 and newer  (Patrick Steinhardt; 3 files, -8/+48)
Some of the tests in t98xx modify the Perforce depot in ways that the tool wouldn't normally allow. This is done to test behaviour of git-p4 in certain edge cases that we have observed in the wild, but which should in theory not be possible. Naturally, modifying the depot on disk directly is quite intimate with the tool and thus prone to breakage when Perforce updates the way that data is stored. And indeed, those tests are broken nowadays with r23 of Perforce. While a file revision was previously stored as a plain file "depot/file,v", it is now stored in a directory "depot/file,d" with compression. Adapt those tests to handle both old- and new-style depot layouts. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-18  Merge branch 'jk/am-retry'  (Junio C Hamano; 1 file, -1/+1)
Test fix as a follow-up to an already graduated topic. * jk/am-retry: t4153: stop redirecting input from /dev/zero
2024-07-17  Merge branch 'js/var-git-shell-path'  (Junio C Hamano; 1 file, -1/+1)
"git var GIT_SHELL_PATH" should report the path to the shell used to spawn external commands, but it didn't do so on Windows, which has been corrected. * js/var-git-shell-path: var(win32): do report the GIT_SHELL_PATH that is actually used run-command: declare the `git_shell_path()` function globally run-command(win32): resolve the path to the Unix shell early mingw(is_msys2_sh): handle forward slashes in the `sh.exe` path, too win32: override `fspathcmp()` with a directory separator-aware version strvec: declare the `strvec_push_nodup()` function globally run-command: refactor getting the Unix shell path into its own function
2024-07-17  Merge branch 'kn/push-empty-fix'  (Junio C Hamano; 1 file, -0/+17)
"git push '' HEAD:there" used to hit a BUG(); it has been corrected to die with "fatal: bad repository ''". * kn/push-empty-fix: builtin/push: call set_refspecs after validating remote
2024-07-17  Merge branch 'jk/test-body-in-here-doc'  (Junio C Hamano; 160 files, -1026/+1286)
The test framework learned to take the test body not as a single string but as a here-document.
* jk/test-body-in-here-doc:
  t/.gitattributes: ignore whitespace in chainlint expect files
  t: convert some here-doc test bodies
  test-lib: allow test snippets as here-docs
  chainlint.pl: add tests for test body in heredoc
  chainlint.pl: recognize test bodies defined via heredoc
  chainlint.pl: check line numbers in expected output
  chainlint.pl: force CRLF conversion when opening input files
  chainlint.pl: do not spawn more threads than we have scripts
  chainlint.pl: only start threads if jobs > 1
  chainlint.pl: add test_expect_success call to test snippets
2024-07-17  Merge branch 'rj/test-sanitize-leak-log-fix'  (Junio C Hamano; 2 files, -46/+20)
Tests that use GIT_TEST_SANITIZE_LEAK_LOG feature got their exit status inverted, which has been corrected.
* rj/test-sanitize-leak-log-fix:
  test-lib: GIT_TEST_SANITIZE_LEAK_LOG enabled by default
  test-lib: fix GIT_TEST_SANITIZE_LEAK_LOG
2024-07-17  t4153: stop redirecting input from /dev/zero  (Jeff King; 1 file, -1/+1)
Commit 852a171018 (am: let command-line options override saved options, 2015-08-04) redirected a few "git am" invocations from /dev/zero, even though it did not expect "am" to read the input. This was necessary at the time because those tests used test_terminal, and as described in 18d8c26930 (test_terminal: redirect child process' stdin to a pty, 2015-08-04): Note that due to the way the code is structured, the child's stdin pseudo-tty will be closed when we finish reading from our stdin. This means that in the common case, where our stdin is attached to /dev/null, the child's stdin pseudo-tty will be closed immediately. Some operations like isatty(), which git-am uses, require the file descriptor to be open, and hence if the success of the command depends on such functions, test_terminal's stdin should be redirected to a source with large amount of data to ensure that the child's stdin is not closed, e.g. test_terminal git am --3way </dev/zero But we later dropped the use of test_terminal in 53ce2e3f0a (am: add explicit "--retry" option, 2024-06-06). That commit dropped one of the redirections from /dev/zero but not the other. In theory the remaining one should not cause any problems, but it turns out that at least one platform (NonStop) does not have /dev/zero at all. We never noticed before because it also did not pass the TTY prereq, meaning these tests were not run at all there until 53ce2e3f0a. So let's drop the useless /dev/zero mention. There are others in the test suite, but they are run only for tests marked with EXPENSIVE (so not typically by default). Reported-by: Randall S. Becker <rsbecker@nexbridge.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-16  Merge branch 'bc/http-proactive-auth'  (Junio C Hamano; 1 file, -0/+116)
The http transport can now be told to send request with authentication material without first getting a 401 response. * bc/http-proactive-auth: http: allow authenticating proactively
2024-07-16  Merge branch 'ds/advice-sparse-index-expansion'  (Junio C Hamano; 1 file, -1/+15)
A new warning message is issued when a command has to expand a sparse index to handle working tree cruft that is outside of the sparse checkout. * ds/advice-sparse-index-expansion: advice: warn when sparse index expands
2024-07-16  Merge branch 'cb/send-email-sanitize-trailer-addresses'  (Junio C Hamano; 1 file, -0/+43)
Address-looking strings found on the trailer are now placed on the Cc: list after running through sanitize_address by "git send-email". * cb/send-email-sanitize-trailer-addresses: git-send-email: use sanitized address when reading mbox body
2024-07-16  Merge branch 'en/ort-inner-merge-error-fix'  (Junio C Hamano; 1 file, -1/+41)
The "ort" merge backend saw one bugfix for a crash that happens when inner merge gets killed, and assorted code clean-ups. * en/ort-inner-merge-error-fix: merge-ort: fix missing early return merge-ort: convert more error() cases to path_msg() merge-ort: upon merge abort, only show messages causing the abort merge-ort: loosen commented requirements merge-ort: clearer propagation of failure-to-function from merge_submodule merge-ort: fix type of local 'clean' var in handle_content_merge () merge-ort: maintain expected invariant for priv member merge-ort: extract handling of priv member into reusable function
2024-07-15  Merge branch 'cp/unit-test-reftable-record'  (Junio C Hamano; 2 files, -1/+551)
A test in reftable library has been rewritten using the unit test framework.
* cp/unit-test-reftable-record:
  t-reftable-record: add tests for reftable_log_record_compare_key()
  t-reftable-record: add tests for reftable_ref_record_compare_name()
  t-reftable-record: add index tests for reftable_record_is_deletion()
  t-reftable-record: add obj tests for reftable_record_is_deletion()
  t-reftable-record: add log tests for reftable_record_is_deletion()
  t-reftable-record: add ref tests for reftable_record_is_deletion()
  t-reftable-record: add comparison tests for obj records
  t-reftable-record: add comparison tests for index records
  t-reftable-record: add comparison tests for ref records
  t-reftable-record: add reftable_record_cmp() tests for log records
  t: move reftable/record_test.c to the unit testing framework
2024-07-15  Merge branch 'jc/disable-push-nego-for-deletion'  (Junio C Hamano; 1 file, -0/+10)
"git push" that pushes only deletion gave an unnecessary and harmless error message when push negotiation is configured, which has been corrected. * jc/disable-push-nego-for-deletion: push: avoid showing false negotiation errors
2024-07-15  Merge branch 'jk/tests-without-dns'  (Junio C Hamano; 3 files, -7/+6)
Test suite has been taught not to unnecessarily rely on DNS failing a bogus external name.
* jk/tests-without-dns:
  t/lib-bundle-uri: use local fake bundle URLs
  t5551: do not confirm that bogus url cannot be used
  t5553: use local url for invalid fetch
2024-07-15  Merge branch 'gt/unit-test-oidmap'  (Junio C Hamano; 5 files, -239/+181)
An existing test of oidmap API has been rewritten with the unit-test framework. * gt/unit-test-oidmap: t: migrate helper/test-oidmap.c to unit-tests/t-oidmap.c
2024-07-15  Merge branch 'as/describe-broken-refresh-index-fix'  (Junio C Hamano; 1 file, -0/+36)
"git describe --dirty --broken" forgot to refresh the index before seeing if there is any chang, ("git describe --dirty" correctly did so), which has been corrected. * as/describe-broken-refresh-index-fix: describe: refresh the index when 'broken' flag is used
2024-07-15  Merge branch 'rj/t0613-no-longer-leaks'  (Junio C Hamano; 1 file, -0/+1)
A test that no longer leaks has been marked as such. * rj/t0613-no-longer-leaks: t0613: mark as leak-free
2024-07-15  Merge branch 'rj/t0612-no-longer-leaks'  (Junio C Hamano; 1 file, -0/+1)
A test that no longer leaks has been marked as such. * rj/t0612-no-longer-leaks: t0612: mark as leak-free
2024-07-13  var(win32): do report the GIT_SHELL_PATH that is actually used  (Johannes Schindelin; 1 file, -1/+1)
On Windows, Unix-like paths like `/bin/sh` make very little sense. In the best case, they simply don't work, in the worst case they are misinterpreted as absolute paths that are relative to the drive associated with the current directory. To that end, Git does not actually use the path `/bin/sh` that is recorded e.g. when `run_command()` is called with a Unix shell command-line. Instead, as of 776297548e (Do not use SHELL_PATH from build system in prepare_shell_cmd on Windows, 2012-04-17), it re-interprets `/bin/sh` as "look up `sh` on the `PATH` and use the result instead". This is the logic users expect to be followed when running `git var GIT_SHELL_PATH`. However, when 1e65721227 (var: add support for listing the shell, 2023-06-27) introduced support for `git var GIT_SHELL_PATH`, Windows was not special-cased as above, which is why it outputs `/bin/sh` even though that disagrees with what Git actually uses. Let's fix this by using the exact same logic as `prepare_shell_cmd()`, adjusting the Windows-specific `git var GIT_SHELL_PATH` test case to verify that it actually finds a working executable. Reported-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
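As an illustration of the behaviour change (the Windows location shown is made up; the point is that the reported path is now one that actually exists and can be executed):

    # Before, on Windows: a Unix-style path Git does not actually use
    $ git var GIT_SHELL_PATH
    /bin/sh

    # After: the sh.exe resolved by looking it up on PATH (illustrative path)
    $ git var GIT_SHELL_PATH
    C:/Program Files/Git/usr/bin/sh.exe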
2024-07-12  builtin/push: call set_refspecs after validating remote  (Karthik Nayak; 1 file, -0/+17)
When an end-user runs "git push" with an empty string for the remote repository name, e.g. $ git push '' main "git push" fails with a BUG(). Even though this is a nonsense request that we want to fail, we shouldn't hit a BUG(). Instead we want to give a sensible error message, e.g., 'bad repository'". This is because since 9badf97c42 (remote: allow resetting url list, 2024-06-14), we reset the remote URL if the provided URL is empty. When a user of 'remotes_remote_get' tries to fetch a remote with an empty repo name, the function initializes the remote via 'make_remote'. But the remote is still not a valid remote, since the URL is empty, so it tries to add the URL alias using 'add_url_alias'. This in-turn will call 'add_url', but since the URL is empty we call 'strvec_clear' on the `remote->url`. Back in 'remotes_remote_get', we again check if the remote is valid, which fails, so we return 'NULL' for the 'struct remote *' value. The 'builtin/push.c' code, calls 'set_refspecs' before validating the remote. This worked with empty repo names earlier since we would get a remote, albeit with an empty URL. With the new changes, we get a 'NULL' remote value, this causes the check for remote to fail and raises the BUG in 'set_refspecs'. Do a simple fix by doing remote validation first. Also add a test to validate the bug fix. With this, we can also now directly pass remote to 'set_refspecs' instead of it trying to lazily obtain it. Helped-by: Jeff King <peff@peff.net> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-11  test-lib: GIT_TEST_SANITIZE_LEAK_LOG enabled by default  (Rubén Justo; 2 files, -46/+17)
As we currently describe in t/README, it can happen that: Some tests run "git" (or "test-tool" etc.) without properly checking the exit code, or git will invoke itself and fail to ferry the abort() exit code to the original caller. Therefore, GIT_TEST_SANITIZE_LEAK_LOG=true is needed to be set to capture all memory leaks triggered by our tests. It seems unnecessary to force users to remember this option, as forgetting it could lead to missed memory leaks. We could solve the problem by making it "true" by default, but that might suggest we think "false" makes sense, which isn't the case. Therefore, the best approach is to remove the option entirely while maintaining the capability to detect memory leaks in blind spots of our tests. Signed-off-by: Rubén Justo <rjusto@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
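In practical terms (a sketch, not taken from the patch itself; the test script name is just an example): a leak-checking run that previously needed the extra knob now captures the same leaks without it:

    # previously required to catch leaks that do not propagate to the exit code
    $ make SANITIZE=leak GIT_TEST_SANITIZE_LEAK_LOG=true test T=t3200-branch.sh

    # with this change, the leak logs are always consulted
    $ make SANITIZE=leak test T=t3200-branch.sh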
2024-07-10  t/.gitattributes: ignore whitespace in chainlint expect files  (Jeff King; 1 file, -1/+1)
The ".expect" files in t/chainlint/ are snippets of expected output from the chainlint script, and do not necessarily conform to our usual code style. Especially with the recent change to retain line numbers, blank lines in the input script end up with trailing whitespace as we print "3 " for line 3, for example. The point of these files is to match the output verbatim, so let's not complain about the trailing spaces. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-10  t: convert some here-doc test bodies  (Jeff King; 2 files, -117/+117)
The t1404 script checks a lot of output from Git which contains single quotes. Because the test snippets are themselves wrapped in the same single-quotes, we have to resort to using $SQ to match them. This is error-prone and makes the tests harder to read. Instead, let's use the new here-doc feature added in the previous commit, which lets us write anything in the test body we want (except the here-doc end marker on a line by itself, of course). Note that we do use "\" in our marker to avoid interpolation (which is the whole point). But we don't use "<<-", as we want to preserve whitespace in the snippet (and running with "-v" before and after shows that we produce the exact same output, except with the ugly $SQ references fixed). I just converted every test here, even though only some of them use $SQ. But it would be equally correct to mix-and-match styles if we don't mind the inconsistency. I've also converted a few tests in t0600 which were moved from t1404 (I had written this patch before they were moved, but it seemed worth porting over the changes rather than losing them). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-10  test-lib: allow test snippets as here-docs  (Jeff King; 2 files, -5/+35)
Most test snippets are wrapped in single quotes, like: test_expect_success 'some description' ' do_something ' This sometimes makes the snippets awkward to write, because you can't easily use single quotes within them. We sometimes work around this with $SQ, or by loosening regexes to use "." instead of a literal quote, or by using double quotes when we'd prefer to use single-quotes (and just adding extra backslash-escapes to avoid interpolation). This commit adds another option: feeding the snippet via the function's stdin. This doesn't conflict with anything the snippet would want to do, because we always redirect its stdin from /dev/null anyway (which we'll continue to do). A few notes on the implementation: - it would be nice to push this down into test_run_, but we can't, as test_expect_success and test_expect_failure want to see the actual script content to report it for verbose-mode. A helper function limits the amount of duplication in those callers here. - The helper function is a little awkward to call, as you feed it the name of the variable you want to set. The more natural thing in shell would be command substitution like: body=$(body_or_stdin "$2") but that loses trailing whitespace. There are tricks around this, like: body=$(body_or_stdin "$2"; printf .) body=${body%.} but we'd prefer to keep such tricks in the helper, not in each caller. - I implemented the helper using a sequence of "read" calls. Together with "-r" and unsetting the IFS, this preserves incoming whitespace. An alternative is to use "cat" (which then requires the gross "." trick above). But this saves us a process, which is probably a good thing. The "read" builtin does use more read() syscalls than necessary (one per byte), but that is almost certainly a win over a separate process. Both are probably slower than passing a single-quoted string, but the difference is lost in the noise for a script that I converted as an experiment. - I handle test_expect_success and test_expect_failure here. If we like this style, we could easily extend it to other spots (e.g., lazy_prereq bodies) on top of this patch. - even though we are using "local", we have to be careful about our variable names. Within test_expect_success, any variable we declare with local will be seen as local by the test snippets themselves (so it wouldn't persist between tests like normal variables would). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
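A minimal sketch of the new calling convention (the test content is made up; the "-" placeholder and the backslash-escaped here-doc tag are the syntax described here and in the chainlint patches below):

    test_expect_success 'single quotes need no escaping' - <<\EOT
    echo "it's fine" >expect &&
    echo "it's fine" >actual &&
    test_cmp expect actual
    EOT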
2024-07-10  chainlint.pl: add tests for test body in heredoc  (Jeff King; 8 files, -0/+50)
The chainlint.pl script recently learned about the upcoming: test_expect_success 'some test' - <<\EOT TEST_BODY EOT syntax, where TEST_BODY should be checked in the usual way. Let's make sure this works by adding a few tests. The "here-doc-body" file tests the basic syntax, including an embedded here-doc which we should still be able to recognize. Likewise the "here-doc-body-indent" checks the same thing, but using the "<<-" operator. We wouldn't expect this to be used normally, but we would not want to accidentally miss a body that uses it. The "pathological" variant checks the opposite: we don't get confused by an indented tag within the here-doc body. The "here-doc-double" tests the handling of two here-doc tags on the same line. This is not something we'd expect anybody to do in practice, but the code was written defensively to handle this, so let's make sure it works. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-10  chainlint.pl: recognize test bodies defined via heredoc  (Eric Sunshine; 1 file, -5/+22)
In order to check tests for semantic problems, chainlint.pl scans test scripts, looking for tests defined as: test_expect_success [prereq] title ' body ' where `body` is a single string which is then treated as a standalone chunk of code and "linted" to detect semantic issues. (The same happens for `test_expect_failure` definitions.) The introduction of test definitions in which the test body is instead presented via a heredoc rather than as a single string creates a blind spot in the linting process since such invocations are not recognized by chainlint.pl. Prepare for this new style by also recognizing tests defined as: test_expect_success [prereq] title - <<\EOT body EOT A minor complication is that chainlint.pl has never considered heredoc bodies significant since it doesn't scan them for semantic problems, thus it has always simply thrown them away. However, with the new `test_expect_success` calling sequence, heredoc bodies become meaningful, thus need to be captured. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-10  chainlint.pl: check line numbers in expected output  (Jeff King; 74 files, -894/+913)
While working on chainlint.pl recently, we introduced some bugs that showed incorrect line numbers in the output. But it was hard to notice, since we sanitize the output by removing all of the line numbers! It would be nice to retain these so we can catch any regressions. The main reason we sanitize is for maintainability: we concatenate all of the test snippets into a single file, so it's hard for each ".expect" file to know at which offset its test input will be found. We can handle that by storing the per-test line numbers in the ".expect" files, and then dynamically offsetting them as we build the concatenated test and expect files together. The changes to the ".expect" files look like tedious boilerplate, but it actually makes adding new tests easier. You can now just run: perl chainlint.pl chainlint/foo.test | tail -n +2 >chainlint/foo.expect to save the output of the script minus the comment headers (after checking that it is correct, of course). Whereas before you had to strip the line numbers. The conversions here were done mechanically using something like the script above, and then spot-checked manually. It would be possible to do all of this in shell via the Makefile, but it gets a bit complicated (and requires a lot of extra processes). Instead, I've written a short perl script that generates the concatenated files (we already depend on perl, since chainlint.pl uses it). Incidentally, this improves a few other things: - we incorrectly used $(CHAINLINTTMP_SQ) inside a double-quoted string. So if your test directory required quoting, like: make "TEST_OUTPUT_DIRECTORY=/tmp/h'orrible" we'd fail the chainlint tests. - the shell in the Makefile didn't handle &&-chaining correctly in its loops (though in practice the "sed" and "cat" invocations are not likely to fail). - likewise, the sed invocation to strip numbers was hiding the exit code of chainlint.pl itself. In practice this isn't a big deal; since there are linter violations in the test files, we expect it to exit non-zero. But we could later use exit codes to distinguish serious errors from expected ones. - we now use a constant number of processes, instead of scaling with the number of test scripts. So it should be a little faster (on my machine, "make check-chainlint" goes from 133ms to 73ms). There are some alternatives to this approach, but I think this is still a good intermediate step: 1. We could invoke chainlint.pl individually on each test file, and compare it to the expected output (and possibly using "make" to avoid repeating already-done checks). This is a much bigger change (and we'd have to figure out what to do with the "# LINT" lines in the inputs). But in this case we'd still want the "expect" files to be annotated with line numbers. So most of what's in this patch would be needed anyway. 2. Likewise, we could run a single chainlint.pl and feed it all of the scripts (with "--jobs=1" to get deterministic output). But we'd still need to annotate the scripts as we did here, and we'd still need to either assemble the "expect" file, or break apart the script output to compare to each individual ".expect" file. So we may pursue those in the long run, but this patch gives us more robust tests without too much extra work or moving in a useless direction. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-10  chainlint.pl: force CRLF conversion when opening input files  (Jeff King; 1 file, -1/+1)
The lexer in chainlint.pl can't handle CRLF line endings; it complains about an internal error in scan_token() if we see one. For example, in our Windows CI environment: $ perl chainlint.pl chainlint/for-loop.test | cat -v Thread 2 terminated abnormally: internal error scanning character '^M' This doesn't break "make check-chainlint" (yet), because we assemble a concatenated input by passing the contents of each file through "sed". And the "sed" we use will strip out the CRLFs. But the next patch is going to rework this a bit, which does break check-chainlint on Windows. Plus it's probably nicer to folks on Windows who might work on chainlint itself and write new tests. In theory we could fix the parser to handle this, but it's not really worth the trouble. We should be able to ask the input layer to translate the line endings for us. In fact, I'd expect this to happen by default, as perl's documentation claims Win32 uses the ":unix:crlf" PERLIO layer by default ("unix" here just refers to using read/write syscalls, and then "crlf" layers the translation on top). However, this doesn't seem to be the case in our Windows CI environment. I didn't dig into the exact reason, but it is perhaps because we are using an msys build of perl rather than a "true" Win32 build. At any rate, it is easy-ish to just ask explicitly for the conversion. In the above example, setting PERLIO=crlf in the environment is enough to make it work. Curiously, though, this doesn't work when invoking chainlint via "make". Again, I didn't dig into it, but it may have to do with msys programs calling Windows programs or vice versa. We can make it work consistently by just explicitly asking for CRLF translation when we open the files. This will even work on non-Windows platforms, though we wouldn't really expect to find CRLF files there. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
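The environment-variable workaround mentioned above, for reference when running the linter by hand (the patch itself bakes the equivalent CRLF layer into the open() call instead):

    # ask perl's I/O layer to translate CRLF line endings on input
    $ PERLIO=crlf perl chainlint.pl chainlint/for-loop.test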
2024-07-10  chainlint.pl: do not spawn more threads than we have scripts  (Jeff King; 1 file, -0/+1)
The chainlint.pl script spawns worker threads to check many scripts in parallel. This is good if you feed it a lot of scripts. But if you give it few (or one), then the overhead of spawning the threads dominates. We can easily notice that we have fewer scripts than threads and scale back as appropriate. This patch reduces the time to run: time for i in chainlint/*.test; do perl chainlint.pl $i done >/dev/null on my system from ~4.1s to ~1.1s, where I have 8+8 cores. As with the previous patch, this isn't the usual way we run chainlint (we feed many scripts at once, which is why it supports threading in the first place). So this won't make a big difference in the real world, but it may help us out in the future, and it makes experimenting with and debugging the chainlint tests a bit more pleasant. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-10  chainlint.pl: only start threads if jobs > 1  (Jeff King; 1 file, -1/+2)
If the system supports threads, chainlint.pl will always spawn worker threads to do the real work. But when --jobs=1, this is pointless, since we could just do the work in the main thread. And spawning even a single thread has a high overhead. For example, on my Linux system, running: for i in chainlint/*.test; do perl chainlint.pl --jobs=1 $i done >/dev/null takes ~1.7s without this patch, and ~1.1s after. We don't usually spawn a bunch of individual chainlint.pl processes (instead we feed several scripts at once, and the parallelism outweighs the setup cost). But it's something we've considered doing, and since we already have fallback code for systems without thread support, it's pretty easy to make this work. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-10  chainlint.pl: add test_expect_success call to test snippets  (Jeff King; 73 files, -3/+145)
The chainlint tests are a series of individual files, each holding a test body. The "make check-chainlint" target assembles them into a single file, adding a "test_expect_success" function call around each. Let's instead include that function call in the files themselves. This is a little more boilerplate, but has several advantages: 1. You can now run chainlint manually on snippets with just "perl chainlint.perl chainlint/foo.test". This can make developing and debugging a little easier. 2. Many of the tests implicitly relied on the syntax of the lines added by the Makefile (in particular the use of single-quotes). This assumption is much easier to see when the single-quotes are alongside the test body. 3. We had no way to test how the chainlint program handled various test_expect_success lines themselves. Now we'll be able to check variations. The change to the .test files was done mechanically, using the same test names they would have been assigned by the Makefile (this is important to match the expected output). The Makefile has the minimal change to drop the extra lines; there are more cleanups possible but a future patch in this series will rewrite this substantially anyway. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-09  http: allow authenticating proactively  (brian m. carlson; 1 file, -0/+116)
When making a request over HTTP(S), Git only sends authentication if it receives a 401 response. Thus, if a repository is open to the public for reading, Git will typically never ask for authentication for fetches and clones. However, there may be times when a user would like to authenticate nevertheless. For example, a forge may give higher rate limits to users who authenticate because they are easier to contact in case of excessive use. Or it may be useful for a known heavy user, such as an internal service, to proactively authenticate so its use can be monitored and, if necessary, throttled. Let's make this possible with a new option, "http.proactiveAuth". This option specifies a type of authentication which can be used to authenticate against the host in question. This is necessary because we lack the WWW-Authenticate header to provide us details; similarly, we cannot accept certain types of authentication because we require information from the server, such as a nonce or challenge, to successfully authenticate. If we're in auto mode and we got a username and password, set the authentication scheme to Basic. libcurl will not send authentication proactively unless there's a single choice of allowed authentication, and we know in this case we didn't get an authtype entry telling us what scheme to use, or we would have taken a different codepath and written the header ourselves. In any event, of the other schemes that libcurl supports, Digest and NTLM require a nonce or challenge, which means that they cannot work with proactive auth, and GSSAPI does not use a username and password at all, so Basic is the only logical choice among the built-in options. Note that the existing http_proactive_auth variable signifies proactive auth if there are already credentials, which is different from the functionality we're adding, which always seeks credentials even if none are provided. Nonetheless, t5540 tests the existing behavior for WebDAV-based pushes to an open repository without credentials, so we preserve it. While at first this may seem an insecure and bizarre decision, it may be that authentication is done with TLS certificates, in which case it might actually provide a quite high level of security. Expand the variable to use an enum to handle the additional cases and a helper function to distinguish our new cases from the old ones. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
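A configuration sketch; the value names come from the description above ("auto" picks Basic when a username and password are available), so consult git-config(1) of the release that ships this change for the authoritative list:

    # always offer credentials, choosing the scheme automatically
    $ git config http.proactiveAuth auto

    # or insist on HTTP Basic authentication up front
    $ git config http.proactiveAuth basic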
2024-07-08  Merge branch 'xx/bundie-uri-fixes'  (Junio C Hamano; 2 files, -4/+218)
When bundleURI interface fetches multiple bundles, Git failed to take full advantage of all bundles and ended up slurping duplicated objects.
* xx/bundie-uri-fixes:
  unbundle: extend object verification for fetches
  fetch-pack: expose fsckObjects configuration logic
  bundle-uri: verify oid before writing refs
2024-07-08  Merge branch 'ps/leakfixes-more'  (Junio C Hamano; 78 files, -2/+89)
More memory leaks have been plugged.
* ps/leakfixes-more: (29 commits)
  builtin/blame: fix leaking ignore revs files
  builtin/blame: fix leaking prefixed paths
  blame: fix leaking data for blame scoreboards
  line-range: plug leaking find functions
  merge: fix leaking merge bases
  builtin/merge: fix leaking `struct cmdnames` in `get_strategy()`
  sequencer: fix memory leaks in `make_script_with_merges()`
  builtin/clone: plug leaking HEAD ref in `wanted_peer_refs()`
  apply: fix leaking string in `match_fragment()`
  sequencer: fix leaking string buffer in `commit_staged_changes()`
  commit: fix leaking parents when calling `commit_tree_extended()`
  config: fix leaking "core.notesref" variable
  rerere: fix various trivial leaks
  builtin/stash: fix leak in `show_stash()`
  revision: free diff options
  builtin/log: fix leaking commit list in git-cherry(1)
  merge-recursive: fix memory leak when finalizing merge
  builtin/merge-recursive: fix leaking object ID bases
  builtin/difftool: plug memory leaks in `run_dir_diff()`
  object-name: free leaking object contexts
  ...
2024-07-08  Merge branch 'tb/path-filter-fix'  (Junio C Hamano; 4 files, -18/+389)
The Bloom filter used for path limited history traversal was broken on systems whose "char" is unsigned; update the implementation and bump the format version to 2.
* tb/path-filter-fix:
  bloom: introduce `deinit_bloom_filters()`
  commit-graph: reuse existing Bloom filters where possible
  object.h: fix mis-aligned flag bits table
  commit-graph: new Bloom filter version that fixes murmur3
  commit-graph: unconditionally load Bloom filters
  bloom: prepare to discard incompatible Bloom filters
  bloom: annotate filters with hash version
  repo-settings: introduce commitgraph.changedPathsVersion
  t4216: test changed path filters with high bit paths
  t/helper/test-read-graph: implement `bloom-filters` mode
  bloom.h: make `load_bloom_filter_from_graph()` public
  t/helper/test-read-graph.c: extract `dump_graph_info()`
  gitformat-commit-graph: describe version 2 of BDAT
  commit-graph: ensure Bloom filters are read with consistent settings
  revision.c: consult Bloom filters for root commits
  t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
2024-07-08  Merge branch 'db/date-underflow-fix'  (Junio C Hamano; 1 file, -6/+45)
date parser updates to be more careful about underflowing epoch based timestamp.
* db/date-underflow-fix:
  date: detect underflow/overflow when parsing dates with timezone offset
  t0006: simplify prerequisites
2024-07-08  Merge branch 'rj/pager-die-upon-exec-failure'  (Junio C Hamano; 1 file, -12/+5)
When GIT_PAGER failed to spawn, depending on the code path taken, we failed immediately (correct) or just spewed the payload to the standard output (incorrect). The code now always fails immediately when GIT_PAGER fails. * rj/pager-die-upon-exec-failure: pager: die when paging to non-existing command
2024-07-08  advice: warn when sparse index expands  (Derrick Stolee; 1 file, -1/+15)
Typically, forcing a sparse index to expand to a full index means that Git could not determine the status of a file outside of the sparse-checkout and needed to expand sparse trees into the full list of sparse blobs. This operation can be very slow when the sparse-checkout is much smaller than the full tree at HEAD. When users are in this state, there is usually a modified or untracked file outside of the sparse-checkout mentioned by the output of 'git status'. There are a number of reasons why this is insufficient: 1. Users may not have a full understanding of which files are inside or outside of their sparse-checkout. This is more common in monorepos that manage the sparse-checkout using custom tools that map build dependencies into sparse-checkout definitions. 2. In some cases, an empty directory could exist outside the sparse-checkout and these empty directories are not reported by 'git status' and friends. 3. If the user has '.gitignore' or 'exclude' files, then 'git status' will squelch the warnings and not demonstrate any problems. In order to help users who are in this state, add a new advice message to indicate that a sparse index is expanded to a full index. This message should be written at most once per process, so add a static global 'give_advice_on_expansion' to sparse-index.c. Further, there is a case in 'git sparse-checkout set' that uses the sparse index as an in-memory data structure (even when writing a full index) so we need to disable the message in that kind of case. The t1092-sparse-checkout-compatibility.sh test script compares the behavior of several Git commands across full and sparse repositories, including sparse repositories with and without a sparse index. We need to disable the advice in the sparse-index repo to avoid differences in stderr. By leaving the advice on in the sparse-checkout repo (without the sparse index), we can test the behavior of disabling the advice in convert_to_sparse(). (Indeed, these tests are how that necessity was discovered.) Add a test that reenables the advice and demonstrates that the message is output. The advice message is defined outside of expand_index() to avoid super-wide lines. It is also defined as a macro to avoid compile issues with -Werror=format-security. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
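Like other advice messages, this one should be controllable through the advice.* configuration; a sketch, assuming the knob is named sparseIndexExpanded (the exact key name is not spelled out in this message):

    # silence the new hint once you understand why your index keeps expanding
    $ git config advice.sparseIndexExpanded false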
2024-07-03  t: migrate helper/test-oidmap.c to unit-tests/t-oidmap.c  (Ghanshyam Thakkar; 5 files, -237/+181)
helper/test-oidmap.c along with t0016-oidmap.sh test the oidmap.h library which is built on top of hashmap.h. Migrate them to the unit testing framework for better performance, concise code and better debugging. Along with the migration also plug memory leaks and make the test logic independent for all the tests. The migration removes 'put' tests from t0016, because it is used as setup to all the other tests, so testing it separately does not yield any benefit. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com> Reviewed-by: Josh Steadmon <steadmon@google.com> Helped-by: Phillip Wood <phillip.wood123@gmail.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-02  push: avoid showing false negotiation errors  (Junio C Hamano; 1 file, -0/+10)
When "git push" is configured to use the push negotiation, a push of deletion of a branch (without pushing anything else) may end up not having anything to negotiate for the common ancestor discovery. In such a case, we end up making an internal invocation of "git fetch --negotiate-only" without any "--negotiate-tip" parameters that stops the negotiate-only fetch from being run, which by itself is not a bad thing (one fewer round-trip), but the end-user sees a "fatal: --negotiate-only needs one or more --negotiation-tip=*" message that the user cannot act upon. Teach "git push" to notice the situation and omit performing the negotiate-only fetch to begin with. One fewer process spawned, one fewer "alarming" message given the user. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-02  Merge branch 'rs/diff-color-moved-w-no-ext-diff-fix'  (Junio C Hamano; 1 file, -0/+9)
"git diff --no-ext-diff" when diff.external is configured ignored the "--color-moved" option. * rs/diff-color-moved-w-no-ext-diff-fix: diff: allow --color-moved with --no-ext-diff
2024-07-02  Merge branch 'jk/remote-wo-url'  (Junio C Hamano; 5 files, -3/+64)
Memory ownership rules for the in-core representation of remote.*.url configuration values have been straightened out, which resulted in a few leak fixes and code clarification.
* jk/remote-wo-url:
  remote: drop checks for zero-url case
  remote: always require at least one url in a remote
  t5801: test remote.*.vcs config
  t5801: make remote-testgit GIT_DIR setup more robust
  remote: allow resetting url list
  config: document remote.*.url/pushurl interaction
  remote: simplify url/pushurl selection
  remote: use strvecs to store remote url/pushurl
  remote: transfer ownership of memory in add_url(), etc
  remote: refactor alias_url() memory ownership
  archive: fix check for missing url
2024-07-02  Merge branch 'ps/use-the-repository'  (Junio C Hamano; 37 files, -12/+96)
A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help transition the codebase to rely less on the availability of the singleton the_repository instance.
* ps/use-the-repository:
  hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE`
  t/helper: remove dependency on `the_repository` in "proc-receive"
  t/helper: fix segfault in "oid-array" command without repository
  t/helper: use correct object hash in partial-clone helper
  compat/fsmonitor: fix socket path in networked SHA256 repos
  replace-object: use hash algorithm from passed-in repository
  protocol-caps: use hash algorithm from passed-in repository
  oidset: pass hash algorithm when parsing file
  http-fetch: don't crash when parsing packfile without a repo
  hash-ll: merge with "hash.h"
  refs: avoid include cycle with "repository.h"
  global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
  hash: require hash algorithm in `empty_tree_oid_hex()`
  hash: require hash algorithm in `is_empty_{blob,tree}_oid()`
  hash: make `is_null_oid()` independent of `the_repository`
  hash: convert `oidcmp()` and `oideq()` to compare whole hash
  global: ensure that object IDs are always padded
  hash: require hash algorithm in `oidread()` and `oidclr()`
  hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`
  hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
2024-07-02  Merge branch 'ew/cat-file-unbuffered-tests'  (Junio C Hamano; 1 file, -0/+30)
The output from "git cat-file --batch-check" and "--batch-command (info)" should not be unbuffered, for which some tests have been added. * ew/cat-file-unbuffered-tests: t1006: ensure cat-file info isn't buffered by default Git.pm: use array in command_bidi_pipe example
2024-07-02  t-reftable-record: add tests for reftable_log_record_compare_key()  (Chandra Pratap; 1 file, -0/+30)
reftable_log_record_compare_key() is a function defined by reftable/record.{c, h} and is used to compare the keys of two log records when sorting multiple log records using 'qsort'. In the current testing setup, this function is left unexercised. Add a testing function for the same. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Acked-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-02  t-reftable-record: add tests for reftable_ref_record_compare_name()  (Chandra Pratap; 1 file, -0/+20)
reftable_ref_record_compare_name() is a function defined by reftable/record.{c, h} and is used to compare the refname of two ref records when sorting multiple ref records using 'qsort'. In the current testing setup, this function is left unexercised. Add a testing function for the same. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Acked-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-02  t-reftable-record: add index tests for reftable_record_is_deletion()  (Chandra Pratap; 1 file, -0/+1)
reftable_record_is_deletion() is a function defined in reftable/record.{c, h} that determines whether a record is of type deletion or not. In the current testing setup, this function is left untested for index records. Add tests for this function in the case of index records. Note that since index records cannot be of type deletion, this function must always return '0' when called on an index record. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Acked-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-02  t-reftable-record: add obj tests for reftable_record_is_deletion()  (Chandra Pratap; 1 file, -0/+1)
reftable_record_is_deletion() is a function defined in reftable/record.{c, h} that determines whether a record is of type deletion or not. In the current testing setup, this function is left untested for two of the four record types (obj, index). Add tests for this function in the case of obj records. Note that since obj records cannot be of type deletion, this function must always return '0' when called on an obj record. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Acked-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-02  t-reftable-record: add log tests for reftable_record_is_deletion()  (Chandra Pratap; 1 file, -0/+4)
reftable_record_is_deletion() is a function defined in reftable/record.{c, h} that determines whether a record is of type deletion or not. In the current testing setup, this function is left untested for three of the four record types (log, obj, index). Add tests for this function in the case of log records. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Acked-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-02  t-reftable-record: add ref tests for reftable_record_is_deletion()  (Chandra Pratap; 1 file, -0/+2)
reftable_record_is_deletion() is a function defined in reftable/record.{c, h} that determines whether a record is of type deletion or not. In the current testing setup, this function is left untested for all the four record types (ref, log, obj, index). Add tests for this function in the case of ref records. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Acked-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-02  t-reftable-record: add comparison tests for obj records  (Chandra Pratap; 1 file, -0/+39)
In the current testing setup for obj records, the comparison functions for obj records, reftable_obj_record_cmp_void() and reftable_obj_record_equal_void() are left untested. Add tests for the same by using the wrapper functions reftable_record_cmp() and reftable_record_equal() for reftable_index_record_cmp_void() and reftable_index_record_equal_void() respectively. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Acked-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-02  t-reftable-record: add comparison tests for index records  (Chandra Pratap; 1 file, -0/+38)
In the current testing setup for index records, the comparison functions for index records, reftable_index_record_cmp() and reftable_index_record_equal() are left untested. Add tests for the same by using the wrapper functions reftable_record_cmp() and reftable_record_equal() for reftable_index_record_cmp() and reftable_index_record_equal() respectively. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Acked-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-02  t-reftable-record: add comparison tests for ref records  (Chandra Pratap; 1 file, -0/+33)
In the current testing setup for ref records, the comparison functions for ref records, reftable_ref_record_cmp_void() and reftable_ref_record_equal() are left untested. Add tests for the same by using the wrapper functions reftable_record_cmp() and reftable_record_equal() for reftable_ref_record_cmp_void() and reftable_ref_record_equal() respectively. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Acked-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-02  t-reftable-record: add reftable_record_cmp() tests for log records  (Chandra Pratap; 1 file, -13/+25)
In the current testing setup for log records, only reftable_log_record_equal() among log record's comparison functions is tested. Modify the existing tests to exercise reftable_log_record_cmp_void() (using the wrapper function reftable_record_cmp()) alongside reftable_log_record_equal(). Note that to achieve this, we'll need to replace instances of reftable_log_record_equal() with the wrapper function reftable_record_equal(). Rename the now modified test to reflect its nature of exercising all comparison operations, not just equality. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Acked-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-02  t: move reftable/record_test.c to the unit testing framework  (Chandra Pratap; 2 files, -1/+371)
reftable/record_test.c exercises the functions defined in reftable/record.{c, h}. Migrate reftable/record_test.c to the unit testing framework. Migration involves refactoring the tests to use the unit testing framework instead of reftable's test framework, and renaming the tests to fit unit-tests' naming scheme. While at it, change the type of index variable 'i' to 'size_t' from 'int'. This is because 'i' is used in comparison against 'ARRAY_SIZE(x)' which is of type 'size_t'. Also, use set_hash() which is defined locally in the test file instead of set_test_hash() which is defined by reftable/test_framework.{c, h}. This is fine to do as both these functions are similarly implemented, and reftable/test_framework.{c, h} is not #included in the ported test. Get rid of reftable_record_print() from the tests as well, because it clutters the test framework's output and we have no way of verifying the output. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Acked-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-01  t0612: mark as leak-free  (Rubén Justo; 1 file, -0/+1)
A quick test tells us that t0612 does not trigger any leak: $ make SANITIZE=leak test GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_SANITIZE_LEAK_LOG=true GIT_TEST_OPTS=-i T=t0612-reftable-jgit-compatibility.sh [...] *** t0612-reftable-jgit-compatibility.sh *** in GIT_TEST_PASSING_SANITIZE_LEAK=check mode, setting --invert-exit-code for TEST_PASSES_SANITIZE_LEAK != true ok 1 - CGit repository can be read by JGit ok 2 - JGit repository can be read by CGit ok 3 - mixed writes from JGit and CGit ok 4 - JGit can read multi-level index # passed all 4 test(s) 1..4 # faking up non-zero exit with --invert-exit-code make[2]: *** [Makefile:75: t0612-reftable-jgit-compatibility.sh] Error 1 Let's mark it as leak-free to silence the machinery activated by `GIT_TEST_PASSING_SANITIZE_LEAK=check`. Reported-by: Jeff King <peff@peff.net> Signed-off-by: Rubén Justo <rjusto@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-01  test-lib: fix GIT_TEST_SANITIZE_LEAK_LOG  (Rubén Justo; 1 file, -1/+4)
When a test that leaks runs with GIT_TEST_SANITIZE_LEAK_LOG=true, the test returns zero, which is not what we want. In the if-else's chain we have in "check_test_results_san_file_", we consider three variables: $passes_sanitize_leak, $sanitize_leak_check and, implicitly, GIT_TEST_SANITIZE_LEAK_LOG (always set to "true" at that point). For the first two variables we have different considerations depending on the value of $test_failure, which makes sense. However, for the third, GIT_TEST_SANITIZE_LEAK_LOG, we don't; regardless of $test_failure, we use "invert_exit_code=t" to produce a non-zero return value. That assumes "$test_failure" is always zero at that point. But it may not be: $ git checkout v2.40.1 $ make test SANITIZE=leak T=t3200-branch.sh # this fails $ make test SANITIZE=leak GIT_TEST_SANITIZE_LEAK_LOG=true T=t3200-branch.sh # this succeeds [...] With GIT_TEST_SANITIZE_LEAK_LOG=true, our logs revealed a memory leak, exiting with a non-zero status! # faked up failures as TODO & now exiting with 0 due to --invert-exit-code We need to use "invert_exit_code=t" only when "$test_failure" is zero. Let's add the missing conditions in the if-else's chain to make it work as expected. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Rubén Justo <rjusto@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-01  t0613: mark as leak-free  (Rubén Justo; 1 file, -0/+1)
We can mark t0613 as leak-free: $ make test SANITIZE=leak GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_SANITIZE_LEAK_LOG=true T=t0613-reftable-write-options.sh [...] *** t0613-reftable-write-options.sh *** in GIT_TEST_PASSING_SANITIZE_LEAK=check mode, setting --invert-exit-code for TEST_PASSES_SANITIZE_LEAK != true ok 1 - default write options ok 2 - disabled reflog writes no log blocks ok 3 - many refs results in multiple blocks ok 4 - tiny block size leads to error ok 5 - small block size leads to multiple ref blocks ok 6 - small block size fails with large reflog message ok 7 - block size exceeding maximum supported size ok 8 - restart interval at every single record ok 9 - restart interval exceeding maximum supported interval ok 10 - object index gets written by default with ref index ok 11 - object index can be disabled # passed all 11 test(s) 1..11 # faking up non-zero exit with --invert-exit-code make[2]: *** [Makefile:75: t0613-reftable-write-options.sh] Error 1 Do it. Signed-off-by: Rubén Justo <rjusto@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-01  git-send-email: use sanitized address when reading mbox body  (Csókás, Bence; 1 file, -0/+43)
Addresses that are mentioned on the trailers in the commit log messages (e.g., "Reviewed-by") are added to the "Cc:" list by "git send-email". These hand-written addresses, however, may be malformed (e.g., having unquoted "." and other punctuation marks in the display-name part) and can upset the MTA. The code does use the sanitize_address() helper on these address-looking strings to turn them into valid addresses, but it is used only to see if the address should be suppressed. The original string taken from the message is added to the @cc list if the code decides the address is not suppressed. Because the addresses on trailer lines are hand-written and more likely to contain malformed addresses, when adding to the @cc list, use the result from sanitize_address, not the original. Note that we do not modify the behaviour for addresses taken from the e-mail headers, as they are more likely to be machine generated and well-formed. Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
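As an illustration (name and address are hypothetical): a trailer whose display-name contains an unquoted comma used to be copied to the Cc: list verbatim, while the sanitized form quotes it so that the address is not misread by the MTA:

    Reviewed-by: Doe, Jane <jane@example.com>
    # before:  Cc: Doe, Jane <jane@example.com>     (risks being parsed as two recipients)
    # after:   Cc: "Doe, Jane" <jane@example.com>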
2024-06-28  Merge branch 'ds/ahead-behind-fix' into maint-2.45  (Junio C Hamano; 1 file, -1/+1)
Fix for a progress bar. * ds/ahead-behind-fix: commit-graph: increment progress indicator
2024-06-28  Merge branch 'jk/cap-exclude-file-size' into maint-2.45  (Junio C Hamano; 2 files, -0/+20)
An overly large ".gitignore" files are now rejected silently. * jk/cap-exclude-file-size: dir.c: reduce max pattern file size to 100MB dir.c: skip .gitignore, etc larger than INT_MAX
2024-06-28  Merge branch 'jc/safe-directory-leading-path' into maint-2.45  (Junio C Hamano; 1 file, -0/+15)
The safe.directory configuration knob has been updated to optionally allow leading path matches. * jc/safe-directory-leading-path: safe.directory: allow "lead/ing/path/*" match
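A configuration sketch of the new matching mode (the path is illustrative):

    # trust every repository under /home/alice/repos, not just one exact path
    [safe]
        directory = /home/alice/repos/*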
2024-06-28  Merge branch 'ps/fix-reinit-includeif-onbranch' into maint-2.45  (Junio C Hamano; 1 file, -8/+93)
"git init" in an already created directory, when the user configuration has includeif.onbranch, started to fail recently, which has been corrected. * ps/fix-reinit-includeif-onbranch: setup: fix bug with "includeIf.onbranch" when initializing dir
2024-06-28  Merge branch 'es/chainlint-ncores-fix' into maint-2.45  (Junio C Hamano; 1 file, -3/+17)
The chainlint script (invoked during "make test") did nothing when it failed to detect the number of available CPUs. It now falls back to 1 CPU to avoid the problem.
* es/chainlint-ncores-fix:
  chainlint.pl: latch CPU count directly reported by /proc/cpuinfo
  chainlint.pl: fix incorrect CPU count on Linux SPARC
  chainlint.pl: make CPU count computation more robust
2024-06-28  Merge branch 'mt/t0211-typofix' into maint-2.45  (Junio C Hamano; 1 file, -1/+1)
Test fix. * mt/t0211-typofix: t/t0211-trace2-perf.sh: fix typo patern -> pattern
2024-06-28  Merge branch 'ds/scalar-reconfigure-all-fix' into maint-2.45  (Junio C Hamano; 1 file, -0/+38)
Scalar fix. * ds/scalar-reconfigure-all-fix: scalar: avoid segfault in reconfigure --all
2024-06-28  Merge branch 'tb/attr-limits' into maint-2.45  (Junio C Hamano; 1 file, -0/+10)
The maximum size of attribute files is enforced more consistently. * tb/attr-limits: attr.c: move ATTR_MAX_FILE_SIZE check into read_attr_from_buf()
2024-06-28Merge branch 'bc/zsh-compatibility' into maint-2.45Junio C Hamano1-7/+9
zsh can pretend to be a normal shell pretty well except for some glitches that we tickle in some of our scripts. Work them around so that "vimdiff" and our test suite work well enough with it. * bc/zsh-compatibility: vimdiff: make script and tests work with zsh t4046: avoid continue in &&-chain for zsh
2024-06-28Merge branch 'js/for-each-repo-keep-going' into maint-2.45Junio C Hamano2-3/+19
A scheduled "git maintenance" job is expected to work on all repositories it knows about, but it stopped at the first one that errored out. Now it keeps going. * js/for-each-repo-keep-going: maintenance: running maintenance should not stop on errors for-each-repo: optionally keep going on an error
2024-06-28Merge branch 'aj/stash-staged-fix' into maint-2.45Junio C Hamano1-0/+9
"git stash -S" did not handle binary files correctly, which has been corrected. * aj/stash-staged-fix: stash: fix "--staged" with binary files
2024-06-28Merge branch 'xx/disable-replace-when-building-midx' into maint-2.45Junio C Hamano1-0/+21
The procedure to build multi-pack-index got confused by the replace-refs mechanism, which has been corrected by disabling the latter. * xx/disable-replace-when-building-midx: midx: disable replace objects
2024-06-28Merge branch 'pw/rebase-m-signoff-fix' into maint-2.45Junio C Hamano2-15/+77
"git rebase --signoff" used to forget that it needs to add a sign-off to the resulting commit when told to continue after a conflict stops its operation. * pw/rebase-m-signoff-fix: rebase -m: fix --signoff with conflicts sequencer: store commit message in private context sequencer: move current fixups to private context sequencer: start removing private fields from public API sequencer: always free "struct replay_opts"
2024-06-27Merge branch 'jk/fetch-pack-fsck-wo-lock-pack'Junio C Hamano1-0/+10
"git fetch-pack -k -k" without passing "--lock-pack" (which we never do ourselves) did not work at all, which has been corrected. * jk/fetch-pack-fsck-wo-lock-pack: fetch-pack: fix segfault when fscking without --lock-pack
2024-06-27Merge branch 'jk/t5500-typofix'Junio C Hamano1-1/+1
A helper function shared between two tests had a copy-paste bug, which has been corrected. * jk/t5500-typofix: t5500: fix mistaken $SERVER reference in helper function
2024-06-27Merge branch 'kz/merge-fail-early-upon-refresh-failure'Junio C Hamano1-0/+10
When "git merge" sees that the index cannot be refreshed (e.g. due to another process doing the same in the background), it died but after writing MERGE_HEAD etc. files, which was useless for the purpose to recover from the failure. * kz/merge-fail-early-upon-refresh-failure: merge: avoid write merge state when unable to write index
2024-06-26t/lib-bundle-uri: use local fake bundle URLsJeff King1-2/+2
A few of the bundle URI tests point config at a fake bundle; they care only that the client has been configured with _some_ bundle, but it doesn't have to actually contain objects. For the file:// tests, we use "$BUNDLE_URI_REPO_URI/fake.bdl", a non-existent file inside the actual remote repo. But for git:// and http:// tests, we use "https://example.com/fake.bdl". This works OK in practice, but it means we actually make a request to example.com (which returns a placeholder HTML response). That can be annoying when running the test suite on a spotty network (it doesn't produce a wrong result, since we expect it to fail, but it may introduce delays). We can reduce our dependency on the outside world by using a local URL. It would work to just do "file://$PWD/fake.bdl" here, since the bundle code does not care about the actual location. But in the long run I suspect we may have more restrictions on which protocols can be passed around as bundle URIs. So instead, let's stick with the file:// repo's pattern and just point to a bogus name based on the remote repo's URL. For http this makes perfect sense; we'll make a request to the local http server and find that there's nothing there. For git:// it's a little weird, as you wouldn't normally access a bundle file over git:// at all. But it's probably the most reasonable guess we can make for now, and anybody who tightens protocol selection later will know better what's the best path forward. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
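A rough sketch of the resulting pattern (paths here are placeholders, and the bundle deliberately does not exist); the clone is only expected to record the bundle URI, not to fetch real objects from it:

    git clone --bundle-uri="file://$PWD/server.git/fake.bdl" \
        "file://$PWD/server.git" client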
2024-06-26t5551: do not confirm that bogus url cannot be usedJeff King1-1/+0
t5551 tries to access a URL with a bogus hostname and confirms that http.curloptResolve lets us use this otherwise unresolvable name. Before doing so, though, we confirm that trying to access the bogus hostname without http.curloptResolve fails as expected. This isn't testing Git at all, but is confirming the test's assumptions. That's often a good thing to do, but in this case it means that we'll actually try to resolve the external name. Even though it's unlikely that "gitbogusexamplehost.invalid" would ever resolve, the DNS lookup itself may take time. It's probably reasonable to just assume that this obviously-bogus name would not actually resolve in practice, which lets us reduce our test suite's dependency on the outside world. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-26t5553: use local url for invalid fetchJeff King1-4/+4
We test how "fetch --set-upstream" behaves when given an invalid URL, using the bogus URL "http://nosuchdomain.example.com". But finding out that it is invalid requires an actual DNS lookup. Reduce our dependency on external factors by using an invalid local filesystem URL, which works just as well for our purposes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-26describe: refresh the index when 'broken' flag is usedAbhijeet Sonar1-0/+36
When describe is run with the 'dirty' flag, we refresh the index to make sure it is in sync with the filesystem before determining if the working tree is dirty. However, this is not done for the codepath where the 'broken' flag is used. This causes `git describe --broken --dirty` to falsely report the worktree as dirty if a file has different stat info than what is recorded in the index. Running `git update-index -q --refresh` to refresh the index before running diff-index fixes the problem. Also add tests that deliberately update the stat info of a file before running describe to verify it behaves correctly. Reported-by: Paul Millar <paul.millar@desy.de> Suggested-by: Junio C Hamano <gitster@pobox.com> Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Abhijeet Sonar <abhijeet.nkt@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
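A sketch of the scenario being tested, assuming a repository with an annotated tag and a committed file "tracked-file"; only the stat information changes, not the contents:

    touch tracked-file &&            # mtime changes, content does not
    git describe --broken --dirty    # must no longer report "-dirty"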
2024-06-25date: detect underflow/overflow when parsing dates with timezone offsetDarcy Burke1-0/+33
Overriding the date of a commit to be close to "1970-01-01 00:00:00" with a large enough positive timezone for the equivalent GMT time to be before the epoch is considered valid by `parse_date_basic`. Similar behaviour occurs when using a date close to "2099-12-31 23:59:59" (the maximum date allowed by `tm_to_time_t`) with a large enough negative timezone offset. This leads to an integer underflow or overflow respectively in the commit timestamp, which is not caught by `git-commit`, but will cause other services to fail, such as `git-fsck`, which, for the first case, reports "badDateOverflow: invalid author/committer line - date causes integer overflow". Instead check the timezone offset and fail if the resulting time comes before the epoch "1970-01-01T00:00:00Z" or after the maximum date "2099-12-31T23:59:59Z". Using the REQUIRE_64BIT_TIME prerequisite, make sure that the tests near the end of Git time (aka end of year 2099) are not attempted on purely 32-bit systems, as they cannot express a timestamp beyond 2038 anyway. Signed-off-by: Darcy Burke <acednes@gmail.com> [jc: fixups for 32-bit platforms] Signed-off-by: Junio C Hamano <gitster@pobox.com>
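As an example of the first case (a hedged sketch; the exact failure mode depends on the caller), 00:30 at UTC+05:00 falls on 1969-12-31 in GMT, i.e. before the epoch, and such a date is now rejected at parse time:

    GIT_COMMITTER_DATE='1970-01-01 00:30:00 +0500' \
    git commit --allow-empty --date='1970-01-01 00:30:00 +0500' -m underflow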
2024-06-25t0006: simplify prerequisitesJunio C Hamano1-6/+12
The system must support 64-bit time and its time_t must be 64-bit wide to pass these tests. Combine these two prerequisites together to simplify the tests. In theory, they could be fulfilled independently and tests could require only one without the other, but in practice, these must come hand-in-hand. Update the "check_parse" test helper to pay attention to the REQUIRE_64BIT_TIME variable, which can be set to the HAVE_64BIT_TIME prerequisite so that a parse test can be skipped on 32-bit systems. This will be used in the next step to skip tests for timestamps near the end of year 2099, as 32-bit systems will not be able to express a timestamp beyond 2038 anyway. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-25commit-graph: reuse existing Bloom filters where possibleTaylor Blau1-1/+38
In an earlier commit, a bug was described where it's possible for Git to produce non-murmur3 hashes when the platform's "char" type is signed, and there are paths with characters whose highest bit is set (i.e. all characters >= 0x80). That patch allows the caller to control which version of Bloom filters are read and written. However, even on platforms with a signed "char" type, it is possible to reuse existing Bloom filters if and only if there are no changed paths in any commit's first parent tree-diff whose characters have their highest bit set. When this is the case, we can reuse the existing filter without having to compute a new one. This is done by marking trees which are known to have (or not have) any such paths. When a commit's root tree is verified to not have any such paths, we mark it as such and declare that the commit's Bloom filter is reusable. Note that this heuristic only goes in one direction. If neither a commit nor its first parent have any paths in their trees with non-ASCII characters, then we know for certain that a path with non-ASCII characters will not appear in a tree-diff against that commit's first parent. The reverse isn't necessarily true: just because the tree-diff doesn't contain any such paths does not imply that no such paths exist in either tree. So we end up recomputing some Bloom filters that we don't strictly have to (i.e. their bits are the same no matter which version of murmur3 we use). But culling these out is impossible, since we'd have to perform the full tree-diff, which is the same effort as computing the Bloom filter from scratch. But because we can cache our results in each tree's flag bits, we can often avoid recomputing many filters, thereby reducing the time it takes to run $ git commit-graph write --changed-paths --reachable when upgrading from v1 to v2 Bloom filters. To benchmark this, let's generate a commit-graph in linux.git with v1 changed-paths in generation order[^1]: $ git clone git@github.com:torvalds/linux.git $ cd linux $ git commit-graph write --reachable --changed-paths $ graph=".git/objects/info/commit-graph" $ mv $graph{,.bak} Then let's time how long it takes to go from v1 to v2 filters (with and without the upgrade path enabled), resetting the state of the commit-graph each time: $ git config commitGraph.changedPathsVersion 2 $ hyperfine -p 'cp -f $graph.bak $graph' -L v 0,1 \ 'GIT_TEST_UPGRADE_BLOOM_FILTERS={v} git.compile commit-graph write --reachable --changed-paths' On linux.git (where there aren't any non-ASCII paths), the timings indicate that this patch represents a speed-up over recomputing all Bloom filters from scratch: Benchmark 1: GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths Time (mean ± σ): 124.873 s ± 0.316 s [User: 124.081 s, System: 0.643 s] Range (min … max): 124.621 s … 125.227 s 3 runs Benchmark 2: GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths Time (mean ± σ): 79.271 s ± 0.163 s [User: 74.611 s, System: 4.521 s] Range (min … max): 79.112 s … 79.437 s 3 runs Summary 'GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths' ran 1.58 ± 0.01 times faster than 'GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths' On git.git, we do have some non-ASCII paths, giving us a more modest improvement from 4.163 seconds to 3.348 seconds, for a 1.24x speed-up. 
On my machine, the stats for git.git are: - 8,285 Bloom filters computed from scratch - 10 Bloom filters generated as empty - 4 Bloom filters generated as truncated due to too many changed paths - 65,114 Bloom filters were reused when transitioning from v1 to v2. [^1]: Note that this is important, since `--stdin-packs` or `--stdin-commits` orders commits in the commit-graph by their pack position (with `--stdin-packs`) or in the raw input (with `--stdin-commits`). Since we compute Bloom filters in the same order that commits appear in the graph, we must see a commit's (first) parent before we process the commit itself. This is only guaranteed to happen when sorting commits by their generation number. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-25commit-graph: new Bloom filter version that fixes murmur3Taylor Blau3-4/+168
The murmur3 implementation in bloom.c has a bug when converting a series of 4 bytes into network-order integers when char is signed (which is controllable by a compiler option, and the default signedness of char is platform-specific). When a string contains characters with the high bit set, this bug causes results that, although internally consistent within Git, do not accord with other implementations of murmur3 (thus, the changed path filters wouldn't be readable by other off-the-shelf implementations of murmur3) and even with Git binaries that were compiled with different signedness of char. This bug affects both how Git writes changed path filters to disk and how Git interprets changed path filters on disk. Therefore, introduce a new version (2) of changed path filters that corrects this problem. The existing version (1) is still supported and is still the default, but users should migrate away from it as soon as possible. Because this bug only manifests with characters that have the high bit set, it may be possible that some (or all) commits in a given repo would have the same changed path filter both before and after this fix is applied. However, in order to determine whether this is the case, the changed paths would first have to be computed, at which point it is not much more expensive to just compute a new changed path filter. So this patch does not include any mechanism to "salvage" changed path filters from repositories. There is also no "mixed" mode - for each invocation of Git, reading and writing changed path filters are done with the same version number; this version number may be explicitly stated (typically if the user knows which version they need) or automatically determined from the version of the existing changed path filters in the repository. There is a change in write_commit_graph(). graph_read_bloom_data() makes it possible for chunk_bloom_data to be non-NULL but bloom_filter_settings to be NULL, which causes a segfault later on. I produced such a segfault while developing this patch, but couldn't find a way to reproduce it either after this complete patch (or before it); in any case it seemed like a good thing to include that might help future patch authors. The value in t0095 was obtained from another murmur3 implementation using the following Go source code: package main import "fmt" import "github.com/spaolacci/murmur3" func main() { fmt.Printf("%x\n", murmur3.Sum32([]byte("Hello world!"))) fmt.Printf("%x\n", murmur3.Sum32([]byte{0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff})) } Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-25t4216: test changed path filters with high bit pathsTaylor Blau1-0/+51
Subsequent commits will teach Git another version of changed path filter that has different behavior with paths that contain at least one character with its high bit set, so test the existing behavior as a baseline. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-25t/helper/test-read-graph: implement `bloom-filters` modeTaylor Blau1-5/+37
Implement a mode of the "read-graph" test helper to dump out the hexadecimal contents of the Bloom filter(s) contained in a commit-graph. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-25t/helper/test-read-graph.c: extract `dump_graph_info()`Taylor Blau1-13/+18
Prepare for the 'read-graph' test helper to perform other tasks besides dumping high-level information about the commit-graph by extracting its main routine into a separate function. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-25commit-graph: ensure Bloom filters are read with consistent settingsTaylor Blau1-1/+67
The changed-path Bloom filter mechanism is parameterized by a couple of variables, notably the number of bits per hash (typically "m" in Bloom filter literature) and the number of hashes themselves (typically "k"). It is critically important that filters are read with the Bloom filter settings that they were written with. Failing to do so would mean that each query is liable to compute different fingerprints, meaning that the filter itself could return a false negative. This goes against a basic assumption of using Bloom filters (that they may return false positives, but never false negatives) and can lead to incorrect results. We have some existing logic to carry forward existing Bloom filter settings from one layer to the next. In `write_commit_graph()`, we have something like: if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) { struct commit_graph *g = ctx->r->objects->commit_graph; /* We have changed-paths already. Keep them in the next graph */ if (g && g->chunk_bloom_data) { ctx->changed_paths = 1; ctx->bloom_settings = g->bloom_filter_settings; } } , which drags forward Bloom filter settings across adjacent layers. This doesn't quite address all cases, however, since it is possible for intermediate layers to contain no Bloom filters at all. For example, suppose we have two layers in a commit-graph chain, say, {G1, G2}. If G1 contains Bloom filters, but G2 doesn't, a new G3 (whose base graph is G2) may be written with arbitrary Bloom filter settings, because we only check the immediately adjacent layer's settings for compatibility. This behavior has existed since the introduction of changed-path Bloom filters. But in practice, this is not such a big deal, since the only way up until this point to modify the Bloom filter settings at write time is with the undocumented environment variables: - GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY - GIT_TEST_BLOOM_SETTINGS_NUM_HASHES - GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS (it is still possible to tweak MAX_CHANGED_PATHS between layers, but this does not affect reads, so is allowed to differ across multiple graph layers). But in future commits, we will introduce another parameter to change the hash algorithm used to compute Bloom fingerprints itself. This will be exposed via a configuration setting, making this foot-gun easier to use. To prevent this potential issue, validate that all layers of a split commit-graph have compatible settings with the newest layer which contains Bloom filters. Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Original-test-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
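A rough sketch of the three-layer scenario described above, using the test-only knobs listed in the message; it assumes new commits are created between the writes so that each invocation adds a layer:

    git commit-graph write --reachable --changed-paths --split=no-merge     # G1: has filters
    test_commit one
    git commit-graph write --reachable --no-changed-paths --split=no-merge  # G2: no filters
    test_commit two
    GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=3 \
    git commit-graph write --reachable --changed-paths --split=no-merge     # G3: settings now checked against G1, not just G2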
2024-06-25revision.c: consult Bloom filters for root commitsTaylor Blau1-2/+6
The commit-graph stores changed-path Bloom filters which represent the set of paths included in a tree-level diff between a commit's root tree and that of its parent. When a commit has no parents, the tree-diff is computed against that commit's root tree and the empty tree. In other words, every path in that commit's tree is stored in the Bloom filter (since they all appear in the diff). Consult these filters during pathspec-limited traversals in the function `rev_same_tree_as_empty()`. Doing so yields a performance improvement where we can avoid enumerating the full set of paths in a parentless commit's root tree when we know that the path(s) of interest were not listed in that commit's changed-path Bloom filter. Suggested-by: SZEDER Gábor <szeder.dev@gmail.com> Original-patch-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-25t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`Taylor Blau1-1/+13
The existing implementation of test_bloom_filters_not_used() asserts that the Bloom filter sub-system has not been initialized at all, by checking for the absence of any data from it in the trace2 output. In the following commit, it will become possible to load Bloom filters without using them (e.g., because the `commitGraph.changedPathsVersion` introduced later in this series is incompatible with the hash version with which the commit-graph's Bloom filters were written). When this is the case, it's possible to initialize the Bloom filter sub-system, while still not using any Bloom filters. When that happens, check that the data dump from the Bloom sub-system is all zeros, indicating that no filters were used. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-25pager: die when paging to non-existing commandRubén Justo1-12/+5
When trying to execute a non-existent program from GIT_PAGER, we display an error. However, we also send the complete text to the terminal and return a successful exit code. This can be confusing for the user and the displayed error could easily become obscured by a lengthy text. For example, here the error message would be very far above after sending 50 MB of text: $ GIT_PAGER=non-existent t/test-terminal.perl git log | wc -c error: cannot run non-existent: No such file or directory 50314363 Let's make the error clear by aborting the process and return an error so that the user can easily correct their mistake. This will be the result of the change: $ GIT_PAGER=non-existent t/test-terminal.perl git log | wc -c error: cannot run non-existent: No such file or directory fatal: unable to execute pager 'non-existent' 0 The behavior change we're introducing in this commit affects two tests in t7006, which is a good sign regarding test coverage and requires us to address it. The first test is 'git skips paging non-existing command'. This test comes from f7991f01f2 (t7006: clean up SIGPIPE handling in trace2 tests, 2021-11-21,) where a modification was made to a test that was originally introduced in c24b7f6736 (pager: test for exit code with and without SIGPIPE, 2021-02-02). That original test was, IMHO, in the same direction we're going in this commit. At any rate, this test obviously needs to be adjusted to check the new behavior we are introducing. Do it. The second test being affected is: 'non-existent pager doesnt cause crash', introduced in f917f57f40 (pager: fix crash when pager program doesn't exist, 2021-11-24). As its name states, it has the intention of checking that we don't introduce a regression that produces a crash when GIT_PAGER points to a nonexistent program. This test could be considered redundant nowadays, due to us already having several tests checking implicitly what a non-existent command in GIT_PAGER produces. However, let's maintain a good belt-and-suspenders strategy; adapt it to the new world. Finally, it's worth noting that we are not changing the behavior if the command specified in GIT_PAGER is a shell command. In such cases, it is: $ GIT_PAGER=:\;non-existent t/test-terminal.perl git log :;non-existent: 1: non-existent: not found died of signal 13 at t/test-terminal.perl line 33. Signed-off-by: Rubén Justo <rjusto@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-24Merge branch 'tb/commit-graph-use-tempfile'Junio C Hamano1-1/+25
"git update-server-info" and "git commit-graph --write" have been updated to use the tempfile API to avoid leaving cruft after failing. * tb/commit-graph-use-tempfile: server-info.c: remove temporary info files on exit commit-graph.c: remove temporary graph layers on exit
2024-06-24Merge branch 'jc/add-i-retire-usebuiltin-config'Junio C Hamano1-15/+0
For over a year, setting the add.interactive.useBuiltin configuration variable did nothing but give a "this does not do anything" warning. Finally remove it. * jc/add-i-retire-usebuiltin-config: add-i: finally retire add.interactive.useBuiltin
2024-06-24Merge branch 'tb/precompose-getcwd'Junio C Hamano1-1/+38
We forgot to normalize the result of getcwd() to NFC on macOS where all other paths are normalized, which has been corrected. This still does not address the case where core.precomposeUnicode configuration is not defined globally. * tb/precompose-getcwd: macOS: ls-files path fails if path of workdir is NFD
2024-06-24Merge branch 'tb/pseudo-merge-reachability-bitmap'Junio C Hamano4-8/+460
Pseudo-merge reachability bitmaps have been added to allow more efficient storage of the reachability bitmap in repositories with very many refs. * tb/pseudo-merge-reachability-bitmap: (26 commits) pack-bitmap.c: ensure pseudo-merge offset reads are bounded Documentation/technical/bitmap-format.txt: add missing position table t/perf: implement performance tests for pseudo-merge bitmaps pseudo-merge: implement support for finding existing merges ewah: `bitmap_equals_ewah()` pack-bitmap: extra trace2 information pack-bitmap.c: use pseudo-merges during traversal t/test-lib-functions.sh: support `--notick` in `test_commit_bulk()` pack-bitmap: implement test helpers for pseudo-merge ewah: implement `ewah_bitmap_popcount()` pseudo-merge: implement support for reading pseudo-merge commits pack-bitmap.c: read pseudo-merge extension pseudo-merge: scaffolding for reads pack-bitmap: extract `read_bitmap()` function pack-bitmap-write.c: write pseudo-merge table pseudo-merge: implement support for selecting pseudo-merge commits config: introduce `git_config_double()` pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` pack-bitmap-write: support storing pseudo-merge commits ...
2024-06-24diff: allow --color-moved with --no-ext-diffRené Scharfe1-0/+9
We ignore the option --color-moved if an external diff program is configured, presumably because its overhead is unnecessary in that case. Respect the option if we don't actually use the external diff, though. Reported-by: lolligerhans@gmx.de Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
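For instance (the external tool name is a placeholder), only the second invocation now honors the option, because the built-in diff is what actually runs:

    git -c diff.external=my-diff-tool diff --color-moved                # external diff runs; option ignored
    git -c diff.external=my-diff-tool diff --no-ext-diff --color-moved  # built-in diff; option respected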
2024-06-20Merge branch 'jc/heads-are-branches'Junio C Hamano2-13/+43
The "--heads" option of "ls-remote" and "show-ref" has been been deprecated; "--branches" replaces "--heads". * jc/heads-are-branches: show-ref: introduce --branches and deprecate --heads ls-remote: introduce --branches and deprecate --heads refs: call branches branches
2024-06-20Merge branch 'pw/rebase-i-error-message'Junio C Hamano1-0/+45
When the user adds an instruction to "pick" a merge commit to a "git rebase -i" todo list, the error experience is not pleasant. Such an error is now caught earlier in the process that parses the todo list. * pw/rebase-i-error-message: rebase -i: improve error message when picking merge rebase -i: pass struct replay_opts to parse_insn_line()
2024-06-20Merge branch 'ds/ahead-behind-fix'Junio C Hamano1-1/+1
Fix for a progress bar. * ds/ahead-behind-fix: commit-graph: increment progress indicator
2024-06-20Merge branch 'ps/abbrev-length-before-setup-fix'Junio C Hamano2-0/+31
Setting core.abbrev too early before the repository set-up (typically in "git clone") caused a segfault, which has been corrected. * ps/abbrev-length-before-setup-fix: object-name: don't try to abbreviate to lengths greater than hexsz parse-options-cb: stop clamping "--abbrev=" to hash length config: fix segfault when parsing "core.abbrev" without repo
2024-06-20Merge branch 'rj/format-patch-auto-cover-with-interdiff'Junio C Hamano2-5/+34
"git format-patch --interdiff" for multi-patch series learned to turn on cover letters automatically (unless told never to enable cover letter with "--no-cover-letter" and such). * rj/format-patch-auto-cover-with-interdiff: format-patch: assume --cover-letter for diff in multi-patch series t4014: cleanups in a few tests
2024-06-20Merge branch 'kn/update-ref-symref'Junio C Hamano4-3/+515
"git update-ref --stdin" learned to handle transactional updates of symbolic-refs. * kn/update-ref-symref: update-ref: add support for 'symref-update' command reftable: pick either 'oid' or 'target' for new updates update-ref: add support for 'symref-create' command update-ref: add support for 'symref-delete' command update-ref: add support for 'symref-verify' command refs: specify error for regular refs with `old_target` refs: create and use `ref_update_expects_existing_old_ref()`
2024-06-20Merge branch 'gt/unit-test-oidtree'Junio C Hamano7-106/+191
"oidtree" tests were rewritten to use the unit test framework. * gt/unit-test-oidtree: t/: migrate helper/test-oidtree.c to unit-tests/t-oidtree.c
2024-06-20Merge branch 'tb/multi-pack-reuse-fix'Junio C Hamano2-0/+56
Assorted fixes to multi-pack-index code paths. * tb/multi-pack-reuse-fix: pack-revindex.c: guard against out-of-bounds pack lookups pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse midx-write.c: do not read existing MIDX with `packs_to_include`
2024-06-20Merge branch 'rs/diff-exit-code-with-external-diff'Junio C Hamano1-0/+66
"git diff --exit-code --ext-diff" learned to take the exit status of the external diff driver into account when deciding the exit status of the overall "git diff" invocation when configured to do so. * rs/diff-exit-code-with-external-diff: diff: let external diffs report that changes are uninteresting userdiff: add and use struct external_diff t4020: test exit code with external diffs
2024-06-20t5500: fix mistaken $SERVER reference in helper functionJeff King1-1/+1
The end of t5500 contains two tests which use a single helper function, fetch_filter_blob_limit_zero(). It takes a parameter to point to the path of the server repository, which we store locally as $SERVER. The first caller uses the relative path "server", while the second points into the httpd document root. Commit 07ef3c6604 (fetch test: use more robust test for filtered objects, 2019-12-23) refactored some lines, but accidentally switched "$SERVER" to "server" in one spot. That means the second caller is looking at the server directory from the previous test rather than its own. This happens to work out because the "server" directory from the first test is still hanging around, and the contents of the two are identical. But it was clearly not the intended behavior, and is fragile to cleaning up the leftovers from the first test. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20fetch-pack: fix segfault when fscking without --lock-packJeff King1-0/+10
The fetch-pack internals have multiple options related to creating ".keep" lock-files for the received pack: - if args.lock_pack is set, then we tell index-pack to create a .keep file. In the fetch-pack plumbing command, this is triggered by passing "-k" twice. - if the caller passes in a pack_lockfiles string list, then we use it to record the path of the keep-file created by index-pack. We get that name by reading the stdout of index-pack. In the fetch-pack command, this is triggered by passing the (undocumented) --lock-pack option; without it, we pass in a NULL string list. So it's possible to ask index-pack to create the lock-file (using "-k -k") but not ask to record it (by avoiding "--lock-pack"). This worked fine until 5476e1efde (fetch-pack: print and use dangling .gitmodules, 2021-02-22), but now it causes a segfault. Before that commit, if pack_lockfiles was NULL, we wouldn't bother reading the output from index-pack at all. But since that commit, index-pack may produce extra output if we asked it to fsck. So even if nobody cares about the lockfile path, we still need to read it to skip to the output we do care about. We correctly check that we didn't get a NULL lockfile path (which can happen if we did not ask it to create a .keep file at all), but we missed the case where the lockfile path is not NULL (due to "-k -k") but the pack_lockfiles string_list is NULL (because nobody passed "--lock-pack"), and segfault trying to add to the NULL string-list. We can fix this by skipping the append to the string list when either the value or the list is NULL. In that case we must also free the lockfile path to avoid leaking it when it's non-NULL. Nobody noticed the bug for so long because the transport code used by "git fetch" always passes in a pack_lockfiles pointer, and remote-curl (the main user of the fetch-pack plumbing command) always passes --lock-pack. Reported-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
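A hedged sketch of the reproduction ("../parent.git" stands in for any repository that can be fetched from); "-k -k" makes index-pack keep the pack, while the undocumented "--lock-pack" is deliberately omitted:

    # with fscking enabled this used to segfault before the fix
    git -c fetch.fsckObjects=true fetch-pack -k -k ../parent.git refs/heads/main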
2024-06-20merge-ort: convert more error() cases to path_msg()Elijah Newren1-1/+1
merge_submodule() stores errors using path_msg(), whereas other call sites make use of the error() function. This is inconsistent, and moving towards path_msg() seems more friendly for libification efforts since it will allow the caller to determine whether the error messages need to be printed. Note that this deferred handling of error messages changes the error message in a recursive merge from error: failed to execute internal merge to From inner merge: error: failed to execute internal merge which provides a little more information about the error which may be useful. Since the recursive merge strategy still only shows the older error, we had to adjust the new testcase introduced a few commits ago to just search for the older message somewhere in the output. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20merge-ort: maintain expected invariant for priv memberElijah Newren1-1/+41
The calling convention for the merge machinery is One call to init_merge_options() One or more calls to merge_incore_[non]recursive() One call to merge_finalize() (possibly indirectly via merge_switch_to_result()) Both merge_switch_to_result() and merge_finalize() expect opt->priv == NULL && result->priv != NULL which is supposed to be set up by our move_opt_priv_to_result_priv() function. However, two codepaths dealing with error cases did not execute this necessary logic, which could result in assertion failures (or, if assertions were compiled out, could result in segfaults). Fix the oversight and add a test that would have caught one of these problems. While at it, also tighten an existing test for a non-recursive merge to verify that it fails with appropriate status. Most merge tests in the testsuite check either for success or conflicts; those testing for neither are rare and it is good to ensure they support the invariant assumed by builtin/merge.c in this comment: /* * The backend exits with 1 when conflicts are * left to be resolved, with 2 when it does not * handle the given merge at all. */ So, explicitly check for the exit status of 2 in these cases. Reported-by: Matt Cree <matt.cree@gearset.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
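In test terms, the tightened assertion takes roughly this shape (the setup that makes the merge unhandleable is elided, and "theirs" is a placeholder branch name):

    # a merge the backend cannot handle at all must exit with status 2
    test_expect_code 2 git merge theirs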
2024-06-20unbundle: extend object verification for fetchesXing Xin2-1/+68
The existing fetch.fsckObjects and transfer.fsckObjects configurations were not fully applied to bundle-involved fetches, including direct bundle fetches and bundle-uri enabled fetches. Furthermore, there was no object verification support for unbundle. This commit extends object verification support in `bundle.c:unbundle` by adding the `VERIFY_BUNDLE_FSCK` option to `verify_bundle_flags`. When this option is enabled, we append the `--fsck-objects` flag to `git-index-pack`. The `VERIFY_BUNDLE_FSCK` option is now used by bundle-involved fetches, where we use `fetch-pack.c:fetch_pack_fsck_objects` to determine whether to enable this option for `bundle.c:unbundle`, specifically in: - `transport.c:fetch_refs_from_bundle` for direct bundle fetches. - `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches. This addition ensures a consistent logic for object verification during fetches. Tests have been added to confirm functionality in the scenarios mentioned above. Reviewed-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
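A rough example of a fetch that is now covered (paths are illustrative); a corrupt object inside the bundle is rejected once transfer.fsckObjects is enabled:

    git bundle create /tmp/src.bundle --all
    git -c transfer.fsckObjects=true fetch /tmp/src.bundle refs/heads/main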
2024-06-20bundle-uri: verify oid before writing refsXing Xin1-4/+151
When using the bundle-uri mechanism with a bundle list containing multiple interrelated bundles, we encountered a bug where tips from downloaded bundles were not discovered, thus resulting in rather slow clones. This was particularly problematic when employing the "creationTokens" heuristic. To reproduce this issue, consider a repository with a single branch "main" pointing to commit "A". Firstly, create a base bundle with: git bundle create base.bundle main Then, add a new commit "B" on top of "A", and create an incremental bundle for "main": git bundle create incr.bundle A..main Now, generate a bundle list with the following content: [bundle] version = 1 mode = all heuristic = creationToken [bundle "base"] uri = base.bundle creationToken = 1 [bundle "incr"] uri = incr.bundle creationToken = 2 A fresh clone with the bundle list above should result in a reference "refs/bundles/main" pointing to "B" in the new repository. However, git would still download everything from the server, as if it had fetched nothing locally. So why is "refs/bundles/main" not discovered? After some digging I found that: 1. Bundles in the bundle list are downloaded to local files via `bundle-uri.c:download_bundle_list` or via `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which is called by `bundle-uri.c:unbundle_all_bundles` or called within `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 3. To get all prerequisites of the bundle, the bundle header is read inside `bundle-uri.c:unbundle_from_file` by calling `bundle.c:read_bundle_header`. 4. Then it calls `bundle.c:unbundle`, which calls `bundle.c:verify_bundle` to ensure the repository contains all the prerequisites. 5. `bundle.c:verify_bundle` calls `parse_object`, which eventually invokes `packfile.c:prepare_packed_git` or `packfile.c:reprepare_packed_git`, filling `raw_object_store->packed_git` and setting `packed_git_initialized`. 6. If `bundle.c:unbundle` succeeds, it writes refs via `refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here bundle refs which can target arbitrary objects are written to the repository. 7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions `fetch-pack.c:mark_complete_and_common_ref` and `fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to find local tips for negotiation. The `OBJECT_INFO_QUICK` flag prevents `packfile.c:reprepare_packed_git` from being called, resulting in failures to parse OIDs that reside only in the latest bundle. In the example above, when unbundling "incr.bundle", "base.pack" is added to `packed_git` due to prerequisites verification. However, "B" cannot be found for negotiation because it exists in "incr.pack", which is not included in `packed_git`. Fix the bug by removing the `REF_SKIP_OID_VERIFICATION` flag when writing bundle refs. When `refs.c:refs_update_ref` is called to write the corresponding bundle refs, it triggers `refs.c:ref_transaction_commit`. This, in turn, invokes `refs.c:ref_transaction_prepare`, which calls `transaction_prepare` of the refs storage backend. For the files backend, it is `files-backend.c:files_transaction_prepare`, and for the reftable backend, it is `reftable-backend.c:reftable_be_transaction_prepare`. Both functions eventually call `object.c:parse_object`, which can invoke `packfile.c:reprepare_packed_git` to refresh `packed_git`.
This ensures that bundle refs point to valid objects and that all tips from bundle refs are correctly parsed during subsequent negotiations. A set of negotiation-related tests for cloning with bundle-uri has been included to demonstrate that downloaded bundles are utilized to accelerate fetching. Additionally, another test has been added to show that bundles with incorrect headers, where refs point to non-existent objects, do not result in any bundle refs being created in the repository. Reviewed-by: Karthik Nayak <karthik.188@gmail.com> Reviewed-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20t1006: ensure cat-file info isn't buffered by defaultEric Wong1-0/+30
While working on buffering changes to `git cat-file' in a separate patch, I inadvertently made the output of --batch-check and the `info' command of --batch-command buffered as if opt->buffer_output is turned on by default. Buffering by default breaks some 3rd-party Perl scripts using cat-file, but this breakage was not detected anywhere in our test suite. Add a small Perl snippet to test this problem since (AFAIK) other equivalent ways to test this behavior from Bourne shell and/or awk would require racy sleeps, non-portable FIFOs or tedious C code. Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-18merge: avoid write merge state when unable to write indexKyle Zhao1-0/+10
Writing the merge state after the index write fails is meaningless and could potentially cause Git to lose changes. Signed-off-by: Kyle Zhao <kylezhao@tencent.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-17Merge branch 'ps/no-writable-strings'Junio C Hamano8-18/+24
Building with "-Werror -Wwrite-strings" is now supported. * ps/no-writable-strings: (27 commits) config.mak.dev: enable `-Wwrite-strings` warning builtin/merge: always store allocated strings in `pull_twohead` builtin/rebase: always store allocated string in `options.strategy` builtin/rebase: do not assign default backend to non-constant field imap-send: fix leaking memory in `imap_server_conf` imap-send: drop global `imap_server_conf` variable mailmap: always store allocated strings in mailmap blob revision: always store allocated strings in output encoding remote-curl: avoid assigning string constant to non-const variable send-pack: always allocate receive status parse-options: cast long name for OPTION_ALIAS http: do not assign string constant to non-const field compat/win32: fix const-correctness with string constants pretty: add casts for decoration option pointers object-file: make `buf` parameter of `index_mem()` a constant object-file: mark cached object buffers as const ident: add casts for fallback name and GECOS entry: refactor how we remove items for delayed checkouts line-log: always allocate the output prefix line-log: stop assigning string constant to file parent buffer ...
2024-06-17Merge branch 'jk/am-retry'Junio C Hamano2-31/+12
"git am" has a safety feature to prevent it from starting a new session when there already is a session going. It reliably triggers when a mbox is given on the command line, but it has to rely on the tty-ness of the standard input. Add an explicit way to opt out of this safety with a command line option. * jk/am-retry: test-terminal: drop stdin handling am: add explicit "--retry" option
2024-06-17Merge branch 'ps/ref-storage-migration'Junio C Hamano2-0/+244
A new command has been added to migrate a repository that uses the files backend for its ref storage to use the reftable backend, with limitations. * ps/ref-storage-migration: builtin/refs: new command to migrate ref storage formats refs: implement logic to migrate between ref storage formats refs: implement removal of ref storages worktree: don't store main worktree twice reftable: inline `merged_table_release()` refs/files: fix NULL pointer deref when releasing ref store refs/files: extract function to iterate through root refs refs/files: refactor `add_pseudoref_and_head_entries()` refs: allow to skip creation of reflog entries refs: pass storage format to `ref_store_init()` explicitly refs: convert ref storage format to an enum setup: unset ref storage when reinitializing repository version
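Invoking the new command looks roughly like this, assuming a repository that currently uses the files backend:

    git refs migrate --ref-format=reftable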
2024-06-17Merge branch 'jc/format-patch-with-range-diff'Junio C Hamano1-6/+30
The inter/range-diff output has been moved to the end of the patch when format-patch adds it to a single patch, instead of writing it before the patch text, to be consistent with what is done for a cover letter for a multi-patch series. * jc/format-patch-with-range-diff: format-patch: move range/inter diff at the end of a single patch output show_log: factor out interdiff/range-diff generation
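For example ("v1" is assumed to name the previous iteration of the patch), the range-diff for a single patch now follows the patch text instead of preceding it:

    git format-patch --range-diff=v1 -1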
2024-06-14t/helper: remove dependency on `the_repository` in "proc-receive"Patrick Steinhardt1-6/+3
The "proc-receive" test helper implicitly relies on `the_repository` via `parse_oid_hex()`. This isn't necessary though, and in fact the whole command does not depend on `the_repository` at all. Stop setting up `the_repository` and use `parse_oid_hex_any()` to parse object IDs. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14t/helper: fix segfault in "oid-array" command without repositoryPatrick Steinhardt2-0/+22
The "oid-array" test helper can supposedly work without a Git repository, but will in fact crash because `the_repository->hash_algo` is not initialized. This is because `oid_pos()`, which is used by `oid_array_lookup()`, depends on `the_hash_algo->rawsz`. Ideally, we'd adapt `oid_pos()` to not depend on `the_hash_algo` anymore. That is a bigger untertaking though, so instead we fall back to SHA1 when there is no repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14t/helper: use correct object hash in partial-clone helperPatrick Steinhardt1-1/+1
The `object_info()` function of the partial-clone helper is responsible for checking the object ID of a repository other than `the_repository`. We use `parse_oid_hex()` in this function though, which means that we still depend on `the_repository->hash_algo`. Fix this by using the object hash of the function-local repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14http-fetch: don't crash when parsing packfile without a repoPatrick Steinhardt2-0/+11
The git-http-fetch(1) command accepts a `--packfile=` option, which allows the user to specify that it shall fetch only a specific packfile. The parameter here is the hash of the packfile, which is specific to the object hash used by the repository. This requirement is implicit though via our use of `parse_oid_hex()`, which internally uses `the_repository`. The git-http-fetch(1) command can also be run without a repository, though; that mode only exists so that we can show usage via the "-h" option. In that case, starting with c8aed5e8da (repository: stop setting SHA1 as the default object hash, 2024-05-07), `the_repository` does not have its object hash initialized anymore and thus we would crash when trying to parse the object ID outside of a repository. Fix this issue by dying immediately when we see a "--packfile=" parameter when outside a Git repository. This is not a functional regression as we would die later on with the same error anyway. Add a test to detect the segfault. We use the "nongit" function to do so, which we need to allow-list in `test_must_fail ()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14hash-ll: merge with "hash.h"Patrick Steinhardt4-4/+4
The "hash-ll.h" header was introduced via d1cbe1e6d8 (hash-ll.h: split out of hash.h to remove dependency on repository.h, 2023-04-22) to make explicit the split between hash-related functions that rely on the global `the_repository`, and those that don't. This split is no longer necessary now that we we have removed the reliance on `the_repository`. Merge "hash-ll.h" back into "hash.h". This causes some code units to not include "repository.h" anymore, which requires us to add some forward declarations. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14global: introduce `USE_THE_REPOSITORY_VARIABLE` macroPatrick Steinhardt27-0/+54
Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from gaining new dependencies on it in future changes, be it explicit or implicit. For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "builtin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14hash: require hash algorithm in `oidread()` and `oidclr()`Patrick Steinhardt1-1/+1
Both `oidread()` and `oidclr()` use `the_repository` to derive the hash function that shall be used. Require callers to pass in the hash algorithm to get rid of this implicit dependency. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14remote: drop checks for zero-url caseJeff King1-2/+0
Now that the previous commit removed the possibility that a "struct remote" will ever have zero url fields, we can drop a number of redundant checks and untriggerable code paths. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14remote: always require at least one url in a remoteJeff King1-1/+1
When we return a struct from remote_get(), the result _almost_ always has at least one url. In remotes_remote_get_1(), we do this: if (name_given && !valid_remote(ret)) add_url_alias(remote_state, ret, name); if (!valid_remote(ret)) return NULL; So if the remote doesn't have a url, we give it one based on the name (this is how unconfigured urls are used as remotes). And if that doesn't work, we return NULL. But there's a catch: valid_remote() checks that we have at least one url _unless_ the remote.*.vcs field is set. This comes from c578f51d52 (Add a config option for remotes to specify a foreign vcs, 2009-11-18), and the whole idea was to support remote helpers that don't have their own url. However, that mode has been broken since 25d5cc488a (Pass unknown protocols to external protocol handlers, 2009-12-09)! That commit unconditionally looks at the url in get_helper(), causing a segfault with something like: git -c remote.foo.vcs=bar fetch foo We could fix that now, of course. But given that it has been broken for almost 15 years and nobody noticed, there's a better option. This weird "there might not be a url" special case requires checks all over the code base, and it's not clear if there are other similar segfaults lurking. It would be nice if we could drop that special case. So instead, let's let the "the remote name is the url" code kick in. If you have "remote.foo.vcs", then your url (unless otherwise configured) is "foo". This does have a visible effect compared to what 25d5cc488a was trying to do. The idea back then is that for a remote without a url, we'd run: # only one command-line option! git-remote-bar foo whereas with our default url, now we'll run: git-remote-bar foo foo Again, in practice nobody can be relying on this because it has been segfaulting for 15 years. We should consider just removing this "vcs" config option entirely, but that would be a user-visible breakage. So by fixing it this way, we can keep things working that have been working, and simplify away one special case inside our code. This fixes the segfault from 25d5cc488a (demonstrated by the test), and we can build further cleanups on top. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14t5801: test remote.*.vcs configJeff King2-0/+26
The usual way to trigger a remote helper is to use the "::" syntax from: 87422439d1 (Allow specifying the remote helper in the url, 2009-11-18). Doing: git config remote.origin.url hg::https://example.com/repo will run "git-remote-hg origin https://example.com/repo". Or you can use the fallback handling from 25d5cc488a (Pass unknown protocols to external protocol handlers, 2009-12-09): git config remote.origin.url "foo://bar" which will run "git-remote-foo origin foo://bar". But there's a third way, from c578f51d52 (Add a config option for remotes to specify a foreign vcs, 2009-11-18): git config remote.origin.vcs foo git config remote.origin.url bar which will run "git-remote-foo origin bar". This is mostly redundant with the other methods, except that it is supposed to allow you to run without a URL at all. So: git config remote.origin.vcs foo would run "git-remote-foo origin" with no extra URL parameter (under the assumption that the helper somehow knows how to access the remote repo). However, this mode has been broken since 25d5cc488a, shortly after it was added! That commit taught the transport code to always look at the URL string to parse off the "foo::" bits, meaning it would always segfault in the no-url case. You can see that with: git -c remote.foo.vcs=bar fetch foo Nobody seems to have noticed in the almost 15 years since, so presumably it's not a well-used feature. And without that, arguably the whole remote.*.vcs feature could be removed entirely, as it isn't offering anything you couldn't do with the "helper::" syntax. But it _does_ work if you have a URL, and it has been advertised in the documentation for all that time. So we shouldn't just remove it without warning. Likewise, even if we were going to deprecate it, we should avoid breaking it in the meantime. Since there are no tests for it at all, let's add a few basic ones: - this syntax doesn't work well with "git clone" (another point against it versus "helper::"). But we can use "clone -c" to set up the config manually, passing the URL as usual to clone. This does work, though note that I had to use --no-local in the test to avoid broken interactions between the local code and the helper. In the real world this would be a non-issue, since the remote URL would generally not also be a local Git repo! - likewise, we should be able to set up the config manually and fetch into a repository. This also works. - we can simulate a vcs that has no URL support by stuffing the remote path into another environment variable. This should work, but doesn't (it hits the segfault mentioned above). In the first two cases, I took the extra step of checking GIT_TRACE output to confirm that we actually ran the helper (since the URL is a valid Git repo, the clone/fetch would appear to work even if we didn't use the helper at all!). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14t5801: make remote-testgit GIT_DIR setup more robustJeff King1-1/+2
Our tests use a fake helper that just imports from an existing Git repository. We're fed the path to that repo on the command line, and derive the GIT_DIR by tacking on "/.git". This is wrong if the path is a bare repository, but that's OK since this is just a limited test. But it's also wrong if the transport code feeds us the actual .git directory itself (i.e., we expect "/path/to/repo" but it gives us "/path/to/repo/.git"). None of the current tests do that, but let's future-proof ourselves against adding a test that does. We can instead ask "rev-parse" to set our GIT_DIR. Note that we have to first unset other git variables from our environment. Coming into this script, we'll have GIT_DIR set to the fetching repository, and we need to "switch" to the remote one. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14remote: allow resetting url listJeff King1-0/+36
Because remote.*.url is treated as a multi-valued key, there is no way
to override previous config. So for example if you have
remote.origin.url set to some wrong value, doing:

    git -c remote.origin.url=right fetch

would not work. It would append "right" to the list, which means we'd
still fetch from "wrong" (since subsequent values are used only as push
urls).

Let's provide a mechanism to reset the list, like we do for other
multi-valued keys (e.g., credential.helper, http.extraheaders, and
merge.suppressDest all use this "empty string means reset" pattern).

Reported-by: Mathew George <mathewegeorge@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
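As an illustration of the mechanism described above, a one-shot
override could then look something like this (a minimal sketch; "right"
stands in for the URL you actually want, and it assumes the
empty-string reset applies to remote.*.url as described):

    # First clear the configured URL list, then add the URL we really
    # want to use for this single invocation.
    git -c remote.origin.url= \
        -c remote.origin.url=right \
        fetch origin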
2024-06-14remote: use strvecs to store remote url/pushurlJeff King1-1/+1
Now that the url/pushurl fields of "struct remote" own their strings,
we can switch from bare arrays to strvecs. This has a few advantages:

  - push/clear are now one-liners

  - likewise the free+assigns in alias_all_urls() can use
    strvec_replace()

  - we now use size_t for storage, avoiding possible overflow

  - this will enable some further cleanups in future patches

There's quite a bit of fallout in the code that reads these fields, as
it tends to access these arrays directly. But it's mostly a mechanical
replacement of "url_nr" with "url.nr", and "url[i]" with "url.v[i]",
with a few variations (e.g. "*url" could become "*url.v", but I used
"url.v[0]" for consistency).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-13Merge branch 'gt/unit-test-oidtree' into ps/use-the-repositoryJunio C Hamano7-106/+191
* gt/unit-test-oidtree:
  t/: migrate helper/test-oidtree.c to unit-tests/t-oidtree.c
2024-06-13Merge branch 'ps/ref-storage-migration' into ps/use-the-repositoryJunio C Hamano2-0/+244
* ps/ref-storage-migration:
  builtin/refs: new command to migrate ref storage formats
  refs: implement logic to migrate between ref storage formats
  refs: implement removal of ref storages
  worktree: don't store main worktree twice
  reftable: inline `merged_table_release()`
  refs/files: fix NULL pointer deref when releasing ref store
  refs/files: extract function to iterate through root refs
  refs/files: refactor `add_pseudoref_and_head_entries()`
  refs: allow to skip creation of reflog entries
  refs: pass storage format to `ref_store_init()` explicitly
  refs: convert ref storage format to an enum
  setup: unset ref storage when reinitializing repository version
2024-06-12commit-graph: increment progress indicatorDerrick Stolee1-1/+1
This fixes a bug that was introduced by 368d19b0b7 (commit-graph: refactor compute_topological_levels(), 2023-03-20): Previously, the progress indicator was updated from `i + 1` where `i` is the loop variable of the enclosing `for` loop. After this patch, the update used `info->progress_cnt + 1` instead, however, unlike `i`, the `progress_cnt` attribute was not incremented. Let's increment it. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> [jc: squashed in a test update from Patrick Steinhardt] Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-12Merge branch 'gt/decorate-unit-test'Junio C Hamano5-92/+80
A test helper that essentially is unit tests on the "decorate" logic
has been rewritten using the unit-tests framework.

* gt/decorate-unit-test:
  t/: migrate helper/test-example-decorate to the unit testing framework
2024-06-12Merge branch 'jk/sparse-leakfix'Junio C Hamano4-0/+4
Many memory leaks in the sparse-checkout code paths have been plugged.

* jk/sparse-leakfix:
  sparse-checkout: free duplicate hashmap entries
  sparse-checkout: free string list after displaying
  sparse-checkout: free pattern list in sparse_checkout_list()
  sparse-checkout: free sparse_filename after use
  sparse-checkout: refactor temporary sparse_checkout_patterns
  sparse-checkout: always free "line" strbuf after reading input
  sparse-checkout: reuse --stdin buffer when reading patterns
  dir.c: always copy input to add_pattern()
  dir.c: free removed sparse-pattern hashmap entries
  sparse-checkout: clear patterns when init() sees existing sparse file
  dir.c: free strings in sparse cone pattern hashmaps
  sparse-checkout: pass string literals directly to add_pattern()
  sparse-checkout: free string list in write_cone_to_file()
2024-06-12Merge branch 'jk/cap-exclude-file-size'Junio C Hamano2-0/+20
Overly large ".gitignore" files are now rejected silently.

* jk/cap-exclude-file-size:
  dir.c: reduce max pattern file size to 100MB
  dir.c: skip .gitignore, etc larger than INT_MAX
2024-06-12Merge branch 'jc/safe-directory-leading-path'Junio C Hamano1-0/+15
The safe.directory configuration knob has been updated to optionally
allow leading path matches.

* jc/safe-directory-leading-path:
  safe.directory: allow "lead/ing/path/*" match
2024-06-12Merge branch 'gt/t-hash-unit-test'Junio C Hamano2-56/+84
A pair of test helpers that essentially are unit tests on hash
algorithms have been rewritten using the unit-tests framework.

* gt/t-hash-unit-test:
  t/: migrate helper/test-{sha1, sha256} to unit-tests/t-hash
  strbuf: introduce strbuf_addstrings() to repeatedly add a string
2024-06-12Merge branch 'cp/reftable-unit-test'Junio C Hamano2-1/+160
Basic unit tests for reftable have been reimplemented under the unit
test framework.

* cp/reftable-unit-test:
  t: improve the test-case for parse_names()
  t: add test for put_be16()
  t: move tests from reftable/record_test.c to the new unit test
  t: move tests from reftable/stack_test.c to the new unit test
  t: move reftable/basics_test.c to the unit testing framework
2024-06-12Merge branch 'jc/t1517-more'Junio C Hamano1-0/+52
A new test was added to ensure git commands that are designed to run
outside repositories do work.

* jc/t1517-more:
  imap-send: minimum leakfix
  t1517: more coverage for commands that work without repository
2024-06-12t/: migrate helper/test-oidtree.c to unit-tests/t-oidtree.cGhanshyam Thakkar7-106/+191
helper/test-oidtree.c along with t0069-oidtree.sh test the oidtree.h library, which is a wrapper around crit-bit tree. Migrate them to the unit testing framework for better debugging and runtime performance. Along with the migration, add an extra check for oidtree_each() test, which showcases how multiple expected matches can be given to check_each() helper. To achieve this, introduce a new library called 'lib-oid.h' exclusively for the unit tests to use. It currently mainly includes utility to generate object_id from an arbitrary hex string (i.e. '12a' -> '12a0000000000000000000000000000000000000'). This also handles the hash algo selection based on GIT_TEST_DEFAULT_HASH. This library will also be helpful when we port other unit tests such as oid-array, oidset etc. Helped-by: Junio C Hamano <gitster@pobox.com> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com> Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com> [jc: small fixlets squashed in] Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-12parse-options-cb: stop clamping "--abbrev=" to hash lengthPatrick Steinhardt1-0/+12
The `OPT__ABBREV()` option allows the user to specify the length that
object hashes shall be abbreviated to. This length needs to be in the
range of `(MIN_ABBREV, the_hash_algo->hexsz)`, which is why we clamp
the value as required.

While this makes sense in the case of `MIN_ABBREV`, it is unnecessary
for the upper boundary as the value is eventually passed down to
`repo_find_unique_abbrev_r()`, which handles values larger than the
current hash length just fine.

In the preceding commit, we have changed parsing of the "core.abbrev"
config to stop clamping to the upper boundary. Let's do the same here
so that the code becomes simpler, we are consistent with how we treat
the "core.abbrev" config, and so that we stop depending on
`the_repository`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-12config: fix segfault when parsing "core.abbrev" without repoPatrick Steinhardt2-0/+19
The "core.abbrev" config allows the user to specify the minimum length when abbreviating object hashes. Next to the values "auto" and "no", this config also accepts a concrete length that needs to be bigger or equal to the minimum length and smaller or equal to the hash algorithm's hex length. While the former condition is trivial, the latter depends on the object format used by the current repository. It is thus a variable upper boundary that may either be 40 (SHA-1) or 64 (SHA-256). This has two major downsides. First, the user that specifies this config must be aware of the object hashes that its repository use. If they want to configure the value globally, then they cannot pick any value in the range `[41, 64]` if they have any repository that uses SHA-1. If they did, Git would error out when parsing the config. Second, and more importantly, parsing "core.abbrev" crashes when outside of a Git repository because we dereference `the_hash_algo` to figure out its hex length. Starting with c8aed5e8da (repository: stop setting SHA1 as the default object hash, 2024-05-07) though, we stopped initializing `the_hash_algo` outside of Git repositories. Fix both of these issues by not making it an error anymore when the given length exceeds the hash length. Instead, leave the abbreviated length intact. `repo_find_unique_abbrev_r()` handles this just fine except for a performance penalty which we will fix in a subsequent commit. Reported-by: Kyle Lippincott <spectral@google.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11pack-bitmap.c: avoid uninitialized `pack_int_id` during reuseTaylor Blau1-0/+26
When performing multi-pack reuse, reuse_partial_packfile_from_bitmap()
is responsible for generating an array of bitmapped_pack structs from
which to perform reuse. In the multi-pack case, we loop over the MIDX's
packs and copy the result of calling `nth_bitmapped_pack()` to
construct the list of reusable packs.

But we may also want to do pack-reuse over a single pack, either
because we only had one pack to perform reuse over (in the case of
single-pack bitmaps), or because we explicitly asked to do single pack
reuse even with a MIDX[^1].

When this is the case, the array we generate of reusable packs contains
only a single element, which is either (a) the pack attached to the
single-pack bitmap, or (b) the MIDX's preferred pack.

In 795006fff4 (pack-bitmap: gracefully handle missing BTMP chunks,
2024-04-15), we refactored the reuse_partial_packfile_from_bitmap()
function and stopped assigning the pack_int_id field when reusing only
the MIDX's preferred pack. This results in an uninitialized read down
in try_partial_reuse() like so:

    ==7474==WARNING: MemorySanitizer: use-of-uninitialized-value
        #0 0x55c5cd191dde in try_partial_reuse pack-bitmap.c:1887:8
        #1 0x55c5cd191dde in reuse_partial_packfile_from_bitmap_1 pack-bitmap.c:2001:8
        #2 0x55c5cd191dde in reuse_partial_packfile_from_bitmap pack-bitmap.c:2105:3
        #3 0x55c5cce0bd0e in get_object_list_from_bitmap builtin/pack-objects.c:4043:3
        #4 0x55c5cce0bd0e in get_object_list builtin/pack-objects.c:4156:27
        #5 0x55c5cce0bd0e in cmd_pack_objects builtin/pack-objects.c:4596:3
        #6 0x55c5ccc8fac8 in run_builtin git.c:474:11

which happens when try_partial_reuse() tries to call
midx_pair_to_pack_pos() when it tries to reject cross-pack deltas.

Avoid the uninitialized read by ensuring that the pack_int_id field is
set in the single-pack reuse case by setting it to either the MIDX
preferred pack's pack_int_id, or '-1', in the case of single-pack
bitmaps. In the latter case, we never read the pack_int_id field, so
the choice of '-1' is intentional as a "garbage in, garbage out"
measure.

Guard against further regressions in this area by adding a test which
ensures that we do not throw out deltas from the preferred pack as
"cross-pack" due to an uninitialized pack_int_id.

[^1]: This can happen for a couple of reasons, either because the
  repository is configured with 'pack.allowPackReuse=(true|single)', or
  because the MIDX was generated prior to the introduction of the BTMP
  chunk, which contains information necessary to perform multi-pack
  reuse.

Reported-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11midx-write.c: do not read existing MIDX with `packs_to_include`Taylor Blau1-0/+30
Commit d6a8c58675 (midx-write.c: support reading an existing MIDX with
`packs_to_include`, 2024-05-29) changed the MIDX generation machinery
to support reading from an existing MIDX when writing a new one.

Unfortunately, the rest of the MIDX generation machinery is not
prepared to deal with such a change. For instance, the function
responsible for adding to the object ID fanout table from a MIDX source
(midx_fanout_add_midx_fanout()) will gladly add objects from an
existing MIDX for some fanout level regardless of whether or not those
objects came from packs that are to be included in the subsequent MIDX
write.

This results in broken pseudo-pack object order (leading to incorrect
object traversal results) and segmentation faults, like so (generated
by running the added test prior to the changes in midx-write.c):

    #0  0x000055ee31393f47 in midx_pack_order (ctx=0x7ffdde205c70) at midx-write.c:590
    #1  0x000055ee31395a69 in write_midx_internal (object_dir=0x55ee32570440 ".git/objects",
        packs_to_include=0x7ffdde205e20, packs_to_drop=0x0, preferred_pack_name=0x0,
        refs_snapshot=0x0, flags=15) at midx-write.c:1171
    #2  0x000055ee31395f38 in write_midx_file_only (object_dir=0x55ee32570440 ".git/objects",
        packs_to_include=0x7ffdde205e20, preferred_pack_name=0x0, refs_snapshot=0x0,
        flags=15) at midx-write.c:1274
    [...]

In stack frame #0, the code on midx-write.c:590 is using the new pack
ID corresponding to some object which was added from the existing MIDX.
Importantly, the pack from which that object was selected in the
existing MIDX does not appear in the new MIDX as it was excluded via
`--stdin-packs`.

In this instance, the pack in question had pack ID "1" in the existing
MIDX, but since it was excluded from the new MIDX, we never filled in
that entry in the pack_perm table, resulting in:

    (gdb) p *ctx->pack_perm@2
    $1 = {0, 1515870810}

Which is what causes the segfault above when we try and read:

    struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];

    if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
        pack->bitmap_pos = 0;

Fundamentally, we should be able to read information from an existing
MIDX when generating a new one. But in practice the midx-write.c code
assumes that we won't run into issues like the above with incongruent
pack IDs, and often makes those assumptions in extremely subtle and
fragile ways.

Instead, let's avoid reading from an existing MIDX altogether, and
stick with the pre-d6a8c58675 implementation. Harden against any
regressions in this area by adding a test which demonstrates these
issues.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11builtin/blame: fix leaking ignore revs filesPatrick Steinhardt1-0/+2
When parsing the blame configuration we add "blame.ignoreRevsFile" configs to a string list. This string list is declared as with `NODUP`, and thus we hand over the allocated string to that list. We eventually end up calling `string_list_clear()` on that list, but due to it being declared as `NODUP` we will not release the associated strings and thus leak memory. Fix this issue by setting up the list as `DUP` instead and free the config string after insertion. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11builtin/blame: fix leaking prefixed pathsPatrick Steinhardt4-0/+6
In `cmd_blame()` we compute prefixed paths by calling `add_prefix()`, which itself calls `prefix_path()`. While `prefix_path()` returns an allocated string, `add_prefix()` pretends to return a constant string. Consequently, this path never gets freed. Fix the return type to be `char *` and free the path to plug the memory leak. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11blame: fix leaking data for blame scoreboardsPatrick Steinhardt8-0/+12
There are some memory leaks when cleaning up blame scoreboards. Fix those. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11merge: fix leaking merge basesPatrick Steinhardt5-0/+5
When calling either the recursive or the ORT merge machineries we need to provide a list of merge bases. The ownership of that parameter is then implicitly transferred to the callee, which is somewhat fishy. Furthermore, that list may leak in some cases where the merge machinery runs into an error, thus causing a memory leak. Refactor the code such that we stop transferring ownership. Instead, the merge machinery will now create its own local copies of the passed in list as required if they need to modify the list. Free the list at the callsites as required. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11builtin/merge: fix leaking `struct cmdnames` in `get_strategy()`Patrick Steinhardt1-0/+1
In "builtin/merge.c" we use the helper infrastructure to figure out what merge strategies there are. We never free contents of the `cmdnames` structures though and thus leak their memory. Fix this by exposing the already existing `clean_cmdnames()` function to release their memory. As this name isn't quite idiomatic, rename it to `cmdnames_release()` while at it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11sequencer: fix memory leaks in `make_script_with_merges()`Patrick Steinhardt3-0/+4
Fix some trivial memory leaks in `make_script_with_merges()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11builtin/clone: plug leaking HEAD ref in `wanted_peer_refs()`Patrick Steinhardt3-2/+4
In `wanted_peer_refs()` we first create a copy of the "HEAD" ref. This copy may not actually be passed back to the caller, but is not getting freed in this case. Fix this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11apply: fix leaking string in `match_fragment()`Patrick Steinhardt1-0/+1
Before calling `update_pre_post_images()`, we call `strbuf_detach()` to put its buffer into a new string variable that we then pass to that function. Besides being rather pointless, it also causes us to leak memory of that variable because we never free it. Get rid of the variable altogether and instead reach into the `strbuf` directly. While at it, refactor the code to have a common exit path and mark string that do not contain allocated memory as constant. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11commit: fix leaking parents when calling `commit_tree_extended()`Patrick Steinhardt4-0/+4
When creating commits via `commit_tree_extended()`, the caller passes in a string list of parents. This call implicitly transfers ownership of that list to the function, which is quite surprising to begin with. But to make matters worse, `commit_tree_extended()` doesn't even bother to free the list of parents in error cases. The result is a memory leak, and one that the caller cannot fix by themselves because they do not know whether parts of the string list have already been released. Refactor the code such that callers can keep ownership of the list of parents, which is getting indicated by parameter being a constant pointer now. Free the lists at the calling site and add a common exit path to those sites as required. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11config: fix leaking "core.notesref" variablePatrick Steinhardt2-0/+2
The variable used to track the "core.notesref" config is not getting freed before we assign to it and thus leaks. Fix this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11rerere: fix various trivial leaksPatrick Steinhardt3-0/+3
We leak various different string lists in the rerere code. Free those to plug them. Note that the `merge_rr` variable is intentionally being free'd with the `free_util` parameter set to 1. The `util` field is used there to store the IDs of every rerere item and thus needs to be freed, as well. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11builtin/stash: fix leak in `show_stash()`Patrick Steinhardt2-0/+2
We leak the `revision_args` vector in `show_stash()`. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11revision: free diff optionsPatrick Steinhardt7-0/+7
There is a todo comment in `release_revisions()` that mentions that we
need to free the diff options, which was added via 54c8a7c379
(revisions API: add a TODO for diff_free(&revs->diffopt), 2022-04-14).

Releasing the diff options wasn't quite feasible at that time because
some call sites rely on its contents to remain even after the revisions
have been released. In fact, there really only are a couple of
callsites that misbehave here:

  - `cmd_shortlog()` releases the revisions, but continues to access
    its file pointer.

  - `do_diff_cache()` creates a shallow copy of `struct diff_options`,
    but does not set the `no_free` member. Consequently, we end up
    releasing resources of the caller-provided diff options.

  - `diff_free()` and friends do not play nice when being called
    multiple times as they don't unset data structures that they have
    just released.

Fix all of those cases and enable the call to `diff_free()`, which
plugs a bunch of memory leaks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11builtin/log: fix leaking commit list in git-cherry(1)Patrick Steinhardt1-0/+1
We're storing the list of commits that git-cherry(1) is about to print into a temporary list. This list is never getting free'd and thus leaks. Fix this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11merge-recursive: fix memory leak when finalizing mergePatrick Steinhardt3-0/+4
We do not free some members of `struct merge_options`' private data. Fix this to plug those leaks. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11builtin/merge-recursive: fix leaking object ID basesPatrick Steinhardt2-0/+2
In `cmd_merge_recursive()` we have a static array of object ID bases that we pass to `merge_recursive_generic()`. This interface is somewhat weird though because the latter function accepts a pointer to a pointer of object IDs, which requires us to allocate the object IDs on the heap. And as we never free those object IDs, the end result is a leak. While we can easily solve this leak by just freeing the respective object IDs, the whole calling convention is somewhat weird. Instead, refactor `merge_recursive_generic()` to accept a plain pointer to object IDs so that we can avoid allocating them altogether. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11object-name: free leaking object contextsPatrick Steinhardt1-0/+1
While it is documented in `struct object_context::path` that this variable needs to be released by the caller, this fact is rather easy to miss given that we do not ever provide a function to release the object context. And of course, while some callers dutifully release the path, many others don't. Introduce a new `object_context_release()` function that releases the path. Convert callsites that used to free the path to use that new function and add missing calls to callsites that were leaking memory. Refactor those callsites as required to have a single return path, only. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11builtin/rev-list: fix leaking bitmap index when calculating disk usagePatrick Steinhardt1-0/+2
git-rev-list(1) can speed up its object size calculations for reachable objects via a bitmap walk, if there is any bitmap. This is done in `try_bitmap_disk_usage()`, which tries to optimistically load the bitmap and then use it, if available. It never frees it though, leading to a memory leak. Fix this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11notes: fix memory leak when pruning notesPatrick Steinhardt1-0/+1
In `prune_notes()` we first store the notes that are to be deleted in a local list, and then iterate through that list to delete those notes one by one. We never free the list though and thus leak its memory. Fix this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11revision: fix leaking display notesPatrick Steinhardt1-0/+1
We never free the display notes options embedded into `struct revision`. Implement a new function `release_display_notes()` that we can call in `release_revisions()` to fix this. There is another gotcha here though: we play some games with the string list used to track extra notes refs, where we sometimes set the bit that indicates that strings should be strdup'd and sometimes unset it. This dance is done to avoid a copy of an already-allocated string when we call `enable_ref_display_notes()`. But this dance is rather pointless as we can instead call `string_list_append_nodup()` to transfer ownership of the allocated string to the list. Refactor the code to do so and drop the `strdup_strings` dance. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11merge-recursive: fix leaking rename conflict infoPatrick Steinhardt3-0/+3
When computing rename conflicts in our recursive merge algorithm we set up `struct rename_conflict_info`s to track that information. We never free those data structures though and thus leak memory. We need to be a bit more careful here though because the same rename conflict info can be assigned to multiple structures. Accommodate for this by introducing a `rename_conflict_info_owned` bit that we can use to steer whether or not the rename conflict info shall be free'd. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11builtin/rev-parse: fix memory leaks in `--parseopt` modePatrick Steinhardt2-0/+2
We have a bunch of memory leaks in git-rev-parse(1)'s `--parseopt` mode. Refactor the code to use `struct strvec`s to make it easier for us to track the lifecycle of those leaking variables and then free them. While at it, remove the unneeded static lifetime for some of the variables. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11bundle: plug leaks in `create_bundle()`Patrick Steinhardt3-0/+3
When creating a bundle, we set up a revision walk, but never release data associated with it. Furthermore, we create a mostly-shallow copy of that revision walk where we only adapt its pending objects such that we can reuse the walk. While that copy must not be released, the pending objects array need to be. Plug those memory leaks by releasing the revision walk and the pending objects of the copied revision walk. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11notes-utils: free note trees when releasing copied notesPatrick Steinhardt2-0/+2
While we clear most of the members of `struct notes_rewrite_cfg` in `finish_copy_notes_for_rewrite()`, we do not clear the notes tree. Fix this to plug this memory leak. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11parse-options: fix leaks for users of OPT_FILENAMEPatrick Steinhardt13-0/+13
The `OPT_FILENAME()` option will, if set, put an allocated string into the user-provided variable. Consequently, that variable thus needs to be free'd by the caller of `parse_options()`. Some callsites don't though and thus leak memory. Fix those. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11revision: fix memory leak when reversing revisionsPatrick Steinhardt1-0/+1
When reversing revisions in a rev walk, `get_revision()` will allocate a new commit list and assign it to `revs->commits`. It does not free the old list though, which makes it leak. Fix this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-10Merge branch 'jk/leakfixes'Junio C Hamano1-0/+3
Memory leaks in "git mv" has been plugged. * jk/leakfixes: mv: replace src_dir with a strvec mv: factor out empty src_dir removal mv: move src_dir cleanup to end of cmd_mv() t-strvec: mark variable-arg helper with LAST_ARG_MUST_BE_NULL t-strvec: use va_end() to match va_start()
2024-06-10Merge branch 'iw/trace-argv-on-alias'Junio C Hamano1-0/+11
The alias-expanded command lines are logged to the trace output.

* iw/trace-argv-on-alias:
  run-command: show prepared command
  Documentation: alias: add notes on shell expansion
  Documentation: alias: rework notes into points
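A quick way to see this in action might look like the following sketch
(the alias name and its expansion are made up for illustration):

    # Define a simple alias, then run it with tracing enabled; the
    # trace output now also shows the alias-expanded command line.
    git config alias.st 'status --short'
    GIT_TRACE=1 git st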
2024-06-10diff: let external diffs report that changes are uninterestingRené Scharfe1-10/+23
The options --exit-code and --quiet instruct git diff to indicate whether it found any significant changes by exiting with code 1 if it did and 0 if there were none. Currently this doesn't work if external diff programs are involved, as we have no way to learn what they found. Add that ability in the form of the new configuration options diff.trustExitCode and diff.<driver>.trustExitCode and the environment variable GIT_EXTERNAL_DIFF_TRUST_EXIT_CODE. They pair with the config options diff.external and diff.<driver>.command and the environment variable GIT_EXTERNAL_DIFF, respectively. The new options are off by default, keeping the old behavior. Enabling them indicates that the external diff returns exit code 1 if it finds significant changes and 0 if it doesn't, like diff(1). The name of the new options is taken from the git difftool and mergetool options of similar purpose. (There they enable passing on the exit code of a diff tool and to infer whether a merge done by a merge tool is successful.) The new feature sets the diff flag diff_from_contents in diff_setup_done() if we need the exit code and are allowed to call external diffs. This disables the optimization that avoids calling the program with --quiet. Add it back by skipping the call if the external diff is not able to report empty diffs. We can only do that check after evaluating the file-specific attributes in run_external_diff(). If we do run the external diff with --quiet, send its output to /dev/null. I considered checking the output of the external diff to check whether its empty. It was added as 11be65cfa4 (diff: fix --exit-code with external diff, 2024-05-05) and quickly reverted, as it does not work with external diffs that do not write to stdout. There's no reason why a graphical diff tool would even need to write anything there at all. I also considered using a non-zero exit code for empty diffs, which could be done without adding new configuration options. We'd need to disable the optimization that allows git diff --quiet to skip calling external diffs, though -- that might be quite surprising if graphical diff programs are involved. And assigning the opposite meaning of the exit codes compared to diff(1) and git diff --exit-code to the external diff can cause unnecessary confusion. Suggested-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
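A minimal sketch of wiring this up with a global external diff program
("my-diff-tool" is a stand-in for whatever tool you actually use):

    # Tell git to use an external diff and to trust its exit code,
    # which follows diff(1) conventions (1 = changes found, 0 = none).
    git config diff.external my-diff-tool
    git config diff.trustExitCode true

    # --exit-code / --quiet can now reflect what the external tool
    # actually reported.
    git diff --exit-code
    echo "exit code: $?"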
2024-06-10t4020: test exit code with external diffsRené Scharfe1-0/+53
Add tests to check the exit code of git diff with its options --quiet and --exit-code when using an external diff program. Currently we cannot tell whether it found significant changes or not. While at it, document briefly that --quiet turns off execution of external diff programs because that behavior surprised me for a moment while writing the tests. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07format-patch: assume --cover-letter for diff in multi-patch seriesRubén Justo2-0/+29
When we deal with a multi-patch series in git-format-patch(1), if we
see `--interdiff` or `--range-diff` but no `--cover-letter`, we return
with an error, saying:

    fatal: --range-diff requires --cover-letter or single patch

or:

    fatal: --interdiff requires --cover-letter or single patch

This makes sense because the cover-letter is where we place the diff
from the previous version. However, considering that `format-patch`
generates a multi-patch as needed, let's adopt a similar "cover as
necessary" approach when using `--interdiff` or `--range-diff`.

Therefore, relax the requirement for an explicit `--cover-letter` in a
multi-patch series when the user says `--interdiff` or `--range-diff`.
Still, if only to return the error, respect "format.coverLetter=no"
and `--no-cover-letter`.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
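In practice the relaxed behaviour might be exercised like this (branch
and directory names are placeholders):

    # A three-patch v2 series with a range-diff against v1; with the
    # change above, the cover letter is generated as needed instead of
    # erroring out when --cover-letter is not given explicitly.
    git format-patch -3 -v2 --range-diff=v1-topic -o outgoing/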
2024-06-07t4014: cleanups in a few testsRubén Justo1-5/+5
Arrange things we are going to create to be removed at end, and then start creating them. That way, we will clean them up even if we fail after creating some but before the end of the command. Signed-off-by: Rubén Justo <rjusto@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07Merge branch 'tb/midx-write-cleanup'Junio C Hamano1-0/+23
Code clean-up around writing the .midx files.

* tb/midx-write-cleanup:
  pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper
  midx: replace `get_midx_rev_filename()` with a generic helper
  midx-write.c: support reading an existing MIDX with `packs_to_include`
  midx-write.c: extract `fill_packs_from_midx()`
  midx-write.c: extract `should_include_pack()`
  midx-write.c: pass `start_pack` to `compute_sorted_entries()`
  midx-write.c: reduce argument count for `get_sorted_entries()`
  midx-write.c: tolerate `--preferred-pack` without bitmaps
2024-06-07revision: always store allocated strings in output encodingPatrick Steinhardt2-0/+2
The `git_log_output_encoding` variable can be set via the `--encoding=`
option. When doing so, we conditionally either assign it to the passed
value, or if the value is "none" we assign it the empty string.
Depending on which of the two code paths we pick though, the variable
may end up being assigned either an allocated string or a string
constant. This is somewhat risky and may easily lead to bugs when a
different code path may want to reassign a new value to it, freeing the
previous value.

We already do this when parsing the "i18n.logoutputencoding" config in
`git_default_i18n_config()`. But because the config is typically parsed
before we parse command line options this has been fine so far.

Regardless of that, safeguard the code such that the variable always
contains an allocated string. While at it, also free the old value in
case there was any to plug a potential memory leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07global: improve const correctness when assigning string constantsPatrick Steinhardt5-14/+18
We're about to enable `-Wwrite-strings`, which changes the type of string constants to `const char[]`. Fix various sites where we assign such constants to non-const variables. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07update-ref: add support for 'symref-update' commandKarthik Nayak2-0/+207
Add 'symref-update' command to the '--stdin' mode of 'git-update-ref' to allow updates of symbolic refs. The 'symref-update' command takes in a <new-target>, which the <ref> will be updated to. If the <ref> doesn't exist it will be created. It also optionally takes either an `ref <old-target>` or `oid <old-oid>`. If the <old-target> is provided, it checks to see if the <ref> targets the <old-target> before the update. If <old-oid> is provided it checks <ref> to ensure that it is a regular ref and <old-oid> is the OID before the update. This by extension also means that this when a zero <old-oid> is provided, it ensures that the ref didn't exist before. The divergence in syntax from the regular `update` command is because if we don't use a `(ref | oid)` prefix for the old_value, then there is ambiguity around if the value provided should be treated as an oid or a reference. This is more so the reason, because we allow anything committish to be provided as an oid. While 'symref-verify' and 'symref-delete' also take in `<old-target>` we do not have this divergence there as those commands only work with symrefs. Whereas 'symref-update' also works with regular refs and allows users to convert regular refs to symrefs. The command allows users to perform symbolic ref updates within a transaction. This provides atomicity and allows users to perform a set of operations together. This command supports deref mode, to ensure that we can update dereferenced regular refs to symrefs. Helped-by: Patrick Steinhardt <ps@pks.im> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
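A sketch of what driving this through a transaction might look like
(ref names are placeholders; see the git-update-ref documentation for
the authoritative syntax):

    # Point refs/heads/link at refs/heads/main, but only if it
    # currently targets refs/heads/old-target.
    echo "symref-update refs/heads/link refs/heads/main ref refs/heads/old-target" |
    git update-ref --stdin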
2024-06-07update-ref: add support for 'symref-create' commandKarthik Nayak4-1/+101
Add 'symref-create' command to the '--stdin' mode 'git-update-ref' to allow creation of symbolic refs in a transaction. The 'symref-create' command takes in a <new-target>, which the created <ref> will point to. Also, support the 'core.prefersymlinkrefs' config, wherein if the config is set and the filesystem supports symlinks, we create the symbolic ref as a symlink. We fallback to creating a regular symref if creating the symlink is unsuccessful. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
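For example, creating a symbolic ref in a transaction could look
roughly like this (ref names are again placeholders):

    # Create refs/heads/link as a symref pointing at refs/heads/main.
    echo "symref-create refs/heads/link refs/heads/main" |
    git update-ref --stdin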
2024-06-07update-ref: add support for 'symref-delete' commandKarthik Nayak2-1/+86
Add a new command 'symref-delete' to allow deletions of symbolic refs in a transaction via the '--stdin' mode of the 'git-update-ref' command. The 'symref-delete' command can, when given an <old-target>, delete the provided <ref> only when it points to <old-target>. This command is only compatible with the 'no-deref' mode because we optionally want to check the 'old_target' of the ref being deleted. De-referencing a symbolic ref would provide a regular ref and we already have the 'delete' command for regular refs. While users can also use 'git symbolic-ref -d' to delete symbolic refs, the 'symref-delete' command in 'git-update-ref' allows users to do so within a transaction, which promises atomicity of the operation and can be batched with other commands. When no 'old_target' is provided it can also delete regular refs, similar to how the 'delete' command can delete symrefs when no 'old_oid' is provided. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07update-ref: add support for 'symref-verify' commandKarthik Nayak2-2/+122
The 'symref-verify' command allows users to verify if a provided <ref> contains the provided <old-target> without changing the <ref>. If <old-target> is not provided, the command will verify that the <ref> doesn't exist. The command allows users to verify symbolic refs within a transaction, and this means users can perform a set of changes in a transaction only when the verification holds good. Since we're checking for symbolic refs, this command will only work with the 'no-deref' mode. This is because any dereferenced symbolic ref will point to an object and not a ref and the regular 'verify' command can be used in such situations. Add required tests for symref support in 'verify'. Since we're here, also add reflog checks for the pre-existing 'verify' tests, there is no divergence from behavior, but we never tested to ensure that reflog wasn't affected by the 'verify' command. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07commit-graph.c: remove temporary graph layers on exitTaylor Blau1-1/+25
Since the introduction of split commit graph layers in 92b1ea66b9a (Merge branch 'ds/commit-graph-incremental', 2019-07-19), the function write_commit_graph_file() has done the following when writing an incremental commit-graph layer: - used a lock_file to control access to the commit-graph-chain file - used an auxiliary file (whose descriptor was stored in 'fd') to write the new commit-graph layer itself Using a lock_file to control access to the commit-graph-chain is sensible, since only one writer may modify it at a time. Likewise, when the commit-graph machinery is writing out a single layer, the lock_file structure is used to modify the commit-graph itself. This is also sensible, since the non-incremental commit-graph may also have at most one writer. However, using an auxiliary temporary file without using the tempfile.h API means that writes that fail after the temporary graph layer has been created will leave around a file in $GIT_DIR/objects/info/commit-graphs/tmp_graph_XXXXXX The commit-graph-chain file and non-incremental commit-graph do not suffer from this problem as the lockfile.h API uses the tempfile.h API transparently, so processes that died before moving those finals into their final location cleaned up after themselves. Ensure that the temporary file used to write incremental commit-graphs is also managed with the tempfile.h API, to ensure that we do not ever leave tmp_graph_XXXXXX files laying around. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06Merge branch 'th/quiet-lazy-fetch-from-promisor'Junio C Hamano1-0/+43
The promisor.quiet configuration knob can be set to true to make lazy
fetching from promisor remotes silent.

* th/quiet-lazy-fetch-from-promisor:
  promisor-remote: add promisor.quiet configuration option
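In a partial clone, the knob would simply be enabled like so:

    # Make lazy (on-demand) fetches from promisor remotes silent.
    git config promisor.quiet true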
2024-06-06Merge branch 'ps/leakfixes'Junio C Hamano46-4/+342
Leakfixes.

* ps/leakfixes:
  builtin/mv: fix leaks for submodule gitfile paths
  builtin/mv: refactor to use `struct strvec`
  builtin/mv duplicate string list memory
  builtin/mv: refactor `add_slash()` to always return allocated strings
  strvec: add functions to replace and remove strings
  submodule: fix leaking memory for submodule entries
  commit-reach: fix memory leak in `ahead_behind()`
  builtin/credential: clear credential before exit
  config: plug various memory leaks
  config: clarify memory ownership in `git_config_string()`
  builtin/log: stop using globals for format config
  builtin/log: stop using globals for log config
  convert: refactor code to clarify ownership of check_roundtrip_encoding
  diff: refactor code to clarify memory ownership of prefixes
  config: clarify memory ownership in `git_config_pathname()`
  http: refactor code to clarify memory ownership
  checkout: clarify memory ownership in `unique_tracking_name()`
  strbuf: fix leak when `appendwholeline()` fails with EOF
  transport-helper: fix leaking helper name
2024-06-06test-terminal: drop stdin handlingJeff King1-26/+3
Since 18d8c26930 (test_terminal: redirect child process' stdin to a pty, 2015-08-04), we set up a pty and copy stdin to the child program. But this ends up being racy; once we send all of the bytes and close the descriptor, the child program will no longer see a terminal! isatty() will return 0, and trying to read may return EIO, even if we didn't yet get all of the bytes. This was mentioned even in the commit message of 18d8c26930, but we hacked around it by just sending an infinite input from /dev/zero (in the intended case, we only cared about isatty(0), not reading actual input). And it came up again recently in: https://lore.kernel.org/git/d42a55b1-1ba9-4cfb-9c3d-98ea4d86da33@gmail.com/ where we tried to actually send bytes, but they don't always all come through. So this interface is somewhat of an accident waiting to happen; a caller might not even care about stdin being a tty, but will get bit by the flaky behavior. One solution would probably be to avoid closing test_terminal's end of the pty altogether. But then the other side would never see EOF on its stdin. That may be OK for some cases, but it's another gotcha that might cause races or deadlocks, depending on what the child expects to read. Let's instead just drop test_terminal's stdin feature completely. Since the previous commit dropped the two cases from t4153 for which the feature was originally added, there are no callers left that need it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06am: add explicit "--retry" optionJeff King1-5/+9
After a patch fails, you can ask "git am" to try applying it again with
new options by running without any of the resume options. E.g.:

    git am <patch
    # oops, it failed; let's try again
    git am --3way

But since this second command has no explicit resume option (like
"--continue"), it looks just like an invocation to read a fresh patch
from stdin. To avoid confusing the two cases, there are some
heuristics, courtesy of 8d18550318 (builtin-am: reject patches when
there's a session in progress, 2015-08-04):

    if (in_progress) {
        /*
         * Catch user error to feed us patches when there is a session
         * in progress:
         *
         * 1. mbox path(s) are provided on the command-line.
         * 2. stdin is not a tty: the user is trying to feed us a patch
         *    from standard input. This is somewhat unreliable -- stdin
         *    could be /dev/null for example and the caller did not
         *    intend to feed us a patch but wanted to continue
         *    unattended.
         */
        if (argc || (resume_mode == RESUME_FALSE && !isatty(0)))
            die(_("previous rebase directory %s still exists but mbox given."),
                state.dir);

        if (resume_mode == RESUME_FALSE)
            resume_mode = RESUME_APPLY;
    [...]

So if no resume command is given, then we require that stdin be a tty,
and otherwise complain about (potentially) receiving an mbox on stdin.
But of course you might not actually have a terminal available! And
sadly there is no explicit way to hit this same code path; this is the
only place that sets RESUME_APPLY. So you're stuck, and scripts like
our test suite have to bend over backwards to create a pseudo-tty.

Let's provide an explicit option to trigger this mode. The code turns
out to be quite simple; just setting "resume_mode" to RESUME_FALSE is
enough to dodge the tty check, and then our state is the same as it
would be with the heuristic case (which we'll continue to allow).

When we don't have a session in progress, there's already code to
complain when resume_mode is set (but we'll add a new test to cover
that).

To test the new option, we'll convert the existing tests that rely on
the fake stdin tty. That lets us test them on more platforms, and will
let us simplify test_terminal a bit in a future patch. It does,
however, mean we're not testing the tty heuristic at all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
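In other words, the non-interactive retry can now be spelled
explicitly, for example:

    git am mbox || {
        # The patch did not apply cleanly; retry the same patch with
        # three-way merge enabled, no terminal on stdin required.
        git am --3way --retry
    }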
2024-06-06builtin/refs: new command to migrate ref storage formatsPatrick Steinhardt1-0/+243
Introduce a new command that allows the user to migrate a repository
between ref storage formats. This new command is implemented as part of
a new git-refs(1) executable. This is due to two reasons:

  - There is no good place to put the migration logic in existing
    commands. git-maintenance(1) felt unwieldy, and git-pack-refs(1) is
    not the correct place to put it, either.

  - I had it in my mind to create a new low-level command for accessing
    refs for quite a while already. git-refs(1) is that command and can
    over time grow more functionality relating to refs. This should
    help discoverability by consolidating low-level access to refs into
    a single executable.

As mentioned in the preceding commit that introduces the ref storage
format migration logic, the new `git refs migrate` command still has a
bunch of restrictions. These restrictions are documented accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
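A sketch of the intended usage; the `--ref-format=reftable` flag shown
here is an assumption, so check git-refs(1) for the exact invocation:

    # Migrate the current repository's ref storage, e.g. from "files"
    # to "reftable".
    git refs migrate --ref-format=reftable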
2024-06-06refs: allow to skip creation of reflog entriesPatrick Steinhardt1-0/+1
The ref backends do not have any way to disable the creation of reflog entries. This will be required for upcoming ref format migration logic so that we do not create any entries that didn't exist in the original ref database. Provide a new `REF_SKIP_CREATE_REFLOG` flag that allows the caller to disable reflog entry creation. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-05add-i: finally retire add.interactive.useBuiltinJunio C Hamano1-15/+0
The configuration variable stopped doing anything (other than
announcing itself as a variable that does not do anything useful, when
it is used) in Git 2.40.

At this point, it is not even worth giving the warning, which was meant
to be a way to help users notice they are carrying unused cruft in
their configuration files and give them a chance to clean-up. Let's
remove the warning and documentation for it, and truly stop paying
attention to it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config/add.txt |  6 ------
 builtin/add.c                |  6 +-----
 t/t3701-add-interactive.sh   | 15 ---------------
 3 files changed, 1 insertion(+), 26 deletions(-)
2024-06-05sparse-checkout: free duplicate hashmap entriesJeff King1-0/+1
In insert_recursive_pattern(), we create a new pattern_entry to insert into the parent_hashmap. If we find that the same entry already exists in the hashmap, we skip adding the new one. But we forget to free the new one, creating a leak. We can fix it by cleaning up the discarded entry. It would probably be possible to avoid creating it in the first place, but it's non-trivial. We'd have to define a "keydata" struct that lets us compare the existing entries to the broken-out fields. It's probably not worth the complexity, so we'll punt on that for now. There is one subtlety here: our insertion is happening in a loop, with each iteration looking at the pattern we just inserted (hence the "recursive" in the name). So if we skip insertion, what do we look at? The obvious answer is that we should remember the existing duplicate we found and use that. But I _think_ in that case, we probably already have all of the recursive bits already (from when the original entry was added). And so just breaking out of the loop would be correct. But I'm not 100% sure on that; after all, the original leaky code could have done the same break, but it didn't. So I went with the "obvious answer" above, which has no chance of changing the behavior aside from fixing the leak. With this patch, t1091 can now be marked leak-free. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-05dir.c: reduce max pattern file size to 100MBJeff King2-0/+20
In a2bc523e1e (dir.c: skip .gitignore, etc larger than INT_MAX, 2024-05-31) we put capped the size of some files whose parsing code and data structures used ints. Setting the limit to INT_MAX was a natural spot, since we know the parsing code would misbehave above that. But it also leaves the possibility of overflow errors when we multiply that limit to allocate memory. For instance, a file consisting only of "a\na\n..." could have INT_MAX/2 entries. Allocating an array of pointers for each would need INT_MAX*4 bytes on a 64-bit system, enough to overflow a 32-bit int. So let's give ourselves a bit more safety margin by giving a much smaller limit. The size 100MB is somewhat arbitrary, but is based on the similar value for attribute files added by 3c50032ff5 (attr: ignore overly large gitattributes files, 2022-12-01). There's no particular reason these have to be the same, but the idea is that they are in the ballpark of "so huge that nobody would care, but small enough to avoid malicious overflow". So lacking a better guess, it makes sense to use the same value. The implementation here doesn't share the same constant, but we could change that later (or even give it a runtime config knob, though nobody has complained yet about the attribute limit). And likewise, let's add a few tests that exercise the limits, based on the attr ones. In this case, though, we never read .gitignore from the index; the blob code is exercised only for sparse filters. So we'll trigger it that way. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-04show-ref: introduce --branches and deprecate --headsJunio C Hamano1-8/+16
We call the tips of branches "heads", but this command calls the option to show only branches "--heads", which confuses the branches themselves and the tips of branches. Straighten the terminology by introducing "--branches" option that limits the output to branches, and deprecate "--heads" option used that way. We do not plan to remove "--heads" or "-h" yet; we may want to do so at Git 3.0, in which case, we may need to start advertising upcoming removal with an extra warning when they are used. Signed-off-by: Junio C Hamano <gitster@pobox.com>
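Under the new spelling, listing branch tips becomes:

    # Preferred spelling going forward:
    git show-ref --branches

    # Deprecated, but still accepted for now:
    git show-ref --heads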
2024-06-04ls-remote: introduce --branches and deprecate --headsJunio C Hamano1-5/+27
We call the tips of branches "heads", but this command calls the option to show only branches "--heads", which confuses the branches themselves and the tips of branches. Straighten the terminology by introducing "--branches" option that limits the output to branches, and deprecate "--heads" option used that way. We do not plan to remove "--heads" or "-h" yet; we may want to do so at Git 3.0, in which case, we may need to start advertising upcoming removal with an extra warning when they are used. Signed-off-by: Junio C Hamano <gitster@pobox.com>
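And analogously for querying a remote:

    # List only the branches of a remote with the new option name;
    # --heads remains available but is deprecated.
    git ls-remote --branches origin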
2024-06-04dir.c: free removed sparse-pattern hashmap entriesJeff King1-0/+1
In add_pattern_to_hashsets(), we remove entries from the recursive_hashmap when adding similar ones to the parent_hashmap. I won't pretend to understand all of what's going on here, but there's an obvious leak: whatever we removed from recursive_hashmap is not referenced anywhere else, and is never free()d. We can easily fix this by asking the hashmap to return a pointer to the old entry. This makes t7002 now completely leak-free. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-04sparse-checkout: pass string literals directly to add_pattern()Jeff King2-0/+2
The add_pattern() function takes a pattern string, but neither makes a copy of it nor takes ownership of the memory. So it is the caller's responsibility to make sure the string hangs around as long as the pattern_list which references it. There are a few cases in sparse-checkout where we use string literal patterns by stuffing them into a strbuf, detaching the buffer, and then passing the result into add_pattern(). This creates a leak when the pattern_list is eventually cleared, since we don't retain a copy of the detached buffer to free. But we can observe that the whole strbuf dance is unnecessary. The point was presumably[1] to satisfy the lifetime requirement of the string. But string literals have static duration; we can count on them lasting for the whole program. So we can fix the leak by just passing them directly. And as a bonus, that simplifies the code. The leaks can be seen in t7002, which drops from 25 leaks to 22 with this patch. It also makes t3602 and t1090 leak-free. In the long run, we will also want to clean up this (undocumented!) memory lifetime requirement of add_pattern(). But that can come in a later patch; passing the string literals directly will be the right thing either way. [1] The code in question comes from 416adc8711 (sparse-checkout: update working directory in-process for 'init', 2019-11-21) and 99dfa6f970 (sparse-checkout: use in-process update for disable subcommand, 2019-11-21), but I didn't see anything in their commit messages or on the list explaining the strbufs. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-03Merge branch 'th/push-local-ff-check-without-lazy-fetch'Junio C Hamano1-0/+19
When "git push" notices that the commit at the tip of the ref on the other side it is about to overwrite does not exist locally, it used to first try fetching it if the local repository is a partial clone. The command has been taught not to do so and immediately fail instead. * th/push-local-ff-check-without-lazy-fetch: push: don't fetch commit object when checking existence
2024-06-03Merge branch 'ps/fix-reinit-includeif-onbranch'Junio C Hamano1-8/+93
"git init" in an already created directory, when the user configuration has includeif.onbranch, started to fail recently, which has been corrected. * ps/fix-reinit-includeif-onbranch: setup: fix bug with "includeIf.onbranch" when initializing dir