Age | Commit message | Author | Files | Lines
19 hours | The fifth batch [HEAD, master, main] | Junio C Hamano | 1 | -0/+11
Signed-off-by: Junio C Hamano <gitster@pobox.com>
19 hours | Merge branch 'jk/asan-bonanza' | Junio C Hamano | 7 | -40/+122
Various issues detected by Asan have been corrected. * jk/asan-bonanza: t: enable ASan's strict_string_checks option fsck: avoid parse_timestamp() on buffer that isn't NUL-terminated fsck: remove redundant date timestamp check fsck: avoid strcspn() in fsck_ident() fsck: assert newline presence in fsck_ident() cache-tree: avoid strtol() on non-string buffer Makefile: turn on NO_MMAP when building with ASan pack-bitmap: handle name-hash lookups in incremental bitmaps compat/mmap: mark unused argument in git_munmap()
19 hours | Merge branch 'je/doc-data-model' | Junio C Hamano | 4 | -2/+311
Add a new manual that describes the data model. * je/doc-data-model: doc: add an explanation of Git's data model
19 hours | Merge branch 'jc/whitespace-incomplete-line' | Junio C Hamano | 9 | -88/+450
Both "git apply" and "git diff" learn a new whitespace error class, "incomplete-line". * jc/whitespace-incomplete-line: attr: enable incomplete-line whitespace error for this project diff: highlight and error out on incomplete lines apply: check and fix incomplete lines whitespace: allocate a few more bits and define WS_INCOMPLETE_LINE apply: revamp the parsing of incomplete lines diff: update the way rewrite diff handles incomplete lines diff: call emit_callback ecbdata everywhere diff: refactor output of incomplete line diff: keep track of the type of the last line seen diff: correct suppress_blank_empty hack diff: emit_line_ws_markup() if/else style fix whitespace: correct bit assignment comments
19 hours | Merge branch 'ja/doc-synopsis-style' | Junio C Hamano | 10 | -405/+427
Doc mark-up updates. * ja/doc-synopsis-style: doc: pull-fetch-param typofix doc: convert git push to synopsis style doc: convert git pull to synopsis style doc: convert git fetch to synopsis style
19 hours | Merge branch 'lo/repo-info-all' | Junio C Hamano | 3 | -21/+69
"git repo info" learned "--all" option. * lo/repo-info-all: repo: add --all to git-repo-info repo: factor out field printing to dedicated function
5 days | The fourth batch | Junio C Hamano | 1 | -0/+39
Signed-off-by: Junio C Hamano <gitster@pobox.com>
5 days | Merge branch 'gf/win32-pthread-cond-wait-err' | Junio C Hamano | 2 | -1/+9
Emulation code clean-up. * gf/win32-pthread-cond-wait-err: win32: return error if SleepConditionVariableCS fails
5 days | Merge branch 'jk/ci-windows-meson-test-fix' | Junio C Hamano | 3 | -1/+25
"Windows+meson" job at the GitHub Actions CI was hard to debug, as it did not show and save failed test artifacts, which has been corrected. * jk/ci-windows-meson-test-fix: ci(windows-meson-test): handle options and output like other test jobs unit-test: ignore --no-chain-lint
5 days | Merge branch 'pw/worktree-list-display-width-fix' | Junio C Hamano | 2 | -25/+53
"git worktree list" attempts to show paths to worktrees while aligning them, but miscounted display columns for the paths when non-ASCII characters were involved, which has been corrected. * pw/worktree-list-display-width-fix: worktree list: quote paths worktree list: fix column spacing
5 days | Merge branch 'js/wincred-get-credential-alloc-fix' | Junio C Hamano | 1 | -1/+1
Under-allocation fix. * js/wincred-get-credential-alloc-fix: wincred: avoid memory corruption
5 days | Merge branch 'js/cmake-libgit-fix' | Junio C Hamano | 1 | -13/+1
The Makefile-based build has recently been updated to build a libgit.a that also has reftable and xdiff objects; the CMake-based build procedure has been updated to match. * js/cmake-libgit-fix: cmake: stop trying to build the reftable and xdiff libraries
5 days | Merge branch 'js/mingw-assign-comma-fix' | Junio C Hamano | 1 | -20/+28
The "return errno = EFOO, -1" construct, which is heavily used in compat/mingw.c and triggers warnings under "-Wcomma", has been rewritten to avoid the warnings. * js/mingw-assign-comma-fix: mingw: avoid the comma operator
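The kind of rewrite this topic describes can be sketched as follows. This is only an illustration: `check_path()` is a hypothetical helper and is not taken from compat/mingw.c, and EINVAL merely stands in for the "EFOO" placeholder in the description.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from compat/mingw.c.
 * The old style would have been written as:
 *     return errno = EINVAL, -1;
 * which relies on the comma operator and triggers -Wcomma.
 */
int check_path(const char *path)
{
	if (!path || !*path) {
		errno = EINVAL;   /* set errno in its own statement ... */
		return -1;        /* ... then return the error code */
	}
	return 0;
}
```

Splitting the assignment and the return into two statements keeps the behavior the same while avoiding the comma operator altogether.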
5 days | Merge branch 'js/ci-github-setup-go-update' | Junio C Hamano | 1 | -1/+1
Update the version of an action used at the GitHub Actions CI. * js/ci-github-setup-go-update: ci: bump actions/setup-go from 5 to 6
5 days | Merge branch 'jk/test-mktemp-leakfix' | Junio C Hamano | 1 | -1/+7
Test leakfix. * jk/test-mktemp-leakfix: test-mktemp: plug memory and descriptor leaks
5 days | Merge branch 'rs/xmkstemp-simplify' | Junio C Hamano | 1 | -18/+1
Code simplification. * rs/xmkstemp-simplify: wrapper: simplify xmkstemp()
5 days | Merge branch 'ad/blame-diff-algorithm' | Junio C Hamano | 9 | -24/+279
"git blame" learns "--diff-algorithm=<algo>" option. * ad/blame-diff-algorithm: blame: make diff algorithm configurable xdiff: add 'minimal' to XDF_DIFF_ALGORITHM_MASK
5 days | Merge branch 'en/ort-rename-another-fix' | Junio C Hamano | 2 | -10/+114
Yet another corner case fix around renames in the "ort" merge strategy. * en/ort-rename-another-fix: merge-ort: fix failing merges in special corner case merge-ort: remove debugging crud t6429: update comment to mention correct tool
5 days | Merge branch 'master' of https://github.com/j6t/gitk | Junio C Hamano | 1 | -18/+69
* 'master' of https://github.com/j6t/gitk: gitk: add external diff file rename detection gitk: show unescaped file names on 'rename' and 'copy' lines gitk: fix a 'continue' statement outside a loop to 'return' gitk: persist position and size of the Tags and Heads window Revert "gitk: Only restore window size from ~/.gitk, not position"
5 days | Merge branch 'tb/external-diff-renamed' | Johannes Sixt | 1 | -2/+38
* tb/external-diff-renamed: gitk: add external diff file rename detection
5 days | Merge branch 'js/persist-ref-window-geometry' | Johannes Sixt | 1 | -15/+22
* js/persist-ref-window-geometry: gitk: persist position and size of the Tags and Heads window Revert "gitk: Only restore window size from ~/.gitk, not position"
7 days | The third batch | Junio C Hamano | 1 | -1/+32
Signed-off-by: Junio C Hamano <gitster@pobox.com>
7 days | Merge branch 'jx/repo-struct-utf8width-fix' | Junio C Hamano | 4 | -4/+153
The "git repo structure" subcommand tried to align its output but mixed up byte count and display column width, which has been corrected. * jx/repo-struct-utf8width-fix: builtin/repo: fix table alignment for UTF-8 characters t/unit-tests: add UTF-8 width tests for CJK chars
7 days | Merge branch 'kn/osxkeychain-idempotent-store-fix' | Junio C Hamano | 3 | -30/+132
An earlier check added to the osxkeychain credential helper to avoid storing a credential that the helper itself had supplied was overeager and rejected credential material supplied by other helper backends that it would have wanted to store, which has been corrected. * kn/osxkeychain-idempotent-store-fix: osxkeychain: avoid incorrectly skipping store operation
7 days | Merge branch 'kh/doc-commit-extra-references' | Junio C Hamano | 1 | -4/+6
Doc update. * kh/doc-commit-extra-references: doc: commit: link to git-status(1) on all format options
7 days | Merge branch 'ps/object-source-loose' | Junio C Hamano | 12 | -207/+287
A part of code paths that deals with loose objects has been cleaned up. * ps/object-source-loose: object-file: refactor writing objects via a stream object-file: rename `write_object_file()` object-file: refactor freshening of objects object-file: rename `has_loose_object()` object-file: read objects via the loose object source object-file: move loose object map into loose source object-file: hide internals when we need to reprepare loose sources object-file: move loose object cache into loose source object-file: introduce `struct odb_source_loose` object-file: move `fetch_if_missing` odb: adjust naming to free object sources odb: introduce `odb_source_new()` odb: fix subtle logic to check whether an alternate is usable
7 days | Merge branch 'qj/doc-http-bad-want-response' | Junio C Hamano | 1 | -1/+2
Doc update. * qj/doc-http-bad-want-response: doc: clarify server behavior for invalid 'want' lines in HTTP protocol
7 days | Merge branch 'sa/replay-atomic-ref-updates' | Junio C Hamano | 4 | -43/+277
"git replay" (experimental) learned to perform ref updates itself in a transaction by default, instead of emitting where each ref should point and leaving the actual update to another command. * sa/replay-atomic-ref-updates: replay: add replay.refAction config option replay: make atomic ref updates the default behavior replay: use die_for_incompatible_opt2() for option validation
7 days | Merge branch 'bc/submodule-force-same-hash' | Junio C Hamano | 4 | -4/+56
Adding a repository that uses a different hash function is a no-no, but "git submodule add" did not prevent it, which has been corrected. * bc/submodule-force-same-hash: read-cache: drop submodule check from add_to_cache() object-file: disallow adding submodules of different hash algo
7 days | Merge branch 'jk/attr-macroexpand-wo-recursion' | Junio C Hamano | 2 | -16/+54
The code to expand attribute macros has been rewritten without recursion, so that it no longer risks running out of stack space in an uncontrolled way. * jk/attr-macroexpand-wo-recursion: attr: avoid recursion when expanding attribute macros
7 days | doc: pull-fetch-param typofix | Jean-Noël Avila via GitGitGadget | 1 | -1/+1
An earlier patch had a typo that was discovered after it had been merged to 'next'. Fix it. Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
10 days | The second batch | Junio C Hamano | 1 | -0/+19
Signed-off-by: Junio C Hamano <gitster@pobox.com>
10 days | Merge branch 'jc/gitattributes-whitespace-no-indent-fix' | Junio C Hamano | 1 | -1/+1
Ever since we added whitespace rules for this project, we misspelt an entry, which has been corrected. * jc/gitattributes-whitespace-no-indent-fix: .gitattributes: remove misspelled no-op whitespace attribute
10 days | Merge branch 'kn/maintenance-is-needed' | Junio C Hamano | 14 | -43/+284
"git maintenance" command learned "is-needed" subcommand to tell if it is necessary to perform various maintenance tasks. * kn/maintenance-is-needed: maintenance: add 'is-needed' subcommand maintenance: add checking logic in `pack_refs_condition()` refs: add a `optimize_required` field to `struct ref_storage_be` reftable/stack: add function to check if optimization is required reftable/stack: return stack segments directly
10 days | Merge branch 'rs/diff-quiet-no-rename' | Junio C Hamano | 2 | -0/+12
As "git diff --quiet" only cares about the existence of any changes, disable rename/copy detection to skip more expensive processing whose result will be discarded anyway. * rs/diff-quiet-no-rename: diff: disable rename detection with --quiet
11 days | win32: return error if SleepConditionVariableCS fails | Greg Funni | 2 | -1/+9
If it fails, return an error. Signed-off-by: Greg Funni <gfunni234@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
12 days | doc: convert git push to synopsis style | Jean-Noël Avila | 2 | -179/+201
- Switch the synopsis to a synopsis block which will automatically format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option descriptions. The new rendering engine will apply synopsis rules to these spans.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
12 days | doc: convert git pull to synopsis style | Jean-Noël Avila | 4 | -39/+38
- Switch the synopsis to a synopsis block which will automatically format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option descriptions. The new rendering engine will apply synopsis rules to these spans.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
12 days | doc: convert git fetch to synopsis style | Jean-Noël Avila | 6 | -189/+190
- Switch the synopsis to a synopsis block which will automatically format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option descriptions. The new rendering engine will apply synopsis rules to these spans.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
12 days | Start 2.53 cycle | Junio C Hamano | 3 | -2/+14
Signed-off-by: Junio C Hamano <gitster@pobox.com>
12 days | Merge branch 'ps/ref-peeled-tags-fixes' | Junio C Hamano | 5 | -11/+11
Another fix-up to "peeled-tags" topic. * ps/ref-peeled-tags-fixes: object: fix performance regression when peeling tags
12 days | Merge branch 'kn/refs-optim-cleanup' | Junio C Hamano | 11 | -72/+42
Code clean-up. * kn/refs-optim-cleanup: t/pack-refs-tests: move the 'test_done' to callees refs: rename 'pack_refs_opts' to 'refs_optimize_opts' refs: move to using the '.optimize' functions
12 days | Merge branch 'ps/ref-peeled-tags' | Junio C Hamano | 67 | -852/+825
Some ref backend storage can hold not just the object name of an annotated tag, but the object name of the object the tag points at. The code to handle this information has been streamlined. * ps/ref-peeled-tags: t7004: do not chdir around in the main process ref-filter: fix stale parsed objects ref-filter: parse objects on demand ref-filter: detect broken tags when dereferencing them refs: don't store peeled object IDs for invalid tags object: add flag to `peel_object()` to verify object type refs: drop infrastructure to peel via iterators refs: drop `current_ref_iter` hack builtin/show-ref: convert to use `reference_get_peeled_oid()` ref-filter: propagate peeled object ID upload-pack: convert to use `reference_get_peeled_oid()` refs: expose peeled object ID via the iterator refs: refactor reference status flags refs: fully reset `struct ref_iterator::ref` on iteration refs: introduce `.ref` field for the base iterator refs: introduce wrapper struct for `each_ref_fn`
12 days | Merge branch 'ps/packed-git-in-object-store' | Junio C Hamano | 9 | -172/+223
The list of packfiles used in a running Git process is moved from the packed_git structure into the packfile store. * ps/packed-git-in-object-store: packfile: track packs via the MRU list exclusively packfile: always add packfiles to MRU when adding a pack packfile: move list of packs into the packfile store builtin/pack-objects: simplify logic to find kept or nonlocal objects packfile: fix approximation of object counts http: refactor subsystem to use `packfile_list`s packfile: move the MRU list into the packfile store packfile: use a `strmap` to store packs by name
13 days | repo: add --all to git-repo-info | Lucas Seiki Oshiro | 3 | -5/+51
Add a new flag `--all` to git-repo-info for requesting values for all the available keys. By using this flag, the user can retrieve all the values instead of having to search for the keys they want. Helped-by: Karthik Nayak <karthik.188@gmail.com> Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
13 days | repo: factor out field printing to dedicated function | Lucas Seiki Oshiro | 1 | -16/+18
Move the field printing in git-repo-info to a new function called `print_field`, allowing it to be called by functions other than `print_fields`. Also change its use of the quote_c_style() helper to output directly to the standard output stream, instead of collecting the result in a strbuf and then printing it ourselves. Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
13 days | worktree list: quote paths | Phillip Wood | 2 | -3/+22
If a worktree path contains newlines or other control characters it messes up the output of "git worktree list". Fix this by using quote_path() to display the worktree path. The output of "git worktree list" is designed for human consumption, scripts should be using the "--porcelain" option so this change should not break them. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
13 days | worktree list: fix column spacing | Phillip Wood | 2 | -24/+33
The output of "git worktree list" displays a table containing the worktree path, HEAD OID and branch name for each worktree. The code aligns the columns by measuring the visual width of the worktree path when it is printed. Unfortunately it fails to use the visual width when calculating the width of the column so, if any of the paths contain a multibyte character, we can end up with excess padding between columns.

The simplest fix would be to replace strlen() with utf8_strwidth() in measure_widths(). However that leaves us measuring the visual width twice and the byte length once. By caching the visual width and printing the padding separately from the worktree path, we only need to calculate the visual width once and do not need the byte length at all. The visual widths are stored in an array of structs rather than an array of ints as the next commit will add more struct members.

Even if there are no multibyte characters in any of the paths we still print an extra space between the path and the object id as the field width is calculated as one plus the length of the path and we print an explicit space as well. This is fixed by not printing the extra space.

The tests are updated to include multibyte characters in one of the worktree paths and to check the spacing of the columns.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
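The difference between byte length and visual width can be sketched roughly as below. This is not Git's utf8_strwidth(): `approx_display_width()` and `padding_for()` are hypothetical names, and the width function is a deliberate simplification that counts every UTF-8 code point as one column, so wide (CJK) and zero-width characters are not handled.

```c
#include <stddef.h>
#include <string.h>

/*
 * Simplified stand-in for a display-width function: counts UTF-8 code
 * points by skipping continuation bytes (10xxxxxx). Every code point is
 * treated as one column, which is an assumption made for brevity.
 */
size_t approx_display_width(const char *s)
{
	size_t width = 0;

	for (; *s; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			width++;
	return width;
}

/* Number of padding spaces needed to fill a column of col_width cells. */
size_t padding_for(const char *s, size_t col_width)
{
	size_t w = approx_display_width(s);

	return w < col_width ? col_width - w : 0;
}
```

For a path such as "h\xc3\xa9llo", strlen() reports 6 bytes while the sketch reports 5 columns; padding computed from the byte count would therefore differ by one space from padding computed from the visual width.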
13 days | test-mktemp: plug memory and descriptor leaks | Jeff King | 1 | -1/+7
We test xmkstemp() in our helper by just calling:

    xmkstemp(xstrdup(argv[1]));

This leaks both the copied string as well as the descriptor returned by the function. In practice this isn't a big deal, since we immediately exit the program, but:

  1. LSan will complain about the memory leak. The only reason we did not notice this in our leak-checking builds is that both of the callers in the test suite (both in t0070) pass a broken template (and expect failure). So the function calls die() before we can actually leak. But it's an accident waiting to happen if anybody adds a call which succeeds.

  2. Coverity complains about the descriptor leak. There's a long list of uninteresting or false positives in Coverity's results, but since we're here we might as well fix it, too.

I didn't bother adding a new test that triggers the leak. It's not even in real production code, but just in the test-helper itself.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
13 days | ci(windows-meson-test): handle options and output like other test jobs | Jeff King | 2 | -1/+24
The GitHub windows-meson-test jobs directly run "meson test" with the --slice option. This means they skip all of the ci/lib.sh infrastructure, and in particular:

  1. They do not actually set any GIT_TEST_OPTS like --verbose-log or -x.

  2. They do not do the usual handle_failed_tests() magic to print test failures or tar up failed directories.

As a result, you get almost no feedback at all when a test fails in this job, making debugging rather tricky.

Let's try to make this behave more like the other CI jobs. Because we're on Windows, we can't just use the normal run-build-and-tests.sh script. Our build runs as a separate job (like the non-meson Windows job), and then we parallelize the tests across several job slices. So we need something like the run-test-slice.sh script that the "windows-test" job uses.

In theory we could just swap out the "make" invocation there for "meson". But it doesn't quite work, because "make" knows how to pull GIT_TEST_OPTS out of GIT-BUILD-OPTIONS automatically. But for meson, we have to extract them into the --test-args option ourselves.

I tried making the logic in run-test-slice.sh conditional, but there ended up being hardly any common code at all (and there are some tricky ordering constraints). So I ended up with a new meson-specific test-slice runner.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
13 days | unit-test: ignore --no-chain-lint | Jeff King | 1 | -0/+1
In the same spirit as 9faf3963b6 (t: introduce compatibility options to clar-based tests, 2024-12-13), we should ignore --no-chain-lint passed to our clar tests, since it may appear in GIT_TEST_OPTS to be used with other tests. This is particularly important on Windows CI, where --no-chain-lint is added to the test options by default, and the meson build will pass all options to the unit tests. The only reason our meson Windows CI job does not run into this currently is that it is not respecting GIT_TEST_OPTS at all! So ignoring this option is a prerequisite to fixing that situation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
13 days | t: enable ASan's strict_string_checks option | Jeff King | 1 | -0/+1
ASan has an option to enable strict string checking, where any pointer passed to a function that expects a NUL-terminated string will be checked for that NUL termination. This can sometimes produce false positives. E.g., it is not wrong to pass a buffer with { '1', '2', '\n' } into strtoul(). Even though it is not NUL-terminated, it will stop at the newline.

But in trying it out, it identified two problematic spots in our test suite (which have now been adjusted):

  1. The strtol() parsing in cache-tree.c was a real potential problem, which would have been very hard to find otherwise (since it required constructing a very specific broken index file).

  2. The use of string functions in fsck_ident() were false positives, because we knew that there was always a trailing newline which would stop the functions from reading off the end of the buffer. But the reasoning behind that is somewhat fragile, and silencing those complaints made the code easier to reason about.

So even though this did not find any earth-shattering bugs, and even had a few false positives, I'm sufficiently convinced that its complaints are more helpful than hurtful. Let's turn it on by default (since the test suite now runs cleanly with it) and see if it ever turns up any other instances.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
13 days | fsck: avoid parse_timestamp() on buffer that isn't NUL-terminated | Jeff King | 1 | -4/+19
In fsck_ident(), we parse the timestamp with parse_timestamp(), which is really an alias for strtoumax(). But since our buffer may not be NUL-terminated, this can trigger a complaint from ASan's strict_string_checks mode. This is a false positive, since we know that the buffer contains a trailing newline (which we checked earlier in the function), and that strtoumax() would stop there. But it is worth working around ASan's complaint. One is because that will let us turn on strict_string_checks by default, which has helped catch other real problems. And two is that the safety of the current code is very hard to reason about (it subtly depends on distant code which could change). One option here is to just parse the number left-to-right ourselves. But we care about the size of a timestamp_t and detecting overflow, since that's part of the point of these checks. And doing that correctly is tricky. So we'll instead just pull the digits into a separate, NUL-terminated buffer, and use that to call parse_timestamp(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
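The "pull the digits into a separate, NUL-terminated buffer" approach might look roughly like this. It is a sketch under assumptions, not the code from fsck.c: `parse_digits_bounded()` is a hypothetical name, the 32-byte scratch buffer is an arbitrary choice, and strtoumax() is called directly in place of parse_timestamp().

```c
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>
#include <string.h>

/*
 * Parse a run of ASCII digits from [p, end) without requiring a NUL in
 * the source buffer. Returns 0 on success and -1 if there are no digits,
 * too many digits for the scratch buffer, or the value overflows.
 */
int parse_digits_bounded(const char *p, const char *end, uintmax_t *out)
{
	char tmp[32];   /* scratch size is an assumption of this sketch */
	size_t n = 0;

	while (p + n < end && p[n] >= '0' && p[n] <= '9') {
		if (n == sizeof(tmp) - 1)
			return -1;   /* too long to be a sane timestamp */
		n++;
	}
	if (!n)
		return -1;

	memcpy(tmp, p, n);
	tmp[n] = '\0';       /* now safe to hand to a string function */

	errno = 0;
	*out = strtoumax(tmp, NULL, 10);
	return errno == ERANGE ? -1 : 0;
}
```

Overflow detection is still delegated to the library routine, which is the point made above about not hand-rolling the left-to-right parse.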
13 days | fsck: remove redundant date timestamp check | Jeff King | 1 | -1/+1
After calling "parse_timestamp(p, &end, 10)", we complain if "p == end", which would imply that we did not see any digits at all. But we know this cannot be the case, since we would have bailed already if we did not see any digits, courtesy of extra checks added by 8e4309038f (fsck: do not assume NUL-termination of buffers, 2023-01-19). Since then, checking "p == end" is redundant and we can drop it. This will make our lives a little easier as we refactor further. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
13 days | fsck: avoid strcspn() in fsck_ident() | Jeff King | 1 | -10/+22
We may be operating on a buffer that is not NUL-terminated, but we use strcspn() to parse it. This is OK in practice, as discussed in 8e4309038f (fsck: do not assume NUL-termination of buffers, 2023-01-19), because we know there is at least a trailing newline in our buffer, and we always pass "\n" to strcspn(). So we know it will stop before running off the end of the buffer. But this is a subtle point to hang our memory safety hat on. And it confuses ASan's strict_string_checks mode, even though it is technically a false positive (that mode complains that we have no NUL, which is true, but it does not know that we have verified the presence of the newline already). Let's instead open-code the loop. As a bonus, this makes the logic more obvious (to my mind, anyway). The current code skips forward with strcspn until it hits "<", ">", or "\n". But then it must check which it saw to decide if that was what we expected or not, duplicating some logic between what's in the strcspn() and what's in the domain logic. Instead, we can just check each character as we loop and act on it immediately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
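As a rough illustration of open-coding the scan, the sketch below walks a bounded buffer and stops at the first '<', '>' or newline. `find_ident_delim()` is a hypothetical helper, and it is simpler than what the commit describes: the real loop acts on each character as it goes, whereas this version only reports where the delimiter is.

```c
#include <stddef.h>

/*
 * Return a pointer to the first '<', '>' or '\n' in [p, end), or NULL if
 * none is found before the end of the buffer. Unlike strcspn(), this
 * never reads past `end`, so the buffer needs no NUL terminator.
 */
const char *find_ident_delim(const char *p, const char *end)
{
	for (; p < end; p++) {
		switch (*p) {
		case '<':
		case '>':
		case '\n':
			return p;
		}
	}
	return NULL;
}
```

Because the bound is explicit, the safety of the scan no longer depends on a newline that was verified somewhere else.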
13 days | fsck: assert newline presence in fsck_ident() | Jeff King | 1 | -7/+9
The fsck code purports to handle buffers that are not NUL-terminated, but fsck_ident() uses some string functions. This works OK in practice, as explained in 8e4309038f (fsck: do not assume NUL-termination of buffers, 2023-01-19). Before calling fsck_ident() we'll have called verify_headers(), which makes sure we have at least a trailing newline. And none of our string-like functions will walk past that newline.

However, that makes this code at the top of fsck_ident() very confusing:

    *ident = strchrnul(*ident, '\n');
    if (**ident == '\n')
        (*ident)++;

We should always see that newline, or our memory safety assumptions have been violated! Further, using strchrnul() is weird, since the whole point is that if the newline is not there, we don't necessarily have a NUL at all, and might read off the end of the buffer.

So let's have callers pass in the boundary of our buffer, which lets us safely find the newline with memchr(). And if it is not there, this is a BUG(), because it means our caller did not validate the input with verify_headers() as it was supposed to (and we are better off bailing rather than having memory-safety problems).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
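A bounded newline search of the kind described here could be sketched as follows. `skip_to_next_line()` is a hypothetical name, and abort() with a message is only a stand-in for Git's BUG() macro, which is not reproduced here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Advance *ident past the next newline within [*ident, end).
 * A missing newline means the caller skipped validation, so bail out
 * rather than risk reading past the end of the buffer.
 */
void skip_to_next_line(const char **ident, const char *end)
{
	const char *nl = memchr(*ident, '\n', (size_t)(end - *ident));

	if (!nl) {
		fprintf(stderr, "BUG: ident line without newline\n");
		abort();
	}
	*ident = nl + 1;
}
```

Since memchr() takes an explicit length, the search cannot run off the end of the buffer even when no NUL byte is present.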
13 days | cache-tree: avoid strtol() on non-string buffer | Jeff King | 1 | -13/+37
A cache-tree extension entry in the index looks like this:

    <name> NUL <entry_nr> SPACE <subtree_nr> NEWLINE <binary_oid>

where the "_nr" items are human-readable base-10 ASCII. We parse them with strtol(), even though we do not have a NUL-terminated string (we'd generally have an mmap() of the on-disk index file).

For a well-formed entry, this is not a problem; strtol() will stop when it sees the newline. But there are two problems:

  1. A corrupted entry could omit the newline, causing us to read further. You'd mostly get stopped by seeing non-digits in the oid field (and if it is likewise truncated, there will still be 20 or more bytes of the index checksum). So it's possible, though unlikely, to read off the end of the mmap'd buffer. Of course a malicious index file can fake the oid and the index checksum to all (ASCII) 0's.

     This is further complicated by the fact that mmap'd buffers tend to be zero-padded up to the page boundary. So to run off the end, the index size also has to be a multiple of the page size. This is also unlikely, though you can construct a malicious index file that matches this.

     The security implications aren't too interesting. The index file is a local file anyway (so you can't attack somebody by cloning, but only if you convince them to operate in a .git directory you made, at which point attacking .git/config is much easier). And it's just a read overflow via strtol(), which is unlikely to buy you much beyond a crash.

  2. ASan has a strict_string_checks option, which tells it to make sure that arguments to string functions (like strtol) have some eventual NUL, without regard to what the function would actually do (like stopping at a newline here). This option sometimes has false positives, but it can point to sketchy areas (like this one) where the input we use doesn't exhibit a problem, but different input _could_ cause us to misbehave.

Let's fix it by just parsing the values ourselves with a helper function that is careful not to go past the end of the buffer. There are a few behavior changes here that should not matter:

  - We do not consider overflow, as strtol() would. But nor did the original code. However, we don't trust the value we get from the on-disk file, and if it says to read 2^30 entries, we would notice that we do not have that many and bail before reading off the end of the buffer.

  - Our helper does not skip past extra leading whitespace as strtol() would, but according to gitformat-index(5) there should not be any.

  - The original quit parsing at a newline or a NUL byte, but now we insist on a newline (which is what the documentation says, and what Git has always produced).

Since we are providing our own helper function, we can tweak the interface a bit to make our lives easier. The original code does not use strtol's "end" pointer to find the end of the parsed data, but rather uses a separate loop to advance our "buf" pointer to the trailing newline. We can instead provide a helper that advances "buf" as it parses, letting us read strictly left-to-right through the buffer.

I didn't add a new test here. It's surprisingly difficult to construct an index of exactly the right size due to the way we pad entries. But it is easy to trigger the problem in existing tests when using ASan's strict string checking, coupled with a recent change to use NO_MMAP with ASan builds. So:

    make SANITIZE=address
    cd t
    ASAN_OPTIONS=strict_string_checks=1 ./t0090-cache-tree.sh

triggers it reliably. Technically it is not deterministic because there is ~8% chance (it's 1-(255/256)^20, or ^32 for sha256) that the trailing checksum hash has a NUL byte in it. But we compute enough cache-trees in the course of that script that we are very likely to hit the problem in one of them.

We can look at making strict_string_checks the default for ASan builds, but there are some other cases we'd want to fix first.

Reported-by: correctmost <cmlists@sent.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
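A bounds-aware helper that advances the buffer pointer as it parses might look something like the sketch below. `parse_uint_field()` is a hypothetical name and interface, not the helper added to cache-tree.c. It also deviates from the commit in one respect: it rejects values that would overflow, whereas the message above says the real helper does not consider overflow.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Parse a non-negative base-10 integer from [*buf, end) and require it
 * to be followed by the byte `delim`. On success, store the value,
 * advance *buf past the delimiter and return 0; otherwise return -1
 * and leave *buf untouched. Leading whitespace is not skipped.
 */
int parse_uint_field(const char **buf, const char *end, char delim,
		     unsigned long *out)
{
	const char *p = *buf;
	unsigned long val = 0;

	if (p >= end || *p < '0' || *p > '9')
		return -1;                /* need at least one digit */
	for (; p < end && *p >= '0' && *p <= '9'; p++) {
		unsigned long digit = (unsigned long)(*p - '0');

		if (val > (ULONG_MAX - digit) / 10)
			return -1;            /* would overflow (sketch-only check) */
		val = val * 10 + digit;
	}
	if (p >= end || *p != delim)
		return -1;                /* ran out of buffer or wrong delimiter */
	*out = val;
	*buf = p + 1;
	return 0;
}
```

Calling it once with SPACE and once with NEWLINE as the delimiter reads the two "_nr" fields strictly left to right, leaving the pointer at the start of the binary object ID.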
13 days | Makefile: turn on NO_MMAP when building with ASan | Jeff King | 2 | -1/+8
Git often uses mmap() to access on-disk files. This leaves a blind spot in our SANITIZE=address builds, since ASan does not seem to handle mmap at all. Nor does the OS notice most out-of-bounds access, since it tends to round up to the nearest page size (so depending on how big the map is, you might have to overrun it by up to 4095 bytes to trigger a segfault). The previous commit demonstrates a memory bug that we missed. We could have made a new test where the out-of-bounds access was much larger, or where the mapped file ended closer to a page boundary. But the point of running the test suite with sanitizers is to catch these problems without having to construct specific tests. Let's enable NO_MMAP for our ASan builds by default, which should give us better coverage. This does increase the memory usage of Git, since we're copying from the filesystem into heap. But the repositories in the test suite tend to be small, so the overhead isn't really noticeable (and ASan already has quite a performance penalty). There are a few other known bugs that this patch will help flush out. However, they aren't directly triggered in the test suite (yet). So it's safe to turn this on now without breaking the test suite, which will help us add new tests to demonstrate those other bugs as we fix them. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
13 days | pack-bitmap: handle name-hash lookups in incremental bitmaps | Jeff King | 1 | -4/+25
If a bitmap has a name-hash cache, it is an array of 32-bit integers, one per entry in the bitmap, which we've mmap'd from the .bitmap file. We access it directly like this: if (bitmap_git->hashes) hash = get_be32(bitmap_git->hashes + index_pos); That works for both regular pack bitmaps and for non-incremental midx bitmaps. There is one bitmap_index with one "hashes" array, and index_pos is within its bounds (we do the bounds-checking when we load the bitmap). But for an incremental midx bitmap, we have a linked list of bitmap_index structs, and each one has only its own small slice of the name-hash array. If index_pos refers to an object that is not in the first bitmap_git of the chain, then we'll access memory outside of the bounds of its "hashes" array, and often outside of the mmap. Instead, we should walk through the list until we find the bitmap_index which serves our index_pos, and use its hash (after adjusting index_pos to make it relative to the slice we found). This is exactly what we do elsewhere for incremental midx lookups (like the pack_pos_to_midx() call a few lines above). But we can't use existing helpers like midx_for_object() here, because we're walking through the chain of bitmap_index structs (each of which refers to a midx), not the chain of incremental multi_pack_index structs themselves. The problem is triggered in the test suite, but we don't get a segfault because the out-of-bounds index is too small. The OS typically rounds our mmap up to the nearest page size, so we just end up accessing some extra zero'd memory. Nor do we catch it with ASan, since it doesn't seem to instrument mmaps at all. But if we build with NO_MMAP, then our maps are replaced with heap allocations, which ASan does check. And so: make NO_MMAP=1 SANITIZE=address cd t ./t5334-incremental-multi-pack-index.sh does show the problem (and this patch makes it go away). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
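The walk described above can be sketched as follows. The `struct layer` type and its fields are hypothetical stand-ins for the chain of bitmap_index structs, and the hashes are stored in host byte order for simplicity (the real code reads them with get_be32()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one layer of an incremental bitmap chain. */
struct layer {
	const struct layer *base; /* older layer, or NULL */
	uint32_t num_before;      /* objects held by all older layers */
	uint32_t num_objects;     /* objects held by this layer */
	const uint32_t *hashes;   /* this layer's slice of name hashes, or NULL */
};

/*
 * Find the layer that serves index_pos, make the position relative to
 * that layer's slice, and only then index into its hashes array.
 */
static uint32_t lookup_name_hash(const struct layer *tip, uint32_t index_pos)
{
	const struct layer *l;

	for (l = tip; l; l = l->base) {
		uint32_t rel;

		if (index_pos < l->num_before)
			continue; /* belongs to an older layer */
		rel = index_pos - l->num_before;
		if (rel >= l->num_objects || !l->hashes)
			return 0; /* out of range, or no name-hash cache */
		return l->hashes[rel];
	}
	return 0;
}
```

The important step is the subtraction: using the global index_pos directly against a single layer's array is exactly the out-of-bounds access the message describes.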
13 dayscompat/mmap: mark unused argument in git_munmap()Jeff King1-1/+1
Our mmap compat code emulates mapping by using malloc/free. Our git_munmap() must take a "length" parameter to match the interface of munmap(), but we don't use it (it is up to the allocator to know how big the block is in free()). Let's mark it as UNUSED to avoid complaints from -Wunused-parameter. Otherwise you cannot build with "make DEVELOPER=1 NO_MMAP=1". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
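A minimal sketch of the idea, assuming a GCC-compatible compiler. The macro name `UNUSED_PARAM` and the function name `emulated_munmap` are made up for this illustration; Git has its own UNUSED macro and its own git_munmap().

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative macro; expands to nothing on compilers without the attribute. */
#if defined(__GNUC__)
#define UNUSED_PARAM __attribute__((unused))
#else
#define UNUSED_PARAM
#endif

/*
 * malloc-backed stand-in for munmap(): the length must stay in the
 * signature to match the real interface, but free() does not need it.
 */
static int emulated_munmap(void *start, size_t length UNUSED_PARAM)
{
	free(start);
	return 0;
}
```

With the annotation in place, -Wunused-parameter has nothing to complain about even though the parameter is never read.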
13 daysci: bump actions/setup-go from 5 to 6Johannes Schindelin1-1/+1
Bumps actions/setup-go from 5 to 6. This upgrade includes dependency updates that incorporate a fix for a critical vulnerability. [Originally opened at https://github.com/git-for-windows/git/pull/5811] - [Release notes](https://github.com/actions/setup-go/releases) - [Commits](https://github.com/actions/setup-go/compare/v5...v6) Originally-authored-by: dependabot[bot] <support@github.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Jiang Xin <worldhello.net@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
14 daysmingw: avoid the comma operatorJohannes Schindelin1-20/+28
The pattern `return errno = ..., -1;` is observed several times in `compat/mingw.c`. It has served us well over the years, but now clang starts complaining: compat/mingw.c:723:24: error: possible misuse of comma operator here [-Werror,-Wcomma] 723 | return errno = ENOSYS, -1; | ^ See for example this failing workflow run: https://github.com/git-for-windows/git-sdk-arm64/actions/runs/15457893907/job/43513458823#step:8:201 Let's appease clang (and also reduce the use of the no longer common comma operator). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
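The rewrite amounts to splitting the comma expression into two statements. A small sketch with an illustrative function name (`not_supported` does not come from compat/mingw.c):

```c
#include <assert.h>
#include <errno.h>

/*
 * Before: return errno = ENOSYS, -1;
 * After:  two plain statements, which -Wcomma has no objection to.
 */
static int not_supported(void)
{
	errno = ENOSYS;
	return -1;
}
```

The behavior is unchanged: the caller still sees -1 with errno set to ENOSYS.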
14 dayscmake: stop trying to build the reftable and xdiff librariesJohannes Schindelin1-13/+1
In the `en/make-libgit-a` topic branch, more precisely in the commits f3b4c89d59f1 (make: delete REFTABLE_LIB, add reftable to LIB_OBJS, 2025-10-02) and cf680cdb9543 (make: delete XDIFF_LIB, add xdiff to LIB_OBJS, 2025-10-02), the strategy to build three static libraries was rethought, and instead only one static library is now built. This is good. However, the CMake definition was not changed accordingly, and now CMake-based builds fail thusly: [...] Generating hook-list.h CMake Error at CMakeLists.txt:122 (string): string sub-command REPLACE requires at least four arguments. Call Stack (most recent call first): CMakeLists.txt:711 (parse_makefile_for_sources) CMake Error at CMakeLists.txt:122 (string): string sub-command REPLACE requires at least four arguments. Call Stack (most recent call first): CMakeLists.txt:717 (parse_makefile_for_sources) -- Configuring incomplete, errors occurred! Fix that by removing the parts that expect the reftable and xdiff objects to be defined separately in the Makefile, still. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
14 dayswincred: avoid memory corruptionDavid Macek1-1/+1
`wcsncpy_s()` wants to write the terminating null character so we need to allocate one more space for it in the target memory block. This should fix crashes when trying to read passwords. When this happened, the password/token wouldn't print out and Git would therefore ask for a new password every time. Signed-off-by: David Macek <david.macek.0@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
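The off-by-one can be shown with a portable stand-in. This sketch does not call `wcsncpy_s()` itself; it uses standard `wmemcpy()` and a hypothetical helper `dup_wide()` to show why the destination needs room for len + 1 wide characters.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Copy the first `len` wide characters of `src` into a freshly allocated,
 * NUL-terminated buffer. The "+ 1" reserves space for the terminator;
 * without it, writing dst[len] would land outside the allocation.
 */
static wchar_t *dup_wide(const wchar_t *src, size_t len)
{
	wchar_t *dst = malloc((len + 1) * sizeof(*dst));

	if (!dst)
		return NULL;
	wmemcpy(dst, src, len);
	dst[len] = L'\0';
	return dst;
}
```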
14 daysmerge-ort: fix failing merges in special corner caseElijah Newren2-1/+106
At GitHub, we had a repository that was triggering git: merge-ort.c:3032: process_renames: Assertion `newinfo && !newinfo->merged.clean` failed. during git replay. This sounds similar to the somewhat recent f6ecb603ff8a (merge-ort: fix directory rename on top of source of other rename/delete, 2025-08-06), but the cause is different. Unlike that case, there are no rename-to-self situations arising in this case, and new to this case it can only be triggered during a replay operation on the 2nd or later commit being replayed, never on the first merge in the sequence. To trigger, the repository needs: * an upstream which: * renames a file to a different directory, e.g. old/file -> new/file * leaves other files remaining in the original directory (so that e.g. "old/" still exists upstream even though file has been removed from it and placed elsewhere) * a topic branch being rebased where: * a commit in the sequence: * modifies old/file * a subsequent commit in the sequence being replayed: * does NOT touch *anything* under new/ * does NOT touch old/file * DOES modify other paths under old/ * does NOT have any relevant renames that we need to detect _anywhere_ elsewhere in the tree (meaning this interacts interestingly with both directory renames and cached renames) In such a case, the assertion will trigger. The fix turns out to be surprisingly simple. I have a very vague recollection that I actually considered whether to add such an if-check years ago when I added the very similar one for oldinfo in 1b6b902d95a5 (merge-ort: process_renames() now needs more defensiveness, 2021-01-19), but I think I couldn't figure out a possible way to trigger it and was worried at the time that if I didn't know how to trigger it then I wasn't so sure that simply skipping it was correct. Waiting did give me a chance to put more thorough tests and checks into place for the rename-to-self cases a few months back, which I might not have found as easily otherwise. 
Anyway, put the check in place now and add a test that demonstrates the fix. Note that this bug, as demonstrated by the conditions listed above, runs at the intersection of relevant renames, trivial directory resolutions, and cached renames. All three of those optimizations are ones that unfortunately make the code (and testcases!) a bit more complex, and threading all three makes it a bit more so. However, the testcase isn't crazy enough that I'd expect no one to ever hit it in practice, and was confused why we didn't see it before. After some digging, I discovered that merge.directoryRenames=false is a workaround to this bug, and GitHub used that setting until recently (it was a "temporary" match-what-libgit2-does piece of code that lasted years longer than intended). Since the conditions I gave above for triggering this bug rule out the possibility of there being directory renames, one might assume that it shouldn't matter whether you try to detect such renames if there aren't any. However, due to commit a16e8efe5c2b (merge-ort: fix merge.directoryRenames=false, 2025-03-13), the heavy hammer used there means that merge.directoryRenames=false ALSO turns off rename caching, which is critical to triggering the bug. This becomes a bit more than an aside since... Re-reading that old commit, a16e8efe5c2b (merge-ort: fix merge.directoryRenames=false, 2025-03-13), it appears that the solution to this latest bug might have been at least a partial alternative solution to that old commit. And it may have been an improved alternative (or at least help implement one), since it may be able to avoid the heavy-handed disabling of rename cache. That might be an interesting future thing to investigate, but is not critical for the current fix. However, since I spent time digging it all up, at least leave a small comment tweak breadcrumb to help some future reader (myself or others) who wants to dig further to connect the dots a little quicker. 
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
14 daysmerge-ort: remove debugging crudElijah Newren1-1/+1
While developing commit a16e8efe5c2b (merge-ort: fix merge.directoryRenames=false, 2025-03-13), I was testing things out and had an extra condition on one of the if-blocks that I occasionally swapped between '&& 0' and '&& 1' to see the effects of the changes. I forgot to remove it before submitting and it wasn't caught in review. Remove it now. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
14 dayst6429: update comment to mention correct toolElijah Newren1-8/+7
A comment at the top of t6429 mentions why the test doesn't exercise git rebase or git cherry-pick. However, it claims that it uses `test-tool fast-rebase`. That was true when the comment was written, but commit f920b0289ba3 (replay: introduce new builtin, 2023-11-24) changed it to use git replay without updating this comment. We could potentially just strike this second comment, since git replay is a bona fide built-in, but perhaps the explanation about why it focuses on git replay is still useful. Update the comment to make it accurate again. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
14 dayswrapper: simplify xmkstemp()René Scharfe1-18/+1
Call xmkstemp_mode() instead of duplicating its error handling code. This switches the implementation from the system's mkstemp(3) to our own git_mkstemp_mode(), which works just as well. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-17blame: make diff algorithm configurableAntonin Delpeuch6-21/+278
The diff algorithm used in 'git-blame(1)' is set to 'myers', without the possibility to change it aside from the `--minimal` option. There has been long-standing interest in changing the default diff algorithm to "histogram", and Git 3.0 was floated as a possible occasion for taking some steps towards that: https://lore.kernel.org/git/xmqqed873vgn.fsf@gitster.g/ As a preparation for this move, it is worth making sure that the diff algorithm is configurable where useful. Make it configurable in the `git-blame(1)` command by introducing the `--diff-algorithm` option and make it honor the `diff.algorithm` config variable. Keep Myers diff as the default. Signed-off-by: Antonin Delpeuch <antonin@delpeuch.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-17xdiff: add 'minimal' to XDF_DIFF_ALGORITHM_MASKAntonin Delpeuch3-3/+1
The XDF_DIFF_ALGORITHM_MASK bit mask only includes bits for the patience and histogram diffs, not for the minimal one. This means that when resetting the diff algorithm to the default one, one needs to separately clear the bit for the minimal diff. There are places in the code that fail to do that: merge-ort.c and builtin/merge-file.c. Add the XDF_NEED_MINIMAL bit to the bit mask, and remove the separate clearing of this bit in the places where it hasn't been forgotten. Signed-off-by: Antonin Delpeuch <antonin@delpeuch.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
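As a sketch of why the mask matters, here are illustrative flag values; the real constants live in xdiff/xdiff.h and use different bit positions. Once the "minimal" bit is part of the mask, a single clear-and-set replaces the algorithm without leaving a stale bit behind.

```c
#include <assert.h>

/* Illustrative values only; not the real xdiff constants. */
#define NEED_MINIMAL   (1u << 0)
#define PATIENCE_DIFF  (1u << 1)
#define HISTOGRAM_DIFF (1u << 2)
#define UNRELATED_FLAG (1u << 3)
#define ALGORITHM_MASK (NEED_MINIMAL | PATIENCE_DIFF | HISTOGRAM_DIFF)

/* Replace whatever algorithm bits are set in `flags` with `algo`. */
static unsigned set_algorithm(unsigned flags, unsigned algo)
{
	assert(!(algo & ~ALGORITHM_MASK));
	return (flags & ~ALGORITHM_MASK) | algo;
}
```

If NEED_MINIMAL were left out of ALGORITHM_MASK, switching from "minimal" to "histogram" would leave both bits set unless every caller remembered to clear the extra bit by hand.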
2025-11-17Git 2.52v2.52.0maintJunio C Hamano2-4/+7
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-17Merge branch 'jc/ci-use-arm64-p4-on-macos'Junio C Hamano1-1/+1
We replaced deprecated macos-13 with macos-14 image in GitHub Actions CI, but we forgot that the image is for arm64. We have been seeing a lot of test failures ever since. Switch to arm64 binary for Perforce tests. * jc/ci-use-arm64-p4-on-macos: Use Perforce arm64 binary on macOS CI jobs
2025-11-16builtin/repo: fix table alignment for UTF-8 charactersJiang Xin2-4/+54
The output table from "git repo structure" is misaligned when displaying UTF-8 characters (e.g., non-ASCII glyphs). E.g.: | 仓库结构 | 值 | | -------------- | ---- | | * 引用 | | | * 计数 | 67 | The previous implementation used simple width formatting with printf() which didn't properly handle multi-byte UTF-8 characters, causing misaligned table columns when displaying repository structure information. This change modifies the stats_table_print_structure function to use strbuf_utf8_align() instead of basic printf width specifiers. This ensures proper column alignment regardless of the character encoding of the content being displayed. Also add test cases for strbuf_utf8_align(), a function newly introduced in "builtin/repo.c". Signed-off-by: Jiang Xin <worldhello.net@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
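To show why byte-based printf() padding misaligns such a table, here is a deliberately simplified sketch. It is not Git's utf8_strwidth() or strbuf_utf8_align(): the helper names are made up, the input is assumed to be valid UTF-8, and only code points in U+2E80..U+9FFF are treated as two columns wide.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified display width of a valid UTF-8 string (see caveats above). */
static int display_width(const char *s)
{
	const unsigned char *p = (const unsigned char *)s;
	int width = 0;

	while (*p) {
		unsigned long cp;
		int extra;

		if (*p < 0x80) {
			cp = *p;
			extra = 0;
		} else if ((*p & 0xE0) == 0xC0) {
			cp = *p & 0x1F;
			extra = 1;
		} else if ((*p & 0xF0) == 0xE0) {
			cp = *p & 0x0F;
			extra = 2;
		} else {
			cp = *p & 0x07;
			extra = 3;
		}
		p++;
		/* Fold in the continuation bytes of this code point. */
		while (extra-- > 0 && (*p & 0xC0) == 0x80)
			cp = (cp << 6) | (*p++ & 0x3F);

		width += (cp >= 0x2E80 && cp <= 0x9FFF) ? 2 : 1;
	}
	return width;
}

/* Print `s`, then pad with spaces up to `columns` display cells. */
static void print_padded(FILE *out, const char *s, int columns)
{
	int pad = columns - display_width(s);

	fputs(s, out);
	while (pad-- > 0)
		fputc(' ', out);
}
```

A two-character CJK string occupies six bytes but four display columns, so a byte-counting width specifier pads it by the wrong amount; padding by display width keeps the column borders lined up.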
2025-11-16t/unit-tests: add UTF-8 width tests for CJK charsJiang Xin3-0/+99
The file "builtin/repo.c" uses utf8_strwidth() to calculate the display width of UTF-8 characters in a table, but the resulting output is still misaligned. Add test cases for both utf8_strwidth and utf8_strnwidth to verify that they correctly compute the display width for UTF-8 characters. Also updated the build configuration in Makefile and meson.build to include the new test suite in the build process. Signed-off-by: Jiang Xin <worldhello.net@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-16Use Perforce arm64 binary on macOS CI jobsJunio C Hamano1-1/+1
The previous step replaced the deprecated macos-13 image with the macos-14 image on GitHub Actions CI. While x86-64 binaries can work there, because macos-14 images are arm64 based (we could replace it with macos-14-large that is x86-64), it makes more sense to use the arm64 binary there. Without this change, we have been getting an unusually high rate of failures from random macOS CI jobs failing to run the t98xx series of tests. Helped-by: Koji Nakamaru <koji.nakamaru@gree.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-16Merge tag 'l10n-2.52.0-v1' of https://github.com/git-l10n/git-poJunio C Hamano10-8985/+13309
l10n-2.52.0-v1 * tag 'l10n-2.52.0-v1' of https://github.com/git-l10n/git-po: l10n: zh_CN: updated translation for 2.52 l10n: uk: add 2.52 translation l10n: zh_TW.po: update Git 2.52 translation l10n: Updated translation for vi-2.52 l10n: tr: Update Turkish translations l10n: po-id for 2.52 l10n: ga.po: Update Irish translation for Git 2.52 l10n: bg.po: Updated Bulgarian translation (6065t) l10n: fr: version 2.52 l10n: sv.po: Update Swedish translation
2025-11-16l10n: zh_CN: updated translation for 2.52Teng Long1-297/+1401
Reviewed-by: 依云 <lilydjwg@gmail.com> Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
2025-11-15read-cache: drop submodule check from add_to_cache()Jeff King3-4/+2
In add_to_cache(), we treat any directories as submodules, and complain if we can't resolve their HEAD. This call to resolve_gitlink_ref() was added by f937bc2f86 (add: error appropriately on repository with no commits, 2019-04-09), with the goal of improving the error message for empty repositories. But we already resolve the submodule HEAD in index_path(), which is where we find the actual oid we're going to use. Resolving it again here introduces some downsides: 1. It's more work, since we have to open up the submodule repository's files twice. 2. There are call paths that get to index_path() without going through add_to_cache(). For instance, we'd want a similar informative message if "git diff empty" finds that it can't resolve the submodule's HEAD. (In theory we can also get there through update-index, but AFAICT it refuses to consider directories as submodules at all, and just complains about them). 3. The resolution in index_path() catches more errors that we don't handle here. In particular, it will validate that the object format for the submodule matches that of the superproject. This isn't a bug, since our call in add_to_cache() throws away the oid it gets without looking at it. But it certainly caused confusion for me when looking at where the object-format check should go. So instead of resolving the submodule HEAD in add_to_cache(), let's just teach the call in index_path() to actually produce an error message (which it already does for other cases). That's probably what f937bc2f86 should have done in the first place, and it gives us a single point of resolution when adding a submodule to the index. The resulting output is slightly more verbose, as we propagate the error up the call stack, but I think that's OK (and again, matches many other errors we get when indexing fails). I've left the text of the error message as-is, though it is perhaps overly specific. 
There are many reasons that resolving the submodule HEAD might fail, though outside of corruption or system errors it is probably most likely that the submodule HEAD is simply on an unborn branch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-16Merge branch '2.52-uk' of github.com:arkid15r/git-ukrainian-l10nJiang Xin1-231/+1161
* '2.52-uk' of github.com:arkid15r/git-ukrainian-l10n: l10n: uk: add 2.52 translation
2025-11-15object-file: disallow adding submodules of different hash algobrian m. carlson3-1/+55
The design of the hash algorithm transition plan is that objects stored must be entirely in one algorithm since we lack any way to indicate a mix of algorithms. This also includes submodules, but we have traditionally not enforced this, which leads to various problems when trying to clone or check out the submodule from the remote. Since this cannot work in the general case, restrict adding a submodule of a different algorithm to the index. Add tests for git add and git submodule add that these are rejected. Note that we cannot check this in git fsck because the malformed submodule is stored in the tree as an object ID which is either truncated (when a SHA-256 submodule is added to a SHA-1 repository) or padded with zeros (when a SHA-1 submodule is added to a SHA-256 repository). We cannot detect even the latter case because someone could have an actual submodule that actually ends in 24 zeros, which would be a false positive. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
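The "24 zeros" figure is the difference between the two hex lengths (64 for SHA-256, 40 for SHA-1). The following sketch, with made-up names, shows the check one might be tempted to write in fsck and why it is only a heuristic: it cannot tell a padded SHA-1 ID from a genuine SHA-256 ID that happens to end in zeros.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHA1_HEXSZ 40
#define SHA256_HEXSZ 64

/*
 * Returns 1 if a 64-digit hex object ID ends in 24 '0' digits, i.e. it
 * *could* be a zero-padded SHA-1 ID. A real SHA-256 ID may also match,
 * so this cannot be used to reject anything reliably.
 */
static int looks_zero_padded(const char *hex)
{
	size_t i;

	assert(strlen(hex) == SHA256_HEXSZ);
	for (i = SHA1_HEXSZ; i < SHA256_HEXSZ; i++)
		if (hex[i] != '0')
			return 0;
	return 1;
}
```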
2025-11-15l10n: uk: add 2.52 translationArkadii Yakovets1-231/+1161
Co-authored-by: Kate Golovanova <kate@kgthreads.com> Signed-off-by: Arkadii Yakovets <ark@cho.red> Signed-off-by: Kate Golovanova <kate@kgthreads.com>
2025-11-15Merge branch 'vi-2.52' of github.com:Nekosha/git-poJiang Xin1-242/+1140
* 'vi-2.52' of github.com:Nekosha/git-po: l10n: Updated translation for vi-2.52
2025-11-15Merge branch 'l10n/zh-TW/git-2-52' of github.com:l10n-tw/git-poJiang Xin1-415/+1568
* 'l10n/zh-TW/git-2-52' of github.com:l10n-tw/git-po: l10n: zh_TW.po: update Git 2.52 translation
2025-11-15Merge branch 'po-id' of github.com:bagasme/git-poJiang Xin1-283/+1420
* 'po-id' of github.com:bagasme/git-po: l10n: po-id for 2.52
2025-11-15Merge branch 'master' of github.com:alshopov/git-poJiang Xin1-230/+1177
* 'master' of github.com:alshopov/git-po: l10n: bg.po: Updated Bulgarian translation (6065t)
2025-11-15Merge branch 'fr_v2.52' of github.com:jnavila/gitJiang Xin1-228/+1208
* 'fr_v2.52' of github.com:jnavila/git: l10n: fr: version 2.52
2025-11-15Merge branch 'l10n-ga-2.52' of github.com:aindriu80/git-poJiang Xin1-6603/+1938
* 'l10n-ga-2.52' of github.com:aindriu80/git-po: l10n: ga.po: Update Irish translation for Git 2.52
2025-11-15Merge branch 'master' of github.com:nafmo/git-l10n-svJiang Xin1-235/+1151
* 'master' of github.com:nafmo/git-l10n-sv: l10n: sv.po: Update Swedish translation
2025-11-15l10n: zh_TW.po: update Git 2.52 translationYi-Jyun Pan1-415/+1568
Reviewed-by: hms5232 <hms5232@hhming.moe> Co-authored-by: Lumynous <lumynou5.tw@gmail.com> Signed-off-by: Yi-Jyun Pan <pan93412@gmail.com>
2025-11-15l10n: Updated translation for vi-2.52Vũ Tiến Hưng1-242/+1140
Signed-off-by: Vũ Tiến Hưng <newcomerminecraft@gmail.com>
2025-11-15l10n: tr: Update Turkish translationsEmir SARI1-221/+1145
Signed-off-by: Emir SARI <emir_sari@icloud.com>
2025-11-14doc: commit: link to git-status(1) on all format optionsKristoffer Haugsbakk1-4/+6
`--branch` and `--long` refer to git-status(1) options but they don’t tell us what `short-format` and `long-format` are, respectively. And `--null` mentions “status” but does not link to the command. Refer to git-config(1) on `--branch` like `--short` does. `long-format` is the git-status(1) output. So we can just say that directly. Replace “status” with a `linkgit` on `--null`. Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-14osxkeychain: avoid incorrectly skipping store operationKoji Nakamaru3-30/+132
git-credential-osxkeychain skips storing a credential if its "get" action sets "state[]=osxkeychain:seen=1". This behavior was introduced in e1ab45b2 (osxkeychain: state to skip unnecessary store operations, 2024-05-15), which appeared in v2.46. However, this state[] persists even if a credential returned by "git-credential-osxkeychain get" is invalid and a subsequent helper's "get" operation returns a valid credential. Another subsequent helper (such as [1]) may expect git-credential-osxkeychain to store the valid credential, but the "store" operation is incorrectly skipped because it only checks "state[]=osxkeychain:seen=1". To solve this issue, "state[]=osxkeychain:seen" needs to contain enough information to identify whether the current "store" input matches the output from the previous "get" operation (and not a credential from another helper). Set "state[]=osxkeychain:seen" to a value encoding the credential output by "get", and compare it with a value encoding the credential input by "store". [1]: https://github.com/hickford/git-credential-oauth Reported-by: Petter Sælen <petter@saelen.eu> Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Koji Nakamaru <koji.nakamaru@gree.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-14attr: enable incomplete-line whitespace error for this projectJunio C Hamano1-3/+3
Now that "git diff --check" and "git apply --whitespace=warn/fix" have learned that an incomplete line is a whitespace error, enable the check for this project to prevent patches from adding new incomplete lines to our source, in both code and documentation files. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-13RelNotes: fix typo in release notes for 2.52.0Taylor Blau1-1/+1
Introduced via aea86cf00f (The nineteenth batch, 2025-10-14). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-13l10n: po-id for 2.52Bagas Sanjaya1-283/+1420
Update following components: - add-patch.c - builtin/bisect.c - builtin/describe.c - builtin/fast-export.c - builtin/fast-import.c - builtin/fetch.c - builtin/for-each-ref.c - builtin/gc.c - builtin/log.c - builtin/pack-refs.c - builtin/range-diff.c - builtin/reflog.c - builtin/refs.c - builtin/remote.c - builtin/repo.c - builtin/sparse-checkout.c - command-list.h - config.c - diff-lib.c - diff.c - gpg-interface.c - midx-write.c - promisor-remote.c - range-diff.c - refs.c - refs/files-backend.c - refs/reftable-backend.c - remote.c - usage.c - git-send-email.perl Translate following new components: - builtin/last-modified.c - http.h Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
2025-11-12diff: highlight and error out on incomplete linesJunio C Hamano2-6/+90
Teach "git diff" to highlight "\ No newline at end of file" message as a whitespace error when incomplete-line whitespace error class is in effect. Thanks to the previous refactoring of complete rewrite code path, we can do this at a single place. Unlike whitespace errors in the payload where we need to annotate in line, possibly using colors, the line that has whitespace problems, we have a dedicated line already that can serve as the error message, so paint it as a whitespace error message. Also teach "git diff --check" to notice incomplete lines as whitespace errors and report when incomplete-line whitespace error class is in effect. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-12apply: check and fix incomplete linesJunio C Hamano3-1/+213
The final line of a file that lacks the terminating newline at its end is called an incomplete line. In general they are frowned upon for many reasons (imagine concatenating two files with "cat A B" and what happens when A ends in an incomplete line, for example), and text-oriented tools often mishandle such a line. Implement checks in "git apply" for incomplete lines, which are off by default for backward compatibility's sake, so that "git apply --whitespace={fix,warn,error}" can notice, warn against, and fix them. As one of the new tests shows, if you modify contents on an incomplete line in the original and leave the resulting line incomplete, it is still considered a whitespace error, the reasoning being that "you'd better fix it while at it if you are making a change on an incomplete line anyway", which may be controversial. Signed-off-by: Junio C Hamano <gitster@pobox.com>
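The definition at the top of this message translates into a one-line test. This is a sketch with an illustrative name, not code from apply.c:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 1 if the buffer is non-empty and its last byte is not '\n',
 * i.e. its final line is "incomplete". An empty buffer has no lines at
 * all and is not considered incomplete here.
 */
static int ends_in_incomplete_line(const char *buf, size_t len)
{
	assert(buf || !len);
	return len > 0 && buf[len - 1] != '\n';
}
```

This is also why "cat A B" misbehaves when A is incomplete: the first line of B is glued onto the last line of A.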
2025-11-12whitespace: allocate a few more bits and define WS_INCOMPLETE_LINEJunio C Hamano5-12/+21
Reserve a few more bits in the diff flags word to be used for future whitespace rules. Add WS_INCOMPLETE_LINE without implementing the behaviour (yet). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-12apply: revamp the parsing of incomplete linesJunio C Hamano1-21/+49
A patch file represents the incomplete line at the end of the file with two lines, one that is the usual "context" with " " as the first letter, "added" with "+" as the first letter, or "removed" with "-" as the first letter that shows the content of the line, plus an extra "\ No newline at the end of file" line that comes immediately after it. Ever since the apply machinery was written, the "git apply" machinery parses "\ No newline at the end of file" line independently, without even knowing what line the incomplete-ness applies to, simply because it does not even remember what the previous line was. This poses a problem if we want to check and warn on an incomplete line. Revamp the code that parses a fragment, to actually drop the '\n' at the end of the incoming patch file that terminates a line, so that check_whitespace() calls made from the code path actually see an incomplete line as incomplete. Note that the result of this parsing is not directly used by the code path that applies the patch. apply_one_fragment() function already checks if each line of the patch text it handles is followed by a line that begins with a backslash to drop the newline at the end of the current line it is looking at. In a sense, this patch harmonizes the behaviour of the parsing side to what is already done in the application side. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-12diff: update the way rewrite diff handles incomplete linesJunio C Hamano1-15/+22
The diff_symbol based output framework uses one DIFF_SYMBOL_* enum value per the kind of output lines of "git diff", which corresponds to one output line from the xdiff machinery used internally. Most notably, DIFF_SYMBOL_PLUS and DIFF_SYMBOL_MINUS that correspond to "+" and "-" lines are designed to always take a complete line, even if the output from xdiff machinery may produce "\ No newline at the end of file" immediately after them. But this is not true in the rewrite-diff codepath, which completely bypasses the xdiff machinery. Since the code path feeds the bytes directly from the payload to the output routines, the output layer has to deal with an incomplete line with DIFF_SYMBOL_PLUS and DIFF_SYMBOL_MINUS, which never would see an incomplete line in the normal code paths. This lack of final newline is compensated by an ugly hack for a fabricated DIFF_SYMBOL_NO_LF_EOF token to inject an extra newline to the output to simulate output coming from the xdiff machinery. Revamp the way the complete-rewrite code path feeds the lines to the output layer by treating the last line of the pre/post image when it is an incomplete line specially. This lets us remove the DIFF_SYMBOL_NO_LF_EOF hack and use the usual DIFF_SYMBOL_CONTEXT_INCOMPLETE code path, which will later learn how to handle whitespace errors. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-12diff: call emit_callback ecbdata everywhereJunio C Hamano1-6/+6
Everybody else, except for emit_rewrite_lines(), calls the emit_callback data ecbdata. Make sure we call the same thing by the same name for consistency. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-12diff: refactor output of incomplete lineJunio C Hamano1-2/+12
Create a helper function that reacts to "\ No newline at the end of file" in preparation for unifying the incomplete line handling in the code path that handles xdiff output and the code path that bypasses xdiff and produces a complete-rewrite patch. Currently the output from the DIFF_SYMBOL_CONTEXT_INCOMPLETE case still (ab)uses the same code as what is used for context lines, but that would change in a later step where we introduce support to treat an incomplete line as a whitespace error. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-12diff: keep track of the type of the last line seenJunio C Hamano1-0/+11
The "\ No newline at the end of the file" can come after any of the "-" (deleted preimage line), " " (unchanged line), or "+" (added postimage line). In later steps in this series, we will start treating a change that makes a file end in an incomplete line as a whitespace error, and we would need to know what the previous line was when we react to "\ No newline" in the diff output. If the previous line was a context (i.e., unchanged) line, the file lacked the final newline before the change, and the change did not touch that line and left it still incomplete, so we do not want to warn in such a case. Teach fn_out_consume() function to keep track of what the previous line was, and prepare an otherwise empty switch statement to let us react differently to "\ No newline" based on that. Note that there is an existing curiosity (read: likely to be a bug) in the code that increments line number in the preimage file every time it sees a line with "\ No newline" on it, regardless of what the previous line was. I left it as-is, because it does not affect the main theme of this series, and more importantly, I do not think it matters, as these numbers are used only to compare them with blank_at_eof_in_{pre,post}image to issue a warning when we see more empty lines were added at the end, but by definition, after we see "\ No newline at the end of the file" for an added line, we will not see an added line for the file. An independent audit to ensure that this curious increment can be safely removed would make a good #leftoverbits clean-up (we may even find some code that decrements this counter or over-increments the other quantity this counter is compared with that compensates the effect of this curious increment that hides a bug, in which case we may also need to remove them). Signed-off-by: Junio C Hamano <gitster@pobox.com>
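The bookkeeping can be sketched as a tiny state machine. The types and names below are hypothetical and are not those used in fn_out_consume(). The choice to count only the case where the previous line was an added line is an assumption made for this illustration; the message itself only states that a preceding context line should not trigger a warning.

```c
#include <assert.h>

enum line_kind { LINE_NONE, LINE_CONTEXT, LINE_ADDED, LINE_REMOVED };

struct diff_state {
	enum line_kind last;  /* kind of the most recent content line */
	int incomplete_added; /* "\ No newline" seen right after a '+' line */
};

/* Feed one line of unified-diff output into the state machine. */
static void consume_line(struct diff_state *st, const char *line)
{
	switch (line[0]) {
	case ' ':
		st->last = LINE_CONTEXT;
		break;
	case '+':
		st->last = LINE_ADDED;
		break;
	case '-':
		st->last = LINE_REMOVED;
		break;
	case '\\':
		/* React differently depending on what came just before. */
		if (st->last == LINE_ADDED)
			st->incomplete_added++;
		break;
	default:
		st->last = LINE_NONE;
		break;
	}
}
```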
2025-11-12diff: correct suppress_blank_empty hackJunio C Hamano1-16/+11
The suppress-blank-empty feature abused the CONTEXT_INCOMPLETE symbol that was meant to be used only for the "\ No newline at the end of file" code path.

The intent of the feature was to take a context line we receive from the xdiff machinery (which always uses ' ' for context lines, even an empty one) and spit it out as a truly empty line.

Perform the conversion locally, at the place where a line from xdiff that begins with ' ' is handled for output. Many checks before control reaches that place look at the first byte of the diff output line to see if it is a context line, and having each of them also check for '\n' and treat it as a special case is error prone.

In order to catch similar hacks in the future, make sure the code path meant for the "\ No newline" case checks that the first byte is indeed a backslash.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
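A rough sketch of the two behaviours described in this message, written in Python for illustration only. The function names are hypothetical; the actual change lives in C in diff.c and may be structured differently.

```python
# Hypothetical sketch of the idea, not the code in diff.c.

def emit_context_line(line, suppress_blank_empty):
    """Handle one context line as produced by xdiff.

    xdiff always prefixes context lines with ' ', even empty ones, so
    the conversion to a truly empty line happens only here, at output.
    """
    if not line.startswith(" "):
        raise ValueError("not a context line")
    if suppress_blank_empty and line == " \n":
        return "\n"
    return line


def emit_incomplete_marker(line):
    """Handle the "\\ No newline" marker; reject anything else so that
    other uses of this code path are caught early."""
    if not line.startswith("\\"):
        raise ValueError("incomplete-line path reached without a backslash")
    return line
```

With the conversion confined to `emit_context_line()`, every earlier check can keep testing only for a leading ' ' to recognize a context line.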
2025-11-12diff: emit_line_ws_markup() if/else style fixJunio C Hamano1-4/+4
Apply the simple rule: if you need {} in one arm of the if/else if/else... cascade, have {} in all of them. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-12whitespace: correct bit assignment commentsJunio C Hamano3-16/+22
A comment in diff.c claimed that bits up to 12th (counting from 0th) are whitespace rules, and 13th thru 15th are for new/old/context, but it turns out it was miscounting. Correct them, and clarify where the whitespace rule bits come from in the comment. Extend bit assignment comments to cover bits used for color-moved, which weren't described. Also update the way these bit constants are defined to use (1 << N) notation, instead of octal constants, as it tends to make it easier to notice a breakage like this. Sprinkle a few blank lines between logically distinct groups of CPP macro definitions to make them easier to read. Signed-off-by: Junio C Hamano <gitster@pobox.com>
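To illustrate why the (1 << N) notation makes a miscount easier to spot than octal constants, here is a small Python sketch. The bit layout and constant names below are assumptions made up for this example; they are not the actual values used in Git's headers.

```python
# Hypothetical layout for illustration; not the values in Git's source.
WS_RULE_BITS = 12                        # assume rule bits occupy bits 0..11
WS_RULE_MASK = (1 << WS_RULE_BITS) - 1

LINE_CONTEXT = 1 << 12                   # the bit number is visible at a glance
LINE_OLD = 1 << 13
LINE_NEW = 1 << 14


def bit_positions(mask):
    """Return the positions (counting from 0) of the bits set in mask."""
    return [n for n in range(mask.bit_length()) if (mask >> n) & 1]
```

The octal literal `0o40000` denotes the same value as `1 << 14`, but a reader has to count digits to learn which bit it is, whereas `bit_positions(1 << 14)` and the shift itself both say 14 directly.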
2025-11-12doc: add an explanation of Git's data modelJulia Evans4-2/+311
Git very often uses the terms "object", "reference", or "index" in its documentation. However, it's hard to find a clear explanation of these terms and how they relate to each other in the documentation. The closest candidates currently are:

1. `gitglossary`. This makes a good effort, but it's an alphabetically ordered dictionary and a dictionary is not a good way to learn concepts. You have to jump around too much and it's not possible to present the concepts in the order that they should be explained.

2. `gitcore-tutorial`. This explains how to use the "core" Git commands. This is a nice document to have, but it's not necessary to learn how `update-index` works to understand Git's data model, and we should not be requiring users to learn how to use the "plumbing" commands if they want to learn what the term "index" or "object" means.

3. `gitrepository-layout`. This is a great resource, but it includes a lot of information about configuration and internal implementation details which are not related to the data model. It also does not explain how commits work.

The result of this is that Git users (even users who have been using Git for 15+ years) struggle to read the documentation because they don't know what the core terms mean, and it's not possible to add links to help them learn more.

Add an explanation of Git's data model. Some choices I've made in deciding what "core data model" means:

1. Omit pseudorefs like `FETCH_HEAD`, because it's not clear to me if those are intended to be user facing or if they're more like internal implementation details.

2. Don't talk about submodules other than by mentioning how they relate to trees. This is because Git has a lot of special features, and explaining how they all work exhaustively could quickly go down a rabbit hole which would make this document less useful for understanding Git's core behaviour.

3. Don't discuss the structure of a commit message (first line, trailers etc).

4. Don't mention configuration.

5. Don't mention the `.git` directory, to avoid getting too much into implementation details.

Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-12Merge branch 'tc/last-modified-active-paths-optimization'Junio C Hamano3-16/+237
"git last-modified" was optimized by narrowing the set of paths to follow as it dug deeper in the history. * tc/last-modified-active-paths-optimization: last-modified: implement faster algorithm
2025-11-12attr: avoid recursion when expanding attribute macrosJeff King2-16/+54
Given a set of attribute macros like:

    [attr]a1 a2
    [attr]a2 a3
    ...
    [attr]a300000 -text
    file a1

expanding the attributes for "file" requires expanding "a1" to "a2", "a2" to "a3", and so on until hitting a non-macro expansion ("-text", in this case). We implement this via recursion: fill_one() calls macroexpand_one(), which then recurses back to fill_one(). As a result, very deep macro chains like the one above can run out of stack space and cause us to segfault.

The required stack space is fairly small; I needed on the order of 200,000 entries to get a segfault on Linux. So it's unlikely anybody would hit this accidentally, leaving only malicious inputs. There you can easily construct a repo which will segfault on clone (we look at attributes during the checkout step, but you'd see the same trying to do other operations, like diff in a bare repo). It's mostly harmless, since anybody constructing such a repo is only preventing victims from cloning their evil garbage, but it could be a nuisance for hosting sites.

One option to prevent this is to limit the depth of recursion we'll allow. This is conceptually easy to implement, but it raises other questions: what should the limit be, and do we need a configuration knob for it?

The recursion here is simple enough that we can avoid those questions by just converting it to iteration instead. Rather than iterate over the states of a match_attr in fill_one(), we'll put them all in a queue, and the expansion of each can add to the queue rather than recursing. Note that this is a LIFO queue in order to keep the same depth-first order we did with the recursive implementation. I've avoided using the word "stack" in the code because the term is already heavily used to refer to the stack of .gitattributes files that matches the tree structure of the repository.

The test uses a limited stack size so we can trigger the problem with a much smaller input than the one shown above. The value here (3000) is enough to trigger the issue on my x86_64 Linux machine.

Reported-by: Ben Stav <benstav@miggo.io>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
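The recursion-to-iteration conversion described above can be sketched as below. This is a simplified Python model, not the C code in attr.c: the `macros` dictionary, the function name, and the `seen` guard against cycles are all assumptions made for this illustration.

```python
# Simplified model of depth-first macro expansion without recursion.
# Hypothetical names; the real code works on match_attr states in C.

def expand_attributes(initial, macros):
    """Expand attribute macros in depth-first order.

    initial: attribute names listed for a path, in order
    macros:  dict mapping a macro name to the list it expands to
    """
    result = []
    seen = set()                       # assumption: guard against cycles
    pending = list(reversed(initial))  # LIFO queue; pop() takes the last item
    while pending:
        attr = pending.pop()
        if attr in seen:
            continue
        seen.add(attr)
        result.append(attr)
        # Push the expansion reversed so its first item is handled next,
        # which preserves the order a recursive walk would produce.
        pending.extend(reversed(macros.get(attr, [])))
    return result
```

Because the pending work lives in a heap-allocated list rather than in call frames, a chain far deeper than the interpreter's (or, in C, the thread's) stack limit expands without trouble.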
2025-11-12Git 2.52-rc2v2.52.0-rc2Junio C Hamano2-1/+9
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-12Merge branch 'dk/make-git-contacts-executable'Junio C Hamano1-1/+1
Building "git contacts" script (in contrib/) left the resulting file unexecutable, which has been corrected. * dk/make-git-contacts-executable: perl: also mark git-contacts executable
2025-11-12Merge branch 'dk/meson-html-dir'Junio C Hamano7-13/+20
The build procedure based on meson learned to allow builders to specify the directory to install HTML documents. * dk/meson-html-dir: meson: make GIT_HTML_PATH configurable
2025-11-12Merge branch 'tu/credential-wincred-makefile-update'Junio C Hamano1-8/+10
Build procedure for Wincred credential helper has been updated. * tu/credential-wincred-makefile-update: wincred: align Makefile with other Makefiles in contrib
2025-11-11.gitattributes: remove misspelled no-op whitespace attributeJunio C Hamano1-1/+1
Ever since 14f9e128 (Define the project whitespace policy, 2008-02-10) added the whitespace rules to .gitattributes, we spelled the most general rule like so:

    * whitespace=!indent,trail,space

in the top-level .gitattributes file. The intent of this line was described in the commit log message:

    - Unless otherwise specified, indent with SP that could be replaced with HT are not "bad". But SP before HT in the indent is "bad", and trailing whitespaces are "bad".

It clearly wanted to disable indent-with-non-tab, so !indent is most likely a misspelt form of '-indent'. Because indent-with-non-tab has never been enabled by default, by luck this was not causing any ill effect.

We could either remove "!indent", or spell it "-indent". The immediate effect would be the same. It would only start to make a difference when/if we enable indent-with-non-tab by default in future versions of Git.

Let's take the former option to remove "!indent" from the list. We would feel the effect first-hand ourselves before anybody else if we ever decide to change the built-in default whitespace rules, which would be hidden from us if we decide to rewrite it to "-indent" instead.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-10diff: disable rename detection with --quietRené Scharfe2-0/+12
Detecting renames and copies improves diff's output. This effort is wasted if we don't show any. Disable detection in that case. This actually fixes the error code when using the options --cached, --find-copies-harder, --no-ext-diff and --quiet together: run_diff_index() indirectly calls diff-lib.c::show_modified(), which queues even non-modified entries using diff_change() because we need them for copy detection. diff_change() sets flags.has_changes, though, which causes diff_can_quit_early() to declare we're done after seeing only the very first entry -- way too soon. Using --cached, --find-copies-harder and --quiet together without --no-ext-diff was not affected even before, as it causes the flag flags.diff_from_contents to be set, which disables the optimization in a different way. Reported-by: D. Ben Knoble <ben.knoble@gmail.com> Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-10maintenance: add 'is-needed' subcommandKarthik Nayak3-17/+113
The 'git-maintenance(1)' command provides tooling to run maintenance tasks over Git repositories. The 'run' subcommand, as the name suggests, runs the maintenance tasks. When used with the '--auto' flag, it uses heuristics to determine if the required thresholds are met for running said maintenance tasks.

There is, however, no insight into these heuristics, because the checks are tied to the execution of the tasks.

Add a new 'is-needed' subcommand to 'git-maintenance(1)' which allows users to check whether maintenance needs to be run without actually running it. Ideally it should be used with the '--auto' flag, which lets users check whether the required thresholds are met. The subcommand also supports the '--task' flag, which can be used to check specific maintenance tasks.

While adding the respective tests in 't/t7900-maintenance.sh', remove a duplicate of the test: 'worktree-prune task with --auto honors maintenance.worktree-prune.auto'.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-10maintenance: add checking logic in `pack_refs_condition()`Karthik Nayak2-10/+21
The 'git-maintenance(1)' command supports an '--auto' flag. With this flag, maintenance tasks are run only if certain thresholds are met. The heuristic is defined on a task level, wherein each task defines an 'auto_condition', which states if the task should be run.

The 'pack-refs' task is hard-coded to return 1 because:

1. There was never a way to check if the reference backend needs to be optimized without actually performing the optimization.

2. We can pass in the '--auto' flag to 'git-pack-refs(1)' which would optimize based on heuristics.

The previous commit added a `refs_optimize_required()` function, which can be used to check if a reference backend requires optimization. Use this within `pack_refs_condition()`.

This allows us to add a 'git maintenance is-needed' subcommand which can notify the user if maintenance is needed without actually performing the optimization. Without this change, the reference backend would always state that optimization is needed.

Since we import 'revision.h', we need to remove the definition for 'SEEN' which is duplicated in the included header.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-10refs: add a `optimize_required` field to `struct ref_storage_be`Karthik Nayak7-0/+82
To allow users of the refs namespace to check if the reference backend requires optimization, add a new `optimize_required` field to `struct ref_storage_be`. This field is of type `optimize_required_fn`, which is also introduced in this commit.

Modify the debug, files, packed and reftable backends to implement this field. A following commit will expose this via 'git pack-refs' and 'git refs optimize'.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-10reftable/stack: add function to check if optimization is requiredKarthik Nayak3-7/+58
The reftable backend performs auto-compaction as part of its regular flow, which is required to keep the number of tables in a stack in check. This allows it to stay optimized.

Compaction can also be triggered voluntarily by the user via the 'git pack-refs' or the 'git refs optimize' command. However, currently there is no way for the user to check if optimization is required without actually performing it.

Extract the heuristics logic from 'reftable_stack_auto_compact()' into an internal function 'update_segment_if_compaction_required()'. Then use this to add and expose `reftable_stack_compaction_required()`, which will allow users to check if the reftable backend can be optimized.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
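The refactoring pattern described here, one shared heuristic that both the "do it" and the "is it needed" entry points consult, can be sketched as follows. The heuristic in `suggest_segment()` is a simplified stand-in written for this example (it asks that table sizes roughly form a geometric sequence); it is an assumption, not a copy of the reftable library's logic, and all names are hypothetical.

```python
# Hypothetical sketch: a shared heuristic with "check" and "act" callers.

def suggest_segment(sizes, factor=2):
    """Return (start, end) of the tables to merge; start == end means
    nothing needs compacting. Stand-in heuristic for illustration."""
    start = end = total = 0
    # Scan from the newest table for a neighbour that is not at least
    # `factor` times larger than the table after it.
    for i in range(len(sizes) - 1, 0, -1):
        if sizes[i - 1] < sizes[i] * factor:
            start, end, total = i, i + 1, sizes[i]
            break
    # Grow the segment towards older tables while merging still pays off.
    while start > 0 and sizes[start - 1] < total * factor:
        start -= 1
        total += sizes[start]
    return start, end


def compaction_required(sizes):
    """Check only: report whether the heuristic found a segment."""
    start, end = suggest_segment(sizes)
    return end > start


def auto_compact(sizes):
    """Act: merge the suggested segment into one table, if any."""
    start, end = suggest_segment(sizes)
    if end <= start:
        return list(sizes)
    return sizes[:start] + [sum(sizes[start:end])] + sizes[end:]
```

Because both callers go through `suggest_segment()`, the answer given by the check can never drift from what the compaction itself would do.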
2025-11-10reftable/stack: return stack segments directlyKarthik Nayak1-11/+12
The `stack_table_sizes_for_compaction()` function returns individual sizes of each reftable table. This function is only called by `reftable_stack_auto_compact()` to decide which tables need to be compacted, if any. Modify the function to directly return the segments, which avoids the extra step of receiving the sizes only to pass it to `suggest_compaction_segment()`. A future commit will also add functionality for checking whether auto-compaction is necessary without performing it. This change allows code re-usability in that context. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-10l10n: ga.po: Update Irish translation for Git 2.52Aindriú Mac Giolla Eoin1-6603/+1938
Refreshes the Irish translation for Git 2.52, including new strings and consistency improvements. Verified with `git-po-helper check`. Signed-off-by: Aindriú Mac Giolla Eoin <aindriu80@gmail.com>
2025-11-09l10n: bg.po: Updated Bulgarian translation (6065t)Alexander Shopov1-230/+1177
Signed-off-by: Alexander Shopov <ash@kambanaria.org>
2025-11-09l10n: fr: version 2.52Jean-Noël Avila1-228/+1208
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
2025-11-07l10n: sv.po: Update Swedish translationPeter Krefting1-235/+1151
Signed-off-by: Peter Krefting <peter@softwolves.pp.se>
2025-11-06Merge branch 'dk/parseopt-optional-filename-fixes'Junio C Hamano3-7/+5
A recently added configuration variable and command line option syntax ":(optional)" for values that are of filename type inconsistently behaved on an empty file (configuration took it happily, while the command line option pretended as if it did not exist), which has been corrected. * dk/parseopt-optional-filename-fixes: parseopt: remove unreachable code parseopt: restore const qualifier to parsed filename config: use boolean type for a simple flag parseopt: use boolean type for a simple flag doc: clarify command equivalence comment parseopt: fix :(optional) at command line to only ignore missing files
2025-11-06Merge branch 'cc/fast-import-export-i18n-cleanup'Junio C Hamano5-194/+195
Messages from fast-import/export are now marked for i18n. * cc/fast-import-export-i18n-cleanup: gpg-interface: mark a string for translation fast-import: mark strings for translation fast-export: mark strings for translation gpg-interface: use left shift to define GPG_VERIFY_* gpg-interface: simplify ssh fingerprint parsing
2025-11-06Merge branch 'js/ci-github-actions-update'Junio C Hamano1-10/+10
CI updates. * js/ci-github-actions-update: ci: update {download,upload}-artifact Action versions
2025-11-06Merge branch 'pk/reflog-migrate-message-fix'Junio C Hamano2-2/+2
Message fix. * pk/reflog-migrate-message-fix: refs: add missing space in messages
2025-11-06object: fix performance regression when peeling tagsPatrick Steinhardt5-11/+11
Our Bencher dashboards [1] have recently alerted us about a bunch of performance regressions when writing references, specifically with the reftable backend. There is a 3x regression when writing many refs with preexisting refs in the reftable format, and a 10x regression when migrating refs between backends in either of the formats.

Bisecting the issue lands us at 6ec4c0b45b (refs: don't store peeled object IDs for invalid tags, 2025-10-23). The gist of the commit is that we may end up storing peeled objects in both reftables and packed-refs for corrupted tags, where the claimed tagged object type is different than the actual tagged object type. This will then cause us to create the `struct object *` with a wrong type, as well, and obviously nothing good comes out of that.

The fix for this issue was to introduce a new flag to `peel_object()` that causes us to verify the tagged object's type before writing it into the refdb -- if the tag is corrupt, we skip writing the peeled value. To verify whether the peeled value is correct we have to look up the object type via the ODB and compare the actual type with the claimed type, and that additional object lookup is costly.

This also explains why we see the regression only when writing refs with the reftable backend, but we see the regression with both backends when migrating refs:

- The reftable backend knows to store peeled values in the new table immediately, so it has to try and peel each ref it's about to write to the transaction. So the performance regression is visible for all writes.

- The files backend only stores peeled values when writing the packed-refs file, so it wouldn't hit the performance regression for normal writes. But on ref migrations we know to write all new values into the packed-refs file immediately, and that's why we see the regression for both backends there.

Taking a step back though reveals an oddity in the new verification logic: we not only verify the _tagged_ object's type, but we also verify the type of the tag itself. But this isn't really needed, as we wouldn't hit the bug in such a case anyway, as we only hit the issue with corrupt tags claiming an invalid type for the tagged object.

The consequence of this is that we now started to look up the target object of every single reference we're about to write, regardless of whether it even is a tag or not. And that is of course quite costly.

Fix the issue by only verifying the type of the tagged objects. This means that we of course still have a performance hit for actual tags. But this only happens for writes anyway, and I'd claim it's preferable to not store corrupted data in the refdb than to be fast here. Rename the flag accordingly to clarify that we only verify the tagged object's type.

This fix brings performance back to previous levels:

    Benchmark 1: baseline
      Time (mean ± σ):      46.0 ms ±   0.4 ms    [User: 40.0 ms, System: 5.7 ms]
      Range (min … max):    45.0 ms …  47.1 ms    54 runs

    Benchmark 2: regression
      Time (mean ± σ):     140.2 ms ±   1.3 ms    [User: 77.5 ms, System: 60.5 ms]
      Range (min … max):   138.0 ms … 142.7 ms    20 runs

    Benchmark 3: fix
      Time (mean ± σ):      46.2 ms ±   0.4 ms    [User: 40.2 ms, System: 5.7 ms]
      Range (min … max):    45.0 ms …  47.3 ms    55 runs

    Summary
      update-ref: baseline
        1.00 ± 0.01 times faster than fix
        3.05 ± 0.04 times faster than regression

[1]: https://bencher.dev/perf/git/plots

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
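The difference between verifying every reference target and verifying only tagged objects can be seen in a toy model. The Python sketch below uses an in-memory dictionary as a stand-in for the object database and counts type lookups; the class, the dictionary layout, and the function name are hypothetical and are not Git's `peel_object()` API.

```python
# Toy model of "verify only the tagged object's type"; hypothetical names.

class ToyOdb:
    """In-memory stand-in for an object database that counts the
    (costly) type lookups performed against it."""

    def __init__(self, objects):
        self.objects = objects
        self.type_lookups = 0

    def parse(self, oid):
        return self.objects[oid]

    def actual_type(self, oid):
        self.type_lookups += 1
        return self.objects[oid]["type"]


def peel_verifying_tagged_type(odb, oid):
    """Return the peeled object ID, or None when there is nothing to
    peel or when a tag lies about the type of the object it tags."""
    obj = odb.parse(oid)
    if obj["type"] != "tag":
        return None                       # not a tag: no extra lookup
    while obj["type"] == "tag":
        target = obj["tagged"]
        if odb.actual_type(target) != obj["claimed_type"]:
            return None                   # corrupt tag: store no peeled value
        oid, obj = target, odb.parse(target)
    return oid
```

In this model a reference that points directly at a commit costs no type lookup at all, which mirrors the point made above that only actual tags should pay for the verification.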
2025-11-06Merge branch 'ps/ref-peeled-tags' into ps/ref-peeled-tags-fixesJunio C Hamano67-852/+825
* ps/ref-peeled-tags: t7004: do not chdir around in the main process ref-filter: fix stale parsed objects ref-filter: parse objects on demand ref-filter: detect broken tags when dereferencing them refs: don't store peeled object IDs for invalid tags object: add flag to `peel_object()` to verify object type refs: drop infrastructure to peel via iterators refs: drop `current_ref_iter` hack builtin/show-ref: convert to use `reference_get_peeled_oid()` ref-filter: propagate peeled object ID upload-pack: convert to use `reference_get_peeled_oid()` refs: expose peeled object ID via the iterator refs: refactor reference status flags refs: fully reset `struct ref_iterator::ref` on iteration refs: introduce `.ref` field for the base iterator refs: introduce wrapper struct for `each_ref_fn`
2025-11-06ci: update {download,upload}-artifact Action versionsJohannes Schindelin1-10/+10
Bumps `actions/upload-artifact` from 4 to 5. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/v4...v5) Bumps `actions/download-artifact` from 5 to 6. - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](https://github.com/actions/download-artifact/compare/v5...v6) Originally-authored-by: dependabot[bot] <support@github.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06gitk: add external diff file rename detectionTobias Boesch1-2/+38
If a file is renamed between commits and an external diff is started through gitk on the original or the renamed file name, gitk is unable to open the renamed file in the external diff editor. It fails to fetch the renamed file from git, because it fetches it using its original path rather than the renamed path of the file.

Detect the rename by fetching the renamed file's path and name from git, and open the external diff with both the original and the renamed file instead of no file, regardless of whether the original or the renamed file is selected in gitk.

Signed-off-by: Tobias Boesch <tobias.boesch@miele.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
2025-11-06meson: make GIT_HTML_PATH configurableD. Ben Knoble7-13/+20
Makefile-based builds can configure Git's internal HTML_PATH by defining htmldir, which is useful for packagers that put documentation in different locations. Gentoo, for example, uses version-suffixed directories like ${prefix}/share/doc/git-2.51 and puts the HTML documentation in an 'html' subdirectory of the same. Propagate the same configuration knob to Meson-based builds so that "git --html-path" on such systems can be configured to output the correct directory. Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06perl: also mark git-contacts executableD. Ben Knoble1-1/+1
When installing git-contacts with Meson via -Dcontrib=contacts, the default Perl generation fails to mark it executable. As a result, "git contacts" reports "'contacts' is not a git command." Unlike generate-script.sh, we aren't testing the basename here; so, glob the script name in the case arm to match wherever the input comes from. Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06wincred: align Makefile with other Makefiles in contribThomas Uhle1-8/+10
* Replace $(LOADLIBES) because it has long been deprecated and is used nowhere else in the git project.

* Use $(gitexecdir) instead of $(libexecdir) because config.mak defines $(libexecdir) as $(prefix)/libexec, not as $(prefix)/libexec/git-core.

* Similar to other Makefiles, let the install target rule create $(gitexecdir) to make sure the directory exists before copying the executable, and also let it respect $(DESTDIR).

* Shuffle the lines for the default settings to align them with the other Makefiles in contrib/credential.

* Define .PHONY for all special targets (all, install, clean).

Signed-off-by: Thomas Uhle <thomas.uhle@mailbox.tu-dresden.de>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06doc: clarify server behavior for invalid 'want' lines in HTTP protocolQueen Ediri Jessa1-1/+2
Update the documentation to clearly describe how the server responds when a client sends an invalid or malformed `want` line during the HTTP protocol exchange. The server includes the offending object name in its error message. Signed-off-by: Queen Ediri Jessa <qjessa662@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06gitk: show unescaped file names on 'rename' and 'copy' linesJohannes Sixt1-0/+8
When a file is selected in the file list, the diff window scrolls to the corresponding section. The administrative data needed for this purpose is extracted from the 'rename from', 'rename to', and 'copy to' lines. Escaped file names are unescaped for this purpose. However, the lines shown in the diff window are left in the escaped form. This is not very pleasing. Replace the escaped form by the unescaped form. Add a section to treat the 'copy from' case. Signed-off-by: Johannes Sixt <j6t@kdbg.org>
2025-11-06gitk: fix a 'continue' statement outside a loop to 'return'Johannes Sixt1-1/+1
When 5de460a2cfdd (gitk: Refactor per-line part of getblobdiffline and its support) moved the body of a loop into a separate function, several 'continue' statements were changed to 'return'. But one instance was missed. Fix it now. Signed-off-by: Johannes Sixt <j6t@kdbg.org>
2025-11-05refs: add missing space in messagesPeter Krefting2-2/+2
Signed-off-by: Peter Krefting <peter@softwolves.pp.se> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05Git 2.52-rc1v2.52.0-rc1Junio C Hamano2-1/+8
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05Merge branch 'jc/ci-use-macos-14'Junio C Hamano1-4/+4
The version of macos image used in GitHub CI has been updated to macos-14, as the macos-13 that we have been using got deprecated. * jc/ci-use-macos-14: GitHub CI: macos-13 images are no more
2025-11-05Merge branch 'rz/t0450-bisect-doc-update'Junio C Hamano3-26/+39
The help text and manual page of "git bisect" command have been made consistent with each other. * rz/t0450-bisect-doc-update: bisect: update usage and docs to match each other
2025-11-05replay: add replay.refAction config optionSiddharth Asthana4-4/+79
Add a configuration variable to control the default behavior of git replay for updating references. This allows users who prefer the traditional pipeline output to set it once in their config instead of passing --ref-action=print with every command.

The config variable uses string values that mirror the behavior modes:

* replay.refAction = update (default): atomic ref updates
* replay.refAction = print: output commands for pipeline

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Elijah Newren <newren@gmail.com>
Helped-by: Christian Couder <christian.couder@gmail.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
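How the two sources of the mode might be combined can be sketched as below. This Python function is a hypothetical illustration; it assumes the usual Git convention that a command-line option overrides a configuration value, which this message implies but does not spell out, and it is not the code in builtin/replay.c.

```python
# Hypothetical sketch of mode resolution; assumes CLI overrides config.

VALID_REF_ACTIONS = ("update", "print")


def resolve_ref_action(cli_value=None, config_value=None):
    """Pick the ref action: --ref-action first, then replay.refAction,
    then the built-in default of "update"."""
    sources = (("--ref-action", cli_value), ("replay.refAction", config_value))
    for name, value in sources:
        if value is None:
            continue
        if value not in VALID_REF_ACTIONS:
            raise ValueError(f"invalid value for {name}: {value!r}")
        return value
    return "update"
```

Under this assumption, a user who sets replay.refAction to print can still request a direct update for a single invocation by passing --ref-action=update.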
2025-11-05replay: make atomic ref updates the default behaviorSiddharth Asthana3-40/+199
The git replay command currently outputs update commands that can be piped to update-ref to achieve a rebase, e.g.

    git replay --onto main topic1..topic2 | git update-ref --stdin

This separation had advantages for three special cases:

* it made testing easy (when state isn't modified from one step to the next, you don't need to make temporary branches or have undo commands, or try to track the changes)

* it provided a natural can-it-rebase-cleanly (and what would it rebase to) capability without automatically updating refs, similar to a --dry-run

* it provided a natural low-level tool for the suite of hash-object, mktree, commit-tree, mktag, merge-tree, and update-ref, allowing users to have another building block for experimentation and making new tools

However, it should be noted that all three of these are somewhat special cases; users, whether on the client or server side, would almost certainly find it more ergonomic to simply have the updating of refs be the default.

For server-side operations in particular, the pipeline architecture creates process coordination overhead. Server implementations that need to perform rebases atomically must maintain additional code to:

1. Spawn and manage a pipeline between git-replay and git-update-ref
2. Coordinate stdout/stderr streams across the pipe boundary
3. Handle partial failure states if the pipeline breaks mid-execution
4. Parse and validate the update-ref command output

Change the default behavior to update refs directly, and atomically (at least to the extent supported by the refs backend in use). This eliminates the process coordination overhead for the common case.

For users needing the traditional pipeline workflow, add a new --ref-action=<mode> option that preserves the original behavior:

    git replay --ref-action=print --onto main topic1..topic2 | git update-ref --stdin

The mode can be:

* update (default): Update refs directly using an atomic transaction
* print: Output update-ref commands for pipeline use

Test suite changes:

All existing tests that expected command output now use --ref-action=print to preserve their original behavior. This keeps the tests valid while allowing them to verify that the pipeline workflow still works correctly.

New tests were added to verify:

- Default atomic behavior (no output, refs updated directly)
- Bare repository support (server-side use case)
- Equivalence between traditional pipeline and atomic updates
- Real atomicity using a lock file to verify all-or-nothing guarantee
- Test isolation using test_when_finished to clean up state
- Reflog messages include replay mode and target

A following commit will add a replay.refAction configuration option for users who prefer the traditional pipeline output as their default behavior.

Helped-by: Elijah Newren <newren@gmail.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Christian Couder <christian.couder@gmail.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
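The all-or-nothing guarantee mentioned above can be illustrated with a small model. The Python sketch below parses lines of the form `update <ref> <new> <old>` (the shape accepted by `git update-ref --stdin`) and applies them to a plain dictionary standing in for the ref store; the function names and the dictionary model are assumptions made for this example and do not reflect how Git's ref transactions are implemented.

```python
# Hypothetical model of an atomic ref update over a dict of refs.

def parse_update_commands(text):
    """Parse "update <ref> <new> <old>" lines into (ref, new, old) tuples."""
    commands = []
    for line in text.splitlines():
        verb, ref, new, old = line.split(" ")
        if verb != "update":
            raise ValueError(f"unsupported command: {verb}")
        commands.append((ref, new, old))
    return commands


def apply_atomically(refs, commands):
    """Return a new ref mapping with every update applied, or raise
    without changing anything if any expected old value mismatches."""
    for ref, _new, old in commands:
        if refs.get(ref) != old:
            raise RuntimeError(f"{ref}: expected {old}, found {refs.get(ref)}")
    updated = dict(refs)
    for ref, new, _old in commands:
        updated[ref] = new
    return updated
```

Checking every expected old value before writing any new one is what makes a failure on the second ref leave the first ref untouched in this model.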
2025-11-05replay: use die_for_incompatible_opt2() for option validationSiddharth Asthana1-3/+3
In preparation for adding the --ref-action option, convert option validation to use die_for_incompatible_opt2(). This helper provides standardized error messages for mutually exclusive options. The following commit introduces --ref-action which will be incompatible with certain other options. Using die_for_incompatible_opt2() now means that commit can cleanly add its validation using the same pattern, keeping the validation logic consistent and maintainable. This also aligns git-replay's option handling with how other Git commands manage option conflicts, using the established die_for_incompatible_opt*() helper family. Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04GitHub CI: macos-13 images are no moreJunio C Hamano1-4/+4
As this image was deprecated on Sep 22nd, and will be dropped on Dec 4th, switch these jobs to macos-14 images instead. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04parseopt: remove unreachable codeJunio C Hamano1-2/+0
At this point in the code after running skip_prefix() on the variable and receiving the result in the same variable, the contents of the variable can never be NULL. The function either (1) updates the variable to point at a later part of the string it originally pointed at, or (2) leaves it intact if the string does not have the prefix. (1) will never make the variable NULL, and (2) cannot be the source of NULL, because the variable cannot be NULL before calling skip_prefix(), which would die immediately by dereferencing the NULL pointer in that case. Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
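The reasoning can be checked against a simplified stand-in for the helper. skip_prefix_sketch() and strip_no_prefix() below are hypothetical names, not Git's definitions; they only mirror the behaviour the message describes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified stand-in for a skip_prefix()-style helper (hypothetical;
 * not Git's actual definition). On a match, *out points just past the
 * prefix inside str; otherwise *out is left untouched.
 */
static bool skip_prefix_sketch(const char *str, const char *prefix,
			       const char **out)
{
	size_t len = strlen(prefix);

	/* strncmp() dereferences str, so a NULL str would crash right here. */
	if (strncmp(str, prefix, len))
		return false;
	*out = str + len;
	return true;
}

/* Mirrors the call pattern: the input and the output are the same variable. */
static const char *strip_no_prefix(const char *arg)
{
	skip_prefix_sketch(arg, "no-", &arg);
	return arg;	/* arg itself or a pointer into it, never NULL */
}
```

Both outcomes leave a pointer into the original string, so a NULL check after the call cannot trigger.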
2025-11-04parseopt: restore const qualifier to parsed filenameD. Ben Knoble1-1/+1
This was unintentionally dropped in ccfcaf399f (parseopt: values of pathname type can be prefixed with :(optional), 2025-09-28). Notably, continue dropping the const qualifier when free'ing value; see 4049b9cfc0 (fix const issues with some functions, 2007-10-16) or 83838d5c1b (cast variable in call to free() in builtin/diff.c and submodule.c, 2011-11-06) for more details on why. Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04config: use boolean type for a simple flagD. Ben Knoble1-1/+1
Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04parseopt: use boolean type for a simple flagD. Ben Knoble1-2/+2
Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04doc: clarify command equivalence commentD. Ben Knoble1-1/+1
Documentation of command parsing for :(optional) includes a terse comment; expand it to be clearer to readers. Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04parseopt: fix :(optional) at command line to only ignore missing filesD. Ben Knoble1-1/+1
Unlike the configuration option magic, the parseopt code also ignores empty files: compare implementations from ccfcaf399f (parseopt: values of pathname type can be prefixed with :(optional), 2025-09-28) and 749d6d166d (config: values of pathname type can be prefixed with :(optional), 2025-09-28). Unify the 2 by not ignoring empty files, which is less surprising and the intended semantics from the first patch for config. Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
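The unified semantics can be sketched as a predicate that looks only at whether the path exists. optional_path_is_ignored() is a hypothetical helper written for illustration; it is not the function used by parse-options or config.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: should an ":(optional)" path be skipped?
 * Only a missing file qualifies; an existing but empty file is still used.
 */
static bool optional_path_is_ignored(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return errno == ENOENT || errno == ENOTDIR;
	/* The file exists; its size (even zero) does not matter. */
	return false;
}
```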
2025-11-04A bit more before rc1Junio C Hamano1-0/+25
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04Merge branch 'jk/doc-backslash-in-exclude'Junio C Hamano2-0/+7
The patterns used in the .gitignore files use backslash in the way documented for fnmatch(3); document as such to reduce confusion. * jk/doc-backslash-in-exclude: doc: document backslash in gitignore patterns
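The fnmatch(3) backslash rule referred to above can be tried directly with the POSIX function. This sketch uses plain fnmatch() with no flags; Git ships its own pattern matcher, so take it as an illustration of the documented rule rather than of Git's code path. pattern_matches() is a hypothetical wrapper.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Returns true when name matches pattern under plain fnmatch(3) rules. */
static bool pattern_matches(const char *pattern, const char *name)
{
	/* Without FNM_NOESCAPE, a backslash quotes the following character. */
	return fnmatch(pattern, name, 0) == 0;
}
```

With this, the pattern `foo\*` matches only the literal name `foo*`, while `foo*` keeps matching `foobar`.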
2025-11-04Merge branch 'jk/test-delete-gpgsig-leakfix'Junio C Hamano1-3/+4
Leakfix. * jk/test-delete-gpgsig-leakfix: test-tool: fix leak in delete-gpgsig command
2025-11-04Merge branch 'eb/t1016-hash-transition-fix'Junio C Hamano2-1/+7
Test fix. * eb/t1016-hash-transition-fix: t1016-compatObjectFormat: really freeze time for reproduciblity
2025-11-04Merge branch 'kh/doc-checkout-markup-fix'Junio C Hamano1-2/+2
Doc mark-up fix. * kh/doc-checkout-markup-fix: doc: git-checkout: fix placeholder markup
2025-11-04Merge branch 'xr/ref-debug-remove-on-disk'Junio C Hamano1-0/+9
The "debug" ref-backend was missing a method implementation, which has been corrected. * xr/ref-debug-remove-on-disk: refs: add missing remove_on_disk implementation for debug backend
2025-11-04Merge branch 'qj/doc-my1stcontrib-email-verify'Junio C Hamano1-0/+5
The "MyFirstContribution" tutorial tells the reader how to send out their patches; the section gained a hint to verify the message reached the mailing list. * qj/doc-my1stcontrib-email-verify: MyFirstContribution: add note on confirming patches
2025-11-04Merge branch 'tz/test-prepare-gnupghome'Junio C Hamano1-0/+1
Tests did not set up GNUPGHOME correctly, which is fixed but some flaky tests are exposed in t1016, which needs to be addressed before this topic can move forward. * tz/test-prepare-gnupghome: t/lib-gpg: call prepare_gnupghome() in GPG2 prereq t/lib-gpg: add prepare_gnupghome() to create GNUPGHOME dir
2025-11-04Merge branch 'jt/repo-structure'Junio C Hamano6-6/+542
"git repo structure", a new command. * jt/repo-structure: builtin/repo: add progress meter for structure stats builtin/repo: add keyvalue and nul format for structure stats builtin/repo: add object counts in structure output builtin/repo: introduce structure subcommand ref-filter: export ref_kind_from_refname() ref-filter: allow NULL filter pattern builtin/repo: rename repo_info() to cmd_repo_info()
2025-11-04Merge branch 'tu/credential-install'Junio C Hamano2-2/+12
Contributed credential helpers (obviously in contrib/) now have "cd $there && make install" target. * tu/credential-install: contrib/credential: add install target
2025-11-04Merge branch 'cc/doc-submitting-patches-with-ai'Junio C Hamano1-0/+28
AI guidelines. * cc/doc-submitting-patches-with-ai: SubmittingPatches: add section about AI
2025-11-04Merge branch 'kn/refs-optim-cleanup' into kn/maintenance-is-neededJunio C Hamano11-72/+42
* kn/refs-optim-cleanup: t/pack-refs-tests: move the 'test_done' to callees refs: rename 'pack_refs_opts' to 'refs_optimize_opts' refs: move to using the '.optimize' functions
2025-11-04Merge branch 'ps/ref-peeled-tags' into kn/maintenance-is-neededJunio C Hamano70-852/+1361
* ps/ref-peeled-tags: (23 commits) t7004: do not chdir around in the main process ref-filter: fix stale parsed objects ref-filter: parse objects on demand ref-filter: detect broken tags when dereferencing them refs: don't store peeled object IDs for invalid tags object: add flag to `peel_object()` to verify object type refs: drop infrastructure to peel via iterators refs: drop `current_ref_iter` hack builtin/show-ref: convert to use `reference_get_peeled_oid()` ref-filter: propagate peeled object ID upload-pack: convert to use `reference_get_peeled_oid()` refs: expose peeled object ID via the iterator refs: refactor reference status flags refs: fully reset `struct ref_iterator::ref` on iteration refs: introduce `.ref` field for the base iterator refs: introduce wrapper struct for `each_ref_fn` builtin/repo: add progress meter for structure stats builtin/repo: add keyvalue and nul format for structure stats builtin/repo: add object counts in structure output builtin/repo: introduce structure subcommand ...
2025-11-04t/pack-refs-tests: move the 'test_done' to calleesKarthik Nayak3-2/+4
In ac0bad0af4 (t0601: refactor tests to be shareable, 2025-09-19), we refactored 't/t0601-reffiles-pack-refs.sh' to move all of the tests to 't/pack-refs-tests.sh', which became a common test suite that is also used by 't/t1463-refs-optimize.sh'. This also moved the 'test_done' directive to 't/pack-refs-tests.sh', which prevents additional tests from being added to either of the test scripts. Let's move the directive out to both the tests, so that we can add additional specific tests to them. Also the test flow logic shouldn't be part of tests which can be embedded in other test scripts. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: rename 'pack_refs_opts' to 'refs_optimize_opts'Karthik Nayak8-30/+30
The previous commit removed all references to 'pack_refs()' within the refs subsystem. Continue this cleanup by also renaming 'pack_refs_opts' to 'refs_optimize_opts' and the respective flags accordingly. Keeping the naming consistent will make the code easier to maintain. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: move to using the '.optimize' functionsKarthik Nayak7-44/+12
The `struct ref_store` variable exposes two ways to optimize a reftable backend: 1. pack_refs 2. optimize The former was specific to the 'files' + 'packed' refs backend. The latter is more generic and covers all backends. While the naming is different, both of these functions perform the same functionality. Consolidate this code to only maintain the 'optimize' functions. Do this by modifying the backends so that they exclusively implement the `optimize` callback. All users of the refs subsystem already use the 'optimize' function so there are no changes needed on the caller side. Finally, clean up all references to the 'pack_refs' field of the structure and code around it. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
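The shape after consolidation can be pictured as a backend table with a single optimization entry point. Every name in this sketch is hypothetical and heavily trimmed down; it is not the layout of Git's real backend structure.

```c
#include <stddef.h>

struct ref_store_sketch;

/* Hypothetical, trimmed-down backend table with one optimization callback. */
struct ref_backend_sketch {
	const char *name;
	int (*optimize)(struct ref_store_sketch *store);
};

struct ref_store_sketch {
	const struct ref_backend_sketch *be;
	int optimize_calls;
};

static int files_optimize_sketch(struct ref_store_sketch *store)
{
	store->optimize_calls++;	/* a real backend would pack loose refs */
	return 0;
}

static int reftable_optimize_sketch(struct ref_store_sketch *store)
{
	store->optimize_calls++;	/* a real backend would compact tables */
	return 0;
}

static const struct ref_backend_sketch files_be_sketch = {
	"files", files_optimize_sketch,
};
static const struct ref_backend_sketch reftable_be_sketch = {
	"reftable", reftable_optimize_sketch,
};

/* Generic entry point: callers never need to know which backend is in use. */
static int refs_optimize_sketch(struct ref_store_sketch *store)
{
	return store->be->optimize(store);
}
```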
2025-11-04Merge branch 'ps/ref-peeled-tags' into kn/refs-optim-cleanupJunio C Hamano95-2296/+3555
* ps/ref-peeled-tags: (92 commits) t7004: do not chdir around in the main process ref-filter: fix stale parsed objects ref-filter: parse objects on demand ref-filter: detect broken tags when dereferencing them refs: don't store peeled object IDs for invalid tags object: add flag to `peel_object()` to verify object type refs: drop infrastructure to peel via iterators refs: drop `current_ref_iter` hack builtin/show-ref: convert to use `reference_get_peeled_oid()` ref-filter: propagate peeled object ID upload-pack: convert to use `reference_get_peeled_oid()` refs: expose peeled object ID via the iterator refs: refactor reference status flags refs: fully reset `struct ref_iterator::ref` on iteration refs: introduce `.ref` field for the base iterator refs: introduce wrapper struct for `each_ref_fn` builtin/repo: add progress meter for structure stats builtin/repo: add keyvalue and nul format for structure stats builtin/repo: add object counts in structure output builtin/repo: introduce structure subcommand ...
2025-11-04t7004: do not chdir around in the main processJunio C Hamano1-18/+20
Move down to no-contains subdirectory inside a subshell, just like the previous step that created and used it does. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04ref-filter: fix stale parsed objectsPatrick Steinhardt2-0/+22
In 054f5f457e (ref-filter: parse objects on demand, 2025-10-23) we have started to skip parsing some objects in case we don't need to access their values in the first place. This was done by introducing a new member `struct expand_data::maybe_object` that gets populated on demand via `get_or_parse_object()`. This has led to a regression though where the object now gets reused because we don't reset it properly. The `oi` structure is declared in global scope, and there is no single place where we reset it before invoking `get_object()`. The consequence is that the `maybe_object` member doesn't get reset across calls, so subsequent calls will end up reusing the same object. This is only an issue for a subset of retrieved values, as not all of the infrastructure ends up calling `get_or_parse_object()`. So the effect is limited, which is probably why the issue wasn't detected earlier. Fix the issue by resetting `maybe_object` in `get_object()`. Reported-by: Junio C Hamano <gitster@pobox.com> Based-on-patch-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
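The regression and its fix can be reproduced in miniature. The sketch below assumes a toy object model in which an object is just an integer id; the `_sketch` names and the `reset` switch are hypothetical and exist only to show both behaviours side by side.

```c
#include <stddef.h>

struct object_sketch {
	int id;
};

/* Hypothetical stand-in for the globally scoped expand_data. */
static struct {
	int oid;
	struct object_sketch *maybe_object;	/* lazily populated */
} oi;

static struct object_sketch pool[8];	/* toy object store, ids 0..7 */

static struct object_sketch *get_or_parse_object_sketch(void)
{
	if (!oi.maybe_object) {
		pool[oi.oid].id = oi.oid;	/* "parse" on first use */
		oi.maybe_object = &pool[oi.oid];
	}
	return oi.maybe_object;
}

/* Returns the id of the object that formatting would end up using. */
static int get_object_sketch(int oid, int reset)
{
	oi.oid = oid;
	if (reset)
		oi.maybe_object = NULL;	/* the fix: forget the previous object */
	return get_or_parse_object_sketch()->id;
}
```

Without the reset, a second call for a different id keeps returning the first object; with it, every call sees its own object.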
2025-11-04ref-filter: parse objects on demandPatrick Steinhardt1-36/+106
When formatting an arbitrary object we parse that object regardless of whether or not we actually need any parsed data. In fact, many of the atoms we have don't require any. Refactor the code so that we parse the data on demand when we see an atom that wants to access the objects. This leads to a small speedup, for example in the Chromium repository with around 40000 refs: Benchmark 1: for-each-ref --format='%(raw)' (HEAD~) Time (mean ± σ): 388.7 ms ± 1.1 ms [User: 322.2 ms, System: 65.0 ms] Range (min … max): 387.3 ms … 390.8 ms 10 runs Benchmark 2: for-each-ref --format='%(raw)' (HEAD) Time (mean ± σ): 344.7 ms ± 0.7 ms [User: 287.8 ms, System: 55.1 ms] Range (min … max): 343.9 ms … 345.7 ms 10 runs Summary for-each-ref --format='%(raw)' (HEAD) ran 1.13 ± 0.00 times faster than for-each-ref --format='%(raw)' (HEAD~) With this change, we now spend ~90% of the time decompressing objects, which is almost as good as it gets regarding git-for-each-ref(1)'s own infrastructure. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04ref-filter: detect broken tags when dereferencing themPatrick Steinhardt2-2/+5
Users can ask git-for-each-ref(1) to peel tags and return information of the tagged object by adding an asterisk to the format, like for example "%(*$objectname)". If so, git-for-each-ref(1) peels that object to the first non-tag object and then returns its values. As mentioned in preceding commits, it can happen that the tagged object type and the claimed object type differ, effectively resulting in a corrupt tag. git-for-each-ref(1) would notice this mismatch, print an error and then bail out when trying to peel the tag. But we only notice this corruption in some very specific edge cases! While we have a test in "t/for-each-ref-tests.sh" that verifies the above scenario, this test is specifically crafted to detect the issue at hand. Namely, we create two tags: - One tag points to a specific object with the correct type. - The other tag points to the *same* object with a different type. The fact that both tags point to the same object is important here: `peel_object()` wouldn't notice the corruption if the tagged objects were different. The root cause is that `peel_object()` calls `lookup_${type}()` eventually, where the type is the same type declared in the tag object. Consequently, when we have two tags pointing to the same object but with different declared types we'll call two different lookup functions. The first lookup will store the object with an unverified type A, whereas the second lookup will try to look up the object with a different unverified type B. And it is only now that we notice the discrepancy in object types, even though type A could've already been the wrong type. Fix the issue by verifying the object type in `populate_value()`. With this change we'll also notice type mismatches when only dereferencing a tag once. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: don't store peeled object IDs for invalid tagsPatrick Steinhardt4-2/+63
Both the "files" and "reftable" backend store peeled object IDs for references that point to tags: - The "files" backend stores the value when packing refs, where each peeled object ID is prefixed with "^". - The "reftable" backend stores the value whenever writing a new reference that points to a tag via a special ref record type. Both of these backends use `peel_object()` to find the peeled object ID. But as explained in the preceding commit, that function does not detect the case where the tag's tagged object and its claimed type mismatch. The consequence of storing these bogus peeled object IDs is that we're less likely to detect such corruption in other parts of Git. git-for-each-ref(1) for example does not notice anymore that the tag is broken when using "--format=%(*objectname)" to dereference tags. One could claim that this is good, because it still allows us to mostly use the tag as intended. But the biggest problem here is that we now have different behaviour for such a broken tag depending on whether or not we have its peeled value in the refdb. Fix the issue by verifying the object type when peeling the object. If that verification fails we simply skip storing the peeled value in either of the reference formats. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
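For the "files" backend, the effect of the fix on a packed-refs entry can be sketched as follows. packed_entry_sketch() is a hypothetical formatter and `peeled_type_ok` stands in for the result of the new type verification; the real writer lives elsewhere and works on object IDs rather than strings.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical formatter for one packed-refs entry. The "^" line with the
 * peeled object ID is written only when a peeled value exists and the
 * tagged object's type was verified.
 */
static const char *packed_entry_sketch(const char *oid_hex, const char *refname,
				       const char *peeled_hex, bool peeled_type_ok)
{
	static char buf[256];
	int n = snprintf(buf, sizeof(buf), "%s %s\n", oid_hex, refname);

	if (peeled_hex && peeled_type_ok && n > 0 && (size_t)n < sizeof(buf))
		snprintf(buf + n, sizeof(buf) - n, "^%s\n", peeled_hex);
	return buf;
}
```

A broken tag thus ends up without a "^" line, so readers fall back to peeling through the object database and get the same result whether or not the ref was packed.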
2025-11-04object: add flag to `peel_object()` to verify object typePatrick Steinhardt9-25/+38
When peeling a tag to a non-tag object we repeatedly call `parse_object()` on the tagged object until we find the first object that isn't a tag. While this feels sensible at first, there is a big catch here: `parse_object()` doesn't actually verify the type of the tagged object. The relevant code path here eventually ends up in `parse_tag_buffer()`. Here, we parse the various fields of the tag, including the "type". Once we've figured out the type and the tagged object ID, we call one of the `lookup_${type}()` functions for whatever type we have found. There are two possible outcomes in the successful case: 1. The object is already part of our cached objects. In that case we double-check whether the type we're trying to look up matches the type that was cached. 2. The object is _not_ part of our cached objects. In that case, we simply create a new object with the expected type, but we don't parse that object. In the first case we might notice type mismatches, but only in the case where our cache has the object with the correct type. In the second case, we'll blindly assume that the type is correct and then go with it. We'll only notice that the type might be wrong when we try to parse the object at a later point. Now arguably, we could change `parse_tag_buffer()` to verify the tagged object's type for us. But that would have the effect that such a tag cannot be parsed at all anymore, and we have a small bunch of tests for exactly this case that assert we still can open such tags. So this change does not feel like something we can retroactively tighten, even though one shouldn't ever hit such corrupted tags. Instead, add a new `flags` field to `peel_object()` that allows the caller to opt in to strict object verification. This will be wired up at a subset of callsites over the next few commits. Note that this change also inlines `deref_tag_noverify()`. There have only ever been two callsites of that function, the one we're changing and one in our test helpers.
The latter callsite can trivially use `deref_tag()` instead, so by inlining the function we avoid having to pass down the flag. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
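The two lookup outcomes described in this message can be modelled with a toy cache. Everything here is hypothetical (small integer ids, two object types, a fixed-size array); it only shows why an unverified first claim wins.

```c
#include <stddef.h>

enum otype_sketch { OT_NONE, OT_COMMIT, OT_BLOB };

struct obj_sketch {
	enum otype_sketch type;
};

/* Toy object cache indexed by id; OT_NONE means "not cached yet". */
static struct obj_sketch cache_sketch[4];

/*
 * Hypothetical model of a lookup_${type}()-style function:
 *  - uncached: create the object with the claimed type, unverified;
 *  - cached: succeed only if the claimed type matches the cached one.
 */
static struct obj_sketch *lookup_typed_sketch(int id, enum otype_sketch claimed)
{
	struct obj_sketch *o = &cache_sketch[id];

	if (o->type == OT_NONE) {
		o->type = claimed;	/* trusted blindly, nothing is parsed */
		return o;
	}
	return o->type == claimed ? o : NULL;
}
```

Whichever tag is looked at first fixes the cached type, even when its claim is the wrong one; a mismatch surfaces only on a later lookup with a different claim.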
2025-11-04refs: drop infrastructure to peel via iteratorsPatrick Steinhardt8-141/+1
Now that the peeled object ID gets propagated via the `struct reference` there is no need anymore to call into the reference iterator itself to dereference an object. Remove this infrastructure. Most of the changes are straight-forward deletions of code. There is one exception though in `refs/packed-backend.c::write_with_updates()`. Here we stop peeling the iterator and instead just pass the peeled object ID of that iterator directly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: drop `current_ref_iter` hackPatrick Steinhardt3-28/+0
In preceding commits we have refactored all callers of `peel_iterated_oid()` to instead use `reference_get_peeled_oid()`. This allows us to thus get rid of the former function. Getting rid of that function is nice, but even nicer is that this also allows us to get rid of the `current_ref_iter` hack. This global variable tracked the currently-active ref iterator so that we can use it to peel an object ID. Now that the peeled object ID is propagated via `struct reference` though we don't have to depend on this hack anymore, which makes for a more robust and easier-to-understand infrastructure. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04builtin/show-ref: convert to use `reference_get_peeled_oid()`Patrick Steinhardt1-13/+19
The git-show-ref(1) command has multiple different modes: - It knows to show all references matching a pattern. - It knows to list all references that are an exact match to whatever the user has provided. - It knows to check for reference existence. The first two commands use mostly the same infrastructure to print the references via `show_one()`. But while the former mode uses a proper iterator and thus has a `struct reference` available in its context, the latter calls `refs_read_ref()` and thus doesn't. Consequently, we cannot easily use `reference_get_peeled_oid()` to print the peeled value. Adapt the code so that we manually construct a `struct reference` when verifying refs. We wouldn't ever have the peeled value available anyway as we're not using an iterator here, so we can simply plug in the values we _do_ have. With this change we now have a `struct reference` available at both callsites of `show_one()` and can thus pass it, which allows us to use `reference_get_peeled_oid()` instead of `peel_iterated_oid()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04ref-filter: propagate peeled object IDPatrick Steinhardt5-32/+45
When queueing a reference in the "ref-filter" subsystem we end up creating a new ref array item that contains the reference's info. One bit of info that we always discard though is the peeled object ID, and because of that we are forced to use `peel_iterated_oid()`. Refactor the code to propagate the peeled object ID via the ref array, if available. This allows us to manually peel tags without having to go through the object database. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04upload-pack: convert to use `reference_get_peeled_oid()`Patrick Steinhardt1-9/+13
The `write_v0_ref()` callback is invoked from two callsites: - Once via `send_ref()` which is a callback passed to `for_each_namespaced_ref_1()` and `refs_head_ref_namespaced()`. - Once manually to announce capabilities. When sending references to the client we also send the peeled value of tags. As we don't have a `struct reference` available in the second case, we cannot easily peel by calling `reference_get_peeled_oid()`, but we instead have to depend on global state via `peel_iterated_oid()`. We do have a reference available though in the first case, it's only the second case that keeps us from using `reference_get_peeled_oid()`. But that second case only announces capabilities anyway, so we're not really handling a reference at all here. Adapt that case to construct a reference manually and pass that to `write_v0_ref()`. Start to use `reference_get_peeled_oid()` now that we always have a `struct reference` available. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: expose peeled object ID via the iteratorPatrick Steinhardt12-10/+48
Both the "files" and "reftable" backend are able to store peeled values for tags in the respective formats. This allows for a more efficient lookup of the target object of such a tag without having to manually peel via the object database. The infrastructure to access these peeled object IDs is somewhat funky though. When iterating through objects, we store a pointer reference to the current iterator in a global variable. The callbacks invoked by that iterator are then expected to call `peel_iterated_oid()`, which checks whether the globally-stored iterator's current reference refers to the one handed into that function. If so, we ask the iterator to peel the object, otherwise we manually peel the object via the object database. Depending on global state like this is somewhat weird and also quite fragile. Introduce a new `struct reference::peeled_oid` field that can be populated by the reference backends. This field can be accessed via a new function `reference_get_peeled_oid()` that either uses that value, if set, or alternatively peels via the ODB. With this change we don't have to rely on global state anymore, but make the peeled object ID available to the callback functions directly. Adjust trivial callers that already have a `struct reference` available. Remaining callers will be adjusted in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
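The lookup order of the new accessor ("use the stored value if set, otherwise peel via the ODB") can be sketched as below. The structures and `_sketch` functions are hypothetical simplifications; the real signatures are not given in this message and are not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

struct oid_sketch {
	unsigned char hash[4];	/* toy-sized object ID */
};

struct reference_sketch {
	const char *name;
	const struct oid_sketch *oid;
	const struct oid_sketch *peeled_oid;	/* NULL when the backend has none */
};

static int odb_peel_calls;	/* counts trips through the slow path */

/* Stand-in for the slow path that peels via the object database. */
static bool peel_via_odb_sketch(const struct oid_sketch *oid,
				struct oid_sketch *out)
{
	odb_peel_calls++;
	*out = *oid;	/* pretend the object peels to itself */
	return true;
}

static bool reference_get_peeled_oid_sketch(const struct reference_sketch *ref,
					    struct oid_sketch *out)
{
	if (ref->peeled_oid) {
		*out = *ref->peeled_oid;	/* fast path, no global state */
		return true;
	}
	return peel_via_odb_sketch(ref->oid, out);
}
```

Because the peeled value travels with the reference itself, the callback needs no knowledge of which iterator produced it.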
2025-11-04refs: refactor reference status flagsPatrick Steinhardt1-20/+21
The reference flags encode information like whether or not a reference is a symbolic reference or whether it may be broken. This information is stored in a `int flags` bitfield, which is in conflict with our modern best practices; we tend to use an unsigned integer to store flags. Change the type of the field to be `unsigned`. While at it, refactor the individual flags to be part of an `enum` instead of using preprocessor defines. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
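The pattern being adopted looks roughly like this: an `enum` names the individual bits and an unsigned integer carries the combined flags. The constant names below are hypothetical placeholders, not the identifiers used in refs.h.

```c
/* Hypothetical flag names; each enumerator occupies one bit. */
enum ref_flag_sketch {
	REF_SKETCH_ISSYMREF = (1 << 0),
	REF_SKETCH_ISPACKED = (1 << 1),
	REF_SKETCH_ISBROKEN = (1 << 2),
};

/* Flags are combined and tested through an unsigned integer. */
static int ref_sketch_is_symref(unsigned int flags)
{
	return !!(flags & REF_SKETCH_ISSYMREF);
}

static int ref_sketch_is_broken(unsigned int flags)
{
	return !!(flags & REF_SKETCH_ISBROKEN);
}
```

Compared with preprocessor defines, the enumerators show up by name in a debugger and keep related bits grouped in one declaration.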
2025-11-04refs: fully reset `struct ref_iterator::ref` on iterationPatrick Steinhardt3-1/+4
With the introduction of the `struct ref_iterator::ref` field it now is a whole lot easier to introduce new fields that become accessible to the caller without having to adapt every single callsite. But there's a downside: when a new field is introduced we always have to adapt all backends to set that field. This isn't something we can avoid in the general case: when the new field is expected to be populated by all backends we of course cannot avoid doing so. But new fields may be entirely optional, in which case we'd still have such churn. And furthermore, it is very easy right now to leak state from a previous iteration into the next iteration. Address this issue by ensuring that the reference backends all fully reset the field on every single iteration. This ensures that no state from previous iterations can leak into the next one. And it ensures that any newly introduced fields will be zeroed out by default. Note that we don't have to explicitly adapt the "files" backend, as it uses the `cache_ref_iterator` internally. Furthermore, other "wrapping" iterators like for example the `prefix_ref_iterator` copy around the whole reference, so these don't need to be adapted either. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
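The "fully reset on every iteration" rule can be sketched with a toy iterator over an array of names. The structures and `optional_field` are hypothetical; the point is only that memset() clears fields the backend does not know about.

```c
#include <stddef.h>
#include <string.h>

struct reference_sketch {
	const char *name;
	unsigned int flags;
	const char *optional_field;	/* a later addition backends may ignore */
};

struct ref_iterator_sketch {
	struct reference_sketch ref;
	size_t pos;
};

/* Advances over names[0..nr); returns 1 while a reference is available. */
static int iterator_advance_sketch(struct ref_iterator_sketch *it,
				   const char **names, size_t nr)
{
	if (it->pos >= nr)
		return 0;
	/* Zero everything first so no state leaks from the previous round. */
	memset(&it->ref, 0, sizeof(it->ref));
	it->ref.name = names[it->pos++];
	return 1;
}
```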
2025-11-04refs: introduce `.ref` field for the base iteratorPatrick Steinhardt8-100/+75
The base iterator has a couple of fields that track the name, target, object ID and flags for the current reference. Due to this design we have to create a new `struct reference` whenever we want to hand over that reference to the callback function, which is tedious and not very efficient. Convert the structure to instead contain a `struct reference` as member. This member is expected to be populated by the implementations of the iterator and is handed over to the callback directly. While at it, simplify `should_pack_ref()` to take a `struct reference` directly instead of passing its respective fields. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: introduce wrapper struct for `each_ref_fn`Patrick Steinhardt49-462/+392
The `each_ref_fn` callback function type is used across our code base for several different functions that iterate through references. There's a bunch of callbacks implementing this type, which makes any changes to the callback signature extremely noisy. An example of the required churn is e8207717f1 (refs: add referent to each_ref_fn, 2024-08-09): adding a single argument required us to change 48 files. It was already proposed back then [1] that we might want to introduce a wrapper structure to alleviate the pain going forward. While this of course requires the same kind of global refactoring as just introducing a new parameter, it at least allows us to more easily change the callback type afterwards by just extending the wrapper structure. One counterargument to this refactoring is that it makes the structure more opaque. While it is obvious which callsites need to be fixed up when we change the function type, it's not obvious anymore once we use a structure. That being said, we only have a handful of sites that actually need to populate this wrapper structure: our ref backends, "refs/iterator.c" as well as very few sites that invoke the iterator callback functions directly. Introduce this wrapper structure so that we can adapt the iterator interfaces more readily. [1]: <ZmarVcF5JjsZx0dl@tanuki> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-03object-file: refactor writing objects via a streamPatrick Steinhardt5-17/+27
We have two different ways to write an object into the database: - We either provide the full buffer and write the object all at once. - Or we provide an input stream that has a `read()` function so that we can chunk the object. The latter is especially used for large objects, where it may be too expensive to hold the complete object in memory all at once. While we already have `odb_write_object()` at the ODB-layer, we don't have an equivalent for streaming an object. Introduce a new function `odb_write_object_stream()` to address this gap so that callers don't have to be aware of the inner workings of how to stream an object to disk with a specific object source. Rename `stream_loose_object()` to `odb_source_loose_write_stream()` to clarify its scope. This matches our modern best practices around how to name functions. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
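The streaming variant can be pictured as a loop that pulls fixed-size chunks through a `read()` callback. The stream structure, the in-memory reader and write_stream_sketch() are hypothetical illustrations, not the signature of `odb_write_object_stream()`; the 4-byte buffer is deliberately tiny to make the chunking visible.

```c
#include <stddef.h>
#include <string.h>

struct input_stream_sketch {
	/* Fills buf with up to len bytes; returns bytes read, 0 at the end. */
	size_t (*read)(struct input_stream_sketch *st, char *buf, size_t len);
	const char *data;
	size_t size, pos;
};

/* In-memory reader used as a stand-in for a file or network source. */
static size_t mem_read_sketch(struct input_stream_sketch *st,
			      char *buf, size_t len)
{
	size_t left = st->size - st->pos;

	if (len > left)
		len = left;
	memcpy(buf, st->data + st->pos, len);
	st->pos += len;
	return len;
}

/* Consumes the stream chunk by chunk; returns the total number of bytes. */
static size_t write_stream_sketch(struct input_stream_sketch *st,
				  size_t *chunks)
{
	char buf[4];
	size_t n, total = 0;

	*chunks = 0;
	while ((n = st->read(st, buf, sizeof(buf))) > 0) {
		total += n;	/* a real writer would compress and store buf */
		(*chunks)++;
	}
	return total;
}
```

Only one chunk is held in memory at a time, which is the property that matters for large objects.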
2025-11-03object-file: rename `write_object_file()`Patrick Steinhardt3-10/+11
Rename `write_object_file()` to `odb_source_loose_write_object()` so that it becomes clear that this is tied to a specific loose object source. This matches our modern naming schema for functions. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-03object-file: refactor freshening of objectsPatrick Steinhardt6-28/+46
When writing an object that already exists in our object database we skip the write and instead only update mtimes of the object, either in its packed or loose object format. This logic is wholly contained in "object-file.c", but that file is really only concerned with loose objects. So it does not really make sense that it also contains the logic to freshen a packed object. Introduce a new `odb_freshen_object()` function that sits on the object database level and two functions `packfile_store_freshen_object()` and `odb_source_loose_freshen_object()`. Like this, the format-specific functions can be part of their respective subsystems, while the backend agnostic function to freshen an object sits at the object database layer. Note that this change also moves the logic that iterates through object sources from the object source layer into the object database layer. This change is intentional: object sources should ideally only have to worry about themselves, and coordination of different sources should be handled on the object database level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-03object-file: rename `has_loose_object()`Patrick Steinhardt3-13/+13
Rename `has_loose_object()` to `odb_source_loose_has_object()` so that it becomes clear that this is tied to a specific loose object source. This matches our modern naming schema for functions. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-03object-file: read objects via the loose object sourcePatrick Steinhardt4-53/+50
When reading an object via `loose_object_info()` or `map_loose_object()` we hand in the whole repository. We then iterate through each of the object sources to figure out whether that source has the object in question. This logic is reversing responsibility though: a specific backend should only care about one specific source, where the object sources themselves are then managed by the object database. Refactor the code accordingly by passing an object source to both of these functions instead. The different sources are then handled by either `do_oid_object_info_extended()`, which sits on the object database level, and by `open_istream_loose()`. The latter function arguably is still at the wrong level, but this will be cleaned up at a later point in time. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-03 | object-file: move loose object map into loose source | Patrick Steinhardt | 5 | -9/+9

The loose object map is used to map from the repository's canonical object hash to the compatibility hash. As the name indicates, this map is only used for loose objects, and as such it is tied to a specific loose object source.

Same as with preceding commits, move this map into the loose object source accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-03 | object-file: hide internals when we need to reprepare loose sources | Patrick Steinhardt | 3 | -10/+15

There are two different situations where we have to clear the cache of loose objects:

  - When freeing the loose object source itself to avoid memory leaks.

  - When repreparing the loose object source so that any potentially stale data is evicted from the cache.

The former is already handled by `odb_source_loose_free()`. But the latter case is still done manually in `odb_reprepare()`, so we are leaking internals into that code. Introduce a new `odb_source_loose_reprepare()` function as an equivalent to `packfile_store_prepare()` to hide these implementation details.

Furthermore, while at it, rename the function `odb_clear_loose_cache()` to `odb_source_loose_clear()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
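As a minimal sketch of how the two situations can share one cache-clearing helper: the function names below are the ones mentioned in the commit message, but the parameter types, the struct layout, and the function bodies are assumptions made purely for illustration.

```c
#include <stdlib.h>

/* Hypothetical stand-in: the real structure holds more state. */
struct odb_source_loose {
	int *cache;        /* stand-in for the cached loose object names */
	size_t cache_nr;
};

/* Drop cached data but keep the source itself usable. */
static void odb_source_loose_clear(struct odb_source_loose *loose)
{
	free(loose->cache);
	loose->cache = NULL;
	loose->cache_nr = 0;
}

/* Evict potentially stale data; a real cache would be refilled on next use. */
static void odb_source_loose_reprepare(struct odb_source_loose *loose)
{
	odb_source_loose_clear(loose);
}

/* Release the source itself, including whatever is still cached. */
static void odb_source_loose_free(struct odb_source_loose *loose)
{
	if (!loose)
		return;
	odb_source_loose_clear(loose);
	free(loose);
}
```

With this shape, a caller such as `odb_reprepare()` only needs to call `odb_source_loose_reprepare()` and never touches the cache fields directly.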
2025-11-03 | object-file: move loose object cache into loose source | Patrick Steinhardt | 6 | -36/+39

Our loose objects use a cache that (optionally) stores all objects for each of the opened sharding directories. This cache is located in the `struct odb_source`, but now that we have `struct odb_source_loose` it makes sense to move it into the latter structure so that all state that relates to loose objects is entirely self-contained.

Do so. While at it, rename corresponding functions to have a prefix that relates to `struct odb_source_loose`.

Note that despite this prefix, the functions still accept a `struct odb_source` as input. This is done intentionally: once we introduce pluggable object databases, we will continue to accept this struct but then do a cast inside these functions to `struct odb_source_loose`. This design is similar to how we do it for our ref backends.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
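The "accept the generic struct, cast inside" design mentioned above could look roughly like the sketch below once pluggable backends exist. Everything here is hypothetical: the embedded `base` member, the field names, and the helper names `loose_downcast()` and `odb_source_loose_drop_cache()` are invented for illustration and are not taken from Git.

```c
#include <stddef.h>

/* Generic handle that callers pass around (fields are hypothetical). */
struct odb_source {
	const char *path;
};

/* Assumed future layout: the generic part is embedded as the first member. */
struct odb_source_loose {
	struct odb_source base;
	size_t cached_objects;   /* stand-in for the loose object cache */
};

/* Illustrative helper: recover the backend from the generic pointer. */
static struct odb_source_loose *loose_downcast(struct odb_source *source)
{
	/* Valid because `base` is the first member of struct odb_source_loose. */
	return (struct odb_source_loose *)source;
}

/* Loose-prefixed function that still takes the generic struct as input. */
static void odb_source_loose_drop_cache(struct odb_source *source)
{
	struct odb_source_loose *loose = loose_downcast(source);

	loose->cached_objects = 0;
}
```

Callers keep holding a `struct odb_source *` and never need to know which backend sits behind it; only the backend's own functions perform the cast.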
2025-11-03 | object-file: introduce `struct odb_source_loose` | Patrick Steinhardt | 4 | -0/+25

Currently, all state that relates to loose objects is held directly by the `struct odb_source`. Introduce a new `struct odb_source_loose` to hold the state instead so that it is entirely self-contained.

This structure will eventually morph into the backend for accessing loose objects. As such, this is part of the refactorings to introduce pluggable object databases.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-03 | object-file: move `fetch_if_missing` | Patrick Steinhardt | 2 | -8/+8

The `fetch_if_missing` global variable is declared in "object-file.h" but defined in "odb.c". The variable relates to the whole object database instead of only loose objects, so move the declaration into "odb.h" accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-03 | odb: adjust naming to free object sources | Patrick Steinhardt | 1 | -5/+5

The functions `free_object_directory()` and `free_object_directories()` are responsible for freeing a single object source or all object sources connected to an object database, respectively. The associated structure has been renamed from `struct object_directory` to `struct odb_source` in a1e2581a1e (object-store: rename `object_directory` to `odb_source`, 2025-07-01) though, so the names are somewhat stale nowadays.

Rename them to mention the new struct name instead. Furthermore, while at it, adapt them to our modern naming schema where we first have the subject followed by a verb.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-03 | odb: introduce `odb_source_new()` | Patrick Steinhardt | 3 | -12/+29

We have three different locations where we create a new ODB source. Deduplicate the logic via a new `odb_source_new()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-03 | odb: fix subtle logic to check whether an alternate is usable | Patrick Steinhardt | 1 | -13/+17

When adding an alternate to the object database we first check whether or not the path is usable. A path is usable if:

  - It actually exists.

  - We don't have it in our object sources yet.

While the former check is trivial enough, the latter part is somewhat subtle and prone to bugs. This is because the function doesn't only check whether or not the given path is usable. If it _is_ usable, we also store that path in the map of object sources immediately.

The tricky part here is that the path that gets stored in the map is _not_ copied. Instead, we rely on the fact that subsequent code uses `strbuf_detach()` to store the exact same allocated memory in the created object source. Consequently, the memory is owned by the source but _also_ stored in the map.

This subtlety is easy to miss, so if one decides to refactor this code one can easily end up breaking this mechanism.

Make the relationship more explicit by not storing the path as part of `alt_odb_usable()`. Instead, store the path after we have created the source so that we can use the source's path pointer directly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
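The ownership relationship after this change can be illustrated with a small sketch. It assumes a fixed-size array in place of Git's real path map, omits the "path actually exists" check, and uses a made-up `add_alternate()` helper; only the name `alt_odb_usable()` comes from the commit message.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SOURCES 8

struct odb_source {
	char *path;   /* owned by the source */
};

/* Hypothetical, simplified object database. */
struct object_database {
	struct odb_source *sources[MAX_SOURCES];
	const char *known_paths[MAX_SOURCES];   /* borrowed from sources[i]->path */
	size_t nr;
};

/* Pure check: it no longer records the path as a side effect. */
static bool alt_odb_usable(const struct object_database *odb, const char *path)
{
	for (size_t i = 0; i < odb->nr; i++)
		if (!strcmp(odb->known_paths[i], path))
			return false;
	return true;
}

/* Illustrative helper (not a Git function): check, create, then register. */
static struct odb_source *add_alternate(struct object_database *odb, const char *path)
{
	struct odb_source *source;

	if (odb->nr == MAX_SOURCES || !alt_odb_usable(odb, path))
		return NULL;

	source = malloc(sizeof(*source));
	if (!source)
		return NULL;
	source->path = malloc(strlen(path) + 1);
	if (!source->path) {
		free(source);
		return NULL;
	}
	strcpy(source->path, path);

	/* Register the path only now, via the pointer the source owns. */
	odb->sources[odb->nr] = source;
	odb->known_paths[odb->nr] = source->path;
	odb->nr++;
	return source;
}
```

Because the set entry is taken from `source->path` after the source exists, a reader can see at the registration site that the source owns the memory and the set merely borrows it.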
2025-11-03 | last-modified: implement faster algorithm | Toon Claes | 3 | -16/+237

The current implementation of git-last-modified(1) works by doing a revision walk, and inspecting the diff at each level of that walk to annotate entries remaining in the hashmap of paths. In other words, if the diff at some level touches a path which has not yet been associated with a commit, then that commit becomes associated with the path.

While a perfectly reasonable implementation, it can perform poorly in either one of two scenarios:

  1. There are many entries of interest, in which case there is simply a lot of work to do.

  2. Or, there are (even a few) entries which have not been updated in a long time, and so we must walk through a lot of history in order to find a commit that touches that path.

This patch rewrites the last-modified implementation to address the second point. The idea behind the algorithm is to propagate a set of 'active' paths (a path is 'active' if it does not yet belong to a commit) up to parents and do a truncated revision walk. The walk is truncated because it does not produce a revision for every change in the original pathspec, but rather only for active paths.

More specifically, consider a priority queue of commits sorted by generation number. First, enqueue the set of boundary commits with all paths in the original spec marked as interesting.

Then, while the queue is not empty, do the following:

  1. Pop an element, say, 'c', off of the queue, making sure that 'c' isn't reachable by anything in the '--not' set.

  2. For each parent 'p' (with index 'parent_i') of 'c', do the following:

     a. Compute the diff between 'c' and 'p'.
     b. Pass any active paths that are TREESAME from 'c' to 'p'.
     c. If 'p' has any active paths, push it onto the queue.

  3. Any path that remains active on 'c' is associated with that commit.

This ends up being equivalent to doing something like 'git log -1 -- $path' for each path simultaneously. But, it allows us to go much faster than the original implementation by limiting the number of diffs we compute, since we can avoid parts of history that would have been considered by the revision walk in the original implementation, but are known to be uninteresting to us because we have already marked all paths in that area to be inactive.

To avoid computing many first-parent diffs, add another trick on top of this and check if all paths active in 'c' are DEFINITELY NOT in c's Bloom filter. Since the commit-graph only stores first-parent diffs in the Bloom filters, we can only apply this trick to first-parent diffs.

Comparing the performance of this new algorithm shows about a 2.5x improvement on git.git:

    Benchmark 1: master no bloom
      Time (mean ± σ):      2.868 s ±  0.023 s    [User: 2.811 s, System: 0.051 s]
      Range (min … max):    2.847 s …  2.926 s    10 runs

    Benchmark 2: master with bloom
      Time (mean ± σ):     949.9 ms ±  15.2 ms    [User: 907.6 ms, System: 39.5 ms]
      Range (min … max):   933.3 ms … 971.2 ms    10 runs

    Benchmark 3: HEAD no bloom
      Time (mean ± σ):     782.0 ms ±   6.3 ms    [User: 740.7 ms, System: 39.2 ms]
      Range (min … max):   776.4 ms … 798.2 ms    10 runs

    Benchmark 4: HEAD with bloom
      Time (mean ± σ):     307.1 ms ±   1.7 ms    [User: 276.4 ms, System: 29.9 ms]
      Range (min … max):   303.7 ms … 309.5 ms    10 runs

    Summary
      HEAD with bloom ran
        2.55 ± 0.02 times faster than HEAD no bloom
        3.09 ± 0.05 times faster than master with bloom
        9.34 ± 0.09 times faster than master no bloom

In short, the existing implementation *with* Bloom filters is about as fast as the new implementation *without* Bloom filters. So, most repositories should get a dramatic speed-up by just deploying this (even without computing Bloom filters), and all repositories should get faster still when computing Bloom filters.

In a more extreme example, `git last-modified -- COPYING t`, the new implementation is about 5 times faster:

    Benchmark 1: master
      Time (mean ± σ):      4.372 s ±  0.057 s    [User: 4.286 s, System: 0.062 s]
      Range (min … max):    4.308 s …  4.509 s    10 runs

    Benchmark 2: HEAD
      Time (mean ± σ):     826.3 ms ±  22.3 ms    [User: 784.1 ms, System: 39.2 ms]
      Range (min … max):   810.6 ms … 881.2 ms    10 runs

    Summary
      HEAD ran
        5.29 ± 0.16 times faster than master

As an added benefit, results are more consistent now. For example, the implementation in 'master' gives:

    $ git log --max-count=1 --format=%H -- pkt-line.h
    15df15fe07ef66b51302bb77e393f3c5502629de

    $ git last-modified -- pkt-line.h
    15df15fe07ef66b51302bb77e393f3c5502629de pkt-line.h

    $ git last-modified | grep pkt-line.h
    5b49c1af03e600c286f63d9d9c9fb01403230b9f pkt-line.h

With the changes in this patch the results of git-last-modified(1) always match those of `git log --max-count=1`.

One thing to note, though: the results might be output in a different order than before. This is not considered to be an issue because the order is not documented as guaranteed anywhere.

Based-on-patches-by: Derrick Stolee <stolee@gmail.com>
Based-on-patches-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
Acked-by: Taylor Blau <me@ttaylorr.com>
[jc: tweaked use of xcalloc() to unbreak coccicheck]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
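The queue-based walk from the commit message above can be sketched in a few dozen lines. This is a toy model, not the code from the patch: commits live in a small array, paths are bits in a mask, the diff against each parent is precomputed in `changed[]`, and the priority queue is replaced by a linear scan for the pending commit with the highest generation number. The '--not' set and the Bloom filter shortcut are left out, and all names are hypothetical.

```c
#define MAX_COMMITS 32
#define MAX_PARENTS 2

/* Toy commit: parents are indices into the commit array. */
struct commit {
	unsigned generation;             /* parents always have a lower value */
	int nr_parents;
	int parents[MAX_PARENTS];
	unsigned changed[MAX_PARENTS];   /* bit i set: path i differs from parents[k] */
};

/* Stand-in for popping a priority queue sorted by generation number. */
static int pop_highest_generation(const struct commit *commits, int nr,
				  const unsigned *active)
{
	int best = -1;

	for (int i = 0; i < nr; i++)
		if (active[i] &&
		    (best < 0 || commits[i].generation > commits[best].generation))
			best = i;
	return best;
}

/* For every path bit in `paths`, store the index of the commit that last touched it. */
static void last_modified(const struct commit *commits, int nr, int tip,
			  unsigned paths, int *result)
{
	unsigned active[MAX_COMMITS] = { 0 };
	int c;

	if (nr > MAX_COMMITS)
		return;
	active[tip] = paths;

	while ((c = pop_highest_generation(commits, nr, active)) >= 0) {
		unsigned remaining = active[c];

		active[c] = 0;
		for (int k = 0; k < commits[c].nr_parents; k++) {
			/* Paths identical in 'c' and this parent stay active on the parent. */
			unsigned treesame = remaining & ~commits[c].changed[k];

			active[commits[c].parents[k]] |= treesame;
			remaining &= ~treesame;
		}
		/* Whatever could not be passed to a parent belongs to 'c'. */
		for (int path = 0; remaining; path++, remaining >>= 1)
			if (remaining & 1u)
				result[path] = c;
	}
}
```

In this model a commit is only visited while it still carries active paths, which is the property that lets the walk skip parts of history where every path has already been attributed.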