aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2025-02-25t/unit-tests: convert oid-array test to use clar test frameworkSeyi Kuforiji4-128/+131
Adapt oid-array test script to clar framework by using clar assertions where necessary. Remove descriptions from macros to reduce redundancy, and move test input arrays to global scope for reuse across multiple test functions. Introduce `test_oid_array__initialize()` to explicitly initialize the hash algorithm. These changes streamline the test suite, making individual tests self-contained and reducing redundant code. Mentored-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-25t/unit-tests: implement clar specific oid helper functionsSeyi Kuforiji5-26/+21
`get_oid_arbitrary_hex()` and `init_hash_algo()` are both required for oid-related tests to run without errors. In the current implementation, both functions are defined and declared in the `t/unit-tests/lib-oid.{c,h}` which is utilized by oid-related tests in the homegrown unit tests structure. Adapt functions in lib-oid.{c,h} to use clar. Both these functions become available for oid-related test files implemented using the clar testing framework, which requires them. This will be used by subsequent commits. Mentored-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-21The thirteenth batchJunio C Hamano1-0/+5
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-21Merge branch 'ac/doc-http-ssl-type-config'Junio C Hamano1-0/+15
Two configuration variables about SSL authentication material that weren't mentioned in the documentations are now mentioned. * ac/doc-http-ssl-type-config: docs: indicate http.sslCertType and sslKeyType
2025-02-21Merge branch 'jc/doc-boolean-synonyms'Junio C Hamano3-7/+10
Doc updates. * jc/doc-boolean-synonyms: doc: centrally document various ways tospell `true` and `false`
2025-02-21Merge branch 'en/doc-renormalize'Junio C Hamano3-4/+5
Doc updates. * en/doc-renormalize: doc: clarify the intent of the renormalize option in the merge machinery
2025-02-21Merge branch 'ua/update-server-info-sans-the-repository'Junio C Hamano1-4/+4
Code clean-up. * ua/update-server-info-sans-the-repository: builtin/update-server-info: remove the_repository global variable
2025-02-20Merge branch 'master' of https://github.com/j6t/gitkJunio C Hamano5-37/+241
* 'master' of https://github.com/j6t/gitk: gitk: introduce support for the Meson build system gitk: extract script to build executable gitk: make the "list references" default window width wider gitk: fix arrow keys in input fields with Tcl/Tk >= 8.6 gitk: Use an external icon file on Windows gitk: Unicode file name support gitk(Windows): avoid inadvertently calling executables in the worktree
2025-02-20Merge branch 'pks-meson-support' of https://github.com/pks-t/gitkJohannes Sixt4-3/+62
* 'pks-meson-support' of https://github.com/pks-t/gitk: gitk: introduce support for the Meson build system gitk: extract script to build executable Signed-off-by: Johannes Sixt <j6t@kdbg.org>
2025-02-20Merge branch 'g4w-gitk' of https://github.com/dscho/gitkJohannes Sixt1-34/+179
* 'g4w-gitk' of https://github.com/dscho/gitk: gitk: make the "list references" default window width wider gitk: fix arrow keys in input fields with Tcl/Tk >= 8.6 gitk: Use an external icon file on Windows gitk: Unicode file name support gitk(Windows): avoid inadvertently calling executables in the worktree Signed-off-by: Johannes Sixt <j6t@kdbg.org>
2025-02-20gitk: introduce support for the Meson build systemPatrick Steinhardt2-0/+49
Upstream Git has introduced support for the Meson build system. Introduce support for Meson into gitk, as well, so that Git can easily build its vendored copy of Gitk via a `subproject()` directive. The instructions can be set up as follows: $ meson setup build $ meson compile -C build $ meson install -C build Specific options, like for example where Gitk shall be installed to, can be specified at setup time via `-D`. Available options can be discovered by running `meson configure` either in the source or build directory. Signed-off-by: Patrick Steinhardt <ps@pks.im>
2025-02-20gitk: extract script to build executablePatrick Steinhardt2-3/+13
Extract the scrip that "builds" Gitk from our Makefile so that we can reuse it in Meson. Signed-off-by: Patrick Steinhardt <ps@pks.im>
2025-02-18The twelfth batchJunio C Hamano1-0/+17
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18Merge branch 'bc/contrib-thunderbird-patch-inline-fix'Junio C Hamano1-1/+1
A thunderbird helper script lost its bashism. * bc/contrib-thunderbird-patch-inline-fix: thunderbird-patch-inline: avoid bashism
2025-02-18Merge branch 'lo/t7603-path-is-file-update'Junio C Hamano1-12/+12
Test clean-up. * lo/t7603-path-is-file-update: t7603: replace test -f by test_path_is_file
2025-02-18Merge branch 'da/difftool-sans-the-repository'Junio C Hamano1-39/+56
"git difftool" code clean-up. * da/difftool-sans-the-repository: difftool: eliminate use of USE_THE_REPOSITORY_VARIABLE difftool: eliminate use of the_repository difftool: eliminate use of global variables
2025-02-18Merge branch 'jt/rev-list-missing-print-info'Junio C Hamano3-17/+161
"git rev-list --missing=" learned to accept "print-info" that gives known details expected of the missing objects, like path and type. * jt/rev-list-missing-print-info: rev-list: extend print-info to print missing object type rev-list: add print-info action to print missing object path
2025-02-18Merge branch 'ps/send-pack-unhide-error-in-atomic-push'Junio C Hamano6-142/+406
"git push --atomic --porcelain" used to ignore failures from the other side, losing the error status from the child process, which has been corrected. * ps/send-pack-unhide-error-in-atomic-push: send-pack: gracefully close the connection for atomic push t5543: atomic push reports exit code failure send-pack: new return code "ERROR_SEND_PACK_BAD_REF_STATUS" t5548: add porcelain push test cases for dry-run mode t5548: add new porcelain test cases t5548: refactor test cases by resetting upstream t5548: refactor to reuse setup_upstream() function t5504: modernize test by moving heredocs into test bodies
2025-02-18Merge branch 'ds/backfill'Junio C Hamano18-14/+540
Lazy-loading missing files in a blobless clone on demand is costly as it tends to be one-blob-at-a-time. "git backfill" is introduced to help bulk-download necessary files beforehand. * ds/backfill: backfill: assume --sparse when sparse-checkout is enabled backfill: add --sparse option backfill: add --min-batch-size=<n> option backfill: basic functionality and tests backfill: add builtin boilerplate
2025-02-14The eleventh batchJunio C Hamano1-0/+14
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-14Merge branch 'ps/doc-http-upload-archive-service'Junio C Hamano1-0/+4
Doc update. * ps/doc-http-upload-archive-service: doc: documentation for http.uploadarchive config option
2025-02-14Merge branch 'kn/reflog-migration-fix-followup'Junio C Hamano9-46/+118
Code clean-up. * kn/reflog-migration-fix-followup: reftable: prevent 'update_index' changes after adding records refs: use 'uint64_t' for 'ref_update.index' refs: mark `ref_transaction_update_reflog()` as static
2025-02-14Merge branch 'bf/fetch-set-head-fix'Junio C Hamano3-13/+39
Fetching into a bare repository incorrectly assumed it always used a mirror layout when deciding to update remote-tracking HEAD, which has been corrected. * bf/fetch-set-head-fix: fetch set_head: fix non-mirror remotes in bare repositories fetch set_head: refactor to use remote directly
2025-02-14Merge branch 'op/worktree-is-main-bare-fix'Junio C Hamano2-9/+46
Going into a secondary worktree and asking "is the main worktree bare?" did not work correctly when per-worktree configuration option was in use, which has been corrected. * op/worktree-is-main-bare-fix: worktree: detect from secondary worktree if main worktree is bare
2025-02-14Merge branch 'tc/clone-single-revision'Junio C Hamano8-165/+357
"git clone" learned to make a shallow clone for a single commit that is not necessarily be at the tip of any branch. * tc/clone-single-revision: builtin/clone: teach git-clone(1) the --revision= option parse-options: introduce die_for_incompatible_opt2() clone: introduce struct clone_opts in builtin/clone.c clone: add tags refspec earlier to fetch refspec clone: refactor wanted_peer_refs() clone: make it possible to specify --tags clone: cut down on global variables in clone.c
2025-02-14Merge branch 'bc/doc-adoc-not-txt'Junio C Hamano925-626/+627
All the documentation .txt files have been renamed to .adoc to help content aware editors. * bc/doc-adoc-not-txt: Remove obsolete ".txt" extensions for AsciiDoc files doc: use .adoc extension for AsciiDoc files gitattributes: mark AsciiDoc files as LF-only editorconfig: add .adoc extension doc: update gitignore for .adoc extension
2025-02-12The tenth batchJunio C Hamano1-0/+22
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-12Merge branch 'da/help-autocorrect-one-fix'Junio C Hamano3-11/+18
"git -c help.autocorrect=0 psuh" shows the suggested typofix, unlike the previous attempt in the base topic. * da/help-autocorrect-one-fix: help: add "show" as a valid configuration value help: show the suggested command when help.autocorrect is false
2025-02-12Merge branch 'sc/help-autocorrect-one'Junio C Hamano2-16/+35
"[help] autocorrect = 1" used to be a way to say "please wait for 0.1 second after suggesting a typofix of the command name before running that command"; now it means "yes, if there is a plausible typofix for the command name, please run it immediately". * sc/help-autocorrect-one: help: interpret boolean string values for help.autocorrect
2025-02-12Merge branch 'ms/remote-valid-remote-name'Junio C Hamano4-11/+12
Code shuffling. * ms/remote-valid-remote-name: remote: relocate valid_remote_name
2025-02-12Merge branch 'ms/refspec-cleanup'Junio C Hamano6-220/+244
Code clean-up. cf. <Z6G-toOJjMmK8iJG@pks.im> * ms/refspec-cleanup: refspec: relocate apply_refspecs and related funtions refspec: relocate matching related functions remote: rename query_refspecs functions refspec: relocate refname_matches_negative_refspec_item remote: rename function omit_name_by_refspec
2025-02-12Merge branch 'jp/doc-trailer-config'Junio C Hamano3-135/+140
Documentaiton updates. * jp/doc-trailer-config: config.txt: add trailer.* variables
2025-02-12Merge branch 'zh/gc-expire-to'Junio C Hamano3-2/+47
"git gc" learned the "--expire-to" option and passes it down to underlying "git repack". * zh/gc-expire-to: gc: add `--expire-to` option
2025-02-12Merge branch 'js/libgit-rust'Junio C Hamano24-81/+673
Foreign language interface for Rust into our code base has been added. * js/libgit-rust: libgit: add higher-level libgit crate libgit-sys: also export some config_set functions libgit-sys: introduce Rust wrapper for libgit.a common-main: split init and exit code into new files
2025-02-12Merge branch 'ac/t5401-use-test-path-is-file'Junio C Hamano1-8/+8
Test clean-up. * ac/t5401-use-test-path-is-file: t5401: prefer test_path_is_* helper function
2025-02-12Merge branch 'ac/t6423-unhide-git-exit-status'Junio C Hamano1-3/+6
Test clean-up. * ac/t6423-unhide-git-exit-status: t6423: fix suppression of Git’s exit code in tests
2025-02-12Merge branch 'ps/repack-keep-unreachable-in-unpacked-repo'Junio C Hamano2-1/+20
"git repack --keep-unreachable" to send unreachable objects to the main pack "git repack -ad" produces did not work when there is no existing packs, which has been corrected. * ps/repack-keep-unreachable-in-unpacked-repo: builtin/repack: fix `--keep-unreachable` when there are no packs
2025-02-12Merge branch 'ds/name-hash-tweaks'Junio C Hamano22-16/+389
"git pack-objects" and its wrapper "git repack" learned an option to use an alternative path-hash function to improve delta-base selection to produce a packfile with deeper history than window size. * ds/name-hash-tweaks: pack-objects: prevent name hash version change test-tool: add helper for name-hash values p5313: add size comparison test pack-objects: add GIT_TEST_NAME_HASH_VERSION repack: add --name-hash-version option pack-objects: add --name-hash-version option pack-objects: create new name-hash function version
2025-02-11doc: clarify the intent of the renormalize option in the merge machineryElijah Newren3-4/+5
The -X renormalize (or merge.renormalize config) option is intended to reduce conflicts due to normalization of newer versions of history. It does so by renormalizing files that it is about to do a three-way content merge on. Some folks thought it would renormalize all files throughout the tree, and the previous wording wasn't clear enough to dispell that misconception. Update the docs to make it clear that the merge machinery will only apply renormalization to files which need a three-way content merge. (Technically, the merge machinery also does renormalization on modify/delete conflicts, in order to see if the modification was merely a normalization; if so, it can accept the delete and not report a conflict. But it's not clear that this piece needs to be explained to users, and trying to distinguish it might feel like splitting hairs and overcomplicating the explanation, so we leave it out.) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-11doc: centrally document various ways tospell `true` and `false`Junio C Hamano3-7/+10
We do not seem to centrally document exhaustively ways to spell Boolean values. The description in the Environment Variables of git(1) section assumes that the reader is already familiar with how "Boolean valued configuration variables" are specified, without referring to anything, so there is no way for the readers to find out more. The description of `bool` in the section on "--type <type>" in "git config --help" might be the place to do so, but it is not telling us all that much. The description of Boolean valued placeholders in the pretty formats section of "git log --help" enumerates the possible values with "etc." implying there may be other synonyms; shrink the list of samples and instead refer to the canonical and authoritative source of truth, which now is git-config(1). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-10builtin/update-server-info: remove the_repository global variableUsman Akinyemi1-4/+4
Remove the_repository global variable in favor of the repository argument that gets passed in "builtin/update-server-info.c". When `-h` is passed to the command outside a Git repository, the `run_builtin()` will call the `cmd_update_server_info()` function with `repo` set to NULL and then early in the function, "parse_options()" call will give the options help and exit, without having to consult much of the configuration file. So it is safe to omit reading the config when `repo` argument the caller gave us is NULL. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-10thunderbird-patch-inline: avoid bashismbrian m. carlson1-1/+1
The use of "echo -e" is not portable and not specified by POSIX. dash does not support any options except "-n", and so this script will not work on operating systems which use that as /bin/sh. Fortunately, the solution is easy: switch to printf(1), which is specified by POSIX and allows the escape sequences we want to use. This will allow the script to work with any POSIX shell. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-10The ninth batchJunio C Hamano1-0/+16
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-10Merge branch 'jk/ci-coverity-update'Junio C Hamano1-1/+1
CI update to make Coverity job work again. * jk/ci-coverity-update: ci: set CI_JOB_IMAGE for coverity job
2025-02-10Merge branch 'sk/unit-tests-0130'Junio C Hamano9-353/+348
Convert a handful of unit tests to work with the clar framework. * sk/unit-tests-0130: t/unit-tests: convert strcmp-offset test to use clar test framework t/unit-tests: convert strbuf test to use clar test framework t/unit-tests: adapt example decorate test to use clar test framework t/unit-tests: convert hashmap test to use clar test framework
2025-02-10Merge branch 'ps/hash-cleanup'Junio C Hamano23-202/+226
Further code clean-up on the use of hash functions. Now the context object knows what hash function it is working with. * ps/hash-cleanup: global: adapt callers to use generic hash context helpers hash: provide generic wrappers to update hash contexts hash: stop typedeffing the hash context hash: convert hashing context to a structure
2025-02-10Merge branch 'jt/gitlab-ci-base-fix'Junio C Hamano1-2/+2
Two CI tasks, whitespace check and style check, work on the difference from the base version and the version being checked, but the base was computed incorrectly in GitLab CI in some cases, which has been corrected. * jt/gitlab-ci-base-fix: ci: fix base commit fallback for check-whitespace and check-style
2025-02-10Merge branch 'pw/apply-ulong-overflow-check'Junio C Hamano2-0/+16
"git apply" internally uses unsigned long for line numbers and uses strtoul() to parse numbers on the hunk headers. It however forgot to check parse errors. * pw/apply-ulong-overflow-check: apply: detect overflow when parsing hunk header
2025-02-10Merge branch 'ps/setup-reinit-fixes'Junio C Hamano2-11/+27
"git init" to reinitialize a repository that already exists cannot change the hash function and ref backends; such a request is silently ignored now. * ps/setup-reinit-fixes: setup: fix reinit of repos with incompatible GIT_DEFAULT_HASH setup: fix reinit of repos with incompatible GIT_DEFAULT_REF_FORMAT t0001: remove duplicate test
2025-02-10t7603: replace test -f by test_path_is_fileLucas Oshiro1-12/+12
`test_path_is_file` provides a better output when asserting whether a file exists. Replace the occurrences of `test -f` in t7603 with it, facilitating the trace of possible test failures. Signed-off-by: Lucas Oshiro <lucasseikioshiro@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06The eighth batchJunio C Hamano1-0/+10
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06Merge branch 'ps/leakfixes-0129'Junio C Hamano2-2/+6
A few more leakfixes. * ps/leakfixes-0129: scalar: free result of `remote_default_branch()` unix-socket: fix memory leak when chdir(3p) fails
2025-02-06Merge branch 'ps/zlib-ng'Junio C Hamano20-140/+107
The code paths to interact with zlib has been cleaned up in preparation for building with zlib-ng. * ps/zlib-ng: ci: make "linux-musl" job use zlib-ng ci: switch linux-musl to use Meson compat/zlib: allow use of zlib-ng as backend git-zlib: cast away potential constness of `next_in` pointer compat/zlib: provide stubs for `deflateSetHeader()` compat/zlib: provide `deflateBound()` shim centrally git-compat-util: move include of "compat/zlib.h" into "git-zlib.h" compat: introduce new "zlib.h" header git-compat-util: drop `z_const` define compat: drop `uncompress2()` compatibility shim
2025-02-06Merge branch 'js/bundle-unbundle-fd-reuse-fix'Junio C Hamano3-1/+6
The code path used when "git fetch" fetches from a bundle file closed the same file descriptor twice, which sometimes broke things unexpectedly when the file descriptor was reused, which has been corrected. * js/bundle-unbundle-fd-reuse-fix: bundle: avoid closing file descriptor twice
2025-02-06Merge branch 'ps/ci-misc-updates'Junio C Hamano7-90/+95
CI updates (containerization, dropping stale ones, etc.). * ps/ci-misc-updates: ci: remove stale code for Azure Pipelines ci: use latest Ubuntu release ci: stop special-casing for Ubuntu 16.04 gitlab-ci: add linux32 job testing against i386 gitlab-ci: remove the "linux-old" job github: simplify computation of the job's distro github: convert all Linux jobs to be containerized github: adapt containerized jobs to be rootless t7422: fix flaky test caused by buffered stdout t0060: fix EBUSY in MinGW when setting up runtime prefix
2025-02-06difftool: eliminate use of USE_THE_REPOSITORY_VARIABLEDavid Aguilar1-2/+0
Remove the USE_THE_REPOSITORY_VARIABLE #define now that all state is passed to each function from callers. Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06difftool: eliminate use of the_repositoryDavid Aguilar1-25/+29
Make callers pass a repository struct into each function instead of relying on the global the_repository variable. Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06difftool: eliminate use of global variablesDavid Aguilar1-18/+33
Move difftool's global variables into a difftools_option struct in preparation for removal of USE_THE_REPOSITORY_VARIABLE. Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06doc: documentation for http.uploadarchive config optionPiotr Szlazak1-0/+4
In Git v2.44.0 support for 'git archive' over HTTP protocol was added, but it was nowhere documented how it should be enabled in git-http-backend. Add missing documentation. Signed-off-by: Piotr Szlazak <piotr.szlazak@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06builtin/clone: teach git-clone(1) the --revision= optionToon Claes4-11/+178
The git-clone(1) command has the option `--branch` that allows the user to select the branch they want HEAD to point to. In a non-bare repository this also checks out that branch. Option `--branch` also accepts a tag. When a tag name is provided, the commit this tag points to is checked out and HEAD is detached. Thus `--branch` can be used to clone a repository and check out a ref kept under `refs/heads` or `refs/tags`. But some other refs might be in use as well. For example Git forges might use refs like `refs/pull/<id>` and `refs/merge-requests/<id>` to track pull/merge requests. These refs cannot be selected upon git-clone(1). Add option `--revision` to git-clone(1). This option accepts a fully qualified reference, or a hexadecimal commit ID. This enables the user to clone and check out any revision they want. `--revision` can be used in conjunction with `--depth` to do a minimal clone that only contains the blob and tree for a single revision. This can be useful for automated tests running in CI systems. Using option `--branch` and `--single-branch` together is a similar scenario, but serves a different purpose. Using these two options, a singlet remote tracking branch is created and the fetch refspec is set up so git-fetch(1) will receive updates on that branch from the remote. This allows the user work on that single branch. Option `--revision` on contrary detaches HEAD, creates no tracking branches, and writes no fetch refspec. Signed-off-by: Toon Claes <toon@iotcl.com> Acked-by: Patrick Steinhardt <ps@pks.im> [jc: removed unnecessary TEST_PASSES_SANITIZE_LEAK from the test] Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06parse-options: introduce die_for_incompatible_opt2()Toon Claes2-3/+13
The functions die_for_incompatible_opt3() and die_for_incompatible_opt4() already exist to die whenever a user specifies three or four options respectively that are not compatible. Introduce die_for_incompatible_opt2() which dies when two options that are incompatible are set. Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06clone: introduce struct clone_opts in builtin/clone.cToon Claes3-16/+35
There is a lot of state stored in global variables in builtin/clone.c. In the long run we'd like to remove many of those. Introduce `struct clone_opts` in this file. This struct will be used to contain all details needed to perform the clone. The struct object can be thrown around to all the functions that need these details. The first field we're adding is `wants_head`. In some scenarios (specifically when both `--single-branch` and `--branch` are given) we are not interested in `HEAD` on the remote. The field `wants_head` in `struct clone_opts` will hold this information. We could have put `option_branch` and `option_single_branch` into that struct instead, but in a following commit we'll be using `wants_head` as well. Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06clone: add tags refspec earlier to fetch refspecToon Claes1-16/+11
In clone.c we call refspec_ref_prefixes() to copy the fetch refspecs from the `remote->fetch` refspec into `ref_prefixes` of `transport_ls_refs_options`. Afterwards we add the tags prefix `refs/tags/` prefix as well. At a later point, in wanted_peer_refs() we process refs using both `remote->fetch` and `TAG_REFSPEC`. Simplify the code by appending `TAG_REFSPEC` to `remote->fetch` before calling refspec_ref_prefixes(). To be able to do this, we set `option_tags` to 0 when --mirror is given. This is because --mirror mirrors (hence the name) all the refs, including tags and they do not need to be treated separately. Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06clone: refactor wanted_peer_refs()Toon Claes1-24/+15
The function wanted_peer_refs() is used to map the refs returned by the server to refs we will save in our clone. Over time this function grown to be very complex. Refactor it. Previously, there was a separate code path for when `option_single_branch` was set. It resulted in duplicated code and deeper nested conditions. After this refactor the code path for when `option_single_branch` is truthy modifies `refs` and then falls through to the common code path. This approach relies on the `refspec` being set correctly and thus only mapping refs that are relevant. Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06clone: make it possible to specify --tagsToon Claes2-14/+17
Option --no-tags was added in 0dab2468ee (clone: add a --no-tags option to clone without tags, 2017-04-26). At the time there was no need to support --tags as well, although there was some conversation about it[1]. To simplify the code and to prepare for future commits, invert the flag internally. Functionally there is no change, because the flag is default-enabled passing `--tags` has no effect, so there's no need to add tests for this. [1]: https://lore.kernel.org/git/CAGZ79kbHuMpiavJ90kQLEL_AR0BEyArcZoEWAjPPhOFacN16YQ@mail.gmail.com/ Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06clone: cut down on global variables in clone.cToon Claes1-94/+101
In clone.c the `struct option` which is used to parse the input options for git-clone(1) is a global variable. Due to this, many variables that are used to parse the value into, are also global. Make `builtin_clone_options` a local variable in cmd_clone() and carry along all variables that are only used in that function. Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-05worktree: detect from secondary worktree if main worktree is bareOlga Pilipenco2-9/+46
When extensions.worktreeConfig is true and the main worktree is bare -- that is, its config.worktree file contains core.bare=true -- commands run from secondary worktrees incorrectly see the main worktree as not bare. As such, those commands incorrectly think that the repository's default branch (typically "main" or "master") is checked out in the bare repository even though it's not. This makes it impossible, for instance, to checkout or delete the default branch from a secondary worktree, among other shortcomings. This problem occurs because, when extensions.worktreeConfig is true, commands run in secondary worktrees only consult $commondir/config and $commondir/worktrees/<id>/config.worktree, thus they never see the main worktree's core.bare=true setting in $commondir/config.worktree. Fix this problem by consulting the main worktree's config.worktree file when checking whether it is bare. (This extra work is performed only when running from a secondary worktree.) Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Olga Pilipenco <olga.pilipenco@shopify.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-05docs: indicate http.sslCertType and sslKeyTypeAndrew Carter1-0/+15
0a01d41ee4 (http: add support for different sslcert and sslkey types., 2023-03-20) added useful SSL config options, but did not document them. Signed-off-by: Andrew Carter <andrew@emailcarter.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-05rev-list: extend print-info to print missing object typeJustin Tobler3-4/+13
Additional information about missing objects found in git-rev-list(1) can be printed by specifying the `print-info` missing action for the `--missing` option. Extend this action to also print missing object type information inferred from its containing object. This token follows the form `type=<type>` and specifies the expected object type of the missing object. Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-05rev-list: add print-info action to print missing object pathJustin Tobler3-17/+152
Missing objects identified through git-rev-list(1) can be printed by setting the `--missing=print` option. Additional information about the missing object, such as its path and type, may be present in its containing object. Add the `print-info` missing action for the `--missing` option that, when set, prints additional insight about the missing object inferred from its containing object. Each line of output for a missing object is in the form: `?<oid> [<token>=<value>]...`. The `<token>=<value>` pairs containing additional information are separated from each other by a SP. The value is encoded in a token specific fashion, but SP or LF contained in value are always expected to be represented in such a way that the resulting encoded value does not have either of these two problematic bytes. This format is kept generic so it can be extended in the future to support additional information. For now, only a missing object path info is implemented. It follows the form `path=<path>` and specifies the full path to the object from the top-level tree. A path containing SP or special characters is enclosed in double-quotes in the C style as needed. In a subsequent commit, missing object type info will also be added. Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-04builtin/repack: fix `--keep-unreachable` when there are no packsPatrick Steinhardt2-1/+20
The "--keep-unreachable" flag is supposed to append any unreachable objects to the newly written pack. This flag is explicitly documented as appending both packed and loose unreachable objects to the new packfile. And while this works alright when repacking with preexisting packfiles, it stops working when the repository does not have any packfiles at all. The root cause are the conditions used to decide whether or not we want to append "--pack-loose-unreachable" to git-pack-objects(1). There are a couple of conditions here: - `has_existing_non_kept_packs()` checks whether there are existing packfiles. This condition makes sense to guard "--keep-pack=", "--unpack-unreachable" and "--keep-unreachable", because all of these flags only make sense in combination with existing packfiles. But it does not make sense to disable `--pack-loose-unreachable` when there aren't any preexisting packfiles, as loose objects can be packed into the new packfile regardless of that. - `delete_redundant` checks whether we want to delete any objects or packs that are about to become redundant. The documentation of `--keep-unreachable` explicitly says that `git repack -ad` needs to be executed for the flag to have an effect. It is not immediately obvious why such redundant objects need to be deleted in order for "--pack-unreachable-objects" to be effective. But as things are working as documented this is nothing we'll change for now. - `pack_everything & PACK_CRUFT` checks that we're not creating a cruft pack. This condition makes sense in the context of "--pack-loose-unreachable", as unreachable objects would end up in the cruft pack anyway. So while the second and third condition are sensible, it does not make any sense to condition `--pack-loose-unreachable` on the existence of packfiles. Fix the bug by splitting out the "--pack-loose-unreachable" and only making it depend on the second and third condition. Like this, loose unreachable objects will be packed regardless of any preexisting packfiles. Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-04remote: relocate valid_remote_nameMeet Soni4-11/+12
Move the `valid_remote_name()` function from the refspec subsystem to the remote subsystem to better align with the separation of concerns. Signed-off-by: Meet Soni <meetsoni3017@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-04refspec: relocate apply_refspecs and related funtionsMeet Soni4-42/+44
Move the functions `apply_refspecs()` and `apply_negative_refspecs()` from `remote.c` to `refspec.c`. These functions focus on applying refspecs, so centralizing them in `refspec.c` improves code organization by keeping refspec-related logic in one place. Signed-off-by: Meet Soni <meetsoni3017@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-04refspec: relocate matching related functionsMeet Soni3-122/+139
Move the functions `refspec_find_match()`, `refspec_find_all_matches()` and `refspec_find_negative_match()` from `remote.c` to `refspec.c`. These functions focus on matching refspecs, so centralizing them in `refspec.c` improves code organization by keeping refspec-related logic in one place. Signed-off-by: Meet Soni <meetsoni3017@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-04remote: rename query_refspecs functionsMeet Soni3-12/+12
Rename functions related to handling refspecs in preparation for their move from `remote.c` to `refspec.c`. Update their names to better reflect their intent: - `query_refspecs()` -> `refspec_find_match()` for clarity, as it finds a single matching refspec. - `query_refspecs_multiple()` -> `refspec_find_all_matches()` to better reflect that it collects all matching refspecs instead of returning just the first match. - `query_matches_negative_refspec()` -> `refspec_find_negative_match()` for consistency with the updated naming convention, even though this static function didn't strictly require renaming. Signed-off-by: Meet Soni <meetsoni3017@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-04refspec: relocate refname_matches_negative_refspec_itemMeet Soni3-48/+57
Move the functions `refname_matches_negative_refspec_item()`, `refspec_match()`, and `match_name_with_pattern()` from `remote.c` to `refspec.c`. These functions focus on refspec matching, so placing them in `refspec.c` aligns with the separation of concerns. Keep refspec-related logic in `refspec.c` and remote-specific logic in `remote.c` for better code organization. Signed-off-by: Meet Soni <meetsoni3017@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-04remote: rename function omit_name_by_refspecMeet Soni3-10/+6
Rename the function `omit_name_by_refspec()` to `refname_matches_negative_refspec_item()` to provide clearer intent. The previous function name was vague and did not accurately describe its purpose. By using `refname_matches_negative_refspec_item`, make the function's purpose more intuitive, clarifying that it checks if a reference name matches any negative refspec. Rename function parameters for consistency with existing naming conventions. Use `refname` instead of `name` to align with terminology in `refs.h`. Remove the redundant doc comment since the function name is now self-explanatory. Signed-off-by: Meet Soni <meetsoni3017@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03backfill: assume --sparse when sparse-checkout is enabledDerrick Stolee3-2/+21
The previous change introduced the '--[no-]sparse' option for the 'git backfill' command, but did not assume it as enabled by default. However, this is likely the behavior that users will most often want to happen. Without this default, users with a small sparse-checkout may be confused when 'git backfill' downloads every version of every object in the full history. However, this is left as a separate change so this decision can be reviewed independently of the value of the '--[no-]sparse' option. Add a test of adding the '--sparse' option to a repo without sparse-checkout to make it clear that supplying it without a sparse-checkout is an error. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03backfill: add --sparse optionDerrick Stolee10-15/+208
One way to significantly reduce the cost of a Git clone and later fetches is to use a blobless partial clone and combine that with a sparse-checkout that reduces the paths that need to be populated in the working directory. Not only does this reduce the cost of clones and fetches, the sparse-checkout reduces the number of objects needed to download from a promisor remote. However, history investigations can be expensive as computing blob diffs will trigger promisor remote requests for one object at a time. This can be avoided by downloading the blobs needed for the given sparse-checkout using 'git backfill' and its new '--sparse' mode, at a time that the user is willing to pay that extra cost. Note that this is distinctly different from the '--filter=sparse:<oid>' option, as this assumes that the partial clone has all reachable trees and we are using client-side logic to avoid downloading blobs outside of the sparse-checkout cone. This avoids the server-side cost of walking trees while also achieving a similar goal. It also downloads in batches based on similar path names, presenting a resumable download if things are interrupted. This augments the path-walk API to have a possibly-NULL 'pl' member that may point to a 'struct pattern_list'. This could be more general than the sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently the only consumer. Be sure to test this in both cone mode and not cone mode. Cone mode has the benefit that the path-walk can skip certain paths once they would expand beyond the sparse-checkout. Non-cone mode can describe the included files using both positive and negative patterns, which changes the possible return values of path_matches_pattern_list(). Test both kinds of matches for increased coverage. To test this, we can create a blobless sparse clone, expand the sparse-checkout slightly, and then run 'git backfill --sparse' to see how much data is downloaded. The general steps are 1. git clone --filter=blob:none --sparse <url> 2. git sparse-checkout set <dir1> ... <dirN> 3. git backfill --sparse For the Git repository with the 'builtin' directory in the sparse-checkout, we get these results for various batch sizes: | Batch Size | Pack Count | Pack Size | Time | |-----------------|------------|-----------|-------| | (Initial clone) | 3 | 110 MB | | | 10K | 12 | 192 MB | 17.2s | | 15K | 9 | 192 MB | 15.5s | | 20K | 8 | 192 MB | 15.5s | | 25K | 7 | 192 MB | 14.7s | This case matters less because a full clone of the Git repository from GitHub is currently at 277 MB. Using a copy of the Linux repository with the 'kernel/' directory in the sparse-checkout, we get these results: | Batch Size | Pack Count | Pack Size | Time | |-----------------|------------|-----------|------| | (Initial clone) | 2 | 1,876 MB | | | 10K | 11 | 2,187 MB | 46s | | 25K | 7 | 2,188 MB | 43s | | 50K | 5 | 2,194 MB | 44s | | 100K | 4 | 2,194 MB | 48s | This case is more meaningful because a full clone of the Linux repository is currently over 6 GB, so this is a valuable way to download a fraction of the repository and no longer need network access for all reachable objects within the sparse-checkout. Choosing a batch size will depend on a lot of factors, including the user's network speed or reliability, the repository's file structure, and how many versions there are of the file within the sparse-checkout scope. There will not be a one-size-fits-all solution. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03backfill: add --min-batch-size=<n> optionDerrick Stolee3-2/+32
Users may want to specify a minimum batch size for their needs. This is only a minimum: the path-walk API provides a list of OIDs that correspond to the same path, and thus it is optimal to allow delta compression across those objects in a single server request. We could consider limiting the request to have a maximum batch size in the future. For now, we let the path-walk API batches determine the boundaries. To get a feeling for the value of specifying the --min-batch-size parameter, I tested a number of open source repositories available on GitHub. The procedure was generally: 1. git clone --filter=blob:none <url> 2. git backfill Checking the number of packfiles and the size of the .git/objects/pack directory helps to identify the effects of different batch sizes. For the Git repository, we get these results: | Batch Size | Pack Count | Pack Size | Time | |-----------------|------------|-----------|-------| | (Initial clone) | 2 | 119 MB | | | 25K | 8 | 290 MB | 24s | | 50K | 5 | 290 MB | 24s | | 100K | 4 | 290 MB | 29s | Other than the packfile counts decreasing as we need fewer batches, the size and time required is not changing much for this small example. For the nodejs/node repository, we see these results: | Batch Size | Pack Count | Pack Size | Time | |-----------------|------------|-----------|--------| | (Initial clone) | 2 | 330 MB | | | 25K | 19 | 1,222 MB | 1m 22s | | 50K | 11 | 1,221 MB | 1m 24s | | 100K | 7 | 1,223 MB | 1m 40s | | 250K | 4 | 1,224 MB | 2m 23s | | 500K | 3 | 1,216 MB | 4m 38s | Here, we don't have much difference in the size of the repo, though the 500K batch size results in a few MB gained. That comes at a cost of a much longer time. This extra time is due to server-side delta compression happening as the on-disk deltas don't appear to be reusable all the time. But for smaller batch sizes, the server is able to find reasonable deltas partly because we are asking for objects that appear in the same region of the directory tree and include all versions of a file at a specific path. To contrast this example, I tested the microsoft/fluentui repo, which has been known to have inefficient packing due to name hash collisions. These results are found before GitHub had the opportunity to repack the server with more advanced name hash versions: | Batch Size | Pack Count | Pack Size | Time | |-----------------|------------|-----------|--------| | (Initial clone) | 2 | 105 MB | | | 5K | 53 | 348 MB | 2m 26s | | 10K | 28 | 365 MB | 2m 22s | | 15K | 19 | 407 MB | 2m 21s | | 20K | 15 | 393 MB | 2m 28s | | 25K | 13 | 417 MB | 2m 06s | | 50K | 8 | 509 MB | 1m 34s | | 100K | 5 | 535 MB | 1m 56s | | 250K | 4 | 698 MB | 1m 33s | | 500K | 3 | 696 MB | 1m 42s | Here, a larger variety of batch sizes were chosen because of the great variation in results. By asking the server to download small batches corresponding to fewer paths at a time, the server is able to provide better compression for these batches than it would for a regular clone. A typical full clone for this repository would require 738 MB. This example justifies the choice to batch requests by path name, leading to improved communication with a server that is not optimally packed. Finally, the same experiment for the Linux repository had these results: | Batch Size | Pack Count | Pack Size | Time | |-----------------|------------|-----------|---------| | (Initial clone) | 2 | 2,153 MB | | | 25K | 63 | 6,380 MB | 14m 08s | | 50K | 58 | 6,126 MB | 15m 11s | | 100K | 30 | 6,135 MB | 18m 11s | | 250K | 14 | 6,146 MB | 18m 22s | | 500K | 8 | 6,143 MB | 33m 29s | Even in this example, where the default name hash algorithm leads to decent compression of the Linux kernel repository, there is value for selecting a smaller batch size, to a limit. The 25K batch size has the fastest time, but uses 250 MB more than the 50K batch size. The 500K batch size took much more time due to server compression time and thus we should avoid large batch sizes like this. Based on these experiments, a batch size of 50,000 was chosen as the default value. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03backfill: basic functionality and testsDerrick Stolee5-4/+227
The default behavior of 'git backfill' is to fetch all missing blobs that are reachable from HEAD. Document and test this behavior. The implementation is a very simple use of the path-walk API, initializing the revision walk at HEAD to start the path-walk from all commits reachable from HEAD. Ignore the object arrays that correspond to tree entries, assuming that they are all present already. The path-walk API provides lists of objects in batches according to a common path, but that list could be very small. We want to balance the number of requests to the server with the ability to have the process interrupted with minimal repeated work to catch up in the next run. Based on some experiments (detailed in the next change) a minimum batch size of 50,000 is selected for the default. This batch size is a _minimum_. As the path-walk API emits lists of blob IDs, they are collected into a list of objects for a request to the server. When that list is at least the minimum batch size, then the request is sent to the server for the new objects. However, the list of blob IDs from the path-walk API could be much longer than the batch size. At this moment, it is unclear if there is a benefit to split the list when there are too many objects at the same path. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03backfill: add builtin boilerplateDerrick Stolee9-0/+60
In anticipation of implementing 'git backfill', populate the necessary files with the boilerplate of a new builtin. Mark the builtin as experimental at this time, allowing breaking changes in the near future, if necessary. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03Merge branch 'master' into ds/backfillJunio C Hamano649-7943/+18783
* master: (446 commits) The seventh batch The sixth batch The fifth batch The fourth batch refs/reftable: fix uninitialized memory access of `max_index` remote: announce removal of "branches/" and "remotes/" The third batch hash.h: drop unsafe_ function variants csum-file: introduce hashfile_checkpoint_init() t/helper/test-hash.c: use unsafe_hash_algo() csum-file.c: use unsafe_hash_algo() hash.h: introduce `unsafe_hash_algo()` csum-file.c: extract algop from hashfile_checksum_valid() csum-file: store the hash algorithm as a struct field t/helper/test-tool: implement sha1-unsafe helper trace2: prevent segfault on config collection with valueless true refs: fix creation of reflog entries for symrefs ci: wire up Visual Studio build with Meson ci: raise error when Meson generates warnings meson: fix compilation with Visual Studio ...
2025-02-03send-pack: gracefully close the connection for atomic pushJiang Xin3-11/+4
Patrick reported an issue that the exit code of git-receive-pack(1) is ignored during atomic push with "--porcelain" flag, and added new test cases in t5543. This issue originated from commit 7dcbeaa0df (send-pack: fix inconsistent porcelain output, 2020-04-17). At that time, I chose to ignore the exit code of "finish_connect()" without investigating the root cause of the abnormal termination of git-receive-pack. That was an incorrect solution. The root cause is that an atomic push operation terminates early without sending a flush packet to git-receive-pack. As a result, git-receive-pack continues waiting for commands without exiting. By sending a flush packet at the appropriate location in "send_pack()", we ensure that the git-receive-pack process closes properly, avoiding an erroneous exit code for git-push. At the same time, revert the changes to the "transport.c" file made in commit 7dcbeaa0df. Reported-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03t5543: atomic push reports exit code failurePatrick Steinhardt1-0/+30
Add new test cases in t5543 to avoid ignoring the exit code of git-receive-pack(1) during atomic push with "--porcelain" flag. We'd typically notice this case because the refs would have their error message set. But there is an edge case when pushing refs succeeds, but git-receive-pack(1) exits with a non-zero exit code at a later point in time due to another error. An atomic git-push(1) would ignore that error code, and consequently it would return successfully and not print any error message at all. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03send-pack: new return code "ERROR_SEND_PACK_BAD_REF_STATUS"Jiang Xin3-7/+22
The "push_refs" function in the transport_vtable is the handler for git-push operation. All the "push_refs" functions for different transports (protocols) should have the same behavior, but the behavior of "git_transport_push()" function for builtin_smart_vtable in "transport.c" (which calls "send_pack()" in "send-pack.c") differs from the handler of the HTTP protocol. The "push_refs()" function for the HTTP protocol which calls the "push_refs_with_push()" function in "transport-helper.c" will return 0 even when a bad REF_STATUS (such as REF_STATUS_REJECT_NONFASTFORWARD) was found. But "send_pack()" for Git smart protocol will return -1 for a bad REF_STATUS. We cannot ignore bad REF_STATUS directly in the "send_pack()" function, because the function is also used in "builtin/send-pack.c". So we add a new non-zero error code "SEND_PACK_ERROR_REF_STATUS" for "send_pack()". Ignore the specific error code in the "git_transport_push()" function to have the same behavior as "push_refs()" for HTTP protocol. Note that even though we ignore the error here, we'll ultimately still end up detecting that a subset of refs was not pushed in `transport_push()` because we eventually call `push_had_errors()` on the remote refs. Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03t5548: add porcelain push test cases for dry-run modeJiang Xin1-0/+153
New dry-run test cases: - git push --porcelain --dry-run - git push --porcelain --dry-run --force - git push --porcelain --dry-run --atomic - git push --porcelain --dry-run --atomic --force Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03t5548: add new porcelain test casesPatrick Steinhardt1-0/+68
Add two more test cases exercising git-push(1) with `--procelain`, one exercising a non-atomic and one exercising an atomic push. Based-on-patch-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03t5548: refactor test cases by resetting upstreamJiang Xin1-82/+67
Refactor the test cases with the following changes: - Calling setup_upstream() to reset upstream after running each test case. - Change the initial branch tips of the workspace to reduce the branch setup operations in the workspace. - Reduced the two steps of setting up and cleaning up the pre-receive hook by moving the operations into the corresponding test case, Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03t5548: refactor to reuse setup_upstream() functionJiang Xin1-30/+53
Refactor the function setup_upstream_and_workbench(), extracting create_upstream_template() and setup_upstream() from it. The former is used to create the upstream repository template, while the latter is used to rebuild the upstream repository and will be reused in subsequent commits. To ensure that setup_upstream() works properly in both local and HTTP protocols, the HTTP settings have been moved to the setup_upstream() and setup_upstream_and_workbench() functions. Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03t5504: modernize test by moving heredocs into test bodiesPatrick Steinhardt1-19/+16
We have several heredocs in t5504 located outside of any particular test bodies. Move these into the test bodies to match our modern coding style. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03t6423: fix suppression of Git’s exit code in testsAyush Chandekar1-3/+6
Some test in t6423 supress Git's exit code, which can cause test failures go unnoticed. Specifically using git <subcommand> | <other-command> masks potential failures of the Git command. This commit ensures that Git's exit status is correctly propogated by: - Avoiding pipes that suppress exit codes. Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03help: add "show" as a valid configuration valueDavid Aguilar3-2/+4
Add a literal value for showing the suggested autocorrection for consistency with the rest of the help.autocorrect options. Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03help: show the suggested command when help.autocorrect is falseDavid Aguilar3-11/+16
Make the handling of false boolean values for help.autocorrect consistent with the handling of value 0 by showing the suggested commands but not running them. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03Merge branch 'sc/help-autocorrect-one' into da/help-autocorrect-one-fixJunio C Hamano2-16/+35
* sc/help-autocorrect-one: help: interpret boolean string values for help.autocorrect
2025-02-03t5401: prefer test_path_is_* helper functionambar chakravartty1-8/+8
"test -f" does not provide a nice error message when we hit test failures, so use test_path_is_file instead. Signed-off-by: ambar chakravartty <amch9605@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03The seventh batchJunio C Hamano1-0/+15
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03Merge branch 'kn/reflog-migration-fix-fix'Junio C Hamano4-10/+30
Fix bugs in an earlier attempt to fix "git refs migration". * kn/reflog-migration-fix-fix: refs/reftable: fix uninitialized memory access of `max_index` reftable: write correct max_update_index to header
2025-02-03Merge branch 'kn/pack-write-with-reduced-globals'Junio C Hamano9-74/+106
Code clean-up. * kn/pack-write-with-reduced-globals: pack-write: pass hash_algo to internal functions pack-write: pass hash_algo to `write_rev_file()` pack-write: pass hash_algo to `write_idx_file()` pack-write: pass repository to `index_pack_lockfile()` pack-write: pass hash_algo to `fixup_pack_header_footer()`
2025-02-03Merge branch 'ps/build-meson-fixes'Junio C Hamano7-38/+218
More build fixes and enhancements on meson based build procedure. * ps/build-meson-fixes: ci: wire up Visual Studio build with Meson ci: raise error when Meson generates warnings meson: fix compilation with Visual Studio meson: make the CSPRNG backend configurable meson: wire up fuzzers meson: wire up generation of distribution archive meson: wire up development environments meson: fix dependencies for generated headers meson: populate project version via GIT-VERSION-GEN GIT-VERSION-GEN: allow running without input and output files GIT-VERSION-GEN: simplify computing the dirty marker
2025-02-03Merge branch 'ps/3.0-remote-deprecation'Junio C Hamano21-59/+132
Following the procedure we established to introduce breaking changes for Git 3.0, allow an early opt-in for removing support of $GIT_DIR/branches/ and $GIT_DIR/remotes/ directories to configure remotes. * ps/3.0-remote-deprecation: remote: announce removal of "branches/" and "remotes/" builtin/pack-redundant: remove subcommand with breaking changes ci: repurpose "linux-gcc" job for deprecations ci: merge linux-gcc-default into linux-gcc Makefile: wire up build option for deprecated features
2025-02-03Merge branch 'jk/combine-diff-cleanup'Junio C Hamano4-184/+102
Code clean-up for code paths around combined diff. * jk/combine-diff-cleanup: tree-diff: make list tail-passing more explicit tree-diff: simplify emit_path() list management tree-diff: use the name "tail" to refer to list tail tree-diff: drop list-tail argument to diff_tree_paths() combine-diff: drop public declaration of combine_diff_path_size() tree-diff: inline path_appendnew() tree-diff: pass whole path string to path_appendnew() tree-diff: drop path_appendnew() alloc optimization run_diff_files(): de-mystify the size of combine_diff_path struct diff: add a comment about combine_diff_path.parent.path combine-diff: use pointer for parent paths tree-diff: clear parent array in path_appendnew() combine-diff: add combine_diff_path_new() run_diff_files(): delay allocation of combine_diff_path
2025-02-03Merge branch 'tb/unsafe-hash-cleanup'Junio C Hamano12-70/+107
The API around choosing to use unsafe variant of SHA-1 implementation has been updated in an attempt to make it harder to abuse. * tb/unsafe-hash-cleanup: hash.h: drop unsafe_ function variants csum-file: introduce hashfile_checkpoint_init() t/helper/test-hash.c: use unsafe_hash_algo() csum-file.c: use unsafe_hash_algo() hash.h: introduce `unsafe_hash_algo()` csum-file.c: extract algop from hashfile_checksum_valid() csum-file: store the hash algorithm as a struct field t/helper/test-tool: implement sha1-unsafe helper
2025-02-03ci: set CI_JOB_IMAGE for coverity jobJeff King1-1/+1
The main GitHub Actions workflow switched away from the "$distro" variable in b133d3071a (github: simplify computation of the job's distro, 2025-01-10). Since the Coverity job also depends on our ci/install-dependencies.sh script, it needs to likewise set CI_JOB_IMAGE to find the correct dependencies (without this patch, we don't install curl and the build fails). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03Merge branch 'ps/ci-misc-updates' into jk/ci-coverity-updateJunio C Hamano7-95/+100
* ps/ci-misc-updates: ci: remove stale code for Azure Pipelines ci: use latest Ubuntu release ci: stop special-casing for Ubuntu 16.04 gitlab-ci: add linux32 job testing against i386 gitlab-ci: remove the "linux-old" job github: simplify computation of the job's distro github: convert all Linux jobs to be containerized github: adapt containerized jobs to be rootless t7422: fix flaky test caused by buffered stdout t0060: fix EBUSY in MinGW when setting up runtime prefix
2025-01-31t/unit-tests: convert strcmp-offset test to use clar test frameworkSeyi Kuforiji4-37/+47
Adapt strcmp-offset test script to clar framework by using clar assertions where necessary. Introduce `test_strcmp_offset__empty()` to verify `check_strcmp_offset()` behavior when both input strings are empty. This ensures the function correctly handles edge cases and returns expected values. Mentored-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31t/unit-tests: convert strbuf test to use clar test frameworkSeyi Kuforiji4-124/+121
Adapt strbuf test script to clar framework by using clar assertions where necessary. Mentored-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31t/unit-tests: adapt example decorate test to use clar test frameworkSeyi Kuforiji4-76/+66
Introduce `test_example_decorate__initialize()` to explicitly set up object IDs and retrieve corresponding objects before tests run. This ensures a consistent and predictable test state without relying on data from previous tests. Add `test_example_decorate__cleanup()` to clear decorations after each test, preventing interference between tests and ensuring each runs in isolation. Adapt example decorate test script to clar framework by using clar assertions where necessary. Previously, tests relied on data written by earlier tests, leading to unintended dependencies between them. This explicitly initializes the necessary state within `test_example_decorate__readd`, ensuring it does not depend on prior test executions. Mentored-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31t/unit-tests: convert hashmap test to use clar test frameworkSeyi Kuforiji3-116/+114
Adapts hashmap test script to clar framework by using clar assertions where necessary. Mentored-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31ci: fix base commit fallback for check-whitespace and check-styleJustin Tobler1-2/+2
The check-whitespace and check-style CI scripts require a base commit. In GitLab CI, the base commit can be provided by several different predefined CI variables depending on the type of pipeline being performed. In 30c4f7e350 (check-whitespace: detect if no base_commit is provided, 2024-07-23), the GitLab check-whitespace CI job was modified to support CI_MERGE_REQUEST_DIFF_BASE_SHA as a fallback base commit if CI_MERGE_REQUEST_TARGET_BRANCH_SHA was not provided. The same fallback strategy was also implemented for the GitLab check-style CI job in bce7e52d4e (ci: run style check on GitHub and GitLab, 2024-07-23). The base commit fallback is implemented using shell parameter expansion where, if the first variable is unset, the second variable is used as fallback. In GitLab CI, these variables can be set but null. This has the unintended effect of selecting an empty first variable which results in CI jobs providing an invalid base commit and failing. Fix the issue by defaulting to the fallback variable if the first is unset or null. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31global: adapt callers to use generic hash context helpersPatrick Steinhardt19-107/+103
Adapt callers to use generic hash context helpers instead of using the hash algorithm to update them. This makes the callsites easier to reason about and removes the possibility that the wrong hash algorithm is used to update the hash context's state. And as a nice side effect this also gets rid of a bunch of users of `the_hash_algo`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31hash: provide generic wrappers to update hash contextsPatrick Steinhardt2-0/+27
The hash context is supposed to be updated via the `git_hash_algo` structure, which contains a list of function pointers to update, clone or finalize a hashing context. This requires the callers to track which algorithm was used to initialize the context and continue to use the exact same algorithm. If they fail to do that correctly, it can happen that we start to access context state of one hash algorithm with functions of a different hash algorithm. The result would typically be a segfault, as could be seen e.g. in the patches part of 98422943f0 (Merge branch 'ps/weak-sha1-for-tail-sum-fix', 2025-01-01). The situation was significantly improved starting with 04292c3796 (hash.h: drop unsafe_ function variants, 2025-01-23) and its parent commits. These refactorings ensure that it is not possible to mix up safe and unsafe variants of the same hash algorithm anymore. But in theory, it is still possible to mix up different hash algorithms with each other, even though this is a lot less likely to happen. But still, we can do better: instead of asking the caller to remember the hash algorithm used to initialize a context, we can instead make the context itself remember which algorithm it has been initialized with. If we do so, callers can use a set of generic helpers to update the context and don't need to be aware of the hash algorithm at all anymore. Adapt the context initialization functions to store the hash algorithm in the hashing context and introduce these generic helpers. Callers will be adapted in the subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31hash: stop typedeffing the hash contextPatrick Steinhardt22-74/+73
We generally avoid using `typedef` in the Git codebase. One exception though is the `git_hash_ctx`, likely because it used to be a union rather than a struct until the preceding commit refactored it. But now that it is a normal `struct` there isn't really a need for a typedef anymore. Drop the typedef and adapt all callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31hash: convert hashing context to a structurePatrick Steinhardt2-21/+22
The `git_hash_context` is a union containing the different hash-specific states for SHA1, its unsafe variant as well as SHA256. We know that only one of these states will ever be in use at the same time because hash contexts cannot be used for multiple different hashes at the same point in time. We're about to extend the structure though to keep track of the hash algorithm used to initialize the context, which is impossible to do while the context is a union. Refactor it to instead be a structure that contains the union of context states. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31Merge branch 'tb/unsafe-hash-cleanup' into ps/hash-cleanupJunio C Hamano12-70/+107
* tb/unsafe-hash-cleanup: hash.h: drop unsafe_ function variants csum-file: introduce hashfile_checkpoint_init() t/helper/test-hash.c: use unsafe_hash_algo() csum-file.c: use unsafe_hash_algo() hash.h: introduce `unsafe_hash_algo()` csum-file.c: extract algop from hashfile_checksum_valid() csum-file: store the hash algorithm as a struct field t/helper/test-tool: implement sha1-unsafe helper
2025-01-31The sixth batchJunio C Hamano1-0/+7
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31Merge branch 'jc/show-index-h-update'Junio C Hamano2-2/+2
Doc and short-help text for "show-index" has been clarified to stress that the command reads its data from the standard input. * jc/show-index-h-update: show-index: the short help should say the command reads from its input
2025-01-31Merge branch 'ja/doc-notes-markup-updates'Junio C Hamano2-111/+112
Doc mark-up updates. * ja/doc-notes-markup-updates: doc: convert git-notes to new documentation format
2025-01-31Merge branch 'sk/strlen-returns-size_t'Junio C Hamano1-3/+3
Code clean-up. * sk/strlen-returns-size_t: date.c: Fix type missmatch warings from msvc
2025-01-31Merge branch 'ja/doc-restore-markup-update'Junio C Hamano1-55/+55
Doc mark-up updates. * ja/doc-restore-markup-update: doc: convert git-restore to new style format
2025-01-30setup: fix reinit of repos with incompatible GIT_DEFAULT_HASHPatrick Steinhardt2-1/+15
The exact same issue as described in the preceding commit also exists for GIT_DEFAULT_HASH. Thus, reinitializing a repository that e.g. uses SHA1 with `GIT_DEFAULT_HASH=sha256 git init` will cause the object format of that repository to change to SHA256. This is of course bogus as any existing objects and refs will not be converted, thus causing repository corruption: $ git init repo Initialized empty Git repository in /tmp/repo/.git/ $ cd repo/ $ git commit --allow-empty -m message [main (root-commit) 35a7344] message $ GIT_DEFAULT_HASH=sha256 git init Reinitialized existing Git repository in /tmp/repo/.git/ $ git show fatal: your current branch appears to be broken Fix the issue by ignoring the environment variable in case the repo has already been initialized with an object hash. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-30setup: fix reinit of repos with incompatible GIT_DEFAULT_REF_FORMATPatrick Steinhardt2-1/+12
The GIT_DEFAULT_REF_FORMAT environment variable can be set to influence the default ref format that new repostiories shall be initialized with. While this is the expected behaviour when creating a new repository, it is not when reinitializing a repository: we should retain the ref format currently used by it in that case. This doesn't work correctly right now: $ git init --ref-format=files repo Initialized empty Git repository in /tmp/repo/.git/ $ GIT_DEFAULT_REF_FORMAT=reftable git init repo fatal: could not open '/tmp/repo/.git/refs/heads' for writing: Is a directory Instead of retaining the current ref format, the reinitialization tries to reinitialize the repository with the different format. This action fails when git-init(1) tries to write the ".git/refs/heads" stub, which in the context of the reftable backend is always written as a file so that we can detect clients which inadvertently try to access the repo with the wrong ref format. Seems like the protection mechanism works for this case, as well. Fix the issue by ignoring the environment variable in case the repo has already been initialized with a ref storage format. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-30t0001: remove duplicate testPatrick Steinhardt1-9/+0
The test in question is an exact copy of the testcase preceding it. Remove it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-30apply: detect overflow when parsing hunk headerPhillip Wood2-0/+16
"git apply" uses strtoul() to parse the numbers in the hunk header but silently ignores overflows. As LONG_MAX is a legitimate return value for strtoul() we need to set errno to zero before the call to strtoul() and check that it is still zero afterwards. The error message we display is not particularly helpful as it does not say what was wrong. However, it seems pretty unlikely that users are going to trigger this error in practice and we can always improve it later if needed. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-30scalar: free result of `remote_default_branch()`Patrick Steinhardt1-1/+3
We don't free the result of `remote_default_branch()`, leading to a memory leak. This leak is exposed by t9211, but only when run with Meson with the `-Db_sanitize=leak` option: Direct leak of 5 byte(s) in 1 object(s) allocated from: #0 0x5555555cfb93 in malloc (scalar+0x7bb93) #1 0x5555556b05c2 in do_xmalloc ../wrapper.c:55:8 #2 0x5555556b06c4 in do_xmallocz ../wrapper.c:89:8 #3 0x5555556b0656 in xmallocz ../wrapper.c:97:9 #4 0x5555556b0728 in xmemdupz ../wrapper.c:113:16 #5 0x5555556b07a7 in xstrndup ../wrapper.c:119:9 #6 0x5555555d3a4b in remote_default_branch ../scalar.c:338:14 #7 0x5555555d20e6 in cmd_clone ../scalar.c:493:28 #8 0x5555555d196b in cmd_main ../scalar.c:992:14 #9 0x5555557c4059 in main ../common-main.c:64:11 #10 0x7ffff7a2a1fb in __libc_start_call_main (/nix/store/h7zcxabfxa7v5xdna45y2hplj31ncf8a-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: 0a855678aa0cb573cecbb2bcc73ab8239ec472d0) #11 0x7ffff7a2a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/h7zcxabfxa7v5xdna45y2hplj31ncf8a-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: 0a855678aa0cb573cecbb2bcc73ab8239ec472d0) #12 0x555555592054 in _start (scalar+0x3e054) DEDUP_TOKEN: __interceptor_malloc--do_xmalloc--do_xmallocz--xmallocz--xmemdupz--xstrndup--remote_default_branch--cmd_clone--cmd_main--main--__libc_start_call_main--__libc_start_main@GLIBC_2.2.5--_start SUMMARY: LeakSanitizer: 5 byte(s) leaked in 1 allocation(s). As the `branch` variable may contain a string constant obtained from parsing command line arguments we cannot free the leaking variable directly. Instead, introduce a new `branch_to_free` variable that only ever gets assigned the allocated string and free that one to plug the leak. It is unclear why the leak isn't flagged when running the test via our Makefile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-30unix-socket: fix memory leak when chdir(3p) failsPatrick Steinhardt1-1/+3
When trying to create a Unix socket in a path that exceeds the maximum socket name length we try to first change the directory into the parent folder before creating the socket to reduce the length of the name. When this fails we error out of `unix_sockaddr_init()` with an error code, which indicates to the caller that the context has not been initialized. Consequently, they don't release that context. This leads to a memory leak: when we have already populated the context with the original directory that we need to chdir(3p) back into, but then the chdir(3p) into the socket's parent directory fails, then we won't release the original directory's path. The leak is exposed by t0301, but only when running tests in a directory hierarchy whose path is long enough to make the socket name length exceed the maximum socket name length: Direct leak of 129 byte(s) in 1 object(s) allocated from: #0 0x5555555e85c6 in realloc.part.0 lsan_interceptors.cpp.o #1 0x55555590e3d6 in xrealloc ../wrapper.c:140:8 #2 0x5555558c8fc6 in strbuf_grow ../strbuf.c:114:2 #3 0x5555558cacab in strbuf_getcwd ../strbuf.c:605:3 #4 0x555555923ff6 in unix_sockaddr_init ../unix-socket.c:65:7 #5 0x555555923e42 in unix_stream_connect ../unix-socket.c:84:6 #6 0x55555562a984 in send_request ../builtin/credential-cache.c:46:11 #7 0x55555562a89e in do_cache ../builtin/credential-cache.c:108:6 #8 0x55555562a655 in cmd_credential_cache ../builtin/credential-cache.c:178:3 #9 0x555555700547 in run_builtin ../git.c:480:11 #10 0x5555556ff0e0 in handle_builtin ../git.c:740:9 #11 0x5555556ffee8 in run_argv ../git.c:807:4 #12 0x5555556fee6b in cmd_main ../git.c:947:19 #13 0x55555593f689 in main ../common-main.c:64:11 #14 0x7ffff7a2a1fb in __libc_start_call_main (/nix/store/h7zcxabfxa7v5xdna45y2hplj31ncf8a-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: 0a855678aa0cb573cecbb2bcc73ab8239ec472d0) #15 0x7ffff7a2a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/h7zcxabfxa7v5xdna45y2hplj31ncf8a-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: 0a855678aa0cb573cecbb2bcc73ab8239ec472d0) #16 0x5555555ad1d4 in _start (git+0x591d4) DEDUP_TOKEN: ___interceptor_realloc.part.0--xrealloc--strbuf_grow--strbuf_getcwd--unix_sockaddr_init--unix_stream_connect--send_request--do_cache--cmd_credential_cache--run_builtin--handle_builtin--run_argv--cmd_main--main--__libc_start_call_main--__libc_start_main@GLIBC_2.2.5--_start SUMMARY: LeakSanitizer: 129 byte(s) leaked in 1 allocation(s). Fix this leak. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-29libgit: add higher-level libgit crateCalvin Wan13-8/+242
The C functions exported by libgit-sys do not provide an idiomatic Rust interface. To make it easier to use these functions via Rust, add a higher-level "libgit" crate, that wraps the lower-level configset API with an interface that is more Rust-y. This combination of $X and $X-sys crates is a common pattern for FFI in Rust, as documented in "The Cargo Book" [1]. [1] https://doc.rust-lang.org/cargo/reference/build-scripts.html#-sys-packages Co-authored-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-29libgit-sys: also export some config_set functionsJosh Steadmon3-1/+76
In preparation for implementing a higher-level Rust API for accessing Git configs, export some of the upstream configset API via libgitpub and libgit-sys. Since this will be exercised as part of the higher-level API in the next commit, no tests have been added for libgit-sys. While we're at it, add git_configset_alloc() and git_configset_free() functions in libgitpub so that callers can manage config_set structs on the heap. This also allows non-C external consumers to treat config_sets as opaque structs. Co-authored-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-29The fifth batchJunio C Hamano1-0/+21
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-29Merge branch 'ps/reflog-migration-with-logall-fix'Junio C Hamano2-1/+18
The "git refs migrate" command did not migrate the reflog for refs/stash, which is the contents of the stashes, which has been corrected. * ps/reflog-migration-with-logall-fix: refs: fix migration of reflogs respecting "core.logAllRefUpdates"
2025-01-29Merge branch 'am/trace2-with-valueless-true'Junio C Hamano5-6/+18
The trace2 code was not prepared to show a configuration variable that is set to true using the valueless true syntax, which has been corrected. * am/trace2-with-valueless-true: trace2: prevent segfault on config collection with valueless true
2025-01-29Merge branch 'kn/reflog-symref-fix'Junio C Hamano2-3/+9
reflog entries for symbolic ref updates were broken, which has been corrected. * kn/reflog-symref-fix: refs: fix creation of reflog entries for symrefs
2025-01-29Merge branch 'rs/ref-fitler-used-atoms-value-fix'Junio C Hamano8-72/+121
"git branch --sort=..." and "git for-each-ref --format=... --sort=..." did not work as expected with some atoms, which has been corrected. * rs/ref-fitler-used-atoms-value-fix: ref-filter: remove ref_format_clear() ref-filter: move is-base tip to used_atom ref-filter: move ahead-behind bases into used_atom
2025-01-29Merge branch 'ja/doc-commit-markup-updates'Junio C Hamano5-159/+161
Doc updates. * ja/doc-commit-markup-updates: doc: migrate git-commit manpage secondary files to new format doc: convert git commit config to new format doc: make more direct explanations in git commit options doc: the mode param of -u of git commit is optional doc: apply new documentation guidelines to git commit
2025-01-29Merge branch 'ds/path-walk-1'Junio C Hamano12-0/+1220
Introduce a new API to visit objects in batches based on a common path, or by type. * ds/path-walk-1: path-walk: drop redundant parse_tree() call path-walk: reorder object visits path-walk: mark trees and blobs as UNINTERESTING path-walk: visit tags and cached objects path-walk: allow consumer to specify object types t6601: add helper for testing path-walk API test-lib-functions: add test_cmp_sorted path-walk: introduce an object walk by path
2025-01-28libgit-sys: introduce Rust wrapper for libgit.aJosh Steadmon10-0/+263
Introduce libgit-sys, a Rust wrapper crate that allows Rust code to call functions in libgit.a. This initial patch defines build rules and an interface that exposes user agent string getter functions as a proof of concept. This library can be tested with `cargo test`. In later commits, a higher-level library containing a more Rust-friendly interface will be added at `contrib/libgit-rs`. Symbols in libgit can collide with symbols from other libraries such as libgit2. We avoid this by first exposing library symbols in public_symbol_export.[ch]. These symbols are prepended with "libgit_" to avoid collisions and set to visible using a visibility pragma. In build.rs, Rust builds contrib/libgit-rs/libgit-sys/libgitpub.a, which also contains libgit.a and other dependent libraries, with -fvisibility=hidden to hide all symbols within those libraries that haven't been exposed with a visibility pragma. Co-authored-by: Kyle Lippincott <spectral@google.com> Co-authored-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Kyle Lippincott <spectral@google.com> Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28common-main: split init and exit code into new filesJosh Steadmon6-81/+101
Currently, object files in libgit.a reference common_exit(), which is contained in common-main.o. However, common-main.o also includes main(), which references cmd_main() in git.o, which in turn depends on all the builtin/*.o objects. We would like to allow external users to link libgit.a without needing to include so many extra objects. Enable this by splitting common_exit() and check_bug_if_BUG() into a new file common-exit.c, and add common-exit.o to LIB_OBJS so that these are included in libgit.a. This split has previously been proposed ([1], [2]) to support fuzz tests and unit tests by avoiding conflicting definitions for main(). However, both of those issues were resolved by other methods of avoiding symbol conflicts. Now we are trying to make libgit.a more self-contained, so hopefully we can revisit this approach. Additionally, move the initialization code out of main() into a new init_git() function in its own file. Include this in libgit.a as well, so that external users can share our setup code without calling our main(). [1] https://lore.kernel.org/git/Yp+wjCPhqieTku3X@google.com/ [2] https://lore.kernel.org/git/20230517-unit-tests-v2-v2-1-21b5b60f4b32@google.com/ Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28ci: make "linux-musl" job use zlib-ngPatrick Steinhardt1-1/+1
We don't yet have any test coverage for the new zlib-ng backend as part of our CI. Add it by installing zlib-ng in Alpine Linux, which causes Meson to pick it up automatically. Note that we are somewhat limited with regards to where we run that job: Debian-based distributions don't have zlib-ng in their repositories, Fedora has it but doesn't run tests, and Alma Linux doesn't have the package either. Alpine Linux does have it available and is running our test suite, which is why it was picked. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28ci: switch linux-musl to use MesonPatrick Steinhardt7-9/+9
Switch over the "linux-musl" job to use Meson instead of Makefiles. This is done due to multiple reasons: - It simplifies our CI infrastructure a bit as we don't have to manually specify a couple of build options anymore. - It verifies that Meson detects and sets those build options automatically. - It makes it easier for us to wire up a new CI job using zlib-ng as backend. One platform compatibility that Meson cannot easily detect automatically is the `GIT_TEST_UTF8_LOCALE` variable used in tests. Wire up a build option for it, which we set via a new "MESONFLAGS" environment variable. Note that we also drop the CC variable, which is set to "gcc". We already default to GCC when CC is unset in "ci/lib.sh", so this is not needed. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28compat/zlib: allow use of zlib-ng as backendPatrick Steinhardt4-15/+64
The zlib-ng library is a hard fork of the old and venerable zlib library. It describes itself as zlib replacement with optimizations for "next generation" systems. As such, it contains several implementations of central algorithms using for example SSE2, AVX2 and other vectorized CPU intrinsics that supposedly speed up in- and deflating data. And indeed, compiling Git against zlib-ng leads to a significant speedup when reading objects. The following benchmark uses git-cat-file(1) with `--batch --batch-all-objects` in the Git repository: Benchmark 1: zlib Time (mean ± σ): 52.085 s ± 0.141 s [User: 51.500 s, System: 0.456 s] Range (min … max): 52.004 s … 52.335 s 5 runs Benchmark 2: zlib-ng Time (mean ± σ): 40.324 s ± 0.134 s [User: 39.731 s, System: 0.490 s] Range (min … max): 40.135 s … 40.484 s 5 runs Summary zlib-ng ran 1.29 ± 0.01 times faster than zlib So we're looking at a ~25% speedup compared to zlib. This is of course an extreme example, as it makes us read through all objects in the repository. But regardless, it should be possible to see some sort of speedup in most commands that end up accessing the object database. The zlib-ng library provides a compatibility layer that makes it a proper drop-in replacement for zlib: nothing needs to change in the build system to support it. Unfortunately though, this mode isn't easy to use on most systems because distributions do not allow you to install zlib-ng in that way, as that would mean that the zlib library would be globally replaced. Instead, many distributions provide a package that installs zlib-ng without the compatibility layer. This version does provide effectively the same APIs like zlib does, but all of the symbols are prefixed with `zng_` to avoid symbol collisions. Implement a new build option that allows us to link against zlib-ng directly. If set, we redefine zlib symbols so that we use the `zng_` prefixed versions thereof provided by that library. Like this, it becomes possible to install both zlib and zlib-ng (without the compat layer) and then pick whichever library one wants to link against for Git. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28git-zlib: cast away potential constness of `next_in` pointerPatrick Steinhardt1-1/+2
The `struct git_zstream::next_in` variable points to the input data and is used in combination with `struct z_stream::next_in`. While that latter field is not marked as a constant in zlib, it is marked as such in zlib-ng. This causes a couple of compiler errors when we try to assign these fields to one another due to mismatching constness. Fix the issue by casting away the potential constness of `next_in`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28compat/zlib: provide stubs for `deflateSetHeader()`Patrick Steinhardt2-4/+19
The function `deflateSetHeader()` has been introduced with zlib v1.2.2.1, so we don't use it when linking against an older version of it. Refactor the code to instead provide a central stub via "compat/zlib.h" so that we can adapt it based on whether or not we use zlib-ng in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28compat/zlib: provide `deflateBound()` shim centrallyPatrick Steinhardt2-4/+4
The `deflateBound()` function has only been introduced with zlib 1.2.0. When linking against a zlib version older than that we thus provide our own compatibility shim. Move this shim into "compat/zlib.h" so that we can adapt it based on whether or not we use zlib-ng in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28git-compat-util: move include of "compat/zlib.h" into "git-zlib.h"Patrick Steinhardt8-4/+8
We include "compat/zlib.h" in "git-compat-util.h", which is unnecessarily broad given that we only have a small handful of files that use the zlib library. Move the header into "git-zlib.h" instead and adapt users of zlib to include that header. One exception is the reftable library, as we don't want to use the Git-specific wrapper of zlib there, so we include "compat/zlib.h" instead. Furthermore, we move the include into "reftable/system.h" so that users of the library other than Git can wire up zlib themselves. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28compat: introduce new "zlib.h" headerPatrick Steinhardt3-2/+8
Introduce a new "compat/zlib-compat.h" header that we include instead of including <zlib.h> directly. This will allow us to wire up zlib-ng as an alternative backend for zlib compression in a subsequent commit. Note that we cannot just call the file "compat/zlib.h", as that may otherwise cause us to include that file instead of <zlib.h>. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28git-compat-util: drop `z_const` definePatrick Steinhardt1-1/+0
Before including <zlib.h> we explicitly define `z_const` to an empty value. This has the effect that the `z_const` macro in "zconf.h" itself will remain empty instead of being defined as `const`, which effectively adapts a couple of APIs so that their parameters are not marked as being constants. It is dubious though whether this is something we actually want: not marking a parameter as a constant doesn't make it any less constant than it was. The define was added via 07564773c2 (compat: auto-detect if zlib has uncompress2(), 2022-01-24), where it was seemingly carried over from our internal compatibility shim for `uncompress2()` that was removed in the preceding commit. The commit message doesn't mention why we carry over the define and make it public, either, and I cannot think of any reason for why we would want to have it. Drop the define. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28compat: drop `uncompress2()` compatibility shimPatrick Steinhardt4-107/+0
Our compat library has an implementation of zlib's `uncompress2()` function that gets used when linking against an old version of zlib that doesn't yet have it. The last user of `uncompress2()` got removed in 15a60b747e (reftable/block: open-code call to `uncompress2()`, 2024-04-08), so the compatibility code is not required anymore. Drop it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28The fourth batchJunio C Hamano1-0/+20
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28Merge branch 'jp/t8002-printf-fix'Junio C Hamano1-2/+2
Test fix. * jp/t8002-printf-fix: t8002: fix ambiguous printf conversion specifications
2025-01-28Merge branch 'ps/reftable-sign-compare'Junio C Hamano19-151/+157
The reftable/ library code has been made -Wsign-compare clean. * ps/reftable-sign-compare: reftable: address trivial -Wsign-compare warnings reftable/blocksource: adjust `read_block()` to return `ssize_t` reftable/blocksource: adjust type of the block length reftable/block: adjust type of the restart length reftable/block: adapt header and footer size to return a `size_t` reftable/basics: adjust `hash_size()` to return `uint32_t` reftable/basics: adjust `common_prefix_size()` to return `size_t` reftable/record: handle overflows when decoding varints reftable/record: drop unused `print` function pointer meson: stop disabling -Wsign-compare
2025-01-28Merge branch 'mh/credential-cache-authtype-request-fix'Junio C Hamano2-2/+17
The "cache" credential back-end did not handle authtype correctly, which has been corrected. * mh/credential-cache-authtype-request-fix: credential-cache: respect authtype capability
2025-01-28Merge branch 'jk/pack-header-parse-alignment-fix'Junio C Hamano6-50/+64
It was possible for "git unpack-objects" and "git index-pack" to make an unaligned access, which has been corrected. * jk/pack-header-parse-alignment-fix: index-pack, unpack-objects: use skip_prefix to avoid magic number index-pack, unpack-objects: use get_be32() for reading pack header parse_pack_header_option(): avoid unaligned memory writes packfile: factor out --pack_header argument parsing bswap.h: squelch potential sparse -Wcast-truncate warnings
2025-01-28Merge branch 'ps/build-meson-subtree'Junio C Hamano6-10/+95
The meson-driven build is now aware of "git-subtree" housed in contrib/subtree hierarchy. * ps/build-meson-subtree: meson: wire up the git-subtree(1) command meson: introduce build option for contrib contrib/subtree: fix building docs
2025-01-28Merge branch 'mh/connect-sign-compare'Junio C Hamano1-12/+11
The code in connect.c has been updated to work around complaints from -Wsign-compare. * mh/connect-sign-compare: connect: address -Wsign-compare warnings
2025-01-28Merge branch 'sk/unit-tests'Junio C Hamano8-147/+137
Move a few more unit tests to the clar test framework. * sk/unit-tests: t/unit-tests: convert reftable tree test to use clar test framework t/unit-tests: adapt priority queue test to use clar test framework t/unit-tests: convert mem-pool test to use clar test framework t/unit-tests: handle dashes in test suite filenames
2025-01-28Merge branch 'jc/show-usage-help'Junio C Hamano41-64/+121
The help text from "git $cmd -h" appear on the standard output for some $cmd and the standard error for others. The built-in commands have been fixed to show them on the standard output consistently. * jc/show-usage-help: builtin: send usage() help text to standard output oddballs: send usage() help text to standard output builtins: send usage_with_options() help text to standard output usage: add show_usage_if_asked() parse-options: add show_usage_with_options_if_asked() t0012: optionally check that "-h" output goes to stdout
2025-01-27pack-objects: prevent name hash version changeDerrick Stolee1-0/+8
When the --name-hash-version option is used in 'git pack-objects', it can change from the initial assignment to when it is used based on interactions with other arguments. Specifically, when writing or reading bitmaps, we must force version 1 for now. This could change in the future when the bitmap format can store a name hash version value, indicating which was used during the writing of the packfile. Protect the 'git pack-objects' process from getting confused by failing with a BUG() statement if the value of the name hash version changes between calls to pack_name_hash_fn(). Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-27test-tool: add helper for name-hash valuesDerrick Stolee6-0/+87
Add a new test-tool helper, name-hash, to output the value of the name-hash algorithms for the input list of strings, one per line. Since the name-hash values can be stored in the .bitmap files, it is important that these hash functions do not change across Git versions. Add a simple test to t5310-pack-bitmaps.sh to provide some testing of the current values. Due to how these functions are implemented, it would be difficult to change them without disturbing these values. The paths used for this test are carefully selected to demonstrate some of the behavior differences of the two current name hash versions, including which conditions will cause them to collide. Create a performance test that uses test_size to demonstrate how collisions occur for these hash algorithms. This test helps inform someone as to the behavior of the name-hash algorithms for their repo based on the paths at HEAD. My copy of the Git repository shows modest statistics around the collisions of the default name-hash algorithm: Test this tree -------------------------------------------------- 5314.1: paths at head 4.5K 5314.2: distinct hash value: v1 4.1K 5314.3: maximum multiplicity: v1 13 5314.4: distinct hash value: v2 4.2K 5314.5: maximum multiplicity: v2 9 Here, the maximum collision multiplicity is 13, but around 10% of paths have a collision with another path. In a more interesting example, the microsoft/fluentui [1] repo had these statistics at time of committing: Test this tree -------------------------------------------------- 5314.1: paths at head 19.5K 5314.2: distinct hash value: v1 8.2K 5314.3: maximum multiplicity: v1 279 5314.4: distinct hash value: v2 17.8K 5314.5: maximum multiplicity: v2 44 [1] https://github.com/microsoft/fluentui That demonstrates that of the nearly twenty thousand path names, they are assigned around eight thousand distinct values. 279 paths are assigned to a single value, leading the packing algorithm to sort objects from those paths together, by size. With the v2 name hash function, the maximum multiplicity lowers to 44, leaving some room for further improvement. In a more extreme example, an internal monorepo had a much worse collision rate: Test this tree -------------------------------------------------- 5314.1: paths at head 227.3K 5314.2: distinct hash value: v1 72.3K 5314.3: maximum multiplicity: v1 14.4K 5314.4: distinct hash value: v2 166.5K 5314.5: maximum multiplicity: v2 138 Here, we can see that the v2 name hash function provides somem improvements, but there are still a number of collisions that could lead to repacking problems at this scale. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-27p5313: add size comparison testDerrick Stolee1-0/+70
As custom options are added to 'git pack-objects' and 'git repack' to adjust how compression is done, use this new performance test script to demonstrate their effectiveness in performance and size. The recently-added --name-hash-version option allows for testing different name hash functions. Version 2 intends to preserve some of the locality of version 1 while more often breaking collisions due to long filenames. Distinguishing objects by more of the path is critical when there are many name hash collisions and several versions of the same path in the full history, giving a significant boost to the full repack case. The locality of the hash function is critical to compressing something like a shallow clone or a thin pack representing a push of a single commit. This can be seen by running pt5313 on the open source fluentui repository [1]. Most commits will have this kind of output for the thin and big pack cases, though certain commits (such as [2]) will have problematic thin pack size for other reasons. [1] https://github.com/microsoft/fluentui [2] a637a06df05360ce5ff21420803f64608226a875 Checked out at the parent of [2], I see the following statistics: Test HEAD --------------------------------------------------------------- 5313.2: thin pack with version 1 0.37(0.44+0.02) 5313.3: thin pack size with version 1 1.2M 5313.4: big pack with version 1 2.04(7.77+0.23) 5313.5: big pack size with version 1 20.4M 5313.6: shallow fetch pack with version 1 1.41(2.94+0.11) 5313.7: shallow pack size with version 1 34.4M 5313.8: repack with version 1 95.70(676.41+2.87) 5313.9: repack size with version 1 439.3M 5313.10: thin pack with version 2 0.12(0.12+0.06) 5313.11: thin pack size with version 2 22.0K 5313.12: big pack with version 2 2.80(5.43+0.34) 5313.13: big pack size with version 2 25.9M 5313.14: shallow fetch pack with version 2 1.77(2.80+0.19) 5313.15: shallow pack size with version 2 33.7M 5313.16: repack with version 2 33.68(139.52+2.58) 5313.17: repack size with version 2 160.5M To make comparisons easier, I will reformat this output into a different table style: | Test | V1 Time | V2 Time | V1 Size | V2 Size | |--------------|---------|---------|---------|---------| | Thin Pack | 0.37 s | 0.12 s | 1.2 M | 22.0 K | | Big Pack | 2.04 s | 2.80 s | 20.4 M | 25.9 M | | Shallow Pack | 1.41 s | 1.77 s | 34.4 M | 33.7 M | | Repack | 95.70 s | 33.68 s | 439.3 M | 160.5 M | The v2 hash function successfully differentiates the CHANGELOG.md files from each other, which leads to significant improvements in the thin pack (simulating a push of this commit) and the full repack. There is some bloat in the "big pack" scenario and essentially the same results for the shallow pack. In the case of the Git repository, these numbers show some of the issues with this approach: | Test | V1 Time | V2 Time | V1 Size | V2 Size | |--------------|---------|---------|---------|---------| | Thin Pack | 0.02 s | 0.02 s | 1.1 K | 1.1 K | | Big Pack | 1.69 s | 1.95 s | 13.5 M | 14.5 M | | Shallow Pack | 1.26 s | 1.29 s | 12.0 M | 12.2 M | | Repack | 29.51 s | 29.01 s | 237.7 M | 238.2 M | Here, the attempts to remove conflicts in the v2 function seem to cause slight bloat to these sizes. This shows that the Git repository benefits a lot from cross-path delta pairs. The results are similar with the nodejs/node repo: | Test | V1 Time | V2 Time | V1 Size | V2 Size | |--------------|---------|---------|---------|---------| | Thin Pack | 0.02 s | 0.02 s | 1.6 K | 1.6 K | | Big Pack | 4.61 s | 3.26 s | 56.0 M | 52.8 M | | Shallow Pack | 7.82 s | 7.51 s | 104.6 M | 107.0 M | | Repack | 88.90 s | 73.75 s | 740.1 M | 764.5 M | Here, the v2 name-hash causes some size bloat more often than it reduces the size, but it also universally improves performance time, which is an interesting reversal. This must mean that it is helping to short-circuit some delta computations even if it is not finding the most efficient ones. The performance improvement cannot be explained only due to the I/O cost of writing the resulting packfile. The Linux kernel repository was the initial target of the default name hash value, and its naming conventions are practically build to take the most advantage of the default name hash values: | Test | V1 Time | V2 Time | V1 Size | V2 Size | |--------------|----------|----------|---------|---------| | Thin Pack | 0.17 s | 0.07 s | 4.6 K | 4.6 K | | Big Pack | 17.88 s | 12.35 s | 201.1 M | 159.1 M | | Shallow Pack | 11.05 s | 22.94 s | 269.2 M | 273.8 M | | Repack | 727.39 s | 566.95 s | 2.5 G | 2.5 G | Here, the thin and big packs gain some performance boosts in time, with a modest gain in the size of the big pack. The shallow pack, however, is more expensive to compute, likely because similarly-named files across different directories are farther apart in the name hash ordering in v2. The repack also gains benefits in computation time but no meaningful change to the full size. Finally, an internal Javascript repo of moderate size shows significant gains when repacking with --name-hash-version=2 due to it having many name hash collisions. However, it's worth noting that only the full repack case has significant differences from the v1 name hash: | Test | V1 Time | V2 Time | V1 Size | V2 Size | |-----------|-----------|----------|---------|---------| | Thin Pack | 8.28 s | 7.28 s | 16.8 K | 16.8 K | | Big Pack | 12.81 s | 11.66 s | 29.1 M | 29.1 M | | Shallow | 4.86 s | 4.06 s | 42.5 M | 44.1 M | | Repack | 3126.50 s | 496.33 s | 6.2 G | 855.6 M | Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-27pack-objects: add GIT_TEST_NAME_HASH_VERSIONDerrick Stolee9-10/+41
Add a new environment variable to opt-in to different values of the --name-hash-version=<n> option in 'git pack-objects'. This allows for extra testing of the feature without repeating all of the test scenarios. Unlike many GIT_TEST_* variables, we are choosing to not add this to the linux-TEST-vars CI build as that test run is already overloaded. The behavior exposed by this test variable is of low risk and should be sufficient to allow manual testing when an issue arises. But this option isn't free. There are a few tests that change behavior with the variable enabled. First, there are a few tests that are very sensitive to certain delta bases being picked. These are both involving the generation of thin bundles and then counting their objects via 'git index-pack --fix-thin' which pulls the delta base into the new packfile. For these tests, disable the option as a decent long-term option. Second, there are some tests that compare the exact output of a 'git pack-objects' process when using bitmaps. The warning that ignores the --name-hash-version=2 and forces version 1 causes these tests to fail. Disable the environment variable to get around this issue. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-27repack: add --name-hash-version optionDerrick Stolee5-3/+48
The new '--name-hash-version' option for 'git repack' is a simple pass-through to the underlying 'git pack-objects' subcommand. However, this subcommand may have other options and a temporary filename as part of the subcommand execution that may not be predictable or could change over time. The existing test_subcommand method requires an exact list of arguments for the subcommand. This is too rigid for our needs here, so create a new method, test_subcommand_flex. Use it to check that the --name-hash-version option is passing through. Since we are modifying the 'git repack' command, let's bring its usage in line with the Documentation's synopsis. This removes it from the allow list in t0450 so it will remain in sync in the future. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-27pack-objects: add --name-hash-version optionDerrick Stolee3-6/+109
The previous change introduced a new pack_name_hash_v2() function that intends to satisfy much of the hash locality features of the existing pack_name_hash() function while also distinguishing paths with similar final components of their paths. This change adds a new --name-hash-version option for 'git pack-objects' to allow users to select their preferred function version. This use of an integer version allows for future expansion and a direct way to later store a name hash version in the .bitmap format. For now, let's consider how effective this mechanism is when repacking a repository with different name hash versions. Specifically, we will execute 'git pack-objects' the same way a 'git repack -adf' process would, except we include --name-hash-version=<n> for testing. On the Git repository, we do not expect much difference. All path names are short. This is backed by our results: | Stage | Pack Size | Repack Time | |-----------------------|-----------|-------------| | After clone | 260 MB | N/A | | --name-hash-version=1 | 127 MB | 129s | | --name-hash-version=2 | 127 MB | 112s | This example demonstrates how there is some natural overhead coming from the cloned copy because the server is hosting many forks and has not optimized for exactly this set of reachable objects. But the full repack has similar characteristics for both versions. Let's consider some repositories that are hitting too many collisions with version 1. First, let's explore the kinds of paths that are commonly causing these collisions: * "/CHANGELOG.json" is 15 characters, and is created by the beachball [1] tool. Only the final character of the parent directory can differentiate different versions of this file, but also only the two most-significant digits. If that character is a letter, then this is always a collision. Similar issues occur with the similar "/CHANGELOG.md" path, though there is more opportunity for differences In the parent directory. * Localization files frequently have common filenames but differentiates via parent directories. In C#, the name "/strings.resx.lcl" is used for these localization files and they will all collide in name-hash. [1] https://github.com/microsoft/beachball I've come across many other examples where some internal tool uses a common name across multiple directories and is causing Git to repack poorly due to name-hash collisions. One open-source example is the fluentui [2] repo, which uses beachball to generate CHANGELOG.json and CHANGELOG.md files, and these files have very poor delta characteristics when comparing against versions across parent directories. | Stage | Pack Size | Repack Time | |-----------------------|-----------|-------------| | After clone | 694 MB | N/A | | --name-hash-version=1 | 438 MB | 728s | | --name-hash-version=2 | 168 MB | 142s | [2] https://github.com/microsoft/fluentui In this example, we see significant gains in the compressed packfile size as well as the time taken to compute the packfile. Using a collection of repositories that use the beachball tool, I was able to make similar comparisions with dramatic results. While the fluentui repo is public, the others are private so cannot be shared for reproduction. The results are so significant that I find it important to share here: | Repo | --name-hash-version=1 | --name-hash-version=2 | |----------|-----------------------|-----------------------| | fluentui | 440 MB | 161 MB | | Repo B | 6,248 MB | 856 MB | | Repo C | 37,278 MB | 6,755 MB | | Repo D | 131,204 MB | 7,463 MB | Future changes could include making --name-hash-version implied by a config value or even implied by default during a full repack. It is important to point out that the name hash value is stored in the .bitmap file format, so we must force --name-hash-version=1 when bitmaps are being read or written. Later, the bitmap format could be updated to be aware of the name hash version so deltas can be quickly computed across the bitmapped/not-bitmapped boundary. To promote the safety of this parameter, the validate_name_hash_version() method will die() if the given name-hash version is incorrect and will disable newer versions if not yet compatible with other features, such as --write-bitmap-index. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-27pack-objects: create new name-hash function versionJonathan Tan1-0/+28
As we will explore in later changes, the default name-hash function used in 'git pack-objects' has a tendency to cause collisions and cause poor delta selection. This change creates an alternative that avoids some collisions while preserving some amount of hash locality. The pack_name_hash() method has not been materially changed since it was introduced in ce0bd64 (pack-objects: improve path grouping heuristics., 2006-06-05). The intention here is to group objects by path name, but also attempt to group similar file types together by making the most-significant digits of the hash be focused on the final characters. Here's the crux of the implementation: /* * This effectively just creates a sortable number from the * last sixteen non-whitespace characters. Last characters * count "most", so things that end in ".c" sort together. */ while ((c = *name++) != 0) { if (isspace(c)) continue; hash = (hash >> 2) + (c << 24); } As the comment mentions, this only cares about the last sixteen non-whitespace characters. This cause some filenames to collide more than others. This collision is somewhat by design in order to promote hash locality for files that have similar types (.c, .h, .json) or could be the same file across a directory rename (a/foo.txt to b/foo.txt). This leads to decent cross-path deltas in cases like shallow clones or packing a repository with very few historical versions of files that share common data with other similarly-named files. However, when the name-hash instead leads to a large number of name-hash collisions for otherwise unrelated files, this can lead to confusing the delta calculation to prefer cross-path deltas over previous versions of the same file. The new pack_name_hash_v2() function attempts to fix this issue by taking more of the directory path into account through its hash function. Its naming implies that we will later wire up details for choosing a name-hash function by version. The first change is to be more careful about paths using non-ASCII characters. With these characters in mind, reverse the bits in the byte as the least-significant bits have the highest entropy and we want to maximize their influence. This is done with some bit manipulation that swaps the two halves, then the quarters within those halves, and then the bits within those quarters. The second change is to perform hash composition operations at every level of the path. This is done by storing a 'base' hash value that contains the hash of the parent directory. When reaching a directory boundary, we XOR the current level's name-hash value with a downshift of the previous level's hash. This perturbation intends to create low-bit distinctions for paths with the same final 16 bytes but distinct parent directory structures. The collision rate and effectiveness of this hash function will be explored in later changes as the function is integrated with 'git pack-objects' and 'git repack'. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-27refs/reftable: fix uninitialized memory access of `max_index`Karthik Nayak1-3/+3
When migrating reflogs between reference backends, maintaining the original order of the reflog entries is crucial. To achieve this, an `index` field is stored within the `ref_update` struct that encodes the relative order of reflog entries. This field is used by the reftable backend as update index for the respective reflog entries to maintain that ordering. These update indices must be respected when writing table headers, which encode the minimum and maximum update index of contained records in the header and footer. This logic was added in commit bc67b4ab5f (reftable: write correct max_update_index to header, 2025-01-15), which started to use `reftable_writer_set_limits()` to propagate the mininum and maximum update index of all records contained in a ref transaction. However, we only set the maximum update index for the first transaction argument, even though there can be multiple such arguments. This is the case when we write to multiple stacks in a single transaction, e.g. when updating references in two different worktrees at once. Consequently, the update index for all but the first argument remain uninitialized, which may cause undefined behaviour. Fix this by moving the assignment of the maximum update index in `reftable_be_transaction_finish()` inside the loop, which ensures that all elements of the array are correctly initialized. Furthermore, initialize the `max_index` field to 0 when queueing a new transaction argument. This is not strictly necessary, as all elements of `write_transaction_table_arg.max_index` are now assigned correctly. However, this initialization is added for consistency and to safeguard against potential future changes that might inadvertently introduce uninitialized memory access. Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-27fetch set_head: fix non-mirror remotes in bare repositoriesBence Ferdinandy3-5/+32
In b1b713f722 (fetch set_head: handle mirrored bare repositories, 2024-11-22) it was implicitly assumed that all remotes will be mirrors in a bare repository, thus fetching a non-mirrored remote could lead to HEAD pointing to a non-existent reference. Make sure we only overwrite HEAD if we are in a bare repository and fetching from a mirror. Otherwise, proceed as normally, and create refs/remotes/<nonmirrorremote>/HEAD instead. Reported-by: Christian Hesse <list@eworm.de> Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-27fetch set_head: refactor to use remote directlyBence Ferdinandy1-8/+7
As a preparatory step to use even more properties from the remote struct, refactor set_head to take the entire struct as a parameter, instead of the necessary bits. This also allows consolidating the use of gtransport->remote in set_head, making the access of the remote's properties consistent in the function. Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-25bundle: avoid closing file descriptor twiceJohannes Schindelin3-1/+6
Already when introduced in c7a8a16239 (Add bundle transport, 2007-09-10), the `bundle` transport had a bug where it would open a file descriptor to the bundle file and then close it _twice_: First, the file descriptor (`data->fd`) is passed to `unbundle()`, which would use it as the `stdin` of the `index-pack` process, which as a consequence would close it via `start_command()`. However, `data->fd` would still hold the numerical value of the file descriptor, and `close_bundle()` would see that and happily close it again. This seems not to have caused too many problems in almost two decades, but I encountered a situation today where it _does_ cause problems: In i686 variants of Git for Windows, it seems that file descriptors are reused quickly after they have been closed. In the particular scenario I faced, `git fetch <bundle> <ref>` gets the same file descriptor value when opening the bundle file and importing its embedded packfile (which implicitly closes the file descriptor) and then when opening a pack file in `fetch_and_consume_refs()` while looking up an object's header. Later on, after the bundle has been imported (and the `close_bundle()` function erroneously closes the file descriptor that has _already_ been closed when using it as `stdin` for `git index-pack`), the same file descriptor value has now been reused via `use_pack()`. Now, when either the recursive fetch (which defaults to "on", unfortunately) or a commit-graph update needs to `mmap()` the packfile, it fails due to a now-invalid file descriptor that _should_ point to the pack file but doesn't anymore. To fix that, let's invalidate `data->fd` after calling `unbundle()`. That way, `close_bundle()` does not close a file descriptor that may have been reused for something different. While at it, document that `unbundle()` closes the file descriptor, and ensure that it also does that when failing to verify the bundle. Luckily, this bug does not affect the bundle URI feature, it only affects the `git fetch <bundle>` code path. Note that this patch does not _completely_ clarifies who is responsible to close that file descriptor, as `run_command()` may fail _without_ closing `cmd->in`. Addressing this issue thoroughly, however, would require a rather thorough re-design of the `start_command()` and `finish_command()` functionality to make it a lot less murky who is responsible for what file descriptors. At least this here patch is relatively easy to reason about, and addresses a hard failure (`fatal: mmap: could not determine filesize`) at the expense of leaking a file descriptor under very rare circumstances in which `git fetch` would error out anyway. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-24gc: add `--expire-to` optionZheNing Hu3-2/+47
This commit extends the functionality of `git gc` by adding a new option, `--expire-to=<dir>`. Previously, this feature was implemented in 91badeba32 (builtin/repack.c: implement `--expire-to` for storing pruned objects, 2022-10-24), which allowing users to specify a directory where unreachable and expired cruft packs are stored during garbage collection. However, users had to run `git repack --cruft --expire-to=<dir>` followed by `git prune` to achieve similar results within `git gc`. By introducing `--expire-to=<dir>` directly into `git gc`, we simplify the process for users who wish to manage their repository's cleanup more efficiently. This change involves passing the `--expire-to=<dir>` parameter through to `git repack`, making it easier for users to set up a backup location for cruft packs that will be pruned. Due to the original `git gc --prune=now` deleting all unreachable objects by passing the `-a` parameter to git repack. With the addition of the `--cruft` and `--expire-to` options, it is necessary to modify this default behavior: instead of deleting these unreachable objects, they should be merged into a cruft pack and collected in a specified directory. Therefore, we do not pass `-a` to the repack command but instead pass `--cruft`, `--expire-to`, and `--cruft-expiration=now` to repack. Signed-off-by: ZheNing Hu <adlternative@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-24config.txt: add trailer.* variablesJulian Prein3-135/+140
The trailer.* configuration variables are currently only described in git-interpret-trailers(1) but affect git-commit and git-tag as well. Move that section into its own config/trailer.txt file and also include it in git-config(1). Signed-off-by: Julian Prein <julian@druckdev.xyz> Acked-by: Eric Sesterhenn <eric.sesterhenn@x41-dsec.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-24remote: announce removal of "branches/" and "remotes/"Patrick Steinhardt9-43/+99
Back when Git was in its infancy, remotes were configured via separate files in "branches/" (back in 2005). This mechanism was replaced later that year with the "remotes/" directory. Both mechanisms have eventually been replaced by config-based remotes, and it is very unlikely that anybody still uses these directories to configure their remotes. Both of these directories have been marked as deprecated, one in 2005 and the other one in 2011. Follow through with the deprecation and finally announce the removal of these features in Git 3.0. Signed-off-by: Patrick Steinhardt <ps@pks.im> [jc: with a small tweak to the help message] Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23The third batchJunio C Hamano1-0/+22
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23Merge branch 'jc/cli-doc-option-and-config'Junio C Hamano1-0/+17
Doc update. * jc/cli-doc-option-and-config: gitcli: document that command line trumps config and env
2025-01-23Merge branch 'mh/doc-credential-helpers-with-pat'Junio C Hamano2-12/+46
Document that it is insecure to use Personal Access Tokens, which some hosting providers take as username/password, embedded in URLs. * mh/doc-credential-helpers-with-pat: docs: discuss caching personal access tokens docs: list popular credential helpers
2025-01-23Merge branch 'ak/instaweb-python-port-binding-fix'Junio C Hamano1-2/+2
The "instaweb" bound only to local IP address without "--local" and to all addresses with "--local", which was the other way around, when using Python's http.server class, which has been corrected. * ak/instaweb-python-port-binding-fix: instaweb: fix ip binding for the python http.server
2025-01-23Merge branch 'sj/meson-doc-technical-dependency-fix'Junio C Hamano1-0/+1
The meson build procedure for Documentation/technical/ hierarchy was missing necessary dependencies, which has been corrected. * sj/meson-doc-technical-dependency-fix: meson: fix missing deps for technical articles
2025-01-23Merge branch 'tc/meson-use-our-version-def-h'Junio C Hamano2-2/+9
The meson build procedure looked for the 'version-def.h' file in a wrong directory, which has been corrected. * tc/meson-use-our-version-def-h: meson: ensure correct version-def.h is used
2025-01-23Merge branch 'en/object-name-with-funny-refname-fix'Junio C Hamano3-5/+113
Extended SHA-1 expression parser did not work well when a branch with an unusual name (e.g. "foo{bar") is involved. * en/object-name-with-funny-refname-fix: object-name: be more strict in parsing describe-like output object-name: fix resolution of object names containing curly braces
2025-01-23Merge branch 'ds/path-walk-1' into ds/backfillJunio C Hamano9-0/+1217
* ds/path-walk-1: path-walk: drop redundant parse_tree() call path-walk: reorder object visits path-walk: mark trees and blobs as UNINTERESTING path-walk: visit tags and cached objects path-walk: allow consumer to specify object types t6601: add helper for testing path-walk API test-lib-functions: add test_cmp_sorted path-walk: introduce an object walk by path
2025-01-23hash.h: drop unsafe_ function variantsTaylor Blau2-30/+0
Now that all callers have been converted from: the_hash_algo->unsafe_init_fn(); to unsafe_hash_algo(the_hash_algo)->init_fn(); and similar, we can remove the scaffolding for the unsafe_ function variants and force callers to use the new unsafe_hash_algo() mechanic instead. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23csum-file: introduce hashfile_checkpoint_init()Taylor Blau4-4/+15
In 106140a99f (builtin/fast-import: fix segfault with unsafe SHA1 backend, 2024-12-30) and 9218c0bfe1 (bulk-checkin: fix segfault with unsafe SHA1 backend, 2024-12-30), we observed the effects of failing to initialize a hashfile_checkpoint with the same hash function implementation as is used by the hashfile it is used to checkpoint. While both 106140a99f and 9218c0bfe1 work around the immediate crash, changing the hash function implementation within the hashfile API to, for example, the non-unsafe variant would re-introduce the crash. This is a result of the tight coupling between initializing hashfiles and hashfile_checkpoints. Introduce and use a new function which ensures that both parts of a hashfile and hashfile_checkpoint pair use the same hash function implementation to avoid such crashes. A few things worth noting: - In the change to builtin/fast-import.c::stream_blob(), we can see that by removing the explicit reference to 'the_hash_algo->unsafe_init_fn()', we are hardened against the hashfile API changing away from the_hash_algo (or its unsafe variant) in the future. - The bulk-checkin code no longer needs to explicitly zero-initialize the hashfile_checkpoint, since it is now done as a result of calling 'hashfile_checkpoint_init()'. - Also in the bulk-checkin code, we add an additional call to prepare_to_stream() outside of the main loop in order to initialize 'state->f' so we know which hash function implementation to use when calling 'hashfile_checkpoint_init()'. This is OK, since subsequent 'prepare_to_stream()' calls are noops. However, we only need to call 'prepare_to_stream()' when we have the HASH_WRITE_OBJECT bit set in our flags. Without that bit, calling 'prepare_to_stream()' does not assign 'state->f', so we have nothing to initialize. - Other uses of the 'checkpoint' in 'deflate_blob_to_pack()' are appropriately guarded. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23t/helper/test-hash.c: use unsafe_hash_algo()Taylor Blau1-12/+5
Remove a series of conditionals within the shared cmd_hash_impl() helper that powers the 'sha1' and 'sha1-unsafe' helpers. Instead, replace them with a single conditional that transforms the specified hash algorithm into its unsafe variant. Then all subsequent calls can directly use whatever function it wants to call without having to decide between the safe and unsafe variants. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23csum-file.c: use unsafe_hash_algo()Taylor Blau1-11/+11
Instead of calling the unsafe_ hash function variants directly, make use of the shared 'algop' pointer by initializing it to: f->algop = unsafe_hash_algo(the_hash_algo); , thus making all calls use the unsafe variants directly. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23hash.h: introduce `unsafe_hash_algo()`Taylor Blau2-1/+38
In 253ed9ecff (hash.h: scaffolding for _unsafe hashing variants, 2024-09-26), we introduced "unsafe" variants of the SHA-1 hashing functions by introducing new functions like "unsafe_init_fn()" and so on. This approach has a major shortcoming that callers must remember to consistently use one variant or the other. Failing to consistently use (or not use) the unsafe variants can lead to crashes at best, or subtle memory corruption issues at worst. In the hashfile API, this isn't difficult to achieve, but verifying that all callers consistently use the unsafe variants is somewhat of a chore given how spread out all of the callers are. In the sha1 and sha1-unsafe test helpers, all of the calls to various hash functions are guarded by an "if (unsafe)" conditional, which is repetitive and cumbersome. Address these issues by introducing a new pattern whereby one 'git_hash_algo' can return a pointer to another 'git_hash_algo' that represents the unsafe version of itself. So instead of having something like: if (unsafe) the_hash_algo->init_fn(...); the_hash_algo->update_fn(...); the_hash_algo->final_fn(...); else the_hash_algo->unsafe_init_fn(...); the_hash_algo->unsafe_update_fn(...); the_hash_algo->unsafe_final_fn(...); we can instead write: struct git_hash_algo *algop = the_hash_algo; if (unsafe) algop = unsafe_hash_algo(algop); algop->init_fn(...); algop->update_fn(...); algop->final_fn(...); This removes the existing shortcoming by no longer forcing the caller to "remember" which variant of the hash functions it wants to call, only to hold onto a 'struct git_hash_algo' pointer that is initialized once. Similarly, while there currently is still a way to "mix" safe and unsafe functions, this too will go away after subsequent commits remove all direct calls to the unsafe_ variants. Note that hash_algo_by_ptr() needs an adjustment to allow passing in the unsafe variant of a hash function. All other query functions on the hash_algos array will continue to return the safe variants of any function. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23csum-file.c: extract algop from hashfile_checksum_valid()Taylor Blau1-6/+7
Perform a similar transformation as in the previous commit, but focused instead on hashfile_checksum_valid(). This function does not work with a hashfile structure itself, and instead validates the raw contents of a file written using the hashfile API. We'll want to be prepared for a similar change to this function in the future, so prepare ourselves for that by extracting 'the_hash_algo' into its own field for use within this function. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23csum-file: store the hash algorithm as a struct fieldTaylor Blau2-9/+12
Throughout the hashfile API, we rely on a reference to 'the_hash_algo', and call its _unsafe function variants directly. Prepare for a future change where we may use a different 'git_hash_algo' pointer (instead of just relying on 'the_hash_algo' throughout) by making the 'git_hash_algo' pointer a member of the 'hashfile' structure itself. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23t/helper/test-tool: implement sha1-unsafe helperTaylor Blau6-23/+45
With the new "unsafe" SHA-1 build knob, it is convenient to have a test-tool that can exercise Git's unsafe SHA-1 wrappers for testing, similar to 't/helper/test-tool sha1'. Implement that helper by altering the implementation of that test-tool (in cmd_hash_impl(), which is generic and parameterized over different hash functions) to conditionally run the unsafe variants of the chosen hash function, and expose the new behavior via a new 'sha1-unsafe' test helper. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23trace2: prevent segfault on config collection with valueless trueAdam Murray5-6/+18
When TRACE2 analytics is enabled, a configuration variable set to "valueless true" causes a segfault. Steps to Reproduce GIT_TRACE2=true GIT_TRACE2_CONFIG_PARAMS=status.* git -c status.relativePaths version Expected Result git version 2.46.0 Actual Result zsh: segmentation fault GIT_TRACE2=true Add checks to prevent the segfault and instead show that the variable without value. Signed-off-by: Adam Murray <ad@canva.com> Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23refs: fix creation of reflog entries for symrefsKarthik Nayak2-3/+9
The commit 297c09eabb (refs: allow multiple reflog entries for the same refname, 2024-12-16) added logic to exit early in `lock_ref_for_update()` after obtaining the required lock. This was added as a performance optimization on a false assumption that no further processing was required for reflog-only updates. However the assumption was wrong. For a symref's reflog entry, the update needs to be populated with the old_oid value, but the early exit skipped this necessary step. This caused a bug in Git 2.48 in the files backend where target references of symrefs being updated would create a corrupted reflog entry for the symref since the old_oid is not populated. Everything the early exit skipped in the code path is necessary for both regular and symbolic ref, so eliminate the mistaken optimization, and also add a test to ensure that such an issue doesn't arise in the future. Reported-by: Nika Layzell <nika@thelayzells.com> Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-22path-walk: drop redundant parse_tree() callJeff King1-1/+0
This call to parse_tree() was flagged by Coverity for ignoring the return value. But if we look a little further up the function, we can see that there is already a call to parse_tree_gently(), and we'll return early if that fails. So by this point the tree will always be parsed, and the call is redundant. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-22Merge branch 'ps/build-meson-fixes' into ps/zlib-ngJunio C Hamano7-38/+218
* ps/build-meson-fixes: ci: wire up Visual Studio build with Meson ci: raise error when Meson generates warnings meson: fix compilation with Visual Studio meson: make the CSPRNG backend configurable meson: wire up fuzzers meson: wire up generation of distribution archive meson: wire up development environments meson: fix dependencies for generated headers meson: populate project version via GIT-VERSION-GEN GIT-VERSION-GEN: allow running without input and output files GIT-VERSION-GEN: simplify computing the dirty marker
2025-01-22ci: wire up Visual Studio build with MesonPatrick Steinhardt2-0/+90
Add a new job to GitHub Actions and GitLab CI that builds and tests Meson-based builds with Visual Studio. A couple notes: - While the build job is mandatory, the test job is marked as "manual" on GitLab so that it doesn't run by default. We already have a bunch of Windows-based jobs, and the computational overhead that these cause is simply out of proportion to run the test suite twice. The same isn't true for GitHub as I could not find a way to make a subset of jobs manually triggered. - We disable Perl. This is because we pick up Perl from Git for Windows, which outputs different paths ("/c/" instead of "C:\") than what we expect in our tests. - We don't use the Git for Windows SDK. Instead, the build only depends on Visual Studio, Meson and Git for Windows. All the other dependencies like curl, pcre2 and zlib get pulled in and compiled automatically by Meson and thus do not have to be provided by the system. - We open-code "ci/run-test-slice.sh". This is because we only have direct access to PowerShell, so we manually implement the logic. There is an upstream pull request for the Meson build system [1] to implement test slicing in Meson directly. - We don't process test artifacts for failed CI jobs. This is done to keep down prerequisites to a minimum. All tests are passing. [1]: https://github.com/mesonbuild/meson/pull/14092 Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-22ci: raise error when Meson generates warningsPatrick Steinhardt1-0/+1
Meson prints warnings in several cases, like for example when using a feature supported by the current version of Meson, but not yet supported by the minimum required version as declared by the project. These warnings will not cause the setup to fail by default, which makes it quite easy to miss them. Improve this by passing `--fatal-meson-warnings` to `meson setup` so that our CI jobs will fail on warnings. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-22meson: fix compilation with Visual StudioPatrick Steinhardt1-0/+8
The Visual Studio compiler defaults to C89 unless explicitly asked to use a different version of the C standard. We don't specify any C standard at all though in our Meson build, and consequently compiling Git fails: ...\git\git-compat-util.h(14): fatal error C1189: #error: "Required C99 support is in a test phase. Please see git-compat-util.h for more details." Fix the issue by specifying the project's C standard. Funny enough, specifying C99 does not work because apparently, `__STDC_VERSION__` is not getting defined in that version at all. Instead, we have to specify C11 as the project's C standard, which is also done in our CMake build instructions. We don't want to generally enforce C11 though, as our requiremets only state that a C99 compiler is required. In fact, we don't even require plain C99, but rather the GNU variant thereof. Meson allows us to handle this case rather easily by specifying "gnu99,c11", which will cause it to fall back to C11 in case GNU C99 is unsupported. This feature has only been introduced with Meson 1.3.0 though, and we support 0.61.0 and newer. In case we use such an oldish version though we fall back to requiring GNU99 unconditionally. This means that Windows essentially requires Meson 1.3.0 and newer when using Visual Studio, but I doubt that this is ever going to be a real problem. Tested-by: M Hickford <mirth.hickford@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-22meson: make the CSPRNG backend configurablePatrick Steinhardt2-7/+23
The CSPRNG backend is not configurable in Meson and isn't quite discoverable, either. Make it configurable and add the actual backend used to the summary. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-22meson: wire up fuzzersPatrick Steinhardt4-1/+28
Meson does not yet know to build our fuzzers. Introduce a new build option "fuzzers" and wire up the fuzzers in case it is enabled. Adapt our CI jobs so that they build the fuzzers by default. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-22meson: wire up generation of distribution archivePatrick Steinhardt1-0/+13
Meson knows to generate distribution archives via `meson dist`. In addition to generating the archive itself, this target also knows to compile and execute tests from that archive, which helps to ensure that the result is an adequate drop-in replacement for the versioned project. While this already works as-is, one omission is that we don't propagate the commit that this is built from into the resulting archive. This can be fixed though by adding a distribution script that propagates the version into the "version" file, which GIT-VERSION-GEN knows to read if present. Use GIT-VERSION-GEN to populate that file. As the script is executed in the build directory, not in the directory where we generate the archive, we have to use a shell to resolve the "MESON_DIST_ROOT" environment variable. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-22meson: wire up development environmentsPatrick Steinhardt1-0/+8
The Meson build system is able to wire up development environments. The intent is to make build artifacts of the project available. This is typically used to export e.g. paths to linkable libraries, which isn't all that interesting in our context given that we don't have an official library interface. But what we can use this mechanism for is to expose the built Git executables as well as the build directory. This allows users to play around with the built Git version in the devenv, and allows them to execute our test scripts directly with the built distribution. Wire up this feature, which can then be used via `meson devenv` in the build directory. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-22meson: fix dependencies for generated headersPatrick Steinhardt1-9/+9
We generate a couple of headers from our documentation. These headers are added to the libgit sources, but two of them aren't used by the library, but instead by our builtins. This can cause parallel builds to fail because the builtin object may be compiled before the header was generated. Fix the issue by adding both "config-list.h" and "hook-list.h" to the list of builtin sources. While "command-list.h" is generated similarly, it is used by "help.c" and thus part of the libgit sources indeed. Reported-by: Evan Martin <evan.martin@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-22meson: populate project version via GIT-VERSION-GENPatrick Steinhardt1-1/+8
The Git version for Meson is currently wired up manually. It can thus grow (and already has grown) stale quite easily, as having multiple sources of truth is never a good idea. This issue is mostly of cosmetic nature as we don't use the project version anywhere, and instead use the GIT-VERSION-GEN script to propagate the correct version into our build. But it is somewhat puzzling when `meson setup` announces to build an old Git release. There are a couple of alternatives for how to solve this: - We can keep the version undefined, but this makes Meson output "undefined" for the version, as well. - We can use GIT-VERSION-GEN to generate the version for us. At the point of configuring the project we haven't yet figured out host details though, and thus we didn't yet set up the shell environment. While not an issue for Unix-based systems, this would be an issue in Windows, where the shell typically gets provided via Git for Windows and thus requires some special setup. - We can pull the default version out of GIT-VERSION-GEN and move it into its own file. This likely requires some adjustments for scripts that bump the version, but allows Meson to read the version from that file trivially. Pick the second option and use GIT-VERSION-GEN as it gives us the most accurate version. In order to fix the bootstrapping issue on Windows systems we simply set the version to 'unknown' in case no shell was found. As the version is only of cosmetic value this isn't really much of an issue. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-22GIT-VERSION-GEN: allow running without input and output filesPatrick Steinhardt1-15/+29
The GIT-VERSION-GEN script requires an input file containing formatting directives to be replaced as well as an output file that will get overwritten in case the file contents have changed. When computing the project version for Meson we don't want to have either though: - We only want to compute the version without anything else, but don't have an input file that would match that exact format. While we could of course introduce a new file just for that usecase, it feels suboptimal to add another file every time we want to have a slightly different format for versioned data. - The computed version needs to be read from stdout so that Meson can wire it up for the project. Extend the script to handle both usecases by recognizing `--format=` as alternative to providing an input path and by writing to stdout in case no output file was given. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>