| Age | Commit message (Collapse) | Author | Files | Lines |
|
"git -c help.autocorrect=0 psuh" shows the suggested typofix,
unlike the previous attempt in the base topic.
* da/help-autocorrect-one-fix:
help: add "show" as a valid configuration value
help: show the suggested command when help.autocorrect is false
|
|
"[help] autocorrect = 1" used to be a way to say "please wait for
0.1 second after suggesting a typofix of the command name before
running that command"; now it means "yes, if there is a plausible
typofix for the command name, please run it immediately".
* sc/help-autocorrect-one:
help: interpret boolean string values for help.autocorrect
|
|
Code shuffling.
* ms/remote-valid-remote-name:
remote: relocate valid_remote_name
|
|
Code clean-up. cf. <Z6G-toOJjMmK8iJG@pks.im>
* ms/refspec-cleanup:
refspec: relocate apply_refspecs and related funtions
refspec: relocate matching related functions
remote: rename query_refspecs functions
refspec: relocate refname_matches_negative_refspec_item
remote: rename function omit_name_by_refspec
|
|
Documentaiton updates.
* jp/doc-trailer-config:
config.txt: add trailer.* variables
|
|
"git gc" learned the "--expire-to" option and passes it down to
underlying "git repack".
* zh/gc-expire-to:
gc: add `--expire-to` option
|
|
Foreign language interface for Rust into our code base has been added.
* js/libgit-rust:
libgit: add higher-level libgit crate
libgit-sys: also export some config_set functions
libgit-sys: introduce Rust wrapper for libgit.a
common-main: split init and exit code into new files
|
|
Test clean-up.
* ac/t5401-use-test-path-is-file:
t5401: prefer test_path_is_* helper function
|
|
Test clean-up.
* ac/t6423-unhide-git-exit-status:
t6423: fix suppression of Git’s exit code in tests
|
|
"git repack --keep-unreachable" to send unreachable objects to the
main pack "git repack -ad" produces did not work when there is no
existing packs, which has been corrected.
* ps/repack-keep-unreachable-in-unpacked-repo:
builtin/repack: fix `--keep-unreachable` when there are no packs
|
|
"git pack-objects" and its wrapper "git repack" learned an option
to use an alternative path-hash function to improve delta-base
selection to produce a packfile with deeper history than window
size.
* ds/name-hash-tweaks:
pack-objects: prevent name hash version change
test-tool: add helper for name-hash values
p5313: add size comparison test
pack-objects: add GIT_TEST_NAME_HASH_VERSION
repack: add --name-hash-version option
pack-objects: add --name-hash-version option
pack-objects: create new name-hash function version
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
CI update to make Coverity job work again.
* jk/ci-coverity-update:
ci: set CI_JOB_IMAGE for coverity job
|
|
Convert a handful of unit tests to work with the clar framework.
* sk/unit-tests-0130:
t/unit-tests: convert strcmp-offset test to use clar test framework
t/unit-tests: convert strbuf test to use clar test framework
t/unit-tests: adapt example decorate test to use clar test framework
t/unit-tests: convert hashmap test to use clar test framework
|
|
Further code clean-up on the use of hash functions. Now the
context object knows what hash function it is working with.
* ps/hash-cleanup:
global: adapt callers to use generic hash context helpers
hash: provide generic wrappers to update hash contexts
hash: stop typedeffing the hash context
hash: convert hashing context to a structure
|
|
Two CI tasks, whitespace check and style check, work on the
difference from the base version and the version being checked, but
the base was computed incorrectly in GitLab CI in some cases, which
has been corrected.
* jt/gitlab-ci-base-fix:
ci: fix base commit fallback for check-whitespace and check-style
|
|
"git apply" internally uses unsigned long for line numbers and uses
strtoul() to parse numbers on the hunk headers. It however forgot
to check parse errors.
* pw/apply-ulong-overflow-check:
apply: detect overflow when parsing hunk header
|
|
"git init" to reinitialize a repository that already exists cannot
change the hash function and ref backends; such a request is
silently ignored now.
* ps/setup-reinit-fixes:
setup: fix reinit of repos with incompatible GIT_DEFAULT_HASH
setup: fix reinit of repos with incompatible GIT_DEFAULT_REF_FORMAT
t0001: remove duplicate test
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A few more leakfixes.
* ps/leakfixes-0129:
scalar: free result of `remote_default_branch()`
unix-socket: fix memory leak when chdir(3p) fails
|
|
The code paths to interact with zlib has been cleaned up in
preparation for building with zlib-ng.
* ps/zlib-ng:
ci: make "linux-musl" job use zlib-ng
ci: switch linux-musl to use Meson
compat/zlib: allow use of zlib-ng as backend
git-zlib: cast away potential constness of `next_in` pointer
compat/zlib: provide stubs for `deflateSetHeader()`
compat/zlib: provide `deflateBound()` shim centrally
git-compat-util: move include of "compat/zlib.h" into "git-zlib.h"
compat: introduce new "zlib.h" header
git-compat-util: drop `z_const` define
compat: drop `uncompress2()` compatibility shim
|
|
The code path used when "git fetch" fetches from a bundle file
closed the same file descriptor twice, which sometimes broke things
unexpectedly when the file descriptor was reused, which has been
corrected.
* js/bundle-unbundle-fd-reuse-fix:
bundle: avoid closing file descriptor twice
|
|
CI updates (containerization, dropping stale ones, etc.).
* ps/ci-misc-updates:
ci: remove stale code for Azure Pipelines
ci: use latest Ubuntu release
ci: stop special-casing for Ubuntu 16.04
gitlab-ci: add linux32 job testing against i386
gitlab-ci: remove the "linux-old" job
github: simplify computation of the job's distro
github: convert all Linux jobs to be containerized
github: adapt containerized jobs to be rootless
t7422: fix flaky test caused by buffered stdout
t0060: fix EBUSY in MinGW when setting up runtime prefix
|
|
The "--keep-unreachable" flag is supposed to append any unreachable
objects to the newly written pack. This flag is explicitly documented as
appending both packed and loose unreachable objects to the new packfile.
And while this works alright when repacking with preexisting packfiles,
it stops working when the repository does not have any packfiles at all.
The root cause are the conditions used to decide whether or not we want
to append "--pack-loose-unreachable" to git-pack-objects(1). There are
a couple of conditions here:
- `has_existing_non_kept_packs()` checks whether there are existing
packfiles. This condition makes sense to guard "--keep-pack=",
"--unpack-unreachable" and "--keep-unreachable", because all of
these flags only make sense in combination with existing packfiles.
But it does not make sense to disable `--pack-loose-unreachable`
when there aren't any preexisting packfiles, as loose objects can be
packed into the new packfile regardless of that.
- `delete_redundant` checks whether we want to delete any objects or
packs that are about to become redundant. The documentation of
`--keep-unreachable` explicitly says that `git repack -ad` needs to
be executed for the flag to have an effect.
It is not immediately obvious why such redundant objects need to be
deleted in order for "--pack-unreachable-objects" to be effective.
But as things are working as documented this is nothing we'll change
for now.
- `pack_everything & PACK_CRUFT` checks that we're not creating a
cruft pack. This condition makes sense in the context of
"--pack-loose-unreachable", as unreachable objects would end up in
the cruft pack anyway.
So while the second and third condition are sensible, it does not make
any sense to condition `--pack-loose-unreachable` on the existence of
packfiles.
Fix the bug by splitting out the "--pack-loose-unreachable" and only
making it depend on the second and third condition. Like this, loose
unreachable objects will be packed regardless of any preexisting
packfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move the `valid_remote_name()` function from the refspec subsystem to
the remote subsystem to better align with the separation of concerns.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move the functions `apply_refspecs()` and `apply_negative_refspecs()`
from `remote.c` to `refspec.c`. These functions focus on applying
refspecs, so centralizing them in `refspec.c` improves code organization
by keeping refspec-related logic in one place.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move the functions `refspec_find_match()`, `refspec_find_all_matches()`
and `refspec_find_negative_match()` from `remote.c` to `refspec.c`.
These functions focus on matching refspecs, so centralizing them in
`refspec.c` improves code organization by keeping refspec-related logic
in one place.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Rename functions related to handling refspecs in preparation for their
move from `remote.c` to `refspec.c`. Update their names to better
reflect their intent:
- `query_refspecs()` -> `refspec_find_match()` for clarity, as it
finds a single matching refspec.
- `query_refspecs_multiple()` -> `refspec_find_all_matches()` to
better reflect that it collects all matching refspecs instead of
returning just the first match.
- `query_matches_negative_refspec()` ->
`refspec_find_negative_match()` for consistency with the
updated naming convention, even though this static function
didn't strictly require renaming.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move the functions `refname_matches_negative_refspec_item()`,
`refspec_match()`, and `match_name_with_pattern()` from `remote.c` to
`refspec.c`. These functions focus on refspec matching, so placing them
in `refspec.c` aligns with the separation of concerns. Keep
refspec-related logic in `refspec.c` and remote-specific logic in
`remote.c` for better code organization.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Rename the function `omit_name_by_refspec()` to
`refname_matches_negative_refspec_item()` to provide clearer intent.
The previous function name was vague and did not accurately describe its
purpose. By using `refname_matches_negative_refspec_item`, make the
function's purpose more intuitive, clarifying that it checks if a
reference name matches any negative refspec.
Rename function parameters for consistency with existing naming
conventions. Use `refname` instead of `name` to align with terminology
in `refs.h`.
Remove the redundant doc comment since the function name is now
self-explanatory.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Some test in t6423 supress Git's exit code, which can cause test
failures go unnoticed. Specifically using git <subcommand> |
<other-command> masks potential failures of the Git command.
This commit ensures that Git's exit status is correctly propogated by:
- Avoiding pipes that suppress exit codes.
Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a literal value for showing the suggested autocorrection
for consistency with the rest of the help.autocorrect options.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Make the handling of false boolean values for help.autocorrect
consistent with the handling of value 0 by showing the suggested
commands but not running them.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* sc/help-autocorrect-one:
help: interpret boolean string values for help.autocorrect
|
|
"test -f" does not provide a nice error message when we hit test
failures, so use test_path_is_file instead.
Signed-off-by: ambar chakravartty <amch9605@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Fix bugs in an earlier attempt to fix "git refs migration".
* kn/reflog-migration-fix-fix:
refs/reftable: fix uninitialized memory access of `max_index`
reftable: write correct max_update_index to header
|
|
Code clean-up.
* kn/pack-write-with-reduced-globals:
pack-write: pass hash_algo to internal functions
pack-write: pass hash_algo to `write_rev_file()`
pack-write: pass hash_algo to `write_idx_file()`
pack-write: pass repository to `index_pack_lockfile()`
pack-write: pass hash_algo to `fixup_pack_header_footer()`
|
|
More build fixes and enhancements on meson based build procedure.
* ps/build-meson-fixes:
ci: wire up Visual Studio build with Meson
ci: raise error when Meson generates warnings
meson: fix compilation with Visual Studio
meson: make the CSPRNG backend configurable
meson: wire up fuzzers
meson: wire up generation of distribution archive
meson: wire up development environments
meson: fix dependencies for generated headers
meson: populate project version via GIT-VERSION-GEN
GIT-VERSION-GEN: allow running without input and output files
GIT-VERSION-GEN: simplify computing the dirty marker
|
|
Following the procedure we established to introduce breaking
changes for Git 3.0, allow an early opt-in for removing support of
$GIT_DIR/branches/ and $GIT_DIR/remotes/ directories to configure
remotes.
* ps/3.0-remote-deprecation:
remote: announce removal of "branches/" and "remotes/"
builtin/pack-redundant: remove subcommand with breaking changes
ci: repurpose "linux-gcc" job for deprecations
ci: merge linux-gcc-default into linux-gcc
Makefile: wire up build option for deprecated features
|
|
Code clean-up for code paths around combined diff.
* jk/combine-diff-cleanup:
tree-diff: make list tail-passing more explicit
tree-diff: simplify emit_path() list management
tree-diff: use the name "tail" to refer to list tail
tree-diff: drop list-tail argument to diff_tree_paths()
combine-diff: drop public declaration of combine_diff_path_size()
tree-diff: inline path_appendnew()
tree-diff: pass whole path string to path_appendnew()
tree-diff: drop path_appendnew() alloc optimization
run_diff_files(): de-mystify the size of combine_diff_path struct
diff: add a comment about combine_diff_path.parent.path
combine-diff: use pointer for parent paths
tree-diff: clear parent array in path_appendnew()
combine-diff: add combine_diff_path_new()
run_diff_files(): delay allocation of combine_diff_path
|
|
The API around choosing to use unsafe variant of SHA-1
implementation has been updated in an attempt to make it harder to
abuse.
* tb/unsafe-hash-cleanup:
hash.h: drop unsafe_ function variants
csum-file: introduce hashfile_checkpoint_init()
t/helper/test-hash.c: use unsafe_hash_algo()
csum-file.c: use unsafe_hash_algo()
hash.h: introduce `unsafe_hash_algo()`
csum-file.c: extract algop from hashfile_checksum_valid()
csum-file: store the hash algorithm as a struct field
t/helper/test-tool: implement sha1-unsafe helper
|
|
The main GitHub Actions workflow switched away from the "$distro"
variable in b133d3071a (github: simplify computation of the job's
distro, 2025-01-10). Since the Coverity job also depends on our
ci/install-dependencies.sh script, it needs to likewise set CI_JOB_IMAGE
to find the correct dependencies (without this patch, we don't install
curl and the build fails).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* ps/ci-misc-updates:
ci: remove stale code for Azure Pipelines
ci: use latest Ubuntu release
ci: stop special-casing for Ubuntu 16.04
gitlab-ci: add linux32 job testing against i386
gitlab-ci: remove the "linux-old" job
github: simplify computation of the job's distro
github: convert all Linux jobs to be containerized
github: adapt containerized jobs to be rootless
t7422: fix flaky test caused by buffered stdout
t0060: fix EBUSY in MinGW when setting up runtime prefix
|
|
Adapt strcmp-offset test script to clar framework by using clar
assertions where necessary. Introduce `test_strcmp_offset__empty()` to
verify `check_strcmp_offset()` behavior when both input strings are
empty. This ensures the function correctly handles edge cases and
returns expected values.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Adapt strbuf test script to clar framework by using clar assertions
where necessary.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce `test_example_decorate__initialize()` to explicitly set up
object IDs and retrieve corresponding objects before tests run. This
ensures a consistent and predictable test state without relying on data
from previous tests.
Add `test_example_decorate__cleanup()` to clear decorations after each
test, preventing interference between tests and ensuring each runs in
isolation.
Adapt example decorate test script to clar framework by using clar
assertions where necessary. Previously, tests relied on data written by
earlier tests, leading to unintended dependencies between them. This
explicitly initializes the necessary state within
`test_example_decorate__readd`, ensuring it does not depend on prior
test executions.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Adapts hashmap test script to clar framework by using clar assertions
where necessary.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The check-whitespace and check-style CI scripts require a base commit.
In GitLab CI, the base commit can be provided by several different
predefined CI variables depending on the type of pipeline being
performed.
In 30c4f7e350 (check-whitespace: detect if no base_commit is provided,
2024-07-23), the GitLab check-whitespace CI job was modified to support
CI_MERGE_REQUEST_DIFF_BASE_SHA as a fallback base commit if
CI_MERGE_REQUEST_TARGET_BRANCH_SHA was not provided. The same fallback
strategy was also implemented for the GitLab check-style CI job in
bce7e52d4e (ci: run style check on GitHub and GitLab, 2024-07-23).
The base commit fallback is implemented using shell parameter expansion
where, if the first variable is unset, the second variable is used as
fallback. In GitLab CI, these variables can be set but null. This has
the unintended effect of selecting an empty first variable which results
in CI jobs providing an invalid base commit and failing.
Fix the issue by defaulting to the fallback variable if the first is
unset or null.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Adapt callers to use generic hash context helpers instead of using the
hash algorithm to update them. This makes the callsites easier to reason
about and removes the possibility that the wrong hash algorithm is used
to update the hash context's state. And as a nice side effect this also
gets rid of a bunch of users of `the_hash_algo`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The hash context is supposed to be updated via the `git_hash_algo`
structure, which contains a list of function pointers to update, clone
or finalize a hashing context. This requires the callers to track which
algorithm was used to initialize the context and continue to use the
exact same algorithm. If they fail to do that correctly, it can happen
that we start to access context state of one hash algorithm with
functions of a different hash algorithm. The result would typically be a
segfault, as could be seen e.g. in the patches part of 98422943f0 (Merge
branch 'ps/weak-sha1-for-tail-sum-fix', 2025-01-01).
The situation was significantly improved starting with 04292c3796
(hash.h: drop unsafe_ function variants, 2025-01-23) and its parent
commits. These refactorings ensure that it is not possible to mix up
safe and unsafe variants of the same hash algorithm anymore. But in
theory, it is still possible to mix up different hash algorithms with
each other, even though this is a lot less likely to happen.
But still, we can do better: instead of asking the caller to remember
the hash algorithm used to initialize a context, we can instead make the
context itself remember which algorithm it has been initialized with. If
we do so, callers can use a set of generic helpers to update the context
and don't need to be aware of the hash algorithm at all anymore.
Adapt the context initialization functions to store the hash algorithm
in the hashing context and introduce these generic helpers. Callers will
be adapted in the subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We generally avoid using `typedef` in the Git codebase. One exception
though is the `git_hash_ctx`, likely because it used to be a union
rather than a struct until the preceding commit refactored it. But now
that it is a normal `struct` there isn't really a need for a typedef
anymore.
Drop the typedef and adapt all callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `git_hash_context` is a union containing the different hash-specific
states for SHA1, its unsafe variant as well as SHA256. We know that only
one of these states will ever be in use at the same time because hash
contexts cannot be used for multiple different hashes at the same point
in time.
We're about to extend the structure though to keep track of the hash
algorithm used to initialize the context, which is impossible to do
while the context is a union. Refactor it to instead be a structure that
contains the union of context states.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* tb/unsafe-hash-cleanup:
hash.h: drop unsafe_ function variants
csum-file: introduce hashfile_checkpoint_init()
t/helper/test-hash.c: use unsafe_hash_algo()
csum-file.c: use unsafe_hash_algo()
hash.h: introduce `unsafe_hash_algo()`
csum-file.c: extract algop from hashfile_checksum_valid()
csum-file: store the hash algorithm as a struct field
t/helper/test-tool: implement sha1-unsafe helper
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Doc and short-help text for "show-index" has been clarified to
stress that the command reads its data from the standard input.
* jc/show-index-h-update:
show-index: the short help should say the command reads from its input
|
|
Doc mark-up updates.
* ja/doc-notes-markup-updates:
doc: convert git-notes to new documentation format
|
|
Code clean-up.
* sk/strlen-returns-size_t:
date.c: Fix type missmatch warings from msvc
|
|
Doc mark-up updates.
* ja/doc-restore-markup-update:
doc: convert git-restore to new style format
|
|
The exact same issue as described in the preceding commit also exists
for GIT_DEFAULT_HASH. Thus, reinitializing a repository that e.g. uses
SHA1 with `GIT_DEFAULT_HASH=sha256 git init` will cause the object
format of that repository to change to SHA256. This is of course bogus
as any existing objects and refs will not be converted, thus causing
repository corruption:
$ git init repo
Initialized empty Git repository in /tmp/repo/.git/
$ cd repo/
$ git commit --allow-empty -m message
[main (root-commit) 35a7344] message
$ GIT_DEFAULT_HASH=sha256 git init
Reinitialized existing Git repository in /tmp/repo/.git/
$ git show
fatal: your current branch appears to be broken
Fix the issue by ignoring the environment variable in case the repo has
already been initialized with an object hash.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The GIT_DEFAULT_REF_FORMAT environment variable can be set to influence
the default ref format that new repostiories shall be initialized with.
While this is the expected behaviour when creating a new repository, it
is not when reinitializing a repository: we should retain the ref format
currently used by it in that case.
This doesn't work correctly right now:
$ git init --ref-format=files repo
Initialized empty Git repository in /tmp/repo/.git/
$ GIT_DEFAULT_REF_FORMAT=reftable git init repo
fatal: could not open '/tmp/repo/.git/refs/heads' for writing: Is a directory
Instead of retaining the current ref format, the reinitialization tries
to reinitialize the repository with the different format. This action
fails when git-init(1) tries to write the ".git/refs/heads" stub, which
in the context of the reftable backend is always written as a file so
that we can detect clients which inadvertently try to access the repo
with the wrong ref format. Seems like the protection mechanism works for
this case, as well.
Fix the issue by ignoring the environment variable in case the repo has
already been initialized with a ref storage format.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The test in question is an exact copy of the testcase preceding it.
Remove it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git apply" uses strtoul() to parse the numbers in the hunk header but
silently ignores overflows. As LONG_MAX is a legitimate return value for
strtoul() we need to set errno to zero before the call to strtoul() and
check that it is still zero afterwards. The error message we display is
not particularly helpful as it does not say what was wrong. However, it
seems pretty unlikely that users are going to trigger this error in
practice and we can always improve it later if needed.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We don't free the result of `remote_default_branch()`, leading to a
memory leak. This leak is exposed by t9211, but only when run with Meson
with the `-Db_sanitize=leak` option:
Direct leak of 5 byte(s) in 1 object(s) allocated from:
#0 0x5555555cfb93 in malloc (scalar+0x7bb93)
#1 0x5555556b05c2 in do_xmalloc ../wrapper.c:55:8
#2 0x5555556b06c4 in do_xmallocz ../wrapper.c:89:8
#3 0x5555556b0656 in xmallocz ../wrapper.c:97:9
#4 0x5555556b0728 in xmemdupz ../wrapper.c:113:16
#5 0x5555556b07a7 in xstrndup ../wrapper.c:119:9
#6 0x5555555d3a4b in remote_default_branch ../scalar.c:338:14
#7 0x5555555d20e6 in cmd_clone ../scalar.c:493:28
#8 0x5555555d196b in cmd_main ../scalar.c:992:14
#9 0x5555557c4059 in main ../common-main.c:64:11
#10 0x7ffff7a2a1fb in __libc_start_call_main (/nix/store/h7zcxabfxa7v5xdna45y2hplj31ncf8a-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: 0a855678aa0cb573cecbb2bcc73ab8239ec472d0)
#11 0x7ffff7a2a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/h7zcxabfxa7v5xdna45y2hplj31ncf8a-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: 0a855678aa0cb573cecbb2bcc73ab8239ec472d0)
#12 0x555555592054 in _start (scalar+0x3e054)
DEDUP_TOKEN: __interceptor_malloc--do_xmalloc--do_xmallocz--xmallocz--xmemdupz--xstrndup--remote_default_branch--cmd_clone--cmd_main--main--__libc_start_call_main--__libc_start_main@GLIBC_2.2.5--_start
SUMMARY: LeakSanitizer: 5 byte(s) leaked in 1 allocation(s).
As the `branch` variable may contain a string constant obtained from
parsing command line arguments we cannot free the leaking variable
directly. Instead, introduce a new `branch_to_free` variable that only
ever gets assigned the allocated string and free that one to plug the
leak.
It is unclear why the leak isn't flagged when running the test via our
Makefile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When trying to create a Unix socket in a path that exceeds the maximum
socket name length we try to first change the directory into the parent
folder before creating the socket to reduce the length of the name. When
this fails we error out of `unix_sockaddr_init()` with an error code,
which indicates to the caller that the context has not been initialized.
Consequently, they don't release that context.
This leads to a memory leak: when we have already populated the context
with the original directory that we need to chdir(3p) back into, but
then the chdir(3p) into the socket's parent directory fails, then we
won't release the original directory's path. The leak is exposed by
t0301, but only when running tests in a directory hierarchy whose path
is long enough to make the socket name length exceed the maximum socket
name length:
Direct leak of 129 byte(s) in 1 object(s) allocated from:
#0 0x5555555e85c6 in realloc.part.0 lsan_interceptors.cpp.o
#1 0x55555590e3d6 in xrealloc ../wrapper.c:140:8
#2 0x5555558c8fc6 in strbuf_grow ../strbuf.c:114:2
#3 0x5555558cacab in strbuf_getcwd ../strbuf.c:605:3
#4 0x555555923ff6 in unix_sockaddr_init ../unix-socket.c:65:7
#5 0x555555923e42 in unix_stream_connect ../unix-socket.c:84:6
#6 0x55555562a984 in send_request ../builtin/credential-cache.c:46:11
#7 0x55555562a89e in do_cache ../builtin/credential-cache.c:108:6
#8 0x55555562a655 in cmd_credential_cache ../builtin/credential-cache.c:178:3
#9 0x555555700547 in run_builtin ../git.c:480:11
#10 0x5555556ff0e0 in handle_builtin ../git.c:740:9
#11 0x5555556ffee8 in run_argv ../git.c:807:4
#12 0x5555556fee6b in cmd_main ../git.c:947:19
#13 0x55555593f689 in main ../common-main.c:64:11
#14 0x7ffff7a2a1fb in __libc_start_call_main (/nix/store/h7zcxabfxa7v5xdna45y2hplj31ncf8a-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: 0a855678aa0cb573cecbb2bcc73ab8239ec472d0)
#15 0x7ffff7a2a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/h7zcxabfxa7v5xdna45y2hplj31ncf8a-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: 0a855678aa0cb573cecbb2bcc73ab8239ec472d0)
#16 0x5555555ad1d4 in _start (git+0x591d4)
DEDUP_TOKEN: ___interceptor_realloc.part.0--xrealloc--strbuf_grow--strbuf_getcwd--unix_sockaddr_init--unix_stream_connect--send_request--do_cache--cmd_credential_cache--run_builtin--handle_builtin--run_argv--cmd_main--main--__libc_start_call_main--__libc_start_main@GLIBC_2.2.5--_start
SUMMARY: LeakSanitizer: 129 byte(s) leaked in 1 allocation(s).
Fix this leak.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The C functions exported by libgit-sys do not provide an idiomatic Rust
interface. To make it easier to use these functions via Rust, add a
higher-level "libgit" crate, that wraps the lower-level configset API
with an interface that is more Rust-y.
This combination of $X and $X-sys crates is a common pattern for FFI in
Rust, as documented in "The Cargo Book" [1].
[1] https://doc.rust-lang.org/cargo/reference/build-scripts.html#-sys-packages
Co-authored-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In preparation for implementing a higher-level Rust API for accessing
Git configs, export some of the upstream configset API via libgitpub and
libgit-sys. Since this will be exercised as part of the higher-level API
in the next commit, no tests have been added for libgit-sys.
While we're at it, add git_configset_alloc() and git_configset_free()
functions in libgitpub so that callers can manage config_set structs on
the heap. This also allows non-C external consumers to treat config_sets
as opaque structs.
Co-authored-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "git refs migrate" command did not migrate the reflog for
refs/stash, which is the contents of the stashes, which has been
corrected.
* ps/reflog-migration-with-logall-fix:
refs: fix migration of reflogs respecting "core.logAllRefUpdates"
|
|
The trace2 code was not prepared to show a configuration variable
that is set to true using the valueless true syntax, which has been
corrected.
* am/trace2-with-valueless-true:
trace2: prevent segfault on config collection with valueless true
|
|
reflog entries for symbolic ref updates were broken, which has been
corrected.
* kn/reflog-symref-fix:
refs: fix creation of reflog entries for symrefs
|
|
"git branch --sort=..." and "git for-each-ref --format=... --sort=..."
did not work as expected with some atoms, which has been corrected.
* rs/ref-fitler-used-atoms-value-fix:
ref-filter: remove ref_format_clear()
ref-filter: move is-base tip to used_atom
ref-filter: move ahead-behind bases into used_atom
|
|
Doc updates.
* ja/doc-commit-markup-updates:
doc: migrate git-commit manpage secondary files to new format
doc: convert git commit config to new format
doc: make more direct explanations in git commit options
doc: the mode param of -u of git commit is optional
doc: apply new documentation guidelines to git commit
|
|
Introduce a new API to visit objects in batches based on a common
path, or by type.
* ds/path-walk-1:
path-walk: drop redundant parse_tree() call
path-walk: reorder object visits
path-walk: mark trees and blobs as UNINTERESTING
path-walk: visit tags and cached objects
path-walk: allow consumer to specify object types
t6601: add helper for testing path-walk API
test-lib-functions: add test_cmp_sorted
path-walk: introduce an object walk by path
|
|
Introduce libgit-sys, a Rust wrapper crate that allows Rust code to call
functions in libgit.a. This initial patch defines build rules and an
interface that exposes user agent string getter functions as a proof of
concept. This library can be tested with `cargo test`. In later commits,
a higher-level library containing a more Rust-friendly interface will be
added at `contrib/libgit-rs`.
Symbols in libgit can collide with symbols from other libraries such as
libgit2. We avoid this by first exposing library symbols in
public_symbol_export.[ch]. These symbols are prepended with "libgit_" to
avoid collisions and set to visible using a visibility pragma. In
build.rs, Rust builds contrib/libgit-rs/libgit-sys/libgitpub.a, which also
contains libgit.a and other dependent libraries, with
-fvisibility=hidden to hide all symbols within those libraries that
haven't been exposed with a visibility pragma.
Co-authored-by: Kyle Lippincott <spectral@google.com>
Co-authored-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Currently, object files in libgit.a reference common_exit(), which is
contained in common-main.o. However, common-main.o also includes main(),
which references cmd_main() in git.o, which in turn depends on all the
builtin/*.o objects.
We would like to allow external users to link libgit.a without needing
to include so many extra objects. Enable this by splitting common_exit()
and check_bug_if_BUG() into a new file common-exit.c, and add
common-exit.o to LIB_OBJS so that these are included in libgit.a.
This split has previously been proposed ([1], [2]) to support fuzz tests
and unit tests by avoiding conflicting definitions for main(). However,
both of those issues were resolved by other methods of avoiding symbol
conflicts. Now we are trying to make libgit.a more self-contained, so
hopefully we can revisit this approach.
Additionally, move the initialization code out of main() into a new
init_git() function in its own file. Include this in libgit.a as well,
so that external users can share our setup code without calling our
main().
[1] https://lore.kernel.org/git/Yp+wjCPhqieTku3X@google.com/
[2] https://lore.kernel.org/git/20230517-unit-tests-v2-v2-1-21b5b60f4b32@google.com/
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We don't yet have any test coverage for the new zlib-ng backend as part
of our CI. Add it by installing zlib-ng in Alpine Linux, which causes
Meson to pick it up automatically.
Note that we are somewhat limited with regards to where we run that job:
Debian-based distributions don't have zlib-ng in their repositories,
Fedora has it but doesn't run tests, and Alma Linux doesn't have the
package either. Alpine Linux does have it available and is running our
test suite, which is why it was picked.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Switch over the "linux-musl" job to use Meson instead of Makefiles. This
is done due to multiple reasons:
- It simplifies our CI infrastructure a bit as we don't have to
manually specify a couple of build options anymore.
- It verifies that Meson detects and sets those build options
automatically.
- It makes it easier for us to wire up a new CI job using zlib-ng as
backend.
One platform compatibility that Meson cannot easily detect automatically
is the `GIT_TEST_UTF8_LOCALE` variable used in tests. Wire up a build
option for it, which we set via a new "MESONFLAGS" environment variable.
Note that we also drop the CC variable, which is set to "gcc". We
already default to GCC when CC is unset in "ci/lib.sh", so this is not
needed.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The zlib-ng library is a hard fork of the old and venerable zlib
library. It describes itself as zlib replacement with optimizations for
"next generation" systems. As such, it contains several implementations
of central algorithms using for example SSE2, AVX2 and other vectorized
CPU intrinsics that supposedly speed up in- and deflating data.
And indeed, compiling Git against zlib-ng leads to a significant speedup
when reading objects. The following benchmark uses git-cat-file(1) with
`--batch --batch-all-objects` in the Git repository:
Benchmark 1: zlib
Time (mean ± σ): 52.085 s ± 0.141 s [User: 51.500 s, System: 0.456 s]
Range (min … max): 52.004 s … 52.335 s 5 runs
Benchmark 2: zlib-ng
Time (mean ± σ): 40.324 s ± 0.134 s [User: 39.731 s, System: 0.490 s]
Range (min … max): 40.135 s … 40.484 s 5 runs
Summary
zlib-ng ran
1.29 ± 0.01 times faster than zlib
So we're looking at a ~25% speedup compared to zlib. This is of course
an extreme example, as it makes us read through all objects in the
repository. But regardless, it should be possible to see some sort of
speedup in most commands that end up accessing the object database.
The zlib-ng library provides a compatibility layer that makes it a
proper drop-in replacement for zlib: nothing needs to change in the
build system to support it. Unfortunately though, this mode isn't easy
to use on most systems because distributions do not allow you to install
zlib-ng in that way, as that would mean that the zlib library would be
globally replaced. Instead, many distributions provide a package that
installs zlib-ng without the compatibility layer. This version does
provide effectively the same APIs like zlib does, but all of the symbols
are prefixed with `zng_` to avoid symbol collisions.
Implement a new build option that allows us to link against zlib-ng
directly. If set, we redefine zlib symbols so that we use the `zng_`
prefixed versions thereof provided by that library. Like this, it
becomes possible to install both zlib and zlib-ng (without the compat
layer) and then pick whichever library one wants to link against for
Git.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `struct git_zstream::next_in` variable points to the input data and
is used in combination with `struct z_stream::next_in`. While that
latter field is not marked as a constant in zlib, it is marked as such
in zlib-ng. This causes a couple of compiler errors when we try to
assign these fields to one another due to mismatching constness.
Fix the issue by casting away the potential constness of `next_in`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The function `deflateSetHeader()` has been introduced with zlib v1.2.2.1,
so we don't use it when linking against an older version of it. Refactor
the code to instead provide a central stub via "compat/zlib.h" so that
we can adapt it based on whether or not we use zlib-ng in a subsequent
commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `deflateBound()` function has only been introduced with zlib 1.2.0.
When linking against a zlib version older than that we thus provide our
own compatibility shim. Move this shim into "compat/zlib.h" so that we
can adapt it based on whether or not we use zlib-ng in a subsequent
commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We include "compat/zlib.h" in "git-compat-util.h", which is
unnecessarily broad given that we only have a small handful of files
that use the zlib library. Move the header into "git-zlib.h" instead and
adapt users of zlib to include that header.
One exception is the reftable library, as we don't want to use the
Git-specific wrapper of zlib there, so we include "compat/zlib.h"
instead. Furthermore, we move the include into "reftable/system.h" so
that users of the library other than Git can wire up zlib themselves.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce a new "compat/zlib-compat.h" header that we include instead of
including <zlib.h> directly. This will allow us to wire up zlib-ng as an
alternative backend for zlib compression in a subsequent commit.
Note that we cannot just call the file "compat/zlib.h", as that may
otherwise cause us to include that file instead of <zlib.h>.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Before including <zlib.h> we explicitly define `z_const` to an empty
value. This has the effect that the `z_const` macro in "zconf.h" itself
will remain empty instead of being defined as `const`, which effectively
adapts a couple of APIs so that their parameters are not marked as being
constants.
It is dubious though whether this is something we actually want: not
marking a parameter as a constant doesn't make it any less constant than
it was. The define was added via 07564773c2 (compat: auto-detect if zlib
has uncompress2(), 2022-01-24), where it was seemingly carried over from
our internal compatibility shim for `uncompress2()` that was removed in
the preceding commit. The commit message doesn't mention why we carry
over the define and make it public, either, and I cannot think of any
reason for why we would want to have it.
Drop the define.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Our compat library has an implementation of zlib's `uncompress2()`
function that gets used when linking against an old version of zlib
that doesn't yet have it. The last user of `uncompress2()` got removed
in 15a60b747e (reftable/block: open-code call to `uncompress2()`,
2024-04-08), so the compatibility code is not required anymore. Drop it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Test fix.
* jp/t8002-printf-fix:
t8002: fix ambiguous printf conversion specifications
|
|
The reftable/ library code has been made -Wsign-compare clean.
* ps/reftable-sign-compare:
reftable: address trivial -Wsign-compare warnings
reftable/blocksource: adjust `read_block()` to return `ssize_t`
reftable/blocksource: adjust type of the block length
reftable/block: adjust type of the restart length
reftable/block: adapt header and footer size to return a `size_t`
reftable/basics: adjust `hash_size()` to return `uint32_t`
reftable/basics: adjust `common_prefix_size()` to return `size_t`
reftable/record: handle overflows when decoding varints
reftable/record: drop unused `print` function pointer
meson: stop disabling -Wsign-compare
|
|
The "cache" credential back-end did not handle authtype correctly,
which has been corrected.
* mh/credential-cache-authtype-request-fix:
credential-cache: respect authtype capability
|
|
It was possible for "git unpack-objects" and "git index-pack" to
make an unaligned access, which has been corrected.
* jk/pack-header-parse-alignment-fix:
index-pack, unpack-objects: use skip_prefix to avoid magic number
index-pack, unpack-objects: use get_be32() for reading pack header
parse_pack_header_option(): avoid unaligned memory writes
packfile: factor out --pack_header argument parsing
bswap.h: squelch potential sparse -Wcast-truncate warnings
|
|
The meson-driven build is now aware of "git-subtree" housed in
contrib/subtree hierarchy.
* ps/build-meson-subtree:
meson: wire up the git-subtree(1) command
meson: introduce build option for contrib
contrib/subtree: fix building docs
|
|
The code in connect.c has been updated to work around complaints
from -Wsign-compare.
* mh/connect-sign-compare:
connect: address -Wsign-compare warnings
|
|
Move a few more unit tests to the clar test framework.
* sk/unit-tests:
t/unit-tests: convert reftable tree test to use clar test framework
t/unit-tests: adapt priority queue test to use clar test framework
t/unit-tests: convert mem-pool test to use clar test framework
t/unit-tests: handle dashes in test suite filenames
|
|
The help text from "git $cmd -h" appear on the standard output for
some $cmd and the standard error for others. The built-in commands
have been fixed to show them on the standard output consistently.
* jc/show-usage-help:
builtin: send usage() help text to standard output
oddballs: send usage() help text to standard output
builtins: send usage_with_options() help text to standard output
usage: add show_usage_if_asked()
parse-options: add show_usage_with_options_if_asked()
t0012: optionally check that "-h" output goes to stdout
|
|
When the --name-hash-version option is used in 'git pack-objects', it
can change from the initial assignment to when it is used based on
interactions with other arguments. Specifically, when writing or reading
bitmaps, we must force version 1 for now. This could change in the
future when the bitmap format can store a name hash version value,
indicating which was used during the writing of the packfile.
Protect the 'git pack-objects' process from getting confused by failing
with a BUG() statement if the value of the name hash version changes
between calls to pack_name_hash_fn().
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a new test-tool helper, name-hash, to output the value of the
name-hash algorithms for the input list of strings, one per line.
Since the name-hash values can be stored in the .bitmap files, it is
important that these hash functions do not change across Git versions.
Add a simple test to t5310-pack-bitmaps.sh to provide some testing of
the current values. Due to how these functions are implemented, it would
be difficult to change them without disturbing these values. The paths
used for this test are carefully selected to demonstrate some of the
behavior differences of the two current name hash versions, including
which conditions will cause them to collide.
Create a performance test that uses test_size to demonstrate how
collisions occur for these hash algorithms. This test helps inform
someone as to the behavior of the name-hash algorithms for their repo
based on the paths at HEAD.
My copy of the Git repository shows modest statistics around the
collisions of the default name-hash algorithm:
Test this tree
--------------------------------------------------
5314.1: paths at head 4.5K
5314.2: distinct hash value: v1 4.1K
5314.3: maximum multiplicity: v1 13
5314.4: distinct hash value: v2 4.2K
5314.5: maximum multiplicity: v2 9
Here, the maximum collision multiplicity is 13, but around 10% of paths
have a collision with another path.
In a more interesting example, the microsoft/fluentui [1] repo had these
statistics at time of committing:
Test this tree
--------------------------------------------------
5314.1: paths at head 19.5K
5314.2: distinct hash value: v1 8.2K
5314.3: maximum multiplicity: v1 279
5314.4: distinct hash value: v2 17.8K
5314.5: maximum multiplicity: v2 44
[1] https://github.com/microsoft/fluentui
That demonstrates that of the nearly twenty thousand path names, they
are assigned around eight thousand distinct values. 279 paths are
assigned to a single value, leading the packing algorithm to sort
objects from those paths together, by size.
With the v2 name hash function, the maximum multiplicity lowers to 44,
leaving some room for further improvement.
In a more extreme example, an internal monorepo had a much worse
collision rate:
Test this tree
--------------------------------------------------
5314.1: paths at head 227.3K
5314.2: distinct hash value: v1 72.3K
5314.3: maximum multiplicity: v1 14.4K
5314.4: distinct hash value: v2 166.5K
5314.5: maximum multiplicity: v2 138
Here, we can see that the v2 name hash function provides somem
improvements, but there are still a number of collisions that could lead
to repacking problems at this scale.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
As custom options are added to 'git pack-objects' and 'git repack' to
adjust how compression is done, use this new performance test script to
demonstrate their effectiveness in performance and size.
The recently-added --name-hash-version option allows for testing
different name hash functions. Version 2 intends to preserve some of the
locality of version 1 while more often breaking collisions due to long
filenames.
Distinguishing objects by more of the path is critical when there are
many name hash collisions and several versions of the same path in the
full history, giving a significant boost to the full repack case. The
locality of the hash function is critical to compressing something like
a shallow clone or a thin pack representing a push of a single commit.
This can be seen by running pt5313 on the open source fluentui
repository [1]. Most commits will have this kind of output for the thin
and big pack cases, though certain commits (such as [2]) will have
problematic thin pack size for other reasons.
[1] https://github.com/microsoft/fluentui
[2] a637a06df05360ce5ff21420803f64608226a875
Checked out at the parent of [2], I see the following statistics:
Test HEAD
---------------------------------------------------------------
5313.2: thin pack with version 1 0.37(0.44+0.02)
5313.3: thin pack size with version 1 1.2M
5313.4: big pack with version 1 2.04(7.77+0.23)
5313.5: big pack size with version 1 20.4M
5313.6: shallow fetch pack with version 1 1.41(2.94+0.11)
5313.7: shallow pack size with version 1 34.4M
5313.8: repack with version 1 95.70(676.41+2.87)
5313.9: repack size with version 1 439.3M
5313.10: thin pack with version 2 0.12(0.12+0.06)
5313.11: thin pack size with version 2 22.0K
5313.12: big pack with version 2 2.80(5.43+0.34)
5313.13: big pack size with version 2 25.9M
5313.14: shallow fetch pack with version 2 1.77(2.80+0.19)
5313.15: shallow pack size with version 2 33.7M
5313.16: repack with version 2 33.68(139.52+2.58)
5313.17: repack size with version 2 160.5M
To make comparisons easier, I will reformat this output into a different
table style:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|--------------|---------|---------|---------|---------|
| Thin Pack | 0.37 s | 0.12 s | 1.2 M | 22.0 K |
| Big Pack | 2.04 s | 2.80 s | 20.4 M | 25.9 M |
| Shallow Pack | 1.41 s | 1.77 s | 34.4 M | 33.7 M |
| Repack | 95.70 s | 33.68 s | 439.3 M | 160.5 M |
The v2 hash function successfully differentiates the CHANGELOG.md files
from each other, which leads to significant improvements in the thin
pack (simulating a push of this commit) and the full repack. There is
some bloat in the "big pack" scenario and essentially the same results
for the shallow pack.
In the case of the Git repository, these numbers show some of the issues
with this approach:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|--------------|---------|---------|---------|---------|
| Thin Pack | 0.02 s | 0.02 s | 1.1 K | 1.1 K |
| Big Pack | 1.69 s | 1.95 s | 13.5 M | 14.5 M |
| Shallow Pack | 1.26 s | 1.29 s | 12.0 M | 12.2 M |
| Repack | 29.51 s | 29.01 s | 237.7 M | 238.2 M |
Here, the attempts to remove conflicts in the v2 function seem to cause
slight bloat to these sizes. This shows that the Git repository benefits
a lot from cross-path delta pairs.
The results are similar with the nodejs/node repo:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|--------------|---------|---------|---------|---------|
| Thin Pack | 0.02 s | 0.02 s | 1.6 K | 1.6 K |
| Big Pack | 4.61 s | 3.26 s | 56.0 M | 52.8 M |
| Shallow Pack | 7.82 s | 7.51 s | 104.6 M | 107.0 M |
| Repack | 88.90 s | 73.75 s | 740.1 M | 764.5 M |
Here, the v2 name-hash causes some size bloat more often than it reduces
the size, but it also universally improves performance time, which is an
interesting reversal. This must mean that it is helping to short-circuit
some delta computations even if it is not finding the most efficient
ones. The performance improvement cannot be explained only due to the
I/O cost of writing the resulting packfile.
The Linux kernel repository was the initial target of the default name
hash value, and its naming conventions are practically build to take the
most advantage of the default name hash values:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|--------------|----------|----------|---------|---------|
| Thin Pack | 0.17 s | 0.07 s | 4.6 K | 4.6 K |
| Big Pack | 17.88 s | 12.35 s | 201.1 M | 159.1 M |
| Shallow Pack | 11.05 s | 22.94 s | 269.2 M | 273.8 M |
| Repack | 727.39 s | 566.95 s | 2.5 G | 2.5 G |
Here, the thin and big packs gain some performance boosts in time, with
a modest gain in the size of the big pack. The shallow pack, however, is
more expensive to compute, likely because similarly-named files across
different directories are farther apart in the name hash ordering in v2.
The repack also gains benefits in computation time but no meaningful
change to the full size.
Finally, an internal Javascript repo of moderate size shows significant
gains when repacking with --name-hash-version=2 due to it having many name
hash collisions. However, it's worth noting that only the full repack
case has significant differences from the v1 name hash:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|-----------|-----------|----------|---------|---------|
| Thin Pack | 8.28 s | 7.28 s | 16.8 K | 16.8 K |
| Big Pack | 12.81 s | 11.66 s | 29.1 M | 29.1 M |
| Shallow | 4.86 s | 4.06 s | 42.5 M | 44.1 M |
| Repack | 3126.50 s | 496.33 s | 6.2 G | 855.6 M |
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a new environment variable to opt-in to different values of the
--name-hash-version=<n> option in 'git pack-objects'. This allows for
extra testing of the feature without repeating all of the test
scenarios. Unlike many GIT_TEST_* variables, we are choosing to not add
this to the linux-TEST-vars CI build as that test run is already
overloaded. The behavior exposed by this test variable is of low risk
and should be sufficient to allow manual testing when an issue arises.
But this option isn't free. There are a few tests that change behavior
with the variable enabled.
First, there are a few tests that are very sensitive to certain delta
bases being picked. These are both involving the generation of thin
bundles and then counting their objects via 'git index-pack --fix-thin'
which pulls the delta base into the new packfile. For these tests,
disable the option as a decent long-term option.
Second, there are some tests that compare the exact output of a 'git
pack-objects' process when using bitmaps. The warning that ignores the
--name-hash-version=2 and forces version 1 causes these tests to fail.
Disable the environment variable to get around this issue.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The new '--name-hash-version' option for 'git repack' is a simple
pass-through to the underlying 'git pack-objects' subcommand. However,
this subcommand may have other options and a temporary filename as part
of the subcommand execution that may not be predictable or could change
over time.
The existing test_subcommand method requires an exact list of arguments
for the subcommand. This is too rigid for our needs here, so create a
new method, test_subcommand_flex. Use it to check that the
--name-hash-version option is passing through.
Since we are modifying the 'git repack' command, let's bring its usage
in line with the Documentation's synopsis. This removes it from the
allow list in t0450 so it will remain in sync in the future.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The previous change introduced a new pack_name_hash_v2() function that
intends to satisfy much of the hash locality features of the existing
pack_name_hash() function while also distinguishing paths with similar
final components of their paths.
This change adds a new --name-hash-version option for 'git pack-objects'
to allow users to select their preferred function version. This use of
an integer version allows for future expansion and a direct way to later
store a name hash version in the .bitmap format.
For now, let's consider how effective this mechanism is when repacking a
repository with different name hash versions. Specifically, we will
execute 'git pack-objects' the same way a 'git repack -adf' process
would, except we include --name-hash-version=<n> for testing.
On the Git repository, we do not expect much difference. All path names
are short. This is backed by our results:
| Stage | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone | 260 MB | N/A |
| --name-hash-version=1 | 127 MB | 129s |
| --name-hash-version=2 | 127 MB | 112s |
This example demonstrates how there is some natural overhead coming from
the cloned copy because the server is hosting many forks and has not
optimized for exactly this set of reachable objects. But the full repack
has similar characteristics for both versions.
Let's consider some repositories that are hitting too many collisions
with version 1. First, let's explore the kinds of paths that are
commonly causing these collisions:
* "/CHANGELOG.json" is 15 characters, and is created by the beachball
[1] tool. Only the final character of the parent directory can
differentiate different versions of this file, but also only the two
most-significant digits. If that character is a letter, then this is
always a collision. Similar issues occur with the similar
"/CHANGELOG.md" path, though there is more opportunity for
differences In the parent directory.
* Localization files frequently have common filenames but
differentiates via parent directories. In C#, the name
"/strings.resx.lcl" is used for these localization files and they
will all collide in name-hash.
[1] https://github.com/microsoft/beachball
I've come across many other examples where some internal tool uses a
common name across multiple directories and is causing Git to repack
poorly due to name-hash collisions.
One open-source example is the fluentui [2] repo, which uses beachball
to generate CHANGELOG.json and CHANGELOG.md files, and these files have
very poor delta characteristics when comparing against versions across
parent directories.
| Stage | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone | 694 MB | N/A |
| --name-hash-version=1 | 438 MB | 728s |
| --name-hash-version=2 | 168 MB | 142s |
[2] https://github.com/microsoft/fluentui
In this example, we see significant gains in the compressed packfile
size as well as the time taken to compute the packfile.
Using a collection of repositories that use the beachball tool, I was
able to make similar comparisions with dramatic results. While the
fluentui repo is public, the others are private so cannot be shared for
reproduction. The results are so significant that I find it important to
share here:
| Repo | --name-hash-version=1 | --name-hash-version=2 |
|----------|-----------------------|-----------------------|
| fluentui | 440 MB | 161 MB |
| Repo B | 6,248 MB | 856 MB |
| Repo C | 37,278 MB | 6,755 MB |
| Repo D | 131,204 MB | 7,463 MB |
Future changes could include making --name-hash-version implied by a config
value or even implied by default during a full repack.
It is important to point out that the name hash value is stored in the
.bitmap file format, so we must force --name-hash-version=1 when bitmaps
are being read or written. Later, the bitmap format could be updated to
be aware of the name hash version so deltas can be quickly computed
across the bitmapped/not-bitmapped boundary. To promote the safety of
this parameter, the validate_name_hash_version() method will die() if
the given name-hash version is incorrect and will disable newer versions
if not yet compatible with other features, such as --write-bitmap-index.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
As we will explore in later changes, the default name-hash function used
in 'git pack-objects' has a tendency to cause collisions and cause poor
delta selection. This change creates an alternative that avoids some
collisions while preserving some amount of hash locality.
The pack_name_hash() method has not been materially changed since it was
introduced in ce0bd64 (pack-objects: improve path grouping
heuristics., 2006-06-05). The intention here is to group objects by path
name, but also attempt to group similar file types together by making
the most-significant digits of the hash be focused on the final
characters.
Here's the crux of the implementation:
/*
* This effectively just creates a sortable number from the
* last sixteen non-whitespace characters. Last characters
* count "most", so things that end in ".c" sort together.
*/
while ((c = *name++) != 0) {
if (isspace(c))
continue;
hash = (hash >> 2) + (c << 24);
}
As the comment mentions, this only cares about the last sixteen
non-whitespace characters. This cause some filenames to collide more than
others. This collision is somewhat by design in order to promote hash
locality for files that have similar types (.c, .h, .json) or could be the
same file across a directory rename (a/foo.txt to b/foo.txt). This leads to
decent cross-path deltas in cases like shallow clones or packing a
repository with very few historical versions of files that share common data
with other similarly-named files.
However, when the name-hash instead leads to a large number of name-hash
collisions for otherwise unrelated files, this can lead to confusing the
delta calculation to prefer cross-path deltas over previous versions of the
same file.
The new pack_name_hash_v2() function attempts to fix this issue by
taking more of the directory path into account through its hash
function. Its naming implies that we will later wire up details for
choosing a name-hash function by version.
The first change is to be more careful about paths using non-ASCII
characters. With these characters in mind, reverse the bits in the byte
as the least-significant bits have the highest entropy and we want to
maximize their influence. This is done with some bit manipulation that
swaps the two halves, then the quarters within those halves, and then
the bits within those quarters.
The second change is to perform hash composition operations at every
level of the path. This is done by storing a 'base' hash value that
contains the hash of the parent directory. When reaching a directory
boundary, we XOR the current level's name-hash value with a downshift of
the previous level's hash. This perturbation intends to create low-bit
distinctions for paths with the same final 16 bytes but distinct parent
directory structures.
The collision rate and effectiveness of this hash function will be
explored in later changes as the function is integrated with 'git
pack-objects' and 'git repack'.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When migrating reflogs between reference backends, maintaining the
original order of the reflog entries is crucial. To achieve this, an
`index` field is stored within the `ref_update` struct that encodes the
relative order of reflog entries. This field is used by the reftable
backend as update index for the respective reflog entries to maintain
that ordering.
These update indices must be respected when writing table headers, which
encode the minimum and maximum update index of contained records in the
header and footer. This logic was added in commit bc67b4ab5f (reftable:
write correct max_update_index to header, 2025-01-15), which started to
use `reftable_writer_set_limits()` to propagate the mininum and maximum
update index of all records contained in a ref transaction.
However, we only set the maximum update index for the first transaction
argument, even though there can be multiple such arguments. This is the
case when we write to multiple stacks in a single transaction, e.g. when
updating references in two different worktrees at once. Consequently,
the update index for all but the first argument remain uninitialized,
which may cause undefined behaviour.
Fix this by moving the assignment of the maximum update index in
`reftable_be_transaction_finish()` inside the loop, which ensures that
all elements of the array are correctly initialized.
Furthermore, initialize the `max_index` field to 0 when queueing a new
transaction argument. This is not strictly necessary, as all elements of
`write_transaction_table_arg.max_index` are now assigned correctly.
However, this initialization is added for consistency and to safeguard
against potential future changes that might inadvertently introduce
uninitialized memory access.
Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Already when introduced in c7a8a16239 (Add bundle transport,
2007-09-10), the `bundle` transport had a bug where it would open a file
descriptor to the bundle file and then close it _twice_: First, the file
descriptor (`data->fd`) is passed to `unbundle()`, which would use it as
the `stdin` of the `index-pack` process, which as a consequence would
close it via `start_command()`. However, `data->fd` would still hold the
numerical value of the file descriptor, and `close_bundle()` would see
that and happily close it again.
This seems not to have caused too many problems in almost two decades,
but I encountered a situation today where it _does_ cause problems: In
i686 variants of Git for Windows, it seems that file descriptors are
reused quickly after they have been closed.
In the particular scenario I faced, `git fetch <bundle> <ref>` gets the
same file descriptor value when opening the bundle file and importing
its embedded packfile (which implicitly closes the file descriptor) and
then when opening a pack file in `fetch_and_consume_refs()` while
looking up an object's header.
Later on, after the bundle has been imported (and the `close_bundle()`
function erroneously closes the file descriptor that has _already_ been
closed when using it as `stdin` for `git index-pack`), the same file
descriptor value has now been reused via `use_pack()`. Now, when either
the recursive fetch (which defaults to "on", unfortunately) or a
commit-graph update needs to `mmap()` the packfile, it fails due to a
now-invalid file descriptor that _should_ point to the pack file but
doesn't anymore.
To fix that, let's invalidate `data->fd` after calling `unbundle()`.
That way, `close_bundle()` does not close a file descriptor that may
have been reused for something different. While at it, document that
`unbundle()` closes the file descriptor, and ensure that it also does
that when failing to verify the bundle.
Luckily, this bug does not affect the bundle URI feature, it only
affects the `git fetch <bundle>` code path.
Note that this patch does not _completely_ clarifies who is responsible
to close that file descriptor, as `run_command()` may fail _without_
closing `cmd->in`. Addressing this issue thoroughly, however, would
require a rather thorough re-design of the `start_command()` and
`finish_command()` functionality to make it a lot less murky who is
responsible for what file descriptors.
At least this here patch is relatively easy to reason about, and
addresses a hard failure (`fatal: mmap: could not determine filesize`)
at the expense of leaking a file descriptor under very rare
circumstances in which `git fetch` would error out anyway.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This commit extends the functionality of `git gc`
by adding a new option, `--expire-to=<dir>`. Previously,
this feature was implemented in 91badeba32 (builtin/repack.c:
implement `--expire-to` for storing pruned objects, 2022-10-24),
which allowing users to specify a directory where unreachable
and expired cruft packs are stored during garbage collection.
However, users had to run `git repack --cruft --expire-to=<dir>`
followed by `git prune` to achieve similar results within `git gc`.
By introducing `--expire-to=<dir>` directly into `git gc`,
we simplify the process for users who wish to manage their
repository's cleanup more efficiently. This change involves
passing the `--expire-to=<dir>` parameter through to `git repack`,
making it easier for users to set up a backup location for cruft
packs that will be pruned.
Due to the original `git gc --prune=now` deleting all unreachable
objects by passing the `-a` parameter to git repack. With the
addition of the `--cruft` and `--expire-to` options, it is necessary
to modify this default behavior: instead of deleting these
unreachable objects, they should be merged into a cruft pack and
collected in a specified directory. Therefore, we do not pass `-a`
to the repack command but instead pass `--cruft`, `--expire-to`,
and `--cruft-expiration=now` to repack.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The trailer.* configuration variables are currently only described in
git-interpret-trailers(1) but affect git-commit and git-tag as well.
Move that section into its own config/trailer.txt file and also include
it in git-config(1).
Signed-off-by: Julian Prein <julian@druckdev.xyz>
Acked-by: Eric Sesterhenn <eric.sesterhenn@x41-dsec.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Back when Git was in its infancy, remotes were configured via separate
files in "branches/" (back in 2005). This mechanism was replaced later
that year with the "remotes/" directory. Both mechanisms have eventually
been replaced by config-based remotes, and it is very unlikely that
anybody still uses these directories to configure their remotes.
Both of these directories have been marked as deprecated, one in 2005
and the other one in 2011. Follow through with the deprecation and
finally announce the removal of these features in Git 3.0.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
[jc: with a small tweak to the help message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Doc update.
* jc/cli-doc-option-and-config:
gitcli: document that command line trumps config and env
|
|
Document that it is insecure to use Personal Access Tokens, which
some hosting providers take as username/password, embedded in URLs.
* mh/doc-credential-helpers-with-pat:
docs: discuss caching personal access tokens
docs: list popular credential helpers
|
|
The "instaweb" bound only to local IP address without "--local" and
to all addresses with "--local", which was the other way around, when
using Python's http.server class, which has been corrected.
* ak/instaweb-python-port-binding-fix:
instaweb: fix ip binding for the python http.server
|
|
The meson build procedure for Documentation/technical/ hierarchy was
missing necessary dependencies, which has been corrected.
* sj/meson-doc-technical-dependency-fix:
meson: fix missing deps for technical articles
|
|
The meson build procedure looked for the 'version-def.h' file in a
wrong directory, which has been corrected.
* tc/meson-use-our-version-def-h:
meson: ensure correct version-def.h is used
|
|
Extended SHA-1 expression parser did not work well when a branch
with an unusual name (e.g. "foo{bar") is involved.
* en/object-name-with-funny-refname-fix:
object-name: be more strict in parsing describe-like output
object-name: fix resolution of object names containing curly braces
|
|
Now that all callers have been converted from:
the_hash_algo->unsafe_init_fn();
to
unsafe_hash_algo(the_hash_algo)->init_fn();
and similar, we can remove the scaffolding for the unsafe_ function
variants and force callers to use the new unsafe_hash_algo() mechanic
instead.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 106140a99f (builtin/fast-import: fix segfault with unsafe SHA1
backend, 2024-12-30) and 9218c0bfe1 (bulk-checkin: fix segfault with
unsafe SHA1 backend, 2024-12-30), we observed the effects of failing to
initialize a hashfile_checkpoint with the same hash function
implementation as is used by the hashfile it is used to checkpoint.
While both 106140a99f and 9218c0bfe1 work around the immediate crash,
changing the hash function implementation within the hashfile API to,
for example, the non-unsafe variant would re-introduce the crash. This
is a result of the tight coupling between initializing hashfiles and
hashfile_checkpoints.
Introduce and use a new function which ensures that both parts of a
hashfile and hashfile_checkpoint pair use the same hash function
implementation to avoid such crashes.
A few things worth noting:
- In the change to builtin/fast-import.c::stream_blob(), we can see
that by removing the explicit reference to
'the_hash_algo->unsafe_init_fn()', we are hardened against the
hashfile API changing away from the_hash_algo (or its unsafe
variant) in the future.
- The bulk-checkin code no longer needs to explicitly zero-initialize
the hashfile_checkpoint, since it is now done as a result of calling
'hashfile_checkpoint_init()'.
- Also in the bulk-checkin code, we add an additional call to
prepare_to_stream() outside of the main loop in order to initialize
'state->f' so we know which hash function implementation to use when
calling 'hashfile_checkpoint_init()'.
This is OK, since subsequent 'prepare_to_stream()' calls are noops.
However, we only need to call 'prepare_to_stream()' when we have the
HASH_WRITE_OBJECT bit set in our flags. Without that bit, calling
'prepare_to_stream()' does not assign 'state->f', so we have nothing
to initialize.
- Other uses of the 'checkpoint' in 'deflate_blob_to_pack()' are
appropriately guarded.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove a series of conditionals within the shared cmd_hash_impl() helper
that powers the 'sha1' and 'sha1-unsafe' helpers.
Instead, replace them with a single conditional that transforms the
specified hash algorithm into its unsafe variant. Then all subsequent
calls can directly use whatever function it wants to call without having
to decide between the safe and unsafe variants.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Instead of calling the unsafe_ hash function variants directly, make use
of the shared 'algop' pointer by initializing it to:
f->algop = unsafe_hash_algo(the_hash_algo);
, thus making all calls use the unsafe variants directly.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 253ed9ecff (hash.h: scaffolding for _unsafe hashing variants,
2024-09-26), we introduced "unsafe" variants of the SHA-1 hashing
functions by introducing new functions like "unsafe_init_fn()" and so
on.
This approach has a major shortcoming that callers must remember to
consistently use one variant or the other. Failing to consistently use
(or not use) the unsafe variants can lead to crashes at best, or subtle
memory corruption issues at worst.
In the hashfile API, this isn't difficult to achieve, but verifying that
all callers consistently use the unsafe variants is somewhat of a chore
given how spread out all of the callers are. In the sha1 and sha1-unsafe
test helpers, all of the calls to various hash functions are guarded by
an "if (unsafe)" conditional, which is repetitive and cumbersome.
Address these issues by introducing a new pattern whereby one
'git_hash_algo' can return a pointer to another 'git_hash_algo' that
represents the unsafe version of itself. So instead of having something
like:
if (unsafe)
the_hash_algo->init_fn(...);
the_hash_algo->update_fn(...);
the_hash_algo->final_fn(...);
else
the_hash_algo->unsafe_init_fn(...);
the_hash_algo->unsafe_update_fn(...);
the_hash_algo->unsafe_final_fn(...);
we can instead write:
struct git_hash_algo *algop = the_hash_algo;
if (unsafe)
algop = unsafe_hash_algo(algop);
algop->init_fn(...);
algop->update_fn(...);
algop->final_fn(...);
This removes the existing shortcoming by no longer forcing the caller to
"remember" which variant of the hash functions it wants to call, only to
hold onto a 'struct git_hash_algo' pointer that is initialized once.
Similarly, while there currently is still a way to "mix" safe and unsafe
functions, this too will go away after subsequent commits remove all
direct calls to the unsafe_ variants.
Note that hash_algo_by_ptr() needs an adjustment to allow passing in the
unsafe variant of a hash function. All other query functions on the
hash_algos array will continue to return the safe variants of any
function.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Perform a similar transformation as in the previous commit, but focused
instead on hashfile_checksum_valid(). This function does not work with a
hashfile structure itself, and instead validates the raw contents of a
file written using the hashfile API.
We'll want to be prepared for a similar change to this function in the
future, so prepare ourselves for that by extracting 'the_hash_algo' into
its own field for use within this function.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Throughout the hashfile API, we rely on a reference to 'the_hash_algo',
and call its _unsafe function variants directly.
Prepare for a future change where we may use a different 'git_hash_algo'
pointer (instead of just relying on 'the_hash_algo' throughout) by
making the 'git_hash_algo' pointer a member of the 'hashfile' structure
itself.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
With the new "unsafe" SHA-1 build knob, it is convenient to have a
test-tool that can exercise Git's unsafe SHA-1 wrappers for testing,
similar to 't/helper/test-tool sha1'.
Implement that helper by altering the implementation of that test-tool
(in cmd_hash_impl(), which is generic and parameterized over different
hash functions) to conditionally run the unsafe variants of the chosen
hash function, and expose the new behavior via a new 'sha1-unsafe' test
helper.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When TRACE2 analytics is enabled, a configuration variable set to
"valueless true" causes a segfault.
Steps to Reproduce
GIT_TRACE2=true GIT_TRACE2_CONFIG_PARAMS=status.*
git -c status.relativePaths version
Expected Result
git version 2.46.0
Actual Result
zsh: segmentation fault GIT_TRACE2=true
Add checks to prevent the segfault and instead show that the
variable without value.
Signed-off-by: Adam Murray <ad@canva.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The commit 297c09eabb (refs: allow multiple reflog entries for the
same refname, 2024-12-16) added logic to exit early in
`lock_ref_for_update()` after obtaining the required lock. This was
added as a performance optimization on a false assumption that no
further processing was required for reflog-only updates.
However the assumption was wrong. For a symref's reflog entry, the
update needs to be populated with the old_oid value, but the early
exit skipped this necessary step.
This caused a bug in Git 2.48 in the files backend where target
references of symrefs being updated would create a corrupted reflog
entry for the symref since the old_oid is not populated.
Everything the early exit skipped in the code path is necessary for
both regular and symbolic ref, so eliminate the mistaken
optimization, and also add a test to ensure that such an issue
doesn't arise in the future.
Reported-by: Nika Layzell <nika@thelayzells.com>
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This call to parse_tree() was flagged by Coverity for ignoring the
return value. But if we look a little further up the function, we can
see that there is already a call to parse_tree_gently(), and we'll
return early if that fails. So by this point the tree will always be
parsed, and the call is redundant.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* ps/build-meson-fixes:
ci: wire up Visual Studio build with Meson
ci: raise error when Meson generates warnings
meson: fix compilation with Visual Studio
meson: make the CSPRNG backend configurable
meson: wire up fuzzers
meson: wire up generation of distribution archive
meson: wire up development environments
meson: fix dependencies for generated headers
meson: populate project version via GIT-VERSION-GEN
GIT-VERSION-GEN: allow running without input and output files
GIT-VERSION-GEN: simplify computing the dirty marker
|
|
Add a new job to GitHub Actions and GitLab CI that builds and tests
Meson-based builds with Visual Studio.
A couple notes:
- While the build job is mandatory, the test job is marked as "manual"
on GitLab so that it doesn't run by default. We already have a bunch
of Windows-based jobs, and the computational overhead that these
cause is simply out of proportion to run the test suite twice.
The same isn't true for GitHub as I could not find a way to make a
subset of jobs manually triggered.
- We disable Perl. This is because we pick up Perl from Git for
Windows, which outputs different paths ("/c/" instead of "C:\") than
what we expect in our tests.
- We don't use the Git for Windows SDK. Instead, the build only
depends on Visual Studio, Meson and Git for Windows. All the other
dependencies like curl, pcre2 and zlib get pulled in and compiled
automatically by Meson and thus do not have to be provided by the
system.
- We open-code "ci/run-test-slice.sh". This is because we only have
direct access to PowerShell, so we manually implement the logic.
There is an upstream pull request for the Meson build system [1] to
implement test slicing in Meson directly.
- We don't process test artifacts for failed CI jobs. This is done to
keep down prerequisites to a minimum.
All tests are passing.
[1]: https://github.com/mesonbuild/meson/pull/14092
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Meson prints warnings in several cases, like for example when using a
feature supported by the current version of Meson, but not yet supported
by the minimum required version as declared by the project. These
warnings will not cause the setup to fail by default, which makes it
quite easy to miss them.
Improve this by passing `--fatal-meson-warnings` to `meson setup` so
that our CI jobs will fail on warnings.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The Visual Studio compiler defaults to C89 unless explicitly asked to
use a different version of the C standard. We don't specify any C
standard at all though in our Meson build, and consequently compiling
Git fails:
...\git\git-compat-util.h(14): fatal error C1189: #error: "Required C99 support is in a test phase. Please see git-compat-util.h for more details."
Fix the issue by specifying the project's C standard. Funny enough,
specifying C99 does not work because apparently, `__STDC_VERSION__` is
not getting defined in that version at all. Instead, we have to specify
C11 as the project's C standard, which is also done in our CMake build
instructions.
We don't want to generally enforce C11 though, as our requiremets only
state that a C99 compiler is required. In fact, we don't even require
plain C99, but rather the GNU variant thereof.
Meson allows us to handle this case rather easily by specifying
"gnu99,c11", which will cause it to fall back to C11 in case GNU C99 is
unsupported. This feature has only been introduced with Meson 1.3.0
though, and we support 0.61.0 and newer. In case we use such an oldish
version though we fall back to requiring GNU99 unconditionally. This
means that Windows essentially requires Meson 1.3.0 and newer when using
Visual Studio, but I doubt that this is ever going to be a real problem.
Tested-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The CSPRNG backend is not configurable in Meson and isn't quite
discoverable, either. Make it configurable and add the actual backend
used to the summary.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Meson does not yet know to build our fuzzers. Introduce a new build
option "fuzzers" and wire up the fuzzers in case it is enabled. Adapt
our CI jobs so that they build the fuzzers by default.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Meson knows to generate distribution archives via `meson dist`. In
addition to generating the archive itself, this target also knows to
compile and execute tests from that archive, which helps to ensure that
the result is an adequate drop-in replacement for the versioned project.
While this already works as-is, one omission is that we don't propagate
the commit that this is built from into the resulting archive. This can
be fixed though by adding a distribution script that propagates the
version into the "version" file, which GIT-VERSION-GEN knows to read if
present.
Use GIT-VERSION-GEN to populate that file. As the script is executed in
the build directory, not in the directory where we generate the archive,
we have to use a shell to resolve the "MESON_DIST_ROOT" environment
variable.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The Meson build system is able to wire up development environments. The
intent is to make build artifacts of the project available. This is
typically used to export e.g. paths to linkable libraries, which isn't
all that interesting in our context given that we don't have an official
library interface.
But what we can use this mechanism for is to expose the built Git
executables as well as the build directory. This allows users to play
around with the built Git version in the devenv, and allows them to
execute our test scripts directly with the built distribution.
Wire up this feature, which can then be used via `meson devenv` in the
build directory.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We generate a couple of headers from our documentation. These headers
are added to the libgit sources, but two of them aren't used by the
library, but instead by our builtins. This can cause parallel builds to
fail because the builtin object may be compiled before the header was
generated.
Fix the issue by adding both "config-list.h" and "hook-list.h" to the
list of builtin sources. While "command-list.h" is generated similarly,
it is used by "help.c" and thus part of the libgit sources indeed.
Reported-by: Evan Martin <evan.martin@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The Git version for Meson is currently wired up manually. It can thus
grow (and already has grown) stale quite easily, as having multiple
sources of truth is never a good idea. This issue is mostly of cosmetic
nature as we don't use the project version anywhere, and instead use the
GIT-VERSION-GEN script to propagate the correct version into our build.
But it is somewhat puzzling when `meson setup` announces to build an old
Git release.
There are a couple of alternatives for how to solve this:
- We can keep the version undefined, but this makes Meson output
"undefined" for the version, as well.
- We can use GIT-VERSION-GEN to generate the version for us. At the
point of configuring the project we haven't yet figured out host
details though, and thus we didn't yet set up the shell environment.
While not an issue for Unix-based systems, this would be an issue in
Windows, where the shell typically gets provided via Git for Windows
and thus requires some special setup.
- We can pull the default version out of GIT-VERSION-GEN and move it
into its own file. This likely requires some adjustments for scripts
that bump the version, but allows Meson to read the version from
that file trivially.
Pick the second option and use GIT-VERSION-GEN as it gives us the most
accurate version. In order to fix the bootstrapping issue on Windows
systems we simply set the version to 'unknown' in case no shell was
found. As the version is only of cosmetic value this isn't really much
of an issue.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The GIT-VERSION-GEN script requires an input file containing formatting
directives to be replaced as well as an output file that will get
overwritten in case the file contents have changed. When computing the
project version for Meson we don't want to have either though:
- We only want to compute the version without anything else, but don't
have an input file that would match that exact format. While we
could of course introduce a new file just for that usecase, it feels
suboptimal to add another file every time we want to have a slightly
different format for versioned data.
- The computed version needs to be read from stdout so that Meson can
wire it up for the project.
Extend the script to handle both usecases by recognizing `--format=` as
alternative to providing an input path and by writing to stdout in case
no output file was given.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The GIT-VERSION-GEN script computes the version that Git is being built
from. When building from a commit with an unclean worktree it knows to
append "-dirty" to that version to indicate that there were custom
changes applied and that it isn't the exact same as that commit.
The dirtiness check is done manually via git-diff-index(1), which is
somewhat puzzling though: we already use git-describe(1) to compute the
version, which also knows to compute dirtiness via the "--dirty" flag.
But digging back in history explains why: the "-dirty" suffix was added
in 31e0b2ca81 (GIT 1.5.4.3, 2008-02-23), and git-describe(1) didn't yet
have support for "--dirty" back then.
Refactor the script to use git-describe(1). Despite being simpler, it
also results in a small speedup:
Benchmark 1: git describe --dirty --match "v[0-9]*"
Time (mean ± σ): 12.5 ms ± 0.3 ms [User: 6.3 ms, System: 8.8 ms]
Range (min … max): 12.0 ms … 13.5 ms 200 runs
Benchmark 2: git describe --match "v[0-9]*" HEAD && git update-index -q --refresh && git diff-index --name-only HEAD --
Time (mean ± σ): 17.9 ms ± 1.1 ms [User: 8.8 ms, System: 14.4 ms]
Range (min … max): 17.0 ms … 30.6 ms 148 runs
Summary
git describe --dirty --match "v[0-9]*" ran
1.43 ± 0.09 times faster than git describe --match "v[0-9]*" && git update-index -q --refresh && git diff-index --name-only HEAD --
While the speedup doesn't really matter on Unix-based systems, where
filesystem operations are typically fast, they do matter on Windows
where the commands take a couple hundred milliseconds. A quick and dirty
check on that system shows a speedup from ~800ms to ~400ms.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The git-pack-redundant(1) subcommand has been castrated to require
the "--i-still-use-this" option to do anything since 4406522b
(pack-redundant: escalate deprecation warning to an error,
2023-03-23), which appeared in Git 2.41 and was announced for
removal with 53a92c9552 (Documentation/BreakingChanges: announce
removal of git-pack-redundant(1), 2024-09-02). Stop compiling the
subcommand in case the `WITH_BREAKING_CHANGES` build flag is set.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "linux-gcc" job isn't all that interesting by itself and can be
considered more or less the "standard" job: it is running with a
reasonably up-to-date image and uses GCC as a compiler, both of which we
already cover in other jobs.
There is one exception though: we change the default branch to be "main"
instead of "master", so it is forging ahead a bit into the future to
make sure that this change does not cause havoc. So let's expand on this
a bit and also add the new "WITH_BREAKING_CHANGES" flag to the mix.
Rename the job to "linux-breaking-changes" accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "linux-gcc-default" job is mostly doing the same as the "linux-gcc"
job, except for a couple of minor differences:
- We use an explicit GCC version instead of the default version
provided by the distribution. We have other jobs that test with
"gcc-8", making this distinction pointless.
- We don't set up the Python version explicitly, and instead use the
default Python version. Python 2 has been end-of-life for quite a
while now though, making this distinction less interesting.
- We set up the default branch name to be "main" in "linux-gcc". We
have other testcases that don't and also some that explicitly use
"master".
- We use "ubuntu:20.04" in one job and "ubuntu:latest" in another. We
already have a couple other jobs testing these respectively.
So overall, the job does not add much to our test coverage.
Drop the "linux-gcc-default" job and adapt "linux-gcc" to start using
the default GCC compiler, effectively merging those two jobs into one.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
With 57ec9254eb (docs: introduce document to announce breaking changes,
2024-06-14), we have introduced a new document that tracks upcoming
breaking changes in the Git project. In 2454970930 (BreakingChanges:
early adopter option, 2024-10-11) we have amended the document a bit to
mention that any introduced breaking changes must be accompanied by
logic that allows us to enable the breaking change at compile-time.
While we already have two breaking changes lined up, neither of them has
such a switch because they predate those instructions.
Introduce the proposed `WITH_BREAKING_CHANGES` preprocessor macro and
wire it up with both our Makefiles and Meson. This does not yet wire up
the build flag for existing deprecations.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 246cebe320 (refs: add support for migrating reflogs, 2024-12-16) we
have added support to git-refs(1) to migrate reflogs between reference
backends. It was reported [1] though that not we don't migrate reflogs
for a subset of references, most importantly "refs/stash".
This issue is caused by us still honoring "core.logAllRefUpdates" when
trying to migrate reflogs: we do queue the updates, but depending on the
value of that config we may decide to just skip writing the reflog entry
altogether. And given that:
- The default for "core.logAllRefUpdates" is to only create reflogs
for branches, remotes, note refs and "HEAD"
- "refs/stash" is neither of these ref types.
We end up skipping the reflog creation for that particular reference.
Fix the bug by setting `REF_FORCE_CREATE_REFLOG`, which instructs the
ref backends to create the reflog entry regardless of the config or any
preexisting state.
[1]: <Z5BTQRlsOj1sygun@tapette.crustytoothpaste.net>
Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Address the last couple of trivial -Wsign-compare warnings in the
reftable library and remove the DISABLE_SIGN_COMPARE_WARNINGS macro that
we have in "reftable/system.h".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `block_source_read_block()` function and its implementations return
an integer as a result that reflects either the number of bytes read, or
an error. As such its return type, a signed integer, isn't wrong, but it
doesn't give the reader a good hint what it actually returns.
Refactor the function to return an `ssize_t` instead, which is typical
for functions similar to read(3p) and should thus give readers a better
signal what they can expect as a result.
Adjust callers to better handle the returned value to avoid warnings
with -Wsign-compare. One of these callers is `reader_get_block()`, whose
return value is only ever used by its callers to figure out whether or
not the read was successful. So instead of bubbling up the `ssize_t`
there, too, we adapt it to only indicate success or errors.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The block length is used to track the number of bytes available in a
specific block. As such, it is never set to a negative value, but is
still represented by a signed integer.
Adjust the type of the variable to be `size_t`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The restart length is tracked as a positive integer even though it
cannot ever be negative. Furthermore, it is effectively capped via the
MAX_RESTARTS variable.
Adjust the type of the variable to be `uint32_t`. While this type is
excessive given that MAX_RESTARTS fits into an `uint16_t`, other places
already use 32 bit integers for restarts, so this type is being more
consistent.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The functions `header_size()` and `footer_size()` return a positive
integer representing the size of the header and footer, respectively,
dependent on the version of the reftable format. Similar to the
preceding commit, these functions return a signed integer though, which
is nonsensical given that there is no way for these functions to return
negative.
Adapt the functions to return a `size_t` instead to fix a couple of sign
comparison warnings.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `hash_size()` function returns the number of bytes used by the hash
function. Weirdly enough though, it returns a signed integer for its
size even though the size obviously cannot ever be negative. The only
case where it could be negative is if the function returned an error
when asked for an unknown hash, but we assert(3p) instead.
Adjust the type of `hash_size()` to be `uint32_t` and adapt all places
that use signed integers for the hash size to follow suit. This also
allows us to get rid of a couple asserts that we had which verified that
the size was indeed positive, which further stresses the point that this
refactoring makes sense.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `common_prefix_size()` function computes the length of the common
prefix between two buffers. As such its return value will always be an
unsigned integer, as the length cannot be negative. Regardless of that,
the function returns a signed integer, which is nonsensical and causes a
couple of -Wsign-compare warnings all over the place.
Adjust the function to return a `size_t` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The logic to decode varints isn't able to detect integer overflows: as
long as the buffer still has more data available, and as long as the
current byte has its 0x80 bit set, we'll continue to add up these values
to the result. This will eventually cause the `uint64_t` to overflow, at
which point we'll return an invalid result.
Refactor the function so that it is able to detect such overflows. The
implementation is basically copied from Git's own `decode_varint()`,
which already knows to handle overflows. The only adjustment is that we
also take into account the string view's length in order to not overrun
it. The reftable documentation explicitly notes that those two encoding
schemas are supposed to be the same:
Varint encoding
^^^^^^^^^^^^^^^
Varint encoding is identical to the ofs-delta encoding method used
within pack files.
Decoder works as follows:
....
val = buf[ptr] & 0x7f
while (buf[ptr] & 0x80) {
ptr++
val = ((val + 1) << 7) | (buf[ptr] & 0x7f)
}
....
While at it, refactor `put_var_int()` in the same way by copying over
the implementation of `encode_varint()`. While `put_var_int()` doesn't
have an issue with overflows, it generates warnings with -Wsign-compare.
The implementation of `encode_varint()` doesn't, is battle-tested and at
the same time way simpler than what we currently have.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 42c424d69d (t/helper: inline printing of reftable records,
2024-08-22) we stopped using the `print` function of the reftable record
vtable and instead moved its implementation into the single user of it.
We didn't remove the function itself from the vtable though. Drop it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 4f9264b0cd (config.mak.dev: drop `-Wno-sign-compare`, 2024-12-06) we
have started an effort to make our codebase compile with -Wsign-compare.
But while we removed the -Wno-sign-compare flag from "config.mak.dev",
we didn't adjust the Meson build instructions in the same way.
Fix this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In e7fb2ca945 (builtin/blame: fix out-of-bounds write with blank
boundary commits, 2025-01-10), we have introduced two new tests that
expect a certain amount of padding. This padding is generated via
printf using the "%0.s" conversion specification. That directive is
ambiguous because it might be interpreted as field width (most shells)
or 0-padding flag for numeric fields (coreutils).
Fix this issue by using "%${N}s" instead, which is already being
used in other tests (i.e. t5300, t0450) and is unambiguous.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Jan Palus <jpalus@fastmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The internal functions `write_rev_trailer()`, `write_rev_trailer()`,
`write_mtimes_header()` and write_mtimes_trailer()` use the global
`the_hash_algo` variable to access the repository's hash function. Pass
the hash_algo down from callers, all of which already have access to the
variable.
This removes all global variables from the 'pack-write.c' file, so
remove the 'USE_THE_REPOSITORY_VARIABLE' macro.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `write_rev_file()` function uses the global `the_hash_algo` variable
to access the repository's hash_algo. To avoid global variable usage,
pass a hash_algo from the layers above. Also modify children functions
`write_rev_file_order()` and `write_rev_header()` to accept
'the_hash_algo'.
Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.
However, in `midx-write.c`, since all usage of global variables is
removed, don't reintroduce them and instead use the `repo` available in
the context.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `write_idx_file()` function uses the global `the_hash_algo` variable
to access the repository's hash_algo. To avoid global variable usage,
pass a hash_algo from the layers above.
Since `stage_tmp_packfiles()` also resides in 'pack-write.c' and calls
`write_idx_file()`, update it to accept a `struct git_hash_algo` as a
parameter and pass it through to the callee.
Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `index_pack_lockfile()` function uses the global `the_repository`
variable to access the repository. To avoid global variable usage, pass
the repository from the layers above.
Altough the layers above could have access to the repository internally,
simply pass in `the_repository`. This avoids any compatibility issues
and bubbles up global variable usage to upper layers which can be
eventually resolved.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `fixup_pack_header_footer()` function uses the global
`the_hash_algo` variable to access the repository's hash function. To
avoid global variable usage, pass a hash_algo from the layers above.
Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Now that ref_format_clear() no longer releases any memory we don't need
it anymore. Remove it and its counterpart, ref_format_init().
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The string_list "is_base_tips" in struct ref_format stores the
committish part of "is-base:<committish>". It has the same problems
that its sibling string_list "bases" had. Fix them the same way as the
previous commit did for the latter, by replacing the string_list with
fields in "used_atom".
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
verify_ref_format() parses a ref-filter format string and stores
recognized items in the static array "used_atom". For
"ahead-behind:<committish>" it stores the committish part in a
string_list member "bases" of struct ref_format.
ref_sorting_options() also parses bare ref-filter format items and
stores stores recognized ones in "used_atom" as well. The committish
parts go to a dummy struct ref_format in parse_sorting_atom(), though,
and are leaked and forgotten.
If verify_ref_format() is called before ref_sorting_options(), like in
git for-each-ref, then all works well if the sort key is included in the
format string. If it isn't then sorting cannot work as the committishes
are missing.
If ref_sorting_options() is called first, like in git branch, then we
have the additional issue that if the sort key is included in the format
string then filter_ahead_behind() can't see its committish, will not
generate any results for it and thus it will be expanded to an empty
string.
Fix those issues by replacing the string_list with a field in used_atom
for storing the committish. This way it can be shared for handling both
ref-filter format strings and sorting options in the same command.
Reported-by: Ross Goldberg <ross.goldberg@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Test update.
* sk/unit-test-hash:
t/unit-tests: convert hash to use clar test framework
|
|
Doc markup fix.
* mh/gitattr-doc-markup-fix:
docs: fix typesetting of merge driver placeholders
|
|
Completion script updates for zsh
* dk/zsh-config-completion-fix:
completion: repair config completion for Zsh
|
|
Docfix.
* aj/difftool-config-doc-fix:
difftool docs: restore correct position of tool list
|
|
More code paths have a repository passed through the callchain,
instead of assuming the primary the_repository object.
* ps/the-repository:
match-trees: stop using `the_repository`
graph: stop using `the_repository`
add-interactive: stop using `the_repository`
tmp-objdir: stop using `the_repository`
resolve-undo: stop using `the_repository`
credential: stop using `the_repository`
mailinfo: stop using `the_repository`
diagnose: stop using `the_repository`
server-info: stop using `the_repository`
send-pack: stop using `the_repository`
serve: stop using `the_repository`
trace: stop using `the_repository`
pager: stop using `the_repository`
progress: stop using `the_repository`
|
|
A misconfigured "fsck.skiplist" configuration variable was not
diagnosed as an error, which has been corrected.
* jt/fsck-skiplist-parse-fix:
fsck: reject misconfigured fsck.skipList
|
|
The code to compute "unique" name used git_rand() which can fail or
get stuck; the callsite does not require cryptographic security.
Introduce the "insecure" mode and use it appropriately.
* ps/reftable-get-random-fix:
reftable/stack: accept insecure random bytes
wrapper: allow generating insecure random bytes
|
|
Test clean-up.
* jk/t7407-use-test-grep:
t7407: use test_grep
|
|
The code to check LSan results has been simplified and made more
robust.
* jk/lsan-race-ignore-false-positive:
test-lib: add a few comments to LSan log checking
test-lib: simplify lsan results check
test-lib: invert return value of check_test_results_san_file_empty
|
|
When parsing --pack_header=, we manually skip 14 bytes to the data.
Let's use skip_prefix() to do this automatically.
Note that we overwrite our pointer to the front of the string, so we
have to add more context to the error message. We could avoid this by
declaring an extra pointer to hold the value, but I think the modified
message is actually preferable; it should give translators a bit more
context.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Both of these commands read the incoming pack into a static unsigned
char buffer in BSS, and then parse it by casting the start of the buffer
to a struct pack_header. This can result in SIGBUS on some platforms if
the compiler doesn't place the buffer in a position that is properly
aligned for 4-byte integers.
This reportedly happens with unpack-objects (but not index-pack) on
sparc64 when compiled with clang (but not gcc). But we are definitely in
the wrong in both spots; since the buffer's type is unsigned char, we
can't depend on larger alignment. When it works it is only because we
are lucky.
We'll fix this by switching to get_be32() to read the headers (just like
the last few commits similarly switched us to put_be32() for writing
into the same buffer).
It would be nice to factor this out into a common helper function, but
the interface ends up quite awkward. Either the caller needs to hardcode
how many bytes we'll need, or it needs to pass us its fill()/use()
functions as pointers. So I've just fixed both spots in the same way;
this is not code that is likely to be repeated a third time (most of the
pack reading code uses an mmap'd buffer, which should be properly
aligned).
I did make one tweak to the shared code: our pack_version_ok() macro
expects us to pass the big-endian value we'd get by casting. We can
introduce a "native" variant which uses the host integer ordering.
Reported-by: Koakuma <koachan@protonmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In order to recreate a pack header in our in-memory buffer, we cast the
buffer to a "struct pack_header" and assign the individual fields. This
is reported to cause SIGBUS on sparc64 due to alignment issues.
We can work around this by using put_be32() which will write individual
bytes into the buffer.
Reported-by: Koakuma <koachan@protonmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Both index-pack and unpack-objects accept a --pack_header argument. This
is an undocumented internal argument used by receive-pack and fetch to
pass along information about the header of the pack, which they've
already read from the incoming stream.
In preparation for a bugfix, let's factor the duplicated code into a
common helper.
The callers are still responsible for identifying the option. While this
could likewise be factored out, it is more flexible this way (e.g., if
they ever started using parse-options and wanted to handle both the
stuck and unstuck forms).
Likewise, the callers are responsible for reporting errors, though they
both just call die(). I've tweaked unpack-objects to match index-pack in
marking the error for translation.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In put_be32(), we right-shift a uint32_t value various amounts and then
assign the low 8-bits to individual "unsigned char" bytes, throwing away
the high bits. For shifts smaller than 24 bits, those thrown away bits
will be arbitrary bits from the original uint32_t.
This works exactly as we want, but if you feed a constant, then sparse
complains. For example if we write this (which we plan to do in a future
patch):
put_be32(hdr, PACK_SIGNATURE);
then "make sparse" produces:
compat/bswap.h:175:22: error: cast truncates bits from constant value (5041 becomes 41)
compat/bswap.h:176:22: error: cast truncates bits from constant value (504143 becomes 43)
compat/bswap.h:177:22: error: cast truncates bits from constant value (5041434b becomes 4b)
And the same issue exists in the other put_be*() functions, when used
with a constant.
We can silence this warning by explicitly masking off the truncated
bits. The compiler is smart enough to know the result is the same, and
the asm generated by gcc (with both -O0 and -O2) is identical.
Curiously this line already exists:
put_be32(&hdr_version, INDEX_EXTENSION_VERSION2);
in the fsmonitor.c file, but it does not get flagged because the CPP
macro expands to a small integer (2).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Adapts reftable tree test script to clar framework by using clar
assertions where necessary.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert the prio-queue test script to clar framework by using clar
assertions where necessary. Test functions are created as a standalone
to test different cases.
update the type of the variable `j` from int to `size_t`, this ensures
compatibility with the type used for result_size, which is also size_t,
preventing a potential warning or error caused by comparisons between
signed and unsigned integers.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Adapt the mem-pool test script to use clar framework by using clar
assertions where necessary.Test functions are created as a standalone to
test different test cases.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"generate-clar-decls.sh" script is designed to extract function
signatures that match a specific pattern derived from the unit test
file's name. The script does not know to massage file names with dashes,
which will make it search for functions that look like, for example,
`test_mem-pool_*`. Having dashes in function names is not allowed
though, so these patterns won't ever match a legal function name.
Adapt script to translate dashes (`-`) in test suite filenames to
underscores (`_`) to correctly extract the function signatures and run
the corresponding tests. This will be used by subsequent commits which
follows the same construct.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Using the show_usage_and_exit_if_asked() helper we introduced
earlier, fix callers of usage() that want to show the help text when
explicitly asked by the end-user. The help text now goes to the
standard output stream for them.
These are the bog standard "if we got only '-h', then that is a
request for help" callers. Their
if (argc == 2 && !strcmp(argv[1], "-h"))
usage(message);
are simply replaced with
show_usage_and_exit_if_asked(argc, argv, message);
With this, the built-ins tested by t0012 all send their help text to
their standard output stream, so the check in t0012 that was half
tightened earlier is now fully tightened to insist on standard error
stream being empty.
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Using the show_usage_if_asked() helper we introduced earlier, fix
callers of usage() that want to show the help text when explicitly
asked by the end-user. The help text now goes to the standard
output stream for them.
The callers in this step are oddballs in that their invocations of
usage() are *not* guarded by
if (argc == 2 && !strcmp(argv[1], "-h")
usage(...);
There are (unnecessarily) being clever ones that do things like
if (argc != 2 || !strcmp(argv[1], "-h")
usage(...);
to say "I know I take only one argument, so argc != 2 is always an
error regardless of what is in argv[]. Ah, by the way, even if argc
is 2, "-h" is a request for usage text, so we do the same".
Some like "git var -h" just do not treat "-h" any specially, and let
it take the same error code paths as a parameter error.
Now we cannot do the same, so these callers are rewrittin to do the
show_usage_and_exit_if_asked() first and then handle the usage error
the way they used to.
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Using the show_usage_with_options_if_asked() helper we introduced
earlier, fix callers of usage_with_options() that want to show the
help text when explicitly asked by the end-user. The help text now
goes to the standard output stream for them.
The test in t7600 for "git merge -h" may want to be retired, as the
same is covered by t0012 already, but it is specifically testing that
the "-h" option gets a response even with a corrupt index file, so
for now let's leave it there.
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Some commands call usage() when they are asked to give the help
message with "git cmd -h", but this has the same problem as we
fixed with callers of usage_with_options() for the same purpose.
Introduce a helper function that captures the common pattern
if (argc == 2 && !strcmp(argv[1], "-h"))
usage(usage);
and replaces it with
show_usage_if_asked(argc, argv, usage);
to help correct these code paths.
Note that this helper function still exits with status 129, and
t0012 insists on it. After converting all the mistaken callers of
usage_with_options() to call this new helper, we may want to address
it---the end user is asking us to give the help text, and we are
doing exactly as asked, so there is no reason to exit with non-zero
status.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Many commands call usage_with_options() when they are asked to give
the help message, but it sends the help text to the standard error
stream. When the user asked for it with "git cmd -h", the help
message is the primary output from the command, hence we should send
it to the standard output stream, instead.
Introduce a helper function that captures the common pattern
if (argc == 2 && !strcmp(argv[1], "-h"))
usage_with_options(usage, options);
and replaces it with
show_usage_with_options_if_asked(argc, argv, usage, options);
to help correct code paths.
Note that this helper function still exits with status 129, and
t0012 insists on it. After converting all the mistaken callers of
usage_with_options() to call this new helper, we may want to address
it---the end user is asking us to give the help text, and we are
doing exactly as asked, so there is no reason to exit with non-zero
status.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
For most commands, "git foo -h" will send the help output to stdout, as
this is what parse-options.c does. But some commands send it to stderr
instead. This is usually because they call usage_with_options(), and
should be switched to show_usage_help_and_exit_if_asked().
Currently t0012 is permissive and allows either behavior. We'd like it
to eventually enforce that help goes to stdout, and teaching it to do so
identifies the commands that need to be changed. But during the
transition period, we don't want to enforce that for most test runs.
So let's introduce a flag that will let most test runs use the
permissive behavior, and people interested in converting commands can
run:
GIT_TEST_HELP_MUST_BE_STDOUT=1 ./t0012-help.sh
to see the failures. Eventually (when all builtins have been converted)
we'll remove this flag entirely and always check the strict behavior.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We centrally explain that "--no-whatever" is the way to countermand
the "--whatever" option. Explain that a configured default and the
value specified by an environment variable can be overridden by the
corresponding command line option, too.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Wire up the git-subtree(1) command, which is part of "contrib/". Note
that we have to move around the exact location where we include the
"contrib/" subdirectory so that it comes after building the docs so that
we have access to some of the common functionality.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We unconditionally wire up building command completion present in the
"contrib/" directory. This may or may not be what users want, and we
don't provide a way to disable it.
Introduce a new "contrib" build option. This option is introduced as an
array so that users can manually pick which exact features they want to
include from the "contrib" directory. By default, we build and install
shell completions, which is a commonly used feature and also the current
default.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In a38edab7c8 (Makefile: generate doc versions via GIT-VERSION-GEN,
2024-12-06), we have refactored how we build our documentation by
injecting the Git version into the Asciidoc and AsciiDoctor config
files instead of doing so via arguments. As such, the original config
files were removed, where the expectation is that they get generated via
`GIT-VERSION-GEN` now.
Whie the git-subtree(1) command part of "contrib/" also builds docs
using these same config files, its Makefile wasn't adjusted accordingly
and thus building the docs is broken.
Fix this by using `GIT-VERSION-GEN` to generate those files.
Reported-by: Renato Botelho <garga@FreeBSD.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Most of the warnings were about loop variables being declared as ints
with a condition using a size_t, whereby switching the variable to
size_t fixes the warning.
One other case was comparing the result of strlen to an int passed
as an argument, which turns out could just as well be passed as a
size_t, albeit trickling to other functions.
Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Test modernization.
* mb/t7110-use-test-path-helper:
t7110: replace `test -f` with `test_path_is_*` helpers
|
|
meson-based build now supports the unsafe-sha1 build knob.
* ps/meson-weak-sha1-build:
meson: provide a summary of configured backends
meson: wire up unsafe SHA1 backend
meson: add missing dots for build options
meson: simplify conditions for HTTPS and SHA1 dependencies
meson: require SecurityFramework when it's used as SHA1 backend
meson: deduplicate access to SHA1/SHA256 backend options
meson: consistenlty spell 'CommonCrypto'
|
|
More -Wsign-compare fixes.
* ps/more-sign-compare:
sign-compare: avoid comparing ptrdiff with an int/unsigned
commit-reach: use `size_t` to track indices when computing merge bases
shallow: fix -Wsign-compare warnings
builtin/log: fix remaining -Wsign-compare warnings
builtin/log: use `size_t` to track indices
commit-reach: use `size_t` to track indices in `get_reachable_subset()`
commit-reach: use `size_t` to track indices in `remove_redundant()`
commit-reach: fix type of `min_commit_date`
commit-reach: fix index used to loop through unsigned integer
prio-queue: fix type of `insertion_ctr`
|
|
CI jobs gave sporadic failures, which turns out that that the
object finalization code was giving an error when it did not have
to.
* ps/object-collision-check:
object-file: retry linking file into place when occluding file vanishes
object-file: don't special-case missing source file in collision check
object-file: rename variables in `check_collision()`
object-file: fix race in object collision check
|
|
Tweak the help text used for the option value placeholders by
parse-options API so that translations can customize the "<>"
placeholder signal (e.g. "--option=<value>").
* as/long-option-help-i18n:
parse-options: localize mark-up of placeholder text in the short help
|
|
"git submodule" learned various ways to spell the same option,
e.g. "--branch=B" can be spelled "--branch B" or "-bB".
* re/submodule-parse-opt:
git-submodule.sh: rename some variables
git-submodule.sh: improve variables readability
git-submodule.sh: add some comments
git-submodule.sh: get rid of unused variable
git-submodule.sh: get rid of isnumber
git-submodule.sh: improve parsing of short options
git-submodule.sh: improve parsing of some long options
|
|
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Also prevent git-commit manpage to refer to itself in the config
description by using a variable.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|