| Age | Commit message (Collapse) | Author | Files | Lines |
|
Adapt oid-array test script to clar framework by using clar assertions
where necessary. Remove descriptions from macros to reduce
redundancy, and move test input arrays to global scope for reuse across
multiple test functions. Introduce `test_oid_array__initialize()` to
explicitly initialize the hash algorithm.
These changes streamline the test suite, making individual tests
self-contained and reducing redundant code.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
`get_oid_arbitrary_hex()` and `init_hash_algo()` are both required for
oid-related tests to run without errors. In the current implementation,
both functions are defined and declared in the
`t/unit-tests/lib-oid.{c,h}` which is utilized by oid-related tests in
the homegrown unit tests structure.
Adapt functions in lib-oid.{c,h} to use clar. Both these functions
become available for oid-related test files implemented using the clar
testing framework, which requires them. This will be used by subsequent
commits.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Test clean-up.
* lo/t7603-path-is-file-update:
t7603: replace test -f by test_path_is_file
|
|
"git rev-list --missing=" learned to accept "print-info" that gives
known details expected of the missing objects, like path and type.
* jt/rev-list-missing-print-info:
rev-list: extend print-info to print missing object type
rev-list: add print-info action to print missing object path
|
|
"git push --atomic --porcelain" used to ignore failures from the
other side, losing the error status from the child process, which
has been corrected.
* ps/send-pack-unhide-error-in-atomic-push:
send-pack: gracefully close the connection for atomic push
t5543: atomic push reports exit code failure
send-pack: new return code "ERROR_SEND_PACK_BAD_REF_STATUS"
t5548: add porcelain push test cases for dry-run mode
t5548: add new porcelain test cases
t5548: refactor test cases by resetting upstream
t5548: refactor to reuse setup_upstream() function
t5504: modernize test by moving heredocs into test bodies
|
|
Lazy-loading missing files in a blobless clone on demand is costly
as it tends to be one-blob-at-a-time. "git backfill" is introduced
to help bulk-download necessary files beforehand.
* ds/backfill:
backfill: assume --sparse when sparse-checkout is enabled
backfill: add --sparse option
backfill: add --min-batch-size=<n> option
backfill: basic functionality and tests
backfill: add builtin boilerplate
|
|
Code clean-up.
* kn/reflog-migration-fix-followup:
reftable: prevent 'update_index' changes after adding records
refs: use 'uint64_t' for 'ref_update.index'
refs: mark `ref_transaction_update_reflog()` as static
|
|
Fetching into a bare repository incorrectly assumed it always used
a mirror layout when deciding to update remote-tracking HEAD, which
has been corrected.
* bf/fetch-set-head-fix:
fetch set_head: fix non-mirror remotes in bare repositories
fetch set_head: refactor to use remote directly
|
|
Going into a secondary worktree and asking "is the main worktree
bare?" did not work correctly when per-worktree configuration
option was in use, which has been corrected.
* op/worktree-is-main-bare-fix:
worktree: detect from secondary worktree if main worktree is bare
|
|
"git clone" learned to make a shallow clone for a single commit
that is not necessarily be at the tip of any branch.
* tc/clone-single-revision:
builtin/clone: teach git-clone(1) the --revision= option
parse-options: introduce die_for_incompatible_opt2()
clone: introduce struct clone_opts in builtin/clone.c
clone: add tags refspec earlier to fetch refspec
clone: refactor wanted_peer_refs()
clone: make it possible to specify --tags
clone: cut down on global variables in clone.c
|
|
"git -c help.autocorrect=0 psuh" shows the suggested typofix,
unlike the previous attempt in the base topic.
* da/help-autocorrect-one-fix:
help: add "show" as a valid configuration value
help: show the suggested command when help.autocorrect is false
|
|
"git gc" learned the "--expire-to" option and passes it down to
underlying "git repack".
* zh/gc-expire-to:
gc: add `--expire-to` option
|
|
Foreign language interface for Rust into our code base has been added.
* js/libgit-rust:
libgit: add higher-level libgit crate
libgit-sys: also export some config_set functions
libgit-sys: introduce Rust wrapper for libgit.a
common-main: split init and exit code into new files
|
|
Test clean-up.
* ac/t5401-use-test-path-is-file:
t5401: prefer test_path_is_* helper function
|
|
Test clean-up.
* ac/t6423-unhide-git-exit-status:
t6423: fix suppression of Git’s exit code in tests
|
|
"git repack --keep-unreachable" to send unreachable objects to the
main pack "git repack -ad" produces did not work when there is no
existing packs, which has been corrected.
* ps/repack-keep-unreachable-in-unpacked-repo:
builtin/repack: fix `--keep-unreachable` when there are no packs
|
|
"git pack-objects" and its wrapper "git repack" learned an option
to use an alternative path-hash function to improve delta-base
selection to produce a packfile with deeper history than window
size.
* ds/name-hash-tweaks:
pack-objects: prevent name hash version change
test-tool: add helper for name-hash values
p5313: add size comparison test
pack-objects: add GIT_TEST_NAME_HASH_VERSION
repack: add --name-hash-version option
pack-objects: add --name-hash-version option
pack-objects: create new name-hash function version
|
|
Convert a handful of unit tests to work with the clar framework.
* sk/unit-tests-0130:
t/unit-tests: convert strcmp-offset test to use clar test framework
t/unit-tests: convert strbuf test to use clar test framework
t/unit-tests: adapt example decorate test to use clar test framework
t/unit-tests: convert hashmap test to use clar test framework
|
|
Further code clean-up on the use of hash functions. Now the
context object knows what hash function it is working with.
* ps/hash-cleanup:
global: adapt callers to use generic hash context helpers
hash: provide generic wrappers to update hash contexts
hash: stop typedeffing the hash context
hash: convert hashing context to a structure
|
|
"git apply" internally uses unsigned long for line numbers and uses
strtoul() to parse numbers on the hunk headers. It however forgot
to check parse errors.
* pw/apply-ulong-overflow-check:
apply: detect overflow when parsing hunk header
|
|
"git init" to reinitialize a repository that already exists cannot
change the hash function and ref backends; such a request is
silently ignored now.
* ps/setup-reinit-fixes:
setup: fix reinit of repos with incompatible GIT_DEFAULT_HASH
setup: fix reinit of repos with incompatible GIT_DEFAULT_REF_FORMAT
t0001: remove duplicate test
|
|
`test_path_is_file` provides a better output when asserting whether a
file exists. Replace the occurrences of `test -f` in t7603 with it,
facilitating the trace of possible test failures.
Signed-off-by: Lucas Oshiro <lucasseikioshiro@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
CI updates (containerization, dropping stale ones, etc.).
* ps/ci-misc-updates:
ci: remove stale code for Azure Pipelines
ci: use latest Ubuntu release
ci: stop special-casing for Ubuntu 16.04
gitlab-ci: add linux32 job testing against i386
gitlab-ci: remove the "linux-old" job
github: simplify computation of the job's distro
github: convert all Linux jobs to be containerized
github: adapt containerized jobs to be rootless
t7422: fix flaky test caused by buffered stdout
t0060: fix EBUSY in MinGW when setting up runtime prefix
|
|
The git-clone(1) command has the option `--branch` that allows the user
to select the branch they want HEAD to point to. In a non-bare
repository this also checks out that branch.
Option `--branch` also accepts a tag. When a tag name is provided, the
commit this tag points to is checked out and HEAD is detached. Thus
`--branch` can be used to clone a repository and check out a ref kept
under `refs/heads` or `refs/tags`. But some other refs might be in use
as well. For example Git forges might use refs like `refs/pull/<id>` and
`refs/merge-requests/<id>` to track pull/merge requests. These refs
cannot be selected upon git-clone(1).
Add option `--revision` to git-clone(1). This option accepts a fully
qualified reference, or a hexadecimal commit ID. This enables the user
to clone and check out any revision they want. `--revision` can be used
in conjunction with `--depth` to do a minimal clone that only contains
the blob and tree for a single revision. This can be useful for
automated tests running in CI systems.
Using option `--branch` and `--single-branch` together is a similar
scenario, but serves a different purpose. Using these two options, a
singlet remote tracking branch is created and the fetch refspec is set
up so git-fetch(1) will receive updates on that branch from the remote.
This allows the user work on that single branch.
Option `--revision` on contrary detaches HEAD, creates no tracking
branches, and writes no fetch refspec.
Signed-off-by: Toon Claes <toon@iotcl.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
[jc: removed unnecessary TEST_PASSES_SANITIZE_LEAK from the test]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When extensions.worktreeConfig is true and the main worktree is
bare -- that is, its config.worktree file contains core.bare=true
-- commands run from secondary worktrees incorrectly see the main
worktree as not bare. As such, those commands incorrectly think
that the repository's default branch (typically "main" or
"master") is checked out in the bare repository even though it's
not. This makes it impossible, for instance, to checkout or delete
the default branch from a secondary worktree, among other
shortcomings.
This problem occurs because, when extensions.worktreeConfig is
true, commands run in secondary worktrees only consult
$commondir/config and $commondir/worktrees/<id>/config.worktree,
thus they never see the main worktree's core.bare=true setting in
$commondir/config.worktree.
Fix this problem by consulting the main worktree's config.worktree
file when checking whether it is bare. (This extra work is
performed only when running from a secondary worktree.)
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Olga Pilipenco <olga.pilipenco@shopify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Additional information about missing objects found in git-rev-list(1)
can be printed by specifying the `print-info` missing action for the
`--missing` option. Extend this action to also print missing object type
information inferred from its containing object. This token follows the
form `type=<type>` and specifies the expected object type of the missing
object.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Missing objects identified through git-rev-list(1) can be printed by
setting the `--missing=print` option. Additional information about the
missing object, such as its path and type, may be present in its
containing object.
Add the `print-info` missing action for the `--missing` option that,
when set, prints additional insight about the missing object inferred
from its containing object. Each line of output for a missing object is
in the form: `?<oid> [<token>=<value>]...`. The `<token>=<value>` pairs
containing additional information are separated from each other by a SP.
The value is encoded in a token specific fashion, but SP or LF contained
in value are always expected to be represented in such a way that the
resulting encoded value does not have either of these two problematic
bytes. This format is kept generic so it can be extended in the future
to support additional information.
For now, only a missing object path info is implemented. It follows the
form `path=<path>` and specifies the full path to the object from the
top-level tree. A path containing SP or special characters is enclosed
in double-quotes in the C style as needed. In a subsequent commit,
missing object type info will also be added.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "--keep-unreachable" flag is supposed to append any unreachable
objects to the newly written pack. This flag is explicitly documented as
appending both packed and loose unreachable objects to the new packfile.
And while this works alright when repacking with preexisting packfiles,
it stops working when the repository does not have any packfiles at all.
The root cause are the conditions used to decide whether or not we want
to append "--pack-loose-unreachable" to git-pack-objects(1). There are
a couple of conditions here:
- `has_existing_non_kept_packs()` checks whether there are existing
packfiles. This condition makes sense to guard "--keep-pack=",
"--unpack-unreachable" and "--keep-unreachable", because all of
these flags only make sense in combination with existing packfiles.
But it does not make sense to disable `--pack-loose-unreachable`
when there aren't any preexisting packfiles, as loose objects can be
packed into the new packfile regardless of that.
- `delete_redundant` checks whether we want to delete any objects or
packs that are about to become redundant. The documentation of
`--keep-unreachable` explicitly says that `git repack -ad` needs to
be executed for the flag to have an effect.
It is not immediately obvious why such redundant objects need to be
deleted in order for "--pack-unreachable-objects" to be effective.
But as things are working as documented this is nothing we'll change
for now.
- `pack_everything & PACK_CRUFT` checks that we're not creating a
cruft pack. This condition makes sense in the context of
"--pack-loose-unreachable", as unreachable objects would end up in
the cruft pack anyway.
So while the second and third condition are sensible, it does not make
any sense to condition `--pack-loose-unreachable` on the existence of
packfiles.
Fix the bug by splitting out the "--pack-loose-unreachable" and only
making it depend on the second and third condition. Like this, loose
unreachable objects will be packed regardless of any preexisting
packfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The previous change introduced the '--[no-]sparse' option for the 'git
backfill' command, but did not assume it as enabled by default. However,
this is likely the behavior that users will most often want to happen.
Without this default, users with a small sparse-checkout may be confused
when 'git backfill' downloads every version of every object in the full
history.
However, this is left as a separate change so this decision can be reviewed
independently of the value of the '--[no-]sparse' option.
Add a test of adding the '--sparse' option to a repo without sparse-checkout
to make it clear that supplying it without a sparse-checkout is an error.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
One way to significantly reduce the cost of a Git clone and later fetches is
to use a blobless partial clone and combine that with a sparse-checkout that
reduces the paths that need to be populated in the working directory. Not
only does this reduce the cost of clones and fetches, the sparse-checkout
reduces the number of objects needed to download from a promisor remote.
However, history investigations can be expensive as computing blob diffs
will trigger promisor remote requests for one object at a time. This can be
avoided by downloading the blobs needed for the given sparse-checkout using
'git backfill' and its new '--sparse' mode, at a time that the user is
willing to pay that extra cost.
Note that this is distinctly different from the '--filter=sparse:<oid>'
option, as this assumes that the partial clone has all reachable trees and
we are using client-side logic to avoid downloading blobs outside of the
sparse-checkout cone. This avoids the server-side cost of walking trees
while also achieving a similar goal. It also downloads in batches based on
similar path names, presenting a resumable download if things are
interrupted.
This augments the path-walk API to have a possibly-NULL 'pl' member that may
point to a 'struct pattern_list'. This could be more general than the
sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently
the only consumer.
Be sure to test this in both cone mode and not cone mode. Cone mode has the
benefit that the path-walk can skip certain paths once they would expand
beyond the sparse-checkout. Non-cone mode can describe the included files
using both positive and negative patterns, which changes the possible return
values of path_matches_pattern_list(). Test both kinds of matches for
increased coverage.
To test this, we can create a blobless sparse clone, expand the
sparse-checkout slightly, and then run 'git backfill --sparse' to see
how much data is downloaded. The general steps are
1. git clone --filter=blob:none --sparse <url>
2. git sparse-checkout set <dir1> ... <dirN>
3. git backfill --sparse
For the Git repository with the 'builtin' directory in the
sparse-checkout, we get these results for various batch sizes:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|-------|
| (Initial clone) | 3 | 110 MB | |
| 10K | 12 | 192 MB | 17.2s |
| 15K | 9 | 192 MB | 15.5s |
| 20K | 8 | 192 MB | 15.5s |
| 25K | 7 | 192 MB | 14.7s |
This case matters less because a full clone of the Git repository from
GitHub is currently at 277 MB.
Using a copy of the Linux repository with the 'kernel/' directory in the
sparse-checkout, we get these results:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|------|
| (Initial clone) | 2 | 1,876 MB | |
| 10K | 11 | 2,187 MB | 46s |
| 25K | 7 | 2,188 MB | 43s |
| 50K | 5 | 2,194 MB | 44s |
| 100K | 4 | 2,194 MB | 48s |
This case is more meaningful because a full clone of the Linux
repository is currently over 6 GB, so this is a valuable way to download
a fraction of the repository and no longer need network access for all
reachable objects within the sparse-checkout.
Choosing a batch size will depend on a lot of factors, including the
user's network speed or reliability, the repository's file structure,
and how many versions there are of the file within the sparse-checkout
scope. There will not be a one-size-fits-all solution.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Users may want to specify a minimum batch size for their needs. This is only
a minimum: the path-walk API provides a list of OIDs that correspond to the
same path, and thus it is optimal to allow delta compression across those
objects in a single server request.
We could consider limiting the request to have a maximum batch size in the
future. For now, we let the path-walk API batches determine the
boundaries.
To get a feeling for the value of specifying the --min-batch-size parameter,
I tested a number of open source repositories available on GitHub. The
procedure was generally:
1. git clone --filter=blob:none <url>
2. git backfill
Checking the number of packfiles and the size of the .git/objects/pack
directory helps to identify the effects of different batch sizes.
For the Git repository, we get these results:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|-------|
| (Initial clone) | 2 | 119 MB | |
| 25K | 8 | 290 MB | 24s |
| 50K | 5 | 290 MB | 24s |
| 100K | 4 | 290 MB | 29s |
Other than the packfile counts decreasing as we need fewer batches, the
size and time required is not changing much for this small example.
For the nodejs/node repository, we see these results:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|--------|
| (Initial clone) | 2 | 330 MB | |
| 25K | 19 | 1,222 MB | 1m 22s |
| 50K | 11 | 1,221 MB | 1m 24s |
| 100K | 7 | 1,223 MB | 1m 40s |
| 250K | 4 | 1,224 MB | 2m 23s |
| 500K | 3 | 1,216 MB | 4m 38s |
Here, we don't have much difference in the size of the repo, though the
500K batch size results in a few MB gained. That comes at a cost of a
much longer time. This extra time is due to server-side delta
compression happening as the on-disk deltas don't appear to be reusable
all the time. But for smaller batch sizes, the server is able to find
reasonable deltas partly because we are asking for objects that appear
in the same region of the directory tree and include all versions of a
file at a specific path.
To contrast this example, I tested the microsoft/fluentui repo, which
has been known to have inefficient packing due to name hash collisions.
These results are found before GitHub had the opportunity to repack the
server with more advanced name hash versions:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|--------|
| (Initial clone) | 2 | 105 MB | |
| 5K | 53 | 348 MB | 2m 26s |
| 10K | 28 | 365 MB | 2m 22s |
| 15K | 19 | 407 MB | 2m 21s |
| 20K | 15 | 393 MB | 2m 28s |
| 25K | 13 | 417 MB | 2m 06s |
| 50K | 8 | 509 MB | 1m 34s |
| 100K | 5 | 535 MB | 1m 56s |
| 250K | 4 | 698 MB | 1m 33s |
| 500K | 3 | 696 MB | 1m 42s |
Here, a larger variety of batch sizes were chosen because of the great
variation in results. By asking the server to download small batches
corresponding to fewer paths at a time, the server is able to provide
better compression for these batches than it would for a regular clone.
A typical full clone for this repository would require 738 MB.
This example justifies the choice to batch requests by path name,
leading to improved communication with a server that is not optimally
packed.
Finally, the same experiment for the Linux repository had these results:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|---------|
| (Initial clone) | 2 | 2,153 MB | |
| 25K | 63 | 6,380 MB | 14m 08s |
| 50K | 58 | 6,126 MB | 15m 11s |
| 100K | 30 | 6,135 MB | 18m 11s |
| 250K | 14 | 6,146 MB | 18m 22s |
| 500K | 8 | 6,143 MB | 33m 29s |
Even in this example, where the default name hash algorithm leads to
decent compression of the Linux kernel repository, there is value for
selecting a smaller batch size, to a limit. The 25K batch size has the
fastest time, but uses 250 MB more than the 50K batch size. The 500K
batch size took much more time due to server compression time and thus
we should avoid large batch sizes like this.
Based on these experiments, a batch size of 50,000 was chosen as the
default value.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The default behavior of 'git backfill' is to fetch all missing blobs that
are reachable from HEAD. Document and test this behavior.
The implementation is a very simple use of the path-walk API, initializing
the revision walk at HEAD to start the path-walk from all commits reachable
from HEAD. Ignore the object arrays that correspond to tree entries,
assuming that they are all present already.
The path-walk API provides lists of objects in batches according to a
common path, but that list could be very small. We want to balance the
number of requests to the server with the ability to have the process
interrupted with minimal repeated work to catch up in the next run.
Based on some experiments (detailed in the next change) a minimum batch
size of 50,000 is selected for the default.
This batch size is a _minimum_. As the path-walk API emits lists of blob
IDs, they are collected into a list of objects for a request to the
server. When that list is at least the minimum batch size, then the
request is sent to the server for the new objects. However, the list of
blob IDs from the path-walk API could be much longer than the batch
size. At this moment, it is unclear if there is a benefit to split the
list when there are too many objects at the same path.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Patrick reported an issue that the exit code of git-receive-pack(1) is
ignored during atomic push with "--porcelain" flag, and added new test
cases in t5543.
This issue originated from commit 7dcbeaa0df (send-pack: fix
inconsistent porcelain output, 2020-04-17). At that time, I chose to
ignore the exit code of "finish_connect()" without investigating the
root cause of the abnormal termination of git-receive-pack. That was an
incorrect solution.
The root cause is that an atomic push operation terminates early without
sending a flush packet to git-receive-pack. As a result,
git-receive-pack continues waiting for commands without exiting. By
sending a flush packet at the appropriate location in "send_pack()", we
ensure that the git-receive-pack process closes properly, avoiding an
erroneous exit code for git-push. At the same time, revert the changes
to the "transport.c" file made in commit 7dcbeaa0df.
Reported-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add new test cases in t5543 to avoid ignoring the exit code of
git-receive-pack(1) during atomic push with "--porcelain" flag.
We'd typically notice this case because the refs would have their error
message set. But there is an edge case when pushing refs succeeds, but
git-receive-pack(1) exits with a non-zero exit code at a later point in
time due to another error. An atomic git-push(1) would ignore that error
code, and consequently it would return successfully and not print any
error message at all.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
New dry-run test cases:
- git push --porcelain --dry-run
- git push --porcelain --dry-run --force
- git push --porcelain --dry-run --atomic
- git push --porcelain --dry-run --atomic --force
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add two more test cases exercising git-push(1) with `--procelain`, one
exercising a non-atomic and one exercising an atomic push.
Based-on-patch-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Refactor the test cases with the following changes:
- Calling setup_upstream() to reset upstream after running each test
case.
- Change the initial branch tips of the workspace to reduce the branch
setup operations in the workspace.
- Reduced the two steps of setting up and cleaning up the pre-receive
hook by moving the operations into the corresponding test case,
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Refactor the function setup_upstream_and_workbench(), extracting
create_upstream_template() and setup_upstream() from it. The former is
used to create the upstream repository template, while the latter is
used to rebuild the upstream repository and will be reused in subsequent
commits.
To ensure that setup_upstream() works properly in both local and HTTP
protocols, the HTTP settings have been moved to the setup_upstream() and
setup_upstream_and_workbench() functions.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We have several heredocs in t5504 located outside of any particular test
bodies. Move these into the test bodies to match our modern coding
style.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Some test in t6423 supress Git's exit code, which can cause test
failures go unnoticed. Specifically using git <subcommand> |
<other-command> masks potential failures of the Git command.
This commit ensures that Git's exit status is correctly propogated by:
- Avoiding pipes that suppress exit codes.
Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a literal value for showing the suggested autocorrection
for consistency with the rest of the help.autocorrect options.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Make the handling of false boolean values for help.autocorrect
consistent with the handling of value 0 by showing the suggested
commands but not running them.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"test -f" does not provide a nice error message when we hit test
failures, so use test_path_is_file instead.
Signed-off-by: ambar chakravartty <amch9605@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Fix bugs in an earlier attempt to fix "git refs migration".
* kn/reflog-migration-fix-fix:
refs/reftable: fix uninitialized memory access of `max_index`
reftable: write correct max_update_index to header
|
|
Following the procedure we established to introduce breaking
changes for Git 3.0, allow an early opt-in for removing support of
$GIT_DIR/branches/ and $GIT_DIR/remotes/ directories to configure
remotes.
* ps/3.0-remote-deprecation:
remote: announce removal of "branches/" and "remotes/"
builtin/pack-redundant: remove subcommand with breaking changes
ci: repurpose "linux-gcc" job for deprecations
ci: merge linux-gcc-default into linux-gcc
Makefile: wire up build option for deprecated features
|
|
The API around choosing to use unsafe variant of SHA-1
implementation has been updated in an attempt to make it harder to
abuse.
* tb/unsafe-hash-cleanup:
hash.h: drop unsafe_ function variants
csum-file: introduce hashfile_checkpoint_init()
t/helper/test-hash.c: use unsafe_hash_algo()
csum-file.c: use unsafe_hash_algo()
hash.h: introduce `unsafe_hash_algo()`
csum-file.c: extract algop from hashfile_checksum_valid()
csum-file: store the hash algorithm as a struct field
t/helper/test-tool: implement sha1-unsafe helper
|
|
Adapt strcmp-offset test script to clar framework by using clar
assertions where necessary. Introduce `test_strcmp_offset__empty()` to
verify `check_strcmp_offset()` behavior when both input strings are
empty. This ensures the function correctly handles edge cases and
returns expected values.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Adapt strbuf test script to clar framework by using clar assertions
where necessary.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce `test_example_decorate__initialize()` to explicitly set up
object IDs and retrieve corresponding objects before tests run. This
ensures a consistent and predictable test state without relying on data
from previous tests.
Add `test_example_decorate__cleanup()` to clear decorations after each
test, preventing interference between tests and ensuring each runs in
isolation.
Adapt example decorate test script to clar framework by using clar
assertions where necessary. Previously, tests relied on data written by
earlier tests, leading to unintended dependencies between them. This
explicitly initializes the necessary state within
`test_example_decorate__readd`, ensuring it does not depend on prior
test executions.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Adapts hashmap test script to clar framework by using clar assertions
where necessary.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Adapt callers to use generic hash context helpers instead of using the
hash algorithm to update them. This makes the callsites easier to reason
about and removes the possibility that the wrong hash algorithm is used
to update the hash context's state. And as a nice side effect this also
gets rid of a bunch of users of `the_hash_algo`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We generally avoid using `typedef` in the Git codebase. One exception
though is the `git_hash_ctx`, likely because it used to be a union
rather than a struct until the preceding commit refactored it. But now
that it is a normal `struct` there isn't really a need for a typedef
anymore.
Drop the typedef and adapt all callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* tb/unsafe-hash-cleanup:
hash.h: drop unsafe_ function variants
csum-file: introduce hashfile_checkpoint_init()
t/helper/test-hash.c: use unsafe_hash_algo()
csum-file.c: use unsafe_hash_algo()
hash.h: introduce `unsafe_hash_algo()`
csum-file.c: extract algop from hashfile_checksum_valid()
csum-file: store the hash algorithm as a struct field
t/helper/test-tool: implement sha1-unsafe helper
|
|
The exact same issue as described in the preceding commit also exists
for GIT_DEFAULT_HASH. Thus, reinitializing a repository that e.g. uses
SHA1 with `GIT_DEFAULT_HASH=sha256 git init` will cause the object
format of that repository to change to SHA256. This is of course bogus
as any existing objects and refs will not be converted, thus causing
repository corruption:
$ git init repo
Initialized empty Git repository in /tmp/repo/.git/
$ cd repo/
$ git commit --allow-empty -m message
[main (root-commit) 35a7344] message
$ GIT_DEFAULT_HASH=sha256 git init
Reinitialized existing Git repository in /tmp/repo/.git/
$ git show
fatal: your current branch appears to be broken
Fix the issue by ignoring the environment variable in case the repo has
already been initialized with an object hash.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The GIT_DEFAULT_REF_FORMAT environment variable can be set to influence
the default ref format that new repostiories shall be initialized with.
While this is the expected behaviour when creating a new repository, it
is not when reinitializing a repository: we should retain the ref format
currently used by it in that case.
This doesn't work correctly right now:
$ git init --ref-format=files repo
Initialized empty Git repository in /tmp/repo/.git/
$ GIT_DEFAULT_REF_FORMAT=reftable git init repo
fatal: could not open '/tmp/repo/.git/refs/heads' for writing: Is a directory
Instead of retaining the current ref format, the reinitialization tries
to reinitialize the repository with the different format. This action
fails when git-init(1) tries to write the ".git/refs/heads" stub, which
in the context of the reftable backend is always written as a file so
that we can detect clients which inadvertently try to access the repo
with the wrong ref format. Seems like the protection mechanism works for
this case, as well.
Fix the issue by ignoring the environment variable in case the repo has
already been initialized with a ref storage format.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The test in question is an exact copy of the testcase preceding it.
Remove it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git apply" uses strtoul() to parse the numbers in the hunk header but
silently ignores overflows. As LONG_MAX is a legitimate return value for
strtoul() we need to set errno to zero before the call to strtoul() and
check that it is still zero afterwards. The error message we display is
not particularly helpful as it does not say what was wrong. However, it
seems pretty unlikely that users are going to trigger this error in
practice and we can always improve it later if needed.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The C functions exported by libgit-sys do not provide an idiomatic Rust
interface. To make it easier to use these functions via Rust, add a
higher-level "libgit" crate, that wraps the lower-level configset API
with an interface that is more Rust-y.
This combination of $X and $X-sys crates is a common pattern for FFI in
Rust, as documented in "The Cargo Book" [1].
[1] https://doc.rust-lang.org/cargo/reference/build-scripts.html#-sys-packages
Co-authored-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "git refs migrate" command did not migrate the reflog for
refs/stash, which is the contents of the stashes, which has been
corrected.
* ps/reflog-migration-with-logall-fix:
refs: fix migration of reflogs respecting "core.logAllRefUpdates"
|
|
The trace2 code was not prepared to show a configuration variable
that is set to true using the valueless true syntax, which has been
corrected.
* am/trace2-with-valueless-true:
trace2: prevent segfault on config collection with valueless true
|
|
reflog entries for symbolic ref updates were broken, which has been
corrected.
* kn/reflog-symref-fix:
refs: fix creation of reflog entries for symrefs
|
|
"git branch --sort=..." and "git for-each-ref --format=... --sort=..."
did not work as expected with some atoms, which has been corrected.
* rs/ref-fitler-used-atoms-value-fix:
ref-filter: remove ref_format_clear()
ref-filter: move is-base tip to used_atom
ref-filter: move ahead-behind bases into used_atom
|
|
Introduce a new API to visit objects in batches based on a common
path, or by type.
* ds/path-walk-1:
path-walk: drop redundant parse_tree() call
path-walk: reorder object visits
path-walk: mark trees and blobs as UNINTERESTING
path-walk: visit tags and cached objects
path-walk: allow consumer to specify object types
t6601: add helper for testing path-walk API
test-lib-functions: add test_cmp_sorted
path-walk: introduce an object walk by path
|
|
Introduce libgit-sys, a Rust wrapper crate that allows Rust code to call
functions in libgit.a. This initial patch defines build rules and an
interface that exposes user agent string getter functions as a proof of
concept. This library can be tested with `cargo test`. In later commits,
a higher-level library containing a more Rust-friendly interface will be
added at `contrib/libgit-rs`.
Symbols in libgit can collide with symbols from other libraries such as
libgit2. We avoid this by first exposing library symbols in
public_symbol_export.[ch]. These symbols are prepended with "libgit_" to
avoid collisions and set to visible using a visibility pragma. In
build.rs, Rust builds contrib/libgit-rs/libgit-sys/libgitpub.a, which also
contains libgit.a and other dependent libraries, with
-fvisibility=hidden to hide all symbols within those libraries that
haven't been exposed with a visibility pragma.
Co-authored-by: Kyle Lippincott <spectral@google.com>
Co-authored-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Test fix.
* jp/t8002-printf-fix:
t8002: fix ambiguous printf conversion specifications
|
|
The reftable/ library code has been made -Wsign-compare clean.
* ps/reftable-sign-compare:
reftable: address trivial -Wsign-compare warnings
reftable/blocksource: adjust `read_block()` to return `ssize_t`
reftable/blocksource: adjust type of the block length
reftable/block: adjust type of the restart length
reftable/block: adapt header and footer size to return a `size_t`
reftable/basics: adjust `hash_size()` to return `uint32_t`
reftable/basics: adjust `common_prefix_size()` to return `size_t`
reftable/record: handle overflows when decoding varints
reftable/record: drop unused `print` function pointer
meson: stop disabling -Wsign-compare
|
|
The "cache" credential back-end did not handle authtype correctly,
which has been corrected.
* mh/credential-cache-authtype-request-fix:
credential-cache: respect authtype capability
|
|
Move a few more unit tests to the clar test framework.
* sk/unit-tests:
t/unit-tests: convert reftable tree test to use clar test framework
t/unit-tests: adapt priority queue test to use clar test framework
t/unit-tests: convert mem-pool test to use clar test framework
t/unit-tests: handle dashes in test suite filenames
|
|
The help text from "git $cmd -h" appear on the standard output for
some $cmd and the standard error for others. The built-in commands
have been fixed to show them on the standard output consistently.
* jc/show-usage-help:
builtin: send usage() help text to standard output
oddballs: send usage() help text to standard output
builtins: send usage_with_options() help text to standard output
usage: add show_usage_if_asked()
parse-options: add show_usage_with_options_if_asked()
t0012: optionally check that "-h" output goes to stdout
|
|
Add a new test-tool helper, name-hash, to output the value of the
name-hash algorithms for the input list of strings, one per line.
Since the name-hash values can be stored in the .bitmap files, it is
important that these hash functions do not change across Git versions.
Add a simple test to t5310-pack-bitmaps.sh to provide some testing of
the current values. Due to how these functions are implemented, it would
be difficult to change them without disturbing these values. The paths
used for this test are carefully selected to demonstrate some of the
behavior differences of the two current name hash versions, including
which conditions will cause them to collide.
Create a performance test that uses test_size to demonstrate how
collisions occur for these hash algorithms. This test helps inform
someone as to the behavior of the name-hash algorithms for their repo
based on the paths at HEAD.
My copy of the Git repository shows modest statistics around the
collisions of the default name-hash algorithm:
Test this tree
--------------------------------------------------
5314.1: paths at head 4.5K
5314.2: distinct hash value: v1 4.1K
5314.3: maximum multiplicity: v1 13
5314.4: distinct hash value: v2 4.2K
5314.5: maximum multiplicity: v2 9
Here, the maximum collision multiplicity is 13, but around 10% of paths
have a collision with another path.
In a more interesting example, the microsoft/fluentui [1] repo had these
statistics at time of committing:
Test this tree
--------------------------------------------------
5314.1: paths at head 19.5K
5314.2: distinct hash value: v1 8.2K
5314.3: maximum multiplicity: v1 279
5314.4: distinct hash value: v2 17.8K
5314.5: maximum multiplicity: v2 44
[1] https://github.com/microsoft/fluentui
That demonstrates that of the nearly twenty thousand path names, they
are assigned around eight thousand distinct values. 279 paths are
assigned to a single value, leading the packing algorithm to sort
objects from those paths together, by size.
With the v2 name hash function, the maximum multiplicity lowers to 44,
leaving some room for further improvement.
In a more extreme example, an internal monorepo had a much worse
collision rate:
Test this tree
--------------------------------------------------
5314.1: paths at head 227.3K
5314.2: distinct hash value: v1 72.3K
5314.3: maximum multiplicity: v1 14.4K
5314.4: distinct hash value: v2 166.5K
5314.5: maximum multiplicity: v2 138
Here, we can see that the v2 name hash function provides somem
improvements, but there are still a number of collisions that could lead
to repacking problems at this scale.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
As custom options are added to 'git pack-objects' and 'git repack' to
adjust how compression is done, use this new performance test script to
demonstrate their effectiveness in performance and size.
The recently-added --name-hash-version option allows for testing
different name hash functions. Version 2 intends to preserve some of the
locality of version 1 while more often breaking collisions due to long
filenames.
Distinguishing objects by more of the path is critical when there are
many name hash collisions and several versions of the same path in the
full history, giving a significant boost to the full repack case. The
locality of the hash function is critical to compressing something like
a shallow clone or a thin pack representing a push of a single commit.
This can be seen by running pt5313 on the open source fluentui
repository [1]. Most commits will have this kind of output for the thin
and big pack cases, though certain commits (such as [2]) will have
problematic thin pack size for other reasons.
[1] https://github.com/microsoft/fluentui
[2] a637a06df05360ce5ff21420803f64608226a875
Checked out at the parent of [2], I see the following statistics:
Test HEAD
---------------------------------------------------------------
5313.2: thin pack with version 1 0.37(0.44+0.02)
5313.3: thin pack size with version 1 1.2M
5313.4: big pack with version 1 2.04(7.77+0.23)
5313.5: big pack size with version 1 20.4M
5313.6: shallow fetch pack with version 1 1.41(2.94+0.11)
5313.7: shallow pack size with version 1 34.4M
5313.8: repack with version 1 95.70(676.41+2.87)
5313.9: repack size with version 1 439.3M
5313.10: thin pack with version 2 0.12(0.12+0.06)
5313.11: thin pack size with version 2 22.0K
5313.12: big pack with version 2 2.80(5.43+0.34)
5313.13: big pack size with version 2 25.9M
5313.14: shallow fetch pack with version 2 1.77(2.80+0.19)
5313.15: shallow pack size with version 2 33.7M
5313.16: repack with version 2 33.68(139.52+2.58)
5313.17: repack size with version 2 160.5M
To make comparisons easier, I will reformat this output into a different
table style:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|--------------|---------|---------|---------|---------|
| Thin Pack | 0.37 s | 0.12 s | 1.2 M | 22.0 K |
| Big Pack | 2.04 s | 2.80 s | 20.4 M | 25.9 M |
| Shallow Pack | 1.41 s | 1.77 s | 34.4 M | 33.7 M |
| Repack | 95.70 s | 33.68 s | 439.3 M | 160.5 M |
The v2 hash function successfully differentiates the CHANGELOG.md files
from each other, which leads to significant improvements in the thin
pack (simulating a push of this commit) and the full repack. There is
some bloat in the "big pack" scenario and essentially the same results
for the shallow pack.
In the case of the Git repository, these numbers show some of the issues
with this approach:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|--------------|---------|---------|---------|---------|
| Thin Pack | 0.02 s | 0.02 s | 1.1 K | 1.1 K |
| Big Pack | 1.69 s | 1.95 s | 13.5 M | 14.5 M |
| Shallow Pack | 1.26 s | 1.29 s | 12.0 M | 12.2 M |
| Repack | 29.51 s | 29.01 s | 237.7 M | 238.2 M |
Here, the attempts to remove conflicts in the v2 function seem to cause
slight bloat to these sizes. This shows that the Git repository benefits
a lot from cross-path delta pairs.
The results are similar with the nodejs/node repo:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|--------------|---------|---------|---------|---------|
| Thin Pack | 0.02 s | 0.02 s | 1.6 K | 1.6 K |
| Big Pack | 4.61 s | 3.26 s | 56.0 M | 52.8 M |
| Shallow Pack | 7.82 s | 7.51 s | 104.6 M | 107.0 M |
| Repack | 88.90 s | 73.75 s | 740.1 M | 764.5 M |
Here, the v2 name-hash causes some size bloat more often than it reduces
the size, but it also universally improves performance time, which is an
interesting reversal. This must mean that it is helping to short-circuit
some delta computations even if it is not finding the most efficient
ones. The performance improvement cannot be explained only due to the
I/O cost of writing the resulting packfile.
The Linux kernel repository was the initial target of the default name
hash value, and its naming conventions are practically build to take the
most advantage of the default name hash values:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|--------------|----------|----------|---------|---------|
| Thin Pack | 0.17 s | 0.07 s | 4.6 K | 4.6 K |
| Big Pack | 17.88 s | 12.35 s | 201.1 M | 159.1 M |
| Shallow Pack | 11.05 s | 22.94 s | 269.2 M | 273.8 M |
| Repack | 727.39 s | 566.95 s | 2.5 G | 2.5 G |
Here, the thin and big packs gain some performance boosts in time, with
a modest gain in the size of the big pack. The shallow pack, however, is
more expensive to compute, likely because similarly-named files across
different directories are farther apart in the name hash ordering in v2.
The repack also gains benefits in computation time but no meaningful
change to the full size.
Finally, an internal Javascript repo of moderate size shows significant
gains when repacking with --name-hash-version=2 due to it having many name
hash collisions. However, it's worth noting that only the full repack
case has significant differences from the v1 name hash:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|-----------|-----------|----------|---------|---------|
| Thin Pack | 8.28 s | 7.28 s | 16.8 K | 16.8 K |
| Big Pack | 12.81 s | 11.66 s | 29.1 M | 29.1 M |
| Shallow | 4.86 s | 4.06 s | 42.5 M | 44.1 M |
| Repack | 3126.50 s | 496.33 s | 6.2 G | 855.6 M |
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a new environment variable to opt-in to different values of the
--name-hash-version=<n> option in 'git pack-objects'. This allows for
extra testing of the feature without repeating all of the test
scenarios. Unlike many GIT_TEST_* variables, we are choosing to not add
this to the linux-TEST-vars CI build as that test run is already
overloaded. The behavior exposed by this test variable is of low risk
and should be sufficient to allow manual testing when an issue arises.
But this option isn't free. There are a few tests that change behavior
with the variable enabled.
First, there are a few tests that are very sensitive to certain delta
bases being picked. These are both involving the generation of thin
bundles and then counting their objects via 'git index-pack --fix-thin'
which pulls the delta base into the new packfile. For these tests,
disable the option as a decent long-term option.
Second, there are some tests that compare the exact output of a 'git
pack-objects' process when using bitmaps. The warning that ignores the
--name-hash-version=2 and forces version 1 causes these tests to fail.
Disable the environment variable to get around this issue.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The new '--name-hash-version' option for 'git repack' is a simple
pass-through to the underlying 'git pack-objects' subcommand. However,
this subcommand may have other options and a temporary filename as part
of the subcommand execution that may not be predictable or could change
over time.
The existing test_subcommand method requires an exact list of arguments
for the subcommand. This is too rigid for our needs here, so create a
new method, test_subcommand_flex. Use it to check that the
--name-hash-version option is passing through.
Since we are modifying the 'git repack' command, let's bring its usage
in line with the Documentation's synopsis. This removes it from the
allow list in t0450 so it will remain in sync in the future.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The previous change introduced a new pack_name_hash_v2() function that
intends to satisfy much of the hash locality features of the existing
pack_name_hash() function while also distinguishing paths with similar
final components of their paths.
This change adds a new --name-hash-version option for 'git pack-objects'
to allow users to select their preferred function version. This use of
an integer version allows for future expansion and a direct way to later
store a name hash version in the .bitmap format.
For now, let's consider how effective this mechanism is when repacking a
repository with different name hash versions. Specifically, we will
execute 'git pack-objects' the same way a 'git repack -adf' process
would, except we include --name-hash-version=<n> for testing.
On the Git repository, we do not expect much difference. All path names
are short. This is backed by our results:
| Stage | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone | 260 MB | N/A |
| --name-hash-version=1 | 127 MB | 129s |
| --name-hash-version=2 | 127 MB | 112s |
This example demonstrates how there is some natural overhead coming from
the cloned copy because the server is hosting many forks and has not
optimized for exactly this set of reachable objects. But the full repack
has similar characteristics for both versions.
Let's consider some repositories that are hitting too many collisions
with version 1. First, let's explore the kinds of paths that are
commonly causing these collisions:
* "/CHANGELOG.json" is 15 characters, and is created by the beachball
[1] tool. Only the final character of the parent directory can
differentiate different versions of this file, but also only the two
most-significant digits. If that character is a letter, then this is
always a collision. Similar issues occur with the similar
"/CHANGELOG.md" path, though there is more opportunity for
differences In the parent directory.
* Localization files frequently have common filenames but
differentiates via parent directories. In C#, the name
"/strings.resx.lcl" is used for these localization files and they
will all collide in name-hash.
[1] https://github.com/microsoft/beachball
I've come across many other examples where some internal tool uses a
common name across multiple directories and is causing Git to repack
poorly due to name-hash collisions.
One open-source example is the fluentui [2] repo, which uses beachball
to generate CHANGELOG.json and CHANGELOG.md files, and these files have
very poor delta characteristics when comparing against versions across
parent directories.
| Stage | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone | 694 MB | N/A |
| --name-hash-version=1 | 438 MB | 728s |
| --name-hash-version=2 | 168 MB | 142s |
[2] https://github.com/microsoft/fluentui
In this example, we see significant gains in the compressed packfile
size as well as the time taken to compute the packfile.
Using a collection of repositories that use the beachball tool, I was
able to make similar comparisions with dramatic results. While the
fluentui repo is public, the others are private so cannot be shared for
reproduction. The results are so significant that I find it important to
share here:
| Repo | --name-hash-version=1 | --name-hash-version=2 |
|----------|-----------------------|-----------------------|
| fluentui | 440 MB | 161 MB |
| Repo B | 6,248 MB | 856 MB |
| Repo C | 37,278 MB | 6,755 MB |
| Repo D | 131,204 MB | 7,463 MB |
Future changes could include making --name-hash-version implied by a config
value or even implied by default during a full repack.
It is important to point out that the name hash value is stored in the
.bitmap file format, so we must force --name-hash-version=1 when bitmaps
are being read or written. Later, the bitmap format could be updated to
be aware of the name hash version so deltas can be quickly computed
across the bitmapped/not-bitmapped boundary. To promote the safety of
this parameter, the validate_name_hash_version() method will die() if
the given name-hash version is incorrect and will disable newer versions
if not yet compatible with other features, such as --write-bitmap-index.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In b1b713f722 (fetch set_head: handle mirrored bare repositories,
2024-11-22) it was implicitly assumed that all remotes will be mirrors
in a bare repository, thus fetching a non-mirrored remote could lead to
HEAD pointing to a non-existent reference. Make sure we only overwrite
HEAD if we are in a bare repository and fetching from a mirror.
Otherwise, proceed as normally, and create
refs/remotes/<nonmirrorremote>/HEAD instead.
Reported-by: Christian Hesse <list@eworm.de>
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This commit extends the functionality of `git gc`
by adding a new option, `--expire-to=<dir>`. Previously,
this feature was implemented in 91badeba32 (builtin/repack.c:
implement `--expire-to` for storing pruned objects, 2022-10-24),
which allowing users to specify a directory where unreachable
and expired cruft packs are stored during garbage collection.
However, users had to run `git repack --cruft --expire-to=<dir>`
followed by `git prune` to achieve similar results within `git gc`.
By introducing `--expire-to=<dir>` directly into `git gc`,
we simplify the process for users who wish to manage their
repository's cleanup more efficiently. This change involves
passing the `--expire-to=<dir>` parameter through to `git repack`,
making it easier for users to set up a backup location for cruft
packs that will be pruned.
Due to the original `git gc --prune=now` deleting all unreachable
objects by passing the `-a` parameter to git repack. With the
addition of the `--cruft` and `--expire-to` options, it is necessary
to modify this default behavior: instead of deleting these
unreachable objects, they should be merged into a cruft pack and
collected in a specified directory. Therefore, we do not pass `-a`
to the repack command but instead pass `--cruft`, `--expire-to`,
and `--cruft-expiration=now` to repack.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Back when Git was in its infancy, remotes were configured via separate
files in "branches/" (back in 2005). This mechanism was replaced later
that year with the "remotes/" directory. Both mechanisms have eventually
been replaced by config-based remotes, and it is very unlikely that
anybody still uses these directories to configure their remotes.
Both of these directories have been marked as deprecated, one in 2005
and the other one in 2011. Follow through with the deprecation and
finally announce the removal of these features in Git 3.0.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
[jc: with a small tweak to the help message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Extended SHA-1 expression parser did not work well when a branch
with an unusual name (e.g. "foo{bar") is involved.
* en/object-name-with-funny-refname-fix:
object-name: be more strict in parsing describe-like output
object-name: fix resolution of object names containing curly braces
|
|
Remove a series of conditionals within the shared cmd_hash_impl() helper
that powers the 'sha1' and 'sha1-unsafe' helpers.
Instead, replace them with a single conditional that transforms the
specified hash algorithm into its unsafe variant. Then all subsequent
calls can directly use whatever function it wants to call without having
to decide between the safe and unsafe variants.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
With the new "unsafe" SHA-1 build knob, it is convenient to have a
test-tool that can exercise Git's unsafe SHA-1 wrappers for testing,
similar to 't/helper/test-tool sha1'.
Implement that helper by altering the implementation of that test-tool
(in cmd_hash_impl(), which is generic and parameterized over different
hash functions) to conditionally run the unsafe variants of the chosen
hash function, and expose the new behavior via a new 'sha1-unsafe' test
helper.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When TRACE2 analytics is enabled, a configuration variable set to
"valueless true" causes a segfault.
Steps to Reproduce
GIT_TRACE2=true GIT_TRACE2_CONFIG_PARAMS=status.*
git -c status.relativePaths version
Expected Result
git version 2.46.0
Actual Result
zsh: segmentation fault GIT_TRACE2=true
Add checks to prevent the segfault and instead show that the
variable without value.
Signed-off-by: Adam Murray <ad@canva.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The commit 297c09eabb (refs: allow multiple reflog entries for the
same refname, 2024-12-16) added logic to exit early in
`lock_ref_for_update()` after obtaining the required lock. This was
added as a performance optimization on a false assumption that no
further processing was required for reflog-only updates.
However the assumption was wrong. For a symref's reflog entry, the
update needs to be populated with the old_oid value, but the early
exit skipped this necessary step.
This caused a bug in Git 2.48 in the files backend where target
references of symrefs being updated would create a corrupted reflog
entry for the symref since the old_oid is not populated.
Everything the early exit skipped in the code path is necessary for
both regular and symbolic ref, so eliminate the mistaken
optimization, and also add a test to ensure that such an issue
doesn't arise in the future.
Reported-by: Nika Layzell <nika@thelayzells.com>
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The git-pack-redundant(1) subcommand has been castrated to require
the "--i-still-use-this" option to do anything since 4406522b
(pack-redundant: escalate deprecation warning to an error,
2023-03-23), which appeared in Git 2.41 and was announced for
removal with 53a92c9552 (Documentation/BreakingChanges: announce
removal of git-pack-redundant(1), 2024-09-02). Stop compiling the
subcommand in case the `WITH_BREAKING_CHANGES` build flag is set.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
With 57ec9254eb (docs: introduce document to announce breaking changes,
2024-06-14), we have introduced a new document that tracks upcoming
breaking changes in the Git project. In 2454970930 (BreakingChanges:
early adopter option, 2024-10-11) we have amended the document a bit to
mention that any introduced breaking changes must be accompanied by
logic that allows us to enable the breaking change at compile-time.
While we already have two breaking changes lined up, neither of them has
such a switch because they predate those instructions.
Introduce the proposed `WITH_BREAKING_CHANGES` preprocessor macro and
wire it up with both our Makefiles and Meson. This does not yet wire up
the build flag for existing deprecations.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 246cebe320 (refs: add support for migrating reflogs, 2024-12-16) we
have added support to git-refs(1) to migrate reflogs between reference
backends. It was reported [1] though that not we don't migrate reflogs
for a subset of references, most importantly "refs/stash".
This issue is caused by us still honoring "core.logAllRefUpdates" when
trying to migrate reflogs: we do queue the updates, but depending on the
value of that config we may decide to just skip writing the reflog entry
altogether. And given that:
- The default for "core.logAllRefUpdates" is to only create reflogs
for branches, remotes, note refs and "HEAD"
- "refs/stash" is neither of these ref types.
We end up skipping the reflog creation for that particular reference.
Fix the bug by setting `REF_FORCE_CREATE_REFLOG`, which instructs the
ref backends to create the reflog entry regardless of the config or any
preexisting state.
[1]: <Z5BTQRlsOj1sygun@tapette.crustytoothpaste.net>
Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The function `reftable_writer_set_limits()` allows updating the
'min_update_index' and 'max_update_index' of a reftable writer. These
values are written to both the writer's header and footer.
Since the header is written during the first block write, any subsequent
changes to the update index would create a mismatch between the header
and footer values. The footer would contain the newer values while the
header retained the original ones.
To protect against this bug, prevent callers from updating these values
after any record is written. To do this, modify the function to return
an error whenever the limits are modified after any record adds. Check
for record adds within `reftable_writer_set_limits()` by checking the
`last_key` and `next` variable. The former is updated after each record
added, but is reset at certain points. The latter is set after writing
the first block.
Modify all callers of the function to anticipate a return type and
handle it accordingly. Add a unit test to also ensure the function
returns the error as expected.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The functions `header_size()` and `footer_size()` return a positive
integer representing the size of the header and footer, respectively,
dependent on the version of the reftable format. Similar to the
preceding commit, these functions return a signed integer though, which
is nonsensical given that there is no way for these functions to return
negative.
Adapt the functions to return a `size_t` instead to fix a couple of sign
comparison warnings.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `hash_size()` function returns the number of bytes used by the hash
function. Weirdly enough though, it returns a signed integer for its
size even though the size obviously cannot ever be negative. The only
case where it could be negative is if the function returned an error
when asked for an unknown hash, but we assert(3p) instead.
Adjust the type of `hash_size()` to be `uint32_t` and adapt all places
that use signed integers for the hash size to follow suit. This also
allows us to get rid of a couple asserts that we had which verified that
the size was indeed positive, which further stresses the point that this
refactoring makes sense.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `common_prefix_size()` function computes the length of the common
prefix between two buffers. As such its return value will always be an
unsigned integer, as the length cannot be negative. Regardless of that,
the function returns a signed integer, which is nonsensical and causes a
couple of -Wsign-compare warnings all over the place.
Adjust the function to return a `size_t` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The logic to decode varints isn't able to detect integer overflows: as
long as the buffer still has more data available, and as long as the
current byte has its 0x80 bit set, we'll continue to add up these values
to the result. This will eventually cause the `uint64_t` to overflow, at
which point we'll return an invalid result.
Refactor the function so that it is able to detect such overflows. The
implementation is basically copied from Git's own `decode_varint()`,
which already knows to handle overflows. The only adjustment is that we
also take into account the string view's length in order to not overrun
it. The reftable documentation explicitly notes that those two encoding
schemas are supposed to be the same:
Varint encoding
^^^^^^^^^^^^^^^
Varint encoding is identical to the ofs-delta encoding method used
within pack files.
Decoder works as follows:
....
val = buf[ptr] & 0x7f
while (buf[ptr] & 0x80) {
ptr++
val = ((val + 1) << 7) | (buf[ptr] & 0x7f)
}
....
While at it, refactor `put_var_int()` in the same way by copying over
the implementation of `encode_varint()`. While `put_var_int()` doesn't
have an issue with overflows, it generates warnings with -Wsign-compare.
The implementation of `encode_varint()` doesn't, is battle-tested and at
the same time way simpler than what we currently have.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In e7fb2ca945 (builtin/blame: fix out-of-bounds write with blank
boundary commits, 2025-01-10), we have introduced two new tests that
expect a certain amount of padding. This padding is generated via
printf using the "%0.s" conversion specification. That directive is
ambiguous because it might be interpreted as field width (most shells)
or 0-padding flag for numeric fields (coreutils).
Fix this issue by using "%${N}s" instead, which is already being
used in other tests (i.e. t5300, t0450) and is unambiguous.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Jan Palus <jpalus@fastmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The string_list "is_base_tips" in struct ref_format stores the
committish part of "is-base:<committish>". It has the same problems
that its sibling string_list "bases" had. Fix them the same way as the
previous commit did for the latter, by replacing the string_list with
fields in "used_atom".
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
verify_ref_format() parses a ref-filter format string and stores
recognized items in the static array "used_atom". For
"ahead-behind:<committish>" it stores the committish part in a
string_list member "bases" of struct ref_format.
ref_sorting_options() also parses bare ref-filter format items and
stores stores recognized ones in "used_atom" as well. The committish
parts go to a dummy struct ref_format in parse_sorting_atom(), though,
and are leaked and forgotten.
If verify_ref_format() is called before ref_sorting_options(), like in
git for-each-ref, then all works well if the sort key is included in the
format string. If it isn't then sorting cannot work as the committishes
are missing.
If ref_sorting_options() is called first, like in git branch, then we
have the additional issue that if the sort key is included in the format
string then filter_ahead_behind() can't see its committish, will not
generate any results for it and thus it will be expanded to an empty
string.
Fix those issues by replacing the string_list with a field in used_atom
for storing the committish. This way it can be shared for handling both
ref-filter format strings and sorting options in the same command.
Reported-by: Ross Goldberg <ross.goldberg@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Test update.
* sk/unit-test-hash:
t/unit-tests: convert hash to use clar test framework
|
|
More code paths have a repository passed through the callchain,
instead of assuming the primary the_repository object.
* ps/the-repository:
match-trees: stop using `the_repository`
graph: stop using `the_repository`
add-interactive: stop using `the_repository`
tmp-objdir: stop using `the_repository`
resolve-undo: stop using `the_repository`
credential: stop using `the_repository`
mailinfo: stop using `the_repository`
diagnose: stop using `the_repository`
server-info: stop using `the_repository`
send-pack: stop using `the_repository`
serve: stop using `the_repository`
trace: stop using `the_repository`
pager: stop using `the_repository`
progress: stop using `the_repository`
|
|
A misconfigured "fsck.skiplist" configuration variable was not
diagnosed as an error, which has been corrected.
* jt/fsck-skiplist-parse-fix:
fsck: reject misconfigured fsck.skipList
|
|
The code to compute "unique" name used git_rand() which can fail or
get stuck; the callsite does not require cryptographic security.
Introduce the "insecure" mode and use it appropriately.
* ps/reftable-get-random-fix:
reftable/stack: accept insecure random bytes
wrapper: allow generating insecure random bytes
|
|
Test clean-up.
* jk/t7407-use-test-grep:
t7407: use test_grep
|
|
The code to check LSan results has been simplified and made more
robust.
* jk/lsan-race-ignore-false-positive:
test-lib: add a few comments to LSan log checking
test-lib: simplify lsan results check
test-lib: invert return value of check_test_results_san_file_empty
|
|
* kn/reflog-migration-fix:
reftable: write correct max_update_index to header
|
|
Adapts reftable tree test script to clar framework by using clar
assertions where necessary.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert the prio-queue test script to clar framework by using clar
assertions where necessary. Test functions are created as a standalone
to test different cases.
update the type of the variable `j` from int to `size_t`, this ensures
compatibility with the type used for result_size, which is also size_t,
preventing a potential warning or error caused by comparisons between
signed and unsigned integers.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Adapt the mem-pool test script to use clar framework by using clar
assertions where necessary.Test functions are created as a standalone to
test different test cases.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"generate-clar-decls.sh" script is designed to extract function
signatures that match a specific pattern derived from the unit test
file's name. The script does not know to massage file names with dashes,
which will make it search for functions that look like, for example,
`test_mem-pool_*`. Having dashes in function names is not allowed
though, so these patterns won't ever match a legal function name.
Adapt script to translate dashes (`-`) in test suite filenames to
underscores (`_`) to correctly extract the function signatures and run
the corresponding tests. This will be used by subsequent commits which
follows the same construct.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Using the show_usage_and_exit_if_asked() helper we introduced
earlier, fix callers of usage() that want to show the help text when
explicitly asked by the end-user. The help text now goes to the
standard output stream for them.
These are the bog standard "if we got only '-h', then that is a
request for help" callers. Their
if (argc == 2 && !strcmp(argv[1], "-h"))
usage(message);
are simply replaced with
show_usage_and_exit_if_asked(argc, argv, message);
With this, the built-ins tested by t0012 all send their help text to
their standard output stream, so the check in t0012 that was half
tightened earlier is now fully tightened to insist on standard error
stream being empty.
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Using the show_usage_with_options_if_asked() helper we introduced
earlier, fix callers of usage_with_options() that want to show the
help text when explicitly asked by the end-user. The help text now
goes to the standard output stream for them.
The test in t7600 for "git merge -h" may want to be retired, as the
same is covered by t0012 already, but it is specifically testing that
the "-h" option gets a response even with a corrupt index file, so
for now let's leave it there.
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
For most commands, "git foo -h" will send the help output to stdout, as
this is what parse-options.c does. But some commands send it to stderr
instead. This is usually because they call usage_with_options(), and
should be switched to show_usage_help_and_exit_if_asked().
Currently t0012 is permissive and allows either behavior. We'd like it
to eventually enforce that help goes to stdout, and teaching it to do so
identifies the commands that need to be changed. But during the
transition period, we don't want to enforce that for most test runs.
So let's introduce a flag that will let most test runs use the
permissive behavior, and people interested in converting commands can
run:
GIT_TEST_HELP_MUST_BE_STDOUT=1 ./t0012-help.sh
to see the failures. Eventually (when all builtins have been converted)
we'll remove this flag entirely and always check the strict behavior.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Test modernization.
* mb/t7110-use-test-path-helper:
t7110: replace `test -f` with `test_path_is_*` helpers
|
|
More -Wsign-compare fixes.
* ps/more-sign-compare:
sign-compare: avoid comparing ptrdiff with an int/unsigned
commit-reach: use `size_t` to track indices when computing merge bases
shallow: fix -Wsign-compare warnings
builtin/log: fix remaining -Wsign-compare warnings
builtin/log: use `size_t` to track indices
commit-reach: use `size_t` to track indices in `get_reachable_subset()`
commit-reach: use `size_t` to track indices in `remove_redundant()`
commit-reach: fix type of `min_commit_date`
commit-reach: fix index used to loop through unsigned integer
prio-queue: fix type of `insertion_ctr`
|
|
In 297c09eabb (refs: allow multiple reflog entries for the same refname,
2024-12-16), the reftable backend learned to handle multiple reflog
entries within the same transaction. This was done modifying the
`update_index` for reflogs with multiple indices. During writing the
logs, the `max_update_index` of the writer was modified to ensure the
limits were raised to the modified `update_index`s.
However, since ref entries are written before the modification to the
`max_update_index`, if there are multiple blocks to be written, the
reftable backend writes the header with the old `max_update_index`. When
all logs are finally written, the footer will be written with the new
`min_update_index`. This causes a mismatch between the header and the
footer and causes the reftable file to be corrupted. The existing tests
only spawn a single block and since headers are lazily written with the
first block, the tests didn't capture this bug.
To fix the issue, the appropriate `max_update_index` limit must be set
even before the first block is written. Add a `max_index` field to the
transaction which holds the `max_index` within all its updates, then
propagate this value to the reftable backend, wherein this is used to
the set the `max_update_index` correctly.
Add a test which creates a few thousand reference updates with multiple
reflog entries, which should trigger the bug.
Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Git 2.47.2
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE4fA2sf7nIh/HeOzvsLXohpav5ssFAmdkT1sACgkQsLXohpav
# 5svdhRAAq0WoZIg+33vYNNVSTm3Ux9RJslmXs3lQuhuUJ61hK/28drSLU29GH7x7
# 3nmmjp1cegnXRVLBAfoYDdzPprNNrQFQEHQEzgG/GDZw0OXn+WTZuNyrrUYoa+sd
# QSLlElRj2qrpHIMOsMIBKBSNB+qjJHOMGdxcBAS768TfnQpGIpc1KJa24TxsVBzC
# ScP4uvrFfPyQrqFUgiUhCeqLnO/6T5i/QAn/8cS5a1+zor5ZHSlw28TZTOxN2odo
# Rulp/FtehiDEzmRowgD3M4fImAPY6Ib6VORCYASqpJFFla30tu2bQqEi6raOMTec
# hg5Ibkmj6fHFONaYvoTMRkYHmtUnNgIPU/CYPwswNk8w1+PPQfJ+TYjBXOQgdTLW
# F0azHBHh7NRmEHVydiF9CqjgNVRzjO4IEZfGqXNFPPMvR6UUzDaIkrpYbwXBFMin
# GNPV3QISeXj9ROjJoCv0nclXETwWemykjZlD6b5krXn5TaJlFb+69qJvXrCLq5WY
# EoevSqKkB9HVK9si7P8Sh1cPGOr3kfiFPmMNKFVI8l0+iDFgBywOomWNS/JEzqu1
# nN142DKdL1W/rkeMUhbX2h11CZNvHKIOy3iaA4MTOing8/eMzyUUQ73Ck7odYs4f
# rZ0tTXKJhxojPvBpTxYe9SxM0bDLREiOv0zX76+sIuhbAQCmk0o=
# =MNNf
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 19 Dec 2024 08:52:43 AM PST
# gpg: using RSA key E1F036B1FEE7221FC778ECEFB0B5E88696AFE6CB
# gpg: Good signature from "Junio C Hamano <gitster@pobox.com>" [ultimate]
# gpg: aka "Junio C Hamano <junio@pobox.com>" [ultimate]
# gpg: aka "Junio C Hamano <jch@google.com>" [ultimate]
* tag 'v2.47.2':
Git 2.47.2
Git 2.46.3
Git 2.45.3
Git 2.44.3
Git 2.43.6
Git 2.42.4
Git 2.41.3
Git 2.40.4
credential: disallow Carriage Returns in the protocol by default
credential: sanitize the user prompt
credential_format(): also encode <host>[:<port>]
t7300: work around platform-specific behaviour with long paths on MinGW
compat/regex: fix argument order to calloc(3)
mingw: drop bogus (and unneeded) declaration of `_pgmptr`
ci: remove 'Upload failed tests' directories' step from linux32 jobs
|
|
From Documentation/revisions.txt:
'<describeOutput>', e.g. 'v1.7.4.2-679-g3bee7fb'::
Output from `git describe`; i.e. a closest tag, optionally
followed by a dash and a number of commits, followed by a dash, a
'g', and an abbreviated object name.
which means that output of the format
${REFNAME}-${INTEGER}-g${HASH}
should parse to fully expanded ${HASH}. This is fine. However, we
currently don't validate any of ${REFNAME}-${INTEGER}, we only parse
-g${HASH} and assume the rest is valid. That is problematic, since it
breaks things like
git cat-file -p branchname:path/to/file/named/i-gaffed
which, when commit (or tree or blob) affed exists, will not return us
information about the file we are looking for but will instead
erroneously tell us about object affed.
A few additional notes:
- This is a slight backward incompatibility break, because we used
to allow ${GARBAGE}-g${HASH} as a way to spell ${HASH}. However,
a backward incompatible break is necessary, because there is no
other way for someone to be more specific and disambiguate that they
want the blob master:path/to/who-gabbed instead of the object abbed.
- There is a possibility that check_refname_format() rules change in
the future. However, we can only realistically loosen the rules
for what that function accepts rather than tighten. If we were to
tighten the rules, some real world repositories may already have
refnames that suddenly become unacceptable and we break those
repositories. As such, any describe-like syntax of the form
${VALID_FOR_A_REFNAME}-${INTEGER}-g${HASH} that is valid with the
changes in this commit will remain valid in the future.
- The fact that check_refname_format() rules could loosen in the
future is probably also an important reason to make this change. If
the rules loosen, there might be additional cases within
${GARBAGE}-g${HASH} that become ambiguous in the future. While
abbreviated hashes can be disambiguated by abbreviating less, it may
well be that these alternative object names have no way of being
disambiguated (much like pathnames cannot be). Accepting all random
${GARBAGE} thus makes it difficult for us to allow future
extensions to object naming.
So, tighten up the parsing to make sure ${REFNAME} and ${INTEGER} are
present in the string, and would be considered a valid ref and
non-negative integer.
Also, add a few tests for git describe using object names of the form
${REVISION_NAME}${MODIFIERS}
since an early version of this patch failed on constructs like
git describe v2.48.0-rc2-161-g6c2274cdbc^0
Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Given a branch name of 'foo{bar', commands like
git cat-file -p foo{bar:README.md
should succeed (assuming that branch had a README.md file, of course).
However, the change in cce91a2caef9 (Change 'master@noon' syntax to
'master@{noon}'., 2006-05-19) presumed that curly braces would always
come after an '@' or '^' and be paired, causing e.g. 'foo{bar:README.md'
to entirely miss the ':' and assume there's no object being referenced.
In short, git would report:
fatal: Not a valid object name foo{bar:README.md
Change the parsing to only make the assumption of paired curly braces
immediately after either a '@' or '^' character appears.
Add tests for this, as well as for a few other test cases that initial
versions of this patch broke:
* 'foo@@{...}'
* 'foo^{/${SEARCH_TEXT_WITH_COLON}}:${PATH}'
Note that we'd prefer not duplicating the special logic for "@^" characters
here, because if get_oid_basic() or interpret_nth_prior_checkout() or
get_oid_basic() or similar gain extra methods of using curly braces,
then the logic in get_oid_with_context_1() would need to be updated as
well. But it's not clear how to refactor all of these to have a simple
common callpoint with the specialized logic.
Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
Helped-by: Michael Haggerty <mhagger@github.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Last-minute fix for a regression in "git blame --abbrev=<length>"
when insane <length> is specified; we used to correctly cap it to
the hash output length but broke it during the cycle.
* ps/build-sign-compare:
builtin/blame: fix out-of-bounds write with blank boundary commits
builtin/blame: fix out-of-bounds read with excessive `--abbrev`
|
|
One test in t7422 asserts that `git submodule status --recursive`
properly handles SIGPIPE. This test is flaky though and may sometimes
not see a SIGPIPE at all:
expecting success of 7422.18 'git submodule status --recursive propagates SIGPIPE':
{ git submodule status --recursive 2>err; echo $?>status; } |
grep -q X/S &&
test_must_be_empty err &&
test_match_signal 13 "$(cat status)"
++ git submodule status --recursive
++ grep -q X/S
++ echo 0
++ test_must_be_empty err
++ test 1 -ne 1
++ test_path_is_file err
++ test 1 -ne 1
++ test -f err
++ test -s err
+++ cat status
++ test_match_signal 13 0
++ test 0 = 141
++ test 0 = 269
++ return 1
error: last command exited with $?=1
not ok 18 - git submodule status --recursive propagates SIGPIPE
The issue is caused by a race between git-submodule(1) and grep(1):
1. git-submodule(1) (or its child process) writes the first X/S line
we're trying to match.
2. grep(1) matches the line.
3a. grep(1) exits, closing the pipe.
3b. git-submodule(1) (or its child process) writes the rest of its
lines.
Steps 3a and 3b happen at the same time without any guarantees. If 3a
happens first, we get SIGPIPE. Otherwise, we don't and the test fails.
Fix the issue by generating a couple thousand nested submodules and
matching on the first nested submodule. This ensures that the recursive
git-submodule(1) process completely fills its stdout buffer, which makes
subsequent writes block until the downstream consumer of the pipe either
reads more or closes it.
To verify that this works as expected one can apply the following patch
to the preimage of this commit, which used to reliably trigger the race:
diff --git a/t/t7422-submodule-output.sh b/t/t7422-submodule-output.sh
index 3c5177cc30..df6001f8a0 100755
--- a/t/t7422-submodule-output.sh
+++ b/t/t7422-submodule-output.sh
@@ -202,7 +202,7 @@ test_expect_success !MINGW 'git submodule status --recursive propagates SIGPIPE'
cd repo &&
GIT_ALLOW_PROTOCOL=file git submodule add "$(pwd)"/../submodule &&
{ git submodule status --recursive 2>err; echo $?>status; } |
- grep -q recursive-submodule-path-1 &&
+ { sleep 1 && grep -q recursive-submodule-path-1 && sleep 1; } &&
test_must_be_empty err &&
test_match_signal 13 "$(cat status)"
)
With the pipe-stuffing workaround the test runs successfully.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Two of our tests in t0060 verify that the runtime prefix functionality
works as expected by creating a separate directory hierarchy, copying
the Git executable in there and then creating scripts relative to that
executable.
These tests fail quite regularly in GitLab CI with the following error:
expecting success of 0060.218 '%(prefix)/ works':
mkdir -p pretend/bin &&
cp "$GIT_EXEC_PATH"/git$X pretend/bin/ &&
git config yes.path "%(prefix)/yes" &&
GIT_EXEC_PATH= ./pretend/bin/git config --path yes.path >actual &&
echo "$(pwd)/pretend/yes" >expect &&
test_cmp expect actual
++ mkdir -p pretend/bin
++ cp /c/GitLab-Runner/builds/gitlab-org/git/git.exe pretend/bin/
cp: cannot create regular file 'pretend/bin/git.exe': Device or resource busy
error: last command exited with $?=1
not ok 218 - %(prefix)/ works
Seemingly, the "git.exe" binary we are trying to overwrite is still
being held open. It is somewhat puzzling why exactly that is: while the
preceding test _does_ write to and execute the same path, it should have
exited and shouldn't keep any backgrounded processes around. So it must
be held open by something else, either in MinGW or in Windows itself.
While the root cause is puzzling, the workaround is trivial enough:
instead of writing the file twice we simply pull the common setup into a
separate test case so that we won't observe EBUSY in the first place.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When passing the `-b` flag to git-blame(1), then any blamed boundary
commits which were marked as uninteresting will not get their actual
commit ID printed, but will instead be replaced by a couple of spaces.
The flag can lead to an out-of-bounds write as though when combined with
`--abbrev=` when the abbreviation length is longer than `GIT_MAX_HEXSZ`
as we simply use memset(3p) on that array with the user-provided length
directly. The result is most likely that we segfault.
An obvious fix would be to cull `length` to `GIT_MAX_HEXSZ` many bytes.
But when the underlying object ID is SHA1, and if the abbreviated length
exceeds the SHA1 length, it would cause us to print more bytes than
desired, and the result would be misaligned.
Instead, fix the bug by computing the length via strlen(3p). This makes
us write as many bytes as the formatted object ID requires and thus
effectively limits the length of what we may end up printing to the
length of its hash. If `--abbrev=` asks us to abbreviate to something
shorter than the full length of the underlying hash function it would be
handled by the call to printf(3p) correctly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 6411a0a896 (builtin/blame: fix type of `length` variable when
emitting object ID, 2024-12-06) we have fixed the type of the `length`
variable. In order to avoid a cast from `size_t` to `int` in the call to
printf(3p) with the "%.*s" formatter we have converted the code to
instead use fwrite(3p), which accepts the length as a `size_t`.
It was reported though that this makes us read over the end of the OID
array when the provided `--abbrev=` length exceeds the length of the
object ID. This is because fwrite(3p) of course doesn't stop when it
sees a NUL byte, whereas printf(3p) does.
Fix the bug by reverting back to printf(3p) and culling the provided
length to `GIT_MAX_HEXSZ` to keep it from overflowing when cast to an
`int`.
Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Previously, credential-cache populated authtype regardless whether
"get" request had authtype capability. As documented in
git-credential.txt, authtype "should not be sent unless the appropriate
capability ... is provided".
Add test. Without this change, the test failed because "credential fill"
printed an incomplete credential with only protocol and host attributes
(the unexpected authtype attribute was discarded by credential.c).
Signed-off-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Adapt the hash test functions to clar framework by using clar
assertions where necessary. Following the consensus to convert
the unit-tests scripts found in the t/unit-tests folder to clar driven by
Patrick Steinhardt. Test functions are structured as a standalone to
test individual hash string and literal case.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Last-minute fix to a recent update.
* js/reftable-realloc-errors-fix:
t-reftable-basics: allow for `malloc` to be `#define`d
|
|
As indicated by the `#undef malloc` line in `reftable/basics.h`, it is
quite common to use allocators other than the default one by defining
`malloc` constants and friends.
This pattern is used e.g. in Git for Windows, which uses the powerful
and performant `mimalloc` allocator.
Furthermore, in `reftable/basics.c` this `#undef malloc` is
_specifically_ disabled by virtue of defining the
`REFTABLE_ALLOW_BANNED_ALLOCATORS` constant before including
`reftable/basic.h`, to ensure that such a custom allocator is also used
in the reftable code.
However, in 8db127d43f5b (reftable: avoid leaks on realloc error,
2024-12-28) and in 2cca185e8517 (reftable: fix allocation count on
realloc error, 2024-12-28), `reftable_set_alloc()` function calls were
introduced that pass `malloc`, `realloc` and `free` function pointers as
parameters _after_ `reftable/basics.h` ensured that they were no longer
`#define`d. This would override the custom allocator and re-set it to
the default allocator provided by, say, libc or MSVCRT.
This causes problems because those calls happen after the initial
allocator has already been used to initialize an array, which is
subsequently resized using the overridden default `realloc()` allocator.
You cannot mix and match allocators like that, which leads to a
`STATUS_HEAP_CORRUPTION` (C0000374) on Windows, and when running this
unit test through shell and/or `prove` (which only support 7-bit status
codes), it surfaces as exit code 127.
It is actually unnecessary to use those function pointers to
`malloc`/`realloc`/`free`, though: The `reftable` code goes out of its
way to fall back to the initial allocator when passing `NULL` parameters
instead. So let's do that instead of causing heap corruptions.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In Git, fsck operations can ignore known broken objects via the
`fsck.skipList` configuration. This option expects a path to a file with
the list of object names. When the configuration is specified without a
path, an error message is printed, but the command continues as if the
configuration was not set. Configuring `fsck.skipList` without a value
is a misconfiguration so config parsing should be more strict and reject
it.
Update `git_fsck_config()` to no longer ignore misconfiguration of
`fsck.skipList`. The same behavior is also present for
`fetch.fsck.skipList` and `receive.fsck.skipList` so the configuration
parsers for these are updated to ensure the related operations remain
consistent.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `csprng_bytes()` function generates randomness and writes it into a
caller-provided buffer. It abstracts over a couple of implementations,
where the exact one that is used depends on the platform.
These implementations have different guarantees: while some guarantee to
never fail (arc4random(3)), others may fail. There are two significant
failures to distinguish from one another:
- Systemic failure, where e.g. opening "/dev/urandom" fails or when
OpenSSL doesn't have a provider configured.
- Entropy failure, where the entropy pool is exhausted, and thus the
function cannot guarantee strong cryptographic randomness.
While we cannot do anything about the former, the latter failure can be
acceptable in some situations where we don't care whether or not the
randomness can be predicted.
Introduce a new `CSPRNG_BYTES_INSECURE` flag that allows callers to opt
into weak cryptographic randomness. The exact behaviour of the flag
depends on the underlying implementation:
- `arc4random_buf()` never returns an error, so it doesn't change.
- `getrandom()` pulls from "/dev/urandom" by default, which never
blocks on modern systems even when the entropy pool is empty.
- `getentropy()` seems to block when there is not enough randomness
available, and there is no way of changing that behaviour.
- `GtlGenRandom()` doesn't mention anything about its specific
failure mode.
- The fallback reads from "/dev/urandom", which also returns bytes in
case the entropy pool is drained in modern Linux systems.
That only leaves OpenSSL with `RAND_bytes()`, which returns an error in
case the returned data wouldn't be cryptographically safe. This function
is replaced with a call to `RAND_pseudo_bytes()`, which can indicate
whether or not the returned data is cryptographically secure via its
return value. If it is insecure, and if the `CSPRNG_BYTES_INSECURE` flag
is set, then we ignore the insecurity and return the data regardless.
It is somewhat questionable whether we really need the flag in the first
place, or whether we wouldn't just ignore the potentially-insecure data.
But the risk of doing that is that we might have or grow callsites that
aren't aware of the potential insecureness of the data in places where
it really matters. So using a flag to opt-in to that behaviour feels
like the more secure choice.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are a few grep calls here that can benefit from test_grep, which
produces more user-friendly output when it fails.
One of these calls also passes "-sq", which is curious. The "-q" option
suppresses the matched output. But test output is either already
redirected to /dev/null in non-verbose mode, and in verbose mode it's
better to see the output. The "-s" option suppresses errors opening
files, but we are just grepping in the "expected" file we just
generated, so it should not be needed. Neither of these was really
hurting anything, but they are not a style we'd like to see emulated. So
get rid of them.
(It is also curious to grep in the expected file in the first place, but
that is because we are auto-generating the expectation from a Git
command. So this is double-checking it did what we wanted).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Commit b119a687d4 (test-lib: ignore leaks in the sanitizer's thread
code, 2025-01-01) added code to suppress a false positive in the leak
checker. But if you're just reading the code, the obscure grep call is a
bit of a head-scratcher. Let's add a brief comment explaining what's
going on (and anybody digging further can find this commit or that one
for all the details).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We want to know if there are any leaks logged by LSan in the results
directory, so we run "find" on the containing directory and pipe it to
xargs. We can accomplish the same thing by just globbing in the shell
and passing the result to grep, which has a few advantages:
- it's one fewer process to run
- we can glob on the TEST_RESULTS_SAN_FILE pattern, which is what we
checked at the beginning of the function, and is the same glob used
to show the logs in check_test_results_san_file_
- this correctly handles the case where TEST_OUTPUT_DIRECTORY has a
space in it. For example doing:
mkdir "/tmp/foo bar"
TEST_OUTPUT_DIRECTORY="/tmp/foo bar" make SANITIZE=leak test
would yield a lot of:
grep: /tmp/foo: No such file or directory
grep: bar/test-results/t0006-date.leak/trace.test-tool.582311: No such file or directory
when there are leaks. We could do the same thing with "xargs
--null", but that isn't portable.
We are now subject to command-line length limits, but that is also true
of the globbing cat used to show the logs themselves. This hasn't been a
problem in practice.
We do need to use "grep -s" for the case that the glob does not expand
(i.e., there are not any log files at all). This option is in POSIX, and
has been used in t7407 for several years without anybody complaining.
This also also naturally handles the case where the surrounding
directory has already been removed (in which case there are likewise no
files!), dropping the need to comment about it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We have a function to check whether LSan logged any leaks. It returns
success for no leaks, and non-zero otherwise. This is the simplest thing
for its callers, who want to say "if no leaks then return early". But
because it's implemented as a shell pipeline, you end up with the
awkward:
! find ... |
xargs grep leaks |
grep -v false-positives
where the "!" is actually negating the final grep. Switch the return
value (and name) to return success when there are leaks. This should
make the code a little easier to read, and the negation in the callers
still reads pretty naturally.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
`test -f` and `! test -f` do not provide clear error messages when they fail.
To enhance debuggability, use `test_path_is_file` and `test_path_is_missing`,
which instead provide more informative error messages.
Note that `! test -f` checks if a path is not a file, while
`test_path_is_missing` verifies that a path does not exist. In this specific
case the tests are meant to check the absence of the path, making
`test_path_is_missing` a valid replacement.
Signed-off-by: Matteo Bagnolini <matteobagnolini2003@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The build procedure based on meson learned to generate HTML
documention pages.
* ps/build-meson-html:
Documentation: wire up sanity checks for Meson
t/Makefile: make "check-meson" work with Dash
meson: install static files for HTML documentation
meson: generate articles
Documentation: refactor "howto-index.sh" for out-of-tree builds
Documentation: refactor "api-index.sh" for out-of-tree builds
meson: generate user manual
Documentation: inline user-manual.conf
meson: generate HTML pages for all man page categories
meson: fix generation of merge tools
meson: properly wire up dependencies for our docs
meson: wire up support for AsciiDoctor
|
|
CI jobs that run threaded programs under LSan has been giving false
positives from time to time, which has been worked around.
This is an alternative to the jk/lsan-race-with-barrier topic with
much smaller change to the production code.
* jk/lsan-race-ignore-false-positive:
test-lib: ignore leaks in the sanitizer's thread code
test-lib: check leak logs for presence of DEDUP_TOKEN
test-lib: simplify leak-log checking
test-lib: rely on logs to detect leaks
Revert barrier-based LSan threading race workaround
|
|
Our CI jobs sometimes see false positive leaks like this:
=================================================================
==3904583==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 32 byte(s) in 1 object(s) allocated from:
#0 0x7fa790d01986 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
#1 0x7fa790add769 in __pthread_getattr_np nptl/pthread_getattr_np.c:180
#2 0x7fa790d117c5 in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
#3 0x7fa790d11957 in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:598
#4 0x7fa790d03fe8 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:51
#5 0x7fa790d013fd in __lsan_thread_start_func ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:440
#6 0x7fa790adc3eb in start_thread nptl/pthread_create.c:444
#7 0x7fa790b5ca5b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
This is not a leak in our code, but appears to be a race between one
thread calling exit() while another one is in LSan's stack setup code.
You can reproduce it easily by running t0003 or t5309 with --stress
(these trigger it because of the threading in git-grep and index-pack
respectively).
This may be a bug in LSan, but regardless of whether it is eventually
fixed, it is useful to work around it so that we stop seeing these false
positives.
We can recognize it by the mention of the sanitizer functions in the
DEDUP_TOKEN line. With this patch, the scripts mentioned above should
run with --stress indefinitely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When we check the leak logs, our original strategy was to check for any
non-empty log file produced by LSan. We later amended that to ignore
noisy lines in 370ef7e40d (test-lib: ignore uninteresting LSan output,
2023-08-28).
This makes it hard to ignore noise which is more than a single line;
we'd have to actually parse the file to determine the meaning of each
line.
But there's an easy line-oriented solution. Because we always pass the
dedup_token_length option, the output will contain a DEDUP_TOKEN line
for each leak that has been found. So if we invert our strategy to stop
ignoring useless lines and only look for useful ones, we can just count
the number of DEDUP_TOKEN lines. If it's non-zero, then we found at
least one leak (it would even give us a count of unique leaks, but we
really only care if it is non-zero).
This should yield the same outcome, but will help us build more false
positive detection on top.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We have a function to count the number of leaks found (actually, it is
the number of processes which produced a log file). Once upon a time we
cared about seeing if this number increased between runs. But we
simplified that away in 95c679ad86 (test-lib: stop showing old leak
logs, 2024-09-24), and now we only care if it returns any results or
not.
In preparation for refactoring it further, let's drop the counting
function entirely, and roll it into the "is it empty" check. The outcome
should be the same, but we'll be free to return a boolean "did we find
anything" without worrying about somebody adding a new call to the
counting function.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When we run with sanitizers, we set abort_on_error=1 so that the tests
themselves can detect problems directly (when the buggy program exits
with SIGABRT). This has one blind spot, though: we don't always check
the exit codes for all programs (e.g., helpers like upload-pack invoked
behind the scenes).
For ASan and UBSan this is mostly fine; they exit as soon as they see an
error, so the unexpected abort of the program causes the test to fail
anyway.
But for LSan, the program runs to completion, since we can only check
for leaks at the end. And in that case we could miss leak reports. And
thus we started checking LSan logs in faececa53f (test-lib: have the
"check" mode for SANITIZE=leak consider leak logs, 2022-07-28).
Originally the logs were optional, but logs are generated (and checked)
always as of 8c1d6691bc (test-lib: GIT_TEST_SANITIZE_LEAK_LOG enabled by
default, 2024-07-11). And we even check them for each test snippet, as
of cf1464331b (test-lib: check for leak logs after every test,
2024-09-24).
So now aborting on error is superfluous for LSan! We can get everything
we need by checking the logs. And checking the logs is actually
preferable, since it gives us more control over silencing false
positives (something we do not yet do, but will soon).
So let's tell LSan to just exit normally, even if it finds leaks. We can
do so with exitcode=0, which also suppresses the abort_on_error flag.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
CI jobs that run threaded programs under LSan has been giving false
positives from time to time, which has been worked around.
* jk/lsan-race-with-barrier:
grep: work around LSan threading race with barrier
index-pack: work around LSan threading race with barrier
thread-utils: introduce optional barrier type
Revert "index-pack: spawn threads atomically"
test-lib: use individual lsan dir for --stress runs
|
|
The custom allocator code in the reftable library did not handle
failing realloc() very well, which has been addressed.
* rs/reftable-realloc-errors:
t-reftable-merged: handle realloc errors
reftable: handle realloc error in parse_names()
reftable: fix allocation count on realloc error
reftable: avoid leaks on realloc error
|
|
Test modernization.
* ms/t7611-test-path-is-file:
t7611: replace test -f with test_path_is* helpers
|
|
When storing output in test-results/, we usually give each numbered run
in a --stress set its own output file. But we don't do that for storing
LSan logs, so something like:
./t0003-attributes.sh --stress
will have many scripts simultaneously creating, writing to, and deleting
the test-results/t0003-attributes.leak directory. This can cause logs
from one run to be attributed to another, spurious failures when
creation and deletion race, and so on.
This has always been broken, but nobody noticed because it's rare to do
a --stress run with LSan (since the point is for the code to run quickly
many times in order to hit races). But if you're trying to find a race
in the leak sanitizing code, it makes sense to use these together.
We can fix it by using $TEST_RESULTS_BASE, which already incorporates
the stress job suffix.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Check reallocation errors in unit tests, like everywhere else.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When realloc(3) fails, it returns NULL and keeps the original allocation
intact. REFTABLE_ALLOC_GROW overwrites both the original pointer and
the allocation count variable in that case, simultaneously leaking the
original allocation and misrepresenting the number of storable items.
parse_names() avoids the leak by keeping the original pointer if
reallocation fails, but still increase the allocation count in such a
case as if it succeeded. That's OK, because the error handling code
just frees everything and doesn't look at names_cap anymore.
reftable_buf_add() does the same, but here it is a problem as it leaves
the reftable_buf in a broken state, with ->alloc being roughly twice as
big as the actually allocated memory, allowing out-of-bounds writes in
subsequent calls.
Reimplement REFTABLE_ALLOC_GROW to avoid leaks, keep allocation counts
in sync and still signal failures to callers while avoiding code
duplication in callers. Make it an expression that evaluates to 0 if no
reallocation is needed or it succeeded and 1 on failure while keeping
the original pointer and allocation counter values.
Adjust REFTABLE_ALLOC_GROW_OR_NULL to the new calling convention for
REFTABLE_ALLOC_GROW, but keep its support for non-size_t alloc variables
for now.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When realloc(3) fails, it returns NULL and keeps the original allocation
intact. REFTABLE_ALLOC_GROW overwrites both the original pointer and
the allocation count variable in that case, simultaneously leaking the
original allocation and misrepresenting the number of storable items.
parse_names() and reftable_buf_add() avoid leaking by restoring the
original pointer value on failure, but all other callers seem to be OK
with losing the old allocation. Add a new variant of the macro,
REFTABLE_ALLOC_GROW_OR_NULL, which plugs the leak and zeros the
allocation counter. Use it for those callers.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "check-meson" target uses process substitution to check whether
extracted contents from "meson.build" match expected contents. Process
substitution is unportable though and thus the target will fail when
using for example Dash.
Fix this by writing data into a temporary directory.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Building our "gitweb" interface is optional in our Makefile and in Meson
and not wired up at all with CMake, but disabling it causes a couple of
tests in the t950* range that pull in "t/lib-gitweb.sh". This is because
the test library knows to execute gitweb-tests based on whether or not
Perl is available, but we may have Perl available and still end up not
building gitweb e.g. with `make test NO_GITWEB=YesPlease`.
Fix this issue by wiring up a new "NO_GITWEB" build option so that we
can skip these tests in case gitweb is not built.
Note that this new build option requires us to move the configuration of
GIT-BUILD-OPTIONS to a later point in our Meson build instructions. But
as that file is only consumed by our tests at runtime this change does
not cause any issues.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Replace `test -f` and `test ! -f` with `test_path_is_file` and
`test_path_is_missing` for better debuggability.
While `test -f` ensures that the file exists and is a regular file,
`test_path_is_file` provides clearer error messages on failure.
On the other hand, `test ! -f` checks either the absence of a regular
file or the presence of any other filesystem object, but looking at
them in the test individually, all of them should've said `test ! -e`,
i.e. "there shouldn't be anything at given path on filesystem."
Replace these cases with `test_path_is_missing` for better
debuggability.
Helped-by: karthik nayak <karthik.188@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The functions `repo_get_merge_bases_many()` and friends accepts an array
of commits as well as a parameter that indicates how large that array
is. This parameter is using a signed integer, which leads to a couple of
warnings with -Wsign-compare.
Refactor the code to use `size_t` to track indices instead and adapt
callers accordingly. While most callers are trivial, there are two
callers that require a bit more scrutiny:
- builtin/merge-base.c:show_merge_base() subtracts `1` from the
`rev_nr` before calling `repo_get_merge_bases_many_dirty()`, so if
the variable was `0` it would wrap. This code is fine though because
its only caller will execute that code only when `argc >= 2`, and it
follows that `rev_nr >= 2`, as well.
- bisect.ccheck_merge_bases() similarly subtracts `1` from `rev_nr`.
Again, there is only a single caller that populates `rev_nr` with
`good_revs.nr`. And because a bisection always requires at least one
good revision it follws that `rev_nr >= 1`.
Mark the file as -Wsign-compare-clean.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git refs migrate" learned to also migrate the reflog data across
backends.
* kn/reflog-migration:
refs: mark invalid refname message for translation
refs: add support for migrating reflogs
refs: allow multiple reflog entries for the same refname
refs: introduce the `ref_transaction_update_reflog` function
refs: add `committer_info` to `ref_transaction_add_update()`
refs: extract out refname verification in transactions
refs/files: add count field to ref_lock
refs: add `index` field to `struct ref_udpate`
refs: include committer info in `ref_update` struct
|
|
The meson-build procedure is integrated into CI to catch and
prevent bitrotting.
* ps/ci-meson:
ci: wire up Meson builds
t: introduce compatibility options to clar-based tests
t: fix out-of-tree tests for some git-p4 tests
Makefile: detect missing Meson tests
meson: detect missing tests at configure time
t/unit-tests: rename clar-based unit tests to have a common prefix
Makefile: drop -DSUPPRESS_ANNOTATED_LEAKS
ci/lib: support custom output directories when creating test artifacts
|
|
"git range-diff" learned to optionally show and compare merge
commits in the ranges being compared, with the --diff-merges
option.
* js/range-diff-diff-merges:
range-diff: introduce the convenience option `--remerge-diff`
range-diff: optionally include merge commits' diffs in the analysis
|
|
Regression fix for 'show-index' when run outside of a repository.
* as/show-index-uninitialized-hash:
t5300: add test for 'show-index --object-format'
show-index: fix uninitialized hash function
|
|
Start working to make the codebase buildable with -Wsign-compare.
* ps/build-sign-compare:
t/helper: don't depend on implicit wraparound
scalar: address -Wsign-compare warnings
builtin/patch-id: fix type of `get_one_patchid()`
builtin/blame: fix type of `length` variable when emitting object ID
gpg-interface: address -Wsign-comparison warnings
daemon: fix type of `max_connections`
daemon: fix loops that have mismatching integer types
global: trivial conversions to fix `-Wsign-compare` warnings
pkt-line: fix -Wsign-compare warning on 32 bit platform
csum-file: fix -Wsign-compare warning on 32-bit platform
diff.h: fix index used to loop through unsigned integer
config.mak.dev: drop `-Wno-sign-compare`
global: mark code units that generate warnings with `-Wsign-compare`
compat/win32: fix -Wsign-compare warning in "wWinMain()"
compat/regex: explicitly ignore "-Wsign-compare" warnings
git-compat-util: introduce macros to disable "-Wsign-compare" warnings
|
|
Reftable backend adds check for upper limit of log's update_index.
* kn/reftable-writer-log-write-verify:
reftable/writer: ensure valid range for log's update_index
|
|
The path-walk API currently uses a stack-based approach to recursing
through the list of paths within the repository. This guarantees that
after a tree path is explored, all paths contained within that tree path
will be explored before continuing to explore siblings of that tree
path.
The initial motivation of this depth-first approach was to minimize
memory pressure while exploring the repository. A breadth-first approach
would have too many "active" paths being stored in the paths_to_lists
map.
We can take this approach one step further by making sure that blob
paths are visited before tree paths. This allows the API to free the
memory for these blob objects before continuing to perform the
depth-first search. This modifies the order in which we visit siblings,
but does not change the fact that we are performing depth-first search.
To achieve this goal, use a priority queue with a custom sorting method.
The sort needs to handle tags, blobs, and trees (commits are handled
slightly differently). When objects share a type, we can sort by path
name. This will keep children of the latest path to leave the stack be
preferred over the rest of the paths in the stack, since they agree in
prefix up to and including a directory separator. When the types are
different, we can prefer tags over other types and blobs over trees.
This causes significant adjustments to t6601-path-walk.sh to rearrange
the order of the visited paths.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When the input rev_info has UNINTERESTING starting points, we want to be
sure that the UNINTERESTING flag is passed appropriately through the
objects. To match how this is done in places such as 'git pack-objects', we
use the mark_edges_uninteresting() method.
This method has an option for using the "sparse" walk, which is similar in
spirit to the path-walk API's walk. To be sure to keep it independent, add a
new 'prune_all_uninteresting' option to the path_walk_info struct.
To check how the UNINTERSTING flag is spread through our objects, extend the
'test-tool path-walk' command to output whether or not an object has that
flag. This changes our tests significantly, including the removal of some
objects that were previously visited due to the incomplete implementation.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The rev_info that is specified for a path-walk traversal may specify
visiting tag refs (both lightweight and annotated) and also may specify
indexed objects (blobs and trees). Update the path-walk API to walk
these objects as well.
When walking tags, we need to peel the annotated objects until reaching
a non-tag object. If we reach a commit, then we can add it to the
pending objects to make sure we visit in the commit walk portion. If we
reach a tree, then we will assume that it is a root tree. If we reach a
blob, then we have no good path name and so add it to a new list of
"tagged blobs".
When the rev_info includes the "--indexed-objects" flag, then the
pending set includes blobs and trees found in the cache entries and
cache-tree. The cache entries are usually blobs, though they could be
trees in the case of a sparse index. The cache-tree stores
previously-hashed tree objects but these are cleared out when staging
objects below those paths. We add tests that demonstrate this.
The indexed objects come with a non-NULL 'path' value in the pending
item. This allows us to prepopulate the 'path_to_lists' strmap with
lists for these paths.
The tricky thing about this walk is that we will want to combine the
indexed objects walk with the commit walk, especially in the future case
of walking objects during a command like 'git repack'.
Whenever possible, we want the objects from the index to be grouped with
similar objects in history. We don't want to miss any paths that appear
only in the index and not in the commit history.
Thus, we need to be careful to let the path stack be populated initially
with only the root tree path (and possibly tags and tagged blobs) and go
through the normal depth-first search. Afterwards, if there are other
paths that are remaining in the paths_to_lists strmap, we should then
iterate through the stack and visit those objects recursively.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We add the ability to filter the object types in the path-walk API so
the callback function is called fewer times.
This adds the ability to ask for the commits in a list, as well. We
re-use the empty string for this set of objects because these are passed
directly to the callback function instead of being part of the
'path_stack'.
Future changes will add the ability to visit annotated tags.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add some tests based on the current behavior, doing interesting checks
for different sets of branches, ranges, and the --boundary option. This
sets a baseline for the behavior and we can extend it as new options are
introduced.
Store and output a 'batch_nr' value so we can demonstrate that the paths are
grouped together in a batch and not following some other ordering. This
allows us to test the depth-first behavior of the path-walk API. However, we
purposefully do not test the order of the objects in the batch, so the
output is compared to the expected output through a sort.
It is important to mention that the behavior of the API will change soon as
we start to handle UNINTERESTING objects differently, but these tests will
demonstrate the change in behavior.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This test helper will be helpful to reduce repeated logic in
t6601-path-walk.sh, but may be helpful elsewhere, too.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git bundle create" with an annotated tag on the positive end of
the revision range had a workaround code for older limitation in
the revision walker, which has become unnecessary.
* tc/bundle-with-tag-remove-workaround:
bundle: remove unneeded code
|
|
"git log -p --remerge-diff --reverse" was completely broken.
* js/log-remerge-keep-ancestry:
log: --remerge-diff needs to keep around commit parents
|
|
"git fetch" honors "remote.<remote>.followRemoteHEAD" settings to
tweak the remote-tracking HEAD in "refs/remotes/<remote>/HEAD".
* bf/fetch-set-head-config:
remote set-head: set followRemoteHEAD to "warn" if "always"
fetch set_head: add warn-if-not-$branch option
fetch set_head: move warn advice into advise_if_enabled
fetch: add configuration for set_head behaviour
|
|
"git fetch" from a configured remote learned to update a missing
remote-tracking HEAD but it asked the remote about their HEAD even
when it did not need to, which has been corrected. Incidentally,
this also corrects "git fetch --tags $URL" which was broken by the
new feature in an unspecified way.
* jc/set-head-symref-fix:
fetch: do not ask for HEAD unnecessarily
|
|
When "git fetch $remote" notices that refs/remotes/$remote/HEAD is
missing and discovers what branch the other side points with its
HEAD, refs/remotes/$remote/HEAD is updated to point to it.
* bf/set-head-symref:
fetch set_head: handle mirrored bare repositories
fetch: set remote/HEAD if it does not exist
refs: add create_only option to refs_update_symref_extended
refs: add TRANSACTION_CREATE_EXISTS error
remote set-head: better output for --auto
remote set-head: refactor for readability
refs: atomically record overwritten ref in update_symref
refs: standardize output of refs_read_symbolic_ref
t/t5505-remote: test failure of set-head
t/t5505-remote: set default branch to main
|
|
Stop using `the_repository` in the "serve" subsystem by passing in a
repository when advertising capabilities or serving requests.
Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Stop using `the_repository` in the "progress" subsystem by passing in a
repository when initializing `struct progress`. Furthermore, store a
pointer to the repository in that struct so that we can pass it to the
trace2 API when logging information.
Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* ps/build-sign-compare:
t/helper: don't depend on implicit wraparound
scalar: address -Wsign-compare warnings
builtin/patch-id: fix type of `get_one_patchid()`
builtin/blame: fix type of `length` variable when emitting object ID
gpg-interface: address -Wsign-comparison warnings
daemon: fix type of `max_connections`
daemon: fix loops that have mismatching integer types
global: trivial conversions to fix `-Wsign-compare` warnings
pkt-line: fix -Wsign-compare warning on 32 bit platform
csum-file: fix -Wsign-compare warning on 32-bit platform
diff.h: fix index used to loop through unsigned integer
config.mak.dev: drop `-Wno-sign-compare`
global: mark code units that generate warnings with `-Wsign-compare`
compat/win32: fix -Wsign-compare warning in "wWinMain()"
compat/regex: explicitly ignore "-Wsign-compare" warnings
git-compat-util: introduce macros to disable "-Wsign-compare" warnings
|
|
The `git refs migrate` command was introduced in
25a0023f28 (builtin/refs: new command to migrate ref storage formats,
2024-06-06) to support migrating from one reference backend to another.
One limitation of the command was that it didn't support migrating
repositories which contained reflogs. A previous commit, added support
for adding reflog updates in ref transactions. Using the added
functionality bake in reflog support for `git refs migrate`.
To ensure that the order of the reflogs is maintained during the
migration, we add the index for each reflog update as we iterate over
the reflogs from the old reference backend. This is to ensure that the
order is maintained in the new backend.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `git log` command already offers support for including diffs for
merges, via the `--diff-merges=<format>` option.
Let's add corresponding support for `git range-diff`, too. This makes it
more convenient to spot differences between commit ranges that contain
merges.
This is especially true in scenarios with non-trivial merges, i.e.
merges introducing changes other than, or in addition to, what merge ORT
would have produced. Merging a topic branch that changes a function
signature into a branch that added a caller of that function, for
example, would require the merge commit itself to adjust that caller to
the modified signature.
In my code reviews, I found the `--diff-merges=remerge` option
particularly useful.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* js/log-remerge-keep-ancestry:
log: --remerge-diff needs to keep around commit parents
|
|
Build procedure update plus introduction of Meson based builds.
* ps/build: (24 commits)
Introduce support for the Meson build system
Documentation: add comparison of build systems
t: allow overriding build dir
t: better support for out-of-tree builds
Documentation: extract script to generate a list of mergetools
Documentation: teach "cmd-list.perl" about out-of-tree builds
Documentation: allow sourcing generated includes from separate dir
Makefile: simplify building of templates
Makefile: write absolute program path into bin-wrappers
Makefile: allow "bin-wrappers/" directory to exist
Makefile: refactor generators to be PWD-independent
Makefile: extract script to generate gitweb.js
Makefile: extract script to generate gitweb.cgi
Makefile: extract script to massage Python scripts
Makefile: extract script to massage Shell scripts
Makefile: use "generate-perl.sh" to massage Perl library
Makefile: extract script to massage Perl scripts
Makefile: consistently use PERL_PATH
Makefile: generate doc versions via GIT-VERSION-GEN
Makefile: generate "git.rc" via GIT-VERSION-GEN
...
|
|
The syntax ":/<text>" to name the latest commit with the matching
text was broken with a recent change, which has been corrected.
* ps/commit-with-message-syntax-fix:
object-name: fix reversed ordering with ":/<text>" revisions
|
|
Correct strvec_splice() that misbehaved when the strvec is empty.
* rj/strvec-splice-fix:
strvec: `strvec_splice()` to a statically initialized vector
|
|
The advice messages now tell the newer 'git config set' command to
set the advice.token configuration variable to squelch a message.
* bf/explicit-config-set-in-advice-messages:
advice: suggest using subcommand "git config set"
|
|
"git tag" has been taught to refuse to create refs/tags/HEAD
as such a tag will be confusing in the context of UI provided by
the Git Porcelain commands.
* jc/forbid-head-as-tagname:
tag: "git tag" refuses to use HEAD as a tagname
t5604: do not expect that HEAD can be a valid tagname
refs: drop strbuf_ prefix from helpers
refs: move ref name helpers around
|
|
"git describe" optimization.
* jk/describe-perf:
describe: split "found all tags" and max_candidates logic
describe: stop traversing when we run out of names
describe: stop digging for max_candidates+1
t/perf: add tests for git-describe
t6120: demonstrate weakness in disjoint-root handling
|
|
* kn/reftable-writer-log-write-verify:
reftable/writer: ensure valid range for log's update_index
|
|
Yet another "pass the repository through the callchain" topic.
* kn/midx-wo-the-repository:
midx: inline the `MIDX_MIN_SIZE` definition
midx: pass down `hash_algo` to functions using global variables
midx: pass `repository` to `load_multi_pack_index`
midx: cleanup internal usage of `the_repository` and `the_hash_algo`
midx-write: pass down repository to `write_midx_file[_only]`
write-midx: add repository field to `write_midx_context`
midx-write: use `revs->repo` inside `read_refs_snapshot`
midx-write: pass down repository to static functions
packfile.c: remove unnecessary prepare_packed_git() call
midx: add repository to `multi_pack_index` struct
config: make `packed_git_(limit|window_size)` non-global variables
config: make `delta_base_cache_limit` a non-global variable
packfile: pass down repository to `for_each_packed_object`
packfile: pass down repository to `has_object[_kept]_pack`
packfile: pass down repository to `odb_pack_name`
packfile: pass `repository` to static function in the file
packfile: use `repository` from `packed_git` directly
packfile: add repository to struct `packed_git`
|
|
Introduce a new repository extension to prevent older Git versions
from mis-interpreting worktrees created with relative paths.
* cw/worktree-extension:
worktree: refactor `repair_worktree_after_gitdir_move()`
worktree: add relative cli/config options to `repair` command
worktree: add relative cli/config options to `move` command
worktree: add relative cli/config options to `add` command
worktree: add `write_worktree_linking_files()` function
worktree: refactor infer_backlink return
worktree: add `relativeWorktrees` extension
setup: correctly reinitialize repository version
|
|
"git fast-import" learned to reject paths with ".." and "." as
their components to avoid creating invalid tree objects.
* en/fast-import-verify-path:
t9300: test verification of renamed paths
fast-import: disallow more path components
fast-import: disallow "." and ".." path components
|
|
"git bundle --unbundle" and "git clone" running on a bundle file
both learned to trigger fsck over the new objects with configurable
fck check levels.
* jt/bundle-fsck:
transport: propagate fsck configuration during bundle fetch
fetch-pack: split out fsck config parsing
bundle: support fsck message configuration
bundle: add bundle verification options type
|
|
To show a remerge diff, the merge needs to be recreated. For that to
work, the merge base(s) need to be found, which means that the commits'
parents have to be traversed until common ancestors are found (if any).
However, one optimization that hails all the way back to cb115748ec0d
(Some more memory leak avoidance, 2006-06-17) is to release the commit's
list of parents immediately after showing it _and to set that parent
list to `NULL`_. This can break the merge base computation.
This problem is most obvious when traversing the commits in reverse: In
that instance, if a parent of a merge commit has been shown as part of
the `git log` command, by the time the merge commit's diff needs to be
computed, that parent commit's list of parent commits will have been set
to `NULL` and as a result no merge base will be found (even if one
should be found).
Traversing commits in reverse is far from the only circumstance in which
this problem occurs, though. There are many avenues to traversing at
least one commit in the revision walk that will later be part of a merge
base computation, for example when not even walking any revisions in
`git show <merge1> <merge2>` where `<merge1>` is part of the commit
graph between the parents of `<merge2>`.
Another way to force a scenario where a commit is traversed before it
has to be traversed again as part of a merge base computation is to
start with two revisions (where the first one is reachable from the
second but not in a first-parent ancestry) and show the commit log with
`--topo-order` and `--first-parent`.
Let's fix this by special-casing the `remerge_diff` mode, similar to
what we did with reflogs in f35650dff6a4 (log: do not free parents when
walking reflog, 2017-07-07).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Our unit tests that don't yet use the clar unit testing framework ignore
any option that they do not understand. It is thus fine to just pass
test options we set up globally to those unit tests as they are simply
ignored. This makes our life easier because we don't have to special
case those options with Meson, where test options are set up globally
via `meson test --test-args=`.
But our clar-based unit testing framework is way stricter here and will
fail in case it is passed an unknown option. Stub out these options with
no-ops to make our life a bit easier.
Note that this also requires us to remove the `-x` short option for
`--exclude`. This is because `-x` has another meaning in our integration
tests, as it enables shell tracing. I doubt there are a lot of people
out there using it as we only got a small hand full of clar tests in the
first place. So better change it now so that we can in the long run
improve compatibility between the two different test drivers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Both t9835 and t9836 exercise git-p4, but one exercises Python 2 whereas
the other one uses Python 3. These tests do not exercise "git p4", but
instead they use "git p4.py". This calls the unbuilt version of
"git-p4.py" that still has the "#!/usr/bin/env python" shebang, which
allows the test to modify which Python version comes first in $PATH,
making it possible to force a Python version.
But "git-p4.py" is not in our PATH during out-of-tree builds, and thus
we cannot locate "git-p4.py". The tests thus break with CMake and Meson.
Fix this by instead manually setting up script wrappers that invoke the
respective Python interpreter directly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the preceding commit, we have introduced consistency checks to Meson
to detect any discrepancies with missing or extraneous tests in its
build instructions. These checks only get executed in Meson though, so
any users of our Makefiles wouldn't be alerted of the fact that they
have to modify the Meson build instructions in case they add or remove
any tests.
Add a comparable test target to our Makefile to plug this gap.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It is quite easy for the list of integration tests to go out-of-sync
without anybody noticing. Introduce a new configure-time check that
verifies that all tests are wired up properly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
All of the code files for unit tests using the self-grown unit testing
framework have a "t-" prefix to their name. This makes it easy to
identify them and use globbing in our Makefile and in other places. On
the other hand though, our clar-based unit tests have no prefix at all
and thus cannot easily be discerned from other files in the unit test
directory.
Introduce a new "u-" prefix for clar-based unit tests. This prefix will
be used in a subsequent commit to easily identify such tests.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The changes in commit c06793a4ed (allow git-bundle to create bottomless
bundle, 2007-08-08) ensure annotated tags are properly preserved when
creating a bundle using a revision range operation.
At the time the range notation would peel the ends to their
corresponding commit, meaning ref v2.0 would point to the v2.0^0 commit.
So the above workaround was introduced. This code looks up the ref
before it's written to the bundle, and if the ref doesn't point to the
object we expect (for tags this would be a tag object), we skip the ref
from the bundle. Instead, when the ref is a tag that's the positive end
of the range (e.g. v2.0 from the range "v1.0..v2.0"), then that ref is
written to the bundle instead.
Later, in 895c5ba3c1 (revision: do not peel tags used in range notation,
2013-09-19), the behavior of parsing ranges was changed and the problem
was fixed at the cause. But the workaround in bundle.c was not reverted.
Now it seems this workaround can cause a race condition. git-bundle(1)
uses setup_revisions() to parse the input into `struct rev_info`. Later,
in write_bundle_refs(), it uses this info to write refs to the bundle.
As mentioned at this point each ref is looked up again and checked
whether it points to the object we expect. If not, the ref is not
written to the bundle. But, when creating a bundle in a heavy traffic
repository (a repo with many references, and frequent ref updates) it's
possible a branch ref was updated between setup_revisions() and
write_bundle_refs() and thus the extra check causes the ref to be
skipped.
The workaround was originally added to deal with tags, but the code path
also gets hit by non-tag refs, causing this race condition. Because it's
no longer needed, remove it and fix the possible race condition.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* ps/build: (24 commits)
Introduce support for the Meson build system
Documentation: add comparison of build systems
t: allow overriding build dir
t: better support for out-of-tree builds
Documentation: extract script to generate a list of mergetools
Documentation: teach "cmd-list.perl" about out-of-tree builds
Documentation: allow sourcing generated includes from separate dir
Makefile: simplify building of templates
Makefile: write absolute program path into bin-wrappers
Makefile: allow "bin-wrappers/" directory to exist
Makefile: refactor generators to be PWD-independent
Makefile: extract script to generate gitweb.js
Makefile: extract script to generate gitweb.cgi
Makefile: extract script to massage Python scripts
Makefile: extract script to massage Shell scripts
Makefile: use "generate-perl.sh" to massage Perl library
Makefile: extract script to massage Perl scripts
Makefile: consistently use PERL_PATH
Makefile: generate doc versions via GIT-VERSION-GEN
Makefile: generate "git.rc" via GIT-VERSION-GEN
...
|
|
* ps/build: (24 commits)
Introduce support for the Meson build system
Documentation: add comparison of build systems
t: allow overriding build dir
t: better support for out-of-tree builds
Documentation: extract script to generate a list of mergetools
Documentation: teach "cmd-list.perl" about out-of-tree builds
Documentation: allow sourcing generated includes from separate dir
Makefile: simplify building of templates
Makefile: write absolute program path into bin-wrappers
Makefile: allow "bin-wrappers/" directory to exist
Makefile: refactor generators to be PWD-independent
Makefile: extract script to generate gitweb.js
Makefile: extract script to generate gitweb.cgi
Makefile: extract script to massage Python scripts
Makefile: extract script to massage Shell scripts
Makefile: use "generate-perl.sh" to massage Perl library
Makefile: extract script to massage Perl scripts
Makefile: consistently use PERL_PATH
Makefile: generate doc versions via GIT-VERSION-GEN
Makefile: generate "git.rc" via GIT-VERSION-GEN
...
|
|
* cw/worktree-extension:
worktree: refactor `repair_worktree_after_gitdir_move()`
worktree: add relative cli/config options to `repair` command
worktree: add relative cli/config options to `move` command
worktree: add relative cli/config options to `add` command
worktree: add `write_worktree_linking_files()` function
worktree: refactor infer_backlink return
worktree: add `relativeWorktrees` extension
setup: correctly reinitialize repository version
|
|
Optimize reading random references out of the reftable backend by
allowing reuse of iterator objects.
* ps/reftable-iterator-reuse:
refs/reftable: reuse iterators when reading refs
reftable/merged: drain priority queue on reseek
reftable/stack: add mechanism to notify callers on reload
refs/reftable: refactor reflog expiry to use reftable backend
refs/reftable: refactor reading symbolic refs to use reftable backend
refs/reftable: read references via `struct reftable_backend`
refs/reftable: figure out hash via `reftable_stack`
reftable/stack: add accessor for the hash ID
refs/reftable: handle reloading stacks in the reftable backend
refs/reftable: encapsulate reftable stack
|
|
Isolates the reftable subsystem from the rest of Git's codebase by
using fewer pieces of Git's infrastructure.
* ps/reftable-detach:
reftable/system: provide thin wrapper for lockfile subsystem
reftable/stack: drop only use of `get_locked_file_path()`
reftable/system: provide thin wrapper for tempfile subsystem
reftable/stack: stop using `fsync_component()` directly
reftable/system: stop depending on "hash.h"
reftable: explicitly handle hash format IDs
reftable/system: move "dir.h" to its only user
|
|
Loosen overly strict ownership check introduced in the recent past,
to keep the promise "cloning a suspicious repository is a safe
first step to inspect it".
* bc/allow-upload-pack-from-other-people:
Allow cloning from repositories owned by another user
|
|
End-user experience of "git mergetool" when the command errors out
has been improved.
* pb/mergetool-errors:
git-difftool--helper.sh: exit upon initialize_merge_tool errors
git-mergetool--lib.sh: add error message for unknown tool variant
git-mergetool--lib.sh: add error message if 'setup_user_tool' fails
git-mergetool--lib.sh: use TOOL_MODE when erroring about unknown tool
completion: complete '--tool-help' in 'git mergetool'
|
|
We use a singleton empty array to initialize a `struct strvec`;
similar to the empty string singleton we use to initialize a `struct
strbuf`.
Note that an empty strvec instance (with zero elements) does not
necessarily need to be an instance initialized with the singleton.
Let's refer to strvec instances initialized with the singleton as
"empty-singleton" instances.
As a side note, this is the current `strvec_pop()`:
void strvec_pop(struct strvec *array)
{
if (!array->nr)
return;
free((char *)array->v[array->nr - 1]);
array->v[array->nr - 1] = NULL;
array->nr--;
}
So, with `strvec_pop()` an instance can become empty but it does
not going to be the an "empty-singleton".
This "empty-singleton" circumstance requires us to be careful when
adding elements to instances. Specifically, when adding the first
element: when we detach the strvec instance from the singleton and
set the internal pointer in the instance to NULL. After this point we
apply `realloc()` on the pointer. We do this in
`strvec_push_nodup()`, for example.
The recently introduced `strvec_splice()` API is expected to be
normally used with non-empty strvec's. However, it can also end up
being used with "empty-singleton" strvec's:
struct strvec arr = STRVEC_INIT;
int a = 0, b = 0;
... no modification to arr, a or b ...
const char *rep[] = { "foo" };
strvec_splice(&arr, a, b, rep, ARRAY_SIZE(rep));
So, we'll try to add elements to an "empty-singleton" strvec instance.
Avoid misapplying `realloc()` to the singleton in `strvec_splice()` by
adding a special case for strvec's initialized with the singleton.
Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Recently it was reported [1] that "look for the youngest commit
reachable from any ref with log message that match the given
pattern" syntax (i.e. ':/<text>') started to return results in
reverse recency order. This regression was introduced in Git v2.47.0
and is caused by a memory leak fix done in 57fb139b5e (object-name:
fix leaking commit list items, 2024-08-01).
The intent of the identified commit is to stop modifying the commit list
provided by the caller such that the caller can properly free all commit
list items, including those that the called function might potentially
remove from the list. This was done by creating a copy of the passed-in
commit list and modifying this copy instead of the caller-provided list.
We already knew to create such a copy beforehand with the `backup` list,
which was used to clear the `ONELINE_SEEN` commit mark after we were
done. So the refactoring simply renamed that list to `copy` and started
to operate on that list instead. There is a gotcha though: the backup
list, and thus now also the copied list, is always being prepended to,
so the resulting list is in reverse order! The end result is that we
pop commits from the wrong end of the commit list, returning commits in
reverse recency order.
Fix the bug by appending to the list instead.
[1]: <CAKOEJdcPYn3O01p29rVa+xv=Qr504FQyKJeSB-Moze04ViCGGg@mail.gmail.com>
Reported-by: Aarni Koskela <aarni@valohai.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 3f763ddf28 (fetch: set remote/HEAD if it does not exist,
2024-11-22), git-fetch learned to opportunistically set $REMOTE/HEAD
when fetching by always asking for remote HEAD, in the hope that it
will help setting refs/remotes/<name>/HEAD if missing.
But it is not needed to always ask for remote HEAD. When we are
fetching from a remote, for which we have remote-tracking branches,
we do need to know about HEAD. But if we are doing one-shot fetch,
e.g.,
$ git fetch --tags https://github.com/git/git
we do not even know what sub-hierarchy of refs/remotes/<remote>/
we need to adjust the remote HEAD for. There is no need to ask for
HEAD in such a case.
Incidentally, because the unconditional request to list "HEAD"
affected the number of ref-prefixes requested in the ls-remote
request, this affected how the requests for tags are added to the
same ls-remote request, breaking "git fetch --tags $URL" performed
against a URL that is not configured as a remote.
Reported-by: Josh Steadmon <steadmon@google.com>
[jc: tests are also borrowed from Josh's patch]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Each reftable addition has an associated update_index. While writing
refs, the update_index is verified to be within the range of the
reftable writer, i.e. `writer.min_update_index <= ref.update_index` and
`writer.max_update_index => ref.update_index`.
The corresponding check for reflogs in `reftable_writer_add_log` is
however missing. Add a similar check, but only check for the upper
limit. This is because reflogs are treated a bit differently than refs.
Each reflog entry in reftable has an associated update_index and we also
allow expiring entries in the middle, which is done by simply writing a
new reflog entry with the same update_index. This means, writing reflog
entries with update_index lesser than the writer's update_index is an
expected scenario.
Add a new unit test to check for the limits and fix some of the existing
tests, which were setting arbitrary values for the update_index by
ensuring they stay within the now checked limits.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce support for the Meson build system, a "modern" meta build
system that supports many different platforms, including Linux, macOS,
Windows and BSDs. Meson supports different backends, including Ninja,
Xcode and Microsoft Visual Studio. Several common IDEs provide an
integration with it.
The biggest contender compared to Meson is probably CMake as outlined in
our "Documentation/technical/build-systems.txt" file. Based on my own
personal experience from working with both build systems extensively I
strongly favor Meson over CMake. In my opinion, it feels significantly
easier to use with a syntax that feels more like a "real" programming
language. The second big reason is that Meson supports Rust natively,
which may prove to be important given that the project may pick up Rust
as another language eventually.
Using Meson is rather straight-forward. An example:
```
# Meson uses out-of-tree builds. You can set up multiple build
# directories, how you name them is completely up to you.
$ mkdir build
$ cd build
$ meson setup .. -Dprefix=/tmp/git-installation
# Build the project. This also provides several other targets like
e.g. `install` or `test`.
$ ninja
# Meson has been wired up to support execution of our test suites.
# Both our unit tests and our integration tests are supported.
# Running `meson test` without any arguments will execute all tests,
# but the syntax supports globbing to select only some tests.
$ meson test 't-*'
# Execute single test interactively to allow for debugging.
$ meson test 't0000-*' --interactive --test-args=-ix
```
The build instructions have been successfully tested on the following
systems, tests are passing:
- Apple macOS 10.15.
- FreeBSD 14.1.
- NixOS 24.11.
- OpenBSD 7.6.
- Ubuntu 24.04.
- Windows 10 with Cygwin.
- Windows 10 with MinGW64, except for t9700, which is also broken with
our Makefile.
- Windows 10 with Visual Studio 2022 toolchain, using the Native Tools
Command Prompt with `meson setup --vsenv`. Tests pass, except for
t9700.
- Windows 10 with Visual Studio 2022 solution, using the Native Tools
Command Prompt with `meson setup --backend vs2022`. Tests pass,
except for t9700.
- Windows 10 with VS Code, using the Meson plug-in.
It is expected that there will still be rough edges in the current
version. If this patch lands the expectation is that it will coexist
with our other build systems for a while. Like this, distributions can
slowly migrate over to Meson and report any findings they have to us
such that we can continue to iterate. A potential cutoff date for other
build systems may be Git 3.0.
Some notes:
- The installed distribution is structured somewhat differently than
how it used to be the case. All of our binaries are installed into
`$libexec/git-core`, while all binaries part of `$bindir` are now
symbolic links pointing to the former. This rule is consistent in
itself and thus easier to reason about.
- We do not install dashed binaries into `$libexec/git-core` anymore,
so there won't e.g. be a symlink for git-add(1). These are not
required by modern Git and there isn't really much of a use case for
those anymore. By not installing those symlinks we thus start the
deprecation of this layout.
- We're targeting Meson 1.3.0, which has been released relatively
recently November 2023. The only feature we use from that version is
`fs.relative_to()`, which we could replace if necessary. If so, we
could start to target Meson 1.0.0 and newer, released in December
2022.
- The whole build instructions count around 3300 lines, half of which
is listing all of our code and test files. Our Makefiles are around
5000 lines, autoconf adds another 1300 lines. CMake in comparison
has only 1200 linescode, but it avoids listing individual files and
does not wire up auto-configuration as extensively as the Meson
instructions do.
- We bundle a set of subproject wrappers for curl, expat, openssl,
pcre2 and zlib. This allows developers to build Git without these
dependencies preinstalled, and Meson will fetch and build them
automatically. This is especially helpful on Windows.
Helped-by: Eli Schwartz <eschwartz@gentoo.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Our "test-lib.sh" assumes that our build directory is the parent
directory of "t/". While true when using our Makefile, it's not when
using build systems that support out-of-tree builds.
In commit ee9e66e4e7 (cmake: avoid editing t/test-lib.sh, 2022-10-18),
we have introduce support for overriding the GIT_BUILD_DIR by creating
the file "$GIT_BUILD_DIR/GIT-BUILD-DIR" with its contents pointing to
the location of the build directory. The intent was to stop modifying
"t/test-lib.sh" with the CMake build systems while allowing out-of-tree
builds. But "$GIT_BUILD_DIR" is somewhat misleadingly named, as it in
fact points to the _source_ directory. So while that commit solved part
of the problem for out-of-tree builds, CMake still has to write files
into the source tree.
Solve the second part of the problem, namely not having to write any
data into the source directory at all, by also supporting an environment
variable that allows us to point to a different build directory. This
allows us to perform properly self-contained out-of-tree builds.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|