| Age | Commit message (Collapse) | Author | Files | Lines |
|
A part of code paths that deals with loose objects has been cleaned
up.
* ps/object-source-loose:
object-file: refactor writing objects via a stream
object-file: rename `write_object_file()`
object-file: refactor freshening of objects
object-file: rename `has_loose_object()`
object-file: read objects via the loose object source
object-file: move loose object map into loose source
object-file: hide internals when we need to reprepare loose sources
object-file: move loose object cache into loose source
object-file: introduce `struct odb_source_loose`
object-file: move `fetch_if_missing`
odb: adjust naming to free object sources
odb: introduce `odb_source_new()`
odb: fix subtle logic to check whether an alternate is usable
|
|
Some ref backend storage can hold not just the object name of an
annotated tag, but the object name of the object the tag points at.
The code to handle this information has been streamlined.
* ps/ref-peeled-tags:
t7004: do not chdir around in the main process
ref-filter: fix stale parsed objects
ref-filter: parse objects on demand
ref-filter: detect broken tags when dereferencing them
refs: don't store peeled object IDs for invalid tags
object: add flag to `peel_object()` to verify object type
refs: drop infrastructure to peel via iterators
refs: drop `current_ref_iter` hack
builtin/show-ref: convert to use `reference_get_peeled_oid()`
ref-filter: propagate peeled object ID
upload-pack: convert to use `reference_get_peeled_oid()`
refs: expose peeled object ID via the iterator
refs: refactor reference status flags
refs: fully reset `struct ref_iterator::ref` on iteration
refs: introduce `.ref` field for the base iterator
refs: introduce wrapper struct for `each_ref_fn`
|
|
The `each_ref_fn` callback function type is used across our code base
for several different functions that iterate through reference. There's
a bunch of callbacks implementing this type, which makes any changes to
the callback signature extremely noisy. An example of the required churn
is e8207717f1 (refs: add referent to each_ref_fn, 2024-08-09): adding a
single argument required us to change 48 files.
It was already proposed back then [1] that we might want to introduce a
wrapper structure to alleviate the pain going forward. While this of
course requires the same kind of global refactoring as just introducing
a new parameter, it at least allows us to more change the callback type
afterwards by just extending the wrapper structure.
One counterargument to this refactoring is that it makes the structure
more opaque. While it is obvious which callsites need to be fixed up
when we change the function type, it's not obvious anymore once we use
a structure. That being said, we only have a handful of sites that
actually need to populate this wrapper structure: our ref backends,
"refs/iterator.c" as well as very few sites that invoke the iterator
callback functions directly.
Introduce this wrapper structure so that we can adapt the iterator
interfaces more readily.
[1]: <ZmarVcF5JjsZx0dl@tanuki>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Our loose objects use a cache that (optionally) stores all objects for
each of the opened sharding directories. This cache is located in the
`struct odb_source`, but now that we have `struct odb_source_loose` it
makes sense to move it into the latter structure so that all state that
relates to loose objects is entirely self-contained.
Do so. While at it, rename corresponding functions to have a prefix that
relates to `struct odb_source_loose`.
Note that despite this prefix, the functions still accept a `struct
odb_source` as input. This is done intentionally: once we introduce
pluggable object databases, we will continue to accept this struct but
then do a cast inside these functions to `struct odb_source_loose`. This
design is similar to how we do it for our ref backends.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We have a bunch of different sites that want to iterate through all
packs of a given `struct packfile_store`. This pattern is somewhat
verbose and repetitive, which makes it somewhat cumbersome.
Introduce a new macro `repo_for_each_pack()` that removes some of the
boilerplate.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When searching for abbreviated or when trying to disambiguate object IDs
we do this in two steps:
1. We search through the multi-pack index.
2. We search through all packfiles not part of any multi-pack index.
The second step uses `packfile_store_get_packs()`, which knows to skip
loading any packfiles that are indexed by an MIDX; this is exactly what
we want.
But that function is somewhat problematic, as its behaviour is stateful
and is influenced by `packfile_store_get_all_packs()`. This function
basically does the same as `packfile_store_get_packs()`, but in addition
it also loads all packfiles indexed by an MIDX. The problem here is that
both of these functions act on the same linked list of packfiles, and
thus depending on whether or not `get_all_packs()` was called the result
returned by `get_packs()` will be different. Consequently, all callers
of `get_packs()` need to be prepared to see MIDX'd packs even though
these should in theory be excluded.
This interface is confusing and thus potentially dangerous, which is why
we're converting all callers of `get_packs()` to use `get_all_packs()`
instead.
Do so for the above functions in "object-name.c". As explained, we
already know to skip any MIDX'd packs in both `find_abbrev_len_packed()`
and `find_short_packed_object()`, so it's fine to start loading MIDX'd
packfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up around the in-core list of all the pack files and
object database(s).
* ps/packfile-store:
packfile: refactor `get_packed_git_mru()` to work on packfile store
packfile: refactor `get_all_packs()` to work on packfile store
packfile: refactor `get_packed_git()` to work on packfile store
packfile: move `get_multi_pack_index()` into "midx.c"
packfile: introduce function to load and add packfiles
packfile: refactor `install_packed_git()` to work on packfile store
packfile: split up responsibilities of `reprepare_packed_git()`
packfile: refactor `prepare_packed_git()` to work on packfile store
packfile: reorder functions to avoid function declaration
odb: move kept cache into `struct packfile_store`
odb: move MRU list of packfiles into `struct packfile_store`
odb: move packfile map into `struct packfile_store`
odb: move initialization bit into `struct packfile_store`
odb: move list of packfiles into `struct packfile_store`
packfile: introduce a new `struct packfile_store`
|
|
* ps/packfile-store:
packfile: refactor `get_packed_git_mru()` to work on packfile store
packfile: refactor `get_all_packs()` to work on packfile store
packfile: refactor `get_packed_git()` to work on packfile store
packfile: move `get_multi_pack_index()` into "midx.c"
packfile: introduce function to load and add packfiles
packfile: refactor `install_packed_git()` to work on packfile store
packfile: split up responsibilities of `reprepare_packed_git()`
packfile: refactor `prepare_packed_git()` to work on packfile store
packfile: reorder functions to avoid function declaration
odb: move kept cache into `struct packfile_store`
odb: move MRU list of packfiles into `struct packfile_store`
odb: move packfile map into `struct packfile_store`
odb: move initialization bit into `struct packfile_store`
odb: move list of packfiles into `struct packfile_store`
packfile: introduce a new `struct packfile_store`
|
|
The `get_packed_git()` function prepares the packfile store and then
returns its packfiles. Refactor it to accept a packfile store instead of
a repository to clarify its scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In `reprepare_packed_git()` we perform a couple of operations:
- We reload alternate object directories.
- We clear the loose object cache.
- We reprepare packfiles.
While the logic is hosted in "packfile.c", it clearly reaches into other
subsystems that aren't related to packfiles.
Split up the responsibility and introduce `odb_reprepare()` which now
becomes responsible for repreparing the whole object database. The
existing `reprepare_packed_git()` function is refactored accordingly and
only cares about reloading the packfile store now.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up.
* rs/get-oid-with-flags-cleanup:
use repo_get_oid_with_flags()
|
|
"git rev-parse --short" and friends failed to disambiguate two
objects with object names that share common prefix longer than 32
characters, which has been fixed.
* jc/longer-disambiguation-fix:
abbrev: allow extending beyond 32 chars to disambiguate
|
|
Code clean-up.
* rs/object-name-extend-abbrev-len-update:
object-name: declare pointer type of extend_abbrev_len()'s 2nd parameter
|
|
get_oid_with_context() allows specifying flags and reports object
details via a passed-in struct object_context. Some callers just want
to specify flags, but don't need any details back. Convert them to
repo_get_oid_with_flags(), which provides just that and frees them from
dealing with the context structure.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Expose the expected type of the second parameter of extend_abbrev_len()
instead of casting a void pointer internally. Just a single caller
passes in a void pointer, the rest pass the correct type. Let the
compiler help keeping it that way.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git remote rename origin upstream" failed to move origin/HEAD to
upstream/HEAD when origin/HEAD is unborn and performed other
renames extremely inefficiently, which has been corrected.
* ps/remote-rename-fix:
builtin/remote: only iterate through refs that are to be renamed
builtin/remote: rework how remote refs get renamed
builtin/remote: determine whether refs need renaming early on
builtin/remote: fix sign comparison warnings
refs: simplify logic when migrating reflog entries
refs: pass refname when invoking reflog entry callback
|
|
When you have two or more objects with object names that share more
than 32 letters in an SHA-1 repository, find_unique_abbrev() fails
to show disambiguation.
To see how many leading letters of a given full object name is
sufficiently unambiguous, the algorithm starts from a initial
length, guessed based on the estimated number of objects in the
repository, and see if another object that shares the prefix, and
keeps extending the abbreviation. The loop stops at GIT_MAX_RAWSZ,
which is counted as the number of bytes, since 5b20ace6 (sha1_name:
unroll len loop in find_unique_abbrev_r(), 2017-10-08); before that
change, it extended up to GIT_SHA1_HEXSZ, which meant to stop at the
end of hexadecimal SHA-1 object name.
Because the hexadecimal object name passed to the function is
NUL-terminated, and this fact is used to correctly terminate the
loop that scans for the first difference earlier in the function,
use it to make sure we do not increment the .cur_len member beyond
the end of the string.
Noticed-by: Jon Forrest <nobozo@gmail.com>
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
With `refs_for_each_reflog_ent()` callers can iterate through all the
reflog entries for a given reference. The callback that is being invoked
for each such entry does not receive the name of the reference that we
are currently iterating through. This isn't really a limiting factor, as
callers can simply pass the name via the callback data.
But this layout sometimes does make for a bit of an awkward calling
pattern. One example: when iterating through all reflogs, and for each
reflog we iterate through all refnames, we have to do some extra book
keeping to track which reference name we are currently yielding reflog
entries for.
Change the signature of the callback function so that the reference name
of the reflog gets passed through to it. Adapt callers accordingly and
start using the new parameter in trivial cases. The next commit will
refactor the reference migration logic to make use of this parameter so
that we can simplify its logic a bit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Redefine where the multi-pack-index sits in the object subsystem,
which recently was restructured to allow multiple backends that
support a single object source that belongs to one repository. A
midx does span mulitple "object sources".
* ps/object-store-midx:
midx: remove now-unused linked list of multi-pack indices
packfile: stop using linked MIDX list in `get_all_packs()`
packfile: stop using linked MIDX list in `find_pack_entry()`
packfile: refactor `get_multi_pack_index()` to work on sources
midx: stop using linked list when closing MIDX
packfile: refactor `prepare_packed_git_one()` to work on sources
midx: start tracking per object database source
|
|
The pop_most_recent_commit() function can have quite expensive
worst case performance characteristics, which has been optimized by
using prio-queue data structure.
* rs/pop-recent-commit-with-prio-queue:
commit: use prio_queue_replace() in pop_most_recent_commit()
prio-queue: add prio_queue_replace()
commit: convert pop_most_recent_commit() to prio_queue
|
|
pop_most_recent_commit() calls commit_list_insert_by_date() for parent
commits, which is itself called in a loop. This can lead to quadratic
complexity if there are many merges. Replace the commit_list with a
prio_queue to ensure logarithmic worst case complexity and convert all
three users.
Add a performance test that exercises one of them using a pathological
history that consists of 50% merges and 50% root commits to demonstrate
the speedup:
Test v2.50.1 HEAD
----------------------------------------------------------------------
1501.2: rev-parse ':/65535' 2.48(2.47+0.00) 0.20(0.19+0.00) -91.9%
Alas, sane histories don't benefit from the conversion much, and
traversing Git's own history takes a 1% performance hit on my machine:
$ hyperfine -w3 -L git ./git_2.50.1,./git '{git} rev-parse :/^Initial.revision'
Benchmark 1: ./git_2.50.1 rev-parse :/^Initial.revision
Time (mean ± σ): 1.071 s ± 0.004 s [User: 1.052 s, System: 0.017 s]
Range (min … max): 1.067 s … 1.078 s 10 runs
Benchmark 2: ./git rev-parse :/^Initial.revision
Time (mean ± σ): 1.079 s ± 0.003 s [User: 1.060 s, System: 0.017 s]
Range (min … max): 1.074 s … 1.083 s 10 runs
Summary
./git_2.50.1 rev-parse :/^Initial.revision ran
1.01 ± 0.00 times faster than ./git rev-parse :/^Initial.revision
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up around object access API.
* ps/object-store:
odb: rename `read_object_with_reference()`
odb: rename `pretend_object_file()`
odb: rename `has_object()`
odb: rename `repo_read_object_file()`
odb: rename `oid_object_info()`
odb: trivial refactorings to get rid of `the_repository`
odb: get rid of `the_repository` when handling submodule sources
odb: get rid of `the_repository` when handling the primary source
odb: get rid of `the_repository` in `for_each()` functions
odb: get rid of `the_repository` when handling alternates
odb: get rid of `the_repository` in `odb_mkstemp()`
odb: get rid of `the_repository` in `assert_oid_type()`
odb: get rid of `the_repository` in `find_odb()`
odb: introduce parent pointers
object-store: rename files to "odb.{c,h}"
object-store: rename `object_directory` to `odb_source`
object-store: rename `raw_object_store` to `object_database`
|
|
The function `get_multi_pack_index()` loads multi-pack indices via
`prepare_packed_git()` and then returns the linked list of multi-pack
indices that is stored in `struct object_database`. That list is in the
process of being removed though in favor of storing the MIDX as part of
the object database source it belongs to.
Refactor `get_multi_pack_index()` so that it returns the multi-pack
index for a single object source. Callers are now expected to call this
function for each source they are interested in. This requires them to
iterate through alternates, so we have to prepare alternate object
sources before doing so.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* ps/object-store:
odb: rename `read_object_with_reference()`
odb: rename `pretend_object_file()`
odb: rename `has_object()`
odb: rename `repo_read_object_file()`
odb: rename `oid_object_info()`
odb: trivial refactorings to get rid of `the_repository`
odb: get rid of `the_repository` when handling submodule sources
odb: get rid of `the_repository` when handling the primary source
odb: get rid of `the_repository` in `for_each()` functions
odb: get rid of `the_repository` when handling alternates
odb: get rid of `the_repository` in `odb_mkstemp()`
odb: get rid of `the_repository` in `assert_oid_type()`
odb: get rid of `the_repository` in `find_odb()`
odb: introduce parent pointers
object-store: rename files to "odb.{c,h}"
object-store: rename `object_directory` to `odb_source`
object-store: rename `raw_object_store` to `object_database`
|
|
Rename `oid_object_info()` to `odb_read_object_info()` as well as their
`_extended()` variant to match other functions related to the object
database and our modern coding guidelines.
Introduce compatibility wrappers so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The functions to manage alternates all depend on `the_repository`.
Refactor them to accept an object database as a parameter and adjust all
callers. The functions are renamed accordingly.
Note that right now the situation is still somewhat weird because we end
up using the object store path provided by the object store's repository
anyway. Consequently, we could have instead passed in a pointer to the
repository instead of passing in the pointer to the object store. This
will be addressed in subsequent commits though, where we will start to
use the path owned by the object store itself.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `object_directory` structure is used as an access point for a single
object directory like ".git/objects". While the structure isn't yet
fully self-contained, the intent is for it to eventually contain all
information required to access objects in one specific location.
While the name "object directory" is a good fit for now, this will
change over time as we continue with the agenda to make pluggable object
databases a thing. Eventually, objects may not be accessed via any kind
of directory at all anymore, but they could instead be backed by any
kind of durable storage mechanism. While it seems quite far-fetched for
now, it is thinkable that eventually this might even be some form of a
database, for example.
As such, the current name of this structure will become worse over time
as we evolve into the direction of pluggable ODBs. Immediate next steps
will start to carve out proper self-contained object directories, which
requires us to pass in these object directories as parameters. Based on
our modern naming schema this means that those functions should then be
named after their subsystem, which means that we would start to bake the
current name into the codebase more and more.
Let's preempt this by renaming the structure. There have been a couple
alternatives that were discussed:
- `odb_backend` was discarded because it led to the association that
one object database has a single backend, but the model is that one
alternate has one backend. Furthermore, "backend" is more about the
actual backing implementation and less about the high-level concept.
- `odb_alternate` was discarded because it is a bit of a stretch to
also call the main object directory an "alternate".
Instead, pick `odb_source` as the new name. It makes it sufficiently
clear that there can be multiple sources and does not cause confusion
when mixed with the already-existing "alternate" terminology.
In the future, this change allows us to easily introduce for example a
`odb_files_source` and other format-specific implementations.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
An interchange format for stash entries is defined, and subcommand
of "git stash" to import/export has been added.
* bc/stash-export-import:
builtin/stash: provide a way to import stashes from a ref
builtin/stash: provide a way to export stashes to a ref
builtin/stash: factor out revision parsing into a function
object-name: make get_oid quietly return an error
|
|
A reasonable person looking at the signature and usage of get_oid and
friends might conclude that in the event of an error, it always returns
-1. However, this is not the case. Instead, get_oid_basic dies if we
go too far back into the history of a reflog (or, when quiet, simply
exits).
This is not especially useful, since in many cases, we might want to
handle this error differently. Let's add a flag here to make it just
return -1 like elsewhere in these code paths.
Note that we cannot make this behavior the default, since we have many
other codepaths that rely on the existing behavior, including in tests.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We carry declarations for a couple of functions in "object-store.h" that
are not defined in "object-store.c", but in a different subsystem. Move
these declarations to the respective headers whose matching code files
carry the corresponding definition.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.
Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The code paths to check whether a refname X is available (by seeing
if another ref X/Y exists, etc.) have been optimized.
* ps/refname-avail-check-optim:
refs: reuse iterators when determining refname availability
refs/iterator: implement seeking for files iterators
refs/iterator: implement seeking for packed-ref iterators
refs/iterator: implement seeking for ref-cache iterators
refs/iterator: implement seeking for reftable iterators
refs/iterator: implement seeking for merged iterators
refs/iterator: provide infrastructure to re-seek iterators
refs/iterator: separate lifecycle from iteration
refs: stop re-verifying common prefixes for availability
refs/files: batch refname availability checks for initial transactions
refs/files: batch refname availability checks for normal transactions
refs/reftable: batch refname availability checks
refs: introduce function to batch refname availability checks
builtin/update-ref: skip ambiguity checks when parsing object IDs
object-name: allow skipping ambiguity checks in `get_oid()` family
object-name: introduce `repo_get_oid_with_flags()`
|
|
When reading an object ID via `get_oid_basic()` or any of its related
functions we perform a check whether the object ID is ambiguous, which
can be the case when a reference with the same name exists. While the
check is generally helpful, there are cases where it only adds to the
runtime overhead without providing much of a benefit.
Add a new flag that allows us to disable the check. The flag will be
used in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce a new function `repo_get_oid_with_flags()`. This function
behaves the same as `repo_get_oid()`, except that it takes an extra
`flags` parameter that it ends up passing to `get_oid_with_context()`.
This function will be used in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
From Documentation/revisions.txt:
'<describeOutput>', e.g. 'v1.7.4.2-679-g3bee7fb'::
Output from `git describe`; i.e. a closest tag, optionally
followed by a dash and a number of commits, followed by a dash, a
'g', and an abbreviated object name.
which means that output of the format
${REFNAME}-${INTEGER}-g${HASH}
should parse to fully expanded ${HASH}. This is fine. However, we
currently don't validate any of ${REFNAME}-${INTEGER}, we only parse
-g${HASH} and assume the rest is valid. That is problematic, since it
breaks things like
git cat-file -p branchname:path/to/file/named/i-gaffed
which, when commit (or tree or blob) affed exists, will not return us
information about the file we are looking for but will instead
erroneously tell us about object affed.
A few additional notes:
- This is a slight backward incompatibility break, because we used
to allow ${GARBAGE}-g${HASH} as a way to spell ${HASH}. However,
a backward incompatible break is necessary, because there is no
other way for someone to be more specific and disambiguate that they
want the blob master:path/to/who-gabbed instead of the object abbed.
- There is a possibility that check_refname_format() rules change in
the future. However, we can only realistically loosen the rules
for what that function accepts rather than tighten. If we were to
tighten the rules, some real world repositories may already have
refnames that suddenly become unacceptable and we break those
repositories. As such, any describe-like syntax of the form
${VALID_FOR_A_REFNAME}-${INTEGER}-g${HASH} that is valid with the
changes in this commit will remain valid in the future.
- The fact that check_refname_format() rules could loosen in the
future is probably also an important reason to make this change. If
the rules loosen, there might be additional cases within
${GARBAGE}-g${HASH} that become ambiguous in the future. While
abbreviated hashes can be disambiguated by abbreviating less, it may
well be that these alternative object names have no way of being
disambiguated (much like pathnames cannot be). Accepting all random
${GARBAGE} thus makes it difficult for us to allow future
extensions to object naming.
So, tighten up the parsing to make sure ${REFNAME} and ${INTEGER} are
present in the string, and would be considered a valid ref and
non-negative integer.
Also, add a few tests for git describe using object names of the form
${REVISION_NAME}${MODIFIERS}
since an early version of this patch failed on constructs like
git describe v2.48.0-rc2-161-g6c2274cdbc^0
Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Given a branch name of 'foo{bar', commands like
git cat-file -p foo{bar:README.md
should succeed (assuming that branch had a README.md file, of course).
However, the change in cce91a2caef9 (Change 'master@noon' syntax to
'master@{noon}'., 2006-05-19) presumed that curly braces would always
come after an '@' or '^' and be paired, causing e.g. 'foo{bar:README.md'
to entirely miss the ':' and assume there's no object being referenced.
In short, git would report:
fatal: Not a valid object name foo{bar:README.md
Change the parsing to only make the assumption of paired curly braces
immediately after either a '@' or '^' character appears.
Add tests for this, as well as for a few other test cases that initial
versions of this patch broke:
* 'foo@@{...}'
* 'foo^{/${SEARCH_TEXT_WITH_COLON}}:${PATH}'
Note that we'd prefer not duplicating the special logic for "@^" characters
here, because if get_oid_basic() or interpret_nth_prior_checkout() or
get_oid_basic() or similar gain extra methods of using curly braces,
then the logic in get_oid_with_context_1() would need to be updated as
well. But it's not clear how to refactor all of these to have a simple
common callpoint with the specialized logic.
Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
Helped-by: Michael Haggerty <mhagger@github.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Start working to make the codebase buildable with -Wsign-compare.
* ps/build-sign-compare:
t/helper: don't depend on implicit wraparound
scalar: address -Wsign-compare warnings
builtin/patch-id: fix type of `get_one_patchid()`
builtin/blame: fix type of `length` variable when emitting object ID
gpg-interface: address -Wsign-comparison warnings
daemon: fix type of `max_connections`
daemon: fix loops that have mismatching integer types
global: trivial conversions to fix `-Wsign-compare` warnings
pkt-line: fix -Wsign-compare warning on 32 bit platform
csum-file: fix -Wsign-compare warning on 32-bit platform
diff.h: fix index used to loop through unsigned integer
config.mak.dev: drop `-Wno-sign-compare`
global: mark code units that generate warnings with `-Wsign-compare`
compat/win32: fix -Wsign-compare warning in "wWinMain()"
compat/regex: explicitly ignore "-Wsign-compare" warnings
git-compat-util: introduce macros to disable "-Wsign-compare" warnings
|
|
The syntax ":/<text>" to name the latest commit with the matching
text was broken with a recent change, which has been corrected.
* ps/commit-with-message-syntax-fix:
object-name: fix reversed ordering with ":/<text>" revisions
|
|
The advice messages now tell the newer 'git config set' command to
set the advice.token configuration variable to squelch a message.
* bf/explicit-config-set-in-advice-messages:
advice: suggest using subcommand "git config set"
|
|
Recently it was reported [1] that "look for the youngest commit
reachable from any ref with log message that match the given
pattern" syntax (i.e. ':/<text>') started to return results in
reverse recency order. This regression was introduced in Git v2.47.0
and is caused by a memory leak fix done in 57fb139b5e (object-name:
fix leaking commit list items, 2024-08-01).
The intent of the identified commit is to stop modifying the commit list
provided by the caller such that the caller can properly free all commit
list items, including those that the called function might potentially
remove from the list. This was done by creating a copy of the passed-in
commit list and modifying this copy instead of the caller-provided list.
We already knew to create such a copy beforehand with the `backup` list,
which was used to clear the `ONELINE_SEEN` commit mark after we were
done. So the refactoring simply renamed that list to `copy` and started
to operate on that list instead. There is a gotcha though: the backup
list, and thus now also the copied list, is always being prepended to,
so the resulting list is in reverse order! The end result is that we
pop commits from the wrong end of the commit list, returning commits in
reverse recency order.
Fix the bug by appending to the list instead.
[1]: <CAKOEJdcPYn3O01p29rVa+xv=Qr504FQyKJeSB-Moze04ViCGGg@mail.gmail.com>
Reported-by: Aarni Koskela <aarni@valohai.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The advice message currently suggests using "git config advice..." to
disable advice messages, but since
00bbdde141 (builtin/config: introduce "set" subcommand, 2024-05-06)
we have the "set" subcommand for config. Since using the subcommand is
more in-line with the modern interface, any advice should be promoting
its usage. Change the disable advice message to use the subcommand
instead. Change all uses of "git config advice" in the tests to use the
subcommand.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
strbuf_branchname(), strbuf_check_{branch,tag}_ref() are helper
functions to deal with branch and tag names, and the fact that they
happen to use strbuf to hold the name of a branch or a tag is not
essential. These functions fit better in the refs API than strbuf
API, the latter of which is about string manipulations.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Same as the preceding commits, storing the "core.warnAmbiguousRefs"
value globally is misdesigned as this setting may be set per repository.
Move the logic into the repo-settings subsystem. The usual pattern here
is that users are expected to call `prepare_repo_settings()` before they
access the settings themselves. This seems somewhat fragile though, as
it is easy to miss and leads to somewhat ugly code patterns at the call
sites.
Instead, introduce a new function that encapsulates this logic for us.
This also allows us to change how exactly the lazy initialization works
in the future, e.g. by only partially initializing values as requested
by the caller.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
More leak fixes.
* ps/leakfixes-part-4: (22 commits)
builtin/diff: free symmetric diff members
diff: free state populated via options
builtin/log: fix leak when showing converted blob contents
userdiff: fix leaking memory for configured diff drivers
builtin/format-patch: fix various trivial memory leaks
diff: fix leak when parsing invalid ignore regex option
unpack-trees: clear index when not propagating it
sequencer: release todo list on error paths
merge-ort: unconditionally release attributes index
builtin/fast-export: plug leaking tag names
builtin/fast-export: fix leaking diff options
builtin/fast-import: plug trivial memory leaks
builtin/notes: fix leaking `struct notes_tree` when merging notes
builtin/rebase: fix leaking `commit.gpgsign` value
config: fix leaking comment character config
submodule-config: fix leaking name entry when traversing submodules
read-cache: fix leaking hashfile when writing index fails
bulk-checkin: fix leaking state TODO
object-name: fix leaking symlink paths in object context
object-file: fix memory leak when reading corrupted headers
...
|
|
Incremental updates of multi-pack index files.
* tb/incremental-midx-part-1:
midx: implement support for writing incremental MIDX chains
t/t5313-pack-bounds-checks.sh: prepare for sub-directories
t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
midx: implement verification support for incremental MIDXs
midx: support reading incremental MIDX chains
midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
midx: teach `midx_preferred_pack()` about incremental MIDXs
midx: teach `midx_contains_pack()` about incremental MIDXs
midx: remove unused `midx_locate_pack()`
midx: teach `fill_midx_entry()` about incremental MIDXs
midx: teach `nth_midxed_offset()` about incremental MIDXs
midx: teach `bsearch_midx()` about incremental MIDXs
midx: introduce `bsearch_one_midx()`
midx: teach `nth_bitmapped_pack()` about incremental MIDXs
midx: teach `nth_midxed_object_oid()` about incremental MIDXs
midx: teach `prepare_midx_pack()` about incremental MIDXs
midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
midx: add new fields for incremental MIDX chains
Documentation: describe incremental MIDX format
|
|
The refs API has been taught to give symref target information to
the users of ref iterators, allowing for-each-ref and friends to
avoid an extra ref_resolve_* API call per a symbolic ref.
* jc/refs-symref-referent:
ref-filter: populate symref from iterator
refs: add referent to each_ref_fn
refs: keep track of unresolved reference value in iterators
|
|
The object context may be populated with symlink contents when reading a
symlink, but the associated strbuf doesn't ever get released when
releasing the object context, causing a memory leak. Plug it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a parameter to each_ref_fn so that callers to the ref APIs
that use this function as a callback can have acess to the
unresolved value of a symbolic ref.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `bsearch_midx()` function will be extended in a following commit to
search for the location of a given object ID across all MIDXs in a chain
(or the single non-chain MIDX if no chain is available).
While most callers will naturally want to use the updated
`bsearch_midx()` function, there are a handful of special cases that
will want finer control and will only want to search through a single
MIDX.
For instance, the object abbreviation code, which cares about object IDs
near to where we'd expect to find a match in a MIDX. In that case, we
want to look at the nearby matches in each layer of the MIDX chain, not
just a single one).
Split the more fine-grained control out into a separate function called
`bsearch_one_midx()` which searches only a single MIDX.
At present both `bsearch_midx()` and `bsearch_one_midx()` have identical
behavior, but the following commit will rewrite the former to be aware
of incremental MIDXs for the remaining non-special case callers.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When calling `get_oid_oneline()`, we pass in a `struct commit_list` that
gets modified by the function. This creates a weird situation where the
commit list may sometimes be empty after returning, but sometimes it
will continue to carry additional commits. In those cases the remainder
of the list leaks.
Ultimately, the design where we only pass partial ownership to
`get_oid_oneline()` feels shoddy. Refactor the code such that we only
pass a constant pointer to the list, creating a local copy as needed.
Callers are thus always responsible for freeing the commit list, which
then allows us to plug a bunch of memory leaks.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
More memory leaks have been plugged.
* ps/leakfixes-more: (29 commits)
builtin/blame: fix leaking ignore revs files
builtin/blame: fix leaking prefixed paths
blame: fix leaking data for blame scoreboards
line-range: plug leaking find functions
merge: fix leaking merge bases
builtin/merge: fix leaking `struct cmdnames` in `get_strategy()`
sequencer: fix memory leaks in `make_script_with_merges()`
builtin/clone: plug leaking HEAD ref in `wanted_peer_refs()`
apply: fix leaking string in `match_fragment()`
sequencer: fix leaking string buffer in `commit_staged_changes()`
commit: fix leaking parents when calling `commit_tree_extended()`
config: fix leaking "core.notesref" variable
rerere: fix various trivial leaks
builtin/stash: fix leak in `show_stash()`
revision: free diff options
builtin/log: fix leaking commit list in git-cherry(1)
merge-recursive: fix memory leak when finalizing merge
builtin/merge-recursive: fix leaking object ID bases
builtin/difftool: plug memory leaks in `run_dir_diff()`
object-name: free leaking object contexts
...
|
|
A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help
transition the codebase to rely less on the availability of the
singleton the_repository instance.
* ps/use-the-repository:
hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE`
t/helper: remove dependency on `the_repository` in "proc-receive"
t/helper: fix segfault in "oid-array" command without repository
t/helper: use correct object hash in partial-clone helper
compat/fsmonitor: fix socket path in networked SHA256 repos
replace-object: use hash algorithm from passed-in repository
protocol-caps: use hash algorithm from passed-in repository
oidset: pass hash algorithm when parsing file
http-fetch: don't crash when parsing packfile without a repo
hash-ll: merge with "hash.h"
refs: avoid include cycle with "repository.h"
global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
hash: require hash algorithm in `empty_tree_oid_hex()`
hash: require hash algorithm in `is_empty_{blob,tree}_oid()`
hash: make `is_null_oid()` independent of `the_repository`
hash: convert `oidcmp()` and `oideq()` to compare whole hash
global: ensure that object IDs are always padded
hash: require hash algorithm in `oidread()` and `oidclr()`
hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`
hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
|
|
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.
It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.
Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit
For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).
Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When given a length that equals the current hash algorithm's hex size,
then `repo_find_unique_abbrev_r()` exits early without trying to find an
abbreviation. This is only sensible because there is nothing to
abbreviate in the first place, so searching through objects to find a
unique prefix would be a waste of compute.
What we don't handle though is the case where the user passes a length
greater than the hash length. This is fine in practice as we still
compute the correct result. But at the very least, this is a waste of
resources as we try to abbreviate a value that cannot be abbreviated,
which causes us to hit the object database.
Start to explicitly handle values larger than hexsz to avoid this
performance penalty, which leads to a measureable speedup. The following
benchmark has been executed in linux.git:
Benchmark 1: git -c core.abbrev=9000 log --abbrev-commit (revision = HEAD~)
Time (mean ± σ): 12.812 s ± 0.040 s [User: 12.225 s, System: 0.554 s]
Range (min … max): 12.723 s … 12.857 s 10 runs
Benchmark 2: git -c core.abbrev=9000 log --abbrev-commit (revision = HEAD)
Time (mean ± σ): 11.095 s ± 0.029 s [User: 10.546 s, System: 0.521 s]
Range (min … max): 11.037 s … 11.122 s 10 runs
Summary
git -c core.abbrev=9000 log --abbrev-commit HEAD (revision = HEAD) ran
1.15 ± 0.00 times faster than git -c core.abbrev=9000 log --abbrev-commit HEAD (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
While it is documented in `struct object_context::path` that this
variable needs to be released by the caller, this fact is rather easy to
miss given that we do not ever provide a function to release the object
context. And of course, while some callers dutifully release the path,
many others don't.
Introduce a new `object_context_release()` function that releases the
path. Convert callsites that used to free the path to use that new
function and add missing calls to callsites that were leaking memory.
Refactor those callsites as required to have a single return path, only.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Work to support a repository that work with both SHA-1 and SHA-256
hash algorithms has started.
* eb/hash-transition: (30 commits)
t1016-compatObjectFormat: add tests to verify the conversion between objects
t1006: test oid compatibility with cat-file
t1006: rename sha1 to oid
test-lib: compute the compatibility hash so tests may use it
builtin/ls-tree: let the oid determine the output algorithm
object-file: handle compat objects in check_object_signature
tree-walk: init_tree_desc take an oid to get the hash algorithm
builtin/cat-file: let the oid determine the output algorithm
rev-parse: add an --output-object-format parameter
repository: implement extensions.compatObjectFormat
object-file: update object_info_extended to reencode objects
object-file-convert: convert commits that embed signed tags
object-file-convert: convert commit objects when writing
object-file-convert: don't leak when converting tag objects
object-file-convert: convert tag objects when writing
object-file-convert: add a function to convert trees between algorithms
object: factor out parse_mode out of fast-import and tree-walk into in object.h
cache: add a function to read an OID of a specific algorithm
tag: sign both hashes
commit: export add_header_signature to support handling signatures on tags
...
|
|
Make sure failure return from merge_bases_many() is properly caught.
* js/merge-base-with-missing-commit:
merge-ort/merge-recursive: do report errors in `merge_submodule()`
merge-recursive: prepare for `merge_submodule()` to report errors
commit-reach(repo_get_merge_bases_many_dirty): pass on errors
commit-reach(repo_get_merge_bases_many): pass on "missing commits" errors
commit-reach(get_octopus_merge_bases): pass on "missing commits" errors
commit-reach(repo_get_merge_bases): pass on "missing commits" errors
commit-reach(get_merge_bases_many_0): pass on "missing commits" errors
commit-reach(merge_bases_many): pass on "missing commits" errors
commit-reach(paint_down_to_common): start reporting errors
commit-reach(paint_down_to_common): prepare for handling shallow commits
commit-reach(repo_in_merge_bases_many): report missing commits
commit-reach(repo_in_merge_bases_many): optionally expect missing commits
commit-reach(paint_down_to_common): plug two memory leaks
|
|
The `merge_bases_many()` function was just taught to indicate parsing
errors, and now the `repo_get_merge_bases()` function (which is also
surfaced via the `repo_get_merge_bases()` macro) is aware of that, too.
Naturally, there are a lot of callers that need to be adjusted now, too.
Next step: adjust the callers of `get_octopus_merge_bases()`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The goal of 6436a20284 (refs: allow @{n} to work with n-sized reflog,
2021-01-07) was that if we have "n" entries in a reflog, we should still
be able to resolve ref@{n} by looking at the "old" value of the oldest
entry.
Commit 6436a20284 tried to put the logic into read_ref_at() by shifting
its idea of "n" by one. But we reverted that in the previous commit,
since it led to bugs in other callers which cared about the details of
the reflog entry we found. Instead, let's put the special case into the
caller that resolves @{n}, as it cares only about the oid.
read_ref_at() is even kind enough to return the "old" value from the
final reflog; it just returns "1" to signal to us that we ran off the
end of the reflog. But we can notice in the caller that we read just
enough records for that "old" value to be the one we're looking for, and
use it.
Note that read_ref_at() could notice this case, too, and just return 0.
But we don't want to do that, because the caller must be made aware that
we only found the oid, not an actual reflog entry (and the call sites in
show-branch do care about this).
There is one complication, though. When read_ref_at() hits a truncated
reflog, it will return the "old" value of the oldest entry only if it is
not the null oid. Otherwise, it actually returns the "new" value from
that entry! This bit of fudging is due to d1a4489a56 (avoid null SHA1 in
oldest reflog, 2008-07-08), where asking for "ref@{20.years.ago}" for a
ref created recently will produce the initial value as a convenience
(even though technically it did not exist 20 years ago).
But this convenience is only useful for time-based cutoffs. For
count-based cutoffs, get_oid_basic() has always simply complained about
going too far back:
$ git rev-parse HEAD@{20}
fatal: log for 'HEAD' only has 16 entries
and we should continue to do so, rather than returning a nonsense value
(there's even a test in t1508 already which covers this). So let's have
the d1a4489a56 code kick in only when doing timestamp-based cutoffs.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Each of these were checked with
gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).
...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file. These cases were:
* builtin/credential-cache.c
* builtin/pull.c
* builtin/send-pack.c
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Support short oids encoded in any algorithm, while ensuring enough of
the oid is specified to disambiguate between all of the oids in the
repository encoded in any algorithm.
By default have the code continue to only accept oids specified in the
storage hash algorithm of the repository, but when something is
ambiguous display all of the possible oids from any accepted oid
encoding.
A new flag is added GET_OID_HASH_ANY that when supplied causes the
code to accept oids specified in any hash algorithm, and to return the
oids that were resolved.
This implements the functionality that allows both SHA-1 and SHA-256
object names, from the "Object names on the command line" section of
the hash function transition document.
Care is taken in get_short_oid so that when the result is ambiguous
the output remains the same if GIT_OID_HASH_ANY was not supplied. If
GET_OID_HASH_ANY was supplied objects of any hash algorithm that match
the prefix are displayed.
This required updating repo_for_each_abbrev to give it a parameter so
that it knows to look at all hash algorithms.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move functions that are not about pure string manipulation out of
strbuf.[ch]
* cw/strbuf-cleanup:
strbuf: remove global variable
path: move related function to path
object-name: move related functions to object-name
credential-store: move related functions to credential-store file
abspath: move related functions to abspath
strbuf: clarify dependency
strbuf: clarify API boundary
|
|
The vast majority of files including object-store.h did not need dir.h
nor khash.h. Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.
After this patch:
$ git grep -h include..object-store | sort | uniq -c
2 #include "object-store.h"
129 #include "object-store-ll.h"
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since this header showed up in some places besides just #include
statements, update/clean-up/remove those other places as well.
Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got
away with violating the rule that all files must start with an include
of git-compat-util.h (or a short-list of alternate headers that happen
to include it first). This change exposed the violation and caused it
to stop building correctly; fix it by having it include
git-compat-util.h first, as per policy.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
For the functions defined in read-cache.c, move their declarations from
cache.h to a new header, read-cache-ll.h. Also move some related inline
functions from cache.h to read-cache.h. The purpose of the
read-cache-ll.h/read-cache.h split is that about 70% of the sites don't
need the inline functions and the extra headers they include.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move object-name-related functions from strbuf.[ch] to object-name.[ch]
so that strbuf is focused on string manipulation routines with minimal
dependencies.
dir.h relied on the forward declration of the repository struct in
strbuf.h. Since that is removed in this patch, add the forward
declaration to dir.h.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
More header clean-up.
* en/header-split-cache-h-part-2: (22 commits)
reftable: ensure git-compat-util.h is the first (indirect) include
diff.h: reduce unnecessary includes
object-store.h: reduce unnecessary includes
commit.h: reduce unnecessary includes
fsmonitor: reduce includes of cache.h
cache.h: remove unnecessary headers
treewide: remove cache.h inclusion due to previous changes
cache,tree: move basic name compare functions from read-cache to tree
cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c
hash-ll.h: split out of hash.h to remove dependency on repository.h
tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h
dir.h: move DTYPE defines from cache.h
versioncmp.h: move declarations for versioncmp.c functions from cache.h
ws.h: move declarations for ws.c functions from cache.h
match-trees.h: move declarations for match-trees.c functions from cache.h
pkt-line.h: move declarations for pkt-line.c functions from cache.h
base85.h: move declarations for base85.c functions from cache.h
copy.h: move declarations for copy.c functions from cache.h
server-info.h: move declarations for server-info.c functions from cache.h
packfile.h: move pack_window and pack_entry from cache.h
...
|
|
Header clean-up.
* en/header-split-cache-h: (24 commits)
protocol.h: move definition of DEFAULT_GIT_PORT from cache.h
mailmap, quote: move declarations of global vars to correct unit
treewide: reduce includes of cache.h in other headers
treewide: remove double forward declaration of read_in_full
cache.h: remove unnecessary includes
treewide: remove cache.h inclusion due to pager.h changes
pager.h: move declarations for pager.c functions from cache.h
treewide: remove cache.h inclusion due to editor.h changes
editor: move editor-related functions and declarations into common file
treewide: remove cache.h inclusion due to object.h changes
object.h: move some inline functions and defines from cache.h
treewide: remove cache.h inclusion due to object-file.h changes
object-file.h: move declarations for object-file.c functions from cache.h
treewide: remove cache.h inclusion due to git-zlib changes
git-zlib: move declarations for git-zlib functions from cache.h
treewide: remove cache.h inclusion due to object-name.h changes
object-name.h: move declarations for object-name.c functions from cache.h
treewide: remove unnecessary cache.h inclusion
treewide: be explicit about dependence on mem-pool.h
treewide: be explicit about dependence on oid-array.h
...
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Dozens of files made use of advice functions, without explicitly
including advice.h. This made it more difficult to find which files
could remove a dependence on cache.h. Make C files explicitly include
advice.h if they are using it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Split key function and data structure definitions out of cache.h to
new header files and adjust the users.
* en/header-split-cleanup:
csum-file.h: remove unnecessary inclusion of cache.h
write-or-die.h: move declarations for write-or-die.c functions from cache.h
treewide: remove cache.h inclusion due to setup.h changes
setup.h: move declarations for setup.c functions from cache.h
treewide: remove cache.h inclusion due to environment.h changes
environment.h: move declarations for environment.c functions from cache.h
treewide: remove unnecessary includes of cache.h
wrapper.h: move declarations for wrapper.c functions from cache.h
path.h: move function declarations for path.c functions from cache.h
cache.h: remove expand_user_path()
abspath.h: move absolute path functions from cache.h
environment: move comment_line_char from cache.h
treewide: remove unnecessary cache.h inclusion from several sources
treewide: remove unnecessary inclusion of gettext.h
treewide: be explicit about dependence on gettext.h
treewide: remove unnecessary cache.h inclusion from a few headers
|
|
Code clean-up around the use of the_repository.
* ab/remove-implicit-use-of-the-repository:
libs: use "struct repository *" argument, not "the_repository"
post-cocci: adjust comments for recent repo_* migration
cocci: apply the "revision.h" part of "the_repository.pending"
cocci: apply the "rerere.h" part of "the_repository.pending"
cocci: apply the "refs.h" part of "the_repository.pending"
cocci: apply the "promisor-remote.h" part of "the_repository.pending"
cocci: apply the "packfile.h" part of "the_repository.pending"
cocci: apply the "pretty.h" part of "the_repository.pending"
cocci: apply the "object-store.h" part of "the_repository.pending"
cocci: apply the "diff.h" part of "the_repository.pending"
cocci: apply the "commit.h" part of "the_repository.pending"
cocci: apply the "commit-reach.h" part of "the_repository.pending"
cocci: apply the "cache.h" part of "the_repository.pending"
cocci: add missing "the_repository" macros to "pending"
cocci: sort "the_repository" rules by header
cocci: fix incorrect & verbose "the_repository" rules
cocci: remove dead rule from "the_repository.pending.cocci"
|
|
en/header-split-cache-h
* ab/remove-implicit-use-of-the-repository:
libs: use "struct repository *" argument, not "the_repository"
post-cocci: adjust comments for recent repo_* migration
cocci: apply the "revision.h" part of "the_repository.pending"
cocci: apply the "rerere.h" part of "the_repository.pending"
cocci: apply the "refs.h" part of "the_repository.pending"
cocci: apply the "promisor-remote.h" part of "the_repository.pending"
cocci: apply the "packfile.h" part of "the_repository.pending"
cocci: apply the "pretty.h" part of "the_repository.pending"
cocci: apply the "object-store.h" part of "the_repository.pending"
cocci: apply the "diff.h" part of "the_repository.pending"
cocci: apply the "commit.h" part of "the_repository.pending"
cocci: apply the "commit-reach.h" part of "the_repository.pending"
cocci: apply the "cache.h" part of "the_repository.pending"
cocci: add missing "the_repository" macros to "pending"
cocci: sort "the_repository" rules by header
cocci: fix incorrect & verbose "the_repository" rules
cocci: remove dead rule from "the_repository.pending.cocci"
|
|
"git rev-parse --quiet foo@{u}", or anything that asks @{u} to be
parsed with GET_OID_QUIETLY option, did not quietly fail, which has
been corrected.
* fc/oid-quietly-parse-upstream:
object-name: fix quiet @{u} parsing
|
|
As can easily be seen from grepping in our sources, we had these uses
of "the_repository" in various library code in cases where the
function in question was already getting a "struct repository *"
argument. Let's use that argument instead.
Out of these changes only the changes to "cache-tree.c",
"commit-reach.c", "shallow.c" and "upload-pack.c" would have cleanly
applied before the migration away from the "repo_*()" wrapper macros
in the preceding commits.
The rest aren't new, as we'd previously implicitly refer to
"the_repository", but it's now more obvious that we were doing the
wrong thing all along, and should have used the parameter instead.
The change to change "get_index_format_default(the_repository)" in
"read-cache.c" to use the "r" variable instead should arguably have
been part of [1], or in the subsequent cleanup in [2]. Let's do it
here, as can be seen from the initial code in [3] it's not important
that we use "the_repository" there, but would prefer to always use the
current repository.
This change excludes the "the_repository" use in "upload-pack.c"'s
upload_pack_advertise(), as the in-flight [4] makes that change.
1. ee1f0c242ef (read-cache: add index.skipHash config option,
2023-01-06)
2. 6269f8eaad0 (treewide: always have a valid "index_state.repo"
member, 2023-01-17)
3. 7211b9e7534 (repo-settings: consolidate some config settings,
2019-08-13)
4. <Y/hbUsGPVNAxTdmS@coredump.intra.peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Apply the part of "the_repository.pending.cocci" pertaining to
"pretty.h".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Apply the part of "the_repository.pending.cocci" pertaining to
"commit.h".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Apply the part of "the_repository.pending.cocci" pertaining to
"cache.h".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Dozens of files made use of gettext functions, without explicitly
including gettext.h. This made it more difficult to find which files
could remove a dependence on cache.h. Make C files explicitly include
gettext.h if they are using it.
However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
More work towards -Wunused.
* jk/unused-post-2.39-part2: (21 commits)
help: mark unused parameter in git_unknown_cmd_config()
run_processes_parallel: mark unused callback parameters
userformat_want_item(): mark unused parameter
for_each_commit_graft(): mark unused callback parameter
rewrite_parents(): mark unused callback parameter
fetch-pack: mark unused parameter in callback function
notes: mark unused callback parameters
prio-queue: mark unused parameters in comparison functions
for_each_object: mark unused callback parameters
list-objects: mark unused callback parameters
mark unused parameters in signal handlers
run-command: mark error routine parameters as unused
mark "pointless" data pointers in callbacks
ref-filter: mark unused callback parameters
http-backend: mark unused parameters in virtual functions
http-backend: mark argc/argv unused
object-name: mark unused parameters in disambiguate callbacks
serve: mark unused parameters in virtual functions
serve: use repository pointer to get config
ls-refs: drop config caching
...
|
|
Currently `git rev-parse --quiet @{u}` is not actually quiet when
upstream isn't configured:
fatal: no upstream configured for branch 'foo'
Make it so.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The object-name disambiguation code triggers a callback for each
possible object id we find. This is really used for two purposes:
- "hint" functions like disambiguate_commit_only report back on
whether the value is usable
- iterator functions like repo_for_each_abbrev() use it to collect
and report matching names.
Compiling with -Wunused-parameter generates several warnings, but
they're distinct for each type. The "hint" functions never look at the
void cb_data pointer; they only care whether the oid matches our hint.
The iterator functions never look at the "struct repository" parameter;
they're just reporting back the oids they see, and always return 0.
So arguably these could be two separate interfaces:
int (*hint)(struct repository *r, const struct object_id *oid);
void (*iter)(const struct object_id *oid, void *cb_data);
But doing so would complicate the disambiguation code, which now has to
accept and call the two different types. Since we can easily squelch the
compiler warnings by annotating the functions, let's just do that.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
As reported in [1] the "UNUSED(var)" macro introduced in
2174b8c75de (Merge branch 'jk/unused-annotation' into next,
2022-08-24) breaks coccinelle's parsing of our sources in files where
it occurs.
Let's instead partially go with the approach suggested in [2] of
making this not take an argument. As noted in [1] "coccinelle" will
ignore such tokens in argument lists that it doesn't know about, and
it's less of a surprise to syntax highlighters.
This undoes the "help us notice when a parameter marked as unused is
actually use" part of 9b240347543 (git-compat-util: add UNUSED macro,
2022-08-19), a subsequent commit will further tweak the macro to
implement a replacement for that functionality.
1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Functions used with for_each_reflog_ent() need to conform to a
particular interface, but not every function needs all of the
parameters. Mark the unused ones to make -Wunused-parameter happy.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Functions used with for_each_ref(), etc, need to conform to the
each_ref_fn interface. But most of them don't need every parameter;
let's annotate the unused ones to quiet -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When running 'git show :<path>' where '<path>' is a directory, then
there is a subtle difference between a full checkout and a sparse
checkout. The error message from diagnose_invalid_index_path() reports
whether the path is on disk or not. The full checkout will have the
directory on disk, but the path will not be in the index. The sparse
checkout could have the directory not exist, specifically when that
directory is outside of the sparse-checkout cone.
In the case of a sparse index, we have yet another state: the path can
be a sparse directory in the index. In this case, the error message from
diagnose_invalid_index_path() would erroneously say "path '<path>' is in
the index, but not at stage 0", which is false.
Add special casing around sparse directory entries so we get to the
correct error message. This requires two checks in order to get parity
with the normal sparse-checkout case.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The get_oid_with_context_1() method is used when parsing revision
arguments. One particular case is to take a ":<path>" string and search
the index for the given path.
In the case of a sparse index, this might find a sparse directory entry,
in which case the contained object is a tree. In the case of a full
index, this search within the index would fail.
In order to maintain identical return state as in a full index, inspect
the discovered cache entry to see if it is a sparse directory and reject
it. This requires being careful around the only_to_die option to be sure
we die only at the correct time.
This changes the behavior of 'git show :<sparse-dir>', but does not
bring it entirely into alignment with a full index case. It specifically
hits the wrong error message within diagnose_invalid_index_path(). That
error message will be corrected in a future change.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Plug (some) memory leaks around parse_date_format().
* ab/date-mode-release:
date API: add and use a date_mode_release()
date API: add basic API docs
date API: provide and use a DATE_MODE_INIT
date API: create a date.h, split from cache.h
cache.h: remove always unused show_date_human() declaration
|
|
Error output given in response to an ambiguous object name has been
improved.
* ab/ambiguous-object-name:
object-name: re-use "struct strbuf" in show_ambiguous_object()
object-name: iterate ambiguous objects before showing header
object-name: show date for ambiguous tag objects
object-name: make ambiguous object output translatable
object-name: explicitly handle bad tags in show_ambiguous_object()
object-name: explicitly handle OBJ_BAD in show_ambiguous_object()
object-name tests: add tests for ambiguous object blind spots
|
|
Move the declaration of the date.c functions from cache.h, and adjust
the relevant users to include the new date.h header.
The show_ident_date() function belonged in pretty.h (it's defined in
pretty.c), its two users outside of pretty.c didn't strictly need to
include pretty.h, as they get it indirectly, but let's add it to them
anyway.
Similarly, the change to "builtin/{fast-import,show-branch,tag}.c"
isn't needed as far as the compiler is concerned, but since they all
use the "DATE_MODE()" macro we now define in date.h, let's have them
include it.
We could simply include this new header in "cache.h", but as this
change shows these functions weren't common enough to warrant
including in it in the first place. By moving them out of cache.h
changes to this API will no longer cause a (mostly) full re-build of
the project when "make" is run.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Reduce the allocations done by show_ambiguous_object() by moving the
"desc" strbuf into the "struct ambiguous_output" introduced in the
preceding commit.
This doesn't matter for optimization purposes, but since we're
accumulating a "struct strbuf advice" anyway let's follow that pattern
and add a "struct strbuf sb", we can then strbuf_reset() it rather
than calling strbuf_release() for each call to
show_ambiguous_object().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change the "The candidates are" header that's shown for ambiguous
objects to be shown after we've iterated over all of the objects.
If we get any errors while doing so we don't want to split up the the
header and the list as a result. The two will now be printed together,
as shown in the updated testcase.
As we're accumulating the lines into as "struct strbuf" before
emitting them we need to add a trailing newline to the call in
show_ambiguous_object(). This and the change from "The candidates
are:" to "The candidates are:\n%s" helps to give translators more
context.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Make the ambiguous tag object output nicer in the case of tag objects
such as ebf3c04b262 (Git 2.32, 2021-06-06) by including the date in
the "tagger" header. I.e.:
$ git rev-parse b7e68
error: short object ID b7e68 is ambiguous
hint: The candidates are:
hint: b7e68c41d92 tag 2021-06-06 - v2.32.0
hint: b7e68ae18e0 commit 2019-12-23 - bisect: use the standard 'if (!var)' way to check for 0
hint: b7e68f6b413 tree
hint: b7e68490b97 blob
b7e68
[...]
Before this we'd emit a "tag" line without a date, e.g.:
hint: b7e68c41d92 tag v2.32.0
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change the output of show_ambiguous_object() added in [1] and last
tweaked in [2] and the preceding commit to be more friendly to
translators.
By being able to customize the "<SP><SP>%s\n" format we're even ready
for RTL languages, who'd presumably like to change that to
"%s<SP><SP>\n".
In the case of the existing "tag [tag could not be parsed]" output
we'll now instead emit "[bad tag, could not parse it]". This is
consistent with the "[bad object]" output. Rephrasing the message like
this is possible because we're not unconditionally adding the
type_name() at the beginning.
1. 1ffa26c461 (get_short_sha1: list ambiguous objects on error,
2016-09-26)
2. 5cc044e0257 (get_short_oid: sort ambiguous objects by type,
then SHA-1, 2018-05-10)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Follow-up the handling of OBJ_BAD in the preceding commit and
explicitly handle those cases where parse_tag() fails, or we don't end
up with a non-NULL pointer in in tag->tag.
If we run into such a tag we'd previously be silent about it. We
really should also be handling these batter in parse_tag_buffer() by
being more eager to emit an error(), instead of silently aborting with
"return -1;".
One example of such a tag is the one that's tested for in
"t3800-mktag.sh", where the code takes the "size <
the_hash_algo->hexsz + 24" branch.
But in lieu of earlier missing "error" output let's show the user
something to indicate why we're not showing a tag message in these
cases, now instead of showing:
hint: deadbeef tag
We'll instead display:
hint: deadbeef tag [tag could not be parsed]
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Amend the "unknown type" handling in the code that displays the
ambiguous object list to assert() that we're either going to get the
"real" object types we can pass to type_name(), or a -1 (OBJ_BAD)
return value from oid_object_info().
See [1] for the current output, and [1] for the commit that added the
"unknown type" handling.
We are never going to get an "unknown type" in the sense of custom
types crafted with "hash-object --literally", since we're not using
the OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag.
If we manage to otherwise unpack such an object without errors we'll
die() in parse_loose_header_extended() called by sort_ambiguous()
before we get to show_ambiguous_object(), as is asserted by the test
added in the preceding commit.
So saying "unknown type" here was always misleading, we really meant
to say that we had a failure parsing the object at all, i.e. that we
had repository corruption. If the problem is only that it's type is
unknown we won't reach this code.
So let's emit a generic "[bad object]" instead. As our tests added in
the preceding commit show, we'll have emitted various "error" output
already in those cases.
We should do better in the truly "unknown type" cases, which we'd need
to handle if we were passing down the OBJECT_INFO_ALLOW_UNKNOWN_TYPE
flag. But let's leave that for some future improvement. In a
subsequent commit I'll improve the output we do show, and not having
to handle the "unknown type" (as in OBJECT_INFO_ALLOW_UNKNOWN_TYPE)
simplifies that change.
1. 5cc044e0257 (get_short_oid: sort ambiguous objects by type,
then SHA-1, 2018-05-10)
2. 1ffa26c461 (get_short_sha1: list ambiguous objects on error,
2016-09-26)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change the cat_one_file() logic that calls get_oid_with_context()
under --textconv and --filters to use the GET_OID_ONLY_TO_DIE flag,
thus improving the error messaging emitted when e.g. <path> is missing
but <rev> is not.
To service the "cat-file" use-case we need to introduce a new
"GET_OID_REQUIRE_PATH" flag, otherwise it would exit early as soon as
a valid "HEAD" was resolved, but in the "cat-file" case being changed
we always need a valid revision and path.
This arguably makes the "<bad rev>:<bad path>" and "<bad
rev>:<good (in HEAD) path>" use cases worse, as we won't quote the
<path> component at the user anymore, but let's just use the existing
logic "git log" et al use for now. We can improve the messaging for
those cases as a follow-up for all callers.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Stop having GET_OID_ONLY_TO_DIE imply GET_OID_QUIETLY in
get_oid_with_context_1().
The *_DIE flag was added in 33bd598c390 (sha1_name.c: teach lookup
context to get_sha1_with_context(), 2012-07-02), and then later
tweaked in 7243ffdd78d (get_sha1: avoid repeating ourselves via
ONLY_TO_DIE, 2016-09-26).
Everything in that commit makes sense, but only for callers that
expect to fail in an initial call to get_oid_with_context_1(), e.g. as
"git show 0017" does via handle_revision_arg(), and then would like to
call get_oid_with_context_1() again via this
maybe_die_on_misspelt_object_name() function.
In the subsequent commit we'll add a new caller that expects to call
this only once, but who would still like to have all the error
messaging that GET_OID_ONLY_TO_DIE gives it, in addition to any
regular errors.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In c4a09cc9ccb (Merge branch 'hw/advise-ng', 2020-03-25), a new API for
accessing advice variables was introduced and deprecated `advice_config`
in favor of a new array, `advice_setting`.
This patch ports all but two uses which read the status of the global
`advice_` variables over to the new `advice_enabled` API. We'll deal
with advice_add_embedded_repo and advice_graft_file_deprecated
separately.
Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This saves 8K per `struct object_directory', meaning it saves
around 800MB in my case involving 100K alternates (half or more
of those alternates are unlikely to hold loose objects).
This is implemented in two parts: a generic, allocation-free
`cbtree' and the `oidtree' wrapper on top of it. The latter
provides allocation using alloc_state as a memory pool to
improve locality and reduce free(3) overhead.
Unlike oid-array, the crit-bit tree does not require sorting.
Performance is bound by the key length, for oidtree that is
fixed at sizeof(struct object_id). There's no need to have
256 oidtrees to mitigate the O(n log n) overhead like we did
with oid-array.
Being a prefix trie, it is natively suited for expanding short
object IDs via prefix-limited iteration in
`find_short_object_filename'.
On my busy workstation, p4205 performance seems to be roughly
unchanged (+/-8%). Startup with 100K total alternates with no
loose objects seems around 10-20% faster on a hot cache.
(800MB in memory savings means more memory for the kernel FS
cache).
The generic cbtree implementation does impose some extra
overhead for oidtree in that it uses memcmp(3) on
"struct object_id" so it wastes cycles comparing 12 extra bytes
on SHA-1 repositories. I've not yet explored reducing this
overhead, but I expect there are many places in our code base
where we'd want to investigate this.
More information on crit-bit trees: https://cr.yp.to/critbit.html
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Generalize the last remnants of "sha" and "sha1" in this file and rename
it to reflect that we're not just able to handle SHA-1 these days.
We need to update one test to check for an updated error string.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|