| Age | Commit message (Collapse) | Author | Files | Lines |
|
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.
Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Stop using `the_repository` in the "progress" subsystem by passing in a
repository when initializing `struct progress`. Furthermore, store a
pointer to the repository in that struct so that we can pass it to the
trace2 API when logging information.
Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Same as the preceding commit, we unconditionally dereference the index's
cache entries depending on the number of cache-tree entries, which can
lead to a segfault when the cache-tree is corrupted. Fix this bug.
This also makes t4058 pass with the leak sanitizer enabled.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The function `cache_tree_verify()` will `BUG()` whenever it finds that
the cache-tree extension of the index is corrupt. The function is thus
inherently untestable because the resulting call to `abort()` will be
detected by our testing framework and labelled an error. And rightfully
so: it shouldn't ever be possible to hit bugs, as they should indicate a
programming error rather than corruption of on-disk state.
Refactor the function to instead return error codes. This also ensures
that the function can be used e.g. by git-fsck(1) without the whole
process dying. Furthermore, this refactoring plugs some memory leaks
when returning early by creating a common exit path.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When provided a pointer to a destination index, then `unpack_trees()`
will end up copying its `o->internal.result` index into the provided
pointer. In those cases it is thus not necessary to free the index, as
we have transferred ownership of it.
There are cases though where we do not end up transferring ownership of
the memory, but `clear_unpack_trees_porcelain()` will never discard the
index in that case and thus cause a memory leak. And right now it cannot
do so in the first place because we have no indicator of whether we did
or didn't transfer ownership of the index.
Adapt the code to zero out the index in case we transfer its ownership.
Like this, we can now unconditionally discard the index when being asked
to clear the `unpack_trees_options`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.
It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.
Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit
For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).
Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In `resolve_gitlink_ref()` we implicitly rely on `the_repository` to
look up the submodule ref store. Now that we can look up submodule ref
stores for arbitrary repositories we can improve this function to
instead accept a repository as parameter for which we want to resolve
the gitlink.
Do so and adjust callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Shuffle some bits across headers and sources to prepare for
libification effort.
* cw/prelim-cleanup:
parse: separate out parsing functions from config.h
config: correct bad boolean env value error message
wrapper: reduce scope of remove_or_warn()
hex-ll: separate out non-hash-algo functions
|
|
The files config.{h,c} contain functions that have to do with parsing,
but not config.
In order to further reduce all-in-one headers, separate out functions in
config.c that do not operate on config into its own file, parse.h,
and update the include directives in the .c files that need only such
functions accordingly.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The traverse_trees() and traverse_trees_recursive() functions call each
other recursively. In a deep tree, this can result in running out of
stack space and crashing.
There's obviously going to be some limit here based on available stack,
but the problem is exacerbated by a few large structs, many of which we
over-allocate. For example, in traverse_trees() we store a name_entry
and tree_desc_x per tree, both of which contain an object_id (which is
now 32 bytes). And we allocate 8 of them (from MAX_TRAVERSE_TREES), even
though many traversals will only look at 1 or 2.
Interestingly, we used to allocate these on the heap, prior to
8dd40c0472 (traverse_trees(): use stack array for name entries,
2020-01-30). That commit was trying to simplify away allocation size
computations, and naively assumed that the sizes were small enough not
to matter. And they don't in normal cases, but on my stock Debian system
I see a crash running "git archive" on a tree with ~3600 entries.
That's deep enough we wouldn't see it in practice, but probably shallow
enough that we'd prefer not to make it a hard limit. Especially because
other systems may have even smaller stacks.
We can replace these stack variables with a few malloc invocations. This
reduces the stack sizes for the two functions from 1128 and 752 bytes,
respectively, down to 40 and 92 bytes. That allows a depth of ~13000 on
my machine (the improvement isn't in linear proportion because my
numbers don't count the size of parameters and other function overhead).
The possible downsides are:
1. We now have to remember to free(). But both functions have an easy
single exit (and already had to clean up other bits anyway).
2. The extra malloc()/free() overhead might be measurable. I tested
this by setting up a 3000-depth tree with a single blob and running
"git archive" on it. After switching to the heap, it consistently
runs 2-3% faster! Presumably this is because the 1K+ of wasted
stack space penalized memory caches.
On a more real-world case like linux.git, the speed difference isn't
measurable at all, simply because most trees aren't that deep and
there's so much other work going on (like accessing the objects
themselves). So the improvement I saw should be taken as evidence that
we're not making anything worse, but isn't really that interesting on
its own. The main motivation here is that we're now less likely to run
out of stack space and crash.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The vast majority of files including object-store.h did not need dir.h
nor khash.h. Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.
After this patch:
$ git grep -h include..object-store | sort | uniq -c
2 #include "object-store.h"
129 #include "object-store-ll.h"
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since this header showed up in some places besides just #include
statements, update/clean-up/remove those other places as well.
Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got
away with violating the rule that all files must start with an include
of git-compat-util.h (or a short-list of alternate headers that happen
to include it first). This change exposed the violation and caused it
to stop building correctly; fix it by having it include
git-compat-util.h first, as per policy.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
For the functions defined in read-cache.c, move their declarations from
cache.h to a new header, read-cache-ll.h. Also move some related inline
functions from cache.h to read-cache.h. The purpose of the
read-cache-ll.h/read-cache.h split is that about 70% of the sites don't
need the inline functions and the extra headers they include.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
More header clean-up.
* en/header-split-cache-h-part-2: (22 commits)
reftable: ensure git-compat-util.h is the first (indirect) include
diff.h: reduce unnecessary includes
object-store.h: reduce unnecessary includes
commit.h: reduce unnecessary includes
fsmonitor: reduce includes of cache.h
cache.h: remove unnecessary headers
treewide: remove cache.h inclusion due to previous changes
cache,tree: move basic name compare functions from read-cache to tree
cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c
hash-ll.h: split out of hash.h to remove dependency on repository.h
tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h
dir.h: move DTYPE defines from cache.h
versioncmp.h: move declarations for versioncmp.c functions from cache.h
ws.h: move declarations for ws.c functions from cache.h
match-trees.h: move declarations for match-trees.c functions from cache.h
pkt-line.h: move declarations for pkt-line.c functions from cache.h
base85.h: move declarations for base85.c functions from cache.h
copy.h: move declarations for copy.c functions from cache.h
server-info.h: move declarations for server-info.c functions from cache.h
packfile.h: move pack_window and pack_entry from cache.h
...
|
|
Header clean-up.
* en/header-split-cache-h: (24 commits)
protocol.h: move definition of DEFAULT_GIT_PORT from cache.h
mailmap, quote: move declarations of global vars to correct unit
treewide: reduce includes of cache.h in other headers
treewide: remove double forward declaration of read_in_full
cache.h: remove unnecessary includes
treewide: remove cache.h inclusion due to pager.h changes
pager.h: move declarations for pager.c functions from cache.h
treewide: remove cache.h inclusion due to editor.h changes
editor: move editor-related functions and declarations into common file
treewide: remove cache.h inclusion due to object.h changes
object.h: move some inline functions and defines from cache.h
treewide: remove cache.h inclusion due to object-file.h changes
object-file.h: move declarations for object-file.c functions from cache.h
treewide: remove cache.h inclusion due to git-zlib changes
git-zlib: move declarations for git-zlib functions from cache.h
treewide: remove cache.h inclusion due to object-name.h changes
object-name.h: move declarations for object-name.c functions from cache.h
treewide: remove unnecessary cache.h inclusion
treewide: be explicit about dependence on mem-pool.h
treewide: be explicit about dependence on oid-array.h
...
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Dozens of files made use of advice functions, without explicitly
including advice.h. This made it more difficult to find which files
could remove a dependence on cache.h. Make C files explicitly include
advice.h if they are using it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Dozens of files made use of trace and trace2 functions, without
explicitly including trace.h or trace2.h. This made it more difficult
to find which files could remove a dependence on cache.h. Make C files
explicitly include trace.h or trace2.h if they are using them.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Split key function and data structure definitions out of cache.h to
new header files and adjust the users.
* en/header-split-cleanup:
csum-file.h: remove unnecessary inclusion of cache.h
write-or-die.h: move declarations for write-or-die.c functions from cache.h
treewide: remove cache.h inclusion due to setup.h changes
setup.h: move declarations for setup.c functions from cache.h
treewide: remove cache.h inclusion due to environment.h changes
environment.h: move declarations for environment.c functions from cache.h
treewide: remove unnecessary includes of cache.h
wrapper.h: move declarations for wrapper.c functions from cache.h
path.h: move function declarations for path.c functions from cache.h
cache.h: remove expand_user_path()
abspath.h: move absolute path functions from cache.h
environment: move comment_line_char from cache.h
treewide: remove unnecessary cache.h inclusion from several sources
treewide: remove unnecessary inclusion of gettext.h
treewide: be explicit about dependence on gettext.h
treewide: remove unnecessary cache.h inclusion from a few headers
|
|
Code clean-up around the use of the_repository.
* ab/remove-implicit-use-of-the-repository:
libs: use "struct repository *" argument, not "the_repository"
post-cocci: adjust comments for recent repo_* migration
cocci: apply the "revision.h" part of "the_repository.pending"
cocci: apply the "rerere.h" part of "the_repository.pending"
cocci: apply the "refs.h" part of "the_repository.pending"
cocci: apply the "promisor-remote.h" part of "the_repository.pending"
cocci: apply the "packfile.h" part of "the_repository.pending"
cocci: apply the "pretty.h" part of "the_repository.pending"
cocci: apply the "object-store.h" part of "the_repository.pending"
cocci: apply the "diff.h" part of "the_repository.pending"
cocci: apply the "commit.h" part of "the_repository.pending"
cocci: apply the "commit-reach.h" part of "the_repository.pending"
cocci: apply the "cache.h" part of "the_repository.pending"
cocci: add missing "the_repository" macros to "pending"
cocci: sort "the_repository" rules by header
cocci: fix incorrect & verbose "the_repository" rules
cocci: remove dead rule from "the_repository.pending.cocci"
|
|
The index files can become corrupt under certain conditions when
the split-index feature is in use, especially together with
fsmonitor, which have been corrected.
* js/split-index-fixes:
unpack-trees: take care to propagate the split-index flag
fsmonitor: avoid overriding `cache_changed` bits
split-index; stop abusing the `base_oid` to strip the "link" extension
split-index & fsmonitor: demonstrate a bug
|
|
en/header-split-cache-h
* ab/remove-implicit-use-of-the-repository:
libs: use "struct repository *" argument, not "the_repository"
post-cocci: adjust comments for recent repo_* migration
cocci: apply the "revision.h" part of "the_repository.pending"
cocci: apply the "rerere.h" part of "the_repository.pending"
cocci: apply the "refs.h" part of "the_repository.pending"
cocci: apply the "promisor-remote.h" part of "the_repository.pending"
cocci: apply the "packfile.h" part of "the_repository.pending"
cocci: apply the "pretty.h" part of "the_repository.pending"
cocci: apply the "object-store.h" part of "the_repository.pending"
cocci: apply the "diff.h" part of "the_repository.pending"
cocci: apply the "commit.h" part of "the_repository.pending"
cocci: apply the "commit-reach.h" part of "the_repository.pending"
cocci: apply the "cache.h" part of "the_repository.pending"
cocci: add missing "the_repository" macros to "pending"
cocci: sort "the_repository" rules by header
cocci: fix incorrect & verbose "the_repository" rules
cocci: remove dead rule from "the_repository.pending.cocci"
|
|
Apply the part of "the_repository.pending.cocci" pertaining to
"promisor-remote.h".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When copying the `split_index` structure from one index structure to
another, we need to propagate the `SPLIT_INDEX_ORDERED` flag, too, if it
is set, otherwise Git might forget to write the shared index when that
is actually needed.
It just so _happens_ that in many instances when `unpack_trees()` is
called, the result causes the shared index to be written anyway, but
there are edge cases when that is not so.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Dozens of files made use of gettext functions, without explicitly
including gettext.h. This made it more difficult to find which files
could remove a dependence on cache.h. Make C files explicitly include
gettext.h if they are using it.
However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up to clarify the rule that "git-compat-util.h" must be
the first to be included.
* en/header-cleanup:
diff.h: remove unnecessary include of object.h
Remove unnecessary includes of builtin.h
treewide: replace cache.h with more direct headers, where possible
replace-object.h: move read_replace_refs declaration from cache.h to here
object-store.h: move struct object_info from cache.h
dir.h: refactor to no longer need to include cache.h
object.h: stop depending on cache.h; make cache.h depend on object.h
ident.h: move ident-related declarations out of cache.h
pretty.h: move has_non_ascii() declaration from commit.h
cache.h: remove dependence on hex.h; make other files include it explicitly
hex.h: move some hex-related declarations from cache.h
hash.h: move some oid-related declarations from cache.h
alloc.h: move ALLOC_GROW() functions from cache.h
treewide: remove unnecessary cache.h includes in source files
treewide: remove unnecessary cache.h includes
treewide: remove unnecessary git-compat-util.h includes in headers
treewide: ensure one of the appropriate headers is sourced first
|
|
Avoid making users believe they need to initialize df_conflict_entry
to something (as happened with other output only fields before) with
a quick comment and a small sanity check.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
builtin/read-tree.c has some special functionality explicitly designed
for debugging unpack-trees.[ch]. Associated with that is two fields
that no other external caller would or should use. Mark these as
internal to unpack-trees, but allow builtin/read-tree to read or write
them for this special case.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The previous patch made many lines a little longer, resulting in four
becoming a bit too long. They were left as-is for the previous patch
to facilitate reviewers verifying that we were just adding "internal."
in a bunch of places, but rewrap them now.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Continue the work from the previous patch by finding additional fields
which are only used internally but not yet explicitly marked as such,
and include them in the internal fields struct.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This just splits the two fields already marked as internal-only into a
separate internal struct. Future commits will add more fields that
were meant to be internal-only but were not explicitly marked as such
to the same struct.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Commit 2f6b1eb794 ("cache API: add a "INDEX_STATE_INIT" macro/function,
add release_index()", 2023-01-12) mistakenly added some initialization
of a member of unpack_trees_options that was intended to be
internal-only. This initialization should be done within
update_sparsity() instead.
Note that while o->result is mostly meant for unpack_trees() and
update_sparsity() mostly operates without o->result,
check_ok_to_remove() does consult it so we need to ensure it is properly
initialized.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
struct unpack_trees_options has the following field and comment:
struct pattern_list *pl; /* for internal use */
Despite the internal-use comment, commit e091228e17 ("sparse-checkout:
update working directory in-process", 2019-11-21) starting setting this
field from an external caller. At the time, the only way around that
would have been to modify unpack_trees() to take an extra pattern_list
argument, and there's a lot of callers of that function. However, when
we split update_sparsity() off as a separate function, with
sparse-checkout being the sole caller, the need to update other callers
went away. Fix this API problem by adding a pattern_list argument to
update_sparsity() and stop setting the internal o.pl field directly.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The update_sparsity() function was introduced in commit 7af7a25853
("unpack-trees: add a new update_sparsity() function", 2020-03-27).
Prior to that, unpack_trees() was used, but that had a few bugs because
the needs of the caller were different, and different enough that
unpack_trees() could not easily be modified to handle both usecases.
The implementation detail that update_sparsity() was written by copying
unpack_trees() and then streamlining it, and then modifying it in the
needed ways still shows through in that there are leftover vestiges in
both functions that are no longer needed. Clean them up. In
particular:
* update_sparsity() allows a pattern list to be passed in, but
unpack_trees() never should use a different pattern list. Add a
check and a BUG() if this gets violated.
* update_sparsity() has a check early on that will BUG() if
o->skip_sparse_checkout is set; as such, there's no need to check
for that condition again later in the code. We can simply remove
the check and its corresponding goto label.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When a directory exists but has only ignored files within it and we are
trying to switch to a branch that has a file where that directory is,
the behavior depends upon --[no]-overwrite-ignore. If the user wants to
--overwrite-ignore (the default), then we should delete the ignored file
and directory and switch to the new branch.
The code to handle this in verify_clean_subdirectory() in unpack-trees
tried to handle this via paying attention to the exclude_per_dir setting
of the internal dir field. This came from commit c81935348b ("Fix
switching to a branch with D/F when current branch has file D.",
2007-03-15), which pre-dated 039bc64e88 ("core.excludesfile clean-up",
2007-11-14), and thus did not pay attention to ignore patterns from
other relevant files. Change it to use setup_standard_excludes() so
that it is also aware of excludes specified in other locations.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When the "repo" member was added to "the_index" in [1] the
repo_read_index() was made to populate it, but the unpopulated
"the_index" variable didn't get the same treatment.
Let's do that in initialize_the_repository() when we set it up, and
likewise for all of the current callers initialized an empty "struct
index_state".
This simplifies code that needs to deal with "the_index" or a custom
"struct index_state", we no longer need to second-guess this part of
the "index_state" deep in the stack. A recent example of such
second-guessing is the "istate->repo ? istate->repo : the_repository"
code in [2]. We can now simply use "istate->repo".
We're doing this by making use of the INDEX_STATE_INIT() macro (and
corresponding function) added in [3], which now have mandatory "repo"
arguments.
Because we now call index_state_init() in repository.c's
initialize_the_repository() we don't need to handle the case where we
have a "repo->index" whose "repo" member doesn't match the "repo"
we're setting up, i.e. the "Complete the double-reference" code in
repo_read_index() being altered here. That logic was originally added
in [1], and was working around the lack of what we now have in
initialize_the_repository().
For "fsmonitor-settings.c" we can remove the initialization of a NULL
"r" argument to "the_repository". This was added back in [4], and was
needed at the time for callers that would pass us the "r" from an
"istate->repo". Before this change such a change to
"fsmonitor-settings.c" would segfault all over the test suite (e.g. in
t0002-gitfile.sh).
This change has wider eventual implications for
"fsmonitor-settings.c". The reason the other lazy loading behavior in
it is required (starting with "if (!r->settings.fsmonitor) ..." is
because of the previously passed "r" being "NULL".
I have other local changes on top of this which move its configuration
reading to "prepare_repo_settings()" in "repo-settings.c", as we could
now start to rely on it being called for our "r". But let's leave all
of that for now, and narrowly remove this particular part of the
lazy-loading.
1. 1fd9ae517c4 (repository: add repo reference to index_state,
2021-01-23)
2. ee1f0c242ef (read-cache: add index.skipHash config option,
2023-01-06)
3. 2f6b1eb794e (cache API: add a "INDEX_STATE_INIT" macro/function,
add release_index(), 2023-01-12)
4. 1e0ea5c4316 (fsmonitor: config settings are repository-specific,
2022-03-25)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Hopefully in some not so distant future, we'll get advantages from always
initializing the "repo" member of the "struct index_state". To make
that easier let's introduce an initialization macro & function.
The various ad-hoc initialization of the structure can then be changed
over to it, and we can remove the various "0" assignments in
discard_index() in favor of calling index_state_init() at the end.
While not strictly necessary, let's also change the CALLOC_ARRAY() of
various "struct index_state *" to use an ALLOC_ARRAY() followed by
index_state_init() instead.
We're then adding the release_index() function and converting some
callers (including some of these allocations) over to it if they
either won't need to use their "struct index_state" again, or are just
about to call index_state_init().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Make the ensure_full_index() function stricter, and have it only
accept a non-NULL "struct index_state". This function (and this
behavior) was added in [1].
The only reason it needed to be this lax was due to interaction with
repo_index_has_changes(). See the addition of that code in [2].
The other reason for why this was needed dates back to interaction
with code added in [3]. In [4] we started calling ensure_full_index()
in unpack_trees(), but the caller added in 34110cd4e39 wants to pass
us a NULL "dst_index". Let's instead do the NULL check in
unpack_trees() itself.
1. 4300f8442a2 (sparse-index: implement ensure_full_index(), 2021-03-30)
2. 0c18c059a15 (read-cache: ensure full index, 2021-04-01)
3. 34110cd4e39 (Make 'unpack_trees()' have a separate source and
destination index, 2008-03-06)
4. 6863df35503 (unpack-trees: ensure full index, 2021-03-30)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Stop using "git --super-prefix" and narrow the scope of its use to
the submodule--helper.
* ab/no-more-git-global-super-prefix:
read-tree: add "--super-prefix" option, eliminate global
submodule--helper: convert "{update,clone}" to their own "--super-prefix"
submodule--helper: convert "status" to its own "--super-prefix"
submodule--helper: convert "sync" to its own "--super-prefix"
submodule--helper: convert "foreach" to its own "--super-prefix"
submodule--helper: don't use global --super-prefix in "absorbgitdirs"
submodule.c & submodule--helper: pass along "super_prefix" param
read-tree + fetch tests: test failing "--super-prefix" interaction
submodule absorbgitdirs tests: add missing "Migrating git..." tests
|
|
The "--super-prefix" option to "git" was initially added in [1] for
use with "ls-files"[2], and shortly thereafter "submodule--helper"[3]
and "grep"[4]. It wasn't until [5] that "read-tree" made use of it.
At the time [5] made sense, but since then we've made "ls-files"
recurse in-process in [6], "grep" in [7], and finally
"submodule--helper" in the preceding commits.
Let's also remove it from "read-tree", which allows us to remove the
option to "git" itself.
We can do this because the only remaining user of it is the submodule
API, which will now invoke "read-tree" with its new "--super-prefix"
option. It will only do so when the "submodule_move_head()" function
is called.
That "submodule_move_head()" function was then only invoked by
"read-tree" itself, but now rather than setting an environment
variable to pass "--super-prefix" between cmd_read_tree() we:
- Set a new "super_prefix" in "struct unpack_trees_options". The
"super_prefixed()" function in "unpack-trees.c" added in [5] will now
use this, rather than get_super_prefix() looking up the environment
variable we set earlier in the same process.
- Add the same field to the "struct checkout", which is only needed to
ferry the "super_prefix" in the "struct unpack_trees_options" all the
way down to the "entry.c" callers of "submodule_move_head()".
Those calls which used the super prefix all originated in
"cmd_read_tree()". The only other caller is the "unlink_entry()"
caller in "builtin/checkout.c", which now passes a "NULL".
1. 74866d75793 (git: make super-prefix option, 2016-10-07)
2. e77aa336f11 (ls-files: optionally recurse into submodules, 2016-10-07)
3. 89c86265576 (submodule helper: support super prefix, 2016-12-08)
4. 0281e487fd9 (grep: optionally recurse into submodules, 2016-12-16)
5. 3d415425c7b (unpack-trees: support super-prefix option, 2017-01-17)
6. 188dce131fa (ls-files: use repository object, 2017-06-22)
7. f9ee2fcdfa0 (grep: recurse in-process using 'struct repository', 2017-08-02)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
These files are already included; we do not need to include them again
Signed-off-by: Seija Kijin <doremylover123@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add (disabled by default) option to skip the 'cache_tree_update()' at the
end of 'unpack_trees()'. In many cases, this cache tree update is redundant
because the caller of 'unpack_trees()' immediately follows it with
'prime_cache_tree()', rebuilding the entire cache tree from scratch. While
these operations aren't the most expensive part of operations like 'git
reset', the duplicate calls still create a minor unnecessary slowdown.
Introduce an option for callers to skip the 'cache_tree_update()' in
'unpack_trees()' if it is redundant (that is, if 'prime_cache_tree()' is
called afterwards). At the moment, no 'unpack_trees()' callers use the new
option; they will be updated in subsequent patches.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
|
|
Segfault fix-up to an earlier fix to the topic to teach "git reset"
and "git checkout" work better in a sparse checkout.
* vd/sparse-reset-checkout-fixes:
unpack-trees: fix sparse directory recursion check
|
|
Ensure 'is_sparse_directory_entry()' receives a valid 'name_entry *' if one
exists in the list of tree(s) being unpacked in 'unpack_callback()'.
Currently, 'is_sparse_directory_entry()' is called with the first
'name_entry' in the 'names' list of entries on 'unpack_callback()'. However,
this entry may be empty even when other elements of 'names' are not (such as
when switching from an orphan branch back to a "normal" branch). As a
result, 'is_sparse_directory_entry()' could incorrectly indicate that a
sparse directory is *not* actually sparse because the name of the index
entry does not match the (empty) 'name_entry' path.
Fix the issue by using the existing 'name_entry *p' value in
'unpack_callback()', which points to the first non-empty entry in 'names'.
Because 'p' is 'const', also update 'is_sparse_directory_entry()'s
'name_entry *' argument to be 'const'.
Finally, add a regression test case.
Reported-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Fixes to sparse index compatibility work for "reset" and "checkout"
commands.
source: <pull.1312.v3.git.1659985672.gitgitgadget@gmail.com>
* vd/sparse-reset-checkout-fixes:
unpack-trees: unpack new trees as sparse directories
cache.h: create 'index_name_pos_sparse()'
oneway_diff: handle removed sparse directories
checkout: fix nested sparse directory diff in sparse index
|
|
Fixes to sparse index compatibility work for "reset" and "checkout"
commands.
* vd/sparse-reset-checkout-fixes:
unpack-trees: unpack new trees as sparse directories
cache.h: create 'index_name_pos_sparse()'
oneway_diff: handle removed sparse directories
checkout: fix nested sparse directory diff in sparse index
|
|
If 'unpack_single_entry()' is unpacking a new directory tree (that is, one
not already present in the index) into a sparse index, unpack the tree as a
sparse directory rather than traversing its contents and unpacking each file
individually. This helps keep the sparse index as collapsed as possible in
cases such as 'git reset --hard' restoring a outside-of-cone directory
removed with 'git rm -r --sparse'.
Without this patch, 'unpack_single_entry()' will only unpack a directory
into the index as a sparse directory (rather than traversing into it and
unpacking its files one-by-one) if an entry with the same name already
exists in the index. This patch allows sparse directory unpacking without a
matching index entry when the following conditions are met:
1. the directory's path is outside the sparse cone, and
2. there are no children of the directory in the index
If a directory meets these requirements (as determined by
'is_new_sparse_dir()'), 'unpack_single_entry()' unpacks the sparse directory
index entry and propagates the decision back up to 'unpack_callback()' to
prevent unnecessary tree traversal into the unpacked directory.
Reported-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
At the end of `git checkout <pathspec>`, we get a message informing how
many entries were updated in the working tree. However, this number can
be inaccurate for two reasons:
1) Delayed entries currently get counted twice.
2) Failed entries are included in the count.
The first problem happens because the counter is first incremented
before inserting the entry in the delayed checkout queue, and once again
when finish_delayed_checkout() calls checkout_entry(). And the second
happens because the counter is incremented too early in
checkout_entry(), before the entry was in fact checked out. Fix that by
moving the count increment further down in the call stack and removing
the duplicate increment on delayed entries. Note that we have to keep
a per-entry reference for the counter (both on parallel checkout and
delayed checkout) because not all entries are always accumulated at the
same counter. See checkout_worktree(), at builtin/checkout.c for an
example.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
More fsmonitor--daemon.
* jh/builtin-fsmonitor-part3: (30 commits)
t7527: improve implicit shutdown testing in fsmonitor--daemon
fsmonitor--daemon: allow --super-prefix argument
t7527: test Unicode NFC/NFD handling on MacOS
t/lib-unicode-nfc-nfd: helper prereqs for testing unicode nfc/nfd
t/helper/hexdump: add helper to print hexdump of stdin
fsmonitor: on macOS also emit NFC spelling for NFD pathname
t7527: test FSMonitor on case insensitive+preserving file system
fsmonitor: never set CE_FSMONITOR_VALID on submodules
t/perf/p7527: add perf test for builtin FSMonitor
t7527: FSMonitor tests for directory moves
fsmonitor: optimize processing of directory events
fsm-listen-darwin: shutdown daemon if worktree root is moved/renamed
fsm-health-win32: force shutdown daemon if worktree root moves
fsm-health-win32: add polling framework to monitor daemon health
fsmonitor--daemon: stub in health thread
fsmonitor--daemon: rename listener thread related variables
fsmonitor--daemon: prepare for adding health thread
fsmonitor--daemon: cd out of worktree root
fsm-listen-darwin: ignore FSEvents caused by xattr changes on macOS
unpack-trees: initialize fsmonitor_has_run_once in o->result
...
|
|
"sparse-checkout" learns to work well with the sparse-index
feature.
* ds/sparse-sparse-checkout:
sparse-checkout: integrate with sparse index
p2000: add test for 'git sparse-checkout [add|set]'
sparse-index: complete partial expansion
sparse-index: partially expand directories
sparse-checkout: --no-sparse-index needs a full index
cache-tree: implement cache_tree_find_path()
sparse-index: introduce partially-sparse indexes
sparse-index: create expand_index()
t1092: stress test 'git sparse-checkout set'
t1092: refactor 'sparse-index contents' test
|
|
Initialize `o->result.fsmonitor_has_run_once` based upon value
in `o->src_index->fsmonitor_has_run_once` to prevent a second
fsmonitor query during the tree traversal and possibly getting
a skewed view of the working directory.
The checkout code has already talked to the fsmonitor and the
traversal is updating the index as it traverses, so there is
no need to query the fsmonitor.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When modifying the sparse-checkout definition, the sparse-checkout
builtin calls update_sparsity() to modify the SKIP_WORKTREE bits of all
cache entries in the index. Before, we needed the index to be fully
expanded in order to ensure we had the full list of files necessary that
match the new patterns.
Insert a call to reset_sparse_directories() that expands sparse
directories that are within the new pattern list, but only far enough
that every necessary file path now exists as a cache entry. The
remaining logic within update_sparsity() will modify the SKIP_WORKTREE
bits appropriately.
This allows us to disable command_requires_full_index within the
sparse-checkout builtin. Add tests that demonstrate that we are not
expanding to a full index unnecessarily.
We can see the improved performance in the p2000 test script:
Test HEAD~1 HEAD
------------------------------------------------------------------------
2000.24: git ... (sparse-v3) 2.14(1.55+0.58) 1.57(1.03+0.53) -26.6%
2000.25: git ... (sparse-v4) 2.20(1.62+0.57) 1.58(0.98+0.59) -28.2%
These reductions of 26-28% are small compared to most examples, but the
time is dominated by writing a new copy of the base repository to the
worktree and then deleting it again. The fact that the previous index
expansion was such a large portion of the time is telling how important
it is to complete this sparse index integration.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When unpacking trees, set the default sparsity of the resultant index based
on repo settings and 'is_sparse_index_allowed()'.
Normally, when executing 'unpack_trees', the output index is marked sparse
when (and only when) it unpacks a sparse directory. However, an index may be
"sparse" even if it contains no sparse directories - when all files fall
inside the sparse-checkout definition or otherwise have SKIP_WORKTREE
disabled. Therefore, the output index may be marked "full" even when it is
"sparse", resulting in unnecessary 'ensure_full_index' calls when writing to
disk. Avoid this by setting the "default" index sparsity to match what is
expected for the repository.
As a consequence of this fix, the (non-merge) 'read-tree' performed when
applying a stash with untracked files no longer expands the index. Update
the corresponding test in 't1092'.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This reverts commit f2a454e0a5 (unpack-trees: improve performance of
next_cache_entry, 2021-11-29).
The "hint" value was originally needed to improve performance in 'git reset
-- <pathspec>' caused by 'cache_bottom' lagging behind its correct value
when using a sparse index. The 'cache_bottom' tracking has since been
corrected, removing the need for an additional "pseudo-cache_bottom"
tracking variable.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Correct tracking of the 'cache_bottom' for cases where sparse directories
are present in the index.
BACKGROUND
----------
The 'unpack_trees_options.cache_bottom' is a variable that tracks the
in-progress "bottom" of the cache as 'unpack_trees()' iterates through the
contents of the index. Most importantly, this value informs the sequential
return values of 'next_cache_entry()' which, in the "diff cache" usage of
'unpack_callback()', are either unpacked as-is or are passed into the diff
machinery.
The 'cache_bottom' is intended to track the position of the first entry in
the index that has not yet been diffed or unpacked. It is advanced in two
main ways: either it is incremented when an index entry is marked as "used"
(in 'mark_ce_used()'), indicating that it was unpacked or diffed, or when a
directory is unpacked, in which case it is increased by an amount equaling
the number of index entries inside that tree.
In 17a1bb570b (unpack-trees: preserve cache_bottom, 2021-07-14), it was
identified that sparse directories posed a problem to the above
'cache_bottom' advancement logic - because a sparse directory was both an
index entry that could be "used" and a directory that can be unpacked, the
'cache_bottom' would be incremented too many times. To solve this problem,
the 'mark_ce_used()' advancement of 'cache_bottom' was skipped for sparse
directories.
INCORRECT CACHE_BOTTOM TRACKING
-------------------------------
Skipping the 'cache_bottom' advancement for sparse directories in
'mark_ce_used()' breaks down in two cases:
1. When the 'unpack_trees()' operation is *not* a "cache diff" (because the
directory contents-based incrementing of 'cache_bottom' does not happen).
2. When a cache diff is performed with a pathspec (because
'unpack_index_entry()' will unpack a sparse directory not matched by the
pathspec without performing the directory contents-based increment).
The former luckily does not appear to affect 'git' behavior, likely because
'cache_bottom' is largely unused (non-"cache diff" 'unpack_trees()' uses
'find_index_entry()' - rather than 'next_cache_entry()' - to find the index
entries to unpack).
The latter, however, causes 'cache_bottom' to "lag behind" its intended
position by an amount equal to the number of sparse directories unpacked so
far with 'unpack_index_entry()'. If a repository is structured such that any
sparse directories are ordered lexicographically *after* any
pathspec-matching directories, though, this issue won't present any adverse
behavior.
This was the case with the 't1092-sparse-checkout-compatibility.sh' tests
before the addition of the 'before/' sparse directory (ordered *before* the
in-cone 'deep/' directory), therefore sidestepping the issue. Once the
'before/' directory was added, though, 'cache_bottom' began to lag behind
its intended position, causing 'next_cache_entry()' to return index entries
it had already processed and, ultimately, an incorrect diff.
CORRECTING CACHE_BOTTOM
-----------------------
The problems observed in 't1092' come from 'cache_bottom' lagging behind in
cases where the cache tree-based advancement doesn't occur. To solve this,
then, the fix in 17a1bb570b is "reversed"; rather than skipping
'cache_bottom' advancement in 'mark_ce_used()', we skip the directory
contents-based advancement for sparse directories. Now, every index entry
can be accounted for in 'cache_bottom':
* if you're working with a single index entry, 'cache_bottom' is incremented
in 'mark_ce_used()'
* if you're working with a directory that contains index entries (but is not
one itself), 'cache_bottom' is incremented by the number of entries in
that directory.
Finally, change the 'test_expect_failure' tests in 't1092' failing due to
this bug back to 'test_expect_success'.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git read-tree" has been made to be aware of the sparse-index
feature.
* vd/sparse-read-tree:
read-tree: make three-way merge sparse-aware
read-tree: make two-way merge sparse-aware
read-tree: narrow scope of index expansion for '--prefix'
read-tree: integrate with sparse index
read-tree: expand sparse checkout test coverage
read-tree: explicitly disallow prefixes with a leading '/'
status: fix nested sparse directory diff in sparse index
sparse-index: prevent repo root from becoming sparse
|
|
Enable use of 'merged_sparse_dir' in 'threeway_merge'. As with two-way
merge, the contents of each conflicted sparse directory are merged without
referencing the index, avoiding sparse index expansion.
As with two-way merge, the 't/t1092-sparse-checkout-compatibility.sh' test
'read-tree --merge with edit/edit conflicts in sparse directories' confirms
that three-way merges with edit/edit changes (both with and without
conflicts) inside a sparse directory result in the correct index state or
error message. To ensure the index is not unnecessarily expanded, add
three-way merge cases to 'sparse index is not expanded: read-tree'.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Enable two-way merge with 'git read-tree' without expanding the sparse
index. When in a sparse index, a two-way merge will trivially succeed as
long as there are not changes to the same sparse directory in multiple trees
(i.e., sparse directory-level "edit-edit" conflicts). If there are such
conflicts, the merge will fail despite the possibility that individual files
could merge cleanly.
In order to resolve these "edit-edit" conflicts, "conflicted" sparse
directories are - rather than rejected - merged by traversing their
associated trees by OID. For each child of the sparse directory:
1. Files are merged as normal (see Documentation/git-read-tree.txt for
details).
2. Subdirectories are treated as sparse directories and merged in
'twoway_merge'. If there are no conflicts, they are merged according to
the rules in Documentation/git-read-tree.txt; otherwise, the subdirectory
is recursively traversed and merged.
This process allows sparse directories to be individually merged at the
necessary depth *without* expanding a full index.
The 't/t1092-sparse-checkout-compatibility.sh' test 'read-tree --merge with
edit/edit conflicts in sparse directories' tests two-way merges with 1)
changes inside sparse directories that do not conflict and 2) changes that
do conflict (with the correct file(s) reported in the error message).
Additionally, add two-way merge cases to 'sparse index is not expanded:
read-tree' to confirm that the index is not expanded regardless of whether
edit/edit conflicts are present in a sparse directory.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When 'git read-tree' is provided with a prefix, expand the index only if the
prefix is equivalent to a sparse directory or contained within one. If the
index is not expanded in these cases, 'ce_in_traverse_path' will indicate
that the relevant sparse directory is not in the prefix/traverse path,
skipping past it and not unpacking the appropriate tree(s).
If the prefix is in-cone, its sparse subdirectories (if any) will be
traversed correctly without index expansion.
The behavior of 'git read-tree' with prefixes 1) inside of cone, 2) equal to
a sparse directory, and 3) inside a sparse directory are all tested as part
of the 't/t1092-sparse-checkout-compatibility.sh' test 'read-tree --prefix',
ensuring that the sparse index case works the way it did prior to this
change as well as matching non-sparse index sparse-checkout.
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
For sparse-checkouts, we don't want unpack-trees to error out on files
that are missing from the worktree, so there has traditionally been
logic to make it skip the verify_uptodate() check for these.
Unfortunately, it was skipping the verify_uptodate() check for files
that were expected to *become* SKIP_WORKTREE. For files that were not
already SKIP_WORKTREE, that can cause us to later delete the file in
apply_sparse_checkout(). Only skip the check for files that were
already SKIP_WORKTREE as well to avoid lightly discarding important
changes users may have made to files.
Note 1: unpack-trees.c is already a bit complex, and the logic around
CE_SKIP_WORKTREE and CE_NEW_SKIP_WORKTREE in that file are no exception.
I also tried just replacing CE_NEW_SKIP_WORKTREE with CE_SKIP_WORKTREE
in the verify_uptodate() check instead of checking for both flags, and
found that it also fixed this bug and passed all the tests. I also
attempted to devise a few testcases that might trip either variant of my
fix and was unable to find any problems. It may be that just checking
CE_SKIP_WORKTREE is a better fix, but I'm not sure. I thought it
was a bit safer to strictly reduce the number of cases where we skip the
up-to-date check rather than just toggling which kind of cases skip it,
and thus went with the current variant of the fix.
Note 2: I also wondered if verify_absent() might have a similar bug, but
despite my attempts to try to devise a testcase that would trigger such
a thing, I couldn't find any problematic testcases. Thus, this patch
makes no attempt to apply similar changes to verify_absent() and
verify_absent_if_directory().
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Many git commands that deal with working tree files try to remove a
directory that becomes empty (i.e. "git switch" from a branch that
has the directory to another branch that does not would attempt
remove all files in the directory and the directory itself). This
drops users into an unfamiliar situation if the command was run in
a subdirectory that becomes subject to removal due to the command.
The commands have been taught to keep an empty directory if it is
the directory they were started in to avoid surprising users.
* en/keep-cwd:
t2501: simplify the tests since we can now assume desired behavior
dir: new flag to remove_dir_recurse() to spare the original_cwd
dir: avoid incidentally removing the original_cwd in remove_path()
stash: do not attempt to remove startup_info->original_cwd
rebase: do not attempt to remove startup_info->original_cwd
clean: do not attempt to remove startup_info->original_cwd
symlinks: do not include startup_info->original_cwd in dir removal
unpack-trees: add special cwd handling
unpack-trees: refuse to remove startup_info->original_cwd
setup: introduce startup_info->original_cwd
t2501: add various tests for removing the current working directory
|
|
The sparse-index/sparse-checkout feature had a bug in its use of
the matching code to determine which path is in or outside the
sparse checkout patterns.
* ds/sparse-deep-pattern-checkout-fix:
unpack-trees: use traverse_path instead of name
t1092: add deeper changes during a checkout
|
|
Various operating modes of "git reset" have been made to work
better with the sparse index.
* vd/sparse-reset:
unpack-trees: improve performance of next_cache_entry
reset: make --mixed sparse-aware
reset: make sparse-aware (except --mixed)
reset: integrate with sparse index
reset: expand test coverage for sparse checkouts
sparse-index: update command for expand/collapse test
reset: preserve skip-worktree bit in mixed reset
reset: rename is_missing to !is_in_reset_tree
|
|
When running commands such as `git reset --hard` from a subdirectory, if
that subdirectory is in the way of adding needed files, bail with an
error message.
Note that this change looks kind of like it duplicates the new lines of
code from the previous commit in verify_clean_subdirectory(). However,
when we are preserving untracked files, we would rather any error
messages about untracked files being in the way take precedence over
error messages about a subdirectory that happens to be the_original_cwd
being in the way. But in the UNPACK_RESET_OVERWRITE_UNTRACKED case,
there is no untracked checking to be done, so we simply add a special
case near the top of verify_absent_1.
Acked-by: Derrick Stolee <stolee@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the past, when a directory needs to be removed to make room for a
file, we have always errored out when that directory contains any
untracked (but not ignored) files. Add an extra condition on that: also
error out if the directory is the current working directory we inherited
from our parent process.
Acked-by: Derrick Stolee <stolee@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The sparse_dir_matches_path() method compares a cache entry that is a
sparse directory entry against a 'struct traverse_info *info' and a
'struct name_entry *p' to see if the cache entry has exactly the right
name for those other inputs.
This method was introduced in 523506d (unpack-trees: unpack sparse
directory entries, 2021-07-14), but included a significant mistake. The
path comparisons used 'info->name' instead of 'info->traverse_path'.
Since 'info->name' only stores a single tree entry name while
'info->traverse_path' stores the full path from root, this method does
not work when 'info' is in a subdirectory of a directory. Replacing the
right strings and their corresponding lengths make the method work
properly.
The previous change included a failing test that exposes this issue.
That test now passes. The critical detail is that as we go deep into
unpack_trees(), the logic for merging a sparse directory entry with a
tree entry during 'git checkout' relies on this
sparse_dir_matches_path() in order to avoid calling
traverse_trees_recursive() during unpack_callback() in this hunk:
if (!is_sparse_directory_entry(src[0], names, info) &&
traverse_trees_recursive(n, dirmask, mask & ~dirmask,
names, info) < 0) {
return -1;
}
For deep paths, the short-circuit never occurred and
traverse_trees_recursive() was being called incorrectly and that was
causing other strange issues. Specifically, the error message from the
now-passing test previously included this:
error: Your local changes to the following files would be overwritten by checkout:
deep/deeper1/deepest2/a
deep/deeper1/deepest3/a
Please commit your changes or stash them before you switch branches.
Aborting
These messages occurred because the 'current' cache entry in
twoway_merge() was showing as NULL because the index did not contain
entries for the paths contained within the sparse directory entries. We
instead had 'oldtree' given as the entry at HEAD and 'newtree' as the
entry in the target tree. This led to reject_merge() listing these
paths.
Now that sparse_dir_matches_path() works the same for deep paths as it
does for shallow depths, the rest of the logic kicks in to properly
handle modifying the sparse directory entries as designed.
Reported-by: Gustave Granroth <gus.gran@gmail.com>
Reported-by: Mike Marcelais <michmarc@exchange.microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
To find the first non-unpacked cache entry, `next_cache_entry` iterates
through index, starting at `cache_bottom`. The performance of this in full
indexes is helped by `cache_bottom` advancing with each invocation of
`mark_ce_used` (called by `unpack_index_entry`). However, the presence of
sparse directories can prevent the `cache_bottom` from advancing in a sparse
index case, effectively forcing `next_cache_entry` to search from the
beginning of the index each time it is called.
The `cache_bottom` must be preserved for the sparse index (see 17a1bb570b
(unpack-trees: preserve cache_bottom, 2021-07-14)). Therefore, to retain the
benefit `cache_bottom` provides in non-sparse index cases, a separate `hint`
position indicates the first position `next_cache_entry` should search,
updated each execution with a new position.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Leakfix.
* ab/unpack-trees-leakfix:
sequencer: fix a memory leak in do_reset()
sequencer: add a "goto cleanup" to do_reset()
unpack-trees: don't leak memory in verify_clean_subdirectory()
|
|
Various fixes in code paths that move untracked files away to make room.
* en/removing-untracked-fixes:
Documentation: call out commands that nuke untracked files/directories
Comment important codepaths regarding nuking untracked files/dirs
unpack-trees: avoid nuking untracked dir in way of locally deleted file
unpack-trees: avoid nuking untracked dir in way of unmerged file
Change unpack_trees' 'reset' flag into an enum
Remove ignored files by default when they are in the way
unpack-trees: make dir an internal-only struct
unpack-trees: introduce preserve_ignored to unpack_trees_options
read-tree, merge-recursive: overwrite ignored files by default
checkout, read-tree: fix leak of unpack_trees_options.dir
t2500: add various tests for nuking untracked files
|
|
Fix two different but related memory leaks in
verify_clean_subdirectory(). We leaked both the "pathbuf" if
read_directory() returned non-zero, and we never cleaned up our own
"struct dir_struct" either.
* "pathbuf": When the read_directory() call followed by the
free(pathbuf) was added in c81935348be (Fix switching to a branch
with D/F when current branch has file D., 2007-03-15) we didn't
bother to free() before we called die().
But when this code was later libified in 203a2fe1170 (Allow callers
of unpack_trees() to handle failure, 2008-02-07) we started to leak
as we returned data to the caller. This fixes that memory leak,
which can be observed under SANITIZE=leak with e.g. the
"t1001-read-tree-m-2way.sh" test.
* "struct dir_struct": We've leaked the dir_struct ever since this
code was added back in c81935348be.
When that commit was written there wasn't an equivalent of
dir_clear(). Since it was added in 270be816049 (dir.c: provide
clear_directory() for reclaiming dir_struct memory, 2013-01-06)
we've omitted freeing the memory allocated here.
This memory leak could also be observed under SANITIZE=leak and the
"t1001-read-tree-m-2way.sh" test.
This makes all the test in "t1001-read-tree-m-2way.sh" pass under
"GIT_TEST_PASSING_SANITIZE_LEAK=true", we'd previously die in tests
25, 26 & 28.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Traditionally, unpack_trees_options->reset was used to signal that it
was okay to delete any untracked files in the way. This was used by
`git read-tree --reset`, but then started appearing in other places as
well. However, many of the other uses should not be deleting untracked
files in the way. Change this value to an enum so that a value of 1
(i.e. "true") can be split into two:
UNPACK_RESET_PROTECT_UNTRACKED,
UNPACK_RESET_OVERWRITE_UNTRACKED
In order to catch accidental misuses (i.e. where folks call it the way
they traditionally used to), define the special enum value of
UNPACK_RESET_INVALID = 1
which will trigger a BUG().
Modify existing callers so that
read-tree --reset
reset --hard
checkout --force
continue using the UNPACK_RESET_OVERWRITE_UNTRACKED logic, while other
callers, including
am
checkout without --force
stash (though currently dead code; reset always had a value of 0)
numerous callers from rebase/sequencer to reset_head()
will use the new UNPACK_RESET_PROTECT_UNTRACKED value.
Also, note that it has been reported that 'git checkout <treeish>
<pathspec>' currently also allows overwriting untracked files[1]. That
case should also be fixed, but it does not use unpack_trees() and thus
is outside the scope of the current changes.
[1] https://lore.kernel.org/git/15dad590-087e-5a48-9238-5d2826950506@gmail.com/
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Avoid accidental misuse or confusion over ownership by clearly making
unpack_trees_options.dir an internal-only variable.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Currently, every caller of unpack_trees() that wants to ensure ignored
files are overwritten by default needs to:
* allocate unpack_trees_options.dir
* flip the DIR_SHOW_IGNORED flag in unpack_trees_options.dir->flags
* call setup_standard_excludes
AND then after the call to unpack_trees() needs to
* call dir_clear()
* deallocate unpack_trees_options.dir
That's a fair amount of boilerplate, and every caller uses identical
code. Make this easier by instead introducing a new boolean value where
the default value (0) does what we want so that new callers of
unpack_trees() automatically get the appropriate behavior. And move all
the handling of unpack_trees_options.dir into unpack_trees() itself.
While preserve_ignored = 0 is the behavior we feel is the appropriate
default, we defer fixing commands to use the appropriate default until a
later commit. So, this commit introduces several locations where we
manually set preserve_ignored=1. This makes it clear where code paths
were previously preserving ignored files when they should not have been;
a future commit will flip these to instead use a value of 0 to get the
behavior we want.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In cone mode, the sparse-index code path learned to remove ignored
files (like build artifacts) outside the sparse cone, allowing the
entire directory outside the sparse cone to be removed, which is
especially useful when the sparse patterns change.
* ds/sparse-index-ignored-files:
sparse-checkout: clear tracked sparse dirs
sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag
attr: be careful about sparse directories
sparse-checkout: create helper methods
sparse-index: use WRITE_TREE_MISSING_OK
sparse-index: silently return when cache tree fails
unpack-trees: fix nested sparse-dir search
sparse-index: silently return when not using cone-mode patterns
t7519: rewrite sparse index test
|
|
Code clean up to migrate callers from older advice_config[] based
API to newer advice_if_enabled() and advice_enabled() API.
* ab/retire-advice-config:
advice: move advice.graftFileDeprecated squashing to commit.[ch]
advice: remove use of global advice_add_embedded_repo
advice: remove read uses of most global `advice_` variables
advice: add enum variants for missing advice variables
|
|
The iterated search in find_cache_entry() was recently modified to
include a loop that searches backwards for a sparse directory entry that
matches the given traverse_info and name_entry. However, the string
comparison failed to actually concatenate those two strings, so this
failed to find a sparse directory when it was not a top-level directory.
This caused some errors in rare cases where a 'git checkout' spanned a
diff that modified files within the sparse directory entry, but we could
not correctly find the entry.
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The 'Filtering contents...' progress report from delayed checkout is
displayed even when checkout and clone are invoked with --quiet or
--no-progress. Furthermore, it is displayed unconditionally, without
first checking whether stdout is a tty. Let's fix these issues and also
add some regression tests for the two code paths that currently use
delayed checkout: unpack_trees.c:check_updates() and
builtin/checkout.c:checkout_worktree().
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In c4a09cc9ccb (Merge branch 'hw/advise-ng', 2020-03-25), a new API for
accessing advice variables was introduced and deprecated `advice_config`
in favor of a new array, `advice_setting`.
This patch ports all but two uses which read the status of the global
`advice_` variables over to the new `advice_enabled` API. We'll deal
with advice_add_embedded_repo and advice_graft_file_deprecated
separately.
Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git checkout" and "git commit" learn to work without unnecessarily
expanding sparse indexes.
* ds/commit-and-checkout-with-sparse-index:
unpack-trees: resolve sparse-directory/file conflicts
t1092: document bad 'git checkout' behavior
checkout: stop expanding sparse indexes
sparse-index: recompute cache-tree
commit: integrate with sparse-index
p2000: compress repo names
p2000: add 'git checkout -' test and decrease depth
|
|
"git read-tree" had a codepath where blobs are fetched one-by-one
from the promisor remote, which has been corrected to fetch in bulk.
* jt/bulk-prefetch:
cache-tree: prefetch in partial clone read-tree
unpack-trees: refactor prefetching code
|
|
Refactor the prefetching code in unpack-trees.c into its own function,
because it will be used elsewhere in a subsequent commit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When running unpack_trees() with a sparse index, we attempt to operate
on the index without expanding the sparse directory entries. Thus, we
operate by manipulating entire directories and passing them to the
unpack function. In the case of the 'git checkout' command, this is the
twoway_merge() function.
There are several cases in twoway_merge() that handle different
situations. One new one to add is the case of a directory/file conflict
where the directory is sparse. Before the sparse index, such a conflict
would appear as a list of file additions and deletions. Now,
twoway_merge() initializes 'current', 'oldtree', and 'newtree' from
src[0], src[1], and src[2], then sets 'oldtree' to NULL because it is
equal to the df_conflict_entry. The way to determine that we have a
directory/file conflict is to test that 'current' and 'newtree' disagree
on being sparse directory entries.
When we are in this case, we want to resolve the situation by calling
merged_entry(). This allows replacing the 'current' entry with the
'newtree' entry. This is important for cases where we want to run 'git
checkout' across the conflict and have the new HEAD represent the new
file type at that path. The first NEEDSWORK comment dropped in t1092
demonstrates this necessary behavior.
However, we still are in a confusing state when 'current' corresponds to
a staged change within a sparse directory that is not present at HEAD.
This should be atypical, because it requires adding a change outside of
the sparse-checkout cone, but it is possible. Since we are unable to
determine that this is a staged change within twoway_merge(), we cannot
add a case to reject the merge at this point. I believe this is due to
the use of df_conflict_entry in the place of 'oldtree' instead of using
the valud at HEAD, which would provide some perspective to this
decision. Any change that would allow this differentiation for staged
entries would need to involve information further up in unpack_trees().
That work should be done, sometime, because we are further confusing the
behavior of a directory/file conflict when staging a change in the
directory. The two cases 'checkout behaves oddly with df-conflict-?' in
t1092 demonstrate that even without a sparse-checkout, Git is not
consistent in its behavior. Neither of the two options seems correct,
either. This change makes the sparse-index behave differently than the
typcial sparse-checkout case, but it does match the full checkout
behavior in the df-conflict-2 case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
During unpack_callback(), index entries are compared against tree
entries. These are matched according to names and types. One goal is to
decide if we should recurse into subtrees or simply operate on one index
entry.
In the case of a sparse-directory entry, we do not want to recurse into
that subtree and instead simply compare the trees. In some cases, we
might want to perform a merge operation on the entry, such as during
'git checkout <commit>' which wants to replace a sparse tree entry with
the tree for that path at the target commit. We extend the logic within
unpack_single_entry() to create a sparse-directory entry in this case,
and then that is sent to call_unpack_fn().
There are some subtleties in this process. For instance, we need to
update find_cache_entry() to allow finding a sparse-directory entry that
exactly matches a given path. Use the new helper method
sparse_dir_matches_path() for this. We also need to ignore conflict
markers in the case that the entries correspond to directories and we
already have a sparse directory entry.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the next change, we will use this method to unpack a sparse directory
entry, so change the name to unpack_single_entry() so these entries
apply. The new name reflects that we will not recurse into trees in
order to resolve the conflicts.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
As we further integrate the sparse-index into unpack-trees, we need to
ensure that we compare sparse directory entries correctly with other
entries. This affects searching for an exact path as well as sorting
index entries.
Sparse directory entries contain the trailing directory separator. This
is important for the sorting, in particular. Thus, within
do_compare_entry() we stop using S_IFREG in all cases, since sparse
directories should use S_IFDIR to indicate that the comparison should
treat the entry name as a dirctory.
Within compare_entry(), it first calls do_compare_entry() to check the
leading portion of the name. When the input path is a directory name, we
could match exactly already. Thus, we should return 0 if we have an
exact string match on a sparse directory entry. The final check is a
length comparison between the strings.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The cache_bottom member of 'struct unpack_trees_options' is used to
track the range of index entries corresponding to a node of the cache
tree. While recursing with traverse_by_cache_tree(), this value is
preserved on the call stack using a local and then restored as that
method returns.
The mark_ce_used() method normally modifies the cache_bottom member when
it refers to the marked cache entry. However, sparse directory entries
are stored as nodes in the cache-tree data structure as of 2de37c53
(cache-tree: integrate with sparse directory entries, 2021-03-30). Thus,
the cache_bottom will be modified as the cache-tree walk advances. Do
not update it as well within mark_ce_used().
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The final part of "parallel checkout".
* mt/parallel-checkout-part-3:
ci: run test round with parallel-checkout enabled
parallel-checkout: add tests related to .gitattributes
t0028: extract encoding helpers to lib-encoding.sh
parallel-checkout: add tests related to path collisions
parallel-checkout: add tests for basic operations
checkout-index: add parallel checkout support
builtin/checkout.c: complete parallel checkout support
make_transient_cache_entry(): optionally alloc from mem_pool
|
|
Allow make_transient_cache_entry() to optionally receive a mem_pool
struct in which it should allocate the entry. This will be used in the
following patch, to store some transient entries which should persist
until parallel checkout finishes.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The checkout machinery has been taught to perform the actual
write-out of the files in parallel when able.
* mt/parallel-checkout-part-2:
parallel-checkout: add design documentation
parallel-checkout: support progress displaying
parallel-checkout: add configuration options
parallel-checkout: make it truly parallel
unpack-trees: add basic support for parallel checkout
|
|
Builds on top of the sparse-index infrastructure to mark operations
that are not ready to mark with the sparse index, causing them to
fall back on fully-populated index that they always have worked with.
* ds/sparse-index-protections: (47 commits)
name-hash: use expand_to_path()
sparse-index: expand_to_path()
name-hash: don't add directories to name_hash
revision: ensure full index
resolve-undo: ensure full index
read-cache: ensure full index
pathspec: ensure full index
merge-recursive: ensure full index
entry: ensure full index
dir: ensure full index
update-index: ensure full index
stash: ensure full index
rm: ensure full index
merge-index: ensure full index
ls-files: ensure full index
grep: ensure full index
fsck: ensure full index
difftool: ensure full index
commit: ensure full index
checkout: ensure full index
...
|
|
Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Make parallel checkout configurable by introducing two new settings:
checkout.workers and checkout.thresholdForParallelism. The first defines
the number of workers (where one means sequential checkout), and the
second defines the minimum number of entries to attempt parallel
checkout.
To decide the default value for checkout.workers, the parallel version
was benchmarked during three operations in the linux repo, with cold
cache: cloning v5.8, checking out v5.8 from v2.6.15 (checkout I) and
checking out v5.8 from v5.7 (checkout II). The four tables below show
the mean run times and standard deviations for 5 runs in: a local file
system on SSD, a local file system on HDD, a Linux NFS server, and
Amazon EFS (all on Linux). Each parallel checkout test was executed with
the number of workers that brings the best overall results in that
environment.
Local SSD:
Sequential 10 workers Speedup
Clone 8.805 s ± 0.043 s 3.564 s ± 0.041 s 2.47 ± 0.03
Checkout I 9.678 s ± 0.057 s 4.486 s ± 0.050 s 2.16 ± 0.03
Checkout II 5.034 s ± 0.072 s 3.021 s ± 0.038 s 1.67 ± 0.03
Local HDD:
Sequential 10 workers Speedup
Clone 32.288 s ± 0.580 s 30.724 s ± 0.522 s 1.05 ± 0.03
Checkout I 54.172 s ± 7.119 s 54.429 s ± 6.738 s 1.00 ± 0.18
Checkout II 40.465 s ± 2.402 s 38.682 s ± 1.365 s 1.05 ± 0.07
Linux NFS server (v4.1, on EBS, single availability zone):
Sequential 32 workers Speedup
Clone 240.368 s ± 6.347 s 57.349 s ± 0.870 s 4.19 ± 0.13
Checkout I 242.862 s ± 2.215 s 58.700 s ± 0.904 s 4.14 ± 0.07
Checkout II 65.751 s ± 1.577 s 23.820 s ± 0.407 s 2.76 ± 0.08
EFS (v4.1, replicated over multiple availability zones):
Sequential 32 workers Speedup
Clone 922.321 s ± 2.274 s 210.453 s ± 3.412 s 4.38 ± 0.07
Checkout I 1011.300 s ± 7.346 s 297.828 s ± 0.964 s 3.40 ± 0.03
Checkout II 294.104 s ± 1.836 s 126.017 s ± 1.190 s 2.33 ± 0.03
The above benchmarks show that parallel checkout is most effective on
repositories located on an SSD or over a distributed file system. For
local file systems on spinning disks, and/or older machines, the
parallelism does not always bring a good performance. For this reason,
the default value for checkout.workers is one, a.k.a. sequential
checkout.
To decide the default value for checkout.thresholdForParallelism,
another benchmark was executed in the "Local SSD" setup, where parallel
checkout showed to be beneficial. This time, we compared the runtime of
a `git checkout -f`, with and without parallelism, after randomly
removing an increasing number of files from the Linux working tree. The
"sequential fallback" column below corresponds to the executions where
checkout.workers was 10 but checkout.thresholdForParallelism was equal
to the number of to-be-updated files plus one (so that we end up writing
sequentially). Each test case was sampled 15 times, and each sample had
a randomly different set of files removed. Here are the results:
sequential fallback 10 workers speedup
10 files 772.3 ms ± 12.6 ms 769.0 ms ± 13.6 ms 1.00 ± 0.02
20 files 780.5 ms ± 15.8 ms 775.2 ms ± 9.2 ms 1.01 ± 0.02
50 files 806.2 ms ± 13.8 ms 767.4 ms ± 8.5 ms 1.05 ± 0.02
100 files 833.7 ms ± 21.4 ms 750.5 ms ± 16.8 ms 1.11 ± 0.04
200 files 897.6 ms ± 30.9 ms 730.5 ms ± 14.7 ms 1.23 ± 0.05
500 files 1035.4 ms ± 48.0 ms 677.1 ms ± 22.3 ms 1.53 ± 0.09
1000 files 1244.6 ms ± 35.6 ms 654.0 ms ± 38.3 ms 1.90 ± 0.12
2000 files 1488.8 ms ± 53.4 ms 658.8 ms ± 23.8 ms 2.26 ± 0.12
From the above numbers, 100 files seems to be a reasonable default value
for the threshold setting.
Note: Up to 1000 files, we observe a drop in the execution time of the
parallel code with an increase in the number of files. This is a rather
odd behavior, but it was observed in multiple repetitions. Above 1000
files, the execution time increases according to the number of files, as
one would expect.
About the test environments: Local SSD tests were executed on an
i7-7700HQ (4 cores with hyper-threading) running Manjaro Linux. Local
HDD tests were executed on an Intel(R) Xeon(R) E3-1230 (also 4 cores
with hyper-threading), HDD Seagate Barracuda 7200.14 SATA 3.1, running
Debian. NFS and EFS tests were executed on an Amazon EC2 c5n.xlarge
instance, with 4 vCPUs. The Linux NFS server was running on a m6g.large
instance with 2 vCPUSs and a 1 TB EBS GP2 volume. Before each timing,
the linux repository was removed (or checked out back to its previous
state), and `sync && sysctl vm.drop_caches=3` was executed.
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This new interface allows us to enqueue some of the entries being
checked out to later uncompress them, apply in-process filters, and
write out the files in parallel. For now, the parallel checkout
machinery is enabled by default and there is no user configuration, but
run_parallel_checkout() just writes the queued entries in sequence
(without spawning additional workers). The next patch will actually
implement the parallelism and, later, we will make it configurable.
Note that, to avoid potential data races, not all entries are eligible
for parallel checkout. Also, paths that collide on disk (e.g.
case-sensitive paths in case-insensitive file systems), are detected by
the parallel checkout code and skipped, so that they can be safely
sequentially handled later. The collision detection works like the
following:
- If the collision was at basename (e.g. 'a/b' and 'a/B'), the framework
detects it by looking for EEXIST and EISDIR errors after an
open(O_CREAT | O_EXCL) failure.
- If the collision was at dirname (e.g. 'a/b' and 'A'), it is detected
at the has_dirs_only_path() check, which is done for the leading path
of each item in the parallel checkout queue.
Both verifications rely on the fact that, before enqueueing an entry for
parallel checkout, checkout_entry() makes sure that there is no file at
the entry's path and that its leading components are all real
directories. So, any later change in these conditions indicates that
there was a collision (either between two parallel-eligible entries or
between an eligible and an ineligible one).
After all parallel-eligible entries have been processed, the collided
(and thus, skipped) entries are sequentially fed to checkout_entry()
again. This is similar to the way the current code deals with
collisions, overwriting the previously checked out entries with the
subsequent ones. The only difference is that, since we no longer create
the files in the same order that they appear on index, we are not able
to determine which of the colliding entries will survive on disk (for
the classic code, it is always the last entry).
Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Preparatory API changes for parallel checkout.
* mt/parallel-checkout-part-1:
entry: add checkout_entry_ca() taking preloaded conv_attrs
entry: move conv_attrs lookup up to checkout_entry()
entry: extract update_ce_after_write() from write_entry()
entry: make fstat_output() and read_blob_entry() public
entry: extract a header file for entry.c functions
convert: add classification for conv_attrs struct
convert: add get_stream_filter_ca() variant
convert: add [async_]convert_to_working_tree_ca() variants
convert: make convert_attrs() and convert structs public
|
|
When "git checkout" removes a path that does not exist in the
commit it is checking out, it wasn't careful enough not to follow
symbolic links, which has been corrected.
* mt/checkout-remove-nofollow:
checkout: don't follow symlinks when removing entries
symlinks: update comment on threaded_check_leading_path()
|
|
The index_pos_by_traverse_info() currently throws a BUG() when a
directory entry exists exactly in the index. We need to consider that it
is possible to have a directory in a sparse index as long as that entry
is itself marked with the skip-worktree bit.
The 'pos' variable is assigned a negative value if an exact match is not
found. Since a directory name can be an exact match, it is no longer an
error to have a nonnegative 'pos' value.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The next change will translate full indexes into sparse indexes at write
time. The existing logic provides a way for every sparse index to be
expanded to a full index at read time. However, there are cases where an
index is written and then continues to be used in-memory to perform
further updates.
unpack_trees() is frequently called after such a write. In particular,
commands like 'git reset' do this double-update of the index.
Ensure that we have a full index when entering unpack_trees(), but only
when command_requires_full_index is true. This is always true at the
moment, but we will later relax that after unpack_trees() is updated to
handle sparse directory entries.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The declarations of entry.c's public functions and structures currently
reside in cache.h. Although not many, they contribute to the size of
cache.h and, when changed, cause the unnecessary recompilation of
modules that don't really use these functions. So let's move them to a
new entry.h header. While at it let's also move a comment related to
checkout_entry() from entry.c to entry.h as it's more useful to describe
the function there.
Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git stash show" learned to optionally show untracked part of the
stash.
* dl/stash-show-untracked:
stash show: learn stash.showIncludeUntracked
stash show: teach --include-untracked and --only-untracked
|
|
The data structure used by fsmonitor interface was not properly
duplicated during an in-core merge, leading to use-after-free etc.
* js/fsmonitor-unpack-fix:
fsmonitor: do not forget to release the token in `discard_index()`
fsmonitor: fix memory corruption in some corner cases
|
|
At 1d718a5108 ("do not overwrite untracked symlinks", 2011-02-20),
symlink.c:check_leading_path() started returning different codes for
FL_ENOENT and FL_SYMLINK. But one of its callers, unlink_entry(), was
not adjusted for this change, so it started to follow symlinks on the
leading path of to-be-removed entries. Fix that and add a regression
test.
Note that since 1d718a5108 check_leading_path() no longer differentiates
the case where it found a symlink in the path's leading components from
the cases where it found a regular file or failed to lstat() the
component. So, a side effect of this current patch is that
unlink_entry() now returns early in all of these three cases. And
because we no longer try to unlink such paths, we also don't get the
warning from remove_or_warn().
For the regular file and symlink cases, it's questionable whether the
warning was useful in the first place: unlink_entry() removes tracked
paths that should no longer be present in the state we are checking out
to. If the path had its leading dir replaced by another file, it means
that the basename already doesn't exist, so there is no need for a
warning. Sure, we are leaving a regular file or symlink behind at the
path's dirname, but this file is either untracked now (so again, no
need to warn), or it will be replaced by a tracked file during the next
phase of this checkout operation.
As for failing to lstat() one of the leading components, the basename
might still exist only we cannot unlink it (e.g. due to the lack of the
required permissions). Since the user expect it to be removed
(especially with checkout's --no-overlay option), add back the warning
in this more relevant case.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 56c6910028a (fsmonitor: change last update timestamp on the
index_state to opaque token, 2020-01-07), we forgot to adjust the part
of `unpack_trees()` that copies the FSMonitor "last-update" information
that we copy from the source index to the result index since 679f2f9fdd2
(unpack-trees: skip stat on fsmonitor-valid files, 2019-11-20).
Since the "last-update" information is no longer a 64-bit number, but a
free-form string that has been allocated, we need to duplicate it rather
than just copying it.
This is important because there _are_ cases when `unpack_trees()` will
perform a oneway merge that implicitly calls `refresh_fsmonitor()`
(which will allocate that "last-update" token). This happens _after_
that token was copied into the result index. However, we _then_ call
`check_updates()` on that index, which will _also_ call
`refresh_fsmonitor()`, accessing the "last-update" string, which by now
would be released already.
In the instance that lead to this patch, this caused a segmentation
fault during a lengthy, complicated rebase involving the todo command
`reset` that (crucially) had to updated many files. Unfortunately, it
seems very hard to trigger that crash, therefore this patch is not
accompanied by a regression test.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Stash entries can be made with untracked files via
`git stash push --include-untracked`. However, because the untracked
files are stored in the third parent of the stash entry and not the
stash entry itself, running `git stash show` does not include the
untracked files as part of the diff.
With --include-untracked, untracked paths, which are recorded in the
third-parent if it exists, are shown in addition to the paths that have
modifications between the stash base and the working tree in the stash.
It is possible to manually craft a malformed stash entry where duplicate
untracked files in the stash entry will mask tracked files. We detect
and error out in that case via a custom unpack_trees() callback:
stash_worktree_untracked_merge().
Also, teach stash the --only-untracked option which only shows the
untracked files of a stash entry. This is similar to `git show stash^3`
but it is nice to provide a convenient abstraction for it so that users
do not have to think about the underlying implementation.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* maint-2.28:
Git 2.28.1
Git 2.27.1
Git 2.26.3
Git 2.25.5
Git 2.24.4
Git 2.23.4
Git 2.22.5
Git 2.21.4
Git 2.20.5
Git 2.19.6
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
|
|
* maint-2.26:
Git 2.26.3
Git 2.25.5
Git 2.24.4
Git 2.23.4
Git 2.22.5
Git 2.21.4
Git 2.20.5
Git 2.19.6
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
|
|
* maint-2.25:
Git 2.25.5
Git 2.24.4
Git 2.23.4
Git 2.22.5
Git 2.21.4
Git 2.20.5
Git 2.19.6
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
|
|
* maint-2.24:
Git 2.24.4
Git 2.23.4
Git 2.22.5
Git 2.21.4
Git 2.20.5
Git 2.19.6
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
|
|
* maint-2.23:
Git 2.23.4
Git 2.22.5
Git 2.21.4
Git 2.20.5
Git 2.19.6
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
|
|
* maint-2.22:
Git 2.22.5
Git 2.21.4
Git 2.20.5
Git 2.19.6
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
|
|
* maint-2.21:
Git 2.21.4
Git 2.20.5
Git 2.19.6
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
|
|
* maint-2.20:
Git 2.20.5
Git 2.19.6
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
|
|
* maint-2.19:
Git 2.19.6
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
|
|
* maint-2.18:
Git 2.18.5
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
|
|
* maint-2.17:
Git 2.17.6
unpack_trees(): start with a fresh lstat cache
run-command: invalidate lstat cache after a command finished
checkout: fix bug that makes checkout follow symlinks in leading path
|
|
We really want to avoid relying on stale information.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
|
|
A future feature will want to load the sparse-checkout patterns into a
pattern_list, but the current mechanism to do so is a bit complicated.
This is made difficult due to needing to find the sparse-checkout file
in different ways throughout the codebase.
The logic implemented in the new get_sparse_checkout_patterns() was
duplicated in populate_from_existing_patterns() in unpack-trees.c. Use
the new method instead, keeping the logic around handling the struct
unpack_trees_options.
The callers to get_sparse_checkout_filename() in
builtin/sparse-checkout.c manipulate the sparse-checkout file directly,
so it is not appropriate to replace logic in that file with
get_sparse_checkout_patterns().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Make the method safer by allocating a cache_tree member for the given
index_state if it is not already present. This is preferrable to a
BUG() statement or returning with an error because future callers will
want to populate an empty cache-tree using this method.
Callers can also remove their conditional allocations of cache_tree.
Also drop local variables that can be found directly from the 'istate'
parameter.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The unpack_trees() method is quite complicated and its performance can
change dramatically depending on how it is used. We already have some
performance tracing regions, but they have not been updated to the
trace2 API. Do so now.
We already have trace2 regions in unpack_trees.c:clear_ce_flags(), which
uses a linear scan through the index without recursing into trees.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We eventually want to drop the argv_array name and just use strvec
consistently. There's no particular reason we have to do it all at once,
or care about interactions between converted and unconverted bits.
Because of our preprocessor compat layer, the names are interchangeable
to the compiler (so even a definition and declaration using different
names is OK).
This patch converts all of the remaining files, as the resulting diff is
reasonably sized.
The conversion was done purely mechanically with:
git ls-files '*.c' '*.h' |
xargs perl -i -pe '
s/ARGV_ARRAY/STRVEC/g;
s/argv_array/strvec/g;
'
We'll deal with any indentation/style fallouts separately.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This requires updating #include lines across the code-base, but that's
all fairly mechanical, and was done with:
git ls-files '*.c' '*.h' |
xargs perl -i -pe 's/argv-array.h/strvec.h/'
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Consistency fix to a topic already in 'master'.
* en/sparse-checkout:
unpack-trees: also allow get_progress() to work on a different index
|
|
commit b0a5a12a60 ("unpack-trees: allow check_updates() to work on a
different index", 2020-03-27) allowed check_updates() to work on a
different index, but it called get_progress() which was hardcoded to
work on o->result much like check_updates() had been. Update it to also
accept an index parameter and have check_updates() pass that parameter
along so that both are working on the same index.
Noticed-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The code to skip unmerged paths in the index when sparse checkout
is in use would have made out-of-bound access of the in-core index
when the last path was unmerged, which has been corrected.
* ds/sparse-updates-oob-access-fix:
unpack-trees: avoid array out-of-bounds error
|
|
The sparse-checkout patterns have been forbidden from excluding all
paths, leaving an empty working tree, for a long time. This
limitation has been lifted.
* ds/sparse-allow-empty-working-tree:
sparse-checkout: stop blocking empty workdirs
|
|
The loop in warn_conflicted_path() that checks for the count of
entries with the same path uses "i+count" for the array
entry. However, the loop only verifies that the value of count is
below the array size. Fix this by adding i to the condition.
I hit this condition during a test of the in-tree sparse-checkout
feature, so it is exercised by the end of the series.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
[jc: readability fix]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove the error condition when updating the sparse-checkout leaves
an empty working directory.
This behavior was added in 9e1afb167 (sparse checkout: inhibit empty
worktree, 2009-08-20). The comment was added in a7bc906f2 (Add
explanation why we do not allow to sparse checkout to empty working
tree, 2011-09-22) in response to a "dubious" comment in 84563a624
(unpack-trees.c: cosmetic fix, 2010-12-22).
With the recent "cone mode" and "git sparse-checkout init [--cone]"
command, it is common to set a reasonable sparse-checkout pattern
set of
/*
!/*/
which matches only files at root. If the repository has no such files,
then their "git sparse-checkout init" command will fail.
Now that we expect this to be a common pattern, we should not have the
commands fail on an empty working directory. If it is a confusing
result, then the user can recover with "git sparse-checkout disable"
or "git sparse-checkout set". This is especially simple when using cone
mode.
Reported-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"sparse-checkout" UI improvements.
* en/sparse-checkout:
sparse-checkout: provide a new reapply subcommand
unpack-trees: failure to set SKIP_WORKTREE bits always just a warning
unpack-trees: provide warnings on sparse updates for unmerged paths too
unpack-trees: make sparse path messages sound like warnings
unpack-trees: split display_error_msgs() into two
unpack-trees: rename ERROR_* fields meant for warnings to WARNING_*
unpack-trees: move ERROR_WOULD_LOSE_SUBMODULE earlier
sparse-checkout: use improved unpack_trees porcelain messages
sparse-checkout: use new update_sparsity() function
unpack-trees: add a new update_sparsity() function
unpack-trees: pull sparse-checkout pattern reading into a new function
unpack-trees: do not mark a dirty path with SKIP_WORKTREE
unpack-trees: allow check_updates() to work on a different index
t1091: make some tests a little more defensive against failures
unpack-trees: simplify pattern_list freeing
unpack-trees: simplify verify_absent_sparse()
unpack-trees: remove unused error type
unpack-trees: fix minor typo in comment
|
|
"git diff" in a partial clone learned to avoid lazy loading blob
objects in more casese when they are not needed.
* jt/avoid-prefetch-when-able-in-diff:
diff: restrict when prefetching occurs
diff: refactor object read
diff: make diff_populate_filespec_options struct
promisor-remote: accept 0 as oid_nr in function
|
|
There are 3 callers to promisor_remote_get_direct() that first check if
the number of objects to be fetched is equal to 0. Fold that check into
promisor_remote_get_direct(), and in doing so, be explicit as to what
promisor_remote_get_direct() does if oid_nr is 0 (it returns 0, success,
immediately).
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Setting and clearing of the SKIP_WORKTREE bit is not only done when
users run 'sparse-checkout'; other commands such as 'checkout' also run
through unpack_trees() which has logic for handling this special bit.
As such, we need to consider how they handle special cases. A couple
comparison points should help explain the rationale for changing how
unpack_trees() handles these bits:
Ignoring sparse checkouts for a moment, if you are switching
branches and have dirty changes, it is only considered an error that
will prevent the branch switching from being successful if the dirty
file happens to be one of the paths with different contents.
SKIP_WORKTREE has always been considered advisory; for example, if
rebase or merge need or even want to materialize a path as part of
their work, they have always been allowed to do so regardless of the
SKIP_WORKTREE setting. This has been used for unmerged paths, but
it was often used for paths it wasn't needed just because it made
the code simpler. It was a best-effort consideration, and when it
materialized paths contrary to the SKIP_WORKTREE setting, it was
never required to even print a warning message.
In the past if you trying to run e.g. 'git checkout' and:
1) you had a path that was materialized and had some dirty changes
2) the path was listed in $GITDIR/info/sparse-checkout
3) this path did not different between the current and target branches
then despite the comparison points above, the inability to set
SKIP_WORKTREE was treated as a *hard* error that would abort the
checkout operation. This is completely inconsistent with how
SKIP_WORKTREE is handled elsewhere, and rather annoying for users as
leaving the paths materialized in the working copy (with a simple
warning) should present no problem at all.
Downgrade any errors from inability to toggle the SKIP_WORKTREE bit to a
warning and allow the operations to continue.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When sparse-checkout runs to update the list of sparsity patterns, it
gives warnings if it can't remove paths from the working tree because
those files have dirty changes. Add a similar warning for unmerged
paths as well.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The messages for problems with sparse paths are phrased as errors that
cause the operation to abort, even though we are not making the
operation abort. Reword the messages to make sense in their new
context.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
display_error_msgs() is never called to show messages of both ERROR_*
and WARNING_* types at the same time; it is instead called multiple
times, separately for each type. Since we want to display these types
differently, make two slightly different versions of this function.
A subsequent commit will further modify unpack_trees() and how it calls
the new display_warning_msgs().
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We want to treat issues with setting the SKIP_WORKTREE bit as a warning
rather than an error; rename the enum values to reflect this intent as
a simple step towards that goal.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A minor change, but we want to convert the sparse messages to warnings
and this allows us to group warnings and errors.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Previously, the only way to update the SKIP_WORKTREE bits for various
paths was invoking `git read-tree -mu HEAD` or calling the same code
that this codepath invoked. This however had a number of problems if
the index or working directory were not clean. First, let's consider
the case:
Flipping SKIP_WORKTREE -> !SKIP_WORKTREE (materializing files)
If the working tree was clean this was fine, but if there were files or
directories or symlinks or whatever already present at the given path
then the operation would abort with an error. Let's label this case
for later discussion:
A) There is an untracked path in the way
Now let's consider the opposite case:
Flipping !SKIP_WORKTREE -> SKIP_WORKTREE (removing files)
If the index and working tree was clean this was fine, but if there were
any unclean paths we would run into problems. There are three different
cases to consider:
B) The path is unmerged
C) The path has unstaged changes
D) The path has staged changes (differs from HEAD)
If any path fell into case B or C, then the whole operation would be
aborted with an error. With sparse-checkout, the whole operation would
be aborted for case D as well, but for its predecessor of using `git
read-tree -mu HEAD` directly, any paths that fell into case D would be
removed from the working copy and the index entry for that path would be
reset to match HEAD -- which looks and feels like data loss to users
(only a few are even aware to ask whether it can be recovered, and even
then it requires walking through loose objects trying to match up the
right ones).
Refusing to remove files that have unsaved user changes is good, but
refusing to work on any other paths is very problematic for users. If
the user is in the middle of a rebase or has made modifications to files
that bring in more dependencies, then for their build to work they need
to update the sparse paths. This logic has been preventing them from
doing so. Sometimes in response, the user will stage the files and
re-try, to no avail with sparse-checkout or to the horror of losing
their changes if they are using its predecessor of `git read-tree -mu
HEAD`.
Add a new update_sparsity() function which will not error out in any of
these cases but behaves as follows for the special cases:
A) Leave the file in the working copy alone, clear the SKIP_WORKTREE
bit, and print a warning (thus leaving the path in a state where
status will report the file as modified, which seems logical).
B) Do NOT mark this path as SKIP_WORKTREE, and leave it as unmerged.
C) Do NOT mark this path as SKIP_WORKTREE and print a warning about
the dirty path.
D) Mark the path as SKIP_WORKTREE, but do not revert the version
stored in the index to match HEAD; leave the contents alone.
I tried a different behavior for A (leave the SKIP_WORKTREE bit set),
but found it very surprising and counter-intuitive (e.g. the user sees
it is present along with all the other files in that directory, tries to
stage it, but git add ignores it since the SKIP_WORKTREE bit is set). A
& C seem like optimal behavior to me. B may be as well, though I wonder
if printing a warning would be an improvement. Some might be slightly
surprised by D at first, but given that it does the right thing with
`git commit` and even `git commit -a` (`git add` ignores entries that
are marked SKIP_WORKTREE and thus doesn't delete them, and `commit -a`
is similar), it seems logical to me.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Create a populate_from_existing_patterns() function for reading the
path_patterns from $GIT_DIR/info/sparse-checkout so that we can re-use
it elsewhere.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If a path is dirty, removing from the working tree risks losing data.
As such, we want to make sure any such path is not marked with
SKIP_WORKTREE. While the current callers of this code detect this case
and re-populate with a previous set of sparsity patterns, we want to
allow some paths to be marked with SKIP_WORKTREE while others are left
unmarked without it being considered an error. The reason this
shouldn't be considered an error is that SKIP_WORKTREE has always been
an advisory-only setting; merge and rebase for example were free to
materialize paths and clear the SKIP_WORKTREE bit in order to accomplish
their work even though they kept the SKIP_WORKTREE bit set for other
paths. Leaving dirty working files in the working tree is thus a
natural extension of what we have already been doing.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
check_updates() previously assumed it was working on o->result. We want
to use this function in combination with a different index_state, so
take the intended index_state as a parameter.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
commit e091228e17 ("sparse-checkout: update working directory
in-process", 2019-11-21) allowed passing a pre-defined set of patterns
to unpack_trees(). However, if o->pl was NULL, it would still read the
existing patterns and use those. If those patterns were read into a
data structure that was allocated, naturally they needed to be free'd.
However, despite the same function being responsible for knowing about
both the allocation and the free'ing, the logic for tracking whether to
free the pattern_list was hoisted to an outer function with an
additional flag in unpack_trees_options. Put the logic back in the
relevant function and discard the now unnecessary flag.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
verify_absent_sparse() was introduced in commit 08402b0409
("merge-recursive: distinguish "removed" and "overwritten" messages",
2010-08-11), and has always had exactly one caller which always passes
error_type == ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN. This function
then checks whether error_type is this value, and if so, sets it instead
to ERROR_WOULD_LOSE_ORPHANED_OVERWRITTEN. It has been nearly a decade
and no other caller has been created, and no other value has ever been
passed, so just pass the expected value to begin with.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
commit 08402b0409 ("merge-recursive: distinguish "removed" and
"overwritten" messages", 2010-08-11) split
ERROR_WOULD_LOSE_UNTRACKED
into both
ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN
ERROR_WOULD_LOSE_UNTRACKED_REMOVED
and also split
ERROR_WOULD_LOSE_ORPHANED
into both
ERROR_WOULD_LOSE_ORPHANED_OVERWRITTEN
ERROR_WOULD_LOSE_ORPHANED_REMOVED
However, despite the split only three of these four types were used.
ERROR_WOULD_LOSE_ORPHANED_REMOVED was not put into use when it was
introduced and nothing else has used it in the intervening decade
either. Remove it.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Provide more information (e.g. the object of the tree-ish in which
the blob being converted appears, in addition to its path, which
has already been given) to smudge/clean conversion filters.
* bc/filter-process:
t0021: test filter metadata for additional cases
builtin/reset: compute checkout metadata for reset
builtin/rebase: compute checkout metadata for rebases
builtin/clone: compute checkout metadata for clones
builtin/checkout: compute checkout metadata for checkouts
convert: provide additional metadata to filters
convert: permit passing additional metadata to filter processes
builtin/checkout: pass branch info down to checkout_worktree
|
|
Fix "git checkout --recurse-submodules" of a nested submodule
hierarchy.
* pb/recurse-submodules-fix:
t/lib-submodule-update: add test removing nested submodules
unpack-trees: check for missing submodule directory in merged_entry
unpack-trees: remove outdated description for verify_clean_submodule
t/lib-submodule-update: move a test to the right section
t/lib-submodule-update: remove outdated test description
t7112: remove mention of KNOWN_FAILURE_SUBMODULE_RECURSIVE_NESTED
|
|
Code simplification.
* en/simplify-check-updates-in-unpack-trees:
unpack-trees: exit check_updates() early if updates are not wanted
|
|
C pedantry ;-) fix.
* jk/clang-sanitizer-fixes:
obstack: avoid computing offsets from NULL pointer
xdiff: avoid computing non-zero offset from NULL pointer
avoid computing zero offsets from NULL pointer
merge-recursive: use subtraction to flip stage
merge-recursive: silence -Wxor-used-as-pow warning
|
|
Provide commit metadata for checkout code paths that use unpack_trees
and friends. When we're checking out a commit, use the commit
information, but don't provide commit information if we're checking out
from the index, since there need not be any particular commit associated
with the index, and even if there is one, we can't know what it is.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Using `git checkout --recurse-submodules` to switch between a
branch with no submodules and a branch with initialized nested
submodules currently causes a fatal error:
$ git checkout --recurse-submodules branch-with-nested-submodules
fatal: exec '--super-prefix=submodule/nested/': cd to 'nested'
failed: No such file or directory
error: Submodule 'nested' could not be updated.
error: Submodule 'submodule/nested' cannot checkout new HEAD.
error: Submodule 'submodule' could not be updated.
M submodule
Switched to branch 'branch-with-nested-submodules'
The checkout succeeds but the worktree and index of the first level
submodule are left empty:
$ cd submodule
$ git -c status.submoduleSummary=1 status
HEAD detached at b3ce885
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
deleted: .gitmodules
deleted: first.t
deleted: nested
fatal: not a git repository: 'nested/.git'
Submodule changes to be committed:
* nested 1e96f59...0000000:
$ git ls-files -s
$ # empty
$ ls -A
.git
The reason for the fatal error during the checkout is that a child git
process tries to cd into the yet unexisting nested submodule directory.
The sequence is the following:
1. The main git process (the one running in the superproject) eventually
reaches write_entry() in entry.c, which creates the first level
submodule directory and then calls submodule_move_head() in submodule.c,
which spawns `git read-tree` in the submodule directory.
2. The first child git process (the one in the submodule of the
superproject) eventually calls check_submodule_move_head() at
unpack_trees.c:2021, which calls submodule_move_head in dry-run mode,
which spawns `git read-tree` in the nested submodule directory.
3. The second child git process tries to chdir() in the yet unexisting
nested submodule directory in start_command() at run-command.c:829 and
dies before exec'ing.
The reason why check_submodule_move_head() is reached in the first child
and not in the main process is that it is inside an
if(submodule_from_ce()) construct, and submodule_from_ce() returns a
valid struct submodule pointer, whereas it returns a null pointer in the
main git process.
The reason why submodule_from_ce() returns a null pointer in the main
git process is because the call to cache_lookup_path() in config_from()
(called from submodule_from_path() in submodule_from_ce()) returns a
null pointer since the hashmap "for_path" in the submodule_cache of
the_repository is not yet populated. It is not populated because both
repo_get_oid(repo, GITMODULES_INDEX, &oid) and repo_get_oid(repo,
GITMODULES_HEAD, &oid) in config_from_gitmodules() at
submodule-config.c:639-640 return -1, as at this stage of the operation,
neither the HEAD of the superproject nor its index contain any
.gitmodules file.
In contrast, in the first child the hashmap is populated because
repo_get_oid(repo, GITMODULES_HEAD, &oid) returns 0 as the HEAD of the
first level submodule, i.e. .git/modules/submodule/HEAD, points to a
commit where .gitmodules is present and records 'nested' as a submodule.
Fix this bug by checking that the submodule directory exists before
calling check_submodule_move_head() in merged_entry() in the `if(!old)`
branch, i.e. if going from a commit with no submodule to a commit with a
submodule present.
Also protect the other call to check_submodule_move_head() in
merged_entry() the same way as it is safer, even though the `else if
(!(old->ce_flags & CE_CONFLICTED))` branch of the code is not at play in
the present bug.
The other calls to check_submodule_move_head() in other functions in
unpack_trees.c are all already protected by calls to lstat() somewhere
in
the program flow so we don't need additional protection for them.
All commands in the unpack_trees machinery are affected, i.e. checkout,
reset and read-tree when called with the --recurse-submodules flag.
This bug was first reported in [1].
[1]
https://lore.kernel.org/git/7437BB59-4605-48EC-B05E-E2BDB2D9DABC@gmail.com/
Reported-by: Philippe Blain <levraiphilippeblain@gmail.com>
Reported-by: Damien Robert <damien.olivier.robert@gmail.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The function verify_clean_submodule() learned to verify if a submodule
working tree is clean in a7bc845a9a (unpack-trees: check if we can
perform the operation for submodules, 2017-03-14), but the commented
description above it was not updated to reflect that, such that this
description has been outdated since then.
Since Git has now learned to optionnally recursively check out
submodules during a superproject checkout, remove this outdated
description.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Some rough edges in the sparse-checkout feature, especially around
the cone mode, have been cleaned up.
* ds/sparse-checkout-harden:
sparse-checkout: fix cone mode behavior mismatch
sparse-checkout: improve docs around 'set' in cone mode
sparse-checkout: escape all glob characters on write
sparse-checkout: use C-style quotes in 'list' subcommand
sparse-checkout: unquote C-style strings over --stdin
sparse-checkout: write escaped patterns in cone mode
sparse-checkout: properly match escaped characters
sparse-checkout: warn on globs in cone patterns
sparse-checkout: detect short patterns
sparse-checkout: cone mode does not recognize "**"
sparse-checkout: fix documentation typo for core.sparseCheckoutCone
clone: fix --sparse option with URLs
sparse-checkout: create leading directories
t1091: improve here-docs
t1091: use check_files to reduce boilerplate
|
|
Traditionally, we avoided threaded grep while searching in objects
(as opposed to files in the working tree) as accesses to the object
layer is not thread-safe. This limitation is getting lifted.
* mt/threaded-grep-in-object-store:
grep: use no. of cores as the default no. of threads
grep: move driver pre-load out of critical section
grep: re-enable threads in non-worktree case
grep: protect packed_git [re-]initialization
grep: allow submodule functions to run in parallel
submodule-config: add skip_if_read option to repo_read_gitmodules()
grep: replace grep_read_mutex by internal obj read lock
object-store: allow threaded access to object reading
replace-object: make replace operations thread-safe
grep: fix racy calls in grep_objects()
grep: fix race conditions at grep_submodule()
grep: fix race conditions on userdiff calls
|
|
The code recently added in this release to move to the entry beyond
the ones in the same directory in the index in the sparse-cone mode
did not count the number of entries to skip over incorrectly, which
has been corrected.
* ds/sparse-cone:
.mailmap: fix GGG authoship screwup
unpack-trees: correctly compute result count
|
|
The code that tries to skip over the entries for the paths in a
single directory using the cache-tree was not careful enough
against corrupt index file.
* es/unpack-trees-oob-fix:
unpack-trees: watch for out-of-range index position
|
|
C pedantry ;-) fix.
* jk/clang-sanitizer-fixes:
obstack: avoid computing offsets from NULL pointer
xdiff: avoid computing non-zero offset from NULL pointer
avoid computing zero offsets from NULL pointer
merge-recursive: use subtraction to flip stage
merge-recursive: silence -Wxor-used-as-pow warning
|
|
The intention of the special "cone mode" in the sparse-checkout
feature is to always match the same patterns that are matched by the
same sparse-checkout file as when cone mode is disabled.
When a file path is given to "git sparse-checkout set" in cone mode,
then the cone mode improperly matches the file as a recursive path.
When setting the skip-worktree bits, files were not expecting the
MATCHED_RECURSIVE response, and hence these were left out of the
matched cone.
Fix this bug by checking for MATCHED_RECURSIVE in addition to MATCHED
and add a test that prevents regression.
Reported-by: Finn Bryant <finnbryant@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The code recently added in this release to move to the entry beyond
the ones in the same directory in the index in the sparse-cone mode
did not count the number of entries to skip over incorrectly, which
has been corrected.
* ds/sparse-cone:
.mailmap: fix GGG authoship screwup
unpack-trees: correctly compute result count
|
|
The Undefined Behavior Sanitizer in clang-11 seems to have learned a new
trick: it complains about computing offsets from a NULL pointer, even if
that offset is 0. This causes numerous test failures. For example, from
t1090:
unpack-trees.c:1355:41: runtime error: applying zero offset to null pointer
...
not ok 6 - in partial clone, sparse checkout only fetches needed blobs
The code in question looks like this:
struct cache_entry **cache_end = cache + nr;
...
while (cache != cache_end)
and we sometimes pass in a NULL and 0 for "cache" and "nr". This is
conceptually fine, as "cache_end" would be equal to "cache" in this
case, and we wouldn't enter the loop at all. But computing even a zero
offset violates the C standard. And given the fact that UBSan is
noticing this behavior, this might be a potential problem spot if the
compiler starts making unexpected assumptions based on undefined
behavior.
So let's just avoid it, which is pretty easy. In some cases we can just
switch to iterating with a numeric index (as we do in sequencer.c here).
In other cases (like the cache_end one) the use of an end pointer is
more natural; we can keep that by just explicitly checking for the
NULL/0 case when assigning the end pointer.
Note that there are two ways you can write this latter case, checking
for the pointer:
cache_end = cache ? cache + nr : cache;
or the size:
cache_end = nr ? cache + nr : cache;
For the case of a NULL/0 ptr/len combo, they are equivalent. But writing
it the second way (as this patch does) has the property that if somebody
were to incorrectly pass a NULL pointer with a non-zero length, we'd
continue to notice and segfault, rather than silently pretending the
length was zero.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The code that tries to skip over the entries for the paths in a
single directory using the cache-tree was not careful enough
against corrupt index file.
* es/unpack-trees-oob-fix:
unpack-trees: watch for out-of-range index position
|
|
Code simplification.
* en/simplify-check-updates-in-unpack-trees:
unpack-trees: exit check_updates() early if updates are not wanted
|
|
Currently, submodule-config.c doesn't have an externally accessible
function to read gitmodules only if it wasn't already read. But this
exact behavior is internally implemented by gitmodules_read_check(), to
perform a lazy load. Let's merge this function with
repo_read_gitmodules() adding a 'skip_if_read' which allows both
internal and external callers to access this functionality. This
simplifies a little the code. The added option will also be used in
the following patch.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The clear_ce_flags_dir() method processes the cache entries within
a common directory. The returned int is the number of cache entries
processed by that directory. When using the sparse-checkout feature
in cone mode, we can skip the pattern matching for entries in the
directories that are entirely included or entirely excluded.
eb42feca (unpack-trees: hash less in cone mode, 2019-11-21)
introduced this performance feature. The old mechanism relied on
the counts returned by calling clear_ce_flags_1(), but the new
mechanism calculated the number of rows by subtracting "cache_end"
from "cache" to find the size of the range. However, the equation
is wrong because it divides by sizeof(struct cache_entry *). This
is not how pointer arithmetic works!
A coverity build of Git for Windows in preparation for the 2.25.0
release found this issue with the warning, "Pointer differences,
such as cache_end - cache, are automatically scaled down by the
size (8 bytes) of the pointed-to type (struct cache_entry *).
Most likely, the division by sizeof(struct cache_entry *) is
extraneous and should be eliminated." This warning is correct.
This leaves us with the question "how did this even work?" The
problem that occurs with this incorrect pointer arithmetic is
a performance-only bug, and a very slight one at that. Since
the entry count returned by clear_ce_flags_dir() is reduced by
a factor of 8, the loop in clear_ce_flags_1() will re-process
entries from those directories.
By inserting global counters into unpack-tree.c and tracing
them with trace2_data_intmax() (in a private change, for
testing), I was able to see count how many times the loop inside
clear_ce_flags_1() processed an entry and how many times
clear_ce_flags_dir() was called. Each of these are reduced by at
least a factor of 8 with the current change. A factor larger
than 8 happens when multiple levels of directories are repeated.
Specifically, in the Linux kernel repo, the command
git sparse-checkout set LICENSES
restricts the working directory to only the files at root and
in the LICENSES directory. Here are the measured counts:
clear_ce_flags_1 loop blocks:
Before: 11,520
After: 1,621
clear_ce_flags_dir calls:
Before: 7,048
After: 606
While these are dramatic counts, the time spent in
clear_ce_flags_1() is under one millisecond in each case, so
the improvement is not measurable as an end-to-end time.
Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It's possible in a case where the index file contains a tree extension
but no blobs within that tree exist for index_pos_by_traverse_info() to
segfault. If the name_entry passed into index_pos_by_traverse_info() has
no blobs inside, AND is alphabetically later than all blobs currently in
the index file, index_pos_by_traverse_info() will segfault. For example,
an index file which looks something like this:
aaa#0
bbb/aaa#0
[Extensions]
TREE: zzz
In this example, 'index_name_pos(..., "zzz/", ...)' will return '-4',
indicating that "zzz/" could be inserted at position 3. However, when
the checks which ensure that the insertion position of "zzz/" look for a
blob at that position beginning with "zzz/", the index cache is accessed
out of range, causing a segfault.
This kind of index state is not typically generated during user
operations, and is in fact an edge case of the state being checked for
in the conditional where it was added. However, since the entry for the
BUG() line is ambiguous, tell some additional context to help Git
developers debug the failure later. When we know the name of the dir we
were trying to look up, it becomes possible to examine the index file
in a hex util to determine what went wrong; the position gives a hint
about where to start looking.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
check_updates() has a lot of code that repeatedly checks whether
o->update or o->dry_run are set. (Note that o->dry_run is a
near-synonym for !o->update, but not quite as per commit 2c9078d05bf2
("unpack-trees: add the dry_run flag to unpack_trees_options",
2011-05-25).) In fact, this function almost turns into a no-op whenever
the condition
!o->update || o->dry_run
is met. Simplify the code by checking this condition at the beginning
of the function, and when it is true, do the few things that are
relevant and return early.
There are a few things that make the conversion not quite obvious:
* The fact that check_updates() does not actually turn into a no-op
when updates are not wanted may be slightly surprising. However,
commit 33ecf7eb61 (Discard "deleted" cache entries after using them
to update the working tree, 2008-02-07) put the discarding of
unused cache entries in check_updates() so we still need to keep
the call to remove_marked_cache_entries(). It's possible this
call belongs in another function, but it is certainly needed as
tests will fail if it is removed.
* The original called remove_scheduled_dirs() unconditionally.
Technically, commit 7847892716 (unlink_entry(): introduce
schedule_dir_for_removal(), 2009-02-09) should have made that call
conditional, but it didn't matter in practice because
remove_scheduled_dirs() becomes a no-op when all the calls to
unlink_entry() are skipped. As such, we do not need to call it.
* When (o->dry_run && o->update), the original would have two calls
to git_attr_set_direction() surrounding a bunch of skipped updates.
These two calls to git_attr_set_direction() cancel each other out
and thus can be omitted when o->dry_run is true just as they
already are when !o->update.
* The code would previously call setup_collided_checkout_detection()
and report_collided_checkout() even when o->dry_run. However, this
was just an expensive no-op because
setup_collided_checkout_detection() merely cleared the CE_MATCHED
flag for each cache entry, and report_collided_checkout() reported
which ones had it set. Since a dry-run would skip all the
checkout_entry() calls, CE_MATCHED would never get set and thus
no collisions would be reported. Since we can't detect the
collisions anyway without doing updates, skipping the collisions
detection setup and reporting is an optimization.
* The code previously would call get_progress() and
display_progress() even when (!o->update || o->dry_run). This
served to show how long it took to skip all the updates, which is
somewhat useless. Since we are skipping the updates, we can skip
showing how long it takes to skip them.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Management of sparsely checked-out working tree has gained a
dedicated "sparse-checkout" command.
* ds/sparse-cone: (21 commits)
sparse-checkout: improve OS ls compatibility
sparse-checkout: respect core.ignoreCase in cone mode
sparse-checkout: check for dirty status
sparse-checkout: update working directory in-process for 'init'
sparse-checkout: cone mode should not interact with .gitignore
sparse-checkout: write using lockfile
sparse-checkout: use in-process update for disable subcommand
sparse-checkout: update working directory in-process
sparse-checkout: sanitize for nested folders
unpack-trees: add progress to clear_ce_flags()
unpack-trees: hash less in cone mode
sparse-checkout: init and set in cone mode
sparse-checkout: use hashmaps for cone patterns
sparse-checkout: add 'cone' mode
trace2: add region in clear_ce_flags
sparse-checkout: create 'disable' subcommand
sparse-checkout: add '--stdin' option to set subcommand
sparse-checkout: 'set' subcommand
clone: add --sparse mode
sparse-checkout: create 'init' subcommand
...
|
|
|
|
* maint-2.23: (44 commits)
Git 2.23.1
Git 2.22.2
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
...
|
|
* maint-2.22: (43 commits)
Git 2.22.2
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
...
|
|
* maint-2.21: (42 commits)
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
...
|
|
* maint-2.20: (36 commits)
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
...
|
|
* maint-2.19: (34 commits)
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
...
|
|
* maint-2.18: (33 commits)
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
...
|
|
* maint-2.17: (32 commits)
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
...
|
|
* maint-2.16: (31 commits)
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
...
|
|
* maint-2.15: (29 commits)
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
clone --recurse-submodules: prevent name squatting on Windows
is_ntfs_dotgit(): only verify the leading segment
...
|
|
* maint-2.14: (28 commits)
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
clone --recurse-submodules: prevent name squatting on Windows
is_ntfs_dotgit(): only verify the leading segment
test-path-utils: offer to run a protectNTFS/protectHFS benchmark
...
|
|
Users of oneway_merge() (like "reset --hard") learned to take
advantage of fsmonitor to avoid unnecessary lstat(2) calls.
* us/unpack-trees-fsmonitor:
unpack-trees: skip stat on fsmonitor-valid files
|
|
A `git clone` will end with exit code 0 when `merged_entry()` returns a
positive value during a call of `unpack_trees()` to `traverse_trees()`.
The reason is that `unpack_trees()` will interpret a positive value not
to be an error.
The problem is, however, that `add_index_entry()` (which is called by
`merged_entry()` can report an error, and we really should fail the
entire clone in such a case.
Let's fix this problem, in preparation for a Windows-specific patch
disallowing `mkdir()` with directory names that contain a trailing space
(which is illegal on NTFS): we want `git clone` to abort when a path
cannot be checked out due to that condition.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
|
|
The sparse-checkout builtin used 'git read-tree -mu HEAD' to update the
skip-worktree bits in the index and to update the working directory.
This extra process is overly complex, and prone to failure. It also
requires that we write our changes to the sparse-checkout file before
trying to update the index.
Remove this extra process call by creating a direct call to
unpack_trees() in the same way 'git read-tree -mu HEAD' does. In
addition, provide an in-memory list of patterns so we can avoid
reading from the sparse-checkout file. This allows us to test a
proposed change to the file before writing to it.
An earlier version of this patch included a bug when the 'set' command
failed due to the "Sparse checkout leaves no entry on working directory"
error. It would not rollback the index.lock file, so the replay of the
old sparse-checkout specification would fail. A test in t1091 now
covers that scenario.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When a large repository has many sparse-checkout patterns, the
process for updating the skip-worktree bits can take long enough
that a user gets confused why nothing is happening. Update the
clear_ce_flags() method to write progress.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The sparse-checkout feature in "cone mode" can use the fact that
the recursive patterns are "connected" to the root via parent
patterns to decide if a directory is entirely contained in the
sparse-checkout or entirely removed.
In these cases, we can skip hashing the paths within those
directories and simply set the skipworktree bit to the correct
value.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The parent and recursive patterns allowed by the "cone mode"
option in sparse-checkout are restrictive enough that we
can avoid using the regex parsing. Everything is based on
prefix matches, so we can use hashsets to store the prefixes
from the sparse-checkout file. When checking a path, we can
strip path entries from the path and check the hashset for
an exact match.
As a test, I created a cone-mode sparse-checkout file for the
Linux repository that actually includes every file. This was
constructed by taking every folder in the Linux repo and creating
the pattern pairs here:
/$folder/
!/$folder/*/
This resulted in a sparse-checkout file sith 8,296 patterns.
Running 'git read-tree -mu HEAD' on this file had the following
performance:
core.sparseCheckout=false: 0.21 s (0.00 s)
core.sparseCheckout=true: 3.75 s (3.50 s)
core.sparseCheckoutCone=true: 0.23 s (0.01 s)
The times in parentheses above correspond to the time spent
in the first clear_ce_flags() call, according to the trace2
performance traces.
While this example is contrived, it demonstrates how these
patterns can slow the sparse-checkout feature.
Helped-by: Eric Wong <e@80x24.org>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When Git updates the working directory with the sparse-checkout
feature enabled, the unpack_trees() method calls clear_ce_flags()
to update the skip-wortree bits on the cache entries. This
check can be expensive, depending on the patterns used.
Add trace2 regions around the method, including some flag
information, so we can get granular performance data during
experiments. This data will be used to measure improvements
to the pattern-matching algorithms for sparse-checkout.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The index might be aware that a file hasn't modified via fsmonitor, but
unpack-trees did not pay attention to it and checked via ie_match_stat
which can be inefficient on certain filesystems. This significantly slows
down commands that run oneway_merge, like checkout and reset --hard.
This patch makes oneway_merge check whether a file is considered
unchanged through fsmonitor and skips ie_match_stat on it. unpack-trees
also now correctly copies over fsmonitor validity state from the source
index. Finally, for correctness, we force a refresh of fsmonitor state in
tweak_fsmonitor.
After this change, commands like stash (that use reset --hard
internally) go from 8s or more to ~2s on a 250k file repository on a
mac.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Kevin Willford <Kevin.Willford@microsoft.com>
Signed-off-by: Utsav Shah <utsav@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Found with "git grep '^#include ' '*.c' | sort | uniq -d".
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The internal code originally invented for ".gitignore" processing
got reshuffled and renamed to make it less tied to "excluding" and
stress more that it is about "matching", as it has been reused for
things like sparse checkout specification that want to check if a
path is "included".
* ds/include-exclude:
unpack-trees: rename 'is_excluded_from_list()'
treewide: rename 'exclude' methods to 'pattern'
treewide: rename 'EXCL_FLAG_' to 'PATTERN_FLAG_'
treewide: rename 'struct exclude_list' to 'struct pattern_list'
treewide: rename 'struct exclude' to 'struct path_pattern'
|
|
Teach the lazy clone machinery that there can be more than one
promisor remote and consult them in order when downloading missing
objects on demand.
* cc/multi-promisor:
Move core_partial_clone_filter_default to promisor-remote.c
Move repository_format_partial_clone to promisor-remote.c
Remove fetch-object.{c,h} in favor of promisor-remote.{c,h}
remote: add promisor and partial clone config to the doc
partial-clone: add multiple remotes in the doc
t0410: test fetching from many promisor remotes
builtin/fetch: remove unique promisor remote limitation
promisor-remote: parse remote.*.partialclonefilter
Use promisor_remote_get_direct() and has_promisor_remote()
promisor-remote: use repository_format_partial_clone
promisor-remote: add promisor_remote_reinit()
promisor-remote: implement promisor_remote_get_direct()
Add initial support for many promisor remotes
fetch-object: make functions return an error code
t0410: remove pipes after git commands
|
|
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list' makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!
Now that this library is renamed to use 'struct pattern_list'
and 'struct pattern', we can now rename the method used by
the sparse-checkout feature to determine which paths should
appear in the working directory.
The method is_excluded_from_list() is only used by the
sparse-checkout logic in unpack-trees and list-objects-filter.
The confusing part is that it returned 1 for "excluded" (i.e.
it matches the list of exclusions) but that really manes that
the path matched the list of patterns for _inclusion_ in the
working directory.
Rename the method to be path_matches_pattern_list() and have
it return an explicit 'enum pattern_match_result'. Here, the
values MATCHED = 1, UNMATCHED = 0, and UNDECIDED = -1 agree
with the previous integer values. This shift allows future
consumers to better understand what the retur values mean,
and provides more type checking for handling those values.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list' makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!
It would be clearer to rename this entire library as a "pattern
matching" library, and the callers apply exclusion/inclusion
logic accordingly based on their needs.
This commit renames several methods defined in dir.h to make
more sense with the renamed 'struct exclude_list' to 'struct
pattern_list' and 'struct exclude' to 'struct path_pattern':
* last_exclude_matching() -> last_matching_pattern()
* parse_exclude() -> parse_path_pattern()
In addition, the word 'exclude' was replaced with 'pattern'
in the methods below:
* add_exclude_list()
* add_excludes_from_file_to_list()
* add_excludes_from_file()
* add_excludes_from_blob_to_list()
* add_exclude()
* clear_exclude_list()
A few methods with the word "exclude" remain. These will
be handled seperately. In particular, the method
"is_excluded()" is concretely about the .gitignore file
relative to a specific directory. This is the important
boundary between library and consumer: is_excluded() cares
about .gitignore, but is_excluded() calls
last_matching_pattern() to make that decision.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|