path: root/unpack-trees.c
Age | Commit message | Author | Files | Lines
2025-07-01 | object-store: rename files to "odb.{c,h}" | Patrick Steinhardt | 1 | -1/+1
In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15 | object-store: merge "object-store-ll.h" and "object-store.h" | Patrick Steinhardt | 1 | -1/+1
The "object-store-ll.h" header has been introduced to keep transitive header dependencies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-03 | unpack-trees.c: *.txt -> *.adoc fixes | Todd Zullinger | 1 | -1/+1
Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18 | progress: stop using `the_repository` | Patrick Steinhardt | 1 | -1/+3
Stop using `the_repository` in the "progress" subsystem by passing in a repository when initializing `struct progress`. Furthermore, store a pointer to the repository in that struct so that we can pass it to the trace2 API when logging information. Adjust callers accordingly by using `the_repository`. While there may be some callers that have a repository available in their context, this trivial conversion allows for easier verification and bubbles up the use of `the_repository` by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 | global: mark code units that generate warnings with `-Wsign-compare` | Patrick Steinhardt | 1 | -0/+1
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-07 | unpack-trees: detect mismatching number of cache-tree/index entries | Patrick Steinhardt | 1 | -0/+2
Same as the preceding commit, we unconditionally dereference the index's cache entries depending on the number of cache-tree entries, which can lead to a segfault when the cache-tree is corrupted. Fix this bug. This also makes t4058 pass with the leak sanitizer enabled. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
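The kind of guard this commit describes can be sketched as follows. This is only an illustration under stated assumptions: `sketch_index`, `sketch_cache_tree` and `sketch_range_is_valid()` are hypothetical, simplified stand-ins and not the structures or functions used in unpack-trees.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the index and a cache-tree node. */
struct sketch_index {
	size_t entry_nr;     /* number of entries actually present */
	const int *entries;  /* entry payloads (plain ints for brevity) */
};

struct sketch_cache_tree {
	int entry_count;     /* entries this node claims to cover (read from disk) */
};

/*
 * Return 0 if the claimed range [pos, pos + entry_count) fits inside the
 * index and -1 otherwise. A caller would check this before dereferencing
 * idx->entries[pos + i], so a corrupted count yields an error, not a crash.
 */
int sketch_range_is_valid(const struct sketch_index *idx,
			  const struct sketch_cache_tree *node, size_t pos)
{
	assert(idx && node);
	if (node->entry_count < 0)
		return -1;
	if (pos > idx->entry_nr)
		return -1;
	/* Compare against the remaining room to avoid overflow in pos + count. */
	if ((size_t)node->entry_count > idx->entry_nr - pos)
		return -1;
	return 0;
}
```

The comparison is written against the remaining room (`entry_nr - pos`) rather than as `pos + count <= entry_nr` so that an absurdly large on-disk count cannot wrap around.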
2024-10-07 | cache-tree: refactor verification to return error codes | Patrick Steinhardt | 1 | -3/+7
The function `cache_tree_verify()` will `BUG()` whenever it finds that the cache-tree extension of the index is corrupt. The function is thus inherently untestable because the resulting call to `abort()` will be detected by our testing framework and labelled an error. And rightfully so: it shouldn't ever be possible to hit bugs, as they should indicate a programming error rather than corruption of on-disk state. Refactor the function to instead return error codes. This also ensures that the function can be used e.g. by git-fsck(1) without the whole process dying. Furthermore, this refactoring plugs some memory leaks when returning early by creating a common exit path. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
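The pattern described here, returning an error code through a single exit path instead of aborting, might look roughly like the sketch below. The checksum logic and the name `sketch_verify()` are invented for illustration; this is not the body of `cache_tree_verify()`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical verifier: recomputes a byte sum over a private copy of the
 * data and compares it with the recorded value. Returns 0 on success and
 * -1 on corruption, so callers (for example a consistency checker) can
 * report the problem and keep running.
 */
int sketch_verify(const unsigned char *data, size_t len, unsigned recorded_sum)
{
	unsigned char *scratch;
	unsigned sum = 0;
	int ret = -1;
	size_t i;

	assert(data);
	scratch = malloc(len + 1);
	if (!scratch)
		return -1;
	memcpy(scratch, data, len);

	if (!len) {
		fprintf(stderr, "error: empty payload\n");
		goto out;
	}
	for (i = 0; i < len; i++)
		sum += scratch[i];
	if (sum != recorded_sum) {
		fprintf(stderr, "error: checksum mismatch (%u != %u)\n",
			sum, recorded_sum);
		goto out;
	}
	ret = 0;

out:
	free(scratch);  /* every early return releases the scratch buffer */
	return ret;
}
```

Because every failure jumps to `out`, the scratch buffer is released on all paths, which is the leak-plugging side effect the message mentions.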
2024-08-14 | unpack-trees: clear index when not propagating it | Patrick Steinhardt | 1 | -0/+2
When provided a pointer to a destination index, then `unpack_trees()` will end up copying its `o->internal.result` index into the provided pointer. In those cases it is thus not necessary to free the index, as we have transferred ownership of it. There are cases though where we do not end up transferring ownership of the memory, but `clear_unpack_trees_porcelain()` will never discard the index in that case and thus cause a memory leak. And right now it cannot do so in the first place because we have no indicator of whether we did or didn't transfer ownership of the index. Adapt the code to zero out the index in case we transfer its ownership. Like this, we can now unconditionally discard the index when being asked to clear the `unpack_trees_options`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
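A minimal sketch of the ownership rule described above, assuming simplified hypothetical types (`sketch_index`, `sketch_options`) rather than the real `struct index_state` and `struct unpack_trees_options`:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sketch_index {
	int *entries;
	size_t nr;
};

struct sketch_options {
	struct sketch_index result;  /* internal result, owned by the options */
	struct sketch_index *dst;    /* optional destination, may be NULL */
};

static void sketch_discard(struct sketch_index *idx)
{
	free(idx->entries);
	memset(idx, 0, sizeof(*idx));
}

/* Hand the result over to the destination, if the caller provided one. */
void sketch_finish(struct sketch_options *o)
{
	assert(o);
	if (o->dst) {
		sketch_discard(o->dst);
		*o->dst = o->result;                       /* ownership moves to dst */
		memset(&o->result, 0, sizeof(o->result));  /* forget our copy */
	}
}

/* Safe to call unconditionally: after a transfer there is nothing to free. */
void sketch_clear(struct sketch_options *o)
{
	assert(o);
	sketch_discard(&o->result);
}
```

Zeroing the source after the transfer is what makes the unconditional `sketch_clear()` safe: `free(NULL)` is a no-op, so no separate "did we transfer?" flag is needed.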
2024-06-14 | global: introduce `USE_THE_REPOSITORY_VARIABLE` macro | Patrick Steinhardt | 1 | -0/+2
Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit. For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "builtin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
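The opt-in idea can be illustrated in a single file as below. This is a sketch only: `SKETCH_USE_GLOBAL_REPO`, `sketch_repo` and both functions are hypothetical names, and the real macro guards declarations in Git's headers rather than a definition in the same file.

```c
#include <assert.h>

struct sketch_repo {
	const char *name;
};

/*
 * A code unit that still needs the global defines the macro first, the way
 * a source file would before including the header that declares it.
 */
#define SKETCH_USE_GLOBAL_REPO

#ifdef SKETCH_USE_GLOBAL_REPO
static struct sketch_repo sketch_global_repo = { "global" };
#endif

/* Preferred style: the repository is passed down explicitly. */
const char *sketch_repo_name(const struct sketch_repo *repo)
{
	assert(repo);
	return repo->name;
}

#ifdef SKETCH_USE_GLOBAL_REPO
/* Legacy style: only compiles in units that opted in via the macro. */
const char *sketch_legacy_name(void)
{
	return sketch_repo_name(&sketch_global_repo);
}
#endif
```

Removing the `#define` makes any remaining use of the global a compile error in that unit, which is how the absence of the macro can serve as evidence that the unit no longer depends on it.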
2024-05-17 | refs: refactor `resolve_gitlink_ref()` to accept a repository | Patrick Steinhardt | 1 | -1/+2
In `resolve_gitlink_ref()` we implicitly rely on `the_repository` to look up the submodule ref store. Now that we can look up submodule ref stores for arbitrary repositories we can improve this function to instead accept a repository as parameter for which we want to resolve the gitlink. Do so and adjust callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10 | Merge branch 'cw/prelim-cleanup' | Junio C Hamano | 1 | -1/+1
Shuffle some bits across headers and sources to prepare for libification effort.
* cw/prelim-cleanup:
  parse: separate out parsing functions from config.h
  config: correct bad boolean env value error message
  wrapper: reduce scope of remove_or_warn()
  hex-ll: separate out non-hash-algo functions
2023-09-29 | parse: separate out parsing functions from config.h | Calvin Wan | 1 | -1/+1
The files config.{h,c} contain functions that have to do with parsing, but not config. In order to further reduce all-in-one headers, separate out functions in config.c that do not operate on config into its own file, parse.h, and update the include directives in the .c files that need only such functions accordingly. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-31 | tree-walk: reduce stack size for recursive functions | Jeff King | 1 | -2/+7
The traverse_trees() and traverse_trees_recursive() functions call each other recursively. In a deep tree, this can result in running out of stack space and crashing. There's obviously going to be some limit here based on available stack, but the problem is exacerbated by a few large structs, many of which we over-allocate. For example, in traverse_trees() we store a name_entry and tree_desc_x per tree, both of which contain an object_id (which is now 32 bytes). And we allocate 8 of them (from MAX_TRAVERSE_TREES), even though many traversals will only look at 1 or 2. Interestingly, we used to allocate these on the heap, prior to 8dd40c0472 (traverse_trees(): use stack array for name entries, 2020-01-30). That commit was trying to simplify away allocation size computations, and naively assumed that the sizes were small enough not to matter. And they don't in normal cases, but on my stock Debian system I see a crash running "git archive" on a tree with ~3600 entries. That's deep enough we wouldn't see it in practice, but probably shallow enough that we'd prefer not to make it a hard limit. Especially because other systems may have even smaller stacks. We can replace these stack variables with a few malloc invocations. This reduces the stack sizes for the two functions from 1128 and 752 bytes, respectively, down to 40 and 92 bytes. That allows a depth of ~13000 on my machine (the improvement isn't in linear proportion because my numbers don't count the size of parameters and other function overhead). The possible downsides are: 1. We now have to remember to free(). But both functions have an easy single exit (and already had to clean up other bits anyway). 2. The extra malloc()/free() overhead might be measurable. I tested this by setting up a 3000-depth tree with a single blob and running "git archive" on it. After switching to the heap, it consistently runs 2-3% faster! Presumably this is because the 1K+ of wasted stack space penalized memory caches. 
On a more real-world case like linux.git, the speed difference isn't measurable at all, simply because most trees aren't that deep and there's so much other work going on (like accessing the objects themselves). So the improvement I saw should be taken as evidence that we're not making anything worse, but isn't really that interesting on its own. The main motivation here is that we're now less likely to run out of stack space and crash. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
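The stack-to-heap change described in this commit can be sketched as follows. The struct layout, the constant and the function `sketch_traverse()` are hypothetical and much simpler than `traverse_trees()`; the point is only that each recursion level keeps a pointer on the stack and allocates exactly as many entries as it needs.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MAX_TREES 8

/* Hypothetical per-tree scratch entry; the 32-byte id makes it fairly large. */
struct sketch_entry {
	unsigned char oid[32];
	const char *path;
	unsigned mode;
};

/*
 * Recurse `depth` levels with `n` trees per level. The scratch array lives
 * on the heap, so a stack frame holds one pointer instead of
 * SKETCH_MAX_TREES entries. Returns the depth reached, or -1 if an
 * allocation fails.
 */
int sketch_traverse(int depth, int n)
{
	struct sketch_entry *entries;
	int ret = 0;

	assert(depth >= 0 && n > 0 && n <= SKETCH_MAX_TREES);
	entries = calloc((size_t)n, sizeof(*entries));  /* only as many as needed */
	if (!entries)
		return -1;

	entries[0].mode = (unsigned)depth;
	if (depth > 0) {
		ret = sketch_traverse(depth - 1, n);
		if (ret >= 0)
			ret++;
	}

	free(entries);  /* single exit path, so the buffer is always released */
	return ret;
}
```

The trade-off is the one the message lists: one `calloc()`/`free()` pair per level in exchange for much smaller frames, with a single exit keeping the cleanup simple.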
2023-06-21 | object-store-ll.h: split this header out of object-store.h | Elijah Newren | 1 | -1/+1
The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch:

    $ git grep -h include..object-store | sort | uniq -c
          2 #include "object-store.h"
        129 #include "object-store-ll.h"

Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21 | cache.h: remove this no-longer-used header | Elijah Newren | 1 | -1/+1
Since this header showed up in some places besides just #include statements, update/clean-up/remove those other places as well. Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got away with violating the rule that all files must start with an include of git-compat-util.h (or a short-list of alternate headers that happen to include it first). This change exposed the violation and caused it to stop building correctly; fix it by having it include git-compat-util.h first, as per policy. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21 | read-cache*.h: move declarations for read-cache.c functions from cache.h | Elijah Newren | 1 | -0/+1
For the functions defined in read-cache.c, move their declarations from cache.h to a new header, read-cache-ll.h. Also move some related inline functions from cache.h to read-cache.h. The purpose of the read-cache-ll.h/read-cache.h split is that about 70% of the sites don't need the inline functions and the extra headers they include. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21 | name-hash.h: move declarations for name-hash.c from cache.h | Elijah Newren | 1 | -0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-09 | Merge branch 'en/header-split-cache-h-part-2' | Junio C Hamano | 1 | -0/+1
More header clean-up.
* en/header-split-cache-h-part-2: (22 commits)
  reftable: ensure git-compat-util.h is the first (indirect) include
  diff.h: reduce unnecessary includes
  object-store.h: reduce unnecessary includes
  commit.h: reduce unnecessary includes
  fsmonitor: reduce includes of cache.h
  cache.h: remove unnecessary headers
  treewide: remove cache.h inclusion due to previous changes
  cache,tree: move basic name compare functions from read-cache to tree
  cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c
  hash-ll.h: split out of hash.h to remove dependency on repository.h
  tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h
  dir.h: move DTYPE defines from cache.h
  versioncmp.h: move declarations for versioncmp.c functions from cache.h
  ws.h: move declarations for ws.c functions from cache.h
  match-trees.h: move declarations for match-trees.c functions from cache.h
  pkt-line.h: move declarations for pkt-line.c functions from cache.h
  base85.h: move declarations for base85.c functions from cache.h
  copy.h: move declarations for copy.c functions from cache.h
  server-info.h: move declarations for server-info.c functions from cache.h
  packfile.h: move pack_window and pack_entry from cache.h
  ...
2023-04-25 | Merge branch 'en/header-split-cache-h' | Junio C Hamano | 1 | -0/+2
Header clean-up.
* en/header-split-cache-h: (24 commits)
  protocol.h: move definition of DEFAULT_GIT_PORT from cache.h
  mailmap, quote: move declarations of global vars to correct unit
  treewide: reduce includes of cache.h in other headers
  treewide: remove double forward declaration of read_in_full
  cache.h: remove unnecessary includes
  treewide: remove cache.h inclusion due to pager.h changes
  pager.h: move declarations for pager.c functions from cache.h
  treewide: remove cache.h inclusion due to editor.h changes
  editor: move editor-related functions and declarations into common file
  treewide: remove cache.h inclusion due to object.h changes
  object.h: move some inline functions and defines from cache.h
  treewide: remove cache.h inclusion due to object-file.h changes
  object-file.h: move declarations for object-file.c functions from cache.h
  treewide: remove cache.h inclusion due to git-zlib changes
  git-zlib: move declarations for git-zlib functions from cache.h
  treewide: remove cache.h inclusion due to object-name.h changes
  object-name.h: move declarations for object-name.c functions from cache.h
  treewide: remove unnecessary cache.h inclusion
  treewide: be explicit about dependence on mem-pool.h
  treewide: be explicit about dependence on oid-array.h
  ...
2023-04-24 | symlinks.h: move declarations for symlinks.c functions from cache.h | Elijah Newren | 1 | -0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 | treewide: be explicit about dependence on advice.h | Elijah Newren | 1 | -0/+1
Dozens of files made use of advice functions, without explicitly including advice.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include advice.h if they are using it. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 | treewide: be explicit about dependence on trace.h & trace2.h | Elijah Newren | 1 | -0/+1
Dozens of files made use of trace and trace2 functions, without explicitly including trace.h or trace2.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include trace.h or trace2.h if they are using them. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-06 | Merge branch 'en/header-split-cleanup' | Junio C Hamano | 1 | -0/+3
Split key function and data structure definitions out of cache.h to new header files and adjust the users.
* en/header-split-cleanup:
  csum-file.h: remove unnecessary inclusion of cache.h
  write-or-die.h: move declarations for write-or-die.c functions from cache.h
  treewide: remove cache.h inclusion due to setup.h changes
  setup.h: move declarations for setup.c functions from cache.h
  treewide: remove cache.h inclusion due to environment.h changes
  environment.h: move declarations for environment.c functions from cache.h
  treewide: remove unnecessary includes of cache.h
  wrapper.h: move declarations for wrapper.c functions from cache.h
  path.h: move function declarations for path.c functions from cache.h
  cache.h: remove expand_user_path()
  abspath.h: move absolute path functions from cache.h
  environment: move comment_line_char from cache.h
  treewide: remove unnecessary cache.h inclusion from several sources
  treewide: remove unnecessary inclusion of gettext.h
  treewide: be explicit about dependence on gettext.h
  treewide: remove unnecessary cache.h inclusion from a few headers
2023-04-06 | Merge branch 'ab/remove-implicit-use-of-the-repository' | Junio C Hamano | 1 | -1/+1
Code clean-up around the use of the_repository.
* ab/remove-implicit-use-of-the-repository:
  libs: use "struct repository *" argument, not "the_repository"
  post-cocci: adjust comments for recent repo_* migration
  cocci: apply the "revision.h" part of "the_repository.pending"
  cocci: apply the "rerere.h" part of "the_repository.pending"
  cocci: apply the "refs.h" part of "the_repository.pending"
  cocci: apply the "promisor-remote.h" part of "the_repository.pending"
  cocci: apply the "packfile.h" part of "the_repository.pending"
  cocci: apply the "pretty.h" part of "the_repository.pending"
  cocci: apply the "object-store.h" part of "the_repository.pending"
  cocci: apply the "diff.h" part of "the_repository.pending"
  cocci: apply the "commit.h" part of "the_repository.pending"
  cocci: apply the "commit-reach.h" part of "the_repository.pending"
  cocci: apply the "cache.h" part of "the_repository.pending"
  cocci: add missing "the_repository" macros to "pending"
  cocci: sort "the_repository" rules by header
  cocci: fix incorrect & verbose "the_repository" rules
  cocci: remove dead rule from "the_repository.pending.cocci"
2023-04-04 | Merge branch 'js/split-index-fixes' | Junio C Hamano | 1 | -0/+2
The index files could become corrupt under certain conditions when the split-index feature was in use, especially together with fsmonitor; this has been corrected.
* js/split-index-fixes:
  unpack-trees: take care to propagate the split-index flag
  fsmonitor: avoid overriding `cache_changed` bits
  split-index; stop abusing the `base_oid` to strip the "link" extension
  split-index & fsmonitor: demonstrate a bug
2023-04-04 | Merge branch 'ab/remove-implicit-use-of-the-repository' into en/header-split-cache-h | Junio C Hamano | 1 | -1/+1
* ab/remove-implicit-use-of-the-repository:
  libs: use "struct repository *" argument, not "the_repository"
  post-cocci: adjust comments for recent repo_* migration
  cocci: apply the "revision.h" part of "the_repository.pending"
  cocci: apply the "rerere.h" part of "the_repository.pending"
  cocci: apply the "refs.h" part of "the_repository.pending"
  cocci: apply the "promisor-remote.h" part of "the_repository.pending"
  cocci: apply the "packfile.h" part of "the_repository.pending"
  cocci: apply the "pretty.h" part of "the_repository.pending"
  cocci: apply the "object-store.h" part of "the_repository.pending"
  cocci: apply the "diff.h" part of "the_repository.pending"
  cocci: apply the "commit.h" part of "the_repository.pending"
  cocci: apply the "commit-reach.h" part of "the_repository.pending"
  cocci: apply the "cache.h" part of "the_repository.pending"
  cocci: add missing "the_repository" macros to "pending"
  cocci: sort "the_repository" rules by header
  cocci: fix incorrect & verbose "the_repository" rules
  cocci: remove dead rule from "the_repository.pending.cocci"
2023-03-28 | cocci: apply the "promisor-remote.h" part of "the_repository.pending" | Ævar Arnfjörð Bjarmason | 1 | -1/+1
Apply the part of "the_repository.pending.cocci" pertaining to "promisor-remote.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-27 | unpack-trees: take care to propagate the split-index flag | Johannes Schindelin | 1 | -0/+2
When copying the `split_index` structure from one index structure to another, we need to propagate the `SPLIT_INDEX_ORDERED` flag, too, if it is set, otherwise Git might forget to write the shared index when that is actually needed. It just so _happens_ that in many instances when `unpack_trees()` is called, the result causes the shared index to be written anyway, but there are edge cases when that is not so. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
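A rough sketch of the fix described above. It assumes, for illustration only, that the flag is one bit in a per-index "changed" bitmask; `SKETCH_SPLIT_ORDERED`, `sketch_split`, `sketch_index` and `sketch_share_split()` are hypothetical names, not Git's.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SPLIT_ORDERED (1u << 0)  /* hypothetical flag bit */

struct sketch_split {
	int refcount;
};

struct sketch_index {
	struct sketch_split *split;
	unsigned changed;  /* bitmask of pending-change flags */
};

/* Share the split structure with dst and carry the ordering flag along. */
void sketch_share_split(struct sketch_index *dst, const struct sketch_index *src)
{
	assert(dst && src);
	dst->split = src->split;
	if (dst->split)
		dst->split->refcount++;
	/* Propagate only this flag; leave unrelated bits in dst untouched. */
	if (src->changed & SKETCH_SPLIT_ORDERED)
		dst->changed |= SKETCH_SPLIT_ORDERED;
}
```

Copying the pointer without the flag is the failure mode the commit describes: the destination would look unchanged and a later write could skip the shared index.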
2023-03-21 | setup.h: move declarations for setup.c functions from cache.h | Elijah Newren | 1 | -0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 | environment.h: move declarations for environment.c functions from cache.h | Elijah Newren | 1 | -0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 | treewide: be explicit about dependence on gettext.h | Elijah Newren | 1 | -0/+1
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-17 | Merge branch 'en/header-cleanup' | Junio C Hamano | 1 | -0/+1
Code clean-up to clarify the rule that "git-compat-util.h" must be the first to be included.
* en/header-cleanup:
  diff.h: remove unnecessary include of object.h
  Remove unnecessary includes of builtin.h
  treewide: replace cache.h with more direct headers, where possible
  replace-object.h: move read_replace_refs declaration from cache.h to here
  object-store.h: move struct object_info from cache.h
  dir.h: refactor to no longer need to include cache.h
  object.h: stop depending on cache.h; make cache.h depend on object.h
  ident.h: move ident-related declarations out of cache.h
  pretty.h: move has_non_ascii() declaration from commit.h
  cache.h: remove dependence on hex.h; make other files include it explicitly
  hex.h: move some hex-related declarations from cache.h
  hash.h: move some oid-related declarations from cache.h
  alloc.h: move ALLOC_GROW() functions from cache.h
  treewide: remove unnecessary cache.h includes in source files
  treewide: remove unnecessary cache.h includes
  treewide: remove unnecessary git-compat-util.h includes in headers
  treewide: ensure one of the appropriate headers is sourced first
2023-02-27 | unpack-trees: add usage notices around df_conflict_entry | Elijah Newren | 1 | -0/+2
Avoid making users believe they need to initialize df_conflict_entry to something (as happened with other output only fields before) with a quick comment and a small sanity check. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-27 | unpack-trees: special case read-tree debugging as internal usage | Elijah Newren | 1 | -11/+11
builtin/read-tree.c has some special functionality explicitly designed for debugging unpack-trees.[ch]. Associated with that are two fields that no other external caller would or should use. Mark these as internal to unpack-trees, but allow builtin/read-tree to read or write them for this special case. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-27 | unpack-trees: rewrap a few overlong lines from previous patch | Elijah Newren | 1 | -7/+9
The previous patch made many lines a little longer, resulting in four becoming a bit too long. They were left as-is for the previous patch to facilitate reviewers verifying that we were just adding "internal." in a bunch of places, but rewrap them now. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-27 | unpack-trees: mark fields only used internally as internal | Elijah Newren | 1 | -79/+80
Continue the work from the previous patch by finding additional fields which are only used internally but not yet explicitly marked as such, and include them in the internal fields struct. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-27 | unpack_trees: start splitting internal fields from public API | Elijah Newren | 1 | -20/+20
This just splits the two fields already marked as internal-only into a separate internal struct. Future commits will add more fields that were meant to be internal-only but were not explicitly marked as such to the same struct. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-27 | sparse-checkout: avoid using internal API of unpack-trees, take 2 | Elijah Newren | 1 | -0/+1
Commit 2f6b1eb794 ("cache API: add a "INDEX_STATE_INIT" macro/function, add release_index()", 2023-01-12) mistakenly added some initialization of a member of unpack_trees_options that was intended to be internal-only. This initialization should be done within update_sparsity() instead. Note that while o->result is mostly meant for unpack_trees() and update_sparsity() mostly operates without o->result, check_ok_to_remove() does consult it so we need to ensure it is properly initialized. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-27 | sparse-checkout: avoid using internal API of unpack-trees | Elijah Newren | 1 | -7/+11
struct unpack_trees_options has the following field and comment: struct pattern_list *pl; /* for internal use */ Despite the internal-use comment, commit e091228e17 ("sparse-checkout: update working directory in-process", 2019-11-21) started setting this field from an external caller. At the time, the only way around that would have been to modify unpack_trees() to take an extra pattern_list argument, and there are a lot of callers of that function. However, when we split update_sparsity() off as a separate function, with sparse-checkout being the sole caller, the need to update other callers went away. Fix this API problem by adding a pattern_list argument to update_sparsity() and stop setting the internal o.pl field directly. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-27 | unpack-trees: clean up some flow control | Elijah Newren | 1 | -4/+3
The update_sparsity() function was introduced in commit 7af7a25853 ("unpack-trees: add a new update_sparsity() function", 2020-03-27). Prior to that, unpack_trees() was used, but that had a few bugs because the needs of the caller were different, and different enough that unpack_trees() could not easily be modified to handle both use cases. update_sparsity() was written by copying unpack_trees(), streamlining it, and then modifying it in the needed ways; that implementation detail still shows through in leftover vestiges in both functions that are no longer needed. Clean them up. In particular:
  * update_sparsity() allows a pattern list to be passed in, but unpack_trees() never should use a different pattern list. Add a check and a BUG() if this gets violated.
  * update_sparsity() has a check early on that will BUG() if o->skip_sparse_checkout is set; as such, there's no need to check for that condition again later in the code. We can simply remove the check and its corresponding goto label.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-27 | unpack-trees: heed requests to overwrite ignored files | Elijah Newren | 1 | -1/+1
When a directory exists but has only ignored files within it and we are trying to switch to a branch that has a file where that directory is, the behavior depends upon --[no]-overwrite-ignore. If the user wants to --overwrite-ignore (the default), then we should delete the ignored file and directory and switch to the new branch. The code to handle this in verify_clean_subdirectory() in unpack-trees tried to handle this via paying attention to the exclude_per_dir setting of the internal dir field. This came from commit c81935348b ("Fix switching to a branch with D/F when current branch has file D.", 2007-03-15), which pre-dated 039bc64e88 ("core.excludesfile clean-up", 2007-11-14), and thus did not pay attention to ignore patterns from other relevant files. Change it to use setup_standard_excludes() so that it is also aware of excludes specified in other locations. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23 | cache.h: remove dependence on hex.h; make other files include it explicitly | Elijah Newren | 1 | -0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-17 | treewide: always have a valid "index_state.repo" member | Ævar Arnfjörð Bjarmason | 1 | -1/+1
When the "repo" member was added to "the_index" in [1] the repo_read_index() was made to populate it, but the unpopulated "the_index" variable didn't get the same treatment. Let's do that in initialize_the_repository() when we set it up, and likewise for all of the current callers that initialized an empty "struct index_state". This simplifies code that needs to deal with "the_index" or a custom "struct index_state"; we no longer need to second-guess this part of the "index_state" deep in the stack. A recent example of such second-guessing is the "istate->repo ? istate->repo : the_repository" code in [2]. We can now simply use "istate->repo". We're doing this by making use of the INDEX_STATE_INIT() macro (and corresponding function) added in [3], which now have mandatory "repo" arguments. Because we now call index_state_init() in repository.c's initialize_the_repository() we don't need to handle the case where we have a "repo->index" whose "repo" member doesn't match the "repo" we're setting up, i.e. the "Complete the double-reference" code in repo_read_index() being altered here. That logic was originally added in [1], and was working around the lack of what we now have in initialize_the_repository(). For "fsmonitor-settings.c" we can remove the initialization of a NULL "r" argument to "the_repository". This was added back in [4], and was needed at the time for callers that would pass us the "r" from an "istate->repo". Before this change such a change to "fsmonitor-settings.c" would segfault all over the test suite (e.g. in t0002-gitfile.sh). This change has wider eventual implications for "fsmonitor-settings.c". The reason the other lazy loading behavior in it is required (starting with "if (!r->settings.fsmonitor) ...") is because of the previously passed "r" being "NULL". I have other local changes on top of this which move its configuration reading to "prepare_repo_settings()" in "repo-settings.c", as we could now start to rely on it being called for our "r". 
But let's leave all of that for now, and narrowly remove this particular part of the lazy-loading. 1. 1fd9ae517c4 (repository: add repo reference to index_state, 2021-01-23) 2. ee1f0c242ef (read-cache: add index.skipHash config option, 2023-01-06) 3. 2f6b1eb794e (cache API: add a "INDEX_STATE_INIT" macro/function, add release_index(), 2023-01-12) 4. 1e0ea5c4316 (fsmonitor: config settings are repository-specific, 2022-03-25) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-16cache API: add a "INDEX_STATE_INIT" macro/function, add release_index()Ævar Arnfjörð Bjarmason1-1/+1
Hopefully in some not so distant future, we'll get advantages from always initializing the "repo" member of the "struct index_state". To make that easier let's introduce an initialization macro & function. The various ad-hoc initialization of the structure can then be changed over to it, and we can remove the various "0" assignments in discard_index() in favor of calling index_state_init() at the end. While not strictly necessary, let's also change the CALLOC_ARRAY() of various "struct index_state *" to use an ALLOC_ARRAY() followed by index_state_init() instead. We're then adding the release_index() function and converting some callers (including some of these allocations) over to it if they either won't need to use their "struct index_state" again, or are just about to call index_state_init(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-13sparse-index API: BUG() out on NULL ensure_full_index()Ævar Arnfjörð Bjarmason1-1/+2
Make the ensure_full_index() function stricter, and have it only accept a non-NULL "struct index_state". This function (and this behavior) was added in [1]. The only reason it needed to be this lax was due to interaction with repo_index_has_changes(). See the addition of that code in [2]. The other reason for why this was needed dates back to interaction with code added in [3]. In [4] we started calling ensure_full_index() in unpack_trees(), but the caller added in 34110cd4e39 wants to pass us a NULL "dst_index". Let's instead do the NULL check in unpack_trees() itself. 1. 4300f8442a2 (sparse-index: implement ensure_full_index(), 2021-03-30) 2. 0c18c059a15 (read-cache: ensure full index, 2021-04-01) 3. 34110cd4e39 (Make 'unpack_trees()' have a separate source and destination index, 2008-03-06) 4. 6863df35503 (unpack-trees: ensure full index, 2021-03-30) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
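The stricter contract described above can be sketched in isolation. This is a minimal illustration, not git's code: "struct toy_index", toy_ensure_full() and toy_unpack() are hypothetical stand-ins, and assert() is used where git would call BUG().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for git's "struct index_state". */
struct toy_index {
	int sparse; /* non-zero while the index holds sparse directories */
};

/*
 * Strict helper: the caller must pass a non-NULL index.
 * git would BUG() out here; this sketch uses assert() instead.
 */
static void toy_ensure_full(struct toy_index *istate)
{
	assert(istate != NULL);
	istate->sparse = 0;
}

/*
 * A caller that may legitimately have no destination index does
 * the NULL check itself instead of relying on a lax helper.
 * Returns 1 if an index was expanded, 0 if there was none.
 */
static int toy_unpack(struct toy_index *dst)
{
	if (dst)
		toy_ensure_full(dst);
	return dst != NULL;
}
```

Keeping the check in the one caller that can see a NULL index means every other caller gets an immediate failure if it passes NULL by mistake.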
2023-01-05Merge branch 'ab/no-more-git-global-super-prefix'Junio C Hamano1-10/+14
Stop using "git --super-prefix" and narrow the scope of its use to the submodule--helper. * ab/no-more-git-global-super-prefix: read-tree: add "--super-prefix" option, eliminate global submodule--helper: convert "{update,clone}" to their own "--super-prefix" submodule--helper: convert "status" to its own "--super-prefix" submodule--helper: convert "sync" to its own "--super-prefix" submodule--helper: convert "foreach" to its own "--super-prefix" submodule--helper: don't use global --super-prefix in "absorbgitdirs" submodule.c & submodule--helper: pass along "super_prefix" param read-tree + fetch tests: test failing "--super-prefix" interaction submodule absorbgitdirs tests: add missing "Migrating git..." tests
2022-12-26read-tree: add "--super-prefix" option, eliminate globalÆvar Arnfjörð Bjarmason1-10/+14
The "--super-prefix" option to "git" was initially added in [1] for use with "ls-files"[2], and shortly thereafter "submodule--helper"[3] and "grep"[4]. It wasn't until [5] that "read-tree" made use of it. At the time [5] made sense, but since then we've made "ls-files" recurse in-process in [6], "grep" in [7], and finally "submodule--helper" in the preceding commits. Let's also remove it from "read-tree", which allows us to remove the option to "git" itself. We can do this because the only remaining user of it is the submodule API, which will now invoke "read-tree" with its new "--super-prefix" option. It will only do so when the "submodule_move_head()" function is called. That "submodule_move_head()" function was then only invoked by "read-tree" itself, but now rather than setting an environment variable to pass "--super-prefix" between cmd_read_tree() we: - Set a new "super_prefix" in "struct unpack_trees_options". The "super_prefixed()" function in "unpack-trees.c" added in [5] will now use this, rather than get_super_prefix() looking up the environment variable we set earlier in the same process. - Add the same field to the "struct checkout", which is only needed to ferry the "super_prefix" in the "struct unpack_trees_options" all the way down to the "entry.c" callers of "submodule_move_head()". Those calls which used the super prefix all originated in "cmd_read_tree()". The only other caller is the "unlink_entry()" caller in "builtin/checkout.c", which now passes a "NULL". 1. 74866d75793 (git: make super-prefix option, 2016-10-07) 2. e77aa336f11 (ls-files: optionally recurse into submodules, 2016-10-07) 3. 89c86265576 (submodule helper: support super prefix, 2016-12-08) 4. 0281e487fd9 (grep: optionally recurse into submodules, 2016-12-16) 5. 3d415425c7b (unpack-trees: support super-prefix option, 2017-01-17) 6. 188dce131fa (ls-files: use repository object, 2017-06-22) 7. 
f9ee2fcdfa0 (grep: recurse in-process using 'struct repository', 2017-08-02) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-15git: remove duplicate includesSeija Kijin1-1/+0
These files are already included; we do not need to include them again. Signed-off-by: Seija Kijin <doremylover123@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-10unpack-trees: add 'skip_cache_tree_update' optionVictoria Dye1-1/+2
Add a (disabled by default) option to skip the 'cache_tree_update()' at the end of 'unpack_trees()'. In many cases, this cache tree update is redundant because the caller of 'unpack_trees()' immediately follows it with 'prime_cache_tree()', rebuilding the entire cache tree from scratch. While these operations aren't the most expensive part of operations like 'git reset', the duplicate calls still create a minor unnecessary slowdown. Introduce an option for callers to skip the 'cache_tree_update()' in 'unpack_trees()' if it is redundant (that is, if 'prime_cache_tree()' is called afterwards). At the moment, no 'unpack_trees()' callers use the new option; they will be updated in subsequent patches. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
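The effect of such an opt-out flag can be sketched with a toy model that only counts cache tree rebuilds. Only the field name 'skip_cache_tree_update' comes from the commit message; "struct toy_unpack_opts", toy_unpack_trees() and toy_reset() are hypothetical and do no real unpacking.

```c
#include <assert.h>

/* Hypothetical subset of the options; only the flag name is from the commit. */
struct toy_unpack_opts {
	unsigned skip_cache_tree_update : 1;
};

/* Toy unpack step: bumps *rebuilds where cache_tree_update() would run. */
static int toy_unpack_trees(const struct toy_unpack_opts *o, int *rebuilds)
{
	/* ... the actual tree unpacking is elided in this sketch ... */
	if (!o->skip_cache_tree_update)
		(*rebuilds)++;
	return 0;
}

/* Toy caller that always rebuilds the cache tree afterwards. */
static int toy_reset(int skip)
{
	struct toy_unpack_opts o = { 0 };
	int rebuilds = 0;

	o.skip_cache_tree_update = skip ? 1 : 0;
	toy_unpack_trees(&o, &rebuilds);
	rebuilds++; /* stands in for prime_cache_tree() */
	return rebuilds;
}
```

With the flag left at its default the toy caller pays for two rebuilds; with it set, only the final one remains.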
2022-09-09Merge branch 'vd/sparse-reset-checkout-fixes'Junio C Hamano1-2/+2
Segfault fix-up to an earlier fix to the topic to teach "git reset" and "git checkout" to work better in a sparse checkout. * vd/sparse-reset-checkout-fixes: unpack-trees: fix sparse directory recursion check
2022-09-02unpack-trees: fix sparse directory recursion checkVictoria Dye1-2/+2
Ensure 'is_sparse_directory_entry()' receives a valid 'name_entry *' if one exists in the list of tree(s) being unpacked in 'unpack_callback()'. Currently, 'is_sparse_directory_entry()' is called with the first 'name_entry' in the 'names' list of entries on 'unpack_callback()'. However, this entry may be empty even when other elements of 'names' are not (such as when switching from an orphan branch back to a "normal" branch). As a result, 'is_sparse_directory_entry()' could incorrectly indicate that a sparse directory is *not* actually sparse because the name of the index entry does not match the (empty) 'name_entry' path. Fix the issue by using the existing 'name_entry *p' value in 'unpack_callback()', which points to the first non-empty entry in 'names'. Because 'p' is 'const', also update 'is_sparse_directory_entry()'s 'name_entry *' argument to be 'const'. Finally, add a regression test case. Reported-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
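The difference between using the first slot of 'names' and the first non-empty slot can be sketched as follows. This is a simplified illustration: "struct toy_name_entry" and both helpers are hypothetical, and a zero mode is assumed to mark an empty slot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a tree entry; mode 0 is assumed to mean "empty". */
struct toy_name_entry {
	const char *path;
	unsigned mode;
};

/* Return the first non-empty entry in names[0..n), or NULL if all are empty. */
static const struct toy_name_entry *
toy_first_non_empty(const struct toy_name_entry *names, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (names[i].mode)
			return &names[i];
	return NULL;
}

/* Does the sparse directory entry "ce_name" (with trailing '/') name "p"? */
static int toy_is_sparse_dir_entry(const char *ce_name,
				   const struct toy_name_entry *p)
{
	size_t len;

	if (!p || !p->mode)
		return 0; /* an empty entry can never match */
	len = strlen(p->path);
	return strlen(ce_name) == len + 1 &&
	       !strncmp(ce_name, p->path, len) &&
	       ce_name[len] == '/';
}
```

When the first tree contributes nothing for a path, checking names[0] reports "not sparse" even though a later tree names the same directory; checking the first non-empty entry avoids that.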
2022-08-26Merge branch 'vd/sparse-reset-checkout-fixes' into maintJunio C Hamano1-10/+96
Fixes to sparse index compatibility work for "reset" and "checkout" commands. source: <pull.1312.v3.git.1659985672.gitgitgadget@gmail.com> * vd/sparse-reset-checkout-fixes: unpack-trees: unpack new trees as sparse directories cache.h: create 'index_name_pos_sparse()' oneway_diff: handle removed sparse directories checkout: fix nested sparse directory diff in sparse index
2022-08-18Merge branch 'vd/sparse-reset-checkout-fixes'Junio C Hamano1-10/+96
Fixes to sparse index compatibility work for "reset" and "checkout" commands. * vd/sparse-reset-checkout-fixes: unpack-trees: unpack new trees as sparse directories cache.h: create 'index_name_pos_sparse()' oneway_diff: handle removed sparse directories checkout: fix nested sparse directory diff in sparse index
2022-08-08unpack-trees: unpack new trees as sparse directoriesVictoria Dye1-10/+96
If 'unpack_single_entry()' is unpacking a new directory tree (that is, one not already present in the index) into a sparse index, unpack the tree as a sparse directory rather than traversing its contents and unpacking each file individually. This helps keep the sparse index as collapsed as possible in cases such as 'git reset --hard' restoring an outside-of-cone directory removed with 'git rm -r --sparse'. Without this patch, 'unpack_single_entry()' will only unpack a directory into the index as a sparse directory (rather than traversing into it and unpacking its files one-by-one) if an entry with the same name already exists in the index. This patch allows sparse directory unpacking without a matching index entry when the following conditions are met: 1. the directory's path is outside the sparse cone, and 2. there are no children of the directory in the index If a directory meets these requirements (as determined by 'is_new_sparse_dir()'), 'unpack_single_entry()' unpacks the sparse directory index entry and propagates the decision back up to 'unpack_callback()' to prevent unnecessary tree traversal into the unpacked directory. Reported-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
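The two conditions listed above can be sketched as a predicate over a list of index paths. This is only an illustration of the rule, not the body of 'is_new_sparse_dir()': the function names are hypothetical, the index is modeled as a plain array of path strings, and cone membership is passed in as a flag rather than computed from sparse-checkout patterns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if any path in the toy index starts with "dir" (trailing '/'). */
static int toy_has_children(const char **paths, size_t nr, const char *dir)
{
	size_t len = strlen(dir);

	for (size_t i = 0; i < nr; i++)
		if (!strncmp(paths[i], dir, len))
			return 1;
	return 0;
}

/*
 * A directory may be unpacked as a new sparse directory only if it
 * is outside the cone and nothing beneath it is in the index yet.
 */
static int toy_is_new_sparse_dir(const char **paths, size_t nr,
				 const char *dir, int in_cone)
{
	return !in_cone && !toy_has_children(paths, nr, dir);
}
```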
2022-07-14checkout: fix two bugs on the final count of updated entriesMatheus Tavares1-1/+1
At the end of `git checkout <pathspec>`, we get a message informing how many entries were updated in the working tree. However, this number can be inaccurate for two reasons: 1) Delayed entries currently get counted twice. 2) Failed entries are included in the count. The first problem happens because the counter is first incremented before inserting the entry in the delayed checkout queue, and once again when finish_delayed_checkout() calls checkout_entry(). And the second happens because the counter is incremented too early in checkout_entry(), before the entry was in fact checked out. Fix that by moving the count increment further down in the call stack and removing the duplicate increment on delayed entries. Note that we have to keep a per-entry reference for the counter (both on parallel checkout and delayed checkout) because not all entries are always accumulated at the same counter. See checkout_worktree(), at builtin/checkout.c for an example. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-10Merge branch 'jh/builtin-fsmonitor-part3'Junio C Hamano1-0/+1
More fsmonitor--daemon. * jh/builtin-fsmonitor-part3: (30 commits) t7527: improve implicit shutdown testing in fsmonitor--daemon fsmonitor--daemon: allow --super-prefix argument t7527: test Unicode NFC/NFD handling on MacOS t/lib-unicode-nfc-nfd: helper prereqs for testing unicode nfc/nfd t/helper/hexdump: add helper to print hexdump of stdin fsmonitor: on macOS also emit NFC spelling for NFD pathname t7527: test FSMonitor on case insensitive+preserving file system fsmonitor: never set CE_FSMONITOR_VALID on submodules t/perf/p7527: add perf test for builtin FSMonitor t7527: FSMonitor tests for directory moves fsmonitor: optimize processing of directory events fsm-listen-darwin: shutdown daemon if worktree root is moved/renamed fsm-health-win32: force shutdown daemon if worktree root moves fsm-health-win32: add polling framework to monitor daemon health fsmonitor--daemon: stub in health thread fsmonitor--daemon: rename listener thread related variables fsmonitor--daemon: prepare for adding health thread fsmonitor--daemon: cd out of worktree root fsm-listen-darwin: ignore FSEvents caused by xattr changes on macOS unpack-trees: initialize fsmonitor_has_run_once in o->result ...
2022-06-03Merge branch 'ds/sparse-sparse-checkout'Junio C Hamano1-0/+4
"sparse-checkout" learns to work well with the sparse-index feature. * ds/sparse-sparse-checkout: sparse-checkout: integrate with sparse index p2000: add test for 'git sparse-checkout [add|set]' sparse-index: complete partial expansion sparse-index: partially expand directories sparse-checkout: --no-sparse-index needs a full index cache-tree: implement cache_tree_find_path() sparse-index: introduce partially-sparse indexes sparse-index: create expand_index() t1092: stress test 'git sparse-checkout set' t1092: refactor 'sparse-index contents' test
2022-05-26unpack-trees: initialize fsmonitor_has_run_once in o->resultJeff Hostetler1-0/+1
Initialize `o->result.fsmonitor_has_run_once` based upon value in `o->src_index->fsmonitor_has_run_once` to prevent a second fsmonitor query during the tree traversal and possibly getting a skewed view of the working directory. The checkout code has already talked to the fsmonitor and the traversal is updating the index as it traverses, so there is no need to query the fsmonitor. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-23sparse-checkout: integrate with sparse indexDerrick Stolee1-0/+4
When modifying the sparse-checkout definition, the sparse-checkout builtin calls update_sparsity() to modify the SKIP_WORKTREE bits of all cache entries in the index. Before, we needed the index to be fully expanded in order to ensure we had the full list of files necessary that match the new patterns. Insert a call to reset_sparse_directories() that expands sparse directories that are within the new pattern list, but only far enough that every necessary file path now exists as a cache entry. The remaining logic within update_sparsity() will modify the SKIP_WORKTREE bits appropriately. This allows us to disable command_requires_full_index within the sparse-checkout builtin. Add tests that demonstrate that we are not expanding to a full index unnecessarily. We can see the improved performance in the p2000 test script:

Test                           HEAD~1            HEAD
------------------------------------------------------------------------
2000.24: git ... (sparse-v3)   2.14(1.55+0.58)   1.57(1.03+0.53) -26.6%
2000.25: git ... (sparse-v4)   2.20(1.62+0.57)   1.58(0.98+0.59) -28.2%

These reductions of 26-28% are small compared to most examples, but the time is dominated by writing a new copy of the base repository to the worktree and then deleting it again. The fact that the previous index expansion was such a large portion of the time is telling how important it is to complete this sparse index integration. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-10unpack-trees: preserve index sparsityVictoria Dye1-0/+6
When unpacking trees, set the default sparsity of the resultant index based on repo settings and 'is_sparse_index_allowed()'. Normally, when executing 'unpack_trees', the output index is marked sparse when (and only when) it unpacks a sparse directory. However, an index may be "sparse" even if it contains no sparse directories - when all files fall inside the sparse-checkout definition or otherwise have SKIP_WORKTREE disabled. Therefore, the output index may be marked "full" even when it is "sparse", resulting in unnecessary 'ensure_full_index' calls when writing to disk. Avoid this by setting the "default" index sparsity to match what is expected for the repository. As a consequence of this fix, the (non-merge) 'read-tree' performed when applying a stash with untracked files no longer expands the index. Update the corresponding test in 't1092'. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-17Revert "unpack-trees: improve performance of next_cache_entry"Victoria Dye1-17/+6
This reverts commit f2a454e0a5 (unpack-trees: improve performance of next_cache_entry, 2021-11-29). The "hint" value was originally needed to improve performance in 'git reset -- <pathspec>' caused by 'cache_bottom' lagging behind its correct value when using a sparse index. The 'cache_bottom' tracking has since been corrected, removing the need for an additional "pseudo-cache_bottom" tracking variable. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-17unpack-trees: increment cache_bottom for sparse directoriesVictoria Dye1-8/+8
Correct tracking of the 'cache_bottom' for cases where sparse directories are present in the index. BACKGROUND ---------- The 'unpack_trees_options.cache_bottom' is a variable that tracks the in-progress "bottom" of the cache as 'unpack_trees()' iterates through the contents of the index. Most importantly, this value informs the sequential return values of 'next_cache_entry()' which, in the "diff cache" usage of 'unpack_callback()', are either unpacked as-is or are passed into the diff machinery. The 'cache_bottom' is intended to track the position of the first entry in the index that has not yet been diffed or unpacked. It is advanced in two main ways: either it is incremented when an index entry is marked as "used" (in 'mark_ce_used()'), indicating that it was unpacked or diffed, or when a directory is unpacked, in which case it is increased by an amount equaling the number of index entries inside that tree. In 17a1bb570b (unpack-trees: preserve cache_bottom, 2021-07-14), it was identified that sparse directories posed a problem to the above 'cache_bottom' advancement logic - because a sparse directory was both an index entry that could be "used" and a directory that can be unpacked, the 'cache_bottom' would be incremented too many times. To solve this problem, the 'mark_ce_used()' advancement of 'cache_bottom' was skipped for sparse directories. INCORRECT CACHE_BOTTOM TRACKING ------------------------------- Skipping the 'cache_bottom' advancement for sparse directories in 'mark_ce_used()' breaks down in two cases: 1. When the 'unpack_trees()' operation is *not* a "cache diff" (because the directory contents-based incrementing of 'cache_bottom' does not happen). 2. When a cache diff is performed with a pathspec (because 'unpack_index_entry()' will unpack a sparse directory not matched by the pathspec without performing the directory contents-based increment). 
The former luckily does not appear to affect 'git' behavior, likely because 'cache_bottom' is largely unused (non-"cache diff" 'unpack_trees()' uses 'find_index_entry()' - rather than 'next_cache_entry()' - to find the index entries to unpack). The latter, however, causes 'cache_bottom' to "lag behind" its intended position by an amount equal to the number of sparse directories unpacked so far with 'unpack_index_entry()'. If a repository is structured such that any sparse directories are ordered lexicographically *after* any pathspec-matching directories, though, this issue won't present any adverse behavior. This was the case with the 't1092-sparse-checkout-compatibility.sh' tests before the addition of the 'before/' sparse directory (ordered *before* the in-cone 'deep/' directory), therefore sidestepping the issue. Once the 'before/' directory was added, though, 'cache_bottom' began to lag behind its intended position, causing 'next_cache_entry()' to return index entries it had already processed and, ultimately, an incorrect diff. CORRECTING CACHE_BOTTOM ----------------------- The problems observed in 't1092' come from 'cache_bottom' lagging behind in cases where the cache tree-based advancement doesn't occur. To solve this, then, the fix in 17a1bb570b is "reversed"; rather than skipping 'cache_bottom' advancement in 'mark_ce_used()', we skip the directory contents-based advancement for sparse directories. Now, every index entry can be accounted for in 'cache_bottom': * if you're working with a single index entry, 'cache_bottom' is incremented in 'mark_ce_used()' * if you're working with a directory that contains index entries (but is not one itself), 'cache_bottom' is incremented by the number of entries in that directory. Finally, change the 'test_expect_failure' tests in 't1092' failing due to this bug back to 'test_expect_success'. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
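The two ways 'cache_bottom' advances can be sketched with a toy model. This is a simplification under stated assumptions, not the code in unpack-trees.c: "struct toy_entry", "struct toy_state" and the three helpers are hypothetical, and the index is reduced to an array of "unpacked" flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical index entry: only tracks whether it was unpacked or diffed. */
struct toy_entry {
	int unpacked;
};

struct toy_state {
	struct toy_entry *cache;
	size_t nr;
	size_t bottom; /* first position that may still need processing */
};

/* Single entry: mark it used, then slide bottom past used entries. */
static void toy_mark_used(struct toy_state *s, size_t pos)
{
	s->cache[pos].unpacked = 1;
	if (pos == s->bottom)
		while (s->bottom < s->nr && s->cache[s->bottom].unpacked)
			s->bottom++;
}

/* Directory handled as a whole: skip the entries it contains. */
static void toy_skip_directory(struct toy_state *s, size_t nr_entries)
{
	s->bottom += nr_entries;
}

/* Return the next unprocessed position at or after bottom, or -1. */
static long toy_next_entry(const struct toy_state *s)
{
	for (size_t i = s->bottom; i < s->nr; i++)
		if (!s->cache[i].unpacked)
			return (long)i;
	return -1;
}
```

In this model each index position is accounted for by exactly one of the two helpers, which is the invariant the commit message restores; applying both to the same sparse directory entry would advance the bottom too far, and applying neither would leave it lagging.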
2022-03-16Merge branch 'vd/sparse-read-tree'Junio C Hamano1-8/+139
"git read-tree" has been made to be aware of the sparse-index feature. * vd/sparse-read-tree: read-tree: make three-way merge sparse-aware read-tree: make two-way merge sparse-aware read-tree: narrow scope of index expansion for '--prefix' read-tree: integrate with sparse index read-tree: expand sparse checkout test coverage read-tree: explicitly disallow prefixes with a leading '/' status: fix nested sparse directory diff in sparse index sparse-index: prevent repo root from becoming sparse
2022-03-01read-tree: make three-way merge sparse-awareVictoria Dye1-8/+26
Enable use of 'merged_sparse_dir' in 'threeway_merge'. As with two-way merge, the contents of each conflicted sparse directory are merged without referencing the index, avoiding sparse index expansion. As with two-way merge, the 't/t1092-sparse-checkout-compatibility.sh' test 'read-tree --merge with edit/edit conflicts in sparse directories' confirms that three-way merges with edit/edit changes (both with and without conflicts) inside a sparse directory result in the correct index state or error message. To ensure the index is not unnecessarily expanded, add three-way merge cases to 'sparse index is not expanded: read-tree'. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01read-tree: make two-way merge sparse-awareVictoria Dye1-0/+75
Enable two-way merge with 'git read-tree' without expanding the sparse index. When in a sparse index, a two-way merge will trivially succeed as long as there are not changes to the same sparse directory in multiple trees (i.e., sparse directory-level "edit-edit" conflicts). If there are such conflicts, the merge will fail despite the possibility that individual files could merge cleanly. In order to resolve these "edit-edit" conflicts, "conflicted" sparse directories are - rather than rejected - merged by traversing their associated trees by OID. For each child of the sparse directory: 1. Files are merged as normal (see Documentation/git-read-tree.txt for details). 2. Subdirectories are treated as sparse directories and merged in 'twoway_merge'. If there are no conflicts, they are merged according to the rules in Documentation/git-read-tree.txt; otherwise, the subdirectory is recursively traversed and merged. This process allows sparse directories to be individually merged at the necessary depth *without* expanding a full index. The 't/t1092-sparse-checkout-compatibility.sh' test 'read-tree --merge with edit/edit conflicts in sparse directories' tests two-way merges with 1) changes inside sparse directories that do not conflict and 2) changes that do conflict (with the correct file(s) reported in the error message). Additionally, add two-way merge cases to 'sparse index is not expanded: read-tree' to confirm that the index is not expanded regardless of whether edit/edit conflicts are present in a sparse directory. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01read-tree: narrow scope of index expansion for '--prefix'Victoria Dye1-0/+38
When 'git read-tree' is provided with a prefix, expand the index only if the prefix is equivalent to a sparse directory or contained within one. If the index is not expanded in these cases, 'ce_in_traverse_path' will indicate that the relevant sparse directory is not in the prefix/traverse path, skipping past it and not unpacking the appropriate tree(s). If the prefix is in-cone, its sparse subdirectories (if any) will be traversed correctly without index expansion. The behavior of 'git read-tree' with prefixes 1) inside of cone, 2) equal to a sparse directory, and 3) inside a sparse directory are all tested as part of the 't/t1092-sparse-checkout-compatibility.sh' test 'read-tree --prefix', ensuring that the sparse index case works the way it did prior to this change as well as matching non-sparse index sparse-checkout. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-14unpack-trees: fix accidental loss of user changesElijah Newren1-1/+3
For sparse-checkouts, we don't want unpack-trees to error out on files that are missing from the worktree, so there has traditionally been logic to make it skip the verify_uptodate() check for these. Unfortunately, it was skipping the verify_uptodate() check for files that were expected to *become* SKIP_WORKTREE. For files that were not already SKIP_WORKTREE, that can cause us to later delete the file in apply_sparse_checkout(). Only skip the check for files that were already SKIP_WORKTREE as well to avoid lightly discarding important changes users may have made to files. Note 1: unpack-trees.c is already a bit complex, and the logic around CE_SKIP_WORKTREE and CE_NEW_SKIP_WORKTREE in that file is no exception. I also tried just replacing CE_NEW_SKIP_WORKTREE with CE_SKIP_WORKTREE in the verify_uptodate() check instead of checking for both flags, and found that it also fixed this bug and passed all the tests. I also attempted to devise a few testcases that might trip either variant of my fix and was unable to find any problems. It may be that just checking CE_SKIP_WORKTREE is a better fix, but I'm not sure. I thought it was a bit safer to strictly reduce the number of cases where we skip the up-to-date check rather than just toggling which kind of cases skip it, and thus went with the current variant of the fix. Note 2: I also wondered if verify_absent() might have a similar bug, but despite my attempts to try to devise a testcase that would trigger such a thing, I couldn't find any problematic testcases. Thus, this patch makes no attempt to apply similar changes to verify_absent() and verify_absent_if_directory(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
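The narrowing of the skip condition can be sketched with two predicates. The TOY_* bit values and function names are hypothetical (git defines its own CE_* flags); the sketch only contrasts "will become skip-worktree" with "already is and will remain skip-worktree".

```c
#include <assert.h>

/* Hypothetical bit values standing in for git's CE_* flags. */
#define TOY_SKIP_WORKTREE     (1u << 0) /* entry is skipped right now */
#define TOY_NEW_SKIP_WORKTREE (1u << 1) /* entry will be skipped afterwards */

/* Old behavior as described: skip the check whenever the entry will be skipped. */
static int toy_skip_check_old(unsigned flags)
{
	return (flags & TOY_NEW_SKIP_WORKTREE) != 0;
}

/* Fixed behavior as described: skip only if it is already skipped as well. */
static int toy_skip_check_new(unsigned flags)
{
	return (flags & TOY_SKIP_WORKTREE) &&
	       (flags & TOY_NEW_SKIP_WORKTREE);
}
```

An entry that is present in the worktree today but is about to leave the sparse checkout is the case that differs: the old predicate skips the up-to-date check for it, the new one does not.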
2022-01-05Merge branch 'en/keep-cwd'Junio C Hamano1-6/+24
Many git commands that deal with working tree files try to remove a directory that becomes empty (i.e. "git switch" from a branch that has the directory to another branch that does not would attempt to remove all files in the directory and the directory itself). This drops users into an unfamiliar situation if the command was run in a subdirectory that becomes subject to removal due to the command. The commands have been taught to keep an empty directory if it is the directory they were started in to avoid surprising users. * en/keep-cwd: t2501: simplify the tests since we can now assume desired behavior dir: new flag to remove_dir_recurse() to spare the original_cwd dir: avoid incidentally removing the original_cwd in remove_path() stash: do not attempt to remove startup_info->original_cwd rebase: do not attempt to remove startup_info->original_cwd clean: do not attempt to remove startup_info->original_cwd symlinks: do not include startup_info->original_cwd in dir removal unpack-trees: add special cwd handling unpack-trees: refuse to remove startup_info->original_cwd setup: introduce startup_info->original_cwd t2501: add various tests for removing the current working directory
2021-12-15Merge branch 'ds/sparse-deep-pattern-checkout-fix'Junio C Hamano1-6/+8
The sparse-index/sparse-checkout feature had a bug in its use of the matching code to determine which path is in or outside the sparse checkout patterns. * ds/sparse-deep-pattern-checkout-fix: unpack-trees: use traverse_path instead of name t1092: add deeper changes during a checkout
2021-12-10Merge branch 'vd/sparse-reset'Junio C Hamano1-6/+17
Various operating modes of "git reset" have been made to work better with the sparse index. * vd/sparse-reset: unpack-trees: improve performance of next_cache_entry reset: make --mixed sparse-aware reset: make sparse-aware (except --mixed) reset: integrate with sparse index reset: expand test coverage for sparse checkouts sparse-index: update command for expand/collapse test reset: preserve skip-worktree bit in mixed reset reset: rename is_missing to !is_in_reset_tree
2021-12-09unpack-trees: add special cwd handlingElijah Newren1-2/+11
When running commands such as `git reset --hard` from a subdirectory, if that subdirectory is in the way of adding needed files, bail with an error message. Note that this change looks kind of like it duplicates the new lines of code from the previous commit in verify_clean_subdirectory(). However, when we are preserving untracked files, we would rather any error messages about untracked files being in the way take precedence over error messages about a subdirectory that happens to be the_original_cwd being in the way. But in the UNPACK_RESET_OVERWRITE_UNTRACKED case, there is no untracked checking to be done, so we simply add a special case near the top of verify_absent_1. Acked-by: Derrick Stolee <stolee@gmail.com> Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09unpack-trees: refuse to remove startup_info->original_cwdElijah Newren1-4/+13
In the past, when a directory needs to be removed to make room for a file, we have always errored out when that directory contains any untracked (but not ignored) files. Add an extra condition on that: also error out if the directory is the current working directory we inherited from our parent process. Acked-by: Derrick Stolee <stolee@gmail.com> Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-06unpack-trees: use traverse_path instead of nameDerrick Stolee1-6/+8
The sparse_dir_matches_path() method compares a cache entry that is a sparse directory entry against a 'struct traverse_info *info' and a 'struct name_entry *p' to see if the cache entry has exactly the right name for those other inputs. This method was introduced in 523506d (unpack-trees: unpack sparse directory entries, 2021-07-14), but included a significant mistake. The path comparisons used 'info->name' instead of 'info->traverse_path'. Since 'info->name' only stores a single tree entry name while 'info->traverse_path' stores the full path from root, this method does not work when 'info' is in a subdirectory of a directory. Replacing the right strings and their corresponding lengths makes the method work properly. The previous change included a failing test that exposes this issue. That test now passes. The critical detail is that as we go deep into unpack_trees(), the logic for merging a sparse directory entry with a tree entry during 'git checkout' relies on this sparse_dir_matches_path() in order to avoid calling traverse_trees_recursive() during unpack_callback() in this hunk: if (!is_sparse_directory_entry(src[0], names, info) && traverse_trees_recursive(n, dirmask, mask & ~dirmask, names, info) < 0) { return -1; } For deep paths, the short-circuit never occurred and traverse_trees_recursive() was being called incorrectly and that was causing other strange issues. Specifically, the error message from the now-passing test previously included this: error: Your local changes to the following files would be overwritten by checkout: deep/deeper1/deepest2/a deep/deeper1/deepest3/a Please commit your changes or stash them before you switch branches. Aborting These messages occurred because the 'current' cache entry in twoway_merge() was showing as NULL because the index did not contain entries for the paths contained within the sparse directory entries. We instead had 'oldtree' given as the entry at HEAD and 'newtree' as the entry in the target tree. 
This led to reject_merge() listing these paths. Now that sparse_dir_matches_path() works the same for deep paths as it does for shallow depths, the rest of the logic kicks in to properly handle modifying the sparse directory entries as designed. Reported-by: Gustave Granroth <gus.gran@gmail.com> Reported-by: Mike Marcelais <michmarc@exchange.microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
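The difference between comparing against a single entry name and comparing against the full path can be sketched in isolation. The following is a simplified model, not Git's implementation: the function name sparse_dir_name_matches() and its plain-string parameters are hypothetical stand-ins for the cache entry name, 'info->traverse_path', and the tree entry name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Simplified model (assumption, not Git's code): a sparse directory
 * entry is named like "deep/deeper1/" with a trailing slash.
 * `parent_path` stands in for the full path of the directory being
 * traversed ("" at the root, otherwise ending in '/'), and
 * `entry_name` is the single tree entry name being visited.
 */
static bool sparse_dir_name_matches(const char *ce_name,
                                    const char *parent_path,
                                    const char *entry_name)
{
    size_t ce_len, parent_len, entry_len;

    assert(ce_name && parent_path && entry_name);
    ce_len = strlen(ce_name);
    parent_len = strlen(parent_path);
    entry_len = strlen(entry_name);

    /* The entry must be exactly: parent_path + entry_name + "/" */
    if (ce_len != parent_len + entry_len + 1)
        return false;
    if (strncmp(ce_name, parent_path, parent_len) != 0)
        return false;
    if (strncmp(ce_name + parent_len, entry_name, entry_len) != 0)
        return false;
    return ce_name[ce_len - 1] == '/';
}
```

With the full parent path "deep/", the entry "deep/deeper1/" matches the tree entry "deeper1". If only a partial prefix is supplied, the lengths cannot agree and the match fails, which mirrors why the short-circuit never triggered for deep paths.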
2021-11-29 | unpack-trees: improve performance of next_cache_entry | Victoria Dye | 1 | -6/+17
To find the first non-unpacked cache entry, `next_cache_entry` iterates through the index, starting at `cache_bottom`. The performance of this in full indexes is helped by `cache_bottom` advancing with each invocation of `mark_ce_used` (called by `unpack_index_entry`). However, the presence of sparse directories can prevent the `cache_bottom` from advancing in a sparse index case, effectively forcing `next_cache_entry` to search from the beginning of the index each time it is called. The `cache_bottom` must be preserved for the sparse index (see 17a1bb570b (unpack-trees: preserve cache_bottom, 2021-07-14)). Therefore, to retain the benefit `cache_bottom` provides in non-sparse index cases, a separate `hint` position indicates the first position `next_cache_entry` should search, updated on each execution with a new position. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
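The hint idea can be illustrated with a small stand-alone scan. This is a sketch under assumptions: struct entry, its `unpacked` field, and next_unhandled() are hypothetical simplifications, with `bottom` playing the role of `cache_bottom` and `hint` the caller-owned resume position.

```c
#include <assert.h>

/* Hypothetical stand-in for an index entry. */
struct entry {
    int unpacked; /* nonzero once the entry has been handled */
};

/*
 * Return the position of the first unhandled entry, searching from
 * whichever of `bottom` and `*hint` is larger, or -1 if none is left.
 * `*hint` is advanced so that the next call does not rescan entries
 * that were already skipped.
 */
static int next_unhandled(const struct entry *entries, int nr,
                          int bottom, int *hint)
{
    int pos = bottom;

    assert(entries && hint && bottom >= 0);
    if (*hint > pos)
        pos = *hint;

    for (; pos < nr; pos++) {
        if (!entries[pos].unpacked) {
            *hint = pos + 1;
            return pos;
        }
    }
    *hint = pos;
    return -1;
}
```

Even when `bottom` stays at 0, as it may with sparse directory entries, successive calls resume where the previous one stopped instead of starting over.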
2021-10-25 | Merge branch 'ab/unpack-trees-leakfix' | Junio C Hamano | 1 | -1/+2
Leakfix. * ab/unpack-trees-leakfix: sequencer: fix a memory leak in do_reset() sequencer: add a "goto cleanup" to do_reset() unpack-trees: don't leak memory in verify_clean_subdirectory()
2021-10-13 | Merge branch 'en/removing-untracked-fixes' | Junio C Hamano | 1 | -5/+56
Various fixes in code paths that move untracked files away to make room. * en/removing-untracked-fixes: Documentation: call out commands that nuke untracked files/directories Comment important codepaths regarding nuking untracked files/dirs unpack-trees: avoid nuking untracked dir in way of locally deleted file unpack-trees: avoid nuking untracked dir in way of unmerged file Change unpack_trees' 'reset' flag into an enum Remove ignored files by default when they are in the way unpack-trees: make dir an internal-only struct unpack-trees: introduce preserve_ignored to unpack_trees_options read-tree, merge-recursive: overwrite ignored files by default checkout, read-tree: fix leak of unpack_trees_options.dir t2500: add various tests for nuking untracked files
2021-10-07 | unpack-trees: don't leak memory in verify_clean_subdirectory() | Ævar Arnfjörð Bjarmason | 1 | -1/+2
Fix two different but related memory leaks in verify_clean_subdirectory(). We leaked both the "pathbuf" if read_directory() returned non-zero, and we never cleaned up our own "struct dir_struct" either. * "pathbuf": When the read_directory() call followed by the free(pathbuf) was added in c81935348be (Fix switching to a branch with D/F when current branch has file D., 2007-03-15) we didn't bother to free() before we called die(). But when this code was later libified in 203a2fe1170 (Allow callers of unpack_trees() to handle failure, 2008-02-07) we started to leak as we returned data to the caller. This fixes that memory leak, which can be observed under SANITIZE=leak with e.g. the "t1001-read-tree-m-2way.sh" test. * "struct dir_struct": We've leaked the dir_struct ever since this code was added back in c81935348be. When that commit was written there wasn't an equivalent of dir_clear(). Since it was added in 270be816049 (dir.c: provide clear_directory() for reclaiming dir_struct memory, 2013-01-06) we've omitted freeing the memory allocated here. This memory leak could also be observed under SANITIZE=leak and the "t1001-read-tree-m-2way.sh" test. This makes all the tests in "t1001-read-tree-m-2way.sh" pass under "GIT_TEST_PASSING_SANITIZE_LEAK=true"; we'd previously die in tests 25, 26 & 28. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 | unpack-trees: avoid nuking untracked dir in way of locally deleted file | Elijah Newren | 1 | -0/+3
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 | unpack-trees: avoid nuking untracked dir in way of unmerged file | Elijah Newren | 1 | -4/+31
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 | Change unpack_trees' 'reset' flag into an enum | Elijah Newren | 1 | -1/+9
Traditionally, unpack_trees_options->reset was used to signal that it was okay to delete any untracked files in the way. This was used by `git read-tree --reset`, but then started appearing in other places as well. However, many of the other uses should not be deleting untracked files in the way. Change this value to an enum so that a value of 1 (i.e. "true") can be split into two: UNPACK_RESET_PROTECT_UNTRACKED and UNPACK_RESET_OVERWRITE_UNTRACKED. In order to catch accidental misuses (i.e. where folks call it the way they traditionally used to), define the special enum value of UNPACK_RESET_INVALID = 1 which will trigger a BUG(). Modify existing callers so that `read-tree --reset`, `reset --hard`, and `checkout --force` continue using the UNPACK_RESET_OVERWRITE_UNTRACKED logic, while other callers, including: am; checkout without --force; stash (though currently dead code; reset always had a value of 0); and numerous callers from rebase/sequencer to reset_head(), will use the new UNPACK_RESET_PROTECT_UNTRACKED value. Also, note that it has been reported that 'git checkout <treeish> <pathspec>' currently also allows overwriting untracked files[1]. That case should also be fixed, but it does not use unpack_trees() and thus is outside the scope of the current changes. [1] https://lore.kernel.org/git/15dad590-087e-5a48-9238-5d2826950506@gmail.com/ Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
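The "poison value" technique described above can be sketched as follows. The three names UNPACK_RESET_INVALID, UNPACK_RESET_PROTECT_UNTRACKED and UNPACK_RESET_OVERWRITE_UNTRACKED come from the commit message; the enum tag, the UNPACK_RESET_NONE name for 0, and both helper functions are assumptions made for illustration. Where Git would call BUG(), this sketch just reports the value as unusable.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch only; the tag name and UNPACK_RESET_NONE are assumed. */
enum unpack_reset_mode {
    UNPACK_RESET_NONE = 0,     /* the old "false" stays valid */
    UNPACK_RESET_INVALID = 1,  /* the old "true" is deliberately poisoned */
    UNPACK_RESET_PROTECT_UNTRACKED,
    UNPACK_RESET_OVERWRITE_UNTRACKED
};

/* Hypothetical check: a legacy caller passing 1 is caught here. */
static bool reset_mode_is_usable(enum unpack_reset_mode mode)
{
    return mode != UNPACK_RESET_INVALID;
}

/* Hypothetical helper: only one mode permits clobbering untracked files. */
static bool may_overwrite_untracked(enum unpack_reset_mode mode)
{
    assert(reset_mode_is_usable(mode));
    return mode == UNPACK_RESET_OVERWRITE_UNTRACKED;
}
```

Because the two meaningful modes take the values after 1, any caller that still assigns a plain boolean "true" lands on the invalid value rather than silently selecting one of the behaviors.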
2021-09-27 | unpack-trees: make dir an internal-only struct | Elijah Newren | 1 | -2/+5
Avoid accidental misuse or confusion over ownership by clearly making unpack_trees_options.dir an internal-only variable. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 | unpack-trees: introduce preserve_ignored to unpack_trees_options | Elijah Newren | 1 | -0/+10
Currently, every caller of unpack_trees() that wants to ensure ignored files are overwritten by default needs to: * allocate unpack_trees_options.dir * flip the DIR_SHOW_IGNORED flag in unpack_trees_options.dir->flags * call setup_standard_excludes AND then after the call to unpack_trees() needs to * call dir_clear() * deallocate unpack_trees_options.dir That's a fair amount of boilerplate, and every caller uses identical code. Make this easier by instead introducing a new boolean value where the default value (0) does what we want so that new callers of unpack_trees() automatically get the appropriate behavior. And move all the handling of unpack_trees_options.dir into unpack_trees() itself. While preserve_ignored = 0 is the behavior we feel is the appropriate default, we defer fixing commands to use the appropriate default until a later commit. So, this commit introduces several locations where we manually set preserve_ignored=1. This makes it clear where code paths were previously preserving ignored files when they should not have been; a future commit will flip these to instead use a value of 0 to get the behavior we want. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-20 | Merge branch 'ds/sparse-index-ignored-files' | Junio C Hamano | 1 | -3/+5
In cone mode, the sparse-index code path learned to remove ignored files (like build artifacts) outside the sparse cone, allowing the entire directory outside the sparse cone to be removed, which is especially useful when the sparse patterns change. * ds/sparse-index-ignored-files: sparse-checkout: clear tracked sparse dirs sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag attr: be careful about sparse directories sparse-checkout: create helper methods sparse-index: use WRITE_TREE_MISSING_OK sparse-index: silently return when cache tree fails unpack-trees: fix nested sparse-dir search sparse-index: silently return when not using cone-mode patterns t7519: rewrite sparse index test
2021-09-10 | Merge branch 'ab/retire-advice-config' | Junio C Hamano | 1 | -9/+9
Code clean up to migrate callers from older advice_config[] based API to newer advice_if_enabled() and advice_enabled() API. * ab/retire-advice-config: advice: move advice.graftFileDeprecated squashing to commit.[ch] advice: remove use of global advice_add_embedded_repo advice: remove read uses of most global `advice_` variables advice: add enum variants for missing advice variables
2021-09-07 | unpack-trees: fix nested sparse-dir search | Derrick Stolee | 1 | -3/+5
The iterated search in find_cache_entry() was recently modified to include a loop that searches backwards for a sparse directory entry that matches the given traverse_info and name_entry. However, the string comparison failed to actually concatenate those two strings, so this failed to find a sparse directory when it was not a top-level directory. This caused some errors in rare cases where a 'git checkout' spanned a diff that modified files within the sparse directory entry, but we could not correctly find the entry. Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-26 | checkout: make delayed checkout respect --quiet and --no-progress | Matheus Tavares | 1 | -1/+1
The 'Filtering contents...' progress report from delayed checkout is displayed even when checkout and clone are invoked with --quiet or --no-progress. Furthermore, it is displayed unconditionally, without first checking whether stdout is a tty. Let's fix these issues and also add some regression tests for the two code paths that currently use delayed checkout: unpack-trees.c:check_updates() and builtin/checkout.c:checkout_worktree(). Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 | advice: remove read uses of most global `advice_` variables | Ben Boeckel | 1 | -9/+9
In c4a09cc9ccb (Merge branch 'hw/advise-ng', 2020-03-25), a new API for accessing advice variables was introduced and deprecated `advice_config` in favor of a new array, `advice_setting`. This patch ports all but two uses which read the status of the global `advice_` variables over to the new `advice_enabled` API. We'll deal with advice_add_embedded_repo and advice_graft_file_deprecated separately. Signed-off-by: Ben Boeckel <mathstuf@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-04 | Merge branch 'ds/commit-and-checkout-with-sparse-index' | Junio C Hamano | 1 | -0/+11
"git checkout" and "git commit" learn to work without unnecessarily expanding sparse indexes. * ds/commit-and-checkout-with-sparse-index: unpack-trees: resolve sparse-directory/file conflicts t1092: document bad 'git checkout' behavior checkout: stop expanding sparse indexes sparse-index: recompute cache-tree commit: integrate with sparse-index p2000: compress repo names p2000: add 'git checkout -' test and decrease depth
2021-08-02 | Merge branch 'jt/bulk-prefetch' | Junio C Hamano | 1 | -19/+8
"git read-tree" had a codepath where blobs are fetched one-by-one from the promisor remote, which has been corrected to fetch in bulk. * jt/bulk-prefetch: cache-tree: prefetch in partial clone read-tree unpack-trees: refactor prefetching code
2021-07-23 | unpack-trees: refactor prefetching code | Jonathan Tan | 1 | -19/+8
Refactor the prefetching code in unpack-trees.c into its own function, because it will be used elsewhere in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 | unpack-trees: resolve sparse-directory/file conflicts | Derrick Stolee | 1 | -0/+11
When running unpack_trees() with a sparse index, we attempt to operate on the index without expanding the sparse directory entries. Thus, we operate by manipulating entire directories and passing them to the unpack function. In the case of the 'git checkout' command, this is the twoway_merge() function. There are several cases in twoway_merge() that handle different situations. One new one to add is the case of a directory/file conflict where the directory is sparse. Before the sparse index, such a conflict would appear as a list of file additions and deletions. Now, twoway_merge() initializes 'current', 'oldtree', and 'newtree' from src[0], src[1], and src[2], then sets 'oldtree' to NULL because it is equal to the df_conflict_entry. The way to determine that we have a directory/file conflict is to test that 'current' and 'newtree' disagree on being sparse directory entries. When we are in this case, we want to resolve the situation by calling merged_entry(). This allows replacing the 'current' entry with the 'newtree' entry. This is important for cases where we want to run 'git checkout' across the conflict and have the new HEAD represent the new file type at that path. The first NEEDSWORK comment dropped in t1092 demonstrates this necessary behavior. However, we still are in a confusing state when 'current' corresponds to a staged change within a sparse directory that is not present at HEAD. This should be atypical, because it requires adding a change outside of the sparse-checkout cone, but it is possible. Since we are unable to determine that this is a staged change within twoway_merge(), we cannot add a case to reject the merge at this point. I believe this is due to the use of df_conflict_entry in the place of 'oldtree' instead of using the value at HEAD, which would provide some perspective to this decision. Any change that would allow this differentiation for staged entries would need to involve information further up in unpack_trees(). 
That work should be done, sometime, because we are further confusing the behavior of a directory/file conflict when staging a change in the directory. The two cases 'checkout behaves oddly with df-conflict-?' in t1092 demonstrate that even without a sparse-checkout, Git is not consistent in its behavior. Neither of the two options seems correct, either. This change makes the sparse-index behave differently than the typical sparse-checkout case, but it does match the full checkout behavior in the df-conflict-2 case. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-14 | unpack-trees: unpack sparse directory entries | Derrick Stolee | 1 | -8/+99
During unpack_callback(), index entries are compared against tree entries. These are matched according to names and types. One goal is to decide if we should recurse into subtrees or simply operate on one index entry. In the case of a sparse-directory entry, we do not want to recurse into that subtree and instead simply compare the trees. In some cases, we might want to perform a merge operation on the entry, such as during 'git checkout <commit>' which wants to replace a sparse tree entry with the tree for that path at the target commit. We extend the logic within unpack_single_entry() to create a sparse-directory entry in this case, and then that is sent to call_unpack_fn(). There are some subtleties in this process. For instance, we need to update find_cache_entry() to allow finding a sparse-directory entry that exactly matches a given path. Use the new helper method sparse_dir_matches_path() for this. We also need to ignore conflict markers in the case that the entries correspond to directories and we already have a sparse directory entry. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-14 | unpack-trees: rename unpack_nondirectories() | Derrick Stolee | 1 | -7/+7
In the next change, we will use this method to unpack a sparse directory entry, so change the name to unpack_single_entry() so these entries apply. The new name reflects that we will not recurse into trees in order to resolve the conflicts. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-14 | unpack-trees: compare sparse directories correctly | Derrick Stolee | 1 | -1/+13
As we further integrate the sparse-index into unpack-trees, we need to ensure that we compare sparse directory entries correctly with other entries. This affects searching for an exact path as well as sorting index entries. Sparse directory entries contain the trailing directory separator. This is important for the sorting, in particular. Thus, within do_compare_entry() we stop using S_IFREG in all cases, since sparse directories should use S_IFDIR to indicate that the comparison should treat the entry name as a directory. Within compare_entry(), it first calls do_compare_entry() to check the leading portion of the name. When the input path is a directory name, we could match exactly already. Thus, we should return 0 if we have an exact string match on a sparse directory entry. The final check is a length comparison between the strings. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
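Why the file-versus-directory distinction changes ordering can be seen in a small comparison routine. This is a simplified model in the spirit of tree-order name comparison, not a copy of do_compare_entry(); the function tree_order_compare() and its boolean flags are hypothetical, standing in for the S_IFREG/S_IFDIR mode argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Compare two names, treating a directory as if its name were followed
 * by '/'. Returns -1, 0, or 1. (Assumed simplification: names are
 * NUL-terminated strings without a trailing slash.)
 */
static int tree_order_compare(const char *name1, bool is_dir1,
                              const char *name2, bool is_dir2)
{
    size_t len1, len2, len;
    unsigned char c1, c2;
    int cmp;

    assert(name1 && name2);
    len1 = strlen(name1);
    len2 = strlen(name2);
    len = len1 < len2 ? len1 : len2;

    cmp = memcmp(name1, name2, len);
    if (cmp)
        return cmp < 0 ? -1 : 1;

    /* One name is a prefix of the other; look at the next character. */
    c1 = (unsigned char)name1[len];
    c2 = (unsigned char)name2[len];
    if (!c1 && is_dir1)
        c1 = '/';
    if (!c2 && is_dir2)
        c2 = '/';
    return (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
}
```

In this model the name "a" sorts before "a.c" when treated as a file, but after it when treated as a directory, because '/' compares greater than '.'. Treating a sparse directory as a regular file would therefore place it at the wrong position.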
2021-07-14 | unpack-trees: preserve cache_bottom | Derrick Stolee | 1 | -0/+7
The cache_bottom member of 'struct unpack_trees_options' is used to track the range of index entries corresponding to a node of the cache tree. While recursing with traverse_by_cache_tree(), this value is preserved on the call stack using a local and then restored as that method returns. The mark_ce_used() method normally modifies the cache_bottom member when it refers to the marked cache entry. However, sparse directory entries are stored as nodes in the cache-tree data structure as of 2de37c53 (cache-tree: integrate with sparse directory entries, 2021-03-30). Thus, the cache_bottom will be modified as the cache-tree walk advances. Do not update it as well within mark_ce_used(). Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-16 | Merge branch 'mt/parallel-checkout-part-3' | Junio C Hamano | 1 | -1/+1
The final part of "parallel checkout". * mt/parallel-checkout-part-3: ci: run test round with parallel-checkout enabled parallel-checkout: add tests related to .gitattributes t0028: extract encoding helpers to lib-encoding.sh parallel-checkout: add tests related to path collisions parallel-checkout: add tests for basic operations checkout-index: add parallel checkout support builtin/checkout.c: complete parallel checkout support make_transient_cache_entry(): optionally alloc from mem_pool
2021-05-05 | make_transient_cache_entry(): optionally alloc from mem_pool | Matheus Tavares | 1 | -1/+1
Allow make_transient_cache_entry() to optionally receive a mem_pool struct in which it should allocate the entry. This will be used in the following patch, to store some transient entries which should persist until parallel checkout finishes. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-30 | Merge branch 'mt/parallel-checkout-part-2' | Junio C Hamano | 1 | -3/+16
The checkout machinery has been taught to perform the actual write-out of the files in parallel when able. * mt/parallel-checkout-part-2: parallel-checkout: add design documentation parallel-checkout: support progress displaying parallel-checkout: add configuration options parallel-checkout: make it truly parallel unpack-trees: add basic support for parallel checkout
2021-04-30 | Merge branch 'ds/sparse-index-protections' | Junio C Hamano | 1 | -3/+14
Builds on top of the sparse-index infrastructure to mark operations that are not ready to work with the sparse index, causing them to fall back on the fully-populated index that they have always worked with. * ds/sparse-index-protections: (47 commits) name-hash: use expand_to_path() sparse-index: expand_to_path() name-hash: don't add directories to name_hash revision: ensure full index resolve-undo: ensure full index read-cache: ensure full index pathspec: ensure full index merge-recursive: ensure full index entry: ensure full index dir: ensure full index update-index: ensure full index stash: ensure full index rm: ensure full index merge-index: ensure full index ls-files: ensure full index grep: ensure full index fsck: ensure full index difftool: ensure full index commit: ensure full index checkout: ensure full index ...
2021-04-19 | parallel-checkout: support progress displaying | Matheus Tavares | 1 | -3/+8
Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-19 | parallel-checkout: add configuration options | Matheus Tavares | 1 | -3/+7
Make parallel checkout configurable by introducing two new settings: checkout.workers and checkout.thresholdForParallelism. The first defines the number of workers (where one means sequential checkout), and the second defines the minimum number of entries to attempt parallel checkout. To decide the default value for checkout.workers, the parallel version was benchmarked during three operations in the linux repo, with cold cache: cloning v5.8, checking out v5.8 from v2.6.15 (checkout I) and checking out v5.8 from v5.7 (checkout II). The four tables below show the mean run times and standard deviations for 5 runs in: a local file system on SSD, a local file system on HDD, a Linux NFS server, and Amazon EFS (all on Linux). Each parallel checkout test was executed with the number of workers that brings the best overall results in that environment.

Local SSD:

             Sequential         10 workers         Speedup
Clone        8.805 s ± 0.043 s  3.564 s ± 0.041 s  2.47 ± 0.03
Checkout I   9.678 s ± 0.057 s  4.486 s ± 0.050 s  2.16 ± 0.03
Checkout II  5.034 s ± 0.072 s  3.021 s ± 0.038 s  1.67 ± 0.03

Local HDD:

             Sequential          10 workers          Speedup
Clone        32.288 s ± 0.580 s  30.724 s ± 0.522 s  1.05 ± 0.03
Checkout I   54.172 s ± 7.119 s  54.429 s ± 6.738 s  1.00 ± 0.18
Checkout II  40.465 s ± 2.402 s  38.682 s ± 1.365 s  1.05 ± 0.07

Linux NFS server (v4.1, on EBS, single availability zone):

             Sequential           32 workers          Speedup
Clone        240.368 s ± 6.347 s  57.349 s ± 0.870 s  4.19 ± 0.13
Checkout I   242.862 s ± 2.215 s  58.700 s ± 0.904 s  4.14 ± 0.07
Checkout II  65.751 s ± 1.577 s   23.820 s ± 0.407 s  2.76 ± 0.08

EFS (v4.1, replicated over multiple availability zones):

             Sequential            32 workers           Speedup
Clone        922.321 s ± 2.274 s   210.453 s ± 3.412 s  4.38 ± 0.07
Checkout I   1011.300 s ± 7.346 s  297.828 s ± 0.964 s  3.40 ± 0.03
Checkout II  294.104 s ± 1.836 s   126.017 s ± 1.190 s  2.33 ± 0.03

The above benchmarks show that parallel checkout is most effective on repositories located on an SSD or over a distributed file system. 
For local file systems on spinning disks, and/or older machines, the parallelism does not always bring good performance. For this reason, the default value for checkout.workers is one, a.k.a. sequential checkout. To decide the default value for checkout.thresholdForParallelism, another benchmark was executed in the "Local SSD" setup, where parallel checkout proved to be beneficial. This time, we compared the runtime of a `git checkout -f`, with and without parallelism, after randomly removing an increasing number of files from the Linux working tree. The "sequential fallback" column below corresponds to the executions where checkout.workers was 10 but checkout.thresholdForParallelism was equal to the number of to-be-updated files plus one (so that we end up writing sequentially). Each test case was sampled 15 times, and each sample had a randomly different set of files removed. Here are the results:

            sequential fallback  10 workers          speedup
10 files    772.3 ms ± 12.6 ms   769.0 ms ± 13.6 ms  1.00 ± 0.02
20 files    780.5 ms ± 15.8 ms   775.2 ms ± 9.2 ms   1.01 ± 0.02
50 files    806.2 ms ± 13.8 ms   767.4 ms ± 8.5 ms   1.05 ± 0.02
100 files   833.7 ms ± 21.4 ms   750.5 ms ± 16.8 ms  1.11 ± 0.04
200 files   897.6 ms ± 30.9 ms   730.5 ms ± 14.7 ms  1.23 ± 0.05
500 files   1035.4 ms ± 48.0 ms  677.1 ms ± 22.3 ms  1.53 ± 0.09
1000 files  1244.6 ms ± 35.6 ms  654.0 ms ± 38.3 ms  1.90 ± 0.12
2000 files  1488.8 ms ± 53.4 ms  658.8 ms ± 23.8 ms  2.26 ± 0.12

From the above numbers, 100 files seems to be a reasonable default value for the threshold setting. Note: Up to 1000 files, we observe a drop in the execution time of the parallel code with an increase in the number of files. This is a rather odd behavior, but it was observed in multiple repetitions. Above 1000 files, the execution time increases according to the number of files, as one would expect. About the test environments: Local SSD tests were executed on an i7-7700HQ (4 cores with hyper-threading) running Manjaro Linux. 
Local HDD tests were executed on an Intel(R) Xeon(R) E3-1230 (also 4 cores with hyper-threading), HDD Seagate Barracuda 7200.14 SATA 3.1, running Debian. NFS and EFS tests were executed on an Amazon EC2 c5n.xlarge instance, with 4 vCPUs. The Linux NFS server was running on a m6g.large instance with 2 vCPUs and a 1 TB EBS GP2 volume. Before each timing, the linux repository was removed (or checked out back to its previous state), and `sync && sysctl vm.drop_caches=3` was executed. Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-19 | unpack-trees: add basic support for parallel checkout | Matheus Tavares | 1 | -1/+5
This new interface allows us to enqueue some of the entries being checked out to later uncompress them, apply in-process filters, and write out the files in parallel. For now, the parallel checkout machinery is enabled by default and there is no user configuration, but run_parallel_checkout() just writes the queued entries in sequence (without spawning additional workers). The next patch will actually implement the parallelism and, later, we will make it configurable. Note that, to avoid potential data races, not all entries are eligible for parallel checkout. Also, paths that collide on disk (e.g. case-sensitive paths in case-insensitive file systems), are detected by the parallel checkout code and skipped, so that they can be safely sequentially handled later. The collision detection works like the following: - If the collision was at basename (e.g. 'a/b' and 'a/B'), the framework detects it by looking for EEXIST and EISDIR errors after an open(O_CREAT | O_EXCL) failure. - If the collision was at dirname (e.g. 'a/b' and 'A'), it is detected at the has_dirs_only_path() check, which is done for the leading path of each item in the parallel checkout queue. Both verifications rely on the fact that, before enqueueing an entry for parallel checkout, checkout_entry() makes sure that there is no file at the entry's path and that its leading components are all real directories. So, any later change in these conditions indicates that there was a collision (either between two parallel-eligible entries or between an eligible and an ineligible one). After all parallel-eligible entries have been processed, the collided (and thus, skipped) entries are sequentially fed to checkout_entry() again. This is similar to the way the current code deals with collisions, overwriting the previously checked out entries with the subsequent ones. 
The only difference is that, since we no longer create the files in the same order that they appear in the index, we are not able to determine which of the colliding entries will survive on disk (for the classic code, it is always the last entry). Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
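The basename collision check described in this message relies on an exclusive create failing with EEXIST or EISDIR. A minimal POSIX sketch of that idea follows; it is not the parallel checkout code, and the names CREATE_COLLIDED and write_new_file() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical return code for "something already occupies this path". */
#define CREATE_COLLIDED (-2)

/*
 * Create `path` only if nothing exists there yet and write `len` bytes.
 * Returns 0 on success, CREATE_COLLIDED when the exclusive create fails
 * with EEXIST or EISDIR, and -1 on any other error.
 */
static int write_new_file(const char *path, const char *buf, size_t len)
{
    ssize_t written;
    int fd;

    assert(path && buf);
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
    if (fd < 0) {
        if (errno == EEXIST || errno == EISDIR)
            return CREATE_COLLIDED; /* caller retries sequentially later */
        return -1;
    }

    written = write(fd, buf, len);
    if (close(fd) < 0 || written < 0 || (size_t)written != len)
        return -1;
    return 0;
}
```

A caller that receives CREATE_COLLIDED would set the entry aside and handle it after the parallel-eligible entries are done, as the message describes.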
2021-04-02 | Merge branch 'mt/parallel-checkout-part-1' | Junio C Hamano | 1 | -0/+1
Preparatory API changes for parallel checkout. * mt/parallel-checkout-part-1: entry: add checkout_entry_ca() taking preloaded conv_attrs entry: move conv_attrs lookup up to checkout_entry() entry: extract update_ce_after_write() from write_entry() entry: make fstat_output() and read_blob_entry() public entry: extract a header file for entry.c functions convert: add classification for conv_attrs struct convert: add get_stream_filter_ca() variant convert: add [async_]convert_to_working_tree_ca() variants convert: make convert_attrs() and convert structs public
2021-03-30 | Merge branch 'mt/checkout-remove-nofollow' | Junio C Hamano | 1 | -1/+1
When "git checkout" removes a path that does not exist in the commit it is checking out, it wasn't careful enough not to follow symbolic links, which has been corrected. * mt/checkout-remove-nofollow: checkout: don't follow symlinks when removing entries symlinks: update comment on threaded_check_leading_path()
2021-03-30 | unpack-trees: allow sparse directories | Derrick Stolee | 1 | -3/+7
The index_pos_by_traverse_info() currently throws a BUG() when a directory entry exists exactly in the index. We need to consider that it is possible to have a directory in a sparse index as long as that entry is itself marked with the skip-worktree bit. The 'pos' variable is assigned a negative value if an exact match is not found. Since a directory name can be an exact match, it is no longer an error to have a nonnegative 'pos' value. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
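The sign convention mentioned above, a negative position when there is no exact match, is commonly implemented by a binary search that encodes the insertion point. The sketch below assumes that convention; name_pos() and the sample layout are hypothetical and use plain strcmp() ordering rather than Git's index ordering.

```c
#include <assert.h>
#include <string.h>

/*
 * Binary search over a sorted array of names.
 * Returns the position (>= 0) on an exact match, otherwise
 * -(insertion point) - 1, which is always negative.
 */
static int name_pos(const char *const *names, int nr, const char *name)
{
    int lo = 0, hi = nr;

    assert(names && name && nr >= 0);
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, names[mid]);

        if (!cmp)
            return mid; /* exact match, e.g. a sparse directory "dir/" */
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -lo - 1;
}
```

With a sparse index, a directory name such as "dir/" can itself be an entry, so a nonnegative result for a directory is a legitimate outcome rather than a bug.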
2021-03-30 | unpack-trees: ensure full index | Derrick Stolee | 1 | -0/+7
The next change will translate full indexes into sparse indexes at write time. The existing logic provides a way for every sparse index to be expanded to a full index at read time. However, there are cases where an index is written and then continues to be used in-memory to perform further updates. unpack_trees() is frequently called after such a write. In particular, commands like 'git reset' do this double-update of the index. Ensure that we have a full index when entering unpack_trees(), but only when command_requires_full_index is true. This is always true at the moment, but we will later relax that after unpack_trees() is updated to handle sparse directory entries. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-23 | entry: extract a header file for entry.c functions | Matheus Tavares | 1 | -0/+1
The declarations of entry.c's public functions and structures currently reside in cache.h. Although not many, they contribute to the size of cache.h and, when changed, cause the unnecessary recompilation of modules that don't really use these functions. So let's move them to a new entry.h header. While at it let's also move a comment related to checkout_entry() from entry.c to entry.h as it's more useful to describe the function there. Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-22 | Merge branch 'dl/stash-show-untracked' | Junio C Hamano | 1 | -0/+22
"git stash show" learned to optionally show untracked part of the stash. * dl/stash-show-untracked: stash show: learn stash.showIncludeUntracked stash show: teach --include-untracked and --only-untracked
2021-03-19 | Merge branch 'js/fsmonitor-unpack-fix' | Junio C Hamano | 1 | -2/+2
The data structure used by fsmonitor interface was not properly duplicated during an in-core merge, leading to use-after-free etc. * js/fsmonitor-unpack-fix: fsmonitor: do not forget to release the token in `discard_index()` fsmonitor: fix memory corruption in some corner cases
2021-03-18checkout: don't follow symlinks when removing entriesMatheus Tavares1-1/+1
At 1d718a5108 ("do not overwrite untracked symlinks", 2011-02-20), symlink.c:check_leading_path() started returning different codes for FL_ENOENT and FL_SYMLINK. But one of its callers, unlink_entry(), was not adjusted for this change, so it started to follow symlinks on the leading path of to-be-removed entries. Fix that and add a regression test. Note that since 1d718a5108 check_leading_path() no longer differentiates the case where it found a symlink in the path's leading components from the cases where it found a regular file or failed to lstat() the component. So, a side effect of this current patch is that unlink_entry() now returns early in all of these three cases. And because we no longer try to unlink such paths, we also don't get the warning from remove_or_warn(). For the regular file and symlink cases, it's questionable whether the warning was useful in the first place: unlink_entry() removes tracked paths that should no longer be present in the state we are checking out to. If the path had its leading dir replaced by another file, it means that the basename already doesn't exist, so there is no need for a warning. Sure, we are leaving a regular file or symlink behind at the path's dirname, but this file is either untracked now (so again, no need to warn), or it will be replaced by a tracked file during the next phase of this checkout operation. As for failing to lstat() one of the leading components, the basename might still exist, only we cannot unlink it (e.g. due to the lack of the required permissions). Since the user expects it to be removed (especially with checkout's --no-overlay option), add back the warning in this more relevant case. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-17fsmonitor: fix memory corruption in some corner casesJohannes Schindelin1-2/+2
In 56c6910028a (fsmonitor: change last update timestamp on the index_state to opaque token, 2020-01-07), we forgot to adjust the part of `unpack_trees()` that copies the FSMonitor "last-update" information from the source index to the result index, as it has done since 679f2f9fdd2 (unpack-trees: skip stat on fsmonitor-valid files, 2019-11-20). Since the "last-update" information is no longer a 64-bit number, but a free-form string that has been allocated, we need to duplicate it rather than just copying it. This is important because there _are_ cases when `unpack_trees()` will perform a oneway merge that implicitly calls `refresh_fsmonitor()` (which will allocate that "last-update" token). This happens _after_ that token was copied into the result index. However, we _then_ call `check_updates()` on that index, which will _also_ call `refresh_fsmonitor()`, accessing the "last-update" string, which by now would be released already. In the instance that led to this patch, this caused a segmentation fault during a lengthy, complicated rebase involving the todo command `reset` that (crucially) had to update many files. Unfortunately, it seems very hard to trigger that crash, therefore this patch is not accompanied by a regression test. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
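The difference between copying a pointer and duplicating the string it points to can be sketched as follows. This is a minimal illustration and not the actual Git code: `struct mini_index`, `dup_string()`, `copy_token()` and `refresh_token()` are hypothetical names chosen for the example, with `refresh_token()` standing in for a refresh that frees and reallocates the token.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for an index that owns a token. */
struct mini_index {
	char *last_update; /* heap-allocated, owned by this struct */
};

/* Portable string duplication; returns NULL for a NULL input. */
static char *dup_string(const char *s)
{
	size_t len;
	char *copy;

	if (!s)
		return NULL;
	len = strlen(s) + 1;
	copy = malloc(len);
	assert(copy);
	memcpy(copy, s, len);
	return copy;
}

/* Duplicate the token so that each index owns its own allocation. */
static void copy_token(struct mini_index *dst, const struct mini_index *src)
{
	free(dst->last_update);
	dst->last_update = dup_string(src->last_update);
}

/* Replace the token, releasing the old string as a refresh might. */
static void refresh_token(struct mini_index *idx, const char *token)
{
	free(idx->last_update);
	idx->last_update = dup_string(token);
}
```

Had `copy_token()` assigned `dst->last_update = src->last_update` instead, a later `refresh_token()` on the source would free memory that the destination still points to, which is the kind of use-after-free the commit message describes.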
2021-03-08Sync with Git 2.30.2 for CVE-2021-21300Junio C Hamano1-0/+3
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-05stash show: teach --include-untracked and --only-untrackedDenton Liu1-0/+22
Stash entries can be made with untracked files via `git stash push --include-untracked`. However, because the untracked files are stored in the third parent of the stash entry and not the stash entry itself, running `git stash show` does not include the untracked files as part of the diff. With --include-untracked, untracked paths, which are recorded in the third-parent if it exists, are shown in addition to the paths that have modifications between the stash base and the working tree in the stash. It is possible to manually craft a malformed stash entry where duplicate untracked files in the stash entry will mask tracked files. We detect and error out in that case via a custom unpack_trees() callback: stash_worktree_untracked_merge(). Also, teach stash the --only-untracked option which only shows the untracked files of a stash entry. This is similar to `git show stash^3` but it is nice to provide a convenient abstraction for it so that users do not have to think about the underlying implementation. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-12Sync with 2.28.1Johannes Schindelin1-0/+3
* maint-2.28: Git 2.28.1 Git 2.27.1 Git 2.26.3 Git 2.25.5 Git 2.24.4 Git 2.23.4 Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.26.3Johannes Schindelin1-0/+3
* maint-2.26: Git 2.26.3 Git 2.25.5 Git 2.24.4 Git 2.23.4 Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.25.5Johannes Schindelin1-0/+3
* maint-2.25: Git 2.25.5 Git 2.24.4 Git 2.23.4 Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.24.4Johannes Schindelin1-0/+3
* maint-2.24: Git 2.24.4 Git 2.23.4 Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.23.4Johannes Schindelin1-0/+3
* maint-2.23: Git 2.23.4 Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.22.5Johannes Schindelin1-0/+3
* maint-2.22: Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.21.4Johannes Schindelin1-0/+3
* maint-2.21: Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.20.5Johannes Schindelin1-0/+3
* maint-2.20: Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.19.6Johannes Schindelin1-0/+3
* maint-2.19: Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.18.5Johannes Schindelin1-0/+3
* maint-2.18: Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.17.6Johannes Schindelin1-0/+3
* maint-2.17: Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12unpack_trees(): start with a fresh lstat cacheMatheus Tavares1-0/+3
We really want to avoid relying on stale information. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2021-01-23sparse-checkout: load sparse-checkout patternsDerrick Stolee1-5/+1
A future feature will want to load the sparse-checkout patterns into a pattern_list, but the current mechanism to do so is a bit complicated. This is made difficult due to needing to find the sparse-checkout file in different ways throughout the codebase. The logic implemented in the new get_sparse_checkout_patterns() was duplicated in populate_from_existing_patterns() in unpack-trees.c. Use the new method instead, keeping the logic around handling the struct unpack_trees_options. The callers to get_sparse_checkout_filename() in builtin/sparse-checkout.c manipulate the sparse-checkout file directly, so it is not appropriate to replace logic in that file with get_sparse_checkout_patterns(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-23cache-tree: clean up cache_tree_update()Derrick Stolee1-2/+0
Make the method safer by allocating a cache_tree member for the given index_state if it is not already present. This is preferable to a BUG() statement or returning with an error because future callers will want to populate an empty cache-tree using this method. Callers can also remove their conditional allocations of cache_tree. Also drop local variables that can be found directly from the 'istate' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
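The "allocate on demand inside the callee" pattern described above might look roughly like this. It is only a sketch under assumptions: `struct mini_tree`, `struct mini_state` and `update_tree()` are hypothetical stand-ins and do not mirror the real `cache_tree` or `index_state` definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the cache tree and the index state. */
struct mini_tree {
	int updates; /* placeholder for real cache-tree data */
};

struct mini_state {
	struct mini_tree *tree; /* may be NULL before the first update */
};

/*
 * Allocate the tree if it is missing instead of treating a NULL tree
 * as a bug or an error. Returns 0 on success, -1 on allocation failure.
 */
static int update_tree(struct mini_state *state)
{
	assert(state);
	if (!state->tree) {
		state->tree = calloc(1, sizeof(*state->tree));
		if (!state->tree)
			return -1;
	}
	state->tree->updates++; /* placeholder for the real update work */
	return 0;
}
```

With the allocation folded into the callee, a caller can pass a state whose `tree` is still NULL and no longer needs its own `if (!state->tree)` guard.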
2021-01-04unpack-trees: add trace2 regionsDerrick Stolee1-0/+5
The unpack_trees() method is quite complicated and its performance can change dramatically depending on how it is used. We already have some performance tracing regions, but they have not been updated to the trace2 API. Do so now. We already have trace2 regions in unpack_trees.c:clear_ce_flags(), which uses a linear scan through the index without recursing into trees. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28strvec: convert remaining callers away from argv_array nameJeff King1-5/+5
We eventually want to drop the argv_array name and just use strvec consistently. There's no particular reason we have to do it all at once, or care about interactions between converted and unconverted bits. Because of our preprocessor compat layer, the names are interchangeable to the compiler (so even a definition and declaration using different names is OK). This patch converts all of the remaining files, as the resulting diff is reasonably sized. The conversion was done purely mechanically with: git ls-files '*.c' '*.h' | xargs perl -i -pe ' s/ARGV_ARRAY/STRVEC/g; s/argv_array/strvec/g; ' We'll deal with any indentation/style fallouts separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28strvec: rename files from argv-array to strvecJeff King1-1/+1
This requires updating #include lines across the code-base, but that's all fairly mechanical, and was done with: git ls-files '*.c' '*.h' | xargs perl -i -pe 's/argv-array.h/strvec.h/' Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-20Merge branch 'en/sparse-checkout'Junio C Hamano1-3/+3
Consistency fix to a topic already in 'master'. * en/sparse-checkout: unpack-trees: also allow get_progress() to work on a different index
2020-05-15unpack-trees: also allow get_progress() to work on a different indexElijah Newren1-3/+3
commit b0a5a12a60 ("unpack-trees: allow check_updates() to work on a different index", 2020-03-27) allowed check_updates() to work on a different index, but it called get_progress() which was hardcoded to work on o->result much like check_updates() had been. Update it to also accept an index parameter and have check_updates() pass that parameter along so that both are working on the same index. Noticed-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-13Merge branch 'ds/sparse-updates-oob-access-fix'Junio C Hamano1-5/+5
The code to skip unmerged paths in the index when sparse checkout is in use would have made out-of-bound access of the in-core index when the last path was unmerged, which has been corrected. * ds/sparse-updates-oob-access-fix: unpack-trees: avoid array out-of-bounds error
2020-05-08Merge branch 'ds/sparse-allow-empty-working-tree'Junio C Hamano1-33/+1
The sparse-checkout patterns have been forbidden from excluding all paths, leaving an empty working tree, for a long time. This limitation has been lifted. * ds/sparse-allow-empty-working-tree: sparse-checkout: stop blocking empty workdirs
2020-05-08unpack-trees: avoid array out-of-bounds errorDerrick Stolee1-5/+5
The loop in warn_conflicted_path() that checks for the count of entries with the same path uses "i+count" for the array entry. However, the loop only verifies that the value of count is below the array size. Fix this by adding i to the condition. I hit this condition during a test of the in-tree sparse-checkout feature, so it is exercised by the end of the series. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> [jc: readability fix] Signed-off-by: Junio C Hamano <gitster@pobox.com>
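The bounds problem described above can be illustrated with a small sketch. It is not the code from warn_conflicted_path(): `struct entry` and `count_same_path()` are hypothetical names, and the only point is that the loop condition bounds `i + count`, the index that is actually dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an index entry; only the path matters here. */
struct entry {
	const char *path;
};

/*
 * Count how many consecutive entries, starting at index i, share the
 * path of entries[i]. Checking "count < nr" alone would allow
 * entries[i + count] to run past the end of the array when i > 0.
 */
static size_t count_same_path(const struct entry *entries, size_t nr, size_t i)
{
	size_t count = 0;

	assert(i < nr);
	while (i + count < nr &&
	       !strcmp(entries[i + count].path, entries[i].path))
		count++;
	return count;
}
```

The failing case in the commit message corresponds to the matching run reaching the last array element: with the weaker condition the loop would go on to read one element past the end.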
2020-05-04sparse-checkout: stop blocking empty workdirsDerrick Stolee1-33/+1
Remove the error condition when updating the sparse-checkout leaves an empty working directory. This behavior was added in 9e1afb167 (sparse checkout: inhibit empty worktree, 2009-08-20). The comment was added in a7bc906f2 (Add explanation why we do not allow to sparse checkout to empty working tree, 2011-09-22) in response to a "dubious" comment in 84563a624 (unpack-trees.c: cosmetic fix, 2010-12-22). With the recent "cone mode" and "git sparse-checkout init [--cone]" command, it is common to set a reasonable sparse-checkout pattern set of /* !/*/ which matches only files at root. If the repository has no such files, then their "git sparse-checkout init" command will fail. Now that we expect this to be a common pattern, we should not have the commands fail on an empty working directory. If it is a confusing result, then the user can recover with "git sparse-checkout disable" or "git sparse-checkout set". This is especially simple when using cone mode. Reported-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-29Merge branch 'en/sparse-checkout'Junio C Hamano1-55/+200
"sparse-checkout" UI improvements. * en/sparse-checkout: sparse-checkout: provide a new reapply subcommand unpack-trees: failure to set SKIP_WORKTREE bits always just a warning unpack-trees: provide warnings on sparse updates for unmerged paths too unpack-trees: make sparse path messages sound like warnings unpack-trees: split display_error_msgs() into two unpack-trees: rename ERROR_* fields meant for warnings to WARNING_* unpack-trees: move ERROR_WOULD_LOSE_SUBMODULE earlier sparse-checkout: use improved unpack_trees porcelain messages sparse-checkout: use new update_sparsity() function unpack-trees: add a new update_sparsity() function unpack-trees: pull sparse-checkout pattern reading into a new function unpack-trees: do not mark a dirty path with SKIP_WORKTREE unpack-trees: allow check_updates() to work on a different index t1091: make some tests a little more defensive against failures unpack-trees: simplify pattern_list freeing unpack-trees: simplify verify_absent_sparse() unpack-trees: remove unused error type unpack-trees: fix minor typo in comment
2020-04-28Merge branch 'jt/avoid-prefetch-when-able-in-diff'Junio C Hamano1-3/+2
"git diff" in a partial clone learned to avoid lazy loading blob objects in more casese when they are not needed. * jt/avoid-prefetch-when-able-in-diff: diff: restrict when prefetching occurs diff: refactor object read diff: make diff_populate_filespec_options struct promisor-remote: accept 0 as oid_nr in function
2020-04-02promisor-remote: accept 0 as oid_nr in functionJonathan Tan1-3/+2
There are 3 callers to promisor_remote_get_direct() that first check if the number of objects to be fetched is equal to 0. Fold that check into promisor_remote_get_direct(), and in doing so, be explicit as to what promisor_remote_get_direct() does if oid_nr is 0 (it returns 0, success, immediately). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
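Folding the zero check into the callee could be sketched like this. `fetch_objects()` and `fetch_attempts` are hypothetical names invented for the illustration; the real promisor_remote_get_direct() has a different signature and does actual network work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counter that records whether any work was attempted. */
static int fetch_attempts;

/*
 * Hypothetical stand-in for a "fetch these objects" function that
 * returns 0 on success. The empty request is handled here, so callers
 * do not need their own "if (nr)" guard.
 */
static int fetch_objects(const char *const *ids, size_t nr)
{
	if (!nr)
		return 0; /* nothing to fetch: immediate success */
	assert(ids);
	fetch_attempts++;
	/* A real implementation would contact the remote here. */
	return 0;
}
```

A caller can then write `fetch_objects(ids, nr)` unconditionally, and the behavior for `nr == 0` is documented in one place rather than repeated at each call site.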
2020-03-27unpack-trees: failure to set SKIP_WORKTREE bits always just a warningElijah Newren1-16/+15
Setting and clearing of the SKIP_WORKTREE bit is not only done when users run 'sparse-checkout'; other commands such as 'checkout' also run through unpack_trees() which has logic for handling this special bit. As such, we need to consider how they handle special cases. A couple comparison points should help explain the rationale for changing how unpack_trees() handles these bits: Ignoring sparse checkouts for a moment, if you are switching branches and have dirty changes, it is only considered an error that will prevent the branch switching from being successful if the dirty file happens to be one of the paths with different contents. SKIP_WORKTREE has always been considered advisory; for example, if rebase or merge need or even want to materialize a path as part of their work, they have always been allowed to do so regardless of the SKIP_WORKTREE setting. This has been used for unmerged paths, but it was often used for paths where it wasn't needed just because it made the code simpler. It was a best-effort consideration, and when it materialized paths contrary to the SKIP_WORKTREE setting, it was never required to even print a warning message. In the past if you were trying to run e.g. 'git checkout' and: 1) you had a path that was materialized and had some dirty changes 2) the path was listed in $GIT_DIR/info/sparse-checkout 3) this path did not differ between the current and target branches then despite the comparison points above, the inability to set SKIP_WORKTREE was treated as a *hard* error that would abort the checkout operation. This is completely inconsistent with how SKIP_WORKTREE is handled elsewhere, and rather annoying for users as leaving the paths materialized in the working copy (with a simple warning) should present no problem at all. Downgrade any errors from inability to toggle the SKIP_WORKTREE bit to a warning and allow the operations to continue.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: provide warnings on sparse updates for unmerged paths tooElijah Newren1-0/+30
When sparse-checkout runs to update the list of sparsity patterns, it gives warnings if it can't remove paths from the working tree because those files have dirty changes. Add a similar warning for unmerged paths as well. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: make sparse path messages sound like warningsElijah Newren1-4/+4
The messages for problems with sparse paths are phrased as errors that cause the operation to abort, even though we are not making the operation abort. Reword the messages to make sense in their new context. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: split display_error_msgs() into twoElijah Newren1-8/+42
display_error_msgs() is never called to show messages of both ERROR_* and WARNING_* types at the same time; it is instead called multiple times, separately for each type. Since we want to display these types differently, make two slightly different versions of this function. A subsequent commit will further modify unpack_trees() and how it calls the new display_warning_msgs(). Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: rename ERROR_* fields meant for warnings to WARNING_*Elijah Newren1-6/+6
We want to treat issues with setting the SKIP_WORKTREE bit as a warning rather than an error; rename the enum values to reflect this intent as a simple step towards that goal. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: move ERROR_WOULD_LOSE_SUBMODULE earlierElijah Newren1-6/+6
A minor change, but we want to convert the sparse messages to warnings and this allows us to group warnings and errors. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: add a new update_sparsity() functionElijah Newren1-0/+77
Previously, the only way to update the SKIP_WORKTREE bits for various paths was invoking `git read-tree -mu HEAD` or calling the same code that this codepath invoked. This however had a number of problems if the index or working directory were not clean. First, let's consider the case: Flipping SKIP_WORKTREE -> !SKIP_WORKTREE (materializing files) If the working tree was clean this was fine, but if there were files or directories or symlinks or whatever already present at the given path then the operation would abort with an error. Let's label this case for later discussion: A) There is an untracked path in the way Now let's consider the opposite case: Flipping !SKIP_WORKTREE -> SKIP_WORKTREE (removing files) If the index and working tree were clean this was fine, but if there were any unclean paths we would run into problems. There are three different cases to consider: B) The path is unmerged C) The path has unstaged changes D) The path has staged changes (differs from HEAD) If any path fell into case B or C, then the whole operation would be aborted with an error. With sparse-checkout, the whole operation would be aborted for case D as well, but for its predecessor of using `git read-tree -mu HEAD` directly, any paths that fell into case D would be removed from the working copy and the index entry for that path would be reset to match HEAD -- which looks and feels like data loss to users (only a few are even aware to ask whether it can be recovered, and even then it requires walking through loose objects trying to match up the right ones). Refusing to remove files that have unsaved user changes is good, but refusing to work on any other paths is very problematic for users. If the user is in the middle of a rebase or has made modifications to files that bring in more dependencies, then for their build to work they need to update the sparse paths. This logic has been preventing them from doing so.
Sometimes in response, the user will stage the files and re-try, to no avail with sparse-checkout or to the horror of losing their changes if they are using its predecessor of `git read-tree -mu HEAD`. Add a new update_sparsity() function which will not error out in any of these cases but behaves as follows for the special cases: A) Leave the file in the working copy alone, clear the SKIP_WORKTREE bit, and print a warning (thus leaving the path in a state where status will report the file as modified, which seems logical). B) Do NOT mark this path as SKIP_WORKTREE, and leave it as unmerged. C) Do NOT mark this path as SKIP_WORKTREE and print a warning about the dirty path. D) Mark the path as SKIP_WORKTREE, but do not revert the version stored in the index to match HEAD; leave the contents alone. I tried a different behavior for A (leave the SKIP_WORKTREE bit set), but found it very surprising and counter-intuitive (e.g. the user sees it is present along with all the other files in that directory, tries to stage it, but git add ignores it since the SKIP_WORKTREE bit is set). A & C seem like optimal behavior to me. B may be as well, though I wonder if printing a warning would be an improvement. Some might be slightly surprised by D at first, but given that it does the right thing with `git commit` and even `git commit -a` (`git add` ignores entries that are marked SKIP_WORKTREE and thus doesn't delete them, and `commit -a` is similar), it seems logical to me. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
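The four special cases A through D above can be summarized as a small decision function. This is only a sketch of the described behavior, not the update_sparsity() implementation: the enum, `struct outcome` and `decide()` are hypothetical names, and the warning flag follows what this commit message states (a warning for A and C, none mentioned for B and D).

```c
#include <assert.h>

/* Hypothetical classification of a path before the sparsity update. */
enum path_state {
	PATH_CLEAN,
	PATH_UNTRACKED_IN_WAY, /* case A, only relevant when materializing */
	PATH_UNMERGED,         /* case B */
	PATH_UNSTAGED,         /* case C */
	PATH_STAGED            /* case D */
};

/* Hypothetical result: the final SKIP_WORKTREE bit and whether to warn. */
struct outcome {
	int skip_worktree;
	int warn;
};

/* want_skip is 1 when the patterns ask for the path to be removed. */
static struct outcome decide(enum path_state st, int want_skip)
{
	struct outcome out = { want_skip, 0 };

	assert(want_skip == 0 || want_skip == 1);
	if (!want_skip) {
		/* A: leave the existing file alone, clear the bit, warn. */
		if (st == PATH_UNTRACKED_IN_WAY)
			out.warn = 1;
		return out;
	}
	switch (st) {
	case PATH_UNMERGED: /* B: keep the path, leave it unmerged */
		out.skip_worktree = 0;
		break;
	case PATH_UNSTAGED: /* C: keep the path and warn about it */
		out.skip_worktree = 0;
		out.warn = 1;
		break;
	case PATH_STAGED:   /* D: set the bit, index contents untouched */
	default:
		break;
	}
	return out;
}
```

In none of the branches does the function report an error, which reflects the central point of the commit: one problematic path no longer prevents the remaining paths from being updated.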
2020-03-27unpack-trees: pull sparse-checkout pattern reading into a new functionElijah Newren1-8/+16
Create a populate_from_existing_patterns() function for reading the path_patterns from $GIT_DIR/info/sparse-checkout so that we can re-use it elsewhere. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: do not mark a dirty path with SKIP_WORKTREEElijah Newren1-1/+4
If a path is dirty, removing from the working tree risks losing data. As such, we want to make sure any such path is not marked with SKIP_WORKTREE. While the current callers of this code detect this case and re-populate with a previous set of sparsity patterns, we want to allow some paths to be marked with SKIP_WORKTREE while others are left unmarked without it being considered an error. The reason this shouldn't be considered an error is that SKIP_WORKTREE has always been an advisory-only setting; merge and rebase for example were free to materialize paths and clear the SKIP_WORKTREE bit in order to accomplish their work even though they kept the SKIP_WORKTREE bit set for other paths. Leaving dirty working files in the working tree is thus a natural extension of what we have already been doing. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: allow check_updates() to work on a different indexElijah Newren1-3/+3
check_updates() previously assumed it was working on o->result. We want to use this function in combination with a different index_state, so take the intended index_state as a parameter. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: simplify pattern_list freeingElijah Newren1-2/+4
commit e091228e17 ("sparse-checkout: update working directory in-process", 2019-11-21) allowed passing a pre-defined set of patterns to unpack_trees(). However, if o->pl was NULL, it would still read the existing patterns and use those. If those patterns were read into a data structure that was allocated, naturally they needed to be free'd. However, despite the same function being responsible for knowing about both the allocation and the free'ing, the logic for tracking whether to free the pattern_list was hoisted to an outer function with an additional flag in unpack_trees_options. Put the logic back in the relevant function and discard the now unnecessary flag. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: simplify verify_absent_sparse()Elijah Newren1-6/+2
verify_absent_sparse() was introduced in commit 08402b0409 ("merge-recursive: distinguish "removed" and "overwritten" messages", 2010-08-11), and has always had exactly one caller which always passes error_type == ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN. This function then checks whether error_type is this value, and if so, sets it instead to ERROR_WOULD_LOSE_ORPHANED_OVERWRITTEN. It has been nearly a decade and no other caller has been created, and no other value has ever been passed, so just pass the expected value to begin with. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: remove unused error typeElijah Newren1-4/+0
commit 08402b0409 ("merge-recursive: distinguish "removed" and "overwritten" messages", 2010-08-11) split ERROR_WOULD_LOSE_UNTRACKED into both ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN ERROR_WOULD_LOSE_UNTRACKED_REMOVED and also split ERROR_WOULD_LOSE_ORPHANED into both ERROR_WOULD_LOSE_ORPHANED_OVERWRITTEN ERROR_WOULD_LOSE_ORPHANED_REMOVED However, despite the split only three of these four types were used. ERROR_WOULD_LOSE_ORPHANED_REMOVED was not put into use when it was introduced and nothing else has used it in the intervening decade either. Remove it. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: fix minor typo in commentElijah Newren1-1/+1
Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-26Merge branch 'bc/filter-process'Junio C Hamano1-0/+1
Provide more information (e.g. the object of the tree-ish in which the blob being converted appears, in addition to its path, which has already been given) to smudge/clean conversion filters. * bc/filter-process: t0021: test filter metadata for additional cases builtin/reset: compute checkout metadata for reset builtin/rebase: compute checkout metadata for rebases builtin/clone: compute checkout metadata for clones builtin/checkout: compute checkout metadata for checkouts convert: provide additional metadata to filters convert: permit passing additional metadata to filter processes builtin/checkout: pass branch info down to checkout_worktree
2020-03-26Merge branch 'pb/recurse-submodules-fix'Junio C Hamano1-5/+2
Fix "git checkout --recurse-submodules" of a nested submodule hierarchy. * pb/recurse-submodules-fix: t/lib-submodule-update: add test removing nested submodules unpack-trees: check for missing submodule directory in merged_entry unpack-trees: remove outdated description for verify_clean_submodule t/lib-submodule-update: move a test to the right section t/lib-submodule-update: remove outdated test description t7112: remove mention of KNOWN_FAILURE_SUBMODULE_RECURSIVE_NESTED
2020-03-17Merge branch 'en/simplify-check-updates-in-unpack-trees' into maintJunio C Hamano1-12/+14
Code simplification. * en/simplify-check-updates-in-unpack-trees: unpack-trees: exit check_updates() early if updates are not wanted
2020-03-17Merge branch 'jk/clang-sanitizer-fixes' into maintJunio C Hamano1-1/+1
C pedantry ;-) fix. * jk/clang-sanitizer-fixes: obstack: avoid computing offsets from NULL pointer xdiff: avoid computing non-zero offset from NULL pointer avoid computing zero offsets from NULL pointer merge-recursive: use subtraction to flip stage merge-recursive: silence -Wxor-used-as-pow warning
2020-03-16builtin/checkout: compute checkout metadata for checkoutsbrian m. carlson1-0/+1
Provide commit metadata for checkout code paths that use unpack_trees and friends. When we're checking out a commit, use the commit information, but don't provide commit information if we're checking out from the index, since there need not be any particular commit associated with the index, and even if there is one, we can't know what it is. Signed-off-by: brian m. carlson <bk2204@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-19unpack-trees: check for missing submodule directory in merged_entryPhilippe Blain1-2/+2
Using `git checkout --recurse-submodules` to switch between a branch with no submodules and a branch with initialized nested submodules currently causes a fatal error: $ git checkout --recurse-submodules branch-with-nested-submodules fatal: exec '--super-prefix=submodule/nested/': cd to 'nested' failed: No such file or directory error: Submodule 'nested' could not be updated. error: Submodule 'submodule/nested' cannot checkout new HEAD. error: Submodule 'submodule' could not be updated. M submodule Switched to branch 'branch-with-nested-submodules' The checkout succeeds but the worktree and index of the first level submodule are left empty: $ cd submodule $ git -c status.submoduleSummary=1 status HEAD detached at b3ce885 Changes to be committed: (use "git restore --staged <file>..." to unstage) deleted: .gitmodules deleted: first.t deleted: nested fatal: not a git repository: 'nested/.git' Submodule changes to be committed: * nested 1e96f59...0000000: $ git ls-files -s $ # empty $ ls -A .git The reason for the fatal error during the checkout is that a child git process tries to cd into the yet unexisting nested submodule directory. The sequence is the following: 1. The main git process (the one running in the superproject) eventually reaches write_entry() in entry.c, which creates the first level submodule directory and then calls submodule_move_head() in submodule.c, which spawns `git read-tree` in the submodule directory. 2. The first child git process (the one in the submodule of the superproject) eventually calls check_submodule_move_head() at unpack_trees.c:2021, which calls submodule_move_head in dry-run mode, which spawns `git read-tree` in the nested submodule directory. 3. The second child git process tries to chdir() in the yet unexisting nested submodule directory in start_command() at run-command.c:829 and dies before exec'ing. 
The reason why check_submodule_move_head() is reached in the first child and not in the main process is that it is inside an if(submodule_from_ce()) construct, and submodule_from_ce() returns a valid struct submodule pointer, whereas it returns a null pointer in the main git process. The reason why submodule_from_ce() returns a null pointer in the main git process is because the call to cache_lookup_path() in config_from() (called from submodule_from_path() in submodule_from_ce()) returns a null pointer since the hashmap "for_path" in the submodule_cache of the_repository is not yet populated. It is not populated because both repo_get_oid(repo, GITMODULES_INDEX, &oid) and repo_get_oid(repo, GITMODULES_HEAD, &oid) in config_from_gitmodules() at submodule-config.c:639-640 return -1, as at this stage of the operation, neither the HEAD of the superproject nor its index contain any .gitmodules file. In contrast, in the first child the hashmap is populated because repo_get_oid(repo, GITMODULES_HEAD, &oid) returns 0 as the HEAD of the first level submodule, i.e. .git/modules/submodule/HEAD, points to a commit where .gitmodules is present and records 'nested' as a submodule. Fix this bug by checking that the submodule directory exists before calling check_submodule_move_head() in merged_entry() in the `if(!old)` branch, i.e. if going from a commit with no submodule to a commit with a submodule present. Also protect the other call to check_submodule_move_head() in merged_entry() the same way as it is safer, even though the `else if (!(old->ce_flags & CE_CONFLICTED))` branch of the code is not at play in the present bug. The other calls to check_submodule_move_head() in other functions in unpack_trees.c are all already protected by calls to lstat() somewhere in the program flow so we don't need additional protection for them. All commands in the unpack_trees machinery are affected, i.e. checkout, reset and read-tree when called with the --recurse-submodules flag. 
This bug was first reported in [1]. [1] https://lore.kernel.org/git/7437BB59-4605-48EC-B05E-E2BDB2D9DABC@gmail.com/ Reported-by: Philippe Blain <levraiphilippeblain@gmail.com> Reported-by: Damien Robert <damien.olivier.robert@gmail.com> Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-19unpack-trees: remove outdated description for verify_clean_submodulePhilippe Blain1-3/+0
The function verify_clean_submodule() learned to verify if a submodule working tree is clean in a7bc845a9a (unpack-trees: check if we can perform the operation for submodules, 2017-03-14), but the commented description above it was not updated to reflect that, such that this description has been outdated since then. Since Git has now learned to optionally recursively check out submodules during a superproject checkout, remove this outdated description. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14Merge branch 'ds/sparse-checkout-harden'Junio C Hamano1-1/+1
Some rough edges in the sparse-checkout feature, especially around the cone mode, have been cleaned up. * ds/sparse-checkout-harden: sparse-checkout: fix cone mode behavior mismatch sparse-checkout: improve docs around 'set' in cone mode sparse-checkout: escape all glob characters on write sparse-checkout: use C-style quotes in 'list' subcommand sparse-checkout: unquote C-style strings over --stdin sparse-checkout: write escaped patterns in cone mode sparse-checkout: properly match escaped characters sparse-checkout: warn on globs in cone patterns sparse-checkout: detect short patterns sparse-checkout: cone mode does not recognize "**" sparse-checkout: fix documentation typo for core.sparseCheckoutCone clone: fix --sparse option with URLs sparse-checkout: create leading directories t1091: improve here-docs t1091: use check_files to reduce boilerplate
2020-02-14Merge branch 'mt/threaded-grep-in-object-store'Junio C Hamano1-2/+2
Traditionally, we avoided threaded grep while searching in objects (as opposed to files in the working tree) as accesses to the object layer are not thread-safe. This limitation is getting lifted. * mt/threaded-grep-in-object-store: grep: use no. of cores as the default no. of threads grep: move driver pre-load out of critical section grep: re-enable threads in non-worktree case grep: protect packed_git [re-]initialization grep: allow submodule functions to run in parallel submodule-config: add skip_if_read option to repo_read_gitmodules() grep: replace grep_read_mutex by internal obj read lock object-store: allow threaded access to object reading replace-object: make replace operations thread-safe grep: fix racy calls in grep_objects() grep: fix race conditions at grep_submodule() grep: fix race conditions on userdiff calls
2020-02-14Merge branch 'ds/sparse-cone' into maintJunio C Hamano1-2/+2
The code recently added in this release to move to the entry beyond the ones in the same directory in the index in the sparse-cone mode did not count the number of entries to skip over correctly, which has been corrected. * ds/sparse-cone: .mailmap: fix GGG authoship screwup unpack-trees: correctly compute result count
2020-02-14Merge branch 'es/unpack-trees-oob-fix' into maintJunio C Hamano1-2/+4
The code that tries to skip over the entries for the paths in a single directory using the cache-tree was not careful enough against a corrupt index file. * es/unpack-trees-oob-fix: unpack-trees: watch for out-of-range index position
2020-02-12Merge branch 'jk/clang-sanitizer-fixes'Junio C Hamano1-1/+1
C pedantry ;-) fix. * jk/clang-sanitizer-fixes: obstack: avoid computing offsets from NULL pointer xdiff: avoid computing non-zero offset from NULL pointer avoid computing zero offsets from NULL pointer merge-recursive: use subtraction to flip stage merge-recursive: silence -Wxor-used-as-pow warning
2020-01-31sparse-checkout: fix cone mode behavior mismatchDerrick Stolee1-1/+1
The intention of the special "cone mode" in the sparse-checkout feature is to always match the same patterns that are matched by the same sparse-checkout file as when cone mode is disabled. When a file path is given to "git sparse-checkout set" in cone mode, then the cone mode improperly matches the file as a recursive path. When setting the skip-worktree bits, files were not expecting the MATCHED_RECURSIVE response, and hence these were left out of the matched cone. Fix this bug by checking for MATCHED_RECURSIVE in addition to MATCHED and add a test that prevents regression. Reported-by: Finn Bryant <finnbryant@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-30Merge branch 'ds/sparse-cone'Junio C Hamano1-2/+2
The code recently added in this release to move to the entry beyond the ones in the same directory in the index in the sparse-cone mode did not count the number of entries to skip over correctly, which has been corrected. * ds/sparse-cone: .mailmap: fix GGG authoship screwup unpack-trees: correctly compute result count
2020-01-28avoid computing zero offsets from NULL pointerJeff King1-1/+1
The Undefined Behavior Sanitizer in clang-11 seems to have learned a new trick: it complains about computing offsets from a NULL pointer, even if that offset is 0. This causes numerous test failures. For example, from t1090: unpack-trees.c:1355:41: runtime error: applying zero offset to null pointer ... not ok 6 - in partial clone, sparse checkout only fetches needed blobs The code in question looks like this: struct cache_entry **cache_end = cache + nr; ... while (cache != cache_end) and we sometimes pass in a NULL and 0 for "cache" and "nr". This is conceptually fine, as "cache_end" would be equal to "cache" in this case, and we wouldn't enter the loop at all. But computing even a zero offset violates the C standard. And given the fact that UBSan is noticing this behavior, this might be a potential problem spot if the compiler starts making unexpected assumptions based on undefined behavior. So let's just avoid it, which is pretty easy. In some cases we can just switch to iterating with a numeric index (as we do in sequencer.c here). In other cases (like the cache_end one) the use of an end pointer is more natural; we can keep that by just explicitly checking for the NULL/0 case when assigning the end pointer. Note that there are two ways you can write this latter case, checking for the pointer: cache_end = cache ? cache + nr : cache; or the size: cache_end = nr ? cache + nr : cache; For the case of a NULL/0 ptr/len combo, they are equivalent. But writing it the second way (as this patch does) has the property that if somebody were to incorrectly pass a NULL pointer with a non-zero length, we'd continue to notice and segfault, rather than silently pretending the length was zero. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
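The end-pointer idiom described in the message above can be sketched in isolation. This is a minimal illustration, not the code from unpack-trees.c: the function name `count_entries` and the use of plain strings as entries are assumptions made for the example.

```c
#include <stddef.h>

/*
 * Count the non-NULL slots in cache[0..nr).
 * Hypothetical helper for illustration only.
 */
static size_t count_entries(const char **cache, size_t nr)
{
	/*
	 * Test the length, not the pointer. A NULL/0 pair never forms
	 * "NULL + 0", while a NULL pointer with a non-zero length is
	 * still treated as the caller bug it is rather than being
	 * silently turned into an empty range.
	 */
	const char **cache_end = nr ? cache + nr : cache;
	size_t n = 0;

	while (cache != cache_end) {
		if (*cache)
			n++;
		cache++;
	}
	return n;
}
```

Calling `count_entries(NULL, 0)` skips the loop entirely without computing an offset from the null pointer.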
2020-01-22Merge branch 'es/unpack-trees-oob-fix'Junio C Hamano1-2/+4
The code that tries to skip over the entries for the paths in a single directory using the cache-tree was not careful enough against a corrupt index file. * es/unpack-trees-oob-fix: unpack-trees: watch for out-of-range index position
2020-01-22Merge branch 'en/simplify-check-updates-in-unpack-trees'Junio C Hamano1-12/+14
Code simplification. * en/simplify-check-updates-in-unpack-trees: unpack-trees: exit check_updates() early if updates are not wanted
2020-01-17submodule-config: add skip_if_read option to repo_read_gitmodules()Matheus Tavares1-2/+2
Currently, submodule-config.c doesn't have an externally accessible function to read gitmodules only if it wasn't already read. But this exact behavior is internally implemented by gitmodules_read_check(), to perform a lazy load. Let's merge this function with repo_read_gitmodules() adding a 'skip_if_read' which allows both internal and external callers to access this functionality. This simplifies the code a little. The added option will also be used in the following patch. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-10unpack-trees: correctly compute result countDerrick Stolee via GitGitGadget1-2/+2
The clear_ce_flags_dir() method processes the cache entries within a common directory. The returned int is the number of cache entries processed for that directory. When using the sparse-checkout feature in cone mode, we can skip the pattern matching for entries in the directories that are entirely included or entirely excluded. eb42feca (unpack-trees: hash less in cone mode, 2019-11-21) introduced this performance feature. The old mechanism relied on the counts returned by calling clear_ce_flags_1(), but the new mechanism calculated the number of rows by subtracting "cache" from "cache_end" to find the size of the range. However, the equation is wrong because it divides by sizeof(struct cache_entry *). This is not how pointer arithmetic works! A Coverity build of Git for Windows in preparation for the 2.25.0 release found this issue with the warning, "Pointer differences, such as cache_end - cache, are automatically scaled down by the size (8 bytes) of the pointed-to type (struct cache_entry *). Most likely, the division by sizeof(struct cache_entry *) is extraneous and should be eliminated." This warning is correct. This leaves us with the question "how did this even work?" The problem that occurs with this incorrect pointer arithmetic is a performance-only bug, and a very slight one at that. Since the entry count returned by clear_ce_flags_dir() is reduced by a factor of 8, the loop in clear_ce_flags_1() will re-process entries from those directories. By inserting global counters into unpack-trees.c and tracing them with trace2_data_intmax() (in a private change, for testing), I was able to count how many times the loop inside clear_ce_flags_1() processed an entry and how many times clear_ce_flags_dir() was called. Each of these is reduced by at least a factor of 8 with the current change. A factor larger than 8 happens when multiple levels of directories are repeated. 
Specifically, in the Linux kernel repo, the command git sparse-checkout set LICENSES restricts the working directory to only the files at root and in the LICENSES directory. Here are the measured counts: clear_ce_flags_1 loop blocks: Before: 11,520 After: 1,621 clear_ce_flags_dir calls: Before: 7,048 After: 606 While these are dramatic counts, the time spent in clear_ce_flags_1() is under one millisecond in each case, so the improvement is not measurable as an end-to-end time. Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
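The pointer-arithmetic mistake described above can be reproduced in a few lines. This is a sketch under assumptions: `struct entry` and both function names are hypothetical stand-ins, not the definitions used in unpack-trees.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cache entry. */
struct entry {
	int flags;
};

/*
 * Correct: subtracting two pointers into the same array already
 * yields a count of elements, not a count of bytes.
 */
static size_t entries_in_range(struct entry **cache, struct entry **cache_end)
{
	assert(cache <= cache_end);
	return (size_t)(cache_end - cache);
}

/*
 * Buggy: dividing by the element size a second time shrinks the
 * count (by a factor of 8 where pointers are 8 bytes wide), so a
 * caller that advances by this count revisits entries it has
 * already handled.
 */
static size_t entries_in_range_buggy(struct entry **cache, struct entry **cache_end)
{
	return (size_t)(cache_end - cache) / sizeof(struct entry *);
}
```

For a 16-slot array the first function reports 16 entries, while the second reports 16 divided by the pointer size, which is why the original bug only cost repeated work rather than producing wrong results.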
2020-01-08unpack-trees: watch for out-of-range index positionEmily Shaffer1-2/+4
It's possible in a case where the index file contains a tree extension but no blobs within that tree exist for index_pos_by_traverse_info() to segfault. If the name_entry passed into index_pos_by_traverse_info() has no blobs inside, AND is alphabetically later than all blobs currently in the index file, index_pos_by_traverse_info() will segfault. For example, an index file which looks something like this: aaa#0 bbb/aaa#0 [Extensions] TREE: zzz In this example, 'index_name_pos(..., "zzz/", ...)' will return '-4', indicating that "zzz/" could be inserted at position 3. However, when the checks which ensure that the insertion position of "zzz/" look for a blob at that position beginning with "zzz/", the index cache is accessed out of range, causing a segfault. This kind of index state is not typically generated during user operations, and is in fact an edge case of the state being checked for in the conditional where it was added. However, since the entry for the BUG() line is ambiguous, tell some additional context to help Git developers debug the failure later. When we know the name of the dir we were trying to look up, it becomes possible to examine the index file in a hex util to determine what went wrong; the position gives a hint about where to start looking. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
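The out-of-range access described above follows from the usual "negative means insertion point" convention of a sorted lookup. The sketch below illustrates that convention and the missing bounds check. The helper names `name_pos` and `first_entry_under` are hypothetical, the index is modeled as a sorted array of strings, and returning -1 is a simplification chosen for the example rather than what Git does when it detects this state.

```c
#include <string.h>

/*
 * Binary search over a sorted array of names. Returns the position if
 * the name is present, otherwise -(insertion position) - 1.
 */
static int name_pos(const char **names, int nr, const char *name)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}

/*
 * Find the first entry below "dir" (which must end in '/').
 * Returns its position, or -1 if no such entry exists.
 */
static int first_entry_under(const char **names, int nr, const char *dir)
{
	int pos = name_pos(names, nr, dir);

	if (pos >= 0)
		return -1;	/* a bare "dir/" entry is not expected here */
	pos = -pos - 1;
	if (pos >= nr)
		return -1;	/* insertion point is past the end: nothing to read */
	if (strncmp(names[pos], dir, strlen(dir)))
		return -1;	/* the entry at the insertion point is unrelated */
	return pos;
}
```

With three entries "aaa", "bbb/aaa" and "ccc", looking up "zzz/" yields -4 (insertion position 3), and the `pos >= nr` check is what keeps the subsequent `names[pos]` read inside the array.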
2020-01-07unpack-trees: exit check_updates() early if updates are not wantedElijah Newren1-12/+14
check_updates() has a lot of code that repeatedly checks whether o->update or o->dry_run are set. (Note that o->dry_run is a near-synonym for !o->update, but not quite as per commit 2c9078d05bf2 ("unpack-trees: add the dry_run flag to unpack_trees_options", 2011-05-25).) In fact, this function almost turns into a no-op whenever the condition !o->update || o->dry_run is met. Simplify the code by checking this condition at the beginning of the function, and when it is true, do the few things that are relevant and return early. There are a few things that make the conversion not quite obvious: * The fact that check_updates() does not actually turn into a no-op when updates are not wanted may be slightly surprising. However, commit 33ecf7eb61 (Discard "deleted" cache entries after using them to update the working tree, 2008-02-07) put the discarding of unused cache entries in check_updates() so we still need to keep the call to remove_marked_cache_entries(). It's possible this call belongs in another function, but it is certainly needed as tests will fail if it is removed. * The original called remove_scheduled_dirs() unconditionally. Technically, commit 7847892716 (unlink_entry(): introduce schedule_dir_for_removal(), 2009-02-09) should have made that call conditional, but it didn't matter in practice because remove_scheduled_dirs() becomes a no-op when all the calls to unlink_entry() are skipped. As such, we do not need to call it. * When (o->dry_run && o->update), the original would have two calls to git_attr_set_direction() surrounding a bunch of skipped updates. These two calls to git_attr_set_direction() cancel each other out and thus can be omitted when o->dry_run is true just as they already are when !o->update. * The code would previously call setup_collided_checkout_detection() and report_collided_checkout() even when o->dry_run. 
However, this was just an expensive no-op because setup_collided_checkout_detection() merely cleared the CE_MATCHED flag for each cache entry, and report_collided_checkout() reported which ones had it set. Since a dry-run would skip all the checkout_entry() calls, CE_MATCHED would never get set and thus no collisions would be reported. Since we can't detect the collisions anyway without doing updates, skipping the collisions detection setup and reporting is an optimization. * The code previously would call get_progress() and display_progress() even when (!o->update || o->dry_run). This served to show how long it took to skip all the updates, which is somewhat useless. Since we are skipping the updates, we can skip showing how long it takes to skip them. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-25Merge branch 'ds/sparse-cone'Junio C Hamano1-33/+77
Management of sparsely checked-out working tree has gained a dedicated "sparse-checkout" command. * ds/sparse-cone: (21 commits) sparse-checkout: improve OS ls compatibility sparse-checkout: respect core.ignoreCase in cone mode sparse-checkout: check for dirty status sparse-checkout: update working directory in-process for 'init' sparse-checkout: cone mode should not interact with .gitignore sparse-checkout: write using lockfile sparse-checkout: use in-process update for disable subcommand sparse-checkout: update working directory in-process sparse-checkout: sanitize for nested folders unpack-trees: add progress to clear_ce_flags() unpack-trees: hash less in cone mode sparse-checkout: init and set in cone mode sparse-checkout: use hashmaps for cone patterns sparse-checkout: add 'cone' mode trace2: add region in clear_ce_flags sparse-checkout: create 'disable' subcommand sparse-checkout: add '--stdin' option to set subcommand sparse-checkout: 'set' subcommand clone: add --sparse mode sparse-checkout: create 'init' subcommand ...
2019-12-09Sync with Git 2.24.1Junio C Hamano1-1/+2
2019-12-06Sync with 2.23.1Johannes Schindelin1-1/+2
* maint-2.23: (44 commits) Git 2.23.1 Git 2.22.2 Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters ...
2019-12-06Sync with 2.22.2Johannes Schindelin1-1/+2
* maint-2.22: (43 commits) Git 2.22.2 Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors ...
2019-12-06Sync with 2.21.1Johannes Schindelin1-1/+2
* maint-2.21: (42 commits) Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh ...
2019-12-06Sync with 2.20.2Johannes Schindelin1-1/+2
* maint-2.20: (36 commits) Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories ...
2019-12-06Sync with 2.19.3Johannes Schindelin1-1/+2
* maint-2.19: (34 commits) Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams ...
2019-12-06Sync with 2.18.2Johannes Schindelin1-1/+2
* maint-2.18: (33 commits) Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up ...
2019-12-06Sync with 2.17.3Johannes Schindelin1-1/+2
* maint-2.17: (32 commits) Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names ...
2019-12-06Sync with 2.16.6Johannes Schindelin1-1/+2
* maint-2.16: (31 commits) Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names path: safeguard `.git` against NTFS Alternate Streams Accesses ...
2019-12-06Sync with 2.15.4Johannes Schindelin1-1/+2
* maint-2.15: (29 commits) Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names path: safeguard `.git` against NTFS Alternate Streams Accesses clone --recurse-submodules: prevent name squatting on Windows is_ntfs_dotgit(): only verify the leading segment ...
2019-12-06Sync with 2.14.6Johannes Schindelin1-1/+2
* maint-2.14: (28 commits) Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names path: safeguard `.git` against NTFS Alternate Streams Accesses clone --recurse-submodules: prevent name squatting on Windows is_ntfs_dotgit(): only verify the leading segment test-path-utils: offer to run a protectNTFS/protectHFS benchmark ...
2019-12-05Merge branch 'us/unpack-trees-fsmonitor'Junio C Hamano1-1/+5
Users of oneway_merge() (like "reset --hard") learned to take advantage of fsmonitor to avoid unnecessary lstat(2) calls. * us/unpack-trees-fsmonitor: unpack-trees: skip stat on fsmonitor-valid files
2019-12-05unpack-trees: let merged_entry() pass through do_add_entry()'s errorsJohannes Schindelin1-1/+2
A `git clone` will end with exit code 0 when `merged_entry()` returns a positive value during a call of `unpack_trees()` to `traverse_trees()`. The reason is that `unpack_trees()` will interpret a positive value not to be an error. The problem is, however, that `add_index_entry()` (which is called by `merged_entry()`) can report an error, and we really should fail the entire clone in such a case. Let's fix this problem, in preparation for a Windows-specific patch disallowing `mkdir()` with directory names that contain a trailing space (which is illegal on NTFS): we want `git clone` to abort when a path cannot be checked out due to that condition. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
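The return-value problem described above can be modeled with a toy example. All names here are hypothetical stand-ins: `add_entry` plays the role of an index insertion that may fail, and `traversal_failed` models a caller that treats only negative values as errors.

```c
/* Hypothetical insertion step: 0 on success, -1 on failure. */
static int add_entry(int should_fail)
{
	return should_fail ? -1 : 0;
}

/* Before: the insertion result is dropped and success is always reported. */
static int merged_entry_ignoring_errors(int should_fail)
{
	add_entry(should_fail);
	return 1;
}

/* After: a failed insertion is passed through to the caller. */
static int merged_entry_checked(int should_fail)
{
	if (add_entry(should_fail))
		return -1;
	return 1;
}

/* The caller's convention: only a negative value counts as an error. */
static int traversal_failed(int ret)
{
	return ret < 0;
}
```

In this model the first variant hides a failed insertion behind a positive return value, so the caller carries on, whereas the second variant lets the failure surface.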
2019-11-22sparse-checkout: update working directory in-processDerrick Stolee1-2/+3
The sparse-checkout builtin used 'git read-tree -mu HEAD' to update the skip-worktree bits in the index and to update the working directory. This extra process is overly complex, and prone to failure. It also requires that we write our changes to the sparse-checkout file before trying to update the index. Remove this extra process call by creating a direct call to unpack_trees() in the same way 'git read-tree -mu HEAD' does. In addition, provide an in-memory list of patterns so we can avoid reading from the sparse-checkout file. This allows us to test a proposed change to the file before writing to it. An earlier version of this patch included a bug when the 'set' command failed due to the "Sparse checkout leaves no entry on working directory" error. It would not rollback the index.lock file, so the replay of the old sparse-checkout specification would fail. A test in t1091 now covers that scenario. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22unpack-trees: add progress to clear_ce_flags()Derrick Stolee1-15/+41
When a large repository has many sparse-checkout patterns, the process for updating the skip-worktree bits can take long enough that a user gets confused about why nothing is happening. Update the clear_ce_flags() method to write progress. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22unpack-trees: hash less in cone modeDerrick Stolee1-15/+23
The sparse-checkout feature in "cone mode" can use the fact that the recursive patterns are "connected" to the root via parent patterns to decide if a directory is entirely contained in the sparse-checkout or entirely removed. In these cases, we can skip hashing the paths within those directories and simply set the skip-worktree bit to the correct value. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22sparse-checkout: use hashmaps for cone patternsDerrick Stolee1-0/+1
The parent and recursive patterns allowed by the "cone mode" option in sparse-checkout are restrictive enough that we can avoid using the regex parsing. Everything is based on prefix matches, so we can use hashsets to store the prefixes from the sparse-checkout file. When checking a path, we can strip path entries from the path and check the hashset for an exact match. As a test, I created a cone-mode sparse-checkout file for the Linux repository that actually includes every file. This was constructed by taking every folder in the Linux repo and creating the pattern pairs here: /$folder/ !/$folder/*/ This resulted in a sparse-checkout file with 8,296 patterns. Running 'git read-tree -mu HEAD' on this file had the following performance: core.sparseCheckout=false: 0.21 s (0.00 s) core.sparseCheckout=true: 3.75 s (3.50 s) core.sparseCheckoutCone=true: 0.23 s (0.01 s) The times in parentheses above correspond to the time spent in the first clear_ce_flags() call, according to the trace2 performance traces. While this example is contrived, it demonstrates how these patterns can slow the sparse-checkout feature. Helped-by: Eric Wong <e@80x24.org> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
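The "strip path entries and look for an exact match" idea from the message above can be sketched as follows. This is an illustration under assumptions, not Git's implementation: a linear scan over a small array stands in for the hash set, only the recursive-pattern side is modeled, and the names `set_contains` and `under_recursive_dir` are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a hash set: exact-match lookup of key[0..len). */
static bool set_contains(const char *const *set, size_t nr,
			 const char *key, size_t len)
{
	for (size_t i = 0; i < nr; i++)
		if (strlen(set[i]) == len && !memcmp(set[i], key, len))
			return true;
	return false;
}

/*
 * Return true if "path" lies below any directory in the set. Each
 * round drops the last path component and checks the remaining
 * prefix for an exact match, so no pattern matching is needed.
 */
static bool under_recursive_dir(const char *const *set, size_t nr,
				const char *path)
{
	size_t len = strlen(path);

	while (len > 0) {
		/* back up to just after the previous '/' */
		while (len > 0 && path[len - 1] != '/')
			len--;
		if (len == 0)
			break;
		len--;	/* drop the '/' itself */
		if (set_contains(set, nr, path, len))
			return true;
	}
	return false;
}
```

The number of lookups per path is bounded by its depth, which is the property that lets a real hash set replace a scan over thousands of patterns.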
2019-11-22trace2: add region in clear_ce_flagsJeff Hostetler1-1/+9
When Git updates the working directory with the sparse-checkout feature enabled, the unpack_trees() method calls clear_ce_flags() to update the skip-worktree bits on the cache entries. This check can be expensive, depending on the patterns used. Add trace2 regions around the method, including some flag information, so we can get granular performance data during experiments. This data will be used to measure improvements to the pattern-matching algorithms for sparse-checkout. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-21unpack-trees: skip stat on fsmonitor-valid filesUtsav Shah1-1/+5
The index might know, via fsmonitor, that a file hasn't been modified, but unpack-trees did not pay attention to this and checked via ie_match_stat, which can be inefficient on certain filesystems. This significantly slows down commands that run oneway_merge, like checkout and reset --hard. This patch makes oneway_merge check whether a file is considered unchanged through fsmonitor and skips ie_match_stat on it. unpack-trees also now correctly copies over fsmonitor validity state from the source index. Finally, for correctness, we force a refresh of fsmonitor state in tweak_fsmonitor. After this change, commands like stash (that use reset --hard internally) go from 8s or more to ~2s on a 250k file repository on a Mac. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Kevin Willford <Kevin.Willford@microsoft.com> Signed-off-by: Utsav Shah <utsav@dropbox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
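The control flow of that check can be illustrated with a toy model. Everything here is hypothetical (`struct toy_entry`, `TOY_FSMONITOR_VALID`, `toy_match_stat()`, `toy_needs_update()`); it only mirrors the idea of consulting a validity flag before paying for a stat-based comparison and does not reproduce oneway_merge.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FSMONITOR_VALID 0x1u /* hypothetical "known unchanged" flag */

struct toy_entry {
	const char *name;
	unsigned flags;
	int dirty_on_disk; /* stand-in for what a real stat comparison would find */
};

static size_t toy_stat_calls; /* counts how often the expensive path runs */

/* Stand-in for the expensive stat-based comparison. */
static int toy_match_stat(const struct toy_entry *ce)
{
	toy_stat_calls++;
	return ce->dirty_on_disk;
}

/* Returns 1 if the entry must be rewritten in the working tree. */
static int toy_needs_update(const struct toy_entry *ce)
{
	assert(ce);
	if (ce->flags & TOY_FSMONITOR_VALID)
		return 0; /* trusted as unchanged: skip the stat entirely */
	return toy_match_stat(ce) != 0;
}
```

The saving comes from the early return: entries marked valid never reach the stat call, which is where the message above locates the cost on some filesystems.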
2019-11-10Fix spelling errors in code commentsElijah Newren1-2/+2
Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04treewide: remove duplicate #include directivesRené Scharfe1-1/+0
Found with "git grep '^#include ' '*.c' | sort | uniq -d". Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-30Merge branch 'ds/include-exclude'Junio C Hamano1-30/+37
The internal code originally invented for ".gitignore" processing got reshuffled and renamed to make it less tied to "excluding" and stress more that it is about "matching", as it has been reused for things like sparse checkout specification that want to check if a path is "included".

* ds/include-exclude:
  unpack-trees: rename 'is_excluded_from_list()'
  treewide: rename 'exclude' methods to 'pattern'
  treewide: rename 'EXCL_FLAG_' to 'PATTERN_FLAG_'
  treewide: rename 'struct exclude_list' to 'struct pattern_list'
  treewide: rename 'struct exclude' to 'struct path_pattern'
2019-09-18Merge branch 'cc/multi-promisor'Junio C Hamano1-4/+4
Teach the lazy clone machinery that there can be more than one promisor remote and consult them in order when downloading missing objects on demand.

* cc/multi-promisor:
  Move core_partial_clone_filter_default to promisor-remote.c
  Move repository_format_partial_clone to promisor-remote.c
  Remove fetch-object.{c,h} in favor of promisor-remote.{c,h}
  remote: add promisor and partial clone config to the doc
  partial-clone: add multiple remotes in the doc
  t0410: test fetching from many promisor remotes
  builtin/fetch: remove unique promisor remote limitation
  promisor-remote: parse remote.*.partialclonefilter
  Use promisor_remote_get_direct() and has_promisor_remote()
  promisor-remote: use repository_format_partial_clone
  promisor-remote: add promisor_remote_reinit()
  promisor-remote: implement promisor_remote_get_direct()
  Add initial support for many promisor remotes
  fetch-object: make functions return an error code
  t0410: remove pipes after git commands
2019-09-05unpack-trees: rename 'is_excluded_from_list()'Derrick Stolee1-16/+23
The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a 'struct exclude_list' makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! Now that this library is renamed to use 'struct pattern_list' and 'struct path_pattern', we can rename the method used by the sparse-checkout feature to determine which paths should appear in the working directory. The method is_excluded_from_list() is only used by the sparse-checkout logic in unpack-trees and list-objects-filter. The confusing part is that it returned 1 for "excluded" (i.e. it matches the list of exclusions) but that really means that the path matched the list of patterns for _inclusion_ in the working directory. Rename the method to be path_matches_pattern_list() and have it return an explicit 'enum pattern_match_result'. Here, the values MATCHED = 1, UNMATCHED = 0, and UNDECIDED = -1 agree with the previous integer values. This shift allows future consumers to better understand what the return values mean, and provides more type checking for handling those values. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
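To show why an explicit enum helps callers, here is a small self-contained sketch. The enumerator names and values follow the message above; the identifiers actually used in dir.h may differ. `struct toy_pattern`, `toy_path_matches_pattern_list()` and `toy_should_checkout()` are hypothetical, and the matcher is a simplified last-match-wins prefix test rather than Git's pattern matching.

```c
#include <assert.h>
#include <string.h>

enum pattern_match_result {
	UNDECIDED = -1,
	UNMATCHED = 0,
	MATCHED = 1,
};

/* Hypothetical simplified pattern: a path prefix, optionally negated. */
struct toy_pattern {
	const char *prefix;
	int negative;
};

/* Last matching pattern wins; no matching pattern means UNDECIDED. */
static enum pattern_match_result
toy_path_matches_pattern_list(const char *path,
			      const struct toy_pattern *pl, size_t nr)
{
	assert(path);
	for (size_t i = nr; i > 0; i--) {
		const struct toy_pattern *p = &pl[i - 1];
		if (!strncmp(path, p->prefix, strlen(p->prefix)))
			return p->negative ? UNMATCHED : MATCHED;
	}
	return UNDECIDED;
}

/* Caller side: the switch makes the handling of each outcome explicit. */
static int toy_should_checkout(const char *path,
			       const struct toy_pattern *pl, size_t nr,
			       int default_value)
{
	switch (toy_path_matches_pattern_list(path, pl, nr)) {
	case MATCHED:
		return 1;
	case UNMATCHED:
		return 0;
	case UNDECIDED:
		return default_value; /* e.g. inherit the parent directory's answer */
	}
	return default_value;
}
```

With a bare int return, the UNDECIDED case is easy to fold into "false" by accident; a switch over the enum lets the compiler warn when a case is missing.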
2019-09-05treewide: rename 'exclude' methods to 'pattern'Derrick Stolee1-2/+2
The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a 'struct exclude_list' makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! It would be clearer to rename this entire library as a "pattern matching" library, and the callers apply exclusion/inclusion logic accordingly based on their needs.

This commit renames several methods defined in dir.h to make more sense with the renamed 'struct exclude_list' to 'struct pattern_list' and 'struct exclude' to 'struct path_pattern':

 * last_exclude_matching() -> last_matching_pattern()
 * parse_exclude() -> parse_path_pattern()

In addition, the word 'exclude' was replaced with 'pattern' in the methods below:

 * add_exclude_list()
 * add_excludes_from_file_to_list()
 * add_excludes_from_file()
 * add_excludes_from_blob_to_list()
 * add_exclude()
 * clear_exclude_list()

A few methods with the word "exclude" remain. These will be handled separately. In particular, the method "is_excluded()" is concretely about the .gitignore file relative to a specific directory. This is the important boundary between library and consumer: is_excluded() cares about .gitignore, but is_excluded() calls last_matching_pattern() to make that decision.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>