aboutsummaryrefslogtreecommitdiffstats
path: root/read-cache.c
AgeCommit message (Collapse)AuthorFilesLines
2025-11-24Merge branch 'bc/submodule-force-same-hash'Junio C Hamano1-3/+0
Adding a repository that uses a different hash function is a no-no, but "git submodule add" did nt prevent it, which has been corrected. * bc/submodule-force-same-hash: read-cache: drop submodule check from add_to_cache() object-file: disallow adding submodules of different hash algo
2025-11-15read-cache: drop submodule check from add_to_cache()Jeff King1-3/+0
In add_to_cache(), we treat any directories as submodules, and complain if we can't resolve their HEAD. This call to resolve_gitlink_ref() was added by f937bc2f86 (add: error appropriately on repository with no commits, 2019-04-09), with the goal of improving the error message for empty repositories. But we already resolve the submodule HEAD in index_path(), which is where we find the actual oid we're going to use. Resolving it again here introduces some downsides: 1. It's more work, since we have to open up the submodule repository's files twice. 2. There are call paths that get to index_path() without going through add_to_cache(). For instance, we'd want a similar informative message if "git diff empty" finds that it can't resolve the submodule's HEAD. (In theory we can also get there through update-index, but AFAICT it refuses to consider directories as submodules at all, and just complains about them). 3. The resolution in index_path() catches more errors that we don't handle here. In particular, it will validate that the object format for the submodule matches that of the superproject. This isn't a bug, since our call in add_to_cache() throws away the oid it gets without looking at it. But it certainly caused confusion for me when looking at where the object-format check should go. So instead of resolving the submodule HEAD in add_to_cache(), let's just teach the call in index_path() to actually produce an error message (which it already does for other cases). That's probably what f937bc2f86 should have done in the first place, and it gives us a single point of resolution when adding a submodule to the index. The resulting output is slightly more verbose, as we propagate the error up the call stack, but I think that's OK (and again, matches many other errors we get when indexing fails). I've left the text of the error message as-is, though it is perhaps overly specific. There are many reasons that resolving the submodule HEAD might fail, though outside of corruption or system errors it is probably most likely that the submodule HEAD is simply on an unborn branch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-08Merge branch 'ps/rust-balloon'Junio C Hamano1-2/+4
Dip our toes a bit to (optionally) use Rust implemented helper called from our C code. * ps/rust-balloon: ci: enable Rust for breaking-changes jobs ci: convert "pedantic" job into full build with breaking changes BreakingChanges: announce Rust becoming mandatory varint: reimplement as test balloon for Rust varint: use explicit width for integers help: report on whether or not Rust is enabled Makefile: introduce infrastructure to build internal Rust library Makefile: reorder sources after includes meson: add infrastructure to build internal Rust library
2025-10-02varint: use explicit width for integersPatrick Steinhardt1-2/+4
The varint subsystem currently uses implicit widths for integers. On the one hand we use `uintmax_t` for the actual value. On the other hand, we use `int` for the length of the encoded varint. Both of these have known maximum values, as we only support at most 16 bytes when encoding varints. Thus, we know that we won't ever exceed `uint64_t` for the actual value and `uint8_t` for the prefix length. Refactor the code to use explicit widths. Besides making the logic platform-independent, it also makes our life a bit easier in the next commit, where we reimplement "varint.c" in Rust. Suggested-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-16odb: add transaction interfaceJustin Tobler1-2/+2
Transactions are managed via the {begin,end}_odb_transaction() function in the object-file subsystem and its implementation is specific to the files object source. Introduce odb_transaction_{begin,commit}() in the odb subsystem to provide an eventual object source agnostic means to manage transactions. Update call sites to instead manage transactions through the odb subsystem. Also rename {begin,end}_odb_transaction() functions to object_file_transaction_{begin,commit}() to clarify the object source it supports. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-16object-file: relocate ODB transaction codeJustin Tobler1-1/+0
The bulk-checkin subsystem provides various functions to manage ODB transactions. Apart from {begin,end}_odb_transaction(), these functions are only used by the object-file subsystem to manage aspects of a transaction implementation specific to the files object source. Relocate all the transaction code in bulk-checkin to object-file. This simplifies the exposed transaction interface by reducing it to only {begin,end}_odb_transaction(). Function and type names are adjusted in the subsequent commit to better fit the new location. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-25bulk-checkin: remove global transaction stateJustin Tobler1-2/+3
Object database transactions in the bulk-checkin subsystem rely on global state to track transaction status. Stop relying on global state and instead store the transaction in the `struct object_database`. Functions that operate on transactions are updated to now wire transaction state. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-05Merge branch 'ps/object-file-wo-the-repository'Junio C Hamano1-1/+1
Reduce implicit assumption and dependence on the_repository in the object-file subsystem. * ps/object-file-wo-the-repository: object-file: get rid of `the_repository` in index-related functions object-file: get rid of `the_repository` in `force_object_loose()` object-file: get rid of `the_repository` in `read_loose_object()` object-file: get rid of `the_repository` in loose object iterators object-file: remove declaration for `for_each_file_in_obj_subdir()` object-file: inline `for_each_loose_file_in_objdir_buf()` object-file: get rid of `the_repository` when writing objects odb: introduce `odb_write_object()` loose: write loose objects map via their source object-file: get rid of `the_repository` in `finalize_object_file()` object-file: get rid of `the_repository` in `loose_object_info()` object-file: get rid of `the_repository` when freshening objects object-file: inline `check_and_freshen()` functions object-file: get rid of `the_repository` in `has_loose_object()` object-file: stop using `the_hash_algo` object-file: fix -Wsign-compare warnings
2025-07-23config: drop `git_config_get_bool()` wrapperPatrick Steinhardt1-2/+2
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get_bool()`. All callsites are adjusted so that they use `repo_config_get_bool(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16odb: introduce `odb_write_object()`Patrick Steinhardt1-1/+1
We do not have a backend-agnostic way to write objects into an object database. While there is `write_object_file()`, this function is rather specific to the loose object format. Introduce `odb_write_object()` to plug this gap. For now, this function is a simple wrapper around `write_object_file()` and doesn't even use the passed-in object database yet. This will change in subsequent commits, where `write_object_file()` is converted so that it works on top of an `odb_source`. `odb_write_object()` will then become responsible for deciding which source an object shall be written to. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-15Merge branch 'ps/object-store'Junio C Hamano1-7/+7
Code clean-up around object access API. * ps/object-store: odb: rename `read_object_with_reference()` odb: rename `pretend_object_file()` odb: rename `has_object()` odb: rename `repo_read_object_file()` odb: rename `oid_object_info()` odb: trivial refactorings to get rid of `the_repository` odb: get rid of `the_repository` when handling submodule sources odb: get rid of `the_repository` when handling the primary source odb: get rid of `the_repository` in `for_each()` functions odb: get rid of `the_repository` when handling alternates odb: get rid of `the_repository` in `odb_mkstemp()` odb: get rid of `the_repository` in `assert_oid_type()` odb: get rid of `the_repository` in `find_odb()` odb: introduce parent pointers object-store: rename files to "odb.{c,h}" object-store: rename `object_directory` to `odb_source` object-store: rename `raw_object_store` to `object_database`
2025-07-07read-cache: report lock error when refreshing indexHan Young1-1/+2
In the repo_refresh_and_write_index of read-cache.c, we return -1 to indicate that writing the index to disk failed. However, callers do not use this information. Commands such as stash print "could not write index" and then exit, which does not help to discover the exact problem. We can let repo_hold_locked_index print the error message if the locking failed. Signed-off-by: Han Young <hanyang.tony@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: rename `repo_read_object_file()`Patrick Steinhardt1-3/+3
Rename `repo_read_object_file()` to `odb_read_object()` to match other functions related to the object database and our modern coding guidelines. Introduce a compatibility wrapper so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: rename `oid_object_info()`Patrick Steinhardt1-3/+3
Rename `oid_object_info()` to `odb_read_object_info()` as well as their `_extended()` variant to match other functions related to the object database and our modern coding guidelines. Introduce compatibility wrappers so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01object-store: rename files to "odb.{c,h}"Patrick Steinhardt1-1/+1
In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-15has_dir_name(): make code more obviousJohannes Schindelin1-42/+13
One thing that might be non-obvious to readers (or to analyzers like CodeQL) is that the function essentially does nothing when the Git index is empty, and in particular that it does not look at the value of `len_eq_last` (which would be uninitialized at that point). Let's make this much easier to understand, by returning early if the Git index is empty, and by avoiding empty `else` blocks. This commit changes indentation and is hence best viewed using `--ignore-space-change`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-24Merge branch 'ps/object-file-cleanup'Junio C Hamano1-3/+3
Code clean-up. * ps/object-file-cleanup: object-store: merge "object-store-ll.h" and "object-store.h" object-store: remove global array of cached objects object: split out functions relating to object store subsystem object-file: drop `index_blob_stream()` object-file: split up concerns of `HASH_*` flags object-file: split out functions relating to object store subsystem object-file: move `xmmap()` into "wrapper.c" object-file: move `git_open_cloexec()` to "compat/open.c" object-file: move `safe_create_leading_directories()` into "path.c" object-file: move `mkdir_in_gitdir()` into "path.c"
2025-04-23Merge branch 'js/range-check-codeql-workaround'Junio C Hamano1-2/+2
Work around false positive from CodeQL checker. * js/range-check-codeql-workaround: read-cache: check range before dereferencing an array element
2025-04-15object-store: merge "object-store-ll.h" and "object-store.h"Patrick Steinhardt1-1/+1
The "object-store-ll.h" header has been introduced to keep transitive header dependendcies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15object-file: split up concerns of `HASH_*` flagsPatrick Steinhardt1-2/+2
The functions `hash_object_file()`, `write_object_file()` and `index_fd()` reuse the same set of flags to alter their behaviour. This not only adds confusion, but given that every function only supports a subset of the flags it becomes very hard to see which flags can be passed to what function. Last but not least, this entangles the implementation of all three function families. Split up concerns by creating separate flags for each of the function families. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-28read-cache: check range before dereferencing an array elementJohannes Schindelin1-2/+2
Before accessing an array element at a given index, we should make sure that the index is within the desired bounds, otherwise it makes little sense to access the array element in the first place. In this instance, testing whether `ce->name[common]` is the trailing NUL byte is technically different from testing whether `common` is within the bounds of `previous_name`. It is also redundant, as the range-check guarantees that `previous_name->buf[common]` cannot be NUL and therefore the condition `ce->name[common] == previous_name->buf[common]` would not be met if `ce->name[common]` evaluated to NUL. However, in the interest of reducing the cognitive load to reason about the correctness of this loop (so that I can focus on interesting projects again), I'll simply move the range-check to the beginning of the loop condition and keep the redundant NUL check. This acquiesces CodeQL's `cpp/offset-use-before-range-check` rule. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10hash: stop depending on `the_repository` in `null_oid()`Patrick Steinhardt1-1/+1
The `null_oid()` function returns the object ID that only consists of zeroes. Naturally, this ID also depends on the hash algorithm used, as the number of zeroes is different between SHA1 and SHA256. Consequently, the function returns the hash-algorithm-specific null object ID. This is currently done by depending on `the_hash_algo`, which implicitly makes us depend on `the_repository`. Refactor the function to instead pass in the hash algorithm for which we want to retrieve the null object ID. Adapt callsites accordingly by passing in `the_repository`, thus bubbling up the dependency on that global variable by one layer. There are a couple of trivial exceptions for subsystems that already got rid of `the_repository`. These subsystems instead use the repository that is available via the calling context: - "builtin/grep.c" - "grep.c" - "refs/debug.c" There are also two non-trivial exceptions: - "diff-no-index.c": Here we know that we may not have a repository initialized at all, so we cannot rely on `the_repository`. Instead, we adapt `diff_no_index()` to get a `struct git_hash_algo` as parameter. The only caller is located in "builtin/diff.c", where we know to call `repo_set_hash_algo()` in case we're running outside of a Git repository. Consequently, it is fine to continue passing `the_repository->hash_algo` even in this case. - "builtin/ls-files.c": There is an in-flight patch series that drops `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic conflict because we use `null_oid()` in `show_submodule()`. The value is passed to `repo_submodule_init()`, which may use the object ID to resolve a tree-ish in the superproject from which we want to read the submodule config. As such, the object ID should refer to an object in the superproject, and consequently we need to use its hash algorithm. This means that we could in theory just not bother about this edge case at all and just use `the_repository` in "diff-no-index.c". But doing so would feel misdesigned. Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in "hash.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10csum-file: stop depending on `the_repository`Patrick Steinhardt1-1/+1
There are multiple sites in "csum-file.c" where we use the global `the_repository` variable, either explicitly or implicitly by using `the_hash_algo`. Refactor the code to stop using `the_repository` by adapting functions to receive required data as parameters. Adapt callsites accordingly by either using `the_repository->hash_algo`, or by using a context-provided hash algorithm in case the subsystem already got rid of its dependency on `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-05Merge branch 'ps/path-sans-the-repository'Junio C Hamano1-8/+18
The path.[ch] API takes an explicit repository parameter passed throughout the callchain, instead of relying on the_repository singleton instance. * ps/path-sans-the-repository: path: adjust last remaining users of `the_repository` environment: move access to "core.sharedRepository" into repo settings environment: move access to "core.hooksPath" into repo settings repo-settings: introduce function to clear struct path: drop `git_path()` in favor of `repo_git_path()` rerere: let `rerere_path()` write paths into a caller-provided buffer path: drop `git_common_path()` in favor of `repo_common_path()` worktree: return allocated string from `get_worktree_git_dir()` path: drop `git_path_buf()` in favor of `repo_git_path_replace()` path: drop `git_pathdup()` in favor of `repo_git_path()` path: drop unused `strbuf_git_path()` function path: refactor `repo_submodule_path()` family of functions submodule: refactor `submodule_to_gitdir()` to accept a repo path: refactor `repo_worktree_path()` family of functions path: refactor `repo_git_path()` family of functions path: refactor `repo_common_path()` family of functions
2025-02-28path: adjust last remaining users of `the_repository`Patrick Steinhardt1-1/+1
With the preceding refactorings we now only have a couple of implicit users of `the_repository` left in the "path" subsystem, all of which depend on global state via `calc_shared_perm()`. Make the dependency on `the_repository` explicit by passing the repo as a parameter instead and adjust callers accordingly. Note that this change bubbles up into a couple of subsystems that were previously declared as free from `the_repository`. Instead of marking all of them as `the_repository`-dependent again, we instead use the repository that is available in the calling context. There are three exceptions though with "copy.c", "pack-write.c" and "tempfile.c". Adjusting these would require us to adapt callsites all over the place, so this is left for a future iteration. Mark "path.c" as free from `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-28path: drop `git_path()` in favor of `repo_git_path()`Patrick Steinhardt1-7/+17
Remove `git_path()` in favor of the `repo_git_path()` family of functions, which makes the implicit dependency on `the_repository` go away. Note that `git_path()` returned a string allocated via `get_pathname()`, which uses a rotating set of statically allocated buffers. Consequently, callers didn't have to free the returned string. The same isn't true for `repo_common_path()`, so we also have to add logic to free the returned strings. This refactoring also allows us to remove `repo_common_pathv()` as well as `get_pathname()` from the public interface. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31global: adapt callers to use generic hash context helpersPatrick Steinhardt1-7/+7
Adapt callers to use generic hash context helpers instead of using the hash algorithm to update them. This makes the callsites easier to reason about and removes the possibility that the wrong hash algorithm is used to update the hash context's state. And as a nice side effect this also gets rid of a bunch of users of `the_hash_algo`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31hash: stop typedeffing the hash contextPatrick Steinhardt1-6/+6
We generally avoid using `typedef` in the Git codebase. One exception though is the `git_hash_ctx`, likely because it used to be a union rather than a struct until the preceding commit refactored it. But now that it is a normal `struct` there isn't really a need for a typedef anymore. Drop the typedef and adapt all callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18resolve-undo: stop using `the_repository`Patrick Steinhardt1-2/+2
Stop using `the_repository` in the "resolve-undo" subsystem by passing in the hash algorithm when reading or writing resolve-undo information. While we could trivially update the caller to pass in the hash algorithm used by the index itself, we instead pass in `the_hash_algo`. This is mostly done to stay consistent with the rest of the code in that file, which isn't prepared to handle arbitrary repositories, either. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18progress: stop using `the_repository`Patrick Steinhardt1-1/+2
Stop using `the_repository` in the "progress" subsystem by passing in a repository when initializing `struct progress`. Furthermore, store a pointer to the repository in that struct so that we can pass it to the trace2 API when logging information. Adjust callers accordingly by using `the_repository`. While there may be some callers that have a repository available in their context, this trivial conversion allows for easier verification and bubbles up the use of `the_repository` by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06global: mark code units that generate warnings with `-Wsign-compare`Patrick Steinhardt1-0/+1
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-22Merge branch 'ps/cache-tree-w-broken-index-entry'Taylor Blau1-2/+3
Fail gracefully instead of crashing when attempting to write the contents of a corrupt in-core index as a tree object. * ps/cache-tree-w-broken-index-entry: unpack-trees: detect mismatching number of cache-tree/index entries cache-tree: detect mismatching number of index entries cache-tree: refactor verification to return error codes
2024-10-10Merge branch 'ps/leakfixes-part-8'Junio C Hamano1-0/+1
More leakfixes. * ps/leakfixes-part-8: (23 commits) builtin/send-pack: fix leaking list of push options remote: fix leaking push reports t/helper: fix leaks in proc-receive helper pack-write: fix return parameter of `write_rev_file_order()` revision: fix leaking saved parents revision: fix memory leaks when rewriting parents midx-write: fix leaking buffer pack-bitmap-write: fix leaking OID array pseudo-merge: fix leaking strmap keys pseudo-merge: fix various memory leaks line-log: fix several memory leaks diff: improve lifecycle management of diff queues builtin/revert: fix leaking `gpg_sign` and `strategy` config t/helper: fix leaking repository in partial-clone helper builtin/clone: fix leaking repo state when cloning with bundle URIs builtin/pack-redundant: fix various memory leaks builtin/stash: fix leaking `pathspec_from_file` submodule: fix leaking submodule entry list wt-status: fix leaking buffer with sparse directories shell: fix leaking strings ...
2024-10-07cache-tree: refactor verification to return error codesPatrick Steinhardt1-2/+3
The function `cache_tree_verify()` will `BUG()` whenever it finds that the cache-tree extension of the index is corrupt. The function is thus inherently untestable because the resulting call to `abort()` will be detected by our testing framework and labelled an error. And rightfully so: it shouldn't ever be possible to hit bugs, as they should indicate a programming error rather than corruption of on-disk state. Refactor the function to instead return error codes. This also ensures that the function can be used e.g. by git-fsck(1) without the whole process dying. Furthermore, this refactoring plugs some memory leaks when returning early by creating a common exit path. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-04Merge branch 'ds/read-cache-mempool-leakfix'Junio C Hamano1-0/+1
Leakfix. * ds/read-cache-mempool-leakfix: read-cache: free threaded memory pool
2024-10-01read-cache: free threaded memory poolDerrick Stolee1-0/+1
In load_cache_entries_threaded(), each thread allocates its own memory pool. This pool needs to be cleaned up while closing the threads down, or it will be leaked. This ce_mem_pool pointer could theoretically be converted to an inline copy of the struct, but the use of a pointer helps with existing lazy- initialization logic. Adjusting that behavior only to avoid this pointer would be a much bigger change. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30read-cache: fix leaking hash context in `do_write_index()`Patrick Steinhardt1-0/+1
When writing an index with the EOIE extension we allocate a separate hash context. We never free that context though, causing a memory leak. Plug it. This leak is exposed by t9210, but plugging it alone does not make the whole test suite pass. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-12environment: make `get_git_dir()` accept a repositoryPatrick Steinhardt1-2/+4
The `get_git_dir()` function retrieves the path to the Git directory for `the_repository`. Make it accept a `struct repository` such that it can work on arbitrary repositories and make it part of the repository subsystem. This reduces our reliance on `the_repository` and clarifies scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-26Merge branch 'ps/maintenance-detach-fix'Junio C Hamano1-3/+9
Maintenance tasks other than "gc" now properly go background when "git maintenance" runs them. * ps/maintenance-detach-fix: run-command: fix detaching when running auto maintenance builtin/maintenance: add a `--detach` flag builtin/gc: add a `--detach` flag builtin/gc: stop processing log file on signal builtin/gc: fix leaking config values builtin/gc: refactor to read config into structure config: fix constness of out parameter for `git_config_get_expiry()`
2024-08-23Merge branch 'ps/config-wo-the-repository'Junio C Hamano1-11/+11
Use of API functions that implicitly depend on the_repository object in the config subsystem has been rewritten to pass a repository object through the callchain. * ps/config-wo-the-repository: config: hide functions using `the_repository` by default global: prepare for hiding away repo-less config functions config: don't depend on `the_repository` with branch conditions config: don't have setters depend on `the_repository` config: pass repo to functions that rename or copy sections config: pass repo to `git_die_config()` config: pass repo to `git_config_get_expiry_in_days()` config: pass repo to `git_config_get_expiry()` config: pass repo to `git_config_get_max_percent_split_change()` config: pass repo to `git_config_get_split_index()` config: pass repo to `git_config_get_index_threads()` config: expose `repo_config_clear()` config: introduce missing setters that take repo as parameter path: hide functions using `the_repository` by default path: stop relying on `the_repository` in `worktree_git_path()` path: stop relying on `the_repository` when reporting garbage hooks: remove implicit dependency on `the_repository` editor: do not rely on `the_repository` for interactive edits path: expose `do_git_common_path()` as `repo_common_pathv()` path: expose `do_git_path()` as `repo_git_pathv()`
2024-08-16config: fix constness of out parameter for `git_config_get_expiry()`Patrick Steinhardt1-3/+9
The type of the out parameter of `git_config_get_expiry()` is a pointer to a constant string, which creates the impression that ownership of the returned data wasn't transferred to the caller. This isn't true though and thus quite misleading. Adapt the parameter to be of type `char **` and adjust callers accordingly. While at it, refactor `get_shared_index_expire_date()` to drop the static `shared_index_expire` variable. It is only used in that function, and furthermore we would only hit the code where we parse the expiry date a single time because we already use a static `prepared` variable to track whether we did parse it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14read-cache: fix leaking hashfile when writing index failsPatrick Steinhardt1-39/+58
In `do_write_index()`, we use a `struct hashfile` to write the index with a trailer hash. In case the write fails though, we never clean up the allocated `hashfile` state and thus leak memory. Refactor the code to have a common exit path where we can free this and other allocated memory. While at it, refactor our use of `strbuf`s such that we reuse the same buffer to avoid some unneeded allocations. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-13config: pass repo to `git_config_get_expiry()`Patrick Steinhardt1-2/+2
Refactor `git_config_get_expiry()` to accept a `struct repository` such that we can get rid of the implicit dependency on `the_repository`. Rename the function accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-13config: pass repo to `git_config_get_max_percent_split_change()`Patrick Steinhardt1-1/+1
Refactor `git_config_get_max_percent_split_change()` to accept a `struct repository` such that we can get rid of the implicit dependency on `the_repository`. Rename the function accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-13config: pass repo to `git_config_get_split_index()`Patrick Steinhardt1-1/+1
Refactor `git_config_get_split_index()` to accept a `struct repository` such that we can get rid of the implicit dependency on `the_repository`. Rename the function accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-13config: pass repo to `git_config_get_index_threads()`Patrick Steinhardt1-4/+4
Refactor `git_config_get_index_threads()` to accept a `struct repository` such that we can get rid of the implicit dependency on `the_repository`. Rename the function accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-13hooks: remove implicit dependency on `the_repository`Patrick Steinhardt1-3/+3
We implicitly depend on `the_repository` in our hook subsystem because we use `strbuf_git_path()` to compute hook paths. Remove this dependency by accepting a `struct repository` as parameter instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14global: introduce `USE_THE_REPOSITORY_VARIABLE` macroPatrick Steinhardt1-0/+3
Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14hash: require hash algorithm in `is_empty_{blob,tree}_oid()`Patrick Steinhardt1-1/+1
Both functions `is_empty_{blob,tree}_oid()` use `the_repository` to derive the hash function that shall be used. Require callers to pass in the hash algorithm to get rid of this implicit dependency. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14hash: require hash algorithm in `oidread()` and `oidclr()`Patrick Steinhardt1-3/+5
Both `oidread()` and `oidclr()` use `the_repository` to derive the hash function that shall be used. Require callers to pass in the hash algorithm to get rid of this implicit dependency. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`Patrick Steinhardt1-4/+4
Many of our hash functions have two variants, one receiving a `struct git_hash_algo` and one that derives it via `the_repository`. Adapt all of those functions to always require the hash algorithm as input and drop the variants that do not accept one. As those functions are now independent of `the_repository`, we can move them from "hash.h" to "hash-ll.h". Note that both in this and subsequent commits in this series we always just pass `the_repository->hash_algo` as input even if it is obvious that there is a repository in the context that we should be using the hash from instead. This is done to be on the safe side and not introduce any regressions. All callsites should eventually be amended to use a repo passed via parameters, but this is outside the scope of this patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functionsPatrick Steinhardt1-1/+1
The functions `is_empty_{blob,tree}_sha1()` are mostly unused, except for a single callsite in "read-cache.c". Most callsites have long since been converted to use the equivalents that accept a `struct object_id` instead of a string. Adapt the remaining callsite and drop those functions. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17refs: refactor `resolve_gitlink_ref()` to accept a repositoryPatrick Steinhardt1-2/+3
In `resolve_gitlink_ref()` we implicitly rely on `the_repository` to look up the submodule ref store. Now that we can look up submodule ref stores for arbitrary repositories we can improve this function to instead accept a repository as parameter for which we want to resolve the gitlink. Do so and adjust callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-29Sync with 2.44.1Johannes Schindelin1-53/+19
* maint-2.44: (41 commits) Git 2.44.1 Git 2.43.4 Git 2.42.2 Git 2.41.1 Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel ...
2024-04-19Sync with 2.43.4Johannes Schindelin1-53/+19
* maint-2.43: (40 commits) Git 2.43.4 Git 2.42.2 Git 2.41.1 Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories ...
2024-04-19Sync with 2.42.2Johannes Schindelin1-53/+19
* maint-2.42: (39 commits) Git 2.42.2 Git 2.41.1 Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' ...
2024-04-19Sync with 2.41.1Johannes Schindelin1-53/+19
* maint-2.41: (38 commits) Git 2.41.1 Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' docs: document security issues around untrusted .git dirs ...
2024-04-19Sync with 2.40.2Johannes Schindelin1-53/+19
* maint-2.40: (39 commits) Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' docs: document security issues around untrusted .git dirs upload-pack: disable lazy-fetching by default ...
2024-04-19Sync with 2.39.4Johannes Schindelin1-53/+19
* maint-2.39: (38 commits) Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' docs: document security issues around untrusted .git dirs upload-pack: disable lazy-fetching by default fetch/clone: detect dubious ownership of local repositories ...
2024-04-17has_dir_name(): do not get confused by characters < '/'Filip Hejsek1-53/+19
There is a bug in directory/file ("D/F") conflict checking optimization: It assumes that such a conflict cannot happen if a newly added entry's path is lexicgraphically "greater than" the last already-existing index entry _and_ contains a directory separator that comes strictly after the common prefix (`len > len_eq_offset`). This assumption is incorrect, though: `a-` sorts _between_ `a` and `a/b`, their common prefix is `a`, the slash comes after the common prefix, and there is still a file/directory conflict. Let's re-design this logic, taking these facts into consideration: - It is impossible for a file to sort after another file with whose directory it conflicts because the trailing NUL byte is always smaller than any other character. - Since there are quite a number of ASCII characters that sort before the slash (e.g. `-`, `.`, the space character), looking at the last already-existing index entry is not enough to determine whether there is a D/F conflict when the first character different from the existing last index entry's path is a slash. If it is not a slash, there cannot be a file/directory conflict. And if the existing index entry's first different character is a slash, it also cannot be a file/directory conflict because the optimization requires the newly-added entry's path to sort _after_ the existing entry's, and the conflicting file's path would not. So let's fall back to the regular binary search whenever the newly-added item's path differs in a slash character. If it does not, and it sorts after the last index entry, there is no D/F conflict and the new index entry can be safely appended. This fix also nicely simplifies the logic and makes it much easier to reason about, while the impact on performance should be negligible: After this fix, the optimization will be skipped only when index entry's paths differ in a slash and a space, `!`, `"`, `#`, `$`, `%`, `&`, `'`, | ( `)`, `*`, `+`, `,`, `-`, or `.`, which should be a rare situation. Signed-off-by: Filip Hejsek <filip.hejsek@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2024-04-03revision: optionally record matches with pathspec elementsJunio C Hamano1-3/+5
Unlike "git add" and other end-user facing commands, where it is diagnosed as an error to give a pathspec with an element that does not match any path, the diff machinery does not care if some elements of the pathspec do not match. Given that the diff machinery is heavily used in pathspec-limited "git log" machinery, and it is common for a path to come and go while traversing the project history, this is usually a good thing. However, in some cases we would want to know if all the pathspec elements matched. For example, "git add -u <pathspec>" internally uses the machinery used by "git diff-files" to decide contents from what paths to add to the index, and as an end-user facing command, "git add -u" would want to report an unmatched pathspec element. Add a new .ps_matched member next to the .prune_data member in "struct rev_info" so that we can optionally keep track of the use of .prune_data pathspec elements that can be inspected by the caller. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-08Merge branch 'en/header-cleanup'Junio C Hamano1-2/+0
Remove unused header "#include". * en/header-cleanup: treewide: remove unnecessary includes in source files treewide: add direct includes currently only pulled in transitively trace2/tr2_tls.h: remove unnecessary include submodule-config.h: remove unnecessary include pkt-line.h: remove unnecessary include line-log.h: remove unnecessary include http.h: remove unnecessary include fsmonitor--daemon.h: remove unnecessary includes blame.h: remove unnecessary includes archive.h: remove unnecessary include treewide: remove unnecessary includes in source files treewide: remove unnecessary includes from header files
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren1-2/+0
Each of these were checked with gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-15cache: add fake_lstat()Junio C Hamano1-0/+27
At times, we may already know that a path represented by a cache_entry ce has no changes via some out-of-line means, like fsmonitor, and yet need the control to go through a codepath that requires us to have "struct stat" obtained by lstat() on the path, for various purposes (e.g. "ie_match_stat()" wants cached stat-info is still current wrt "struct stat", "diff" wants to know st_mode). The callers of lstat() on a tracked file, when its cache_entry knows it is up-to-date, can instead call this helper to pretend that it called lstat() by faking the "struct stat" information. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-17Merge branch 'cw/compat-util-header-cleanup'Junio C Hamano1-2/+0
Further shuffling of declarations across header files to streamline file dependencies. * cw/compat-util-header-cleanup: git-compat-util: move alloc macros to git-compat-util.h treewide: remove unnecessary includes for wrapper.h kwset: move translation table from ctype sane-ctype.h: create header for sane-ctype macros git-compat-util: move wrapper.c funcs to its header git-compat-util: move strbuf.c funcs to its header
2023-07-08Merge branch 'js/empty-index-fixes'Junio C Hamano1-6/+9
A few places failed to differenciate the case where the index is truly empty (nothing added) and we haven't yet read from the on-disk index file, which have been corrected. * js/empty-index-fixes: commit -a -m: allow the top-level tree to become empty again split-index: accept that a base index can be empty do_read_index(): always mark index as initialized unless erroring out
2023-07-05git-compat-util: move alloc macros to git-compat-util.hCalvin Wan1-1/+0
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for dynamic array allocation. Moving these macros to git-compat-util.h with the other alloc macros focuses alloc.[ch] to allocation for Git objects and additionally allows us to remove inclusions to alloc.h from files that solely used the above macros. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-05treewide: remove unnecessary includes for wrapper.hCalvin Wan1-1/+0
Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-29split-index: accept that a base index can be emptyJohannes Schindelin1-6/+8
We are about to fix an ancient bug where `do_read_index()` pretended that the index was not initialized when there are no index entries. Before the `index_state` structure gained the `initialized` flag in 913e0e99b6a (unpack_trees(): protect the handcrafted in-core index from read_cache(), 2008-08-23), that was the best we could do (even if it was incorrect: it is totally possible to read a Git index file that contains no index entries). This pattern was repeated also in 998330ac2e7 (read-cache: look for shared index files next to the index, too, 2021-08-26), which we fix here by _not_ mistaking an empty base index for a missing `sharedindex.*` file. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-29do_read_index(): always mark index as initialized unless erroring outJohannes Schindelin1-0/+1
In 913e0e99b6a (unpack_trees(): protect the handcrafted in-core index from read_cache(), 2008-08-23) a flag was introduced into the `index_state` structure to indicate whether it had been initialized (or more correctly: read and parsed). There was one code path that was not handled, though: when the index file does not yet exist (but the `must_exist` parameter is set to 0 to indicate that that's okay). In this instance, Git wants to go forward with a new, pristine Git index, almost as if the file had existed and contained no index entries or extensions. Since Git wants to handle this situation the same as if an "empty" Git index file existed, let's set the `initialized` flag also in that case. This is necessary to prepare for fixing the bug where the condition `cache_nr == 0` is incorrectly used as an indicator that the index was already read, and the condition `initialized != 0` needs to be used instead. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21object-store-ll.h: split this header out of object-store.hElijah Newren1-1/+1
The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store | sort | uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21repository: remove unnecessary include of path.hElijah Newren1-0/+1
This also made it clear that several .c files that depended upon path.h were missing a #include for it; add the missing includes while at it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21cache.h: remove this no-longer-used headerElijah Newren1-1/+1
Since this header showed up in some places besides just #include statements, update/clean-up/remove those other places as well. Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got away with violating the rule that all files must start with an include of git-compat-util.h (or a short-list of alternate headers that happen to include it first). This change exposed the violation and caused it to stop building correctly; fix it by having it include git-compat-util.h first, as per policy. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21read-cache*.h: move declarations for read-cache.c functions from cache.hElijah Newren1-0/+1
For the functions defined in read-cache.c, move their declarations from cache.h to a new header, read-cache-ll.h. Also move some related inline functions from cache.h to read-cache.h. The purpose of the read-cache-ll.h/read-cache.h split is that about 70% of the sites don't need the inline functions and the extra headers they include. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21preload-index.h: move declarations for preload-index.c from elsewhereElijah Newren1-0/+1
We already have a preload-index.c file; move the declarations for the functions in that file into a new preload-index.h. These were previously split between cache.h and repository.h. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21name-hash.h: move declarations for name-hash.c from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21statinfo: move stat_{data,validity} functions from cache/read-cacheElijah Newren1-84/+0
These functions do not depend upon struct cache_entry or struct index_state in any way, and it seems more logical to break them out into this file, especially since statinfo.h already has the struct stat_data declaration. Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21read-cache: move shared add/checkout/commit codeElijah Newren1-0/+102
The function add_files_to_cache(), plus associated helper functions, were defined in builtin/add.c, but also shared with builtin/checkout.c and builtin/commit.c. Move these shared functions to read-cache.c. Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21read-cache: move shared commit and ls-files codeElijah Newren1-0/+137
The function overlay_tree_on_index(), plus associated helper functions, were defined in builtin/ls-files.c, but also shared with builtin/commit.c. Move these shared functions to read-cache.c. Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-09Merge branch 'en/header-split-cache-h-part-2'Junio C Hamano1-68/+13
More header clean-up. * en/header-split-cache-h-part-2: (22 commits) reftable: ensure git-compat-util.h is the first (indirect) include diff.h: reduce unnecessary includes object-store.h: reduce unnecessary includes commit.h: reduce unnecessary includes fsmonitor: reduce includes of cache.h cache.h: remove unnecessary headers treewide: remove cache.h inclusion due to previous changes cache,tree: move basic name compare functions from read-cache to tree cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c hash-ll.h: split out of hash.h to remove dependency on repository.h tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h dir.h: move DTYPE defines from cache.h versioncmp.h: move declarations for versioncmp.c functions from cache.h ws.h: move declarations for ws.c functions from cache.h match-trees.h: move declarations for match-trees.c functions from cache.h pkt-line.h: move declarations for pkt-line.c functions from cache.h base85.h: move declarations for base85.c functions from cache.h copy.h: move declarations for copy.c functions from cache.h server-info.h: move declarations for server-info.c functions from cache.h packfile.h: move pack_window and pack_entry from cache.h ...
2023-04-25Merge branch 'en/header-split-cache-h'Junio C Hamano1-0/+5
Header clean-up. * en/header-split-cache-h: (24 commits) protocol.h: move definition of DEFAULT_GIT_PORT from cache.h mailmap, quote: move declarations of global vars to correct unit treewide: reduce includes of cache.h in other headers treewide: remove double forward declaration of read_in_full cache.h: remove unnecessary includes treewide: remove cache.h inclusion due to pager.h changes pager.h: move declarations for pager.c functions from cache.h treewide: remove cache.h inclusion due to editor.h changes editor: move editor-related functions and declarations into common file treewide: remove cache.h inclusion due to object.h changes object.h: move some inline functions and defines from cache.h treewide: remove cache.h inclusion due to object-file.h changes object-file.h: move declarations for object-file.c functions from cache.h treewide: remove cache.h inclusion due to git-zlib changes git-zlib: move declarations for git-zlib functions from cache.h treewide: remove cache.h inclusion due to object-name.h changes object-name.h: move declarations for object-name.c functions from cache.h treewide: remove unnecessary cache.h inclusion treewide: be explicit about dependence on mem-pool.h treewide: be explicit about dependence on oid-array.h ...
2023-04-24commit.h: reduce unnecessary includesElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24cache,tree: move basic name compare functions from read-cache to treeElijah Newren1-68/+0
None of base_name_compare(), df_name_compare(), or name_compare() depended upon a cache_entry or index_state in any way. By moving these functions to tree.h, half a dozen other files can stop depending upon cache.h (though that change will be made in a later commit). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.cElijah Newren1-1/+12
Since cmp_cache_name_compare() was comparing cache_entry structs, it was associated with the cache rather than with trees. Move the function. As a side effect, we can make cache_name_stage_compare() static as well. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24symlinks.h: move declarations for symlinks.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11object-file.h: move declarations for object-file.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11object-name.h: move declarations for object-name.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on mem-pool.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on oid-array.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on trace.h & trace2.hElijah Newren1-0/+1
Dozens of files made use of trace and trace2 functions, without explicitly including trace.h or trace2.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include trace.h or trace2.h if they are using them. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-06Merge branch 'en/header-split-cleanup'Junio C Hamano1-0/+3
Split key function and data structure definitions out of cache.h to new header files and adjust the users. * en/header-split-cleanup: csum-file.h: remove unnecessary inclusion of cache.h write-or-die.h: move declarations for write-or-die.c functions from cache.h treewide: remove cache.h inclusion due to setup.h changes setup.h: move declarations for setup.c functions from cache.h treewide: remove cache.h inclusion due to environment.h changes environment.h: move declarations for environment.c functions from cache.h treewide: remove unnecessary includes of cache.h wrapper.h: move declarations for wrapper.c functions from cache.h path.h: move function declarations for path.c functions from cache.h cache.h: remove expand_user_path() abspath.h: move absolute path functions from cache.h environment: move comment_line_char from cache.h treewide: remove unnecessary cache.h inclusion from several sources treewide: remove unnecessary inclusion of gettext.h treewide: be explicit about dependence on gettext.h treewide: remove unnecessary cache.h inclusion from a few headers
2023-04-06Merge branch 'ab/remove-implicit-use-of-the-repository'Junio C Hamano1-4/+5
Code clean-up around the use of the_repository. * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-04-04Merge branch 'js/split-index-fixes'Junio C Hamano1-17/+32
The index files can become corrupt under certain conditions when the split-index feature is in use, especially together with fsmonitor, which have been corrected. * js/split-index-fixes: unpack-trees: take care to propagate the split-index flag fsmonitor: avoid overriding `cache_changed` bits split-index; stop abusing the `base_oid` to strip the "link" extension split-index & fsmonitor: demonstrate a bug
2023-04-04Merge branch 'ab/remove-implicit-use-of-the-repository' into ↵Junio C Hamano1-4/+5
en/header-split-cache-h * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-03-28libs: use "struct repository *" argument, not "the_repository"Ævar Arnfjörð Bjarmason1-2/+2
As can easily be seen from grepping in our sources, we had these uses of "the_repository" in various library code in cases where the function in question was already getting a "struct repository *" argument. Let's use that argument instead. Out of these changes only the changes to "cache-tree.c", "commit-reach.c", "shallow.c" and "upload-pack.c" would have cleanly applied before the migration away from the "repo_*()" wrapper macros in the preceding commits. The rest aren't new, as we'd previously implicitly refer to "the_repository", but it's now more obvious that we were doing the wrong thing all along, and should have used the parameter instead. The change to change "get_index_format_default(the_repository)" in "read-cache.c" to use the "r" variable instead should arguably have been part of [1], or in the subsequent cleanup in [2]. Let's do it here, as can be seen from the initial code in [3] it's not important that we use "the_repository" there, but would prefer to always use the current repository. This change excludes the "the_repository" use in "upload-pack.c"'s upload_pack_advertise(), as the in-flight [4] makes that change. 1. ee1f0c242ef (read-cache: add index.skipHash config option, 2023-01-06) 2. 6269f8eaad0 (treewide: always have a valid "index_state.repo" member, 2023-01-17) 3. 7211b9e7534 (repo-settings: consolidate some config settings, 2019-08-13) 4. <Y/hbUsGPVNAxTdmS@coredump.intra.peff.net> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28cocci: apply the "object-store.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason1-2/+3
Apply the part of "the_repository.pending.cocci" pertaining to "object-store.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28cocci: apply the "cache.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason1-1/+1
Apply the part of "the_repository.pending.cocci" pertaining to "cache.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-27split-index; stop abusing the `base_oid` to strip the "link" extensionJohannes Schindelin1-17/+32
When a split-index is in effect, the `$GIT_DIR/index` file needs to contain a "link" extension that contains all the information about the split-index, including the information about the shared index. However, in some cases Git needs to suppress writing that "link" extension (i.e. to fall back to writing a full index) even if the in-memory index structure _has_ a `split_index` configured. This is the case e.g. when "too many not shared" index entries exist. In such instances, the current code sets the `base_oid` field of said `split_index` structure to all-zero to indicate that `do_write_index()` should skip writing the "link" extension. This can lead to problems later on, when the in-memory index is still used to perform other operations and eventually wants to write a split-index, detects the presence of the `split_index` and reuses that, too (under the assumption that it has been initialized correctly and still has a non-null `base_oid`). Let's stop zeroing out the `base_oid` to indicate that the "link" extension should not be written. One might be tempted to simply call `discard_split_index()` instead, under the assumption that Git decided to write a non-split index and therefore the `split_index` structure might no longer be wanted. However, that is not possible because that would release index entries in `split_index->base` that are likely to still be in use. Therefore we cannot do that. The next best thing we _can_ do is to introduce a bit field to indicate specifically which index extensions (not) to write. So that's what we do here. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21environment.h: move declarations for environment.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21wrapper.h: move declarations for wrapper.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren1-0/+1
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23alloc.h: move ALLOC_GROW() functions from cache.hElijah Newren1-0/+1
This allows us to replace includes of cache.h with includes of the much smaller alloc.h in many places. It does mean that we also need to add includes of alloc.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15Merge branch 'rs/size-t-fixes'Junio C Hamano1-6/+7
Type fixes. * rs/size-t-fixes: pack-objects: use strcspn(3) in name_cmp_len() read-cache: use size_t for {base,df}_name_compare()
2023-02-06read-cache: use size_t for {base,df}_name_compare()René Scharfe1-6/+7
Support names of any length in base_name_compare() and df_name_compare() by using size_t for their length parameters. They pass the length on to memcmp(3), which also takes it as a size_t. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-17treewide: always have a valid "index_state.repo" memberÆvar Arnfjörð Bjarmason1-12/+5
When the "repo" member was added to "the_index" in [1] the repo_read_index() was made to populate it, but the unpopulated "the_index" variable didn't get the same treatment. Let's do that in initialize_the_repository() when we set it up, and likewise for all of the current callers initialized an empty "struct index_state". This simplifies code that needs to deal with "the_index" or a custom "struct index_state", we no longer need to second-guess this part of the "index_state" deep in the stack. A recent example of such second-guessing is the "istate->repo ? istate->repo : the_repository" code in [2]. We can now simply use "istate->repo". We're doing this by making use of the INDEX_STATE_INIT() macro (and corresponding function) added in [3], which now have mandatory "repo" arguments. Because we now call index_state_init() in repository.c's initialize_the_repository() we don't need to handle the case where we have a "repo->index" whose "repo" member doesn't match the "repo" we're setting up, i.e. the "Complete the double-reference" code in repo_read_index() being altered here. That logic was originally added in [1], and was working around the lack of what we now have in initialize_the_repository(). For "fsmonitor-settings.c" we can remove the initialization of a NULL "r" argument to "the_repository". This was added back in [4], and was needed at the time for callers that would pass us the "r" from an "istate->repo". Before this change such a change to "fsmonitor-settings.c" would segfault all over the test suite (e.g. in t0002-gitfile.sh). This change has wider eventual implications for "fsmonitor-settings.c". The reason the other lazy loading behavior in it is required (starting with "if (!r->settings.fsmonitor) ..." is because of the previously passed "r" being "NULL". I have other local changes on top of this which move its configuration reading to "prepare_repo_settings()" in "repo-settings.c", as we could now start to rely on it being called for our "r". But let's leave all of that for now, and narrowly remove this particular part of the lazy-loading. 1. 1fd9ae517c4 (repository: add repo reference to index_state, 2021-01-23) 2. ee1f0c242ef (read-cache: add index.skipHash config option, 2023-01-06) 3. 2f6b1eb794e (cache API: add a "INDEX_STATE_INIT" macro/function, add release_index(), 2023-01-12) 4. 1e0ea5c4316 (fsmonitor: config settings are repository-specific, 2022-03-25) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-17Merge branch 'ab/cache-api-cleanup' into ab/cache-api-cleanup-usersJunio C Hamano1-19/+24
* ab/cache-api-cleanup: cache API: add a "INDEX_STATE_INIT" macro/function, add release_index() read-cache.c: refactor set_new_index_sparsity() for subsequent commit sparse-index API: BUG() out on NULL ensure_full_index() sparse-index.c: expand_to_path() can assume non-NULL "istate" builtin/difftool.c: { 0 }-initialize rather than using memset()
2023-01-16Merge branch 'ds/omit-trailing-hash-in-index'Junio C Hamano1-1/+13
Introduce an optional configuration to allow the trailing hash that protects the index file from bit flipping. * ds/omit-trailing-hash-in-index: features: feature.manyFiles implies fast index writes test-lib-functions: add helper for trailing hash read-cache: add index.skipHash config option hashfile: allow skipping the hash function
2023-01-16cache API: add a "INDEX_STATE_INIT" macro/function, add release_index()Ævar Arnfjörð Bjarmason1-13/+18
Hopefully in some not so distant future, we'll get advantages from always initializing the "repo" member of the "struct index_state". To make that easier let's introduce an initialization macro & function. The various ad-hoc initialization of the structure can then be changed over to it, and we can remove the various "0" assignments in discard_index() in favor of calling index_state_init() at the end. While not strictly necessary, let's also change the CALLOC_ARRAY() of various "struct index_state *" to use an ALLOC_ARRAY() followed by index_state_init() instead. We're then adding the release_index() function and converting some callers (including some of these allocations) over to it if they either won't need to use their "struct index_state" again, or are just about to call index_state_init(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-13read-cache.c: refactor set_new_index_sparsity() for subsequent commitÆvar Arnfjörð Bjarmason1-6/+6
Refactor code added to set_new_index_sparsity() in [1] to eliminate indentation resulting from putting the body of his function within the "if" block. Let's instead return early if we have no istate->repo. This trivial change makes the subsequent commit's diff smaller. 1. 491df5f679f (read-cache: set sparsity when index is new, 2022-05-10) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-07features: feature.manyFiles implies fast index writesDerrick Stolee1-1/+2
The recent addition of the index.skipHash config option allows index writes to speed up by skipping the hash computation for the trailing checksum. This is particularly critical for repositories with many files at HEAD, so add this config option to two cases where users in that scenario may opt-in to such behavior: 1. The feature.manyFiles config option enables some options that are helpful for repositories with many files at HEAD. 2. 'scalar register' and 'scalar reconfigure' set config options that optimize for large repositories. In both of these cases, set index.skipHash=true to gain this speedup. Add tests that demonstrate the proper way that index.skipHash=true can override feature.manyFiles=true. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-07read-cache: add index.skipHash config optionDerrick Stolee1-1/+12
The previous change allowed skipping the hashing portion of the hashwrite API, using it instead as a buffered write API. Disabling the hashwrite can be particularly helpful when the write operation is in a critical path. One such critical path is the writing of the index. This operation is so critical that the sparse index was created specifically to reduce the size of the index to make these writes (and reads) faster. This trade-off between file stability at rest and write-time performance is not easy to balance. The index is an interesting case for a couple reasons: 1. Writes block users. Writing the index takes place in many user- blocking foreground operations. The speed improvement directly impacts their use. Other file formats are typically written in the background (commit-graph, multi-pack-index) or are super-critical to correctness (pack-files). 2. Index files are short lived. It is rare that a user leaves an index for a long time with many staged changes. Outside of staged changes, the index can be completely destroyed and rewritten with minimal impact to the user. Following a similar approach to one used in the microsoft/git fork [1], add a new config option (index.skipHash) that allows disabling this hashing during the index write. The cost is that we can no longer validate the contents for corruption-at-rest using the trailing hash. [1] https://github.com/microsoft/git/commit/21fed2d91410f45d85279467f21d717a2db45201 We load this config from the repository config given by istate->repo, with a fallback to the_repository if it is not set. While older Git versions will not recognize the null hash as a special case, the file format itself is still being met in terms of its structure. Using this null hash will still allow Git operations to function across older versions. The one exception is 'git fsck' which checks the hash of the index file. This used to be a check on every index read, but was split out to just the index in a33fc72fe91 (read-cache: force_verify_index_checksum, 2017-04-14) and released first in Git 2.13.0. Document the versions that relaxed these restrictions, with the optimistic expectation that this change will be included in Git 2.40.0. Here, we disable this check if the trailing hash is all zeroes. We add a warning to the config option that this may cause undesirable behavior with older Git versions. As a quick comparison, I tested 'git update-index --force-write' with and without index.skipHash=true on a copy of the Linux kernel repository. Benchmark 1: with hash Time (mean ± σ): 46.3 ms ± 13.8 ms [User: 34.3 ms, System: 11.9 ms] Range (min … max): 34.3 ms … 79.1 ms 82 runs Benchmark 2: without hash Time (mean ± σ): 26.0 ms ± 7.9 ms [User: 11.8 ms, System: 14.2 ms] Range (min … max): 16.3 ms … 42.0 ms 69 runs Summary 'without hash' ran 1.78 ± 0.76 times faster than 'with hash' These performance benefits are substantial enough to allow users the ability to opt-in to this feature, even with the potential confusion with older 'git fsck' versions. Test this new config option, both at a command-line level and within a submodule. The confirmation is currently limited to confirm that 'git fsck' does not complain about the index. Future updates will make this test more robust. It is critical that this test is placed before the test_index_version tests, since those tests obliterate the .git/config file and hence lose the setting from GIT_TEST_DEFAULT_HASH, if set. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-14Merge branch 'ab/various-leak-fixes'Junio C Hamano1-0/+5
Various leak fixes. * ab/various-leak-fixes: built-ins: use free() not UNLEAK() if trivial, rm dead code revert: fix parse_options_concat() leak cherry-pick: free "struct replay_opts" members rebase: don't leak on "--abort" connected.c: free the "struct packed_git" sequencer.c: fix "opts->strategy" leak in read_strategy_opts() ls-files: fix a --with-tree memory leak revision API: call graph_clear() in release_revisions() unpack-file: fix ancient leak in create_temp_file() built-ins & libs & helpers: add/move destructors, fix leaks dir.c: free "ident" and "exclude_per_dir" in "struct untracked_cache" read-cache.c: clear and free "sparse_checkout_patterns" commit: discard partial cache before (re-)reading it {reset,merge}: call discard_index() before returning tests: mark tests as passing with SANITIZE=leak
2022-11-21read-cache.c: clear and free "sparse_checkout_patterns"Ævar Arnfjörð Bjarmason1-0/+5
The "sparse_checkout_patterns" member was added to the "struct index_state" in 836e25c51b2 (sparse-checkout: hold pattern list in index, 2021-03-30), but wasn't added to discard_index(). Let's do that. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-21read-cache API & users: make discard_index() return voidÆvar Arnfjörð Bjarmason1-3/+1
The discard_index() function has not returned non-zero since 7a51ed66f65 (Make on-disk index representation separate from in-core one, 2008-01-14), but we've had various code in-tree still acting as though that might be the case. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-28read-cache: avoid misaligned reads in index v4Victoria Dye1-19/+42
The process for reading the index into memory from disk is to first read its contents into a single memory-mapped file buffer (type 'char *'), then sequentially convert each on-disk index entry into a corresponding incore 'cache_entry'. To access the contents of the on-disk entry for processing, a moving pointer within the memory-mapped file is cast to type 'struct ondisk_cache_entry *'. In index v4, the entries in the on-disk index file are written *without* aligning their first byte to a 4-byte boundary; entries are a variable length (depending on the entry name and whether or not extended flags are used). As a result, casting the 'char *' buffer pointer to 'struct ondisk_cache_entry *' then accessing its contents in a 'SANITIZE=undefined' build can trigger the following error: read-cache.c:1886:46: runtime error: member access within misaligned address <address> for type 'struct ondisk_cache_entry', which requires 4 byte alignment Avoid this error by reading fields directly from the 'char *' buffer, using the 'offsetof' individual fields in 'struct ondisk_cache_entry'. Additionally, add documentation describing why the new approach avoids the misaligned address error, as well as advice on how to improve the implementation in the future. Reported-by: Jeff King <peff@peff.net> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-26Merge branch 'vd/sparse-reset-checkout-fixes' into maintJunio C Hamano1-0/+5
Fixes to sparse index compatibility work for "reset" and "checkout" commands. source: <pull.1312.v3.git.1659985672.gitgitgadget@gmail.com> * vd/sparse-reset-checkout-fixes: unpack-trees: unpack new trees as sparse directories cache.h: create 'index_name_pos_sparse()' oneway_diff: handle removed sparse directories checkout: fix nested sparse directory diff in sparse index
2022-08-10Merge branch 'tk/untracked-cache-with-uall' into maintJunio C Hamano1-0/+2
Fix for a bug that makes write-tree to fail to write out a non-existent index as a tree, introduced in 2.37. source: <20220722212232.833188-1-martin.agren@gmail.com> * tk/untracked-cache-with-uall: read-cache: make `do_read_index()` always set up `istate->repo`
2022-08-08cache.h: create 'index_name_pos_sparse()'Victoria Dye1-0/+5
Add 'index_name_pos_sparse()', which behaves the same as 'index_name_pos()', except that it does not expand a sparse index to search for an entry inside a sparse directory. 'index_entry_exists()' was originally implemented in 20ec2d034c (reset: make sparse-aware (except --mixed), 2021-11-29) as an alternative to 'index_name_pos()' to allow callers to search for an index entry without expanding a sparse index. However, that particular use case only required knowing whether the requested entry existed, so 'index_entry_exists()' does not return the index positioning information provided by 'index_name_pos()'. This patch implements 'index_name_pos_sparse()' to accommodate callers that need the positioning information of 'index_name_pos()', but do not want to expand the index. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-22read-cache: make `do_read_index()` always set up `istate->repo`Martin Ågren1-1/+4
If there is no index file, e.g., because the repository has just been created, we return zero early (unless `must_exist` makes us die instead.) This early return means we do not set up `istate->repo`. With `core.untrackedCache=true`, the recent e6a653554b ("untracked-cache: support '--untracked-files=all' if configured", 2022-03-31) will eventually pass down `istate->repo` as a null pointer to `repo_config_get_string()`, causing a segmentation fault. If we do hit this early return, set up `istate->repo` similar to when we actually read the index. Reported-by: Joey Hess <id@joeyh.name> Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-16read_index_from(): avoid memory leakJohannes Schindelin1-3/+3
In 998330ac2e7c (read-cache: look for shared index files next to the index, too, 2021-08-26), we added code that allocates memory to store the base path of a shared index, but we never released that memory. Reported by Coverity. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-13Merge branch 'zh/read-cache-copy-name-entry-fix'Junio C Hamano1-2/+0
Remove redundant copying (with index v3 and older) or possible over-reading beyond end of mmapped memory (with index v4) has been corrected. * zh/read-cache-copy-name-entry-fix: read-cache.c: reduce unnecessary cache entry name copying
2022-06-06read-cache.c: reduce unnecessary cache entry name copyingZheNing Hu1-2/+0
575fa8a3 (read-cache: read data in a hash-independent way, 2019-02-19) added a new code to copy from the on-disk data into the name member of the in-core cache entry, which is already done immediately after that in a way that takes prefix-compression into account. Remove this code, as it is not just unnecessary, but also can be reading beyond the on-disk data, when we are copying very long prefix string from the previous entry. Signed-off-by: ZheNing Hu <adlternative@gmail.com> [jc: rewrote the log message with Réne's findings] Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-03Merge branch 'ds/sparse-sparse-checkout'Junio C Hamano1-3/+3
"sparse-checkout" learns to work well with the sparse-index feature. * ds/sparse-sparse-checkout: sparse-checkout: integrate with sparse index p2000: add test for 'git sparse-checkout [add|set]' sparse-index: complete partial expansion sparse-index: partially expand directories sparse-checkout: --no-sparse-index needs a full index cache-tree: implement cache_tree_find_path() sparse-index: introduce partially-sparse indexes sparse-index: create expand_index() t1092: stress test 'git sparse-checkout set' t1092: refactor 'sparse-index contents' test
2022-05-23sparse-index: introduce partially-sparse indexesDerrick Stolee1-3/+3
A future change will present a temporary, in-memory mode where the index can both contain sparse directory entries but also not be completely collapsed to the smallest possible sparse directories. This will be necessary for modifying the sparse-checkout definition while using a sparse index. For now, convert the single-bit member 'sparse_index' in 'struct index_state' to be a an 'enum sparse_index_mode' with three modes: * INDEX_EXPANDED (0): No sparse directories exist. This is always the case for repositories that do not use cone-mode sparse-checkout. * INDEX_COLLAPSED: Sparse directories may exist. Files outside the sparse-checkout cone are reduced to sparse directory entries whenever possible. * INDEX_PARTIALLY_SPARSE: Sparse directories may exist. Some file entries outside the sparse-checkout cone may exist. Running convert_to_sparse() may further reduce those files to sparse directory entries. The main reason to store this extra information is to allow convert_to_sparse() to short-circuit when the index is already in INDEX_EXPANDED mode but to actually do the necessary work when in INDEX_PARTIALLY_SPARSE mode. The INDEX_PARTIALLY_SPARSE mode will be used in an upcoming change. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-10read-cache: set sparsity when index is newVictoria Dye1-1/+17
When the index read in 'do_read_index()' does not exist on-disk, mark the index "sparse" if the executing command does not require a full index and sparse index is otherwise enabled. Some commands (such as 'git stash -u') implicitly create a new index (when the 'GIT_INDEX_FILE' variable points to a non-existent file) and perform some operation on it. However, when this index is created, it isn't created with the same sparsity settings as the repo index. As a result, while these indexes may be sparse during the operation, they are always expanded before being written to disk. We can avoid that expansion by defaulting the index to "sparse", in which case it will only be expanded if the full index is needed. Note that the function 'set_new_index_sparsity()' is created despite having only a single caller because additional callers will be added in a subsequent patch. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-04Merge branch 'vd/mv-refresh-stat'Junio C Hamano1-2/+15
"git mv" failed to refresh the cached stat information for the entry it moved. * vd/mv-refresh-stat: mv: refresh stat info for moved entry
2022-03-29mv: refresh stat info for moved entryVictoria Dye1-2/+15
Update the stat info of the moved index entry in 'rename_index_entry_at()' if the entry is up-to-date with the index. Internally, 'git mv' uses 'rename_index_entry_at()' to move the source index entry to the destination. However, it directly copies the stat info of the original cache entry, which will not reflect the 'ctime' of the file renaming operation that happened as part of the move. If a file is otherwise up-to-date with the index, that difference in 'ctime' will make the entry appear out-of-date until the next index-refreshing operation (e.g., 'git status'). Some commands, such as 'git reset', use the cached stat information to determine whether a file is up-to-date; if this information is incorrect, the command will fail when it should pass. In order to ensure a moved entry is evaluated as 'up-to-date' when appropriate, refresh the destination index entry's stat info in 'git mv' if and only if the file is up-to-date. Note that the test added in 't7001-mv.sh' requires a "sleep 1" to ensure the 'ctime' of the file creation will be definitively older than the 'ctime' of the renamed file in 'git mv'. Reported-by: Maximilian Reichel <reichemn@icloud.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-25Merge branch 'ns/core-fsyncmethod'Junio C Hamano1-6/+13
Replace core.fsyncObjectFiles with two new configuration variables, core.fsync and core.fsyncMethod. * ns/core-fsyncmethod: core.fsync: documentation and user-friendly aggregate options core.fsync: new option to harden the index core.fsync: add configuration parsing core.fsync: introduce granular fsync control infrastructure core.fsyncmethod: add writeout-only mode wrapper: make inclusion of Windows csprng header tightly scoped
2022-03-16Merge branch 'ab/object-file-api-updates'Junio C Hamano1-1/+1
Object-file API shuffling. * ab/object-file-api-updates: object-file API: pass an enum to read_object_with_reference() object-file.c: add a literal version of write_object_file_prepare() object-file API: have hash_object_file() take "enum object_type" object API: rename hash_object_file_literally() to write_*() object-file API: split up and simplify check_object_signature() object API users + docs: check <0, not !0 with check_object_signature() object API docs: move check_object_signature() docs to cache.h object API: correct "buf" v.s. "map" mismatch in *.c and *.h object-file API: have write_object_file() take "enum object_type" object-file API: add a format_object_header() function object-file API: return "void", not "int" from hash_object_file() object-file.c: split up declaration of unrelated variables
2022-03-10core.fsync: new option to harden the indexNeeraj Singh1-6/+13
This commit introduces the new ability for the user to harden the index. In the event of a system crash, the index must be durable for the user to actually find a file that has been added to the repo and then deleted from the working tree. We use the presence of the COMMIT_LOCK flag and absence of the alternate_index_output as a proxy for determining whether we're updating the persistent index of the repo or some temporary index. We don't sync these temporary indexes. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-10core.fsync: introduce granular fsync control infrastructureNeeraj Singh1-1/+1
This commit introduces the infrastructure for the core.fsync configuration knob. The repository components we want to sync are identified by flags so that we can turn on or off syncing for specific components. If core.fsyncObjectFiles is set and the core.fsync configuration also includes FSYNC_COMPONENT_LOOSE_OBJECT, we will fsync any loose objects. This picks the strictest data integrity behavior if core.fsync and core.fsyncObjectFiles are set to conflicting values. This change introduces the currently unused fsync_component helper, which will be used by a later patch that adds fsyncing to the refs backend. Actual configuration and documentation of the fsync components list are in other patches in the series to separate review of the underlying mechanism from the policy of how it's configured. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25object-file API: have write_object_file() take "enum object_type"Ævar Arnfjörð Bjarmason1-1/+1
Change the write_object_file() function to take an "enum object_type" instead of a "const char *type". Its callers either passed {commit,tree,blob,tag}_type and can pass the corresponding OBJ_* type instead, or were hardcoding strings like "blob". This avoids the back & forth fragility where the callers of write_object_file() would have the enum type, and convert it themselves via type_name(). We do have to now do that conversion ourselves before calling write_object_file_prepare(), but those codepaths will be similarly adjusted in subsequent commits. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-17Merge branch 'vd/sparse-clean-etc'Junio C Hamano1-3/+7
"git update-index", "git checkout-index", and "git clean" are taught to work better with the sparse checkout feature. * vd/sparse-clean-etc: update-index: reduce scope of index expansion in do_reupdate update-index: integrate with sparse index update-index: add tests for sparse-checkout compatibility checkout-index: integrate with sparse index checkout-index: add --ignore-skip-worktree-bits option checkout-index: expand sparse checkout compatibility tests clean: integrate with sparse index reset: reorder wildcard pathspec conditions reset: fix validation in sparse index test
2022-02-09Merge branch 'js/sparse-vs-split-index'Junio C Hamano1-0/+3
Mark in various places in the code that the sparse index and the split index features are mutually incompatible. * js/sparse-vs-split-index: split-index: it really is incompatible with the sparse index t1091: disable split index sparse-index: sparse index is disallowed when split index is active
2022-02-09Merge branch 'ab/config-based-hooks-2'Junio C Hamano1-1/+2
More "config-based hooks". * ab/config-based-hooks-2: run-command: remove old run_hook_{le,ve}() hook API receive-pack: convert push-to-checkout hook to hook.h read-cache: convert post-index-change to use hook.h commit: convert {pre-commit,prepare-commit-msg} hook to hook.h git-p4: use 'git hook' to run hooks send-email: use 'git hook run' for 'sendemail-validate' git hook run: add an --ignore-missing flag hooks: convert worktree 'post-checkout' hook to hook library hooks: convert non-worktree 'post-checkout' hook to hook library merge: convert post-merge to use hook.h am: convert applypatch-msg to use hook.h rebase: convert pre-rebase to use hook.h hook API: add a run_hooks_l() wrapper am: convert {pre,post}-applypatch to use hook.h gc: use hook library for pre-auto-gc hook hook API: add a run_hooks() wrapper hook: add 'run' subcommand
2022-01-23split-index: it really is incompatible with the sparse indexJohannes Schindelin1-0/+3
... at least for now. So let's error out if we are even trying to initialize the split index when the index is sparse, or when trying to write the split index extension for a sparse index. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-13update-index: integrate with sparse indexVictoria Dye1-3/+7
Enable use of the sparse index with `update-index`. Most variations of `update-index` work without explicitly expanding the index or making any other updates in or outside of `update-index.c`. The one usage requiring additional changes is `--cacheinfo`; if a file inside a sparse directory was specified, the index would not be expanded until after the cache tree is invalidated, leading to a mismatch between the index and cache tree. This scenario is handled by rearranging `add_index_entry_with_check`, allowing `index_name_stage_pos` to expand the index *before* attempting to invalidate the relevant cache tree path, avoiding cache tree/index corruption. Signed-off-by: Victoria Dye <vdye@github.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-07read-cache: convert post-index-change to use hook.hEmily Shaffer1-1/+2
Move the post-index-change hook away from run-command.h to and over to the new hook.h library. This removes the last direct user of "run_hook_ve()" outside of run-command.c ("run_hook_le()" still uses it). So we can make the function static now. A subsequent commit will remove this code entirely when "run_hook_le()" itself goes away. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-07update-index: refresh should rewrite index in case of racy timestampsMarc Strapetz1-1/+1
'git update-index --refresh' and '--really-refresh' should force writing of the index file if racy timestamps have been encountered, as 'git status' already does [1]. Note that calling 'git update-index --refresh' still does not guarantee that there will be no more racy timestamps afterwards (the same holds true for 'git status'): - calling 'git update-index --refresh' immediately after touching and adding a file may still leave racy timestamps if all three operations occur within the racy-tolerance (usually 1 second unless USE_NSEC has been defined) - calling 'git update-index --refresh' for timestamps which are set into the future will leave them racy To guarantee that such racy timestamps will be resolved would require to wait until the system clock has passed beyond these timestamps and only then write the index file. Especially for future timestamps, this does not seem feasible because of possibly long delays/hangs. [1] https://lore.kernel.org/git/d3dd805c-7c1d-30a9-6574-a7bfcb7fc013@syntevo.com/ Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-10Merge branch 'vd/sparse-reset'Junio C Hamano1-7/+20
Various operating modes of "git reset" have been made to work better with the sparse index. * vd/sparse-reset: unpack-trees: improve performance of next_cache_entry reset: make --mixed sparse-aware reset: make sparse-aware (except --mixed) reset: integrate with sparse index reset: expand test coverage for sparse checkouts sparse-index: update command for expand/collapse test reset: preserve skip-worktree bit in mixed reset reset: rename is_missing to !is_in_reset_tree
2021-12-10Merge branch 'vd/sparse-sparsity-fix-on-read'Junio C Hamano1-0/+8
Ensure that the sparseness of the in-core index matches the index.sparse configuration specified by the repository immediately after the on-disk index file is read. * vd/sparse-sparsity-fix-on-read: sparse-index: update do_read_index to ensure correct sparsity sparse-index: add ensure_correct_sparsity function sparse-index: avoid unnecessary cache tree clearing test-read-cache.c: prepare_repo_settings after config init
2021-11-29reset: make sparse-aware (except --mixed)Victoria Dye1-7/+20
Remove `ensure_full_index` guard on `prime_cache_tree` and update `prime_cache_tree_rec` to correctly reconstruct sparse directory entries in the cache tree. While processing a tree's entries, `prime_cache_tree_rec` must determine whether a directory entry is sparse or not by searching for it in the index (*without* expanding the index). If a matching sparse directory index entry is found, no subtrees are added to the cache tree entry and the entry count is set to 1 (representing the sparse directory itself). Otherwise, the tree is assumed to not be sparse and its subtrees are recursively added to the cache tree. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-24sparse-index: update do_read_index to ensure correct sparsityVictoria Dye1-0/+8
Unless `command_requires_full_index` forces index expansion, ensure in-core index sparsity matches config settings on read by calling `ensure_correct_sparsity`. This makes the behavior of the in-core index more consistent between different methods of updating sparsity: manually changing the `index.sparse` config setting vs. executing `git sparse-checkout --[no-]sparse-index init` Although index sparsity is normally updated with `git sparse-checkout init`, ensuring correct sparsity after a manual `index.sparse` change has some practical benefits: 1. It allows for command-by-command sparsity toggling with `-c index.sparse=<true|false>`, e.g. when troubleshooting issues with the sparse index. 2. It prevents users from experiencing abnormal slowness after setting `index.sparse` to `true` due to use of a full index in all commands until the on-disk index is updated. Helped-by: Junio C Hamano <gitster@pobox.com> Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Victoria Dye <vdye@github.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-25Merge branch 'rs/add-dry-run-without-objects'Junio C Hamano1-1/+1
Stop "git add --dry-run" from creating new blob and tree objects. * rs/add-dry-run-without-objects: add: don't write objects with --dry-run
2021-10-18Merge branch 'rs/make-verify-path-really-verify-again'Junio C Hamano1-15/+30
Recent sparse-index work broke safety against attempts to add paths with trailing slashes to the index, which has been corrected. * rs/make-verify-path-really-verify-again: read-cache: let verify_path() reject trailing dir separators again read-cache: add verify_path_internal() t3905: show failure to ignore sub-repo
2021-10-12add: don't write objects with --dry-runRené Scharfe1-1/+1
When the option --dry-run/-n is given, "git add" doesn't change the index, but still writes out new object files. Only hash the latter without writing instead to make the run as dry as possible. Use this opportunity to also make the hash_flags variable unsigned, to match the index_path() parameter it is used as. Reported-by: git.mexon@spamgourmet.com Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-11Merge branch 'sg/test-split-index-fix'Junio C Hamano1-10/+27
Test updates. * sg/test-split-index-fix: read-cache: fix GIT_TEST_SPLIT_INDEX tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests read-cache: look for shared index files next to the index, too t1600-index: disable GIT_TEST_SPLIT_INDEX t1600-index: don't run git commands upstream of a pipe t1600-index: remove unnecessary redirection
2021-10-07read-cache: let verify_path() reject trailing dir separators againRené Scharfe1-3/+3
6e773527b6 (sparse-index: convert from full to sparse, 2021-03-30) made verify_path() accept trailing directory separators for directories, which is necessary for sparse directory entries. This clemency causes "git stash" to stumble over sub-repositories, though, and there may be more unintended side-effects. Avoid them by restoring the old verify_path() behavior and accepting trailing directory separators only in places that are supposed to handle sparse directory entries. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-07read-cache: add verify_path_internal()René Scharfe1-13/+28
Turn verify_path() into an internal function that distinguishes between valid paths and those with trailing directory separators and rename it to verify_path_internal(). Provide a wrapper with the old behavior under the old name. No functional change intended. The new function will be used in the next patch. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-06Merge branch 'ab/repo-settings-cleanup'Junio C Hamano1-5/+14
Code cleanup. * ab/repo-settings-cleanup: repository.h: don't use a mix of int and bitfields repo-settings.c: simplify the setup read-cache & fetch-negotiator: check "enum" values in switch() environment.c: remove test-specific "ignore_untracked..." variable wrapper.c: add x{un,}setenv(), and use xsetenv() in environment.c
2021-09-22repo-settings.c: simplify the setupÆvar Arnfjörð Bjarmason1-2/+6
Simplify the setup code in repo-settings.c in various ways, making the code shorter, easier to read, and requiring fewer hacks to do the same thing as it did before: Since 7211b9e7534 (repo-settings: consolidate some config settings, 2019-08-13) we have memset() the whole "settings" structure to -1 in prepare_repo_settings(), and subsequently relied on the -1 value. Most of the fields did not need to be initialized to -1, and because we were doing that we had the enum labels "UNTRACKED_CACHE_UNSET" and "FETCH_NEGOTIATION_UNSET" purely to reflect the resulting state created this memset() in prepare_repo_settings(). No other code used or relied on them, more on that below. For the rest most of the subsequent "are we -1, then read xyz" can simply be removed by re-arranging what we read first. E.g. when setting the "index.version" setting we should have first read "feature.experimental", so that it (and "feature.manyfiles") can provide a default for our "index.version". Instead the code setting it, added when "feature.manyFiles"[1] was created, was using the UPDATE_DEFAULT_BOOL() macro added in an earlier commit[2]. That macro is now gone, since it was only needed for this pattern of reading things in the wrong order. This also fixes an (admittedly obscure) logic error where we'd conflate an explicit "-1" value in the config with our own earlier memset() -1. We can also remove the UPDATE_DEFAULT_BOOL() wrapper added in [3]. Using it is redundant to simply using the return value from repo_config_get_bool(), which is non-zero if the provided key exists in the config. Details on edge cases relating to the memset() to -1, continued from "more on that below" above: * UNTRACKED_CACHE_KEEP: In [4] the "unset" and "keep" handling for core.untrackedCache was consolidated. But it while we understand the "keep" value, we don't handle it differently than the case of any other unknown value. So let's retain UNTRACKED_CACHE_KEEP and remove the UNTRACKED_CACHE_UNSET setting (which was always implicitly UNTRACKED_CACHE_KEEP before). We don't need to inform any code after prepare_repo_settings() that the setting was "unset", as far as anyone else is concerned it's core.untrackedCache=keep. if "core.untrackedcache" isn't present in the config. * FETCH_NEGOTIATION_UNSET & FETCH_NEGOTIATION_NONE: Since these two two enum fields added in [5] don't rely on the memzero() setting them to "-1" anymore we don't have to provide them with explicit values. 1. c6cc4c5afd2 (repo-settings: create feature.manyFiles setting, 2019-08-13) 2. 31b1de6a09b (commit-graph: turn on commit-graph by default, 2019-08-13) 3. 31b1de6a09b (commit-graph: turn on commit-graph by default, 2019-08-13) 4. ad0fb659993 (repo-settings: parse core.untrackedCache, 2019-08-13) 5. aaf633c2ad1 (repo-settings: create feature.experimental setting, 2019-08-13) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-22read-cache & fetch-negotiator: check "enum" values in switch()Ævar Arnfjörð Bjarmason1-5/+10
Change tweak_untracked_cache() in "read-cache.c" to use a switch() to have the compiler assert that we checked all possible values in the "enum untracked_cache_setting" type, and likewise remove the "default" case in fetch_negotiator_init() in favor of checking for "FETCH_NEGOTIATION_UNSET" and "FETCH_NEGOTIATION_NONE". As will be discussed in a subsequent we'll only ever have either of these set to FETCH_NEGOTIATION_NONE, FETCH_NEGOTIATION_UNSET and UNTRACKED_CACHE_UNSET within the prepare_repo_settings() function itself. In preparation for fixing that code let's add a BUG() here to mark this as unreachable code. See ad0fb659993 (repo-settings: parse core.untrackedCache, 2019-08-13) for when the "unset" and "keep" handling for core.untrackedCache was consolidated, and aaf633c2ad1 (repo-settings: create feature.experimental setting, 2019-08-13) for the addition of the "default" pattern in "fetch-negotiator.c". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07read-cache: fix GIT_TEST_SPLIT_INDEXSZEDER Gábor1-9/+14
Running tests with GIT_TEST_SPLIT_INDEX=1 is supposed to turn on the split index feature and trigger index splitting (mostly) randomly. Alas, this has been broken since 6e37c8ed3c (read-cache.c: fix writing "link" index ext with null base oid, 2019-02-13), and GIT_TEST_SPLIT_INDEX=1 hasn't triggered any index splitting since then. This patch makes GIT_TEST_SPLIT_INDEX work again, though it doesn't restore the pre-6e37c8ed3c behavior. To understand the bug, the fix, and the behavior change we first have to look at how GIT_TEST_SPLIT_INDEX used to work before 6e37c8ed3c: There are two places where we check the value of GIT_TEST_SPLIT_INDEX, and before 6e37c8ed3c they worked like this: 1) In the lower-level do_write_index(), where, if GIT_TEST_SPLIT_INDEX is enabled, we call init_split_index(). This call merely allocates and zero-initializes 'istate->split_index', but does nothing else (i.e. doesn't fill the base/shared index with cache entries, doesn't actually write a shared index file, etc.). Pertinent to this issue, the hash of the base index remains all zeroed out. 2) In the higher-level write_locked_index(), but only when 'istate->split_index' has already been initialized. Then, if GIT_TEST_SPLIT_INDEX is enabled, it randomly sets the flag that triggers index splitting later in this function. This randomness comes from the first byte of the hash of the base index via an 'if ((first_byte & 15) < 6)' condition. However, if 'istate->split_index' hasn't been initialized (i.e. it is still NULL), then write_locked_index() just calls do_write_locked_index(), which internally calls the above mentioned do_write_index(). This means that while GIT_TEST_SPLIT_INDEX=1 usually triggered index splitting randomly, the first two index writes were always deterministic (though I suspect this was unintentional): - The initial index write never splits the index. During the first index write write_locked_index() is called with 'istate->split_index' still uninitialized, so the check in 2) is not executed. It still calls do_write_index(), though, which then executes the check in 1). The resulting all zero base index hash then leads to the 'link' extension being written to '.git/index', though a shared index file is not written: $ rm .git/index $ GIT_TEST_SPLIT_INDEX=1 git update-index --add file $ test-tool dump-split-index .git/index own c6ef71168597caec8553c83d9d0048f1ef416170 base 0000000000000000000000000000000000000000 100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 0 file replacements: deletions: $ ls -l .git/sharedindex.* ls: cannot access '.git/sharedindex.*': No such file or directory - The second index write always splits the index. When the index written in the previous point is read, 'istate->split_index' is initialized because of the presence of the 'link' extension. So during the second write write_locked_index() does run the check in 2), and the first byte of the all zero base index hash always fulfills the randomness condition, which in turn always triggers the index splitting. - Subsequent index writes will find the 'link' extension with a real non-zero base index hash, so from then on the check in 2) is executed and the first byte of the base index hash is as random as it gets (coming from the SHA-1 of index data including timestamps and inodes...). All this worked until 6e37c8ed3c came along, and stopped writing the 'link' extension if the hash of the base index was all zero: $ rm .git/index $ GIT_TEST_SPLIT_INDEX=1 git update-index --add file $ test-tool dump-split-index .git/index own abbd6f6458d5dee73ae8e210ca15a68a390c6fd7 not a split index $ ls -l .git/sharedindex.* ls: cannot access '.git/sharedindex.*': No such file or directory So, since the first index write with GIT_TEST_SPLIT_INDEX=1 doesn't write a 'link' extension, in the second index write 'istate->split_index' remains uninitialized, and the check in 2) is not executed, and ultimately the index is never split. Fix this by modifying write_locked_index() to make sure to check GIT_TEST_SPLIT_INDEX even if 'istate->split_index' is still uninitialized, and initialize it if necessary. The check for GIT_TEST_SPLIT_INDEX and separate init_split_index() call in do_write_index() thus becomes unnecessary, so remove it. Furthermore, add a test to 't1700-split-index.sh' to make sure that GIT_TEST_SPLIT_INDEX=1 will keep working (though only check the index splitting on the first index write, because after that it will be random). Note that this change does not restore the pre-6e37c8ed3c behaviour, as it will deterministically split the index already on the first index write. Since GIT_TEST_SPLIT_INDEX is purely a developer aid, there is no backwards compatibility issue here. The new behaviour did trigger test failures in 't0003-attributes.sh' and 't1600-index.sh', though, which have been fixed in preparatory patches in this series. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07read-cache: look for shared index files next to the index, tooSZEDER Gábor1-1/+13
When reading a split index git always looks for its referenced shared base index in the gitdir of the current repository, even when reading an alternate index specified via GIT_INDEX_FILE, and even when that alternate index file is the "main" '.git/index' file of an other repository. However, if that split index and its referenced shared index files were written by a git command running entirely in that other repository, then, naturally, the shared index file is written to that other repository's gitdir. Consequently, a git command attempting to read that shared index file while running in a different repository won't be able find it and will error out. I'm not sure in what use case it is necessary to read the index of one repository by a git command running in a different repository, but it is certainly possible to do so, and in fact the test 'bare repository: check that --cached honors index' in 't0003-attributes.sh' does exactly that. If GIT_TEST_SPLIT_INDEX=1 were to split the index in just the right moment [1], then this test would indeed fail, because the referenced shared index file could not be found. Let's look for the referenced shared index file not only in the gitdir of the current directory, but, if the shared index is not there, right next to the split index as well. [1] We haven't seen this issue trigger a failure in t0003 yet, because: - While GIT_TEST_SPLIT_INDEX=1 is supposed to trigger index splitting randomly, the first index write has always been deterministic and it has never split the index. - That alternate index file in the other repository is written only once in the entire test script, so it's never split. However, the next patch will fix GIT_TEST_SPLIT_INDEX, and while doing so it will slightly change its behavior to always split the index already on the first index write, and t0003 would always fail without this patch. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-07sparse-index: add SPARSE_INDEX_MEMORY_ONLY flagDerrick Stolee1-2/+2
The convert_to_sparse() method checks for the GIT_TEST_SPARSE_INDEX environment variable or the "index.sparse" config setting before converting the index to a sparse one. This is for ease of use since all current consumers are preparing to compress the index before writing it to disk. If these settings are not enabled, then convert_to_sparse() silently returns without doing anything. We will add a consumer in the next change that wants to use the sparse index as an in-memory data structure, regardless of whether the on-disk format should be sparse. To that end, create the SPARSE_INDEX_MEMORY_ONLY flag that will skip these config checks when enabled. All current consumers are modified to pass '0' in the new 'flags' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-04Merge branch 'ah/plugleaks'Junio C Hamano1-0/+1
Leak plugging. * ah/plugleaks: reset: clear_unpack_trees_porcelain to plug leak builtin/rebase: fix options.strategy memory lifecycle builtin/merge: free found_ref when done builtin/mv: free or UNLEAK multiple pointers at end of cmd_mv convert: release strbuf to avoid leak read-cache: call diff_setup_done to avoid leak ref-filter: also free head for ATOM_HEAD to avoid leak diffcore-rename: move old_dir/new_dir definition to plug leak builtin/for-each-repo: remove unnecessary argv copy to plug leak builtin/submodule--helper: release unused strbuf to avoid leak environment: move strbuf into block to plug leak fmt-merge-msg: free newly allocated temporary strings when done
2021-08-02Merge branch 'jt/bulk-prefetch'Junio C Hamano1-0/+23
"git read-tree" had a codepath where blobs are fetched one-by-one from the promisor remote, which has been corrected to fetch in bulk. * jt/bulk-prefetch: cache-tree: prefetch in partial clone read-tree unpack-trees: refactor prefetching code
2021-07-28Merge branch 'ds/status-with-sparse-index'Junio C Hamano1-2/+8
"git status" codepath learned to work with sparsely populated index without hydrating it fully. * ds/status-with-sparse-index: t1092: document bad sparse-checkout behavior fsmonitor: integrate with sparse index wt-status: expand added sparse directory entries status: use sparse-index throughout status: skip sparse-checkout percentage with sparse-index diff-lib: handle index diffs with sparse dirs dir.c: accept a directory as part of cone-mode patterns unpack-trees: unpack sparse directory entries unpack-trees: rename unpack_nondirectories() unpack-trees: compare sparse directories correctly unpack-trees: preserve cache_bottom t1092: add tests for status/add and sparse files t1092: expand repository data shape t1092: replace incorrect 'echo' with 'cat' sparse-index: include EXTENDED flag when expanding sparse-index: skip indexes with unmerged entries
2021-07-26read-cache: call diff_setup_done to avoid leakAndrzej Hunt1-0/+1
repo_diff_setup() calls through to diff.c's static prep_parse_options(), which in turn allocates a new array into diff_opts.parseopts. diff_setup_done() is responsible for freeing that array, and has the benefit of verifying diff_opts too - hence we add a call to diff_setup_done() to avoid leaking parseopts. Output from the leak as found while running t0090 with LSAN: Direct leak of 7120 byte(s) in 1 object(s) allocated from: #0 0x49a82d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3 #1 0xa8bf89 in do_xmalloc wrapper.c:41:8 #2 0x7a7bae in prep_parse_options diff.c:5636:2 #3 0x7a7bae in repo_diff_setup diff.c:4611:2 #4 0x93716c in repo_index_has_changes read-cache.c:2518:3 #5 0x872233 in unclean merge-ort-wrappers.c:12:14 #6 0x872233 in merge_ort_recursive merge-ort-wrappers.c:53:6 #7 0x5d5b11 in try_merge_strategy builtin/merge.c:752:12 #8 0x5d0b6b in cmd_merge builtin/merge.c:1666:9 #9 0x4ce83e in run_builtin git.c:475:11 #10 0x4ccafe in handle_builtin git.c:729:3 #11 0x4cb01c in run_argv git.c:818:4 #12 0x4cb01c in cmd_main git.c:949:19 #13 0x6bdc2d in main common-main.c:52:11 #14 0x7f551eb51349 in __libc_start_main (/lib64/libc.so.6+0x24349) SUMMARY: AddressSanitizer: 7120 byte(s) leaked in 1 allocation(s) Signed-off-by: Andrzej Hunt <andrzej@ahunt.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-23unpack-trees: refactor prefetching codeJonathan Tan1-0/+23
Refactor the prefetching code in unpack-trees.c into its own function, because it will be used elsewhere in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-16Merge branch 'ew/mmap-failures'Junio C Hamano1-1/+2
Error message update. * ew/mmap-failures: xmmap: inform Linux users of tuning knobs on ENOMEM
2021-07-14status: use sparse-index throughoutDerrick Stolee1-2/+8
By testing 'git -c core.fsmonitor= status -uno', we can check for the simplest index operations that can be made sparse-aware. The necessary implementation details are already integrated with sparse-checkout, so modify command_requires_full_index to be zero for cmd_status(). In refresh_index(), we loop through the index entries to refresh their stat() information. However, sparse directories have no stat() information to populate. Ignore these entries. This allows 'git status' to no longer expand a sparse index to a full one. This is further tested by dropping the "-uno" option and adding an untracked file into the worktree. The performance test p2000-sparse-checkout-operations.sh demonstrates these improvements: Test HEAD~1 HEAD ----------------------------------------------------------------------------- 2000.2: git status (full-index-v3) 0.31(0.30+0.05) 0.31(0.29+0.06) +0.0% 2000.3: git status (full-index-v4) 0.31(0.29+0.07) 0.34(0.30+0.08) +9.7% 2000.4: git status (sparse-index-v3) 2.35(2.28+0.10) 0.04(0.04+0.05) -98.3% 2000.5: git status (sparse-index-v4) 2.35(2.24+0.15) 0.05(0.04+0.06) -97.9% Note that since HEAD~1 was expanding the sparse index by parsing trees, it was artificially slower than the full index case. Thus, the 98% improvement is misleading, and instead we should celebrate the 0.34s to 0.05s improvement of 85%. This is more indicative of the peformance gains we are expecting by using a sparse index. Note: we are dropping the assignment of core.fsmonitor here. This is not necessary for the test script as we are not altering the config any other way. Correct integration with FS Monitor will be validated in later changes. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-08Merge branch 'ab/progress-cleanup'Junio C Hamano1-6/+3
Code clean-up. * ab/progress-cleanup: read-cache.c: don't guard calls to progress.c API
2021-06-29xmmap: inform Linux users of tuning knobs on ENOMEMEric Wong1-1/+2
Linux users may benefit from additional information on how to avoid ENOMEM from mmap despite the system having enough RAM to accomodate them. We can't reliably unmap pack windows to work around the issue since malloc and other library routines may mmap without our knowledge. Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-08read-cache.c: don't guard calls to progress.c APIÆvar Arnfjörð Bjarmason1-6/+3
Don't guard the calls to the progress.c API with "if (progress)". The API itself will check this. This doesn't change any behavior, but makes this code consistent with the rest of the codebase. See ae9af12287b (status: show progress bar if refreshing the index takes too long, 2018-09-15) for the commit that added the pattern we're changing here. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-19read-cache: delete unused hashing methodsDerrick Stolee1-64/+0
These methods were marked as MAYBE_UNUSED in the previous change to avoid a complicated diff. Delete them entirely, since we now use the hashfile API instead of this custom hashing code. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-19read-cache: use hashfile instead of git_hash_ctxDerrick Stolee1-71/+66
The do_write_index() method in read-cache.c has its own hashing logic and buffering mechanism. Specifically, the ce_write() method was introduced by 4990aadc (Speed up index file writing by chunking it nicely, 2005-04-20) and similar mechanisms were introduced a few months later in c38138cd (git-pack-objects: write the pack files with a SHA1 csum, 2005-06-26). Based on the timing, in the early days of the Git codebase, I figured that these roughly equivalent code paths were never unified only because it got lost in the shuffle. The hashfile API has since been used extensively in other file formats, such as pack-indexes, multi-pack-indexes, and commit-graphs. Therefore, it seems prudent to unify the index writing code to use the same mechanism. I discovered this disparity while trying to create a new index format that uses the chunk-format API. That API uses a hashfile as its base, so it is incompatible with the custom code in read-cache.c. This rewrite is rather straightforward. It replaces all writes to the temporary file with writes to the hashfile struct. This takes care of many of the direct interactions with the_hash_algo. There are still some git_hash_ctx uses remaining: the extension headers are hashed for use in the End of Index Entries (EOIE) extension. This use of the git_hash_ctx is left as-is. There are multiple reasons to not use a hashfile here, including the fact that the data is not actually writing to a file, just a hash computation. These hashes do not block our adoption of the chunk-format API in a future change to the index, so leave it as-is. The internals of the algorithms are mostly identical. Previously, the hashfile API used a smaller 8KB buffer instead of the 128KB buffer from read-cache.c. The previous change already unified these sizes. There is one subtle point: we do not pass the CSUM_FSYNC to the finalize_hashfile() method, which differs from most consumers of the hashfile API. The extra fsync() call indicated by this flag causes a significant peformance degradation that is noticeable for quick commands that write the index, such as "git add". Other consumers can absorb this cost with their more complicated data structure organization, and further writing structures such as pack-files and commit-graphs is rarely in the critical path for common user interactions. Some static methods become orphaned in this diff, so I marked them as MAYBE_UNUSED. The diff is much harder to read if they are deleted during this change. Instead, they will be deleted in the following change. In addition to the test suite passing, I computed indexes using the previous binaries and the binaries compiled after this change, and found the index data to be exactly equal. Finally, I did extensive performance testing of "git update-index --force-write" on repos of various sizes, including one with over 2 million paths at HEAD. These tests demonstrated less than 1% difference in behavior. As expected, the performance should be considered unchanged. The previous changes to increase the hashfile buffer size from 8K to 128K ensured this change would not create a peformance regression. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-16Merge branch 'mt/parallel-checkout-part-3'Junio C Hamano1-4/+10
The final part of "parallel checkout". * mt/parallel-checkout-part-3: ci: run test round with parallel-checkout enabled parallel-checkout: add tests related to .gitattributes t0028: extract encoding helpers to lib-encoding.sh parallel-checkout: add tests related to path collisions parallel-checkout: add tests for basic operations checkout-index: add parallel checkout support builtin/checkout.c: complete parallel checkout support make_transient_cache_entry(): optionally alloc from mem_pool
2021-05-10Merge branch 'bc/hash-transition-interop-part-1'Junio C Hamano1-2/+2
SHA-256 transition. * bc/hash-transition-interop-part-1: hex: print objects using the hash algorithm member hex: default to the_hash_algo on zero algorithm value builtin/pack-objects: avoid using struct object_id for pack hash commit-graph: don't store file hashes as struct object_id builtin/show-index: set the algorithm for object IDs hash: provide per-algorithm null OIDs hash: set, copy, and use algo field in struct object_id builtin/pack-redundant: avoid casting buffers to struct object_id Use the final_oid_fn to finalize hashing of object IDs hash: add a function to finalize object IDs http-push: set algorithm when reading object ID Always use oidread to read into struct object_id hash: add an algo member to struct object_id
2021-05-07Merge branch 'mt/add-rm-in-sparse-checkout'Junio C Hamano1-0/+3
"git add" and "git rm" learned not to touch those paths that are outside of sparse checkout. * mt/add-rm-in-sparse-checkout: rm: honor sparse checkout patterns add: warn when asked to update SKIP_WORKTREE entries refresh_index(): add flag to ignore SKIP_WORKTREE entries pathspec: allow to ignore SKIP_WORKTREE entries on index matching add: make --chmod and --renormalize honor sparse checkouts t3705: add tests for `git add` in sparse checkouts add: include magic part of pathspec on --refresh error
2021-05-07Merge branch 'ad/cygwin-no-backslashes-in-paths'Junio C Hamano1-1/+1
Cygwin pathname handling fix. * ad/cygwin-no-backslashes-in-paths: cygwin: disallow backslashes in file names
2021-05-05make_transient_cache_entry(): optionally alloc from mem_poolMatheus Tavares1-4/+10
Allow make_transient_cache_entry() to optionally receive a mem_pool struct in which it should allocate the entry. This will be used in the following patch, to store some transient entries which should persist until parallel checkout finishes. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-30Merge branch 'ds/sparse-index-protections'Junio C Hamano1-7/+72
Builds on top of the sparse-index infrastructure to mark operations that are not ready to mark with the sparse index, causing them to fall back on fully-populated index that they always have worked with. * ds/sparse-index-protections: (47 commits) name-hash: use expand_to_path() sparse-index: expand_to_path() name-hash: don't add directories to name_hash revision: ensure full index resolve-undo: ensure full index read-cache: ensure full index pathspec: ensure full index merge-recursive: ensure full index entry: ensure full index dir: ensure full index update-index: ensure full index stash: ensure full index rm: ensure full index merge-index: ensure full index ls-files: ensure full index grep: ensure full index fsck: ensure full index difftool: ensure full index commit: ensure full index checkout: ensure full index ...
2021-04-30cygwin: disallow backslashes in file namesAdam Dinwoodie1-1/+1
The backslash character is not a valid part of a file name on Windows. If, in Windows, Git attempts to write a file that has a backslash character in the filename, it will be incorrectly interpreted as a directory separator. This caused CVE-2019-1354 in MinGW, as this behaviour can be manipulated to cause the checkout to write to files it ought not write to, such as adding code to the .git/hooks directory. This was fixed by e1d911dd4c (mingw: disallow backslash characters in tree objects' file names, 2019-09-12). However, the vulnerability also exists in Cygwin: while Cygwin mostly provides a POSIX-like path system, it will still interpret a backslash as a directory separator. To avoid this vulnerability, CVE-2021-29468, extend the previous fix to also apply to Cygwin. Similarly, extend the test case added by the previous version of the commit. The test suite doesn't have an easy way to say "run this test if in MinGW or Cygwin", so add a new test prerequisite that covers both. As well as checking behaviour in the presence of paths containing backslashes, the existing test also checks behaviour in the presence of paths that differ only by the presence of a trailing ".". MinGW follows normal Windows application behaviour and treats them as the same path, but Cygwin more closely emulates *nix systems (at the expense of compatibility with native Windows applications) and will create and distinguish between such paths. Gate the relevant bit of that test accordingly. Reported-by: RyotaK <security@ryotak.me> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-27Always use oidread to read into struct object_idbrian m. carlson1-2/+2
In the future, we'll want oidread to automatically set the hash algorithm member for an object ID we read into it, so ensure we use oidread instead of hashcpy everywhere we're copying a hash value into a struct object_id. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-14read-cache: ensure full indexDerrick Stolee1-0/+4
Before iterating over all cache entries, ensure that a sparse index is expanded to a full index to avoid unexpected behavior. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-14read-cache: expand on query into sparse-directory entryDerrick Stolee1-0/+21
Callers to index_name_pos() or index_name_stage_pos() have a specific path in mind. If that happens to be a path with an ancestor being a sparse-directory entry, it can lead to unexpected results. In the case that we did not find the requested path, check to see if the position _before_ the inserted position is a sparse directory entry that matches the initial segment of the input path (including the directory separator at the end of the directory name). If so, then expand the index to be a full index and search again. This expansion will only happen once per index read. Future enhancements could be more careful to expand only the necessary sparse directory entry, but then we would have a special "not fully sparse, but also not fully expanded" mode that could affect writing the index to file. Since this only occurs if a specific file is requested outside of the sparse checkout definition, this is unlikely to be a common situation. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-14*: remove 'const' qualifier for struct index_stateDerrick Stolee1-5/+5
Several methods specify that they take a 'struct index_state' pointer with the 'const' qualifier because they intend to only query the data, not change it. However, we will be introducing a step very low in the method stack that might modify a sparse-index to become a full index in the case that our queries venture inside a sparse-directory entry. This change only removes the 'const' qualifiers that are necessary for the following change which will actually modify the implementation of index_name_stage_pos(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-08refresh_index(): add flag to ignore SKIP_WORKTREE entriesMatheus Tavares1-0/+3
refresh_index() doesn't update SKIP_WORKTREE entries, but it still matches them against the given pathspecs, marks the matches on the seen[] array, check if unmerged, etc. In the following patch, one caller will need refresh_index() to ignore SKIP_WORKTREE entries entirely, so add a flag that implements this behavior. While we are here, also realign the REFRESH_* flags and convert the hex values to the more natural bit shift format, which makes it easier to spot holes. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30sparse-index: convert from full to sparseDerrick Stolee1-2/+24
If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30sparse-index: add 'sdir' index extensionDerrick Stolee1-0/+9
The index format does not currently allow for sparse directory entries. This violates some expectations that older versions of Git or third-party tools might not understand. We need an indicator inside the index file to warn these tools to not interact with a sparse index unless they are aware of sparse directory entries. Add a new _required_ index extension, 'sdir', that indicates that the index may contain sparse directory entries. This allows us to continue to use the differences in index formats 2, 3, and 4 before we create a new index version 5 in a later change. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30sparse-index: implement ensure_full_index()Derrick Stolee1-0/+9
We will mark an in-memory index_state as having sparse directory entries with the sparse_index bit. These currently cannot exist, but we will add a mechanism for collapsing a full index to a sparse one in a later change. That will happen at write time, so we must first allow parsing the format before writing it. Commands or methods that require a full index in order to operate can call ensure_full_index() to expand that index in-memory. This requires parsing trees using that index's repository. Sparse directory entries have a specific 'ce_mode' value. The macro S_ISSPARSEDIR(ce->ce_mode) can check if a cache_entry 'ce' has this type. This ce_mode is not possible with the existing index formats, so we don't also verify all properties of a sparse-directory entry, which are: 1. ce->ce_mode == 0040000 2. ce->flags & CE_SKIP_WORKTREE is true 3. ce->name[ce->namelen - 1] == '/' (ends in dir separator) 4. ce->oid references a tree object. These are all semi-enforced in ensure_full_index() to some extent. Any deviation will cause a warning at minimum or a failure in the worst case. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-19Merge branch 'rs/calloc-array'Junio C Hamano1-4/+4
CALLOC_ARRAY() macro replaces many uses of xcalloc(). * rs/calloc-array: cocci: allow xcalloc(1, size) use CALLOC_ARRAY git-compat-util.h: drop trailing semicolon from macro definition
2021-03-19Merge branch 'js/fsmonitor-unpack-fix'Junio C Hamano1-0/+1
The data structure used by fsmonitor interface was not properly duplicated during an in-core merge, leading to use-after-free etc. * js/fsmonitor-unpack-fix: fsmonitor: do not forget to release the token in `discard_index()` fsmonitor: fix memory corruption in some corner cases
2021-03-17fsmonitor: do not forget to release the token in `discard_index()`Johannes Schindelin1-0/+1
In 56c6910028a (fsmonitor: change last update timestamp on the index_state to opaque token, 2020-01-07), we forgot to adjust `discard_index()` to release the "last-update" token: it is no longer a 64-bit number, but a free-form string that has been allocated. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-13use CALLOC_ARRAYRené Scharfe1-4/+4
Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-01Merge branch 'ns/raise-write-index-buffer-size'Junio C Hamano1-1/+1
Raise the buffer size used when writing the index file out from (obviously too small) 8kB to (clearly sufficiently large) 128kB. * ns/raise-write-index-buffer-size: read-cache: make the index write buffer size 128K
2021-03-01Merge branch 'jh/fsmonitor-prework'Junio C Hamano1-3/+21
Preliminary changes to fsmonitor integration. * jh/fsmonitor-prework: fsmonitor: refactor initialization of fsmonitor_last_update token fsmonitor: allow all entries for a folder to be invalidated fsmonitor: log FSMN token when reading and writing the index fsmonitor: log invocation of FSMonitor hook to trace2 read-cache: log the number of scanned files to trace2 read-cache: log the number of lstat calls to trace2 preload-index: log the number of lstat calls to trace2 p7519: add trace logging during perf test p7519: move watchman cleanup earlier in the test p7519: fix watchman watch-list test on Windows p7519: do not rely on "xargs -d" in test
2021-02-24read-cache: make the index write buffer size 128KNeeraj Singh1-1/+1
Writing an index 8K at a time invokes the OS filesystem and caching code very frequently, introducing noticeable overhead while writing large indexes. When experimenting with different write buffer sizes on Windows writing the Windows OS repo index (260MB), most of the benefit came by bumping the index write buffer size to 64K. I picked 128K to ensure that we're past the knee of the curve. With this change, the time under do_write_index for an index with 3M files goes from ~1.02s to ~0.72s. Signed-off-by: Neeraj Singh <neerajsi@ntdev.microsoft.com> Acked-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16read-cache: log the number of scanned files to trace2Jeff Hostetler1-3/+10
Report the number of files in the working directory that were read and their hashes verified in `refresh_index()`. FSMonitor improves the performance of commands like `git status` by avoiding scanning the disk for changed files. Let's measure this. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16read-cache: log the number of lstat calls to trace2Jeff Hostetler1-3/+14
Report the total number of calls made to lstat() inside of refresh_index(). FSMonitor improves the performance of commands like `git status` by avoiding scanning the disk for changed files. This can be seen in `refresh_index()`. Let's measure this. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-06read-cache: try not to peek into `struct {lock_,temp}file`Martin Ågren1-6/+6
Similar to the previous commits, try to avoid peeking into the `struct lock_file`. We also have some `struct tempfile`s -- let's avoid looking into those as well. Note that `do_write_index()` takes a tempfile and that when we call it, we either have a tempfile which we can easily hand down, or we have a lock file, from which we need to somehow obtain the internal tempfile. So we need to leave that one instance of peeking-into. Nevertheless, this commit leaves us not relying on exactly how the path of the tempfile / lock file is stored internally. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-06read-cache: fix mem-pool allocation for multi-threaded index loadingRené Scharfe1-1/+1
44c7e1a7e0 (mem-pool: use more standard initialization and finalization, 2020-08-15) moved the allocation of the mem-pool structure to callers. It also added an allocation to load_cache_entries_threaded(), but for an unrelated mem-pool. Fix that by allocating the correct one instead -- the one that is initialized two lines later. Reported-by: Sandor Bodo-Merle <sbodomerle@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18mem-pool: use more standard initialization and finalizationElijah Newren1-8/+13
A typical memory type, such as strbuf, hashmap, or string_list can be stored on the stack or embedded within another structure. mem_pool cannot be, because of how mem_pool_init() and mem_pool_discard() are written. mem_pool_init() does essentially the following (simplified for purposes of explanation here): void mem_pool_init(struct mem_pool **pool...) { *pool = xcalloc(1, sizeof(*pool)); It seems weird to require that mem_pools can only be accessed through a pointer. It also seems slightly dangerous: unlike strbuf_release() or strbuf_reset() or string_list_clear(), all of which put the data structure into a state where it can be re-used after the call, mem_pool_discard(pool) will leave pool pointing at free'd memory. read-cache (and split-index) are the only current users of mem_pools, and they haven't fallen into a use-after-free mistake here, but it seems likely to be problematic for future users especially since several of the current callers of mem_pool_init() will only call it when the mem_pool* is not already allocated (i.e. is NULL). This type of mechanism also prevents finding synchronization points where one can free existing memory and then resume more operations. It would be natural at such points to run something like mem_pool_discard(pool...); and, if necessary, mem_pool_init(&pool...); and then carry on continuing to use the pool. However, this fails badly if several objects had a copy of the value of pool from before these commands; in such a case, those objects won't get the updated value of pool that mem_pool_init() overwrites pool with and they'll all instead be reading and writing from free'd memory. Modify mem_pool_init()/mem_pool_discard() to behave more like strbuf_init()/strbuf_release() or string_list_init()/string_list_clear() In particular: (1) make mem_pool_init() just take a mem_pool* and have it only worry about allocating struct mp_blocks, not the struct mem_pool itself, (2) make mem_pool_discard() free the memory that the pool was responsible for, but leave it in a state where it can be used to allocate more memory afterward (without the need to call mem_pool_init() again). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16read-cache: remove bogus shortcutRené Scharfe1-14/+0
has_dir_name() has some optimizations for the case where entries are added to an index in the correct order. They kick in if the new entry sorts after the last one. One of them exits early if the last entry has a longer name than the directory of the new entry. Here's its comment: /* * The directory prefix lines up with part of * a longer file or directory name, but sorts * after it, so this sub-directory cannot * collide with a file. * * last: xxx/yy-file (because '-' sorts before '/') * this: xxx/yy/abc */ However, a file named xxx/yy would be sorted before xxx/yy-file because '-' sorts after NUL, so the length check against the last entry is not sufficient to rule out a collision. Remove it. Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Suggested-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-10Merge branch 'js/mingw-loosen-overstrict-tree-entry-checks'Junio C Hamano1-6/+6
Further tweak to a "no backslash in indexed paths" for Windows port we applied earlier. * js/mingw-loosen-overstrict-tree-entry-checks: mingw: safeguard better against backslashes in file names
2020-01-10mingw: safeguard better against backslashes in file namesJohannes Schindelin via GitGitGadget1-6/+6
In 224c7d70fa1 (mingw: only test index entries for backslashes, not tree entries, 2019-12-31), we relaxed the check for backslashes in tree entries to check only index entries. However, the code change was incorrect: it was added to `add_index_entry_with_check()`, not to `add_index_entry()`, so under certain circumstances it was possible to side-step the protection. Besides, the description of that commit purported that all index entries would be checked when in fact they were only checked when being added to the index (there are code paths that do not do that, constructing "transient" index entries). In any case, it was pointed out in one insightful review at https://github.com/git-for-windows/git/pull/2437#issuecomment-566771835 that it would be a much better idea to teach `verify_path()` to perform the check for a backslash. This is safer, even if it comes with two notable drawbacks: - `verify_path()` cannot say _what_ is wrong with the path, therefore the user will no longer be told that there was a backslash in the path, only that the path was invalid. - The `git apply` command also calls the `verify_path()` function, and might have been able to handle Windows-style paths (i.e. with backslashes instead of forward slashes). This will no longer be possible unless the user (temporarily) sets `core.protectNTFS=false`. Note that `git add <windows-path>` will _still_ work because `normalize_path_copy_len()` will convert the backslashes to forward slashes before hitting the code path that creates an index entry. The clear advantage is that `verify_path()`'s purpose is to check the validity of the file name, therefore we naturally tap into all the code paths that need safeguarding, also implicitly into future code paths. The benefits of that approach outweigh the downsides, so let's move the check from `add_index_entry_with_check()` to `verify_path()`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-06Merge branch 'js/mingw-loosen-overstrict-tree-entry-checks'Junio C Hamano1-0/+5
An earlier update to Git for Windows declared that a tree object is invalid if it has a path component with backslash in it, which was overly strict, which has been corrected. The only protection the Windows users need is to prevent such path (or any path that their filesystem cannot check out) from entering the index. * js/mingw-loosen-overstrict-tree-entry-checks: mingw: only test index entries for backslashes, not tree entries
2020-01-02mingw: only test index entries for backslashes, not tree entriesJohannes Schindelin1-0/+5
During a clone of a repository that contained a file with a backslash in its name in the past, as of v2.24.1(2), Git for Windows prints errors like this: error: filename in tree entry contains backslash: '\' The idea is to prevent Git from even trying to write files with backslashes in their file names: while these characters are valid in file names on other platforms, on Windows it is interpreted as directory separator (which would obviously lead to ambiguities, e.g. when there is a file `a\b` and there is also a file `a/b`). Arguably, this is the wrong layer for that error: As long as the user never checks out the files whose names contain backslashes, there should not be any problem in the first place. So let's loosen the requirements: we now leave tree entries with backslashes in their file names alone, but we do require any entries that are added to the Git index to contain no backslashes on Windows. Note: just as before, the check is guarded by `core.protectNTFS` (to allow overriding the check by toggling that config setting), and it is _only_ performed on Windows, as the backslash is not a directory separator elsewhere, even when writing to NTFS-formatted volumes. An alternative approach would be to try to prevent creating files with backslashes in their file names. However, that comes with its own set of problems. For example, `git config -f C:\ProgramData\Git\config ...` is a very valid way to specify a custom config location, and we obviously do _not_ want to prevent that. Therefore, the approach chosen in this patch would appear to be better. This addresses https://github.com/git-for-windows/git/issues/2435 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>