aboutsummaryrefslogtreecommitdiffstats
path: root/read-cache.c
AgeCommit message (Collapse)AuthorFilesLines
2021-05-19read-cache: delete unused hashing methodsDerrick Stolee1-64/+0
These methods were marked as MAYBE_UNUSED in the previous change to avoid a complicated diff. Delete them entirely, since we now use the hashfile API instead of this custom hashing code. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-19read-cache: use hashfile instead of git_hash_ctxDerrick Stolee1-71/+66
The do_write_index() method in read-cache.c has its own hashing logic and buffering mechanism. Specifically, the ce_write() method was introduced by 4990aadc (Speed up index file writing by chunking it nicely, 2005-04-20) and similar mechanisms were introduced a few months later in c38138cd (git-pack-objects: write the pack files with a SHA1 csum, 2005-06-26). Based on the timing, in the early days of the Git codebase, I figured that these roughly equivalent code paths were never unified only because it got lost in the shuffle. The hashfile API has since been used extensively in other file formats, such as pack-indexes, multi-pack-indexes, and commit-graphs. Therefore, it seems prudent to unify the index writing code to use the same mechanism. I discovered this disparity while trying to create a new index format that uses the chunk-format API. That API uses a hashfile as its base, so it is incompatible with the custom code in read-cache.c. This rewrite is rather straightforward. It replaces all writes to the temporary file with writes to the hashfile struct. This takes care of many of the direct interactions with the_hash_algo. There are still some git_hash_ctx uses remaining: the extension headers are hashed for use in the End of Index Entries (EOIE) extension. This use of the git_hash_ctx is left as-is. There are multiple reasons to not use a hashfile here, including the fact that the data is not actually writing to a file, just a hash computation. These hashes do not block our adoption of the chunk-format API in a future change to the index, so leave it as-is. The internals of the algorithms are mostly identical. Previously, the hashfile API used a smaller 8KB buffer instead of the 128KB buffer from read-cache.c. The previous change already unified these sizes. There is one subtle point: we do not pass the CSUM_FSYNC to the finalize_hashfile() method, which differs from most consumers of the hashfile API. The extra fsync() call indicated by this flag causes a significant peformance degradation that is noticeable for quick commands that write the index, such as "git add". Other consumers can absorb this cost with their more complicated data structure organization, and further writing structures such as pack-files and commit-graphs is rarely in the critical path for common user interactions. Some static methods become orphaned in this diff, so I marked them as MAYBE_UNUSED. The diff is much harder to read if they are deleted during this change. Instead, they will be deleted in the following change. In addition to the test suite passing, I computed indexes using the previous binaries and the binaries compiled after this change, and found the index data to be exactly equal. Finally, I did extensive performance testing of "git update-index --force-write" on repos of various sizes, including one with over 2 million paths at HEAD. These tests demonstrated less than 1% difference in behavior. As expected, the performance should be considered unchanged. The previous changes to increase the hashfile buffer size from 8K to 128K ensured this change would not create a peformance regression. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-16Merge branch 'mt/parallel-checkout-part-3'Junio C Hamano1-4/+10
The final part of "parallel checkout". * mt/parallel-checkout-part-3: ci: run test round with parallel-checkout enabled parallel-checkout: add tests related to .gitattributes t0028: extract encoding helpers to lib-encoding.sh parallel-checkout: add tests related to path collisions parallel-checkout: add tests for basic operations checkout-index: add parallel checkout support builtin/checkout.c: complete parallel checkout support make_transient_cache_entry(): optionally alloc from mem_pool
2021-05-10Merge branch 'bc/hash-transition-interop-part-1'Junio C Hamano1-2/+2
SHA-256 transition. * bc/hash-transition-interop-part-1: hex: print objects using the hash algorithm member hex: default to the_hash_algo on zero algorithm value builtin/pack-objects: avoid using struct object_id for pack hash commit-graph: don't store file hashes as struct object_id builtin/show-index: set the algorithm for object IDs hash: provide per-algorithm null OIDs hash: set, copy, and use algo field in struct object_id builtin/pack-redundant: avoid casting buffers to struct object_id Use the final_oid_fn to finalize hashing of object IDs hash: add a function to finalize object IDs http-push: set algorithm when reading object ID Always use oidread to read into struct object_id hash: add an algo member to struct object_id
2021-05-07Merge branch 'mt/add-rm-in-sparse-checkout'Junio C Hamano1-0/+3
"git add" and "git rm" learned not to touch those paths that are outside of sparse checkout. * mt/add-rm-in-sparse-checkout: rm: honor sparse checkout patterns add: warn when asked to update SKIP_WORKTREE entries refresh_index(): add flag to ignore SKIP_WORKTREE entries pathspec: allow to ignore SKIP_WORKTREE entries on index matching add: make --chmod and --renormalize honor sparse checkouts t3705: add tests for `git add` in sparse checkouts add: include magic part of pathspec on --refresh error
2021-05-07Merge branch 'ad/cygwin-no-backslashes-in-paths'Junio C Hamano1-1/+1
Cygwin pathname handling fix. * ad/cygwin-no-backslashes-in-paths: cygwin: disallow backslashes in file names
2021-05-05make_transient_cache_entry(): optionally alloc from mem_poolMatheus Tavares1-4/+10
Allow make_transient_cache_entry() to optionally receive a mem_pool struct in which it should allocate the entry. This will be used in the following patch, to store some transient entries which should persist until parallel checkout finishes. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-30Merge branch 'ds/sparse-index-protections'Junio C Hamano1-7/+72
Builds on top of the sparse-index infrastructure to mark operations that are not ready to mark with the sparse index, causing them to fall back on fully-populated index that they always have worked with. * ds/sparse-index-protections: (47 commits) name-hash: use expand_to_path() sparse-index: expand_to_path() name-hash: don't add directories to name_hash revision: ensure full index resolve-undo: ensure full index read-cache: ensure full index pathspec: ensure full index merge-recursive: ensure full index entry: ensure full index dir: ensure full index update-index: ensure full index stash: ensure full index rm: ensure full index merge-index: ensure full index ls-files: ensure full index grep: ensure full index fsck: ensure full index difftool: ensure full index commit: ensure full index checkout: ensure full index ...
2021-04-30cygwin: disallow backslashes in file namesAdam Dinwoodie1-1/+1
The backslash character is not a valid part of a file name on Windows. If, in Windows, Git attempts to write a file that has a backslash character in the filename, it will be incorrectly interpreted as a directory separator. This caused CVE-2019-1354 in MinGW, as this behaviour can be manipulated to cause the checkout to write to files it ought not write to, such as adding code to the .git/hooks directory. This was fixed by e1d911dd4c (mingw: disallow backslash characters in tree objects' file names, 2019-09-12). However, the vulnerability also exists in Cygwin: while Cygwin mostly provides a POSIX-like path system, it will still interpret a backslash as a directory separator. To avoid this vulnerability, CVE-2021-29468, extend the previous fix to also apply to Cygwin. Similarly, extend the test case added by the previous version of the commit. The test suite doesn't have an easy way to say "run this test if in MinGW or Cygwin", so add a new test prerequisite that covers both. As well as checking behaviour in the presence of paths containing backslashes, the existing test also checks behaviour in the presence of paths that differ only by the presence of a trailing ".". MinGW follows normal Windows application behaviour and treats them as the same path, but Cygwin more closely emulates *nix systems (at the expense of compatibility with native Windows applications) and will create and distinguish between such paths. Gate the relevant bit of that test accordingly. Reported-by: RyotaK <security@ryotak.me> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-27Always use oidread to read into struct object_idbrian m. carlson1-2/+2
In the future, we'll want oidread to automatically set the hash algorithm member for an object ID we read into it, so ensure we use oidread instead of hashcpy everywhere we're copying a hash value into a struct object_id. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-14read-cache: ensure full indexDerrick Stolee1-0/+4
Before iterating over all cache entries, ensure that a sparse index is expanded to a full index to avoid unexpected behavior. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-14read-cache: expand on query into sparse-directory entryDerrick Stolee1-0/+21
Callers to index_name_pos() or index_name_stage_pos() have a specific path in mind. If that happens to be a path with an ancestor being a sparse-directory entry, it can lead to unexpected results. In the case that we did not find the requested path, check to see if the position _before_ the inserted position is a sparse directory entry that matches the initial segment of the input path (including the directory separator at the end of the directory name). If so, then expand the index to be a full index and search again. This expansion will only happen once per index read. Future enhancements could be more careful to expand only the necessary sparse directory entry, but then we would have a special "not fully sparse, but also not fully expanded" mode that could affect writing the index to file. Since this only occurs if a specific file is requested outside of the sparse checkout definition, this is unlikely to be a common situation. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-14*: remove 'const' qualifier for struct index_stateDerrick Stolee1-5/+5
Several methods specify that they take a 'struct index_state' pointer with the 'const' qualifier because they intend to only query the data, not change it. However, we will be introducing a step very low in the method stack that might modify a sparse-index to become a full index in the case that our queries venture inside a sparse-directory entry. This change only removes the 'const' qualifiers that are necessary for the following change which will actually modify the implementation of index_name_stage_pos(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-08refresh_index(): add flag to ignore SKIP_WORKTREE entriesMatheus Tavares1-0/+3
refresh_index() doesn't update SKIP_WORKTREE entries, but it still matches them against the given pathspecs, marks the matches on the seen[] array, check if unmerged, etc. In the following patch, one caller will need refresh_index() to ignore SKIP_WORKTREE entries entirely, so add a flag that implements this behavior. While we are here, also realign the REFRESH_* flags and convert the hex values to the more natural bit shift format, which makes it easier to spot holes. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30sparse-index: convert from full to sparseDerrick Stolee1-2/+24
If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30sparse-index: add 'sdir' index extensionDerrick Stolee1-0/+9
The index format does not currently allow for sparse directory entries. This violates some expectations that older versions of Git or third-party tools might not understand. We need an indicator inside the index file to warn these tools to not interact with a sparse index unless they are aware of sparse directory entries. Add a new _required_ index extension, 'sdir', that indicates that the index may contain sparse directory entries. This allows us to continue to use the differences in index formats 2, 3, and 4 before we create a new index version 5 in a later change. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30sparse-index: implement ensure_full_index()Derrick Stolee1-0/+9
We will mark an in-memory index_state as having sparse directory entries with the sparse_index bit. These currently cannot exist, but we will add a mechanism for collapsing a full index to a sparse one in a later change. That will happen at write time, so we must first allow parsing the format before writing it. Commands or methods that require a full index in order to operate can call ensure_full_index() to expand that index in-memory. This requires parsing trees using that index's repository. Sparse directory entries have a specific 'ce_mode' value. The macro S_ISSPARSEDIR(ce->ce_mode) can check if a cache_entry 'ce' has this type. This ce_mode is not possible with the existing index formats, so we don't also verify all properties of a sparse-directory entry, which are: 1. ce->ce_mode == 0040000 2. ce->flags & CE_SKIP_WORKTREE is true 3. ce->name[ce->namelen - 1] == '/' (ends in dir separator) 4. ce->oid references a tree object. These are all semi-enforced in ensure_full_index() to some extent. Any deviation will cause a warning at minimum or a failure in the worst case. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-19Merge branch 'rs/calloc-array'Junio C Hamano1-4/+4
CALLOC_ARRAY() macro replaces many uses of xcalloc(). * rs/calloc-array: cocci: allow xcalloc(1, size) use CALLOC_ARRAY git-compat-util.h: drop trailing semicolon from macro definition
2021-03-19Merge branch 'js/fsmonitor-unpack-fix'Junio C Hamano1-0/+1
The data structure used by fsmonitor interface was not properly duplicated during an in-core merge, leading to use-after-free etc. * js/fsmonitor-unpack-fix: fsmonitor: do not forget to release the token in `discard_index()` fsmonitor: fix memory corruption in some corner cases
2021-03-17fsmonitor: do not forget to release the token in `discard_index()`Johannes Schindelin1-0/+1
In 56c6910028a (fsmonitor: change last update timestamp on the index_state to opaque token, 2020-01-07), we forgot to adjust `discard_index()` to release the "last-update" token: it is no longer a 64-bit number, but a free-form string that has been allocated. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-13use CALLOC_ARRAYRené Scharfe1-4/+4
Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-01Merge branch 'ns/raise-write-index-buffer-size'Junio C Hamano1-1/+1
Raise the buffer size used when writing the index file out from (obviously too small) 8kB to (clearly sufficiently large) 128kB. * ns/raise-write-index-buffer-size: read-cache: make the index write buffer size 128K
2021-03-01Merge branch 'jh/fsmonitor-prework'Junio C Hamano1-3/+21
Preliminary changes to fsmonitor integration. * jh/fsmonitor-prework: fsmonitor: refactor initialization of fsmonitor_last_update token fsmonitor: allow all entries for a folder to be invalidated fsmonitor: log FSMN token when reading and writing the index fsmonitor: log invocation of FSMonitor hook to trace2 read-cache: log the number of scanned files to trace2 read-cache: log the number of lstat calls to trace2 preload-index: log the number of lstat calls to trace2 p7519: add trace logging during perf test p7519: move watchman cleanup earlier in the test p7519: fix watchman watch-list test on Windows p7519: do not rely on "xargs -d" in test
2021-02-24read-cache: make the index write buffer size 128KNeeraj Singh1-1/+1
Writing an index 8K at a time invokes the OS filesystem and caching code very frequently, introducing noticeable overhead while writing large indexes. When experimenting with different write buffer sizes on Windows writing the Windows OS repo index (260MB), most of the benefit came by bumping the index write buffer size to 64K. I picked 128K to ensure that we're past the knee of the curve. With this change, the time under do_write_index for an index with 3M files goes from ~1.02s to ~0.72s. Signed-off-by: Neeraj Singh <neerajsi@ntdev.microsoft.com> Acked-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16read-cache: log the number of scanned files to trace2Jeff Hostetler1-3/+10
Report the number of files in the working directory that were read and their hashes verified in `refresh_index()`. FSMonitor improves the performance of commands like `git status` by avoiding scanning the disk for changed files. Let's measure this. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16read-cache: log the number of lstat calls to trace2Jeff Hostetler1-3/+14
Report the total number of calls made to lstat() inside of refresh_index(). FSMonitor improves the performance of commands like `git status` by avoiding scanning the disk for changed files. This can be seen in `refresh_index()`. Let's measure this. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-06read-cache: try not to peek into `struct {lock_,temp}file`Martin Ågren1-6/+6
Similar to the previous commits, try to avoid peeking into the `struct lock_file`. We also have some `struct tempfile`s -- let's avoid looking into those as well. Note that `do_write_index()` takes a tempfile and that when we call it, we either have a tempfile which we can easily hand down, or we have a lock file, from which we need to somehow obtain the internal tempfile. So we need to leave that one instance of peeking-into. Nevertheless, this commit leaves us not relying on exactly how the path of the tempfile / lock file is stored internally. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-06read-cache: fix mem-pool allocation for multi-threaded index loadingRené Scharfe1-1/+1
44c7e1a7e0 (mem-pool: use more standard initialization and finalization, 2020-08-15) moved the allocation of the mem-pool structure to callers. It also added an allocation to load_cache_entries_threaded(), but for an unrelated mem-pool. Fix that by allocating the correct one instead -- the one that is initialized two lines later. Reported-by: Sandor Bodo-Merle <sbodomerle@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18mem-pool: use more standard initialization and finalizationElijah Newren1-8/+13
A typical memory type, such as strbuf, hashmap, or string_list can be stored on the stack or embedded within another structure. mem_pool cannot be, because of how mem_pool_init() and mem_pool_discard() are written. mem_pool_init() does essentially the following (simplified for purposes of explanation here): void mem_pool_init(struct mem_pool **pool...) { *pool = xcalloc(1, sizeof(*pool)); It seems weird to require that mem_pools can only be accessed through a pointer. It also seems slightly dangerous: unlike strbuf_release() or strbuf_reset() or string_list_clear(), all of which put the data structure into a state where it can be re-used after the call, mem_pool_discard(pool) will leave pool pointing at free'd memory. read-cache (and split-index) are the only current users of mem_pools, and they haven't fallen into a use-after-free mistake here, but it seems likely to be problematic for future users especially since several of the current callers of mem_pool_init() will only call it when the mem_pool* is not already allocated (i.e. is NULL). This type of mechanism also prevents finding synchronization points where one can free existing memory and then resume more operations. It would be natural at such points to run something like mem_pool_discard(pool...); and, if necessary, mem_pool_init(&pool...); and then carry on continuing to use the pool. However, this fails badly if several objects had a copy of the value of pool from before these commands; in such a case, those objects won't get the updated value of pool that mem_pool_init() overwrites pool with and they'll all instead be reading and writing from free'd memory. Modify mem_pool_init()/mem_pool_discard() to behave more like strbuf_init()/strbuf_release() or string_list_init()/string_list_clear() In particular: (1) make mem_pool_init() just take a mem_pool* and have it only worry about allocating struct mp_blocks, not the struct mem_pool itself, (2) make mem_pool_discard() free the memory that the pool was responsible for, but leave it in a state where it can be used to allocate more memory afterward (without the need to call mem_pool_init() again). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16read-cache: remove bogus shortcutRené Scharfe1-14/+0
has_dir_name() has some optimizations for the case where entries are added to an index in the correct order. They kick in if the new entry sorts after the last one. One of them exits early if the last entry has a longer name than the directory of the new entry. Here's its comment: /* * The directory prefix lines up with part of * a longer file or directory name, but sorts * after it, so this sub-directory cannot * collide with a file. * * last: xxx/yy-file (because '-' sorts before '/') * this: xxx/yy/abc */ However, a file named xxx/yy would be sorted before xxx/yy-file because '-' sorts after NUL, so the length check against the last entry is not sufficient to rule out a collision. Remove it. Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Suggested-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-10Merge branch 'js/mingw-loosen-overstrict-tree-entry-checks'Junio C Hamano1-6/+6
Further tweak to a "no backslash in indexed paths" for Windows port we applied earlier. * js/mingw-loosen-overstrict-tree-entry-checks: mingw: safeguard better against backslashes in file names
2020-01-10mingw: safeguard better against backslashes in file namesJohannes Schindelin via GitGitGadget1-6/+6
In 224c7d70fa1 (mingw: only test index entries for backslashes, not tree entries, 2019-12-31), we relaxed the check for backslashes in tree entries to check only index entries. However, the code change was incorrect: it was added to `add_index_entry_with_check()`, not to `add_index_entry()`, so under certain circumstances it was possible to side-step the protection. Besides, the description of that commit purported that all index entries would be checked when in fact they were only checked when being added to the index (there are code paths that do not do that, constructing "transient" index entries). In any case, it was pointed out in one insightful review at https://github.com/git-for-windows/git/pull/2437#issuecomment-566771835 that it would be a much better idea to teach `verify_path()` to perform the check for a backslash. This is safer, even if it comes with two notable drawbacks: - `verify_path()` cannot say _what_ is wrong with the path, therefore the user will no longer be told that there was a backslash in the path, only that the path was invalid. - The `git apply` command also calls the `verify_path()` function, and might have been able to handle Windows-style paths (i.e. with backslashes instead of forward slashes). This will no longer be possible unless the user (temporarily) sets `core.protectNTFS=false`. Note that `git add <windows-path>` will _still_ work because `normalize_path_copy_len()` will convert the backslashes to forward slashes before hitting the code path that creates an index entry. The clear advantage is that `verify_path()`'s purpose is to check the validity of the file name, therefore we naturally tap into all the code paths that need safeguarding, also implicitly into future code paths. The benefits of that approach outweigh the downsides, so let's move the check from `add_index_entry_with_check()` to `verify_path()`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-06Merge branch 'js/mingw-loosen-overstrict-tree-entry-checks'Junio C Hamano1-0/+5
An earlier update to Git for Windows declared that a tree object is invalid if it has a path component with backslash in it, which was overly strict, which has been corrected. The only protection the Windows users need is to prevent such path (or any path that their filesystem cannot check out) from entering the index. * js/mingw-loosen-overstrict-tree-entry-checks: mingw: only test index entries for backslashes, not tree entries
2020-01-02mingw: only test index entries for backslashes, not tree entriesJohannes Schindelin1-0/+5
During a clone of a repository that contained a file with a backslash in its name in the past, as of v2.24.1(2), Git for Windows prints errors like this: error: filename in tree entry contains backslash: '\' The idea is to prevent Git from even trying to write files with backslashes in their file names: while these characters are valid in file names on other platforms, on Windows it is interpreted as directory separator (which would obviously lead to ambiguities, e.g. when there is a file `a\b` and there is also a file `a/b`). Arguably, this is the wrong layer for that error: As long as the user never checks out the files whose names contain backslashes, there should not be any problem in the first place. So let's loosen the requirements: we now leave tree entries with backslashes in their file names alone, but we do require any entries that are added to the Git index to contain no backslashes on Windows. Note: just as before, the check is guarded by `core.protectNTFS` (to allow overriding the check by toggling that config setting), and it is _only_ performed on Windows, as the backslash is not a directory separator elsewhere, even when writing to NTFS-formatted volumes. An alternative approach would be to try to prevent creating files with backslashes in their file names. However, that comes with its own set of problems. For example, `git config -f C:\ProgramData\Git\config ...` is a very valid way to specify a custom config location, and we obviously do _not_ want to prevent that. Therefore, the approach chosen in this patch would appear to be better. This addresses https://github.com/git-for-windows/git/issues/2435 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-09Sync with Git 2.24.1Junio C Hamano1-0/+11
2019-12-06Sync with 2.23.1Johannes Schindelin1-0/+11
* maint-2.23: (44 commits) Git 2.23.1 Git 2.22.2 Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters ...
2019-12-06Sync with 2.22.2Johannes Schindelin1-0/+11
* maint-2.22: (43 commits) Git 2.22.2 Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors ...
2019-12-06Sync with 2.21.1Johannes Schindelin1-0/+11
* maint-2.21: (42 commits) Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh ...
2019-12-06Sync with 2.20.2Johannes Schindelin1-0/+11
* maint-2.20: (36 commits) Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories ...
2019-12-06Sync with 2.19.3Johannes Schindelin1-0/+11
* maint-2.19: (34 commits) Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams ...
2019-12-06Sync with 2.18.2Johannes Schindelin1-0/+11
* maint-2.18: (33 commits) Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up ...
2019-12-06Sync with 2.17.3Johannes Schindelin1-0/+11
* maint-2.17: (32 commits) Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names ...
2019-12-06Sync with 2.16.6Johannes Schindelin1-0/+11
* maint-2.16: (31 commits) Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names path: safeguard `.git` against NTFS Alternate Streams Accesses ...
2019-12-06Sync with 2.15.4Johannes Schindelin1-0/+11
* maint-2.15: (29 commits) Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names path: safeguard `.git` against NTFS Alternate Streams Accesses clone --recurse-submodules: prevent name squatting on Windows is_ntfs_dotgit(): only verify the leading segment ...
2019-12-06Sync with 2.14.6Johannes Schindelin1-0/+11
* maint-2.14: (28 commits) Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names path: safeguard `.git` against NTFS Alternate Streams Accesses clone --recurse-submodules: prevent name squatting on Windows is_ntfs_dotgit(): only verify the leading segment test-path-utils: offer to run a protectNTFS/protectHFS benchmark ...
2019-12-05mingw: refuse to access paths with trailing spaces or periodsJohannes Schindelin1-0/+3
When creating a directory on Windows whose path ends in a space or a period (or chains thereof), the Win32 API "helpfully" trims those. For example, `mkdir("abc ");` will return success, but actually create a directory called `abc` instead. This stems back to the DOS days, when all file names had exactly 8 characters plus exactly 3 characters for the file extension, and the only way to have shorter names was by padding with spaces. Sadly, this "helpful" behavior is a bit inconsistent: after a successful `mkdir("abc ");`, a `mkdir("abc /def")` will actually _fail_ (because the directory `abc ` does not actually exist). Even if it would work, we now have a serious problem because a Git repository could contain directories `abc` and `abc `, and on Windows, they would be "merged" unintentionally. As these paths are illegal on Windows, anyway, let's disallow any accesses to such paths on that Operating System. For practical reasons, this behavior is still guarded by the config setting `core.protectNTFS`: it is possible (and at least two regression tests make use of it) to create commits without involving the worktree. In such a scenario, it is of course possible -- even on Windows -- to create such file names. Among other consequences, this patch disallows submodules' paths to end in spaces on Windows (which would formerly have confused Git enough to try to write into incorrect paths, anyway). While this patch does not fix a vulnerability on its own, it prevents an attack vector that was exploited in demonstrations of a number of recently-fixed security bugs. The regression test added to `t/t7417-submodule-path-url.sh` reflects that attack vector. Note that we have to adjust the test case "prevent git~1 squatting on Windows" in `t/t7415-submodule-names.sh` because of a very subtle issue. It tries to clone two submodules whose names differ only in a trailing period character, and as a consequence their git directories differ in the same way. Previously, when Git tried to clone the second submodule, it thought that the git directory already existed (because on Windows, when you create a directory with the name `b.` it actually creates `b`), but with this patch, the first submodule's clone will fail because of the illegal name of the git directory. Therefore, when cloning the second submodule, Git will take a different code path: a fresh clone (without an existing git directory). Both code paths fail to clone the second submodule, both because the the corresponding worktree directory exists and is not empty, but the error messages are worded differently. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-12-05is_ntfs_dotgit(): only verify the leading segmentJohannes Schindelin1-0/+8
The config setting `core.protectNTFS` is specifically designed to work not only on Windows, but anywhere, to allow for repositories hosted on, say, Linux servers to be protected against NTFS-specific attack vectors. As a consequence, `is_ntfs_dotgit()` manually splits backslash-separated paths (but does not do the same for paths separated by forward slashes), under the assumption that the backslash might not be a valid directory separator on the _current_ Operating System. However, the two callers, `verify_path()` and `fsck_tree()`, are supposed to feed only individual path segments to the `is_ntfs_dotgit()` function. This causes a lot of duplicate scanning (and very inefficient scanning, too, as the inner loop of `is_ntfs_dotgit()` was optimized for readability rather than for speed. Let's simplify the design of `is_ntfs_dotgit()` by putting the burden of splitting the paths by backslashes as directory separators on the callers of said function. Consequently, the `verify_path()` function, which already splits the path by directory separators, now treats backslashes as directory separators _explicitly_ when `core.protectNTFS` is turned on, even on platforms where the backslash is _not_ a directory separator. Note that we have to repeat some code in `verify_path()`: if the backslash is not a directory separator on the current Operating System, we want to allow file names like `\`, but we _do_ want to disallow paths that are clearly intended to cause harm when the repository is cloned on Windows. The `fsck_tree()` function (the other caller of `is_ntfs_dotgit()`) now needs to look for backslashes in tree entries' names specifically when `core.protectNTFS` is turned on. While it would be tempting to completely disallow backslashes in that case (much like `fsck` reports names containing forward slashes as "full paths"), this would be overzealous: when `core.protectNTFS` is turned on in a non-Windows setup, backslashes are perfectly valid characters in file names while we _still_ want to disallow tree entries that are clearly designed to exploit NTFS-specific behavior. This simplification will make subsequent changes easier to implement, such as turning `core.protectNTFS` on by default (not only on Windows) or protecting against attack vectors involving NTFS Alternate Data Streams. Incidentally, this change allows for catching malicious repositories that contain tree entries of the form `dir\.gitmodules` already on the server side rather than only on the client side (and previously only on Windows): in contrast to `is_ntfs_dotgit()`, the `is_ntfs_dotgitmodules()` function already expects the caller to split the paths by directory separators. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-11-10Fix spelling errors in code commentsElijah Newren1-1/+1
Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-15Merge branch 'js/azure-pipelines-msvc'Junio C Hamano1-2/+2
CI updates. * js/azure-pipelines-msvc: ci: also build and test with MS Visual Studio on Azure Pipelines ci: really use shallow clones on Azure Pipelines tests: let --immediate and --write-junit-xml play well together test-tool run-command: learn to run (parts of) the testsuite vcxproj: include more generated files vcxproj: only copy `git-remote-http.exe` once it was built msvc: work around a bug in GetEnvironmentVariable() msvc: handle DEVELOPER=1 msvc: ignore some libraries when linking compat/win32/path-utils.h: add #include guards winansi: use FLEX_ARRAY to avoid compiler warning msvc: avoid using minus operator on unsigned types push: do not pretend to return `int` from `die_push_simple()`
2019-10-07Merge branch 'tg/stash-refresh-index'Junio C Hamano1-0/+21
"git stash" learned to write refreshed index back to disk. * tg/stash-refresh-index: stash: make sure to write refreshed cache merge: use refresh_and_write_cache factor out refresh_and_write_cache function
2019-10-06msvc: avoid using minus operator on unsigned typesJohannes Schindelin1-2/+2
MSVC complains about this with `-Wall`, which can be taken as a sign that this is indeed a real bug. The symptom is: C4146: unary minus operator applied to unsigned type, result still unsigned Let's avoid this warning in the minimal way, e.g. writing `-1 - <unsigned value>` instead of `-<unsigned value> - 1`. Note that the change in the `estimate_cache_size()` function is needed because MSVC considers the "return type" of the `sizeof()` operator to be `size_t`, i.e. unsigned, and therefore it cannot be negated using the unary minus operator. Even worse, that arithmetic is doing extra work, in vain. We want to calculate the entry extra cache size as the difference between the size of the `cache_entry` structure minus the size of the `ondisk_cache_entry` structure, padded to the appropriate alignment boundary. To that end, we start by assigning that difference to the `per_entry` variable, and then abuse the `len` parameter of the `align_padding_size()` macro to take the negative size of the ondisk entry size. Essentially, we try to avoid passing the already calculated difference to that macro by passing the operands of that difference instead, when the macro expects operands of an addition: #define align_padding_size(size, len) \ ((size + (len) + 8) & ~7) - (size + len) Currently, we pass A and -B to that macro instead of passing A - B and 0, where A - B is already stored in the `per_entry` variable, ready to be used. This is neither necessary, nor intuitive. Let's fix this, and have code that is both easier to read and that also does not trigger MSVC's warning. While at it, we take care of reporting overflows (which are unlikely, but hey, defensive programming is good!). We _also_ take pains of casting the unsigned value to signed: otherwise, the signed operand (i.e. the `-1`) would be cast to unsigned before doing the arithmetic. Helped-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-20factor out refresh_and_write_cache functionThomas Gummerer1-0/+21
Getting the lock for the index, refreshing it and then writing it is a pattern that happens more than once throughout the codebase, and isn't trivial to get right. Factor out the refresh_and_write_cache function from builtin/am.c to read-cache.c, so it can be re-used in other places in a subsequent commit. Note that we return different error codes for failing to refresh the cache, and failing to write the index. The current caller only cares about failing to write the index. However for other callers we're going to convert in subsequent patches we will need this distinction. Helped-by: Martin Ågren <martin.agren@gmail.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-09Merge branch 'ds/feature-macros'Junio C Hamano1-15/+15
A mechanism to affect the default setting for a (related) group of configuration variables is introduced. * ds/feature-macros: repo-settings: create feature.experimental setting repo-settings: create feature.manyFiles setting repo-settings: parse core.untrackedCache commit-graph: turn on commit-graph by default t6501: use 'git gc' in quiet mode repo-settings: consolidate some config settings
2019-08-13repo-settings: parse core.untrackedCacheDerrick Stolee1-10/+9
The core.untrackedCache config setting is slightly complicated, so clarify its use and centralize its parsing into the repo settings. The default value is "keep" (returned as -1), which persists the untracked cache if it exists. If the value is set as "false" (returned as 0), then remove the untracked cache if it exists. If the value is set as "true" (returned as 1), then write the untracked cache and persist it. Instead of relying on magic values of -1, 0, and 1, split these options into an enum. This allows the use of "-1" as a default value. After parsing the config options, if the value is unset we can initialize it to UNTRACKED_CACHE_KEEP. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-13repo-settings: consolidate some config settingsDerrick Stolee1-5/+6
There are a few important config settings that are not loaded during git_default_config. These are instead loaded on-demand. Centralize these config options to a single scan, and store all of the values in a repo_settings struct. The values for each setting are initialized as negative to indicate "unset". This centralization will be particularly important in a later change to introduce "meta" config settings that change the defaults for these config settings. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-29Merge branch 'rs/avoid-overflow-in-midpoint-computation' into maintJunio C Hamano1-1/+1
Code clean-up to avoid signed integer overlaps during binary search. * rs/avoid-overflow-in-midpoint-computation: cleanup: fix possible overflow errors in binary search, part 2
2019-07-29Merge branch 'vn/xmmap-gently' into maintJunio C Hamano1-1/+1
Clean-up an error codepath. * vn/xmmap-gently: read-cache.c: do not die if mmap fails
2019-07-25Merge branch 'vn/xmmap-gently'Junio C Hamano1-1/+1
Clean-up an error codepath. * vn/xmmap-gently: read-cache.c: do not die if mmap fails
2019-07-14read-cache.c: do not die if mmap failsVarun Naik1-1/+1
do_read_index() mmaps the index, or tries to die with an error message on failure. It should call xmmap_gently(), which returns MAP_FAILED, rather than xmmap(), which dies with its own error message. An easy way to cause this mmap to fail is by setting $GIT_INDEX_FILE to a path to a directory and then invoking any command that reads from the index. Signed-off-by: Varun Naik <vcnaik94@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-09Merge branch 'rs/avoid-overflow-in-midpoint-computation'Junio C Hamano1-1/+1
Code clean-up to avoid signed integer overlaps during binary search. * rs/avoid-overflow-in-midpoint-computation: cleanup: fix possible overflow errors in binary search, part 2
2019-06-17Merge branch 'js/fsmonitor-unflake'Junio C Hamano1-5/+5
The data collected by fsmonitor was not properly written back to the on-disk index file, breaking t7519 tests occasionally, which has been corrected. * js/fsmonitor-unflake: mark_fsmonitor_valid(): mark the index as changed if needed fill_stat_cache_info(): prepare for an fsmonitor fix
2019-06-13Merge branch 'jk/unused-params-final-batch'Junio C Hamano1-2/+2
* jk/unused-params-final-batch: verify-commit: simplify parameters to run_gpg_verify() show-branch: drop unused parameter from show_independent() rev-list: drop unused void pointer from finish_commit() remove_all_fetch_refspecs(): drop unused "remote" parameter receive-pack: drop unused "commands" from prepare_shallow_update() pack-objects: drop unused rev_info parameters name-rev: drop unused parameters from is_better_name() mktree: drop unused length parameter wt-status: drop unused status parameter read-cache: drop unused parameter from threaded load clone: drop dest parameter from copy_alternates() submodule: drop unused prefix parameter from some functions builtin: consistently pass cmd_* prefix to parse_options cmd_{read,write}_tree: rename "unused" variable that is used
2019-06-13cleanup: fix possible overflow errors in binary search, part 2René Scharfe1-1/+1
Calculating the sum of two array indexes to find the midpoint between them can overflow, i.e. code like this is unsafe for big arrays: mid = (first + last) >> 1; Make sure the intermediate value stays within the boundaries instead, like this: mid = first + ((last - first) >> 1); The loop condition of the binary search makes sure that 'last' is always greater than 'first', so this is safe as long as 'first' is not negative. And that can be verified easily using the pre-context of each change, except for name-hash.c, so add an assertion to that effect there. The unsafe calculations were found with: git grep '(.*+.*) *>> *1' This is a continuation of 19716b21a4 (cleanup: fix possible overflow errors in binary search, 2017-10-08). Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-28mark_fsmonitor_valid(): mark the index as changed if neededJohannes Schindelin1-2/+2
Without this bug fix, t7519's four "status doesn't detect unreported modifications" test cases would fail occasionally (and, oddly enough, *a lot* more frequently on Windows). The reason is that these test cases intentionally use the side effect of `git status` to re-write the index if any updates were detected: they first clean the worktree, run `git status` to update the index as well as show the output to the casual reader, then make the worktree dirty again and expect no changes to reported if running with a mocked fsmonitor hook. The problem with this strategy was that the index was written during said `git status` on the clean worktree for the *wrong* reason: not because the index was marked as changed (it wasn't), but because the recorded mtimes were racy with the index' own mtime. As the mtime granularity on Windows is 100 nanoseconds (see e.g. https://docs.microsoft.com/en-us/windows/desktop/SysInfo/file-times), the mtimes of the files are often enough *not* racy with the index', so that that `git status` call currently does not always update the index (including the fsmonitor extension), causing the test case to fail. The obvious fix: if we change *any* index entry's `CE_FSMONITOR_VALID` flag, we should also mark the index as changed. That will cause the index to be written upon `git status`, *including* an updated fsmonitor extension. Side note: Even though the reader might think that the t7519 issue should be *much* more prevalent on Linux, given that the ext4 filesystem (that seems to be used by every Linux distribution) stores mtimes in nanosecond precision. However, ext4 uses `current_kernel_time()` (see https://unix.stackexchange.com/questions/11599#comment762968_11599; it is *amazingly* hard to find any proper source of information about such ext4 questions) whose accuracy seems to depend on many factors but is safely worse than the 100-nanosecond granularity of NTFS (again, it is *horribly* hard to find anything remotely authoritative about this question). So it seems that the racy index condition that hid the bug fixed by this patch simply is a lot more likely on Linux than on Windows. But not impossible ;-) Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-28fill_stat_cache_info(): prepare for an fsmonitor fixJohannes Schindelin1-3/+3
We will need to pass down the `struct index_state` to `mark_fsmonitor_valid()` for an upcoming bug fix, and this here function calls that there function, so we need to extend the signature of `fill_stat_cache_info()` first. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-19Merge branch 'js/fsmonitor-refresh-after-discarding-index'Junio C Hamano1-0/+1
The fsmonitor interface got out of sync after the in-core index file gets discarded, which has been corrected. * js/fsmonitor-refresh-after-discarding-index: fsmonitor: force a refresh after the index was discarded fsmonitor: demonstrate that it is not refreshed after discard_index()
2019-05-13Merge branch 'jh/trace2'Junio C Hamano1-1/+1
A few embarrassing bugfixes. * jh/trace2: trace2: fix up a missing "leave" entry point trace2: fix incorrect function pointer check
2019-05-13read-cache: drop unused parameter from threaded loadJeff King1-2/+2
The load_cache_entries_threaded() function takes a src_offset parameter that it doesn't use. This has been there since its inception in 77ff1127a4 (read-cache: load cache entries on worker threads, 2018-10-10). Digging on the mailing list, that parameter was part of an earlier iteration of the series[1], but became unnecessary when the code switched to using the IEOT extension. [1] https://public-inbox.org/git/20180906210227.54368-5-benpeart@microsoft.com/ Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-10trace2: fix up a missing "leave" entry pointÆvar Arnfjörð Bjarmason1-1/+1
Fix a trivial bug that's been here since the shared/do_write_index tracing was added in 42fee7a388 ("trace2:data: add trace2 instrumentation to index read/write", 2019-02-22). We should have enter/leave points, not two enter/enter points. This resulted in an "enter" event without a corresponding "leave" event. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-09Merge branch 'km/empty-repo-is-still-a-repo'Junio C Hamano1-0/+3
Running "git add" on a repository created inside the current repository is an explicit indication that the user wants to add it as a submodule, but when the HEAD of the inner repository is on an unborn branch, it cannot be added as a submodule. Worse, the files in its working tree can be added as if they are a part of the outer repository, which is not what the user wants. These problems are being addressed. * km/empty-repo-is-still-a-repo: add: error appropriately on repository with no commits dir: do not traverse repositories with no commits submodule: refuse to add repository with no commits
2019-05-08fsmonitor: force a refresh after the index was discardedJohannes Schindelin1-0/+1
With this change, the `index_state` struct becomes the new home for the flag that says whether the fsmonitor hook has been run, i.e. it is now per-index. It also gets re-set when the index is discarded, fixing the bug demonstrated by the "test_expect_failure" test added in the preceding commit. In that case fsmonitor-enabled Git would miss updates under certain circumstances, see that preceding commit for details. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-25Merge branch 'bc/hash-transition-16'Junio C Hamano1-45/+29
Conversion from unsigned char[20] to struct object_id continues. * bc/hash-transition-16: (35 commits) gitweb: make hash size independent Git.pm: make hash size independent read-cache: read data in a hash-independent way dir: make untracked cache extension hash size independent builtin/difftool: use parse_oid_hex refspec: make hash size independent archive: convert struct archiver_args to object_id builtin/get-tar-commit-id: make hash size independent get-tar-commit-id: parse comment record hash: add a function to lookup hash algorithm by length remote-curl: make hash size independent http: replace sha1_to_hex http: compute hash of downloaded objects using the_hash_algo http: replace hard-coded constant with the_hash_algo http-walker: replace sha1_to_hex http-push: remove remaining uses of sha1_to_hex http-backend: allow 64-character hex names http-push: convert to use the_hash_algo builtin/pull: make hash-size independent builtin/am: make hash size independent ...
2019-04-25Merge branch 'bp/post-index-change-hook'Junio C Hamano1-2/+12
A new hook "post-index-change" is called when the on-disk index file changes, which can help e.g. a virtualized working tree implementation. * bp/post-index-change-hook: read-cache: add post-index-change hook
2019-04-10add: error appropriately on repository with no commitsKyle Meyer1-0/+3
The previous commit made 'git add' abort when given a repository that doesn't have a commit checked out. However, the output upon failure isn't appropriate: % git add repo warning: adding embedded git repository: repo hint: You've added another git repository inside your current repository. hint: [...] error: unable to index file 'repo/' fatal: adding files failed The hint doesn't apply in this case, and the error message doesn't tell the user why 'repo' couldn't be added to the index. Provide better output by teaching add_to_index() to error when given a git directory where HEAD can't be resolved. To avoid the embedded repository warning and hint, call check_embedded_repo() only after add_file_to_index() succeeds because, in general, its output doesn't make sense if adding to the index fails. Signed-off-by: Kyle Meyer <kyle@kyleam.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01read-cache: read data in a hash-independent waybrian m. carlson1-45/+29
Index entries are structured with a variety of fields up front, followed by a hash and one or two flags fields. Because the hash field is stored in the middle of the structure, it's difficult to use one fixed-size structure that easily allows access to the hash and flags fields. Adjust the structure to hold the maximum amount of data that may be needed using a member called "data" and read and write this field independently in the various places that need to read and write the structure. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-07Merge branch 'jh/trace2'Junio C Hamano1-1/+50
A more structured way to obtain execution trace has been added. * jh/trace2: trace2: add for_each macros to clang-format trace2: t/helper/test-trace2, t0210.sh, t0211.sh, t0212.sh trace2:data: add subverb for rebase trace2:data: add subverb to reset command trace2:data: add subverb to checkout command trace2:data: pack-objects: add trace2 regions trace2:data: add trace2 instrumentation to index read/write trace2:data: add trace2 hook classification trace2:data: add trace2 transport child classification trace2:data: add trace2 sub-process classification trace2:data: add editor/pager child classification trace2:data: add trace2 regions to wt-status trace2: collect Windows-specific process information trace2: create new combined trace facility trace2: Documentation/technical/api-trace2.txt
2019-03-07Merge branch 'nd/split-index-null-base-fix'Junio C Hamano1-2/+3
Split-index fix. * nd/split-index-null-base-fix: read-cache.c: fix writing "link" index ext with null base oid
2019-03-07Merge branch 'tg/checkout-no-overlay'Junio C Hamano1-1/+7
"git checkout --no-overlay" can be used to trigger a new mode of checking out paths out of the tree-ish, that allows paths that match the pathspec that are in the current index and working tree and are not in the tree-ish. * tg/checkout-no-overlay: revert "checkout: introduce checkout.overlayMode config" checkout: introduce checkout.overlayMode config checkout: introduce --{,no-}overlay option checkout: factor out mark_cache_entry_for_checkout function checkout: clarify comment read-cache: add invalidate parameter to remove_marked_cache_entries entry: support CE_WT_REMOVE flag in checkout_entry entry: factor out unlink_entry function move worktree tests to t24*
2019-02-22trace2:data: add trace2 instrumentation to index read/writeJeff Hostetler1-1/+50
Add trace2 events to measure reading and writing the index. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-15read-cache: add post-index-change hookBen Peart1-2/+12
Add a post-index-change hook that is invoked after the index is written in do_write_locked_index(). This hook is meant primarily for notification, and cannot affect the outcome of git commands that trigger the index write. The hook is passed a flag to indicate whether the working directory was updated or not and a flag indicating if a skip-worktree bit could have changed. These flags enable the hook to optimize its response to the index change notification. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-13read-cache.c: fix writing "link" index ext with null base oidNguyễn Thái Ngọc Duy1-2/+3
Since commit 7db118303a (unpack_trees: fix breakage when o->src_index != o->dst_index - 2018-04-23) and changes in merge code to use separate index_state for source and destination, when doing a merge with split index activated, we may run into this line in unpack_trees(): o->result.split_index = init_split_index(&o->result); This is by itself not wrong. But this split index information is not fully populated (and it's only so when move_cache_to_base_index() is called, aka force splitting the index, or loading index_state from a file). Both "base_oid" and "base" in this case remain null. So when writing the main index down, we link to this index with null oid (default value after init_split_index()), which also means "no split index" internally. This triggers an incorrect base index refresh: warning: could not freshen shared index '.../sharedindex.0{40}' This patch makes sure we will not refresh null base_oid (because the file is never there). It also makes sure not to write "link" extension with null base_oid in the first place (no point having it at all). Read code already has protection against null base_oid. There is also another side fix in remove_split_index() that causes a crash when doing "git update-index --no-split-index" when base_oid in the index file is null. In this case we will not load istate->split_index->base but we dereference it anyway and are rewarded with a segfault. This should not happen anymore, but it's still wrong to dereference a potential NULL pointer, especially when we do check for NULL pointer in the next code. Reported-by: Luke Diamand <luke@diamand.org> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-06Merge branch 'jk/add-ignore-errors-bit-assignment-fix'Junio C Hamano1-1/+1
A hotfix to an incomplete fix made earlier. * jk/add-ignore-errors-bit-assignment-fix: add_to_index(): convert forgotten HASH_RENORMALIZE check
2019-02-06Merge branch 'nd/the-index-final'Junio C Hamano1-27/+17
The assumption to work on the single "in-core index" instance has been reduced from the library-ish part of the codebase. * nd/the-index-final: cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch read-cache.c: remove the_* from index_has_changes() merge-recursive.c: remove implicit dependency on the_repository merge-recursive.c: remove implicit dependency on the_index sha1-name.c: remove implicit dependency on the_index read-cache.c: replace update_index_if_able with repo_& read-cache.c: kill read_index() checkout: avoid the_index when possible repository.c: replace hold_locked_index() with repo_hold_locked_index() notes-utils.c: remove the_repository references grep: use grep_opt->repo instead of explict repo argument
2019-02-06add_to_index(): convert forgotten HASH_RENORMALIZE checkJeff King1-1/+1
Commit 9e5da3d055 (add: use separate ADD_CACHE_RENORMALIZE flag, 2019-01-17) switched out using HASH_RENORMALIZE in our flags field for a new ADD_CACHE_RENORMALIZE flag. However, it forgot to convert one of the checks for HASH_RENORMALIZE into the new flag, which totally broke "git add --renormalize". To make matters even more confusing, the resulting code would racily pass the tests! The forgotten check was responsible for defeating the up-to-date check of the index entry. That meant that "git add --renormalize" would refuse to renormalize things that appeared stat-clean. But most of the time the test commands run fast enough that the file mtime is the same as the index mtime. And thus we err on the conservative side and re-hash the file, which is what "--renormalize" would have wanted. But if you're unlucky and cross that one-second boundary between writing the file and writing the index (which is more likely to happen on a slow or heavily-loaded system), then the file appears stat-clean. And "--renormalize" would effectively do nothing. The fix is straightforward: convert this check to use the right flag. Noticed-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-05Merge branch 'jk/add-ignore-errors-bit-assignment-fix'Junio C Hamano1-4/+4
"git add --ignore-errors" did not work as advertised and instead worked as an unintended synonym for "git add --renormalize", which has been fixed. * jk/add-ignore-errors-bit-assignment-fix: add: use separate ADD_CACHE_RENORMALIZE flag
2019-01-24cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switchNguyễn Thái Ngọc Duy1-2/+0
By default, index compat macros are off from now on, because they could hide the_index dependency. Only those in builtin can use it. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17add: use separate ADD_CACHE_RENORMALIZE flagJeff King1-4/+4
Commit 9472935d81 (add: introduce "--renormalize", 2017-11-16) taught git-add to pass HASH_RENORMALIZE to add_to_index(), which then passes the flag along to index_path(). However, the flags taken by add_to_index() and the ones taken by index_path() are distinct namespaces. We cannot take HASH_* flags in add_to_index(), because they overlap with the ADD_CACHE_* flags we already take (in this case, HASH_RENORMALIZE conflicts with ADD_CACHE_IGNORE_ERRORS). We can solve this by adding a new ADD_CACHE_RENORMALIZE flag, and using it to set HASH_RENORMALIZE within add_to_index(). In order to make it clear that these two flags come from distinct sets, let's also change the name "newflags" in the function to "hash_flags". Reported-by: Dmitriy Smirnov <dmitriy.smirnov@jetbrains.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14Merge branch 'nd/indentation-fix'Junio C Hamano1-59/+59
Code cleanup. * nd/indentation-fix: Indent code with TABs
2019-01-14read-cache.c: remove the_* from index_has_changes()Nguyễn Thái Ngọc Duy1-7/+5
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14read-cache.c: replace update_index_if_able with repo_&Nguyễn Thái Ngọc Duy1-6/+8
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14read-cache.c: kill read_index()Nguyễn Thái Ngọc Duy1-7/+4
read_index() shares the same problem as hold_locked_index(): it assumes $GIT_DIR/index. Move all call sites to repo_read_index() instead. read_index_preload() and read_index_unmerged() are also killed as a consequence. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14repository.c: replace hold_locked_index() with repo_hold_locked_index()Nguyễn Thái Ngọc Duy1-5/+0
hold_locked_index() assumes the index path at $GIT_DIR/index. This is not good for places that take an arbitrary index_state instead of the_index, which is basically everywhere except builtin/. Replace it with repo_hold_locked_index(). hold_locked_index() remains as a wrapper around repo_hold_locked_index() to reduce changes in builtin/ Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-04Merge branch 'nd/the-index'Junio C Hamano1-1/+1
More codepaths become aware of working with in-core repository instance other than the default "the_repository". * nd/the-index: (22 commits) rebase-interactive.c: remove the_repository references rerere.c: remove the_repository references pack-*.c: remove the_repository references pack-check.c: remove the_repository references notes-cache.c: remove the_repository references line-log.c: remove the_repository reference diff-lib.c: remove the_repository references delta-islands.c: remove the_repository references cache-tree.c: remove the_repository references bundle.c: remove the_repository references branch.c: remove the_repository reference bisect.c: remove the_repository reference blame.c: remove implicit dependency the_repository sequencer.c: remove implicit dependency on the_repository sequencer.c: remove implicit dependency on the_index transport.c: remove implicit dependency on the_index notes-merge.c: remove implicit dependency the_repository notes-merge.c: remove implicit dependency on the_index list-objects.c: reduce the_repository references list-objects-filter.c: remove implicit dependency on the_index ...
2019-01-04Merge branch 'nd/i18n'Junio C Hamano1-36/+37
More _("i18n") markings. * nd/i18n: fsck: mark strings for translation fsck: reduce word legos to help i18n parse-options.c: mark more strings for translation parse-options.c: turn some die() to BUG() parse-options: replace opterror() with optname() repack: mark more strings for translation remote.c: mark messages for translation remote.c: turn some error() or die() to BUG() reflog: mark strings for translation read-cache.c: add missing colon separators read-cache.c: mark more strings for translation read-cache.c: turn die("internal error") to BUG() attr.c: mark more string for translation archive.c: mark more strings for translation alias.c: mark split_cmdline_strerror() strings for translation git.c: mark more strings for translation
2019-01-02read-cache: add invalidate parameter to remove_marked_cache_entriesThomas Gummerer1-1/+7
When marking cache entries for removal, and later removing them all at once using 'remove_marked_cache_entries()', cache entries currently have to be invalidated manually in the cache tree and in the untracked cache. Add an invalidate flag to the function. With the flag set, the function will take care of invalidating the path in the cache tree and in the untracked cache. Note that the current callsites already do the invalidation properly in other places, so we're just passing 0 from there to keep the status quo. This will be useful in a subsequent commit. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-09Indent code with TABsNguyễn Thái Ngọc Duy1-59/+59
We indent with TABs and sometimes for fine alignment, TABs followed by spaces, but never all spaces (unless the indentation is less than 8 columns). Indenting with spaces slips through in some places. Fix them. Imported code and compat/ are left alone on purpose. The former should remain as close as upstream as possible. The latter pretty much has separate maintainers, it's up to them to decide. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-21Merge branch 'cc/shared-index-permbits'Junio C Hamano1-1/+2
The way .git/index and .git/sharedindex* files were initially created gave these files different perm bits until they were adjusted for shared repository settings. This was made consistent. * cc/shared-index-permbits: read-cache: make the split index obey umask settings
2018-11-21index: make index.threads=true enable ieot and eoieJonathan Nieder1-6/+17
If a user explicitly sets [index] threads = true to read the index using multiple threads, ensure that index writes include the offset table by default to make that possible. This ensures that the user's intent of turning on threading is respected. In other words, permit the following configurations: - index.threads and index.recordOffsetTable unspecified: do not write the offset table yet (to avoid alarming the user with "ignoring IEOT extension" messages when an older version of Git accesses the repository) but do make use of multiple threads to read the index if the supporting offset table is present. This can also be requested explicitly by setting index.threads=true, 0, or >1 and index.recordOffsetTable=false. - index.threads=false or 1: do not write the offset table, and do not make use of the offset table. One can set index.recordOffsetTable=false as well, to be more explicit. - index.threads=true, 0, or >1 and index.recordOffsetTable unspecified: write the offset table and make use of threads at read time. This can also be requested by setting index.threads=true, 0, >1, or unspecified and index.recordOffsetTable=true. Fortunately the complication is temporary: once most Git installations have upgraded to a version with support for the IEOT and EOIE extensions, we can flip the defaults for index.recordEndOfIndexEntries and index.recordOffsetTable to true and eliminate the settings. Helped-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-21ieot: default to not writing IEOT sectionJonathan Nieder1-1/+10
As with EOIE, popular versions of Git do not support the new IEOT extension yet. When accessing a Git repository written by a more modern version of Git, they correctly ignore the unrecognized section, but in the process they loudly warn ignoring IEOT extension resulting in confusion for users. Introduce the index extension more gently by not writing it yet in this first version with support for it. Soon, once sufficiently many users are running a modern version of Git, we can flip the default so users benefit from this index extension by default. Introduce a '[index] recordOffsetTable' configuration variable to control whether the new index extension is written. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-21eoie: default to not writing EOIE sectionJonathan Nieder1-1/+10
Since 3b1d9e04 (eoie: add End of Index Entry (EOIE) extension, 2018-10-10) Git defaults to writing the new EOIE section when writing out an index file. Usually that is a good thing because it improves threaded performance, but when a Git repository is shared with older versions of Git, it produces a confusing warning: $ git status ignoring EOIE extension HEAD detached at 371ed0defa nothing to commit, working tree clean Let's introduce the new index extension more gently. First we'll roll out the new version of Git that understands it, and then once sufficiently many users are using such a version, we can flip the default to writing it by default. Introduce a '[index] recordEndOfIndexEntries' configuration variable to allow interested users to benefit from this index extension early. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-19read-cache: make the split index obey umask settingsÆvar Arnfjörð Bjarmason1-1/+2
Make the split index write out its .git/sharedindex_* files with the same permissions as .git/index. This only changes the behavior when core.sharedRepository isn't set, i.e. the user's umask settings will be respected. This hasn't been the case ever since the split index was originally implemented in c18b80a0e8 ("update-index: new options to enable/disable split index mode", 2014-06-13). A mkstemp()-like function has always been used to create it. First mkstemp() itself, and then later our own mkstemp()-like in f6ecc62dbf ("write_shared_index(): use tempfile module", 2015-08-10) A related bug was fixed in df801f3f9f ("read-cache: use shared perms when writing shared index", 2017-06-25). Since then the split index has respected core.sharedRepository. However, using that setting should not be required simply to make git obey the user's umask setting. It's intended for the use-case of overriding whatever that umask is set to. This fixes cases where the user has e.g. set his umask to 022 on a shared server in anticipation of other user's needing to run "status", "log" etc. in his repository. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-18Merge branch 'nd/pthreads'Junio C Hamano1-27/+10
The codebase has been cleaned up to reduce "#ifndef NO_PTHREADS". * nd/pthreads: Clean up pthread_create() error handling read-cache.c: initialize copy_len to shut up gcc 8 read-cache.c: reduce branching based on HAVE_THREADS read-cache.c: remove #ifdef NO_PTHREADS pack-objects: remove #ifdef NO_PTHREADS preload-index.c: remove #ifdef NO_PTHREADS grep: clean up num_threads handling grep: remove #ifdef NO_PTHREADS attr.c: remove #ifdef NO_PTHREADS name-hash.c: remove #ifdef NO_PTHREADS index-pack: remove #ifdef NO_PTHREADS send-pack.c: move async's #ifdef NO_PTHREADS back to run-command.c run-command.h: include thread-utils.h instead of pthread.h thread-utils: macros to unconditionally compile pthreads API
2018-11-13Merge branch 'bp/refresh-index-using-preload'Junio C Hamano1-0/+6
The helper function to refresh the cached stat information in the in-core index has learned to perform the lstat() part of the operation in parallel on multi-core platforms. * bp/refresh-index-using-preload: refresh_index: remove unnecessary calls to preload_index() speed up refresh_index() by utilizing preload_index()
2018-11-12cache-tree.c: remove the_repository referencesNguyễn Thái Ngọc Duy1-1/+1
This case is more interesting than other boring "remove the_repo" commits because while we need access to the object database, we cannot simply use r->index because unpack-trees.c can operate on a temporary index, not $GIT_DIR/index. Ideally we should be able to pass an object database to lookup_tree() but that ship has sailed. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12read-cache.c: add missing colon separatorsNguyễn Thái Ngọc Duy1-5/+5
typechange_fmt and added_fmt should have a colon before "needs update". Align the statements to make it easier to read and see. Also drop the unnecessary (). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12read-cache.c: mark more strings for translationNguyễn Thái Ngọc Duy1-28/+29
There are a couple other improvements on these strings as well: - add missing colon (as separator) - quote paths - provide more information on error messages - keep first word in lowercase Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12read-cache.c: turn die("internal error") to BUG()Nguyễn Thái Ngọc Duy1-3/+3
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-06Merge branch 'nd/unpack-trees-with-cache-tree'Junio C Hamano1-1/+1
Trivial bugfix. * nd/unpack-trees-with-cache-tree: read-cache: use of memory after it is freed
2018-11-05read-cache.c: initialize copy_len to shut up gcc 8Nguyễn Thái Ngọc Duy1-3/+1
It was reported that when building with NO_PTHREADS=1, -Wmaybe-uninitialized is triggered. Just initialize the variable from the beginning to shut the compiler up (because this warning is enabled in config.dev) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-05read-cache.c: reduce branching based on HAVE_THREADSNguyễn Thái Ngọc Duy1-10/+9
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-05read-cache.c: remove #ifdef NO_PTHREADSNguyễn Thái Ngọc Duy1-24/+10
This is a faithful conversion with no attempt to clean up whatsoever. Code indentation is left broken. There will be another commit to clean it up and un-indent if we just indent now. It's just more code noise. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-30speed up refresh_index() by utilizing preload_index()Ben Peart1-0/+6
Speed up refresh_index() by utilizing preload_index() to do most of the work spread across multiple threads. This works because most cache entries will get marked CE_UPTODATE so that refresh_cache_ent() can bail out early when called from within refresh_index(). On a Windows repo with ~200K files, this drops refresh times from 6.64 seconds to 2.87 seconds for a savings of 57%. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-26Merge branch 'sg/split-index-racefix'Junio C Hamano1-1/+1
The codepath to support the experimental split-index mode had remaining "racily clean" issues fixed. * sg/split-index-racefix: split-index: BUG() when cache entry refers to non-existing shared entry split-index: smudge and add racily clean cache entries to split index split-index: don't compare cached data of entries already marked for split index split-index: count the number of deleted entries t1700-split-index: date back files to avoid racy situations split-index: add tests to demonstrate the racy split index problem t1700-split-index: document why FSMONITOR is disabled in this test script
2018-10-22read-cache: use of memory after it is freedCarlo Marcelo Arenas Belón1-1/+1
introduced with c46c406ae1e (trace.h: support nested performance tracing) on Aug 18, 2018 but not affecting maint Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Acked-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-19Merge branch 'nd/status-refresh-progress'Junio C Hamano1-0/+12
"git status" learns to show progress bar when refreshing the index takes a long time. * nd/status-refresh-progress: status: show progress bar if refreshing the index takes too long
2018-10-19Merge branch 'bp/read-cache-parallel'Junio C Hamano1-116/+658
A new extension to the index file has been introduced, which allows the file to be read in parallel. * bp/read-cache-parallel: read-cache: load cache entries on worker threads ieot: add Index Entry Offset Table (IEOT) extension read-cache: load cache extensions on a worker thread config: add new index.threads config setting eoie: add End of Index Entry (EOIE) extension read-cache: clean up casting and byte decoding read-cache.c: optimize reading index format v4
2018-10-19Merge branch 'nd/the-index'Junio C Hamano1-14/+19
Various codepaths in the core-ish part learn to work on an arbitrary in-core index structure, not necessarily the default instance "the_index". * nd/the-index: (23 commits) revision.c: reduce implicit dependency the_repository revision.c: remove implicit dependency on the_index ws.c: remove implicit dependency on the_index tree-diff.c: remove implicit dependency on the_index submodule.c: remove implicit dependency on the_index line-range.c: remove implicit dependency on the_index userdiff.c: remove implicit dependency on the_index rerere.c: remove implicit dependency on the_index sha1-file.c: remove implicit dependency on the_index patch-ids.c: remove implicit dependency on the_index merge.c: remove implicit dependency on the_index merge-blobs.c: remove implicit dependency on the_index ll-merge.c: remove implicit dependency on the_index diff-lib.c: remove implicit dependency on the_index read-cache.c: remove implicit dependency on the_index diff.c: remove implicit dependency on the_index grep.c: remove implicit dependency on the_index diff.c: remove the_index dependency in textconv() functions blame.c: rename "repo" argument to "r" combine-diff.c: remove implicit dependency on the_index ...
2018-10-12split-index: smudge and add racily clean cache entries to split indexSZEDER Gábor1-1/+1
Ever since the split index feature was introduced [1], refreshing a split index is prone to a variant of the classic racy git problem. Consider the following sequence of commands updating the split index when the shared index contains a racily clean cache entry, i.e. an entry whose cached stat data matches with the corresponding file in the worktree and the cached mtime matches that of the index: echo "cached content" >file git update-index --split-index --add file echo "dirty worktree" >file # size stays the same! # ... wait ... git update-index --add other-file Normally, when a non-split index is updated, then do_write_index() (the function responsible for writing all kinds of indexes, "regular", split, and shared) recognizes racily clean cache entries, and writes them with smudged stat data, i.e. with file size set to 0. When subsequent git commands read the index, they will notice that the smudged stat data doesn't match with the file in the worktree, and then go on to check the file's content and notice its dirtiness. In the above example, however, in the second 'git update-index' prepare_to_write_split_index() decides which cache entries stored only in the shared index should be replaced in the new split index. Alas, this function never looks out for racily clean cache entries, and since the file's stat data in the worktree hasn't changed since the shared index was written, it won't be replaced in the new split index. Consequently, do_write_index() doesn't even get this racily clean cache entry, and can't smudge its stat data. Subsequent git commands will then see that the index has more recent mtime than the file and that the (not smudged) cached stat data still matches with the file in the worktree, and, ultimately, will erroneously consider the file clean. Modify prepare_to_write_split_index() to recognize racily clean cache entries, and mark them to be added to the split index. Note that there are two places where it should check raciness: first those cache entries that are only stored in the shared index, and then those that have been copied by unpack_trees() from the shared index while it constructed a new index. This way do_write_index() will get these racily clean cache entries as well, and will then write them with smudged stat data to the new split index. This change makes all tests in 't1701-racy-split-index.sh' pass, so flip the two 'test_expect_failure' tests to success. Also add the '#' (as in nr. of trial) to those tests' description that were omitted when the tests expected failure. Note that after this change if the index is split when it contains a racily clean cache entry, then a smudged cache entry will be written both to the new shared and to the new split indexes. This doesn't affect regular git commands: as far as they are concerned this is just an entry in the split index replacing an outdated entry in the shared index. It did affect a few tests in 't1700-split-index.sh', though, because they actually check which entries are stored in the split index; a previous patch in this series has already made the necessary adjustments in 't1700'. And racily clean cache entries and index splitting are rare enough to not worry about the resulting duplicated smudged cache entries, and the additional complexity required to prevent them is not worth it. Several tests failed occasionally when the test suite was run with 'GIT_TEST_SPLIT_INDEX=yes'. Here are those that I managed to trace back to this racy split index problem, starting with those failing more frequently, with a link to a failing Travis CI build job for each. The highlighted line [2] shows when the racy file was written, which is not always in the failing test but in a preceeding setup test. t3903-stash.sh: https://travis-ci.org/git/git/jobs/385542084#L5858 t4024-diff-optimize-common.sh: https://travis-ci.org/git/git/jobs/386531969#L3174 t4015-diff-whitespace.sh: https://travis-ci.org/git/git/jobs/360797600#L8215 t2200-add-update.sh: https://travis-ci.org/git/git/jobs/382543426#L3051 t0090-cache-tree.sh: https://travis-ci.org/git/git/jobs/416583010#L3679 There might be others, e.g. perhaps 't1000-read-tree-m-3way.sh' and others using 'lib-read-tree-m-3way.sh', but I couldn't confirm yet. [1] In the branch leading to the merge commit v2.1.0-rc0~45 (Merge branch 'nd/split-index', 2014-07-16). [2] Note that those highlighted lines are in the 'after failure' fold, and your browser might unhelpfully fold it up before you could take a good look. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-11read-cache: load cache entries on worker threadsBen Peart1-37/+193
This patch helps address the CPU cost of loading the index by utilizing the Index Entry Offset Table (IEOT) to divide loading and conversion of the cache entries across multiple threads in parallel. I used p0002-read-cache.sh to generate some performance data: Test w/100,000 files reduced the time by 32.24% Test w/1,000,000 files reduced the time by -4.77% Note that on the 1,000,000 files case, multi-threading the cache entry parsing does not yield a performance win. This is because the cost to parse the index extensions in this repo, far outweigh the cost of loading the cache entries. The high cost of parsing the index extensions is driven by the cache tree and the untracked cache extensions. As this is currently the longest pole, any reduction in this time will reduce the overall index load times so is worth further investigation in another patch series. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-11ieot: add Index Entry Offset Table (IEOT) extensionBen Peart1-3/+193
This patch enables addressing the CPU cost of loading the index by adding additional data to the index that will allow us to efficiently multi- thread the loading and conversion of cache entries. It accomplishes this by adding an (optional) index extension that is a table of offsets to blocks of cache entries in the index file. To make this work for V4 indexes, when writing the cache entries, it periodically "resets" the prefix-compression by encoding the current entry as if the path name for the previous entry is completely different and saves the offset of that entry in the IEOT. Basically, with V4 indexes, it generates offsets into blocks of prefix-compressed entries. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-11read-cache: load cache extensions on a worker threadBen Peart1-16/+79
This patch helps address the CPU cost of loading the index by loading the cache extensions on a worker thread in parallel with loading the cache entries. In some cases, loading the extensions takes longer than loading the cache entries so this patch utilizes the new EOIE to start the thread to load the extensions before loading all the cache entries in parallel. This is possible because the current extensions don't access the cache entries in the index_state structure so are OK that they don't all exist yet. The CACHE_EXT_TREE, CACHE_EXT_RESOLVE_UNDO, and CACHE_EXT_UNTRACKED extensions don't even get a pointer to the index so don't have access to the cache entries. CACHE_EXT_LINK only uses the index_state to initialize the split index. CACHE_EXT_FSMONITOR only uses the index_state to save the fsmonitor last update and dirty flags. I used p0002-read-cache.sh to generate some performance data: Test w/100,000 files reduced the time by 0.53% Test w/1,000,000 files reduced the time by 27.78% Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-11eoie: add End of Index Entry (EOIE) extensionBen Peart1-8/+150
The End of Index Entry (EOIE) is used to locate the end of the variable length index entries and the beginning of the extensions. Code can take advantage of this to quickly locate the index extensions without having to parse through all of the index entries. The EOIE extension is always written out to the index file including to the shared index when using the split index feature. Because it is always written out, the SHA checksums in t/t1700-split-index.sh were updated to reflect its inclusion. It is written as an optional extension to ensure compatibility with other git implementations that do not yet support it. It is always written out to ensure it is available as often as possible to speed up index operations. Because it must be able to be loaded before the variable length cache entries and other index extensions, this extension must be written last. The signature for this extension is { 'E', 'O', 'I', 'E' }. The extension consists of: - 32-bit offset to the end of the index entries - 160-bit SHA-1 over the extension types and their sizes (but not their contents). E.g. if we have "TREE" extension that is N-bytes long, "REUC" extension that is M-bytes long, followed by "EOIE", then the hash would be: SHA-1("TREE" + <binary representation of N> + "REUC" + <binary representation of M>) Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-11read-cache: clean up casting and byte decodingBen Peart1-12/+11
This patch does a clean up pass to minimize the casting required to work with the memory mapped index (mmap). It also makes the decoding of network byte order more consistent by using get_be32() where possible. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-26read-cache.c: optimize reading index format v4Nguyễn Thái Ngọc Duy1-68/+60
Index format v4 requires some more computation to assemble a path based on a previous one. The current code is not very efficient because - it doubles memory copy, we assemble the final path in a temporary first before putting it back to a cache_entry - strbuf_remove() in expand_name_field() is not exactly a good fit for stripping a part at the end, _setlen() would do the same job and is much cheaper. - the open-coded loop to find the end of the string in expand_name_field() can't beat an optimized strlen() This patch avoids the temporary buffer and writes directly to the new cache_entry, which addresses the first two points. The last point could also be avoided if the total string length fits in the first 12 bits of ce_flags, if not we fall back to strlen(). Running "test-tool read-cache 100" on webkit.git (275k files), reading v2 only takes 4.226 seconds, while v4 takes 5.711 seconds, 35% more time. The patch reduces read time on v4 to 4.319 seconds. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21sha1-file.c: remove implicit dependency on the_indexNguyễn Thái Ngọc Duy1-10/+15
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21read-cache.c: remove implicit dependency on the_indexNguyễn Thái Ngọc Duy1-2/+2
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21diff.c: remove implicit dependency on the_indexNguyễn Thái Ngọc Duy1-1/+1
A new variant repo_diff_setup() is added that takes 'struct repository *' and diff_setup() becomes a thin macro around it that is protected by NO_THE_REPOSITORY_COMPATIBILITY_MACROS, similar to NO_THE_INDEX_.... The plan is these macros will always be defined for all library files and the macros are only accessible in builtin/ Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21read-cache.c: remove 'const' from index_has_changes()Nguyễn Thái Ngọc Duy1-1/+1
This function calls do_diff_cache() which eventually needs to set this "istate" to unpack_options->src_index [1]. This is an unfortunate fact that unpack_trees() _will_ update [2] src_index so we can't really pass a const index_state there. Just remove 'const'. [1] Right now diff_cache() in diff-lib.c assigns the_index to src_index. But the plan is to get rid of the_index, so it should be 'istate' from here that gets assigned to src_index. [2] Some transient bits in the source index are touched. Optional extensions can also be removed. But other than that the source tree should still be valid. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17Merge branch 'jk/cocci'Junio C Hamano1-6/+6
spatch transformation to replace boolean uses of !hashcmp() to newly introduced oideq() is added, and applied, to regain performance lost due to support of multiple hash algorithms. * jk/cocci: show_dirstat: simplify same-content check read-cache: use oideq() in ce_compare functions convert hashmap comparison functions to oideq() convert "hashcmp() != 0" to "!hasheq()" convert "oidcmp() != 0" to "!oideq()" convert "hashcmp() == 0" to hasheq() convert "oidcmp() == 0" to oideq() introduce hasheq() and oideq() coccinelle: use <...> for function exclusion
2018-09-17Merge branch 'nd/unpack-trees-with-cache-tree'Junio C Hamano1-5/+11
The unpack_trees() API used in checking out a branch and merging walks one or more trees along with the index. When the cache-tree in the index tells us that we are walking a tree whose flattened contents is known (i.e. matches a span in the index), as linearly scanning a span in the index is much more efficient than having to open tree objects recursively and listing their entries, the walk can be optimized, which is done in this topic. * nd/unpack-trees-with-cache-tree: Document update for nd/unpack-trees-with-cache-tree cache-tree: verify valid cache-tree in the test suite unpack-trees: add missing cache invalidation unpack-trees: reuse (still valid) cache-tree from src_index unpack-trees: reduce malloc in cache-tree walk unpack-trees: optimize walking same trees with cache-tree unpack-trees: add performance tracing trace.h: support nested performance tracing
2018-09-17status: show progress bar if refreshing the index takes too longNguyễn Thái Ngọc Duy1-0/+12
Refreshing the index is usually very fast, but it can still take a long time sometimes. Cold cache is one. Or copying a repo to a new place (*). It's good to show something to let the user know "git status" is not hanging, it's just busy doing something. (*) In this case, all stat info in the index becomes invalid and git falls back to rehashing all file content to see if there's any difference between updating stat info in the index. This is quite expensive. Even with a repo as small as git.git, it takes 3 seconds. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29read-cache: use oideq() in ce_compare functionsJeff King1-2/+2
These functions return the full oidcmp() value, but the callers really only care whether it is non-zero. We can use the more strict !oideq(), which a compiler may be able to optimize further. This does change the meaning of the return value subtly, but it's unlikely that anybody would try to use them for ordering. They're static-local in this file, and they already return other error values that would confuse an ordering (e.g., open() failure gives -1). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29convert "hashcmp() != 0" to "!hasheq()"Jeff King1-2/+2
This rounds out the previous three patches, covering the inequality logic for the "hash" variant of the functions. As with the previous three, the accompanying code changes are the mechanical result of applying the coccinelle patch; see those patches for more discussion. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29convert "oidcmp() != 0" to "!oideq()"Jeff King1-1/+1
This is the flip side of the previous two patches: checking for a non-zero oidcmp() can be more strictly expressed as inequality. Like those patches, we write "!= 0" in the coccinelle transformation, which covers by isomorphism the more common: if (oidcmp(E1, E2)) As with the previous two patches, this patch can be achieved almost entirely by running "make coccicheck"; the only differences are manual line-wrap fixes to match the original code. There is one thing to note for anybody replicating this, though: coccinelle 1.0.4 seems to miss the case in builtin/tag.c, even though it's basically the same as all the others. Running with 1.0.7 does catch this, so presumably it's just a coccinelle bug that was fixed in the interim. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29convert "oidcmp() == 0" to oideq()Jeff King1-1/+1
Using the more restrictive oideq() should, in the long run, give the compiler more opportunities to optimize these callsites. For now, this conversion should be a complete noop with respect to the generated code. The result is also perhaps a little more readable, as it avoids the "zero is equal" idiom. Since it's so prevalent in C, I think seasoned programmers tend not to even notice it anymore, but it can sometimes make for awkward double negations (e.g., we can drop a few !!oidcmp() instances here). This patch was generated almost entirely by the included coccinelle patch. This mechanical conversion should be completely safe, because we check explicitly for cases where oidcmp() is compared to 0, which is what oideq() is doing under the hood. Note that we don't have to catch "!oidcmp()" separately; coccinelle's standard isomorphisms make sure the two are treated equivalently. I say "almost" because I did hand-edit the coccinelle output to fix up a few style violations (it mostly keeps the original formatting, but sometimes unwraps long lines). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20Merge branch 'nd/no-the-index'Junio C Hamano1-1/+1
The more library-ish parts of the codebase learned to work on the in-core index-state instance that is passed in by their callers, instead of always working on the singleton "the_index" instance. * nd/no-the-index: (24 commits) blame.c: remove implicit dependency on the_index apply.c: remove implicit dependency on the_index apply.c: make init_apply_state() take a struct repository apply.c: pass struct apply_state to more functions resolve-undo.c: use the right index instead of the_index archive-*.c: use the right repository archive.c: avoid access to the_index grep: use the right index instead of the_index attr: remove index from git_attr_set_direction() entry.c: use the right index instead of the_index submodule.c: use the right index instead of the_index pathspec.c: use the right index instead of the_index unpack-trees: avoid the_index in verify_absent() unpack-trees: convert clear_ce_flags* to avoid the_index unpack-trees: don't shadow global var the_index unpack-trees: add a note about path invalidation unpack-trees: remove 'extern' on function declaration ls-files: correct index argument to get_convert_attr_ascii() preload-index.c: use the right index instead of the_index dir.c: remove an implicit dependency on the_index in pathspec code ...
2018-08-18cache-tree: verify valid cache-tree in the test suiteNguyễn Thái Ngọc Duy1-0/+3
This makes sure that cache-tree is consistent with the index. The main purpose is to catch potential problems by saving the index in unpack_trees() but the line in write_index() would also help spot missing invalidation in other code. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-18unpack-trees: reuse (still valid) cache-tree from src_indexNguyễn Thái Ngọc Duy1-0/+2
We do n-way merge by walking the source index and n trees at the same time and add merge results to a new temporary index called o->result. The merge result for any given path could be either - keep_entry(): same old index entry in o->src_index is reused - merged_entry(): either a new entry is added, or an existing one updated - deleted_entry(): one entry from o->src_index is removed For some reason [1] we keep making sure that the source index's cache-tree is still valid if used by o->result: for all those merged/deleted entries, we invalidate the same path in o->src_index, so only cache-trees covering the "keep_entry" parts remain good. Because of this, the cache-tree from o->src_index can be perfectly reused in o->result. And in fact we already rely on this logic to reuse untracked cache in edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08). Move the cache-tree to o->result before doing cache_tree_update() to reduce hashing cost. Since cache_tree_update() has risen up as one of the most expensive parts in unpack_trees() after the last few patches. This does help reduce unpack_trees() time significantly (on webkit.git): before after -------------------------------------------------------------------- 0.080394752 0.051258167 s: read cache .git/index 0.216010838 0.212106298 s: preload index 0.008534301 0.280521764 s: refresh index 0.251992198 0.218160442 s: traverse_trees 0.377031383 0.374948191 s: check_updates 0.372768105 0.037040114 s: cache_tree_update 1.045887251 0.672031609 s: unpack_trees 0.314983512 0.317456290 s: write index, changed mask = 2e 0.062572653 0.038382654 s: traverse_trees 0.000022544 0.000042731 s: check_updates 0.073795585 0.050930053 s: unpack_trees 0.073807557 0.051099735 s: diff-index 1.938191592 1.614241153 s: git command: git checkout - [1] I'm pretty sure the reason is an oversight in 34110cd4e3 (Make 'unpack_trees()' have a separate source and destination index - 2008-03-06). That patch aims to _not_ update the source index at all. The invalidation should have been done on o->result in that patch. But then there was no cache-tree on o->result even then so it's pointless to do so. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-18trace.h: support nested performance tracingNguyễn Thái Ngọc Duy1-5/+6
Performance measurements are listed right now as a flat list, which is fine when we measure big blocks. But when we start adding more and more measurements, some of them could be just part of a bigger measurement and a flat list gives a wrong impression that they are executed at the same level instead of nested. Add trace_performance_enter() and trace_performance_leave() to allow indent these nested measurements. For now it does not help much because the only nested thing is (lazy) name hash initialization (e.g. called in diff-index from "git status"). This will help more because I'm going to add some more tracing that's actually nested. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-17Merge branch 'en/abort-df-conflict-fixes'Junio C Hamano1-5/+8
"git merge --abort" etc. did not clean things up properly when there were conflicted entries in the index in certain order that are involved in D/F conflicts. This has been corrected. * en/abort-df-conflict-fixes: read-cache: fix directory/file conflict handling in read_index_unmerged() t1015: demonstrate directory/file conflict recovery failures
2018-08-13dir.c: remove an implicit dependency on the_index in pathspec codeNguyễn Thái Ngọc Duy1-1/+1
Make the match_patchspec API and friends take an index_state instead of assuming the_index in dir.c. All external call sites are converted blindly to keep the patch simple and retain current behavior. Individual call sites may receive further updates to use the right index instead of the_index. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-02Merge branch 'en/dirty-merge-fixes'Junio C Hamano1-0/+40
The recursive merge strategy did not properly ensure there was no change between HEAD and the index before performing its operation, which has been corrected. * en/dirty-merge-fixes: merge: fix misleading pre-merge check documentation merge-recursive: enforce rule that index matches head before merging t6044: add more testcases with staged changes before a merge is invoked merge-recursive: fix assumption that head tree being merged is HEAD merge-recursive: make sure when we say we abort that we actually abort t6044: add a testcase for index matching head, when head doesn't match HEAD t6044: verify that merges expected to abort actually abort index_has_changes(): avoid assuming operating on the_index read-cache.c: move index_has_changes() from merge.c
2018-08-02Merge branch 'jm/cache-entry-from-mem-pool'Junio C Hamano1-46/+218
For a large tree, the index needs to hold many cache entries allocated on heap. These cache entries are now allocated out of a dedicated memory pool to amortize malloc(3) overhead. * jm/cache-entry-from-mem-pool: block alloc: add validations around cache_entry lifecyle block alloc: allocate cache entries from mem_pool mem-pool: fill out functionality mem-pool: add life cycle management functions mem-pool: only search head block for available space block alloc: add lifecycle APIs for cache_entry structs read-cache: teach make_cache_entry to take object_id read-cache: teach refresh_cache_entry to take istate
2018-07-31read-cache: fix directory/file conflict handling in read_index_unmerged()Elijah Newren1-5/+8
read_index_unmerged() has two intended purposes: * return 1 if there are any unmerged entries, 0 otherwise * drops any higher-stage entries down to stage #0 There are several callers of read_index_unmerged() that check the return value to see if it is non-zero, all of which then die() if that condition is met. For these callers, dropping higher-stage entries down to stage #0 is a waste of resources, and returning immediately on first unmerged entry would be better. But it's probably only a very minor difference and isn't the focus of this series. The remaining callers ignore the return value and call this function for the side effect of dropping higher-stage entries down to stage #0. As mentioned in commit e11d7b596970 ("'reset --merge': fix unmerged case", 2009-12-31), The _only_ reason we want to keep a previously unmerged entry in the index at stage #0 is so that we don't forget the fact that we have corresponding file in the work tree in order to be able to remove it when the tree we are resetting to does not have the path. In fact, prior to commit d1a43f2aa4bf ("reset --hard/read-tree --reset -u: remove unmerged new paths", 2008-10-15), read_index_unmerged() did just remove unmerged entries from the cache immediately but that had the unwanted effect of leaving around new untracked files in the tree from aborted merges. So, that's the intended purpose of this function. The problem is that when directory/files conflicts are present, trying to add the file to the index at stage 0 fails (because there is still a directory in the way), and the function returns early with a -1 return code to signify the error. As noted above, none of the callers who want the drop-to-stage-0 behavior check the return status, though, so this means all remaining unmerged entries remain in the index and the callers proceed assuming otherwise. Users then see errors of the form: error: 'DIR-OR-FILE' appears as both a file and as a directory error: DIR-OR-FILE: cannot drop to stage #0 and potentially also messages about other unmerged entries which came lexicographically later than whatever pathname was both a file and a directory. Google finds a few hits searching for those messages, suggesting there were probably a couple people who hit this besides me. Luckily, calling `git reset --hard` multiple times would workaround this bug. Since the whole purpose here is to just put the entry *temporarily* into the index so that any associated file in the working copy can be removed, we can just skip the DFCHECK and allow both the file and directory to appear in the index. The temporary simultaneous appearance of the directory and file entries in the index will be removed by the callers by calling unpack_trees(), which excludes these unmerged entries marked with CE_CONFLICTED flag from the resulting index, before they attempt to write the index anywhere. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-18Merge branch 'sb/object-store-grafts'Junio C Hamano1-0/+1
The conversion to pass "the_repository" and then "a_repository" throughout the object access API continues. * sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h
2018-07-11merge-recursive: fix assumption that head tree being merged is HEADElijah Newren1-4/+8
`git merge-recursive` does a three-way merge between user-specified trees base, head, and remote. Since the user is allowed to specify head, we can not necesarily assume that head == HEAD. Modify index_has_changes() to take an extra argument specifying the tree to compare against. If NULL, it will compare to HEAD. We then use this from merge-recursive to make sure we compare to the user-specified head. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03index_has_changes(): avoid assuming operating on the_indexElijah Newren1-4/+7
Modify index_has_changes() to take a struct istate* instead of just operating on the_index. This is only a partial conversion, though, because we call do_diff_cache() which implicitly assumes work is to be done on the_index. Ongoing work is being done elsewhere to do the remainder of the conversion, and thus is not duplicated here. Instead, a simple check is put in place until that work is complete. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03read-cache.c: move index_has_changes() from merge.cElijah Newren1-0/+33
Since index_has_change() is an index-related function, move it to read-cache.c, only modifying it to avoid uses of the active_cache and active_nr macros. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03block alloc: add validations around cache_entry lifecyleJameson Miller1-2/+53
Add an option (controlled by an environment variable) perform extra validations on mem_pool allocated cache entries. When set: 1) Invalidate cache_entry memory when discarding cache_entry. 2) When discarding index_state struct, verify that all cache_entries were allocated from expected mem_pool. 3) When discarding mem_pools, invalidate mem_pool memory. This should provide extra checks that mem_pools and their allocated cache_entries are being used as expected. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03block alloc: allocate cache entries from mem_poolJameson Miller1-19/+100
When reading large indexes from disk, a portion of the time is dominated in malloc() calls. This can be mitigated by allocating a large block of memory and manage it ourselves via memory pools. This change moves the cache entry allocation to be on top of memory pools. Design: The index_state struct will gain a notion of an associated memory_pool from which cache_entries will be allocated from. When reading in the index from disk, we have information on the number of entries and their size, which can guide us in deciding how large our initial memory allocation should be. When an index is discarded, the associated memory_pool will be discarded as well - so the lifetime of a cache_entry is tied to the lifetime of the index_state that it was allocated for. In the case of a Split Index, the following rules are followed. 1st, some terminology is defined: Terminology: - 'the_index': represents the logical view of the index - 'split_index': represents the "base" cache entries. Read from the split index file. 'the_index' can reference a single split_index, as well as cache_entries from the split_index. `the_index` will be discarded before the `split_index` is. This means that when we are allocating cache_entries in the presence of a split index, we need to allocate the entries from the `split_index`'s memory pool. This allows us to follow the pattern that `the_index` can reference cache_entries from the `split_index`, and that the cache_entries will not be freed while they are still being referenced. Managing transient cache_entry structs: Cache entries are usually allocated for an index, but this is not always the case. Cache entries are sometimes allocated because this is the type that the existing checkout_entry function works with. Because of this, the existing code needs to handle cache entries associated with an index / memory pool, and those that only exist transiently. Several strategies were contemplated around how to handle this: Chosen approach: An extra field was added to the cache_entry type to track whether the cache_entry was allocated from a memory pool or not. This is currently an int field, as there are no more available bits in the existing ce_flags bit field. If / when more bits are needed, this new field can be turned into a proper bit field. Alternatives: 1) Do not include any information about how the cache_entry was allocated. Calling code would be responsible for tracking whether the cache_entry needed to be freed or not. Pro: No extra memory overhead to track this state Con: Extra complexity in callers to handle this correctly. The extra complexity and burden to not regress this behavior in the future was more than we wanted. 2) cache_entry would gain knowledge about which mem_pool allocated it Pro: Could (potentially) do extra logic to know when a mem_pool no longer had references to any cache_entry Con: cache_entry would grow heavier by a pointer, instead of int We didn't see a tangible benefit to this approach 3) Do not add any extra information to a cache_entry, but when freeing a cache entry, check if the memory exists in a region managed by existing mem_pools. Pro: No extra memory overhead to track state Con: Extra computation is performed when freeing cache entries We decided tracking and iterating over known memory pool regions was less desirable than adding an extra field to track this stae. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03block alloc: add lifecycle APIs for cache_entry structsJameson Miller1-28/+65
It has been observed that the time spent loading an index with a large number of entries is partly dominated by malloc() calls. This change is in preparation for using memory pools to reduce the number of malloc() calls made to allocate cahce entries when loading an index. Add an API to allocate and discard cache entries, abstracting the details of managing the memory backing the cache entries. This commit does actually change how memory is managed - this will be done in a later commit in the series. This change makes the distinction between cache entries that are associated with an index and cache entries that are not associated with an index. A main use of cache entries is with an index, and we can optimize the memory management around this. We still have other cases where a cache entry is not persisted with an index, and so we need to handle the "transient" use case as well. To keep the congnitive overhead of managing the cache entries, there will only be a single discard function. This means there must be enough information kept with the cache entry so that we know how to discard them. A summary of the main functions in the API is: make_cache_entry: create cache entry for use in an index. Uses specified parameters to populate cache_entry fields. make_empty_cache_entry: Create an empty cache entry for use in an index. Returns cache entry with empty fields. make_transient_cache_entry: create cache entry that is not used in an index. Uses specified parameters to populate cache_entry fields. make_empty_transient_cache_entry: create cache entry that is not used in an index. Returns cache entry with empty fields. discard_cache_entry: A single function that knows how to discard a cache entry regardless of how it was allocated. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03read-cache: teach make_cache_entry to take object_idJameson Miller1-3/+5
Teach make_cache_entry function to take object_id instead of a SHA-1. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03read-cache: teach refresh_cache_entry to take istateJameson Miller1-4/+5
Refactor refresh_cache_entry() to work on a specific index, instead of implicitly using the_index. This is in preparation for making the make_cache_entry function apply to a specific index. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-30Merge branch 'bc/object-id'Junio C Hamano1-17/+17
Conversion from uchar[20] to struct object_id continues. * bc/object-id: (42 commits) merge-one-file: compute empty blob object ID add--interactive: compute the empty tree value Update shell scripts to compute empty tree object ID sha1_file: only expose empty object constants through git_hash_algo dir: use the_hash_algo for empty blob object ID sequencer: use the_hash_algo for empty tree object ID cache-tree: use is_empty_tree_oid sha1_file: convert cached object code to struct object_id builtin/reset: convert use of EMPTY_TREE_SHA1_BIN builtin/receive-pack: convert one use of EMPTY_TREE_SHA1_HEX wt-status: convert two uses of EMPTY_TREE_SHA1_HEX submodule: convert several uses of EMPTY_TREE_SHA1_HEX sequencer: convert one use of EMPTY_TREE_SHA1_HEX merge: convert empty tree constant to the_hash_algo builtin/merge: switch tree functions to use object_id builtin/am: convert uses of EMPTY_TREE_SHA1_BIN to the_hash_algo sha1-file: add functions for hex empty tree and blob OIDs builtin/receive-pack: avoid hard-coded constants for push certs diff: specify abbreviation size in terms of the_hash_algo upload-pack: replace use of several hard-coded constants ...
2018-05-29Sync with Git 2.17.1Junio C Hamano1-13/+38
* maint: (25 commits) Git 2.17.1 Git 2.16.4 Git 2.15.2 Git 2.14.4 Git 2.13.7 fsck: complain when .gitmodules is a symlink index-pack: check .gitmodules files with --strict unpack-objects: call fsck_finish() after fscking objects fsck: call fsck_finish() after fscking objects fsck: check .gitmodules content fsck: handle promisor objects in .gitmodules check fsck: detect gitmodules files fsck: actually fsck blob data fsck: simplify ".git" check index-pack: make fsck error message more specific verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_dotfile: mention case-insensitivity in comment verify_path: drop clever fallthrough skip_prefix: add case-insensitive variant ...
2018-05-22Sync with Git 2.16.4Junio C Hamano1-13/+38
* maint-2.16: Git 2.16.4 Git 2.15.2 Git 2.14.4 Git 2.13.7 verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_dotfile: mention case-insensitivity in comment verify_path: drop clever fallthrough skip_prefix: add case-insensitive variant is_{hfs,ntfs}_dotgitmodules: add tests is_ntfs_dotgit: match other .git files is_hfs_dotgit: match other .git files is_ntfs_dotgit: use a size_t for traversing string submodule-config: verify submodule names as paths
2018-05-22Sync with Git 2.15.2Junio C Hamano1-13/+38
* maint-2.15: Git 2.15.2 Git 2.14.4 Git 2.13.7 verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_dotfile: mention case-insensitivity in comment verify_path: drop clever fallthrough skip_prefix: add case-insensitive variant is_{hfs,ntfs}_dotgitmodules: add tests is_ntfs_dotgit: match other .git files is_hfs_dotgit: match other .git files is_ntfs_dotgit: use a size_t for traversing string submodule-config: verify submodule names as paths
2018-05-22Sync with Git 2.14.4Junio C Hamano1-13/+38
* maint-2.14: Git 2.14.4 Git 2.13.7 verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_dotfile: mention case-insensitivity in comment verify_path: drop clever fallthrough skip_prefix: add case-insensitive variant is_{hfs,ntfs}_dotgitmodules: add tests is_ntfs_dotgit: match other .git files is_hfs_dotgit: match other .git files is_ntfs_dotgit: use a size_t for traversing string submodule-config: verify submodule names as paths
2018-05-22Sync with Git 2.13.7Junio C Hamano1-13/+38
* maint-2.13: Git 2.13.7 verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_dotfile: mention case-insensitivity in comment verify_path: drop clever fallthrough skip_prefix: add case-insensitive variant is_{hfs,ntfs}_dotgitmodules: add tests is_ntfs_dotgit: match other .git files is_hfs_dotgit: match other .git files is_ntfs_dotgit: use a size_t for traversing string submodule-config: verify submodule names as paths
2018-05-21verify_path: disallow symlinks in .gitmodulesJeff King1-9/+31
There are a few reasons it's not a good idea to make .gitmodules a symlink, including: 1. It won't be portable to systems without symlinks. 2. It may behave inconsistently, since Git may look at this file in the index or a tree without bothering to resolve any symbolic links. We don't do this _yet_, but the config infrastructure is there and it's planned for the future. With some clever code, we could make (2) work. And some people may not care about (1) if they only work on one platform. But there are a few security reasons to simply disallow it: a. A symlinked .gitmodules file may circumvent any fsck checks of the content. b. Git may read and write from the on-disk file without sanity checking the symlink target. So for example, if you link ".gitmodules" to "../oops" and run "git submodule add", we'll write to the file "oops" outside the repository. Again, both of those are problems that _could_ be solved with sufficient code, but given the complications in (1) and (2), we're better off just outlawing it explicitly. Note the slightly tricky call to verify_path() in update-index's update_one(). There we may not have a mode if we're not updating from the filesystem (e.g., we might just be removing the file). Passing "0" as the mode there works fine; since it's not a symlink, we'll just skip the extra checks. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21verify_dotfile: mention case-insensitivity in commentJeff King1-1/+4
We're more restrictive than we need to be in matching ".GIT" on case-sensitive filesystems; let's make a note that this is intentional. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21verify_path: drop clever fallthroughJeff King1-4/+4
We check ".git" and ".." in the same switch statement, and fall through the cases to share the end-of-component check. While this saves us a line or two, it makes modifying the function much harder. Let's just write it out. Signed-off-by: Jeff King <peff@peff.net>
2018-05-16object-store: move object access functions to object-store.hStefan Beller1-0/+1
This should make these functions easier to find and cache.h less overwhelming to read. In particular, this moves: - read_object_file - oid_object_info - write_object_file As a result, most of the codebase needs to #include object-store.h. In this patch the #include is only added to files that would fail to compile otherwise. It would be better to #include wherever identifiers from the header are used. That can happen later when we have better tooling for it. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02Update struct index_state to use struct object_idbrian m. carlson1-8/+8
Adjust struct index_state to use struct object_id instead of unsigned char [20]. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02split-index: convert struct split_index to object_idbrian m. carlson1-11/+11
Convert the base_sha1 member of struct split_index to use struct object_id and rename it base_oid. Include cache.h to make the structure visible. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16read-cache.c: make $GIT_TEST_SPLIT_INDEX booleanNguyễn Thái Ngọc Duy1-2/+2
While at there, document about this special mode when running the test suite. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-10Merge branch 'bc/object-id'Junio C Hamano1-2/+2
Conversion from uchar[20] to struct object_id continues. * bc/object-id: (36 commits) convert: convert to struct object_id sha1_file: introduce a constant for max header length Convert lookup_replace_object to struct object_id sha1_file: convert read_sha1_file to struct object_id sha1_file: convert read_object_with_reference to object_id tree-walk: convert tree entry functions to object_id streaming: convert istream internals to struct object_id tree-walk: convert get_tree_entry_follow_symlinks internals to object_id builtin/notes: convert static functions to object_id builtin/fmt-merge-msg: convert remaining code to object_id sha1_file: convert sha1_object_info* to object_id Convert remaining callers of sha1_object_info_extended to object_id packfile: convert unpack_entry to struct object_id sha1_file: convert retry_bad_packed_offset to struct object_id sha1_file: convert assert_sha1_type to object_id builtin/mktree: convert to struct object_id streaming: convert open_istream to use struct object_id sha1_file: convert check_sha1_signature to struct object_id sha1_file: convert read_loose_object to use struct object_id builtin/index-pack: convert struct ref_delta_entry to object_id ...
2018-03-22Merge branch 'nd/shared-index-fix' into maintJunio C Hamano1-18/+22
Code clean-up. * nd/shared-index-fix: read-cache: don't write index twice if we can't write shared index read-cache.c: move tempfile creation/cleanup out of write_shared_index read-cache.c: change type of "temp" in write_shared_index()
2018-03-22Merge branch 'tg/split-index-fixes' into maintJunio C Hamano1-11/+14
The split-index mode had a few corner case bugs fixed. * tg/split-index-fixes: travis: run tests with GIT_TEST_SPLIT_INDEX split-index: don't write cache tree with null oid entries read-cache: fix reading the shared index for other repos
2018-03-21Merge branch 'rj/warning-uninitialized-fix'Junio C Hamano1-2/+4
Compilation fix. * rj/warning-uninitialized-fix: read-cache: fix an -Wmaybe-uninitialized warning -Wuninitialized: remove some 'init-self' workarounds
2018-03-21Merge branch 'bp/refresh-cache-ent-rehash-fix'Junio C Hamano1-1/+3
The codepath to replace an existing entry in the index had a bug in updating the name hash structure, which has been fixed. * bp/refresh-cache-ent-rehash-fix: Fix bugs preventing adding updated cache entries to the name hash
2018-03-21Merge branch 'ma/skip-writing-unchanged-index'Junio C Hamano1-0/+6
Internal API clean-up to allow write_locked_index() optionally skip writing the in-core index when it is not modified. * ma/skip-writing-unchanged-index: write_locked_index(): add flag to avoid writing unchanged index
2018-03-20read-cache: fix an -Wmaybe-uninitialized warningRamsay Jones1-2/+4
The function ce_write_entry() uses a 'self-initialised' variable construct, for the symbol 'saved_namelen', to suppress a gcc '-Wmaybe-uninitialized' warning, given that the warning is a false positive. For the purposes of this discussion, the ce_write_entry() function has three code blocks of interest, that look like so: /* block #1 */ if (ce->ce_flags & CE_STRIP_NAME) { saved_namelen = ce_namelen(ce); ce->ce_namelen = 0; } /* block #2 */ /* * several code blocks that contain, among others, calls * to copy_cache_entry_to_ondisk(ondisk, ce); */ /* block #3 */ if (ce->ce_flags & CE_STRIP_NAME) { ce->ce_namelen = saved_namelen; ce->ce_flags &= ~CE_STRIP_NAME; } The warning implies that gcc thinks it is possible that the first block is not entered, the calls to copy_cache_entry_to_ondisk() could toggle the CE_STRIP_NAME flag on, thereby entering block #3 with saved_namelen unset. However, the copy_cache_entry_to_ondisk() function does not write to ce->ce_flags (it only reads). gcc could easily determine this, since that function is local to this file, but it obviously doesn't. In order to suppress this warning, we make it clear to the reader (human and compiler), that block #3 will only be entered when the first block has been entered, by introducing a new 'stripped_name' boolean variable. We also take the opportunity to change the type of 'saved_namelen' to 'unsigned int' to match ce->ce_namelen. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15Fix bugs preventing adding updated cache entries to the name hashBen Peart1-1/+3
Update replace_index_entry() to clear the CE_HASHED flag from the new cache entry so that it can add it to the name hash in set_index_entry() Fix refresh_cache_ent() to use the copy_cache_entry() macro instead of memcpy() so that it doesn't incorrectly copy the hash state from the old entry. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14sha1_file: convert read_sha1_file to struct object_idbrian m. carlson1-2/+2
Convert read_sha1_file to take a pointer to struct object_id and rename it read_object_file. Do the same for read_sha1_file_extended. Convert one use in grep.c to use the new function without any other code change, since the pointer being passed is a void pointer that is already initialized with a pointer to struct object_id. Update the declaration and definitions of the modified functions, and apply the following semantic patch to convert the remaining callers: @@ expression E1, E2, E3; @@ - read_sha1_file(E1.hash, E2, E3) + read_object_file(&E1, E2, E3) @@ expression E1, E2, E3; @@ - read_sha1_file(E1->hash, E2, E3) + read_object_file(E1, E2, E3) @@ expression E1, E2, E3, E4; @@ - read_sha1_file_extended(E1.hash, E2, E3, E4) + read_object_file_extended(&E1, E2, E3, E4) @@ expression E1, E2, E3, E4; @@ - read_sha1_file_extended(E1->hash, E2, E3, E4) + read_object_file_extended(E1, E2, E3, E4) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-06Merge branch 'bw/c-plus-plus'Junio C Hamano1-20/+20
Avoid using identifiers that clash with C++ keywords. Even though it is not a goal to compile Git with C++ compilers, changes like this help use of code analysis tools that targets C++ on our codebase. * bw/c-plus-plus: (37 commits) replace: rename 'new' variables trailer: rename 'template' variables tempfile: rename 'template' variables wrapper: rename 'template' variables environment: rename 'namespace' variables diff: rename 'template' variables environment: rename 'template' variables init-db: rename 'template' variables unpack-trees: rename 'new' variables trailer: rename 'new' variables submodule: rename 'new' variables split-index: rename 'new' variables remote: rename 'new' variables ref-filter: rename 'new' variables read-cache: rename 'new' variables line-log: rename 'new' variables imap-send: rename 'new' variables http: rename 'new' variables entry: rename 'new' variables diffcore-delta: rename 'new' variables ...
2018-03-01write_locked_index(): add flag to avoid writing unchanged indexMartin Ågren1-0/+6
We have several callers like if (active_cache_changed && write_locked_index(...)) handle_error(); rollback_lock_file(...); where the final rollback is needed because "!active_cache_changed" shortcuts the if-expression. There are also a few variants of this, including some if-else constructs that make it more clear when the explicit rollback is really needed. Teach `write_locked_index()` to take a new flag SKIP_IF_UNCHANGED and simplify the callers. Leave the most complicated of the callers (in builtin/update-index.c) unchanged. Rewriting it to use this new flag would end up duplicating logic. We could have made the new flag behave the other way round ("FORCE_WRITE"), but that could break existing users behind their backs. Let's take the more conservative approach. We can still migrate existing callers to use our new flag. Later we might even be able to flip the default, possibly without entirely ignoring the risk to in-flight or out-of-tree topics. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-22read-cache: rename 'new' variablesBrandon Williams1-20/+20
Rename C++ keyword in order to bring the codebase closer to being able to be compiled with a C++ compiler. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-15Merge branch 'bc/hash-algo'Junio C Hamano1-29/+29
More abstraction of hash function from the codepath. * bc/hash-algo: hash: update obsolete reference to SHA1_HEADER bulk-checkin: abstract SHA-1 usage csum-file: abstract uses of SHA-1 csum-file: rename sha1file to hashfile read-cache: abstract away uses of SHA-1 pack-write: switch various SHA-1 values to abstract forms pack-check: convert various uses of SHA-1 to abstract forms fast-import: switch various uses of SHA-1 to the_hash_algo sha1_file: switch uses of SHA-1 to the_hash_algo builtin/unpack-objects: switch uses of SHA-1 to the_hash_algo builtin/index-pack: improve hash function abstraction hash: create union for hash context allocation hash: move SHA-1 macros to hash.h
2018-02-15Merge branch 'nd/trace-index-ops'Junio C Hamano1-0/+7
* nd/trace-index-ops: trace: measure where the time is spent in the index-heavy operations
2018-02-15Merge branch 'po/object-id'Junio C Hamano1-3/+3
Conversion from uchar[20] to struct object_id continues. * po/object-id: sha1_file: rename hash_sha1_file_literally sha1_file: convert write_loose_object to object_id sha1_file: convert force_object_loose to object_id sha1_file: convert write_sha1_file to object_id notes: convert write_notes_tree to object_id notes: convert combine_notes_* to object_id commit: convert commit_tree* to object_id match-trees: convert splice_tree to object_id cache: clear whole hash buffer with oidclr sha1_file: convert hash_sha1_file to object_id dir: convert struct sha1_stat to use object_id sha1_file: convert pretend_sha1_file to object_id
2018-02-13Merge branch 'nd/shared-index-fix'Junio C Hamano1-18/+22
Code clean-up. * nd/shared-index-fix: read-cache: don't write index twice if we can't write shared index read-cache.c: move tempfile creation/cleanup out of write_shared_index read-cache.c: change type of "temp" in write_shared_index()
2018-02-13Merge branch 'sg/cocci-move-array'Junio C Hamano1-3/+2
Code clean-up. * sg/cocci-move-array: Use MOVE_ARRAY
2018-02-13Merge branch 'tg/split-index-fixes'Junio C Hamano1-11/+14
The split-index mode had a few corner case bugs fixed. * tg/split-index-fixes: travis: run tests with GIT_TEST_SPLIT_INDEX split-index: don't write cache tree with null oid entries read-cache: fix reading the shared index for other repos
2018-02-02read-cache: abstract away uses of SHA-1brian m. carlson1-29/+29
Convert various uses of direct calls to SHA-1 and 20- and 40-based constants to use the_hash_algo instead. Don't yet convert the on-disk data structures, which will be handled in a future commit. Adjust some comments so as not to refer explicitly to SHA-1. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02trace: measure where the time is spent in the index-heavy operationsNguyễn Thái Ngọc Duy1-0/+7
All the known heavy code blocks are measured (except object database access). This should help identify if an optimization is effective or not. An unoptimized git-status would give something like below: 0.001791141 s: read cache ... 0.004011363 s: preload index 0.000516161 s: refresh index 0.003139257 s: git command: ... 'status' '--porcelain=2' 0.006788129 s: diff-files 0.002090267 s: diff-index 0.001885735 s: initialize name hash 0.032013138 s: read directory 0.051781209 s: git command: './git' 'status' Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30sha1_file: convert write_sha1_file to object_idPatryk Obara1-3/+3
Convert the definition and declaration of write_sha1_file to struct object_id and adjust usage of this function. This commit also converts static function write_sha1_file_prepare, as it is closely related. Rename these functions to write_object_file and write_object_file_prepare respectively. Replace sha1_to_hex, hashcpy and hashclr with their oid equivalents wherever possible. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-24read-cache: don't write index twice if we can't write shared indexNguyễn Thái Ngọc Duy1-2/+3
In a0a967568e ("update-index --split-index: do not split if $GIT_DIR is read only", 2014-06-13), we tried to make sure we can still write an index, even if the shared index can not be written. We did so by just calling 'do_write_locked_index()' just before 'write_shared_index()'. 'do_write_locked_index()' always at least closes the tempfile nowadays, and used to close or commit the lockfile if COMMIT_LOCK or CLOSE_LOCK were given at the time this feature was introduced. COMMIT_LOCK or CLOSE_LOCK is passed in by most callers of 'write_locked_index()'. After calling 'write_shared_index()', we call 'write_split_index()', which calls 'do_write_locked_index()' again, which then tries to use the closed lockfile again, but in fact fails to do so as it's already closed. This eventually leads to a segfault. Make sure to write the main index only once. [nd: most of the commit message and investigation done by Thomas, I only tweaked the solution a bit] Helped-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-22Use MOVE_ARRAYSZEDER Gábor1-3/+2
Use the helper macro MOVE_ARRAY to move arrays. This is shorter and safer, as it automatically infers the size of elements. Patch generated by Coccinelle and contrib/coccinelle/array.cocci in Travis CI's static analysis build job. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-19split-index: don't write cache tree with null oid entriesThomas Gummerer1-1/+1
In a96d3cc3f6 ("cache-tree: reject entries with null sha1", 2017-04-21) we made sure that broken cache entries do not get propagated to new trees. Part of that was making sure not to re-use an existing cache tree that includes a null oid. It did so by dropping the cache tree in 'do_write_index()' if one of the entries contains a null oid. In split index mode however, there are two invocations to 'do_write_index()', one for the shared index and one for the split index. The cache tree is only written once, to the split index. As we only loop through the elements that are effectively being written by the current invocation, that may not include the entry with a null oid in the split index (when it is already written to the shared index), where we write the cache tree. Therefore in split index mode we may still end up writing the cache tree, even though there is an entry with a null oid in the index. Fix this by checking for null oids in prepare_to_write_split_index, where we loop the entries of the shared index as well as the entries for the split index. This fixes t7009 with GIT_TEST_SPLIT_INDEX. Also add a new test that's more specifically showing the problem. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-19read-cache: fix reading the shared index for other reposThomas Gummerer1-10/+13
read_index_from() takes a path argument for the location of the index file. For reading the shared index in split index mode however it just ignores that path argument, and reads it from the gitdir of the current repository. This works as long as an index in the_repository is read. Once that changes, such as when we read the index of a submodule, or of a different working tree than the current one, the gitdir of the_repository will no longer contain the appropriate shared index, and git will fail to read it. For example t3007-ls-files-recurse-submodules.sh was broken with GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also broken in a similar manner, probably by introducing struct repository there, although I didn't track down the exact commit for that. be489d02d2 ("revision.c: --indexed-objects add objects from all worktrees", 2017-08-23) breaks with split index mode in a similar manner, not erroring out when it can't read the index, but instead carrying on with pruning, without taking the index of the worktree into account. Fix this by passing an additional gitdir parameter to read_index_from, to indicate where it should look for and read the shared index from. read_cache_from() defaults to using the gitdir of the_repository. As it is mostly a convenience macro, having to pass get_git_dir() for every call seems overkill, and if necessary users can have more control by using read_index_from(). Helped-by: Brandon Williams <bmwill@google.com> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-16read-cache.c: move tempfile creation/cleanup out of write_shared_indexNguyễn Thái Ngọc Duy1-16/+17
For one thing, we have more consistent cleanup procedure now and always keep errno intact. The real purpose is the ability to break out of write_locked_index() early when mks_tempfile() fails in the next patch. It's more awkward to do it if this mks_tempfile() is still inside write_shared_index(). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-16read-cache.c: change type of "temp" in write_shared_index()Nguyễn Thái Ngọc Duy1-9/+11
This local variable 'temp' will be passed in from the caller in the next patch. To reduce patch noise, let's change its type now while it's still a local variable and get all the trival conversion out of the next patch. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-27Merge branch 'tb/add-renormalize'Junio C Hamano1-11/+19
"git add --renormalize ." is a new and safer way to record the fact that you are correcting the end-of-line convention and other "convert_to_git()" glitches in the in-repository data. * tb/add-renormalize: add: introduce "--renormalize"
2017-11-21Merge branch 'av/fsmonitor'Junio C Hamano1-0/+3
Various fixes to bp/fsmonitor topic. * av/fsmonitor: fsmonitor: simplify determining the git worktree under Windows fsmonitor: store fsmonitor bitmap before splitting index fsmonitor: read from getcwd(), not the PWD environment variable fsmonitor: delay updating state until after split index is merged fsmonitor: document GIT_TRACE_FSMONITOR fsmonitor: don't bother pretty-printing JSON from watchman fsmonitor: set the PWD to the top of the working tree
2017-11-21Merge branch 'bp/fsmonitor'Junio C Hamano1-5/+40
We learned to talk to watchman to speed up "git status" and other operations that need to see which paths have been modified. * bp/fsmonitor: fsmonitor: preserve utf8 filenames in fsmonitor-watchman log fsmonitor: read entirety of watchman output fsmonitor: MINGW support for watchman integration fsmonitor: add a performance test fsmonitor: add a sample integration script for Watchman fsmonitor: add test cases for fsmonitor extension split-index: disable the fsmonitor extension when running the split index test fsmonitor: add a test tool to dump the index extension update-index: add fsmonitor support to update-index ls-files: Add support in ls-files to display the fsmonitor valid bit fsmonitor: add documentation for the fsmonitor extension. fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files. update-index: add a new --force-write-index option preload-index: add override to enable testing preload-index bswap: add 64 bit endianness helper get_be64
2017-11-17add: introduce "--renormalize"Torsten Bögershausen1-11/+19
Make it safer to normalize the line endings in a repository. Files that had been commited with CRLF will be commited with LF. The old way to normalize a repo was like this: # Make sure that there are not untracked files $ echo "* text=auto" >.gitattributes $ git read-tree --empty $ git add . $ git commit -m "Introduce end-of-line normalization" The user must make sure that there are no untracked files, otherwise they would have been added and tracked from now on. The new "add --renormalize" does not add untracked files: $ echo "* text=auto" >.gitattributes $ git add --renormalize . $ git commit -m "Introduce end-of-line normalization" Note that "git add --renormalize <pathspec>" is the short form for "git add -u --renormalize <pathspec>". While at it, document that the same renormalization may be needed, whenever a clean filter is added or changed. Helped-By: Junio C Hamano <gitster@pobox.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-15Merge branch 'bp/read-index-from-skip-verification'Junio C Hamano1-0/+6
Drop (perhaps overly cautious) sanity check before using the index read from the filesystem at runtime. * bp/read-index-from-skip-verification: read_index_from(): speed index loading by skipping verification of the entry order
2017-11-10fsmonitor: store fsmonitor bitmap before splitting indexAlex Vandiver1-0/+3
ba1b9cac ("fsmonitor: delay updating state until after split index is merged", 2017-10-27) resolved the problem of the fsmonitor data being applied to the non-base index when reading; however, a similar problem exists when writing the index. Specifically, writing of the fsmonitor extension happens only after the work to split the index has been applied -- as such, the information in the index is only for the non-"base" index, and thus the extension information contains only partial data. When saving, compute the ewah bitmap before the index is split, and store it in the fsmonitor_dirty field, mirroring the behavior that occurred during reading. fsmonitor_dirty is kept from being leaked by being freed when the extension data is written -- which always happens precisely once, no matter the split index configuration. Signed-off-by: Alex Vandiver <alexmv@dropbox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-08read_index_from(): speed index loading by skipping verification of the entry ↵Ben Peart1-0/+6
order There is code in post_read_index_from() to catch out of order entries when reading an index file. This order verification is ~13% of the cost of every call to read_index_from(). Update check_ce_order() so that it skips this verification unless the "verify_ce_order" global variable is set. Teach fsck to force this verification. The effect can be seen using t/perf/p0002-read-cache.sh: Test HEAD HEAD~1 -------------------------------------------------------------------------------------- 0002.1: read_cache/discard_cache 1000 times 0.41(0.04+0.04) 0.50(0.00+0.10) +22.0% Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>