aboutsummaryrefslogtreecommitdiffstats
path: root/cache.h
AgeCommit message (Collapse)AuthorFilesLines
2021-05-16Merge branch 'mt/parallel-checkout-part-3'Junio C Hamano1-5/+9
The final part of "parallel checkout". * mt/parallel-checkout-part-3: ci: run test round with parallel-checkout enabled parallel-checkout: add tests related to .gitattributes t0028: extract encoding helpers to lib-encoding.sh parallel-checkout: add tests related to path collisions parallel-checkout: add tests for basic operations checkout-index: add parallel checkout support builtin/checkout.c: complete parallel checkout support make_transient_cache_entry(): optionally alloc from mem_pool
2021-05-11Merge branch 'jk/symlinked-dotgitx-cleanup'Junio C Hamano1-0/+1
Various test and documentation updates about .gitsomething paths that are symlinks. * jk/symlinked-dotgitx-cleanup: docs: document symlink restrictions for dot-files fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW t0060: test ntfs/hfs-obscured dotfiles t7450: test .gitmodules symlink matching against obscured names t7450: test verify_path() handling of gitmodules t7415: rename to expand scope fsck_tree(): wrap some long lines fsck_tree(): fix shadowed variable t7415: remove out-dated comment about translation
2021-05-07Merge branch 'mt/add-rm-in-sparse-checkout'Junio C Hamano1-7/+8
"git add" and "git rm" learned not to touch those paths that are outside of sparse checkout. * mt/add-rm-in-sparse-checkout: rm: honor sparse checkout patterns add: warn when asked to update SKIP_WORKTREE entries refresh_index(): add flag to ignore SKIP_WORKTREE entries pathspec: allow to ignore SKIP_WORKTREE entries on index matching add: make --chmod and --renormalize honor sparse checkouts t3705: add tests for `git add` in sparse checkouts add: include magic part of pathspec on --refresh error
2021-05-05make_transient_cache_entry(): optionally alloc from mem_poolMatheus Tavares1-5/+9
Allow make_transient_cache_entry() to optionally receive a mem_pool struct in which it should allocate the entry. This will be used in the following patch, to store some transient entries which should persist until parallel checkout finishes. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-04t0060: test ntfs/hfs-obscured dotfilesJeff King1-0/+1
We have tests that cover various filesystem-specific spellings of ".gitmodules", because we need to reliably identify that path for some security checks. These are from dc2d9ba318 (is_{hfs,ntfs}_dotgitmodules: add tests, 2018-05-12), with the actual code coming from e7cb0b4455 (is_ntfs_dotgit: match other .git files, 2018-05-11) and 0fc333ba20 (is_hfs_dotgit: match other .git files, 2018-05-02). Those latter two commits also added similar matching functions for .gitattributes and .gitignore. These ended up not being used in the final series, and are currently dead code. But in preparation for them being used in some fsck checks, let's make sure they actually work by throwing a few basic tests at them. Likewise, let's cover .mailmap (which does need matching code added). I didn't bother with the whole battery of tests that we cover for .gitmodules. These functions are all based on the same generic matcher, so it's sufficient to test most of the corner cases just once. Note that the ntfs magic prefix names in the tests come from the algorithm described in e7cb0b4455 (and are different for each file). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-30Merge branch 'ds/sparse-index-protections'Junio C Hamano1-4/+21
Builds on top of the sparse-index infrastructure to mark operations that are not ready to mark with the sparse index, causing them to fall back on fully-populated index that they always have worked with. * ds/sparse-index-protections: (47 commits) name-hash: use expand_to_path() sparse-index: expand_to_path() name-hash: don't add directories to name_hash revision: ensure full index resolve-undo: ensure full index read-cache: ensure full index pathspec: ensure full index merge-recursive: ensure full index entry: ensure full index dir: ensure full index update-index: ensure full index stash: ensure full index rm: ensure full index merge-index: ensure full index ls-files: ensure full index grep: ensure full index fsck: ensure full index difftool: ensure full index commit: ensure full index checkout: ensure full index ...
2021-04-14cache: move ensure_full_index() to cache.hDerrick Stolee1-0/+1
Soon we will insert ensure_full_index() calls across the codebase. Instead of also adding include statements for sparse-index.h, let's just use the fact that anything that cares about the index already has cache.h in its includes. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-14*: remove 'const' qualifier for struct index_stateDerrick Stolee1-3/+3
Several methods specify that they take a 'struct index_state' pointer with the 'const' qualifier because they intend to only query the data, not change it. However, we will be introducing a step very low in the method stack that might modify a sparse-index to become a full index in the case that our queries venture inside a sparse-directory entry. This change only removes the 'const' qualifiers that are necessary for the following change which will actually modify the implementation of index_name_stage_pos(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-08refresh_index(): add flag to ignore SKIP_WORKTREE entriesMatheus Tavares1-7/+8
refresh_index() doesn't update SKIP_WORKTREE entries, but it still matches them against the given pathspecs, marks the matches on the seen[] array, check if unmerged, etc. In the following patch, one caller will need refresh_index() to ignore SKIP_WORKTREE entries entirely, so add a flag that implements this behavior. While we are here, also realign the REFRESH_* flags and convert the hex values to the more natural bit shift format, which makes it easier to spot holes. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-02Merge branch 'mt/parallel-checkout-part-1'Junio C Hamano1-24/+0
Preparatory API changes for parallel checkout. * mt/parallel-checkout-part-1: entry: add checkout_entry_ca() taking preloaded conv_attrs entry: move conv_attrs lookup up to checkout_entry() entry: extract update_ce_after_write() from write_entry() entry: make fstat_output() and read_blob_entry() public entry: extract a header file for entry.c functions convert: add classification for conv_attrs struct convert: add get_stream_filter_ca() variant convert: add [async_]convert_to_working_tree_ca() variants convert: make convert_attrs() and convert structs public
2021-03-30Merge branch 'ab/read-tree'Junio C Hamano1-1/+1
Code simplification by removing support for a caller that is long gone. * ab/read-tree: tree.h API: simplify read_tree_recursive() signature tree.h API: expose read_tree_1() as read_tree_at() archive: stop passing "stage" through read_tree_recursive() ls-files: refactor away read_tree() ls-files: don't needlessly pass around stage variable tree.c API: move read_tree() into builtin/ls-files.c ls-files tests: add meaningful --with-tree tests show tests: add test for "git show <tree>"
2021-03-30Merge branch 'mt/checkout-remove-nofollow'Junio C Hamano1-1/+1
When "git checkout" removes a path that does not exist in the commit it is checking out, it wasn't careful enough not to follow symbolic links, which has been corrected. * mt/checkout-remove-nofollow: checkout: don't follow symlinks when removing entries symlinks: update comment on threaded_check_leading_path()
2021-03-30sparse-index: add index.sparse config optionDerrick Stolee1-0/+1
When enabled, this config option signals that index writes should attempt to use sparse-directory entries. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30sparse-index: convert from full to sparseDerrick Stolee1-0/+2
If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30sparse-checkout: hold pattern list in indexDerrick Stolee1-0/+2
As we modify the sparse-checkout definition, we perform index operations on a pattern_list that only exists in-memory. This allows easy backing out in case the index update fails. However, if the index write itself cares about the sparse-checkout pattern set, we need access to that in-memory copy. Place a pointer to a 'struct pattern_list' in the index so we can access this on-demand. This will be used in the next change which uses the sparse-checkout definition to filter out directories that are outside the sparse cone. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30sparse-index: implement ensure_full_index()Derrick Stolee1-1/+12
We will mark an in-memory index_state as having sparse directory entries with the sparse_index bit. These currently cannot exist, but we will add a mechanism for collapsing a full index to a sparse one in a later change. That will happen at write time, so we must first allow parsing the format before writing it. Commands or methods that require a full index in order to operate can call ensure_full_index() to expand that index in-memory. This requires parsing trees using that index's repository. Sparse directory entries have a specific 'ce_mode' value. The macro S_ISSPARSEDIR(ce->ce_mode) can check if a cache_entry 'ce' has this type. This ce_mode is not possible with the existing index formats, so we don't also verify all properties of a sparse-directory entry, which are: 1. ce->ce_mode == 0040000 2. ce->flags & CE_SKIP_WORKTREE is true 3. ce->name[ce->namelen - 1] == '/' (ends in dir separator) 4. ce->oid references a tree object. These are all semi-enforced in ensure_full_index() to some extent. Any deviation will cause a warning at minimum or a failure in the worst case. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-23entry: extract a header file for entry.c functionsMatheus Tavares1-24/+0
The declarations of entry.c's public functions and structures currently reside in cache.h. Although not many, they contribute to the size of cache.h and, when changed, cause the unnecessary recompilation of modules that don't really use these functions. So let's move them to a new entry.h header. While at it let's also move a comment related to checkout_entry() from entry.c to entry.h as it's more useful to describe the function there. Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-20tree.c API: move read_tree() into builtin/ls-files.cÆvar Arnfjörð Bjarmason1-1/+1
Since the read_tree() API was added around the same time as read_tree_recursive() in 94537c78a82 (Move "read_tree()" to "tree.c"[...], 2005-04-22) and b12ec373b8e ([PATCH] Teach read-tree about commit objects, 2005-04-20) things have gradually migrated over to the read_tree_recursive() version. Now builtin/ls-files.c is the last user of this code, let's move all the relevant code there. This allows for subsequent simplification of it, and an eventual move to read_tree_recursive(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-18checkout: don't follow symlinks when removing entriesMatheus Tavares1-1/+1
At 1d718a5108 ("do not overwrite untracked symlinks", 2011-02-20), symlink.c:check_leading_path() started returning different codes for FL_ENOENT and FL_SYMLINK. But one of its callers, unlink_entry(), was not adjusted for this change, so it started to follow symlinks on the leading path of to-be-removed entries. Fix that and add a regression test. Note that since 1d718a5108 check_leading_path() no longer differentiates the case where it found a symlink in the path's leading components from the cases where it found a regular file or failed to lstat() the component. So, a side effect of this current patch is that unlink_entry() now returns early in all of these three cases. And because we no longer try to unlink such paths, we also don't get the warning from remove_or_warn(). For the regular file and symlink cases, it's questionable whether the warning was useful in the first place: unlink_entry() removes tracked paths that should no longer be present in the state we are checking out to. If the path had its leading dir replaced by another file, it means that the basename already doesn't exist, so there is no need for a warning. Sure, we are leaving a regular file or symlink behind at the path's dirname, but this file is either untracked now (so again, no need to warn), or it will be replaced by a tracked file during the next phase of this checkout operation. As for failing to lstat() one of the leading components, the basename might still exist only we cannot unlink it (e.g. due to the lack of the required permissions). Since the user expect it to be removed (especially with checkout's --no-overlay option), add back the warning in this more relevant case. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-08Sync with Git 2.30.2 for CVE-2021-21300Junio C Hamano1-0/+1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-12Sync with 2.29.3Johannes Schindelin1-0/+1
* maint-2.29: Git 2.29.3 Git 2.28.1 Git 2.27.1 Git 2.26.3 Git 2.25.5 Git 2.24.4 Git 2.23.4 Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.28.1Johannes Schindelin1-0/+1
* maint-2.28: Git 2.28.1 Git 2.27.1 Git 2.26.3 Git 2.25.5 Git 2.24.4 Git 2.23.4 Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.27.1Johannes Schindelin1-0/+1
* maint-2.27: Git 2.27.1 Git 2.26.3 Git 2.25.5 Git 2.24.4 Git 2.23.4 Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.26.3Johannes Schindelin1-0/+1
* maint-2.26: Git 2.26.3 Git 2.25.5 Git 2.24.4 Git 2.23.4 Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.25.5Johannes Schindelin1-0/+1
* maint-2.25: Git 2.25.5 Git 2.24.4 Git 2.23.4 Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.24.4Johannes Schindelin1-0/+1
* maint-2.24: Git 2.24.4 Git 2.23.4 Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.23.4Johannes Schindelin1-0/+1
* maint-2.23: Git 2.23.4 Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.22.5Johannes Schindelin1-0/+1
* maint-2.22: Git 2.22.5 Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.21.4Johannes Schindelin1-0/+1
* maint-2.21: Git 2.21.4 Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.20.5Johannes Schindelin1-0/+1
* maint-2.20: Git 2.20.5 Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.19.6Johannes Schindelin1-0/+1
* maint-2.19: Git 2.19.6 Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.18.5Johannes Schindelin1-0/+1
* maint-2.18: Git 2.18.5 Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12Sync with 2.17.6Johannes Schindelin1-0/+1
* maint-2.17: Git 2.17.6 unpack_trees(): start with a fresh lstat cache run-command: invalidate lstat cache after a command finished checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12checkout: fix bug that makes checkout follow symlinks in leading pathMatheus Tavares1-0/+1
Before checking out a file, we have to confirm that all of its leading components are real existing directories. And to reduce the number of lstat() calls in this process, we cache the last leading path known to contain only directories. However, when a path collision occurs (e.g. when checking out case-sensitive files in case-insensitive file systems), a cached path might have its file type changed on disk, leaving the cache on an invalid state. Normally, this doesn't bring any bad consequences as we usually check out files in index order, and therefore, by the time the cached path becomes outdated, we no longer need it anyway (because all files in that directory would have already been written). But, there are some users of the checkout machinery that do not always follow the index order. In particular: checkout-index writes the paths in the same order that they appear on the CLI (or stdin); and the delayed checkout feature -- used when a long-running filter process replies with "status=delayed" -- postpones the checkout of some entries, thus modifying the checkout order. When we have to check out an out-of-order entry and the lstat() cache is invalid (due to a previous path collision), checkout_entry() may end up using the invalid data and thrusting that the leading components are real directories when, in reality, they are not. In the best case scenario, where the directory was replaced by a regular file, the user will get an error: "fatal: unable to create file 'foo/bar': Not a directory". But if the directory was replaced by a symlink, checkout could actually end up following the symlink and writing the file at a wrong place, even outside the repository. Since delayed checkout is affected by this bug, it could be used by an attacker to write arbitrary files during the clone of a maliciously crafted repository. Some candidate solutions considered were to disable the lstat() cache during unordered checkouts or sort the entries before passing them to the checkout machinery. But both ideas include some performance penalty and they don't future-proof the code against new unordered use cases. Instead, we now manually reset the lstat cache whenever we successfully remove a directory. Note: We are not even checking whether the directory was the same as the lstat cache points to because we might face a scenario where the paths refer to the same location but differ due to case folding, precomposed UTF-8 issues, or the presence of `..` components in the path. Two regression tests, with case-collisions and utf8-collisions, are also added for both checkout-index and delayed checkout. Note: to make the previously mentioned clone attack unfeasible, it would be sufficient to reset the lstat cache only after the remove_subtree() call inside checkout_entry(). This is the place where we would remove a directory whose path collides with the path of another entry that we are currently trying to check out (possibly a symlink). However, in the interest of a thorough fix that does not leave Git open to similar-but-not-identical attack vectors, we decided to intercept all `rmdir()` calls in one fell swoop. This addresses CVE-2021-21300. Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
2021-02-10Merge branch 'ds/more-index-cleanups'Junio C Hamano1-0/+1
Cleaning various codepaths up. * ds/more-index-cleanups: t1092: test interesting sparse-checkout scenarios test-lib: test_region looks for trace2 regions sparse-checkout: load sparse-checkout patterns name-hash: use trace2 regions for init repository: add repo reference to index_state fsmonitor: de-duplicate BUG()s around dirty bits cache-tree: extract subtree_pos() cache-tree: simplify verify_cache() prototype cache-tree: clean up cache_tree_update()
2021-01-25Merge branch 'ps/config-env-pairs'Junio C Hamano1-0/+1
Introduce two new ways to feed configuration variable-value pairs via environment variables, and tweak the way GIT_CONFIG_PARAMETERS encodes variable/value pairs to make it more robust. * ps/config-env-pairs: config: allow specifying config entries via envvar pairs environment: make `getenv_safe()` a public function config: store "git -c" variables using more robust format config: parse more robust format in GIT_CONFIG_PARAMETERS config: extract function to parse config pairs quote: make sq_dequote_step() a public function config: add new way to pass config via `--config-env` git: add `--super-prefix` to usage string
2021-01-23repository: add repo reference to index_stateDerrick Stolee1-0/+1
It will be helpful to add behavior to index operations that might trigger an object lookup. Since each index belongs to a specific repository, add a 'repo' pointer to struct index_state that allows access to this repository. Add a BUG() statement if the repo already has an index, and the index already has a repo, but somehow the index points to a different repo. This will prevent future changes from needing to pass an additional 'struct repository *repo' parameter and instead rely only on the 'struct index_state *istate' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15Merge branch 'bc/rev-parse-path-format'Junio C Hamano1-0/+2
"git rev-parse" can be explicitly told to give output as absolute or relative path with the `--path-format=(absolute|relative)` option. * bc/rev-parse-path-format: rev-parse: add option for absolute or relative path formatting abspath: add a function to resolve paths with missing components
2021-01-15config: allow specifying config entries via envvar pairsPatrick Steinhardt1-0/+1
While we currently have the `GIT_CONFIG_PARAMETERS` environment variable which can be used to pass runtime configuration data to git processes, it's an internal implementation detail and not supposed to be used by end users. Next to being for internal use only, this way of passing config entries has a major downside: the config keys need to be parsed as they contain both key and value in a single variable. As such, it is left to the user to escape any potentially harmful characters in the value, which is quite hard to do if values are controlled by a third party. This commit thus adds a new way of adding config entries via the environment which gets rid of this shortcoming. If the user passes the `GIT_CONFIG_COUNT=$n` environment variable, Git will parse environment variable pairs `GIT_CONFIG_KEY_$i` and `GIT_CONFIG_VALUE_$i` for each `i` in `[0,n)`. While the same can be achieved with `git -c <name>=<value>`, one may wish to not do so for potentially sensitive information. E.g. if one wants to set `http.extraHeader` to contain an authentication token, doing so via `-c` would trivially leak those credentials via e.g. ps(1), which typically also shows command arguments. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-17Merge branch 'jk/oid-array-cleanup'Junio C Hamano1-94/+0
Code clean-up. * jk/oid-array-cleanup: commit-graph: use size_t for array allocation and indexing commit-graph: replace packed_oid_list with oid_array commit-graph: drop count_distinct_commits() function oid-array: provide a for-loop iterator oid-array: make sort function public cache.h: move hash/oid functions to hash.h t0064: make duplicate tests more robust t0064: drop sha1 mention from filename oid-array.h: drop sha1 mention from header guard
2020-12-12abspath: add a function to resolve paths with missing componentsbrian m. carlson1-0/+2
Currently, we have a function to resolve paths, strbuf_realpath. This function canonicalizes paths like realpath(3), but permits a trailing component to be absent from the file system. In other words, this is the behavior of the GNU realpath(1) without any arguments. In the future, we'll need this same behavior, except that we want to allow for any number of missing trailing components, which is the behavior of GNU realpath(1) with the -m option. This is useful because we'll want to canonicalize a path that may point to a not yet present path under the .git directory. For example, a user may want to know where an arbitrary ref would be stored if it existed in the file system. Let's refactor strbuf_realpath to move most of the code to an internal function and then pass it two flags to control its behavior. We'll add a strbuf_realpath_forgiving function that has our new behavior, and leave strbuf_realpath with the older, stricter behavior. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08Merge branch 'mt/do-not-use-scld-in-working-tree'Junio C Hamano1-1/+6
"git apply" adjusted the permission bits of working-tree files and directories according core.sharedRepository setting by mistake and for a long time, which has been corrected. * mt/do-not-use-scld-in-working-tree: apply: don't use core.sharedRepository to create working tree files
2020-12-04cache.h: move hash/oid functions to hash.hJeff King1-94/+0
We define git_hash_algo and object_id in hash.h, but most of the utility functions are declared in the main cache.h. Let's move them to hash.h along with their struct definitions. This cleans up cache.h a bit, but also avoids circular dependencies when other headers need to know about these functions (e.g., if oid-array.h were to have an inline that used oideq(), it couldn't include cache.h because it is itself included by cache.h). No including C files should be affected, because hash.h is always included in cache.h already. We do have to mention repository.h at the top of hash.h, though, since we depend on the_repository in some of our inline functions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-02apply: don't use core.sharedRepository to create working tree filesMatheus Tavares1-1/+6
core.sharedRepository defines which permissions Git should set when creating files in $GIT_DIR, so that the repository may be shared with other users. But (in its current form) the setting shouldn't affect how files are created in the working tree. This is not respected by apply and am (which uses apply), when creating leading directories: $ cat d.patch diff --git a/d/f b/d/f new file mode 100644 index 0000000..e69de29 Apply without the setting: $ umask 0077 $ git apply d.patch $ ls -ld d drwx------ Apply with the setting: $ umask 0077 $ git -c core.sharedRepository=0770 apply d.patch $ ls -ld d drwxrws--- Only the leading directories are affected. That's because they are created with safe_create_leading_directories(), which calls adjust_shared_perm() to set the directories' permissions based on core.sharedRepository. To fix that, let's introduce a variant of this function that ignores the setting, and use it in apply. Also add a regression test and a note in the function documentation about the use of each variant according to the destination (working tree or git dir). Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-24move sleep_millisec to git-compat-util.hHan-Wen Nienhuys1-1/+0
The sleep function is defined in wrapper.c, so it makes more sense to be a in system compatibility header. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-22builtin/clone: avoid failure with GIT_DEFAULT_HASHbrian m. carlson1-1/+1
If a user is cloning a SHA-1 repository with GIT_DEFAULT_HASH set to "sha256", then we can end up with a repository where the repository format version is 0 but the extensions.objectformat key is set to "sha256". This is both wrong (the user has a SHA-1 repository) and nonfunctional (because the extension cannot be used in a v0 repository). This happens because in a clone, we initially set up the repository, and then change its algorithm based on what the remote side tells us it's using. We've initially set up the repository as SHA-256 in this case, and then later on reset the repository version without clearing the extension. We could just always set the extension in this case, but that would mean that our SHA-1 repositories weren't compatible with older Git versions, even though there's no reason why they shouldn't be. And we also don't want to initialize the repository as SHA-1 initially, since that means if we're cloning an empty repository, we'll have failed to honor the GIT_DEFAULT_HASH variable and will end up with a SHA-1 repository, not a SHA-256 repository. Neither of those are appealing, so let's tell the repository initialization code if we're doing a reinit like this, and if so, to clear the extension if we're using SHA-1. This makes sure we produce a valid and functional repository and doesn't break any of our other use cases. Reported-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-09Merge branch 'jt/interpret-branch-name-fallback'Junio C Hamano1-8/+19
"git status" has trouble showing where it came from by interpreting reflog entries that recordcertain events, e.g. "checkout @{u}", and gives a hard/fatal error. Even though it inherently is impossible to give a correct answer because the reflog entries lose some information (e.g. "@{u}" does not record what branch the user was on hence which branch 'the upstream' needs to be computed, and even if the record were available, the relationship between branches may have changed), at least hide the error to allow "status" show its output. * jt/interpret-branch-name-fallback: wt-status: tolerate dangling marks refs: move dwim_ref() to header file sha1-name: replace unsigned int with option struct
2020-09-02wt-status: tolerate dangling marksJonathan Tan1-0/+7
When a user checks out the upstream branch of HEAD, the upstream branch not being a local branch, and then runs "git status", like this: git clone $URL client cd client git checkout @{u} git status no status is printed, but instead an error message: fatal: HEAD does not point to a branch (This error message when running "git branch" persists even after checking out other things - it only stops after checking out a branch.) This is because "git status" reads the reflog when determining the "HEAD detached" message, and thus attempts to DWIM "@{u}", but that doesn't work because HEAD no longer points to a branch. Therefore, when calculating the status of a worktree, tolerate dangling marks. This is done by adding an additional parameter to dwim_ref() and repo_dwim_ref(). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-02sha1-name: replace unsigned int with option structJonathan Tan1-8/+12
In preparation for a future patch adding a boolean parameter to repo_interpret_branch_name(), which might be easily confused with an existing unsigned int parameter, refactor repo_interpret_branch_name() to take an option struct instead of the unsigned int parameter. The static function interpret_branch_mark() is also updated to take the option struct in preparation for that future patch, since it will also make use of the to-be-introduced boolean parameter. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-27Merge branch 'jk/leakfix'Junio C Hamano1-2/+2
Code clean-up. * jk/leakfix: submodule--helper: fix leak of core.worktree value config: fix leak in git_config_get_expiry_in_days() config: drop git_config_get_string_const() config: fix leaks from git_config_get_string_const() checkout: fix leak of non-existent branch names submodule--helper: use strbuf_release() to free strbufs clear_pattern_list(): clear embedded hashmaps
2020-08-17config: drop git_config_get_string_const()Jeff King1-2/+2
As evidenced by the leak fixes in the previous commit, the "const" in git_config_get_string_const() clearly misleads people into thinking that it does not allocate a copy of the string. We can fix this by renaming it, but it's easier still to just drop it. Of the four remaining callers: - The one in git_config_parse_expiry() still needs to allocate, since that's what its callers expect. We can just use the non-const version and cast our pointer. Slightly ugly, but the damage is contained in one spot. - The two in apply are writing to global "const char *" variables, and need to continue allocating. We often mark these as const because we assign default string literals to them. But in this case we don't do that, so we can just declare them as real "char *" pointers and use the non-const version. - The call in checkout doesn't actually need a copy; it can just use the non-allocating "tmp" version of the function. The function is also mentioned in the MyFirstContribution document. We can swap that call out for the non-allocating "tmp" variant, which fits well in the example given. We'll drop the "configset" and "repo" variants, as well (which are unused). Note that this frees up the "const" name, so we could rename the "tmp" variant back to that. But let's give some time for topics in flight to adapt to the new code before doing so (if we do it too soon, the function semantics will change but the compiler won't alert us). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-30Merge branch 'jk/reject-newer-extensions-in-v0' into masterJunio C Hamano1-0/+2
With the base fix to 2.27 regresion, any new extensions in a v0 repository would still be silently honored, which is not quite right. Instead, complain and die loudly. * jk/reject-newer-extensions-in-v0: verify_repository_format(): complain about new extensions in v0 repo
2020-07-16Merge branch 'jn/v0-with-extensions-fix' into masterJunio C Hamano1-1/+0
In 2.28-rc0, we corrected a bug that some repository extensions are honored by mistake even in a version 0 repositories (these configuration variables in extensions.* namespace were supposed to have special meaning in repositories whose version numbers are 1 or higher), but this was a bit too big a change. * jn/v0-with-extensions-fix: repository: allow repository format upgrade with extensions Revert "check_repository_format_gently(): refuse extensions for old repositories"
2020-07-16verify_repository_format(): complain about new extensions in v0 repoJeff King1-0/+2
We made the mistake in the past of respecting extensions.* even when the repository format version was set to 0. This is bad because forgetting to bump the repository version means that older versions of Git (which do not know about our extensions) won't complain. I.e., it's not a problem in itself, but it means your repository is in a state which does not give you the protection you think you're getting from older versions. For compatibility reasons, we are stuck with that decision for existing extensions. However, we'd prefer not to extend the damage further. We can do that by catching any newly-added extensions and complaining about the repository format. Note that this is a pretty heavy hammer: we'll refuse to work with the repository at all. A lesser option would be to ignore (possibly with a warning) any new extensions. But because of the way the extensions are handled, that puts the burden on each new extension that is added to remember to "undo" itself (because they are handled before we know for sure whether we are in a v1 repo or not, since we don't insist on a particular ordering of config entries). So one option would be to rewrite that handling to record any new extensions (and their values) during the config parse, and then only after proceed to handle new ones only if we're in a v1 repository. But I'm not sure if it's worth the trouble: - ignoring extensions is likely to end up with broken results anyway (e.g., ignoring a proposed objectformat extension means parsing any object data is likely to encounter errors) - this is a sign that whatever tool wrote the extension field is broken. We may be better off notifying immediately and forcefully so that such tools don't even appear to work accidentally. The only downside is that fixing the situation is a little tricky, because programs like "git config" won't want to work with the repository. But: git config --file=.git/config core.repositoryformatversion 1 should still suffice. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16repository: allow repository format upgrade with extensionsJonathan Nieder1-1/+0
Now that we officially permit repository extensions in repository format v0, permit upgrading a repository with extensions from v0 to v1 as well. For example, this means a repository where the user has set "extensions.preciousObjects" can use "git fetch --filter=blob:none origin" to upgrade the repository to use v1 and the partial clone extension. To avoid mistakes, continue to forbid repository format upgrades in v0 repositories with an unrecognized extension. This way, a v0 user using a misspelled extension field gets a chance to correct the mistake before updating to the less forgiving v1 format. While we're here, make the error message for failure to upgrade the repository format a bit shorter, and present it as an error, not a warning. Reported-by: Huan Huan Chen <huanhuanchen@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-06Merge branch 'js/default-branch-name'Junio C Hamano1-1/+1
The name of the primary branch in existing repositories, and the default name used for the first branch in newly created repositories, is made configurable, so that we can eventually wean ourselves off of the hardcoded 'master'. * js/default-branch-name: contrib: subtree: adjust test to change in fmt-merge-msg testsvn: respect `init.defaultBranch` remote: use the configured default branch name when appropriate clone: use configured default branch name when appropriate init: allow setting the default for the initial branch name via the config init: allow specifying the initial branch name for the new repository docs: add missing diamond brackets submodule: fall back to remote's HEAD for missing remote.<name>.branch send-pack/transport-helper: avoid mentioning a particular branch fmt-merge-msg: stop treating `master` specially
2020-06-24init: allow specifying the initial branch name for the new repositoryJohannes Schindelin1-1/+1
There is a growing number of projects and companies desiring to change the main branch name of their repositories (see e.g. https://twitter.com/mislav/status/1270388510684598272 for background on this). To change that branch name for new repositories, currently the only way to do that automatically is by copying all of Git's template directory, then hard-coding the desired default branch name into the `.git/HEAD` file, and then configuring `init.templateDir` to point to those copied template files. To make this process much less cumbersome, let's introduce a new option: `--initial-branch=<branch-name>`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-05repository: add a helper function to perform repository format upgradeXin Li1-0/+1
In version 1 of repository format, "extensions" gained special meaning and it is safer to avoid upgrading when there are pre-existing extensions. Make list-objects-filter to use the helper function instead of setting repository version directly as a prerequisite of exposing the upgrade capability. Signed-off-by: Xin Li <delphij@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-22Merge branch 'jk/oid-array-cleanups'Junio C Hamano1-1/+1
Code cleanup. * jk/oid-array-cleanups: oidset: stop referring to sha1-array ref-filter: stop referring to "sha1 array" bisect: stop referring to sha1_array test-tool: rename sha1-array to oid-array oid_array: rename source file from sha1-array oid_array: use size_t for iteration oid_array: use size_t for count and allocation
2020-03-30oid_array: rename source file from sha1-arrayJeff King1-1/+1
We renamed the actual data structure in 910650d2f8 (Rename sha1_array to oid_array, 2017-03-31), but the file is still called sha1-array. Besides being slightly confusing, it makes it more annoying to grep for leftover occurrences of "sha1" in various files, because the header is included in so many places. Let's complete the transition by renaming the source and header files (and fixing up a few comment references). I kept the "-" in the name, as that seems to be our style; cf. fc1395f4a4 (sha1_file.c: rename to use dash in file name, 2018-04-10). We also have oidmap.h and oidset.h without any punctuation, but those are "struct oidmap" and "struct oidset" in the code. We _could_ make this "oidarray" to match, but somehow it looks uglier to me because of the length of "array" (plus it would be a very invasive patch for little gain). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-26Merge branch 'bc/filter-process'Junio C Hamano1-0/+1
Provide more information (e.g. the object of the tree-ish in which the blob being converted appears, in addition to its path, which has already been given) to smudge/clean conversion filters. * bc/filter-process: t0021: test filter metadata for additional cases builtin/reset: compute checkout metadata for reset builtin/rebase: compute checkout metadata for rebases builtin/clone: compute checkout metadata for clones builtin/checkout: compute checkout metadata for checkouts convert: provide additional metadata to filters convert: permit passing additional metadata to filter processes builtin/checkout: pass branch info down to checkout_worktree
2020-03-26Merge branch 'bc/sha-256-part-1-of-4'Junio C Hamano1-2/+23
SHA-256 transition continues. * bc/sha-256-part-1-of-4: (22 commits) fast-import: add options for rewriting submodules fast-import: add a generic function to iterate over marks fast-import: make find_marks work on any mark set fast-import: add helper function for inserting mark object entries fast-import: permit reading multiple marks files commit: use expected signature header for SHA-256 worktree: allow repository version 1 init-db: move writing repo version into a function builtin/init-db: add environment variable for new repo hash builtin/init-db: allow specifying hash algorithm on command line setup: allow check_repository_format to read repository format t/helper: make repository tests hash independent t/helper: initialize repository if necessary t/helper/test-dump-split-index: initialize git repository t6300: make hash algorithm independent t6300: abstract away SHA-1-specific constants t: use hash-specific lookup tables to define test constants repository: require a build flag to use SHA-256 hex: add functions to parse hex object IDs in any algorithm hex: introduce parsing variants taking hash algorithms ...
2020-03-16convert: provide additional metadata to filtersbrian m. carlson1-0/+1
Now that we have the codebase wired up to pass any additional metadata to filters, let's collect the additional metadata that we'd like to pass. The two main places we pass this metadata are checkouts and archives. In these two situations, reading HEAD isn't a valid option, since HEAD isn't updated for checkouts until after the working tree is written and archives can accept an arbitrary tree. In other situations, HEAD will usually reflect the refname of the branch in current use. We pass a smaller amount of data in other cases, such as git cat-file, where we can really only logically know about the blob. This commit updates only the parts of the checkout code where we don't use unpack_trees. That function and callers of it will be handled in a future commit. In the archive code, we leak a small amount of memory, since nothing we pass in the archiver argument structure is freed. Signed-off-by: brian m. carlson <bk2204@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-10real_path_if_valid(): remove unsafe APIAlexandr Miloslavskiy1-1/+0
This commit continues the work started with previous commit. Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-10real_path: remove unsafe APIAlexandr Miloslavskiy1-1/+0
Returning a shared buffer invites very subtle bugs due to reentrancy or multi-threading, as demonstrated by the previous patch. There was an unfinished effort to abolish this [1]. Let's finally rid of `real_path()`, using `strbuf_realpath()` instead. This patch uses a local `strbuf` for most places where `real_path()` was previously called. However, two places return the value of `real_path()` to the caller. For them, a `static` local `strbuf` was added, effectively pushing the problem one level higher: read_gitfile_gently() get_superproject_working_tree() [1] https://lore.kernel.org/git/1480964316-99305-1-git-send-email-bmwill@google.com/ Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-06set_git_dir: fix crash when used with real_path()Alexandr Miloslavskiy1-1/+1
`real_path()` returns result from a shared buffer, inviting subtle reentrance bugs. One of these bugs occur when invoked this way: set_git_dir(real_path(git_dir)) In this case, `real_path()` has reentrance: real_path read_gitfile_gently repo_set_gitdir setup_git_env set_git_dir_1 set_git_dir Later, `set_git_dir()` uses its now-dead parameter: !is_absolute_path(path) Fix this by using a dedicated `strbuf` to hold `strbuf_realpath()`. Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24init-db: move writing repo version into a functionbrian m. carlson1-0/+1
When we perform a clone, we won't know the remote side's hash algorithm until we've read the heads. Consequently, we'll need to rewrite the repository format version and hash algorithm once we know what the remote side has. Move the code that does this into its own function so that we can call it from clone in the future. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24builtin/init-db: allow specifying hash algorithm on command linebrian m. carlson1-1/+2
Allow the user to specify the hash algorithm on the command line by using the --object-format option to git init. Validate that the user is not attempting to reinitialize a repository with a different hash algorithm. Ensure that if we are writing a non-SHA-1 repository that we set the repository version to 1 and write the objectFormat extension. Restrict this option to work only when ENABLE_SHA256 is set until the codebase is in a situation to fully support this. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24setup: allow check_repository_format to read repository formatbrian m. carlson1-1/+3
In some cases, we will want to not only check the repository format, but extract the information that we've gained. To do so, allow check_repository_format to take a pointer to struct repository_format. Allow passing NULL for this argument if we're not interested in the information, and pass NULL for all existing callers. A future patch will make use of this information. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24hex: add functions to parse hex object IDs in any algorithmbrian m. carlson1-0/+10
There are some places where we need to parse a hex object ID in any algorithm without knowing beforehand which algorithm is in use. An example is when parsing fast-import marks. Add a get_oid_hex_any to parse an object ID and return the algorithm it belongs to, and additionally add parse_oid_hex_any which is the equivalent change for parse_oid_hex. If the object is not parseable, we return GIT_HASH_UNKNOWN. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24hex: introduce parsing variants taking hash algorithmsbrian m. carlson1-0/+7
Introduce variants of get_oid_hex and parse_oid_hex that parse an arbitrary hash algorithm, implementing internal functions to avoid duplication. These functions can be used in the transport code to parse refs properly. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14Merge branch 'mt/use-passed-repo-more-in-funcs'Junio C Hamano1-1/+2
Some codepaths were given a repository instance as a parameter to work in the repository, but passed the_repository instance to its callees, which has been cleaned up (somewhat). * mt/use-passed-repo-more-in-funcs: sha1-file: allow check_object_signature() to handle any repo sha1-file: pass git_hash_algo to hash_object_file() sha1-file: pass git_hash_algo to write_object_file_prepare() streaming: allow open_istream() to handle any repo pack-check: use given repo's hash_algo at verify_packfile() cache-tree: use given repo's hash_algo at verify_one() diff: make diff_populate_filespec() honor its repo argument
2020-01-31sha1-file: allow check_object_signature() to handle any repoMatheus Tavares1-1/+2
Some callers of check_object_signature() can work on arbitrary repositories, but the repo does not get passed to this function. Instead, the_repository is always used internally. To fix possible inconsistencies, allow the function to receive a struct repository and make those callers pass on the repo being handled. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-13fsmonitor: change last update timestamp on the index_state to opaque tokenKevin Willford1-1/+1
Some file system monitors might not use or take a timestamp for processing and in the case of watchman could have race conditions with using a timestamp. Watchman uses something called a clockid that is used for race free queries to it. The clockid for watchman is simply a string. Change the fsmonitor_last_update from being a uint64_t to a char pointer so that any arbitrary data can be stored in it and passed back to the fsmonitor. Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-06Merge branch 'ds/sparse-cone'Junio C Hamano1-2/+2
Code cleanup. * ds/sparse-cone: Documentation/git-sparse-checkout.txt: fix a typo sparse-checkout: use extern for global variables
2020-01-02sparse-checkout: use extern for global variablesDerrick Stolee1-2/+2
When the core.sparseCheckoutCone config setting was added in 879321eb0b ("sparse-checkout: add 'cone' mode" 2019-11-21), the variables storing the config values for core.sparseCheckout and core.sparseCheckoutCone were rearranged in cache.h, but in doing so the "extern" keyword was dropped. While we are tending to drop the "extern" keyword for function declarations, it is still necessary for global variables used across multiple *.c files. The impact of not having the extern keyword may be unpredictable. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-25Merge branch 'ds/sparse-cone'Junio C Hamano1-1/+5
Management of sparsely checked-out working tree has gained a dedicated "sparse-checkout" command. * ds/sparse-cone: (21 commits) sparse-checkout: improve OS ls compatibility sparse-checkout: respect core.ignoreCase in cone mode sparse-checkout: check for dirty status sparse-checkout: update working directory in-process for 'init' sparse-checkout: cone mode should not interact with .gitignore sparse-checkout: write using lockfile sparse-checkout: use in-process update for disable subcommand sparse-checkout: update working directory in-process sparse-checkout: sanitize for nested folders unpack-trees: add progress to clear_ce_flags() unpack-trees: hash less in cone mode sparse-checkout: init and set in cone mode sparse-checkout: use hashmaps for cone patterns sparse-checkout: add 'cone' mode trace2: add region in clear_ce_flags sparse-checkout: create 'disable' subcommand sparse-checkout: add '--stdin' option to set subcommand sparse-checkout: 'set' subcommand clone: add --sparse mode sparse-checkout: create 'init' subcommand ...
2019-12-16Merge branch 'hw/doc-in-header'Junio C Hamano1-4/+37
* hw/doc-in-header: trace2: move doc to trace2.h submodule-config: move doc to submodule-config.h tree-walk: move doc to tree-walk.h trace: move doc to trace.h run-command: move doc to run-command.h parse-options: add link to doc file in parse-options.h credential: move doc to credential.h argv-array: move doc to argv-array.h cache: move doc to cache.h sigchain: move doc to sigchain.h pathspec: move doc to pathspec.h revision: move doc to revision.h attr: move doc to attr.h refs: move doc to refs.h remote: move doc to remote.h and refspec.h sha1-array: move doc to sha1-array.h merge: move doc to ll-merge.h graph: move doc to graph.h and graph.c dir: move doc to dir.h diff: move doc to diff.h and diffcore.h
2019-11-22unpack-trees: add progress to clear_ce_flags()Derrick Stolee1-0/+2
When a large repository has many sparse-checkout patterns, the process for updating the skip-worktree bits can take long enough that a user gets confused why nothing is happening. Update the clear_ce_flags() method to write progress. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22sparse-checkout: add 'cone' modeDerrick Stolee1-1/+3
The sparse-checkout feature can have quadratic performance as the number of patterns and number of entries in the index grow. If there are 1,000 patterns and 1,000,000 entries, this time can be very significant. Create a new Boolean config option, core.sparseCheckoutCone, to indicate that we expect the sparse-checkout file to contain a more limited set of patterns. This is a separate config setting from core.sparseCheckout to avoid breaking older clients by introducing a tri-state option. The config option does nothing right now, but will be expanded upon in a later commit. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-18cache: move doc to cache.hHeba Waly1-4/+37
Move the documentation from Documentation/technical/api-allocation-growing.txt to cache.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Also documentation/technical/api-allocation-growing.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header file. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-13hex: drop sha1_to_hex()Jeff King1-3/+4
There's only a single caller left of sha1_to_hex(), since everybody that has an object name in "unsigned char[]" now uses hash_to_hex() instead. This case is in the sha1dc wrapper, where we print a hex sha1 when we find a collision. This one will always be sha1, regardless of the current hash algorithm, so we can't use hash_to_hex() here. In practice we'd probably not be running sha1 at all if it isn't the current algorithm, but it's possible we might still occasionally need to compute a sha1 in a post-sha256 world. Since sha1_to_hex() is just a wrapper for hash_to_hex_algop(), let's call that ourselves. There's value in getting rid of the sha1-specific wrapper to de-clutter the global namespace, and to make sure nobody uses it (and as with sha1_to_hex_r() in the previous patch, we'll drop the coccinelle transformations, too). The sha1_to_hex() function is mentioned in a comment; we can easily swap that out for oid_to_hex() to give a better example. Also update the comment that was left stale when we added "struct object_id *" as a way to name an object and added functions to convert it to hex. The function is also mentioned in some test vectors in t4100, but that's not runnable code, so there's no point in trying to clean it up. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-11hex: drop sha1_to_hex_r()Jeff King1-1/+0
There are no callers left; everybody uses oid_to_hex_r() or hash_to_hex_algop_r(). This used to actually be the underlying implementation for oid_to_hex_r(), but that's no longer the case since 47edb64997 (hex: introduce functions to print arbitrary hashes, 2018-11-14). Let's get rid of it to de-clutter and to make sure nobody uses it. Likewise we can drop the coccinelle rules that mention it, since the compiler will make it quite clear that the code does not work. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-15Merge branch 'js/azure-pipelines-msvc'Junio C Hamano1-0/+13
CI updates. * js/azure-pipelines-msvc: ci: also build and test with MS Visual Studio on Azure Pipelines ci: really use shallow clones on Azure Pipelines tests: let --immediate and --write-junit-xml play well together test-tool run-command: learn to run (parts of) the testsuite vcxproj: include more generated files vcxproj: only copy `git-remote-http.exe` once it was built msvc: work around a bug in GetEnvironmentVariable() msvc: handle DEVELOPER=1 msvc: ignore some libraries when linking compat/win32/path-utils.h: add #include guards winansi: use FLEX_ARRAY to avoid compiler warning msvc: avoid using minus operator on unsigned types push: do not pretend to return `int` from `die_push_simple()`
2019-10-11Merge branch 'bc/object-id-part17'Junio C Hamano1-7/+1
Preparation for SHA-256 upgrade continues. * bc/object-id-part17: (26 commits) midx: switch to using the_hash_algo builtin/show-index: replace sha1_to_hex rerere: replace sha1_to_hex builtin/receive-pack: replace sha1_to_hex builtin/index-pack: replace sha1_to_hex packfile: replace sha1_to_hex wt-status: convert struct wt_status to object_id cache: remove null_sha1 builtin/worktree: switch null_sha1 to null_oid builtin/repack: write object IDs of the proper length pack-write: use hash_to_hex when writing checksums sequencer: convert to use the_hash_algo bisect: switch to using the_hash_algo sha1-lookup: switch hard-coded constants to the_hash_algo config: use the_hash_algo in abbrev comparison combine-diff: replace GIT_SHA1_HEXSZ with the_hash_algo bundle: switch to use the_hash_algo connected: switch GIT_SHA1_HEXSZ to the_hash_algo show-index: switch hard-coded constants to the_hash_algo blame: remove needless comparison with GIT_SHA1_HEXSZ ...
2019-10-07Merge branch 'ss/get-time-cleanup'Junio C Hamano1-3/+2
Code simplification. * ss/get-time-cleanup: test_date.c: remove reference to GIT_TEST_DATE_NOW Quit passing 'now' to date code
2019-10-07Merge branch 'tg/stash-refresh-index'Junio C Hamano1-0/+18
"git stash" learned to write refreshed index back to disk. * tg/stash-refresh-index: stash: make sure to write refreshed cache merge: use refresh_and_write_cache factor out refresh_and_write_cache function
2019-10-06msvc: avoid using minus operator on unsigned typesJohannes Schindelin1-0/+13
MSVC complains about this with `-Wall`, which can be taken as a sign that this is indeed a real bug. The symptom is: C4146: unary minus operator applied to unsigned type, result still unsigned Let's avoid this warning in the minimal way, e.g. writing `-1 - <unsigned value>` instead of `-<unsigned value> - 1`. Note that the change in the `estimate_cache_size()` function is needed because MSVC considers the "return type" of the `sizeof()` operator to be `size_t`, i.e. unsigned, and therefore it cannot be negated using the unary minus operator. Even worse, that arithmetic is doing extra work, in vain. We want to calculate the entry extra cache size as the difference between the size of the `cache_entry` structure minus the size of the `ondisk_cache_entry` structure, padded to the appropriate alignment boundary. To that end, we start by assigning that difference to the `per_entry` variable, and then abuse the `len` parameter of the `align_padding_size()` macro to take the negative size of the ondisk entry size. Essentially, we try to avoid passing the already calculated difference to that macro by passing the operands of that difference instead, when the macro expects operands of an addition: #define align_padding_size(size, len) \ ((size + (len) + 8) & ~7) - (size + len) Currently, we pass A and -B to that macro instead of passing A - B and 0, where A - B is already stored in the `per_entry` variable, ready to be used. This is neither necessary, nor intuitive. Let's fix this, and have code that is both easier to read and that also does not trigger MSVC's warning. While at it, we take care of reporting overflows (which are unlikely, but hey, defensive programming is good!). We _also_ take pains of casting the unsigned value to signed: otherwise, the signed operand (i.e. the `-1`) would be cast to unsigned before doing the arithmetic. Helped-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-20factor out refresh_and_write_cache functionThomas Gummerer1-0/+18
Getting the lock for the index, refreshing it and then writing it is a pattern that happens more than once throughout the codebase, and isn't trivial to get right. Factor out the refresh_and_write_cache function from builtin/am.c to read-cache.c, so it can be re-used in other places in a subsequent commit. Note that we return different error codes for failing to refresh the cache, and failing to write the index. The current caller only cares about failing to write the index. However for other callers we're going to convert in subsequent patches we will need this distinction. Helped-by: Martin Ågren <martin.agren@gmail.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-18Merge branch 'md/list-objects-filter-combo'Junio C Hamano1-0/+22
The list-objects-filter API (used to create a sparse/lazy clone) learned to take a combined filter specification. * md/list-objects-filter-combo: list-objects-filter-options: make parser void list-objects-filter-options: clean up use of ALLOC_GROW list-objects-filter-options: allow mult. --filter strbuf: give URL-encoding API a char predicate fn list-objects-filter-options: make filter_spec a string_list list-objects-filter-options: move error check up list-objects-filter: implement composite filters list-objects-filter-options: always supply *errbuf list-objects-filter: put omits set in filter struct list-objects-filter: encapsulate filter components
2019-09-18Merge branch 'cc/multi-promisor'Junio C Hamano1-2/+0
Teach the lazy clone machinery that there can be more than one promisor remote and consult them in order when downloading missing objects on demand. * cc/multi-promisor: Move core_partial_clone_filter_default to promisor-remote.c Move repository_format_partial_clone to promisor-remote.c Remove fetch-object.{c,h} in favor of promisor-remote.{c,h} remote: add promisor and partial clone config to the doc partial-clone: add multiple remotes in the doc t0410: test fetching from many promisor remotes builtin/fetch: remove unique promisor remote limitation promisor-remote: parse remote.*.partialclonefilter Use promisor_remote_get_direct() and has_promisor_remote() promisor-remote: use repository_format_partial_clone promisor-remote: add promisor_remote_reinit() promisor-remote: implement promisor_remote_get_direct() Add initial support for many promisor remotes fetch-object: make functions return an error code t0410: remove pipes after git commands
2019-09-12Quit passing 'now' to date codeStephen P. Smith1-3/+2
Commit b841d4ff43 (Add `human` format to test-tool, 2019-01-28) added a get_time() function which allows $GIT_TEST_DATE_NOW in the environment to override the current time. So we no longer need to interpret that variable in cmd__date(). Therefore, we can stop passing the "now" parameter down through the date functions, since nobody uses them. Note that we do need to make sure all of the previous callers that took a "now" parameter are correctly using get_time(). Signed-off-by: Stephen P. Smith <ischis2@cox.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-19cache: remove null_sha1brian m. carlson1-7/+1
All of the existing uses of null_sha1 can be converted into uses of null_oid, so do so. Remove null_sha1 and is_null_sha1, and define is_null_oid in terms of null_oid. This also has the additional benefit of removing several uses of sha1_to_hex. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-29Merge branch 'sg/rebase-progress' into maintJunio C Hamano1-0/+1
Use "Erase in Line" CSI sequence that is already used in the editor support to clear cruft in the progress output. * sg/rebase-progress: progress: use term_clear_line() rebase: fix garbled progress display with '-x' pager: add a helper function to clear the last line in the terminal t3404: make the 'rebase.missingCommitsCheck=ignore' test more focused t3404: modernize here doc style
2019-07-19Merge branch 'nd/tree-walk-with-repo'Junio C Hamano1-3/+4
The tree-walk API learned to pass an in-core repository instance throughout more codepaths. * nd/tree-walk-with-repo: t7814: do not generate same commits in different repos Use the right 'struct repository' instead of the_repository match-trees.c: remove the_repo from shift_tree*() tree-walk.c: remove the_repo from get_tree_entry_follow_symlinks() tree-walk.c: remove the_repo from get_tree_entry() tree-walk.c: remove the_repo from fill_tree_descriptor() sha1-file.c: remove the_repo from read_object_with_reference()
2019-07-09Merge branch 'sg/rebase-progress'Junio C Hamano1-0/+1
Use "Erase in Line" CSI sequence that is already used in the editor support to clear cruft in the progress output. * sg/rebase-progress: progress: use term_clear_line() rebase: fix garbled progress display with '-x' pager: add a helper function to clear the last line in the terminal t3404: make the 'rebase.missingCommitsCheck=ignore' test more focused t3404: modernize here doc style
2019-06-28list-objects-filter-options: clean up use of ALLOC_GROWMatthew DeVore1-0/+22
Introduce a new macro ALLOC_GROW_BY which automatically zeros the added array elements and takes care of updating the nr value. Use the macro in code introduced earlier in this patchset. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-27match-trees.c: remove the_repo from shift_tree*()Nguyễn Thái Ngọc Duy1-2/+2
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-27sha1-file.c: remove the_repo from read_object_with_reference()Nguyễn Thái Ngọc Duy1-1/+2
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-25Move core_partial_clone_filter_default to promisor-remote.cChristian Couder1-1/+0
Now that we can have a different default partial clone filter for each promisor remote, let's hide core_partial_clone_filter_default as a static in promisor-remote.c to avoid it being use for anything other than managing backward compatibility. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-25Move repository_format_partial_clone to promisor-remote.cChristian Couder1-1/+0
Now that we have has_promisor_remote() and can use many promisor remotes, let's hide repository_format_partial_clone as a static in promisor-remote.c to avoid it being use for anything other than managing backward compatibility. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-24pager: add a helper function to clear the last line in the terminalSZEDER Gábor1-0/+1
There are a couple of places where we want to clear the last line on the terminal, e.g. when a progress bar line is overwritten by a shorter line, then the end of that progress line would remain visible, unless we cover it up. In 'progress.c' we did this by always appending a fixed number of space characters to the next line (even if it was not shorter than the previous), but as it turned out that fixed number was not quite large enough, see the fix in 9f1fd84e15 (progress: clear previous progress update dynamically, 2019-04-12). From then on we've been keeping track of the length of the last displayed progress line and appending the appropriate number of space characters to the next line, if necessary, but, alas, this approach turned out to be error prone, see the fix in 1aed1a5f25 (progress: avoid empty line when breaking the progress line, 2019-05-19). The next patch in this series is about to fix a case where we don't clear the last line, and on occasion do end up with such garbage at the end of the line. It would be great if we could do that without the need to deal with that without meticulously computing the necessary number of space characters. So add a helper function to clear the last line on the terminal using an ANSI escape sequence, which has the advantage to clear the whole line no matter how wide it is, even after the terminal width changed. Such an escape sequence is not available on dumb terminals, though, so in that case fall back to simply print a whole terminal width (as reported by term_columns()) worth of space characters. In 'editor.c' launch_specified_editor() already used this ANSI escape sequence, so replace it with a call to this function. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20hash.h: move object_id definition from cache.hJeff King1-24/+0
Our hashmap.h helpfully defines a sha1hash() function. But it cannot define a similar oidhash() without including all of cache.h, which itself wants to include hashmap.h! Let's break this circular dependency by moving the definition to hash.h, along with the remaining RAWSZ macros, etc. That will put them with the existing git_hash_algo definition. One alternative would be to move oidhash() into cache.h, but it's already quite bloated. We're better off moving things out than in. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-28fill_stat_cache_info(): prepare for an fsmonitor fixJohannes Schindelin1-1/+1
We will need to pass down the `struct index_state` to `mark_fsmonitor_valid()` for an upcoming bug fix, and this here function calls that there function, so we need to extend the signature of `fill_stat_cache_info()` first. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-19Merge branch 'js/fsmonitor-refresh-after-discarding-index'Junio C Hamano1-1/+2
The fsmonitor interface got out of sync after the in-core index file gets discarded, which has been corrected. * js/fsmonitor-refresh-after-discarding-index: fsmonitor: force a refresh after the index was discarded fsmonitor: demonstrate that it is not refreshed after discard_index()
2019-05-13Merge branch 'dl/no-extern-in-func-decl'Junio C Hamano1-178/+178
Mechanically and systematically drop "extern" from function declarlation. * dl/no-extern-in-func-decl: *.[ch]: manually align parameter lists *.[ch]: remove extern from function declarations using sed *.[ch]: remove extern from function declarations using spatch
2019-05-09Merge branch 'nd/sha1-name-c-wo-the-repository'Junio C Hamano1-16/+36
Further code clean-up to allow the lowest level of name-to-object mapping layer to work with a passed-in repository other than the default one. * nd/sha1-name-c-wo-the-repository: (34 commits) sha1-name.c: remove the_repo from get_oid_mb() sha1-name.c: remove the_repo from other get_oid_* sha1-name.c: remove the_repo from maybe_die_on_misspelt_object_name submodule-config.c: use repo_get_oid for reading .gitmodules sha1-name.c: add repo_get_oid() sha1-name.c: remove the_repo from get_oid_with_context_1() sha1-name.c: remove the_repo from resolve_relative_path() sha1-name.c: remove the_repo from diagnose_invalid_index_path() sha1-name.c: remove the_repo from handle_one_ref() sha1-name.c: remove the_repo from get_oid_1() sha1-name.c: remove the_repo from get_oid_basic() sha1-name.c: remove the_repo from get_describe_name() sha1-name.c: remove the_repo from get_oid_oneline() sha1-name.c: add repo_interpret_branch_name() sha1-name.c: remove the_repo from interpret_branch_mark() sha1-name.c: remove the_repo from interpret_nth_prior_checkout() sha1-name.c: remove the_repo from get_short_oid() sha1-name.c: add repo_for_each_abbrev() sha1-name.c: store and use repo in struct disambiguate_state sha1-name.c: add repo_find_unique_abbrev_r() ...
2019-05-09Merge branch 'en/merge-directory-renames'Junio C Hamano1-1/+1
"git merge-recursive" backend recently learned a new heuristics to infer file movement based on how other files in the same directory moved. As this is inherently less robust heuristics than the one based on the content similarity of the file itself (rather than based on what its neighbours are doing), it sometimes gives an outcome unexpected by the end users. This has been toned down to leave the renamed paths in higher/conflicted stages in the index so that the user can examine and confirm the result. * en/merge-directory-renames: merge-recursive: switch directory rename detection default merge-recursive: give callers of handle_content_merge() access to contents merge-recursive: track information associated with directory renames t6043: fix copied test description to match its purpose merge-recursive: switch from (oid,mode) pairs to a diff_filespec merge-recursive: cleanup handle_rename_* function signatures merge-recursive: track branch where rename occurred in rename struct merge-recursive: remove ren[12]_other fields from rename_conflict_info merge-recursive: shrink rename_conflict_info merge-recursive: move some struct declarations together merge-recursive: use 'ci' for rename_conflict_info variable name merge-recursive: rename locals 'o' and 'a' to 'obuf' and 'abuf' merge-recursive: rename diff_filespec 'one' to 'o' merge-recursive: rename merge_options argument from 'o' to 'opt' Use 'unsigned short' for mode, like diff_filespec does
2019-05-08fsmonitor: force a refresh after the index was discardedJohannes Schindelin1-1/+2
With this change, the `index_state` struct becomes the new home for the flag that says whether the fsmonitor hook has been run, i.e. it is now per-index. It also gets re-set when the index is discarded, fixing the bug demonstrated by the "test_expect_failure" test added in the preceding commit. In that case fsmonitor-enabled Git would miss updates under certain circumstances, see that preceding commit for details. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-05*.[ch]: manually align parameter listsDenton Liu1-21/+21
In previous patches, extern was mechanically removed from function declarations without care to formatting, causing parameter lists to be misaligned. Manually format changed sections such that the parameter lists should be realigned. Viewing this patch with 'git diff -w' should produce no output. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-05*.[ch]: remove extern from function declarations using sedDenton Liu1-2/+2
There has been a push to remove extern from function declarations. Finish the job by removing all instances of "extern" for function declarations in headers using sed. This was done by running the following on my system with sed 4.2.2: $ git ls-files \*.{c,h} | grep -v ^compat/ | xargs sed -i'' -e 's/^\(\s*\)extern \([^(]*([^*]\)/\1\2/' Files under `compat/` are intentionally excluded as some are directly copied from external sources and we should avoid churning them as much as possible. Then, leftover instances of extern were found by running $ git grep -w -C3 extern \*.{c,h} and manually checking the output. No other instances were found. Note that the regex used specifically excludes function variables which _should_ be left as extern. Not the most elegant way to do it but it gets the job done. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-05*.[ch]: remove extern from function declarations using spatchDenton Liu1-170/+170
There has been a push to remove extern from function declarations. Remove some instances of "extern" for function declarations which are caught by Coccinelle. Note that Coccinelle has some difficulty with processing functions with `__attribute__` or varargs so some `extern` declarations are left behind to be dealt with in a future patch. This was the Coccinelle patch used: @@ type T; identifier f; @@ - extern T f(...); and it was run with: $ git ls-files \*.{c,h} | grep -v ^compat/ | xargs spatch --sp-file contrib/coccinelle/noextern.cocci --in-place Files under `compat/` are intentionally excluded as some are directly copied from external sources and we should avoid churning them as much as possible. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-25Merge branch 'bp/post-index-change-hook'Junio C Hamano1-1/+3
A new hook "post-index-change" is called when the on-disk index file changes, which can help e.g. a virtualized working tree implementation. * bp/post-index-change-hook: read-cache: add post-index-change hook
2019-04-22Merge branch 'ps/stash-in-c'Junio C Hamano1-0/+5
"git stash" rewritten in C. * ps/stash-in-c: (28 commits) tests: add a special setup where stash.useBuiltin is off stash: optionally use the scripted version again stash: add back the original, scripted `git stash` stash: convert `stash--helper.c` into `stash.c` stash: replace all `write-tree` child processes with API calls stash: optimize `get_untracked_files()` and `check_changes()` stash: convert save to builtin stash: make push -q quiet stash: convert push to builtin stash: convert create to builtin stash: convert store to builtin stash: convert show to builtin stash: convert list to builtin stash: convert pop to builtin stash: convert branch to builtin stash: convert drop and clear to builtin stash: convert apply to builtin stash: mention options in `show` synopsis stash: add tests for `git stash show` config stash: rename test cases to be more descriptive ...
2019-04-16sha1-name.c: remove the_repo from get_oid_mb()Nguyễn Thái Ngọc Duy1-1/+2
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-16sha1-name.c: remove the_repo from other get_oid_*Nguyễn Thái Ngọc Duy1-6/+12
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-16sha1-name.c: remove the_repo from maybe_die_on_misspelt_object_nameNguyễn Thái Ngọc Duy1-1/+3
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-16sha1-name.c: add repo_get_oid()Nguyễn Thái Ngọc Duy1-1/+2
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-16sha1-name.c: remove the_repo from get_oid_1()Nguyễn Thái Ngọc Duy1-2/+5
There is a cyclic dependency between one of these functions so they cannot be converted one by one, so all related functions are converted at once. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-16sha1-name.c: add repo_for_each_abbrev()Nguyễn Thái Ngọc Duy1-1/+2
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-16sha1-name.c: add repo_find_unique_abbrev_r()Nguyễn Thái Ngọc Duy1-2/+4
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-08refs.c: remove the_repo from substitute_branch_name()Nguyễn Thái Ngọc Duy1-2/+6
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-08Use 'unsigned short' for mode, like diff_filespec doesElijah Newren1-1/+1
struct diff_filespec defines mode to be an 'unsigned short'. Several other places in the API which we'd like to interact with using a diff_filespec used a plain unsigned (or unsigned int). This caused problems when taking addresses, so switch to unsigned short. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-20Merge branch 'ma/clear-repository-format'Junio C Hamano1-3/+28
The setup code has been cleaned up to avoid leaks around the repository_format structure. * ma/clear-repository-format: setup: fix memory leaks with `struct repository_format` setup: free old value before setting `work_tree`
2019-03-07Merge branch 'jh/trace2'Junio C Hamano1-0/+1
A more structured way to obtain execution trace has been added. * jh/trace2: trace2: add for_each macros to clang-format trace2: t/helper/test-trace2, t0210.sh, t0211.sh, t0212.sh trace2:data: add subverb for rebase trace2:data: add subverb to reset command trace2:data: add subverb to checkout command trace2:data: pack-objects: add trace2 regions trace2:data: add trace2 instrumentation to index read/write trace2:data: add trace2 hook classification trace2:data: add trace2 transport child classification trace2:data: add trace2 sub-process classification trace2:data: add editor/pager child classification trace2:data: add trace2 regions to wt-status trace2: collect Windows-specific process information trace2: create new combined trace facility trace2: Documentation/technical/api-trace2.txt
2019-03-07Merge branch 'wh/author-committer-ident-config'Junio C Hamano1-2/+11
Four new configuration variables {author,committer}.{name,email} have been introduced to override user.{name,email} in more specific cases. * wh/author-committer-ident-config: config: allow giving separate author and committer idents
2019-03-07Merge branch 'tg/checkout-no-overlay'Junio C Hamano1-1/+6
"git checkout --no-overlay" can be used to trigger a new mode of checking out paths out of the tree-ish, that allows paths that match the pathspec that are in the current index and working tree and are not in the tree-ish. * tg/checkout-no-overlay: revert "checkout: introduce checkout.overlayMode config" checkout: introduce checkout.overlayMode config checkout: introduce --{,no-}overlay option checkout: factor out mark_cache_entry_for_checkout function checkout: clarify comment read-cache: add invalidate parameter to remove_marked_cache_entries entry: support CE_WT_REMOVE flag in checkout_entry entry: factor out unlink_entry function move worktree tests to t24*
2019-03-07ident: don't require calling prepare_fallback_ident firstThomas Gummerer1-1/+0
In fd5a58477c ("ident: add the ability to provide a "fallback identity"", 2019-02-25) I made it a requirement to call prepare_fallback_ident as the first function in the ident API. However in stash we didn't actually end up following that. This leads to a BUG if user.email and user.name are set. It was not caught in the test suite because we only rely on environment variables for setting the user name and email instead of the config. Instead of making it a bug to call other functions in the ident API first, just return silently if the identity of a user was already set up. Reported-by: Denton Liu <liu.denton@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-01setup: fix memory leaks with `struct repository_format`Martin Ågren1-3/+28
After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-01ident: add the ability to provide a "fallback identity"Johannes Schindelin1-0/+5
In 3bc2111fc2e9 (stash: tolerate missing user identity, 2018-11-18), `git stash` learned to provide a fallback identity for the case that no proper name/email was given (and `git stash` does not really care about a correct identity anyway, but it does want to create a commit object). In preparation for the same functionality in the upcoming built-in version of `git stash`, let's offer the same functionality as an API function. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> [tg: add docs; make it a bug to call the function before other functions in the ident API] Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-01sha1-name.c: add `get_oidf()` which acts like `get_oid()`Paul-Sebastian Ungureanu1-0/+1
Compared to `get_oid()`, `get_oidf()` has as parameters a pointer to `object_id`, a printf format string and additional arguments. This will help simplify the code in subsequent commits. Original-idea-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Paul-Sebastian Ungureanu <ungureanupaulsebastian@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-22trace2: create new combined trace facilityJeff Hostetler1-0/+1
Create a new unified tracing facility for git. The eventual intent is to replace the current trace_printf* and trace_performance* routines with a unified set of git_trace2* routines. In addition to the usual printf-style API, trace2 provides higer-level event verbs with fixed-fields allowing structured data to be written. This makes post-processing and analysis easier for external tools. Trace2 defines 3 output targets. These are set using the environment variables "GIT_TR2", "GIT_TR2_PERF", and "GIT_TR2_EVENT". These may be set to "1" or to an absolute pathname (just like the current GIT_TRACE). * GIT_TR2 is intended to be a replacement for GIT_TRACE and logs command summary data. * GIT_TR2_PERF is intended as a replacement for GIT_TRACE_PERFORMANCE. It extends the output with columns for the command process, thread, repo, absolute and relative elapsed times. It reports events for child process start/stop, thread start/stop, and per-thread function nesting. * GIT_TR2_EVENT is a new structured format. It writes event data as a series of JSON records. Calls to trace2 functions log to any of the 3 output targets enabled without the need to call different trace_printf* or trace_performance* routines. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-15read-cache: add post-index-change hookBen Peart1-1/+3
Add a post-index-change hook that is invoked after the index is written in do_write_locked_index(). This hook is meant primarily for notification, and cannot affect the outcome of git commands that trigger the index write. The hook is passed a flag to indicate whether the working directory was updated or not and a flag indicating if a skip-worktree bit could have changed. These flags enable the hook to optimize its response to the index change notification. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-06Merge branch 'jk/loose-object-cache-oid'Junio C Hamano1-3/+3
Code clean-up. * jk/loose-object-cache-oid: prefer "hash mismatch" to "sha1 mismatch" sha1-file: avoid "sha1 file" for generic use in messages sha1-file: prefer "loose object file" to "sha1 file" in messages sha1-file: drop has_sha1_file() convert has_sha1_file() callers to has_object_file() sha1-file: convert pass-through functions to object_id sha1-file: modernize loose header/stream functions sha1-file: modernize loose object file functions http: use struct object_id instead of bare sha1 update comment references to sha1_object_info() sha1-file: fix outdated sha1 comment references
2019-02-06Merge branch 'lt/date-human'Junio C Hamano1-0/+3
A new date format "--date=human" that morphs its output depending on how far the time is from the current time has been introduced. "--date=auto" can be used to use this new format when the output is going to the pager or to the terminal and otherwise the default format. * lt/date-human: Add `human` date format tests. Add `human` format to test-tool Add 'human' date format documentation Replace the proposed 'auto' mode with 'auto:' Add 'human' date format
2019-02-06Merge branch 'jk/unused-parameter-cleanup'Junio C Hamano1-1/+1
Code cleanup. * jk/unused-parameter-cleanup: convert: drop path parameter from actual conversion functions convert: drop len parameter from conversion checks config: drop unused parameter from maybe_remove_section() show_date_relative(): drop unused "tz" parameter column: drop unused "opts" parameter in item_length() create_bundle(): drop unused "header" parameter apply: drop unused "def" parameter from find_name_gnu() match-trees: drop unused path parameter from score functions
2019-02-06Merge branch 'nd/the-index-final'Junio C Hamano1-23/+13
The assumption to work on the single "in-core index" instance has been reduced from the library-ish part of the codebase. * nd/the-index-final: cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch read-cache.c: remove the_* from index_has_changes() merge-recursive.c: remove implicit dependency on the_repository merge-recursive.c: remove implicit dependency on the_index sha1-name.c: remove implicit dependency on the_index read-cache.c: replace update_index_if_able with repo_& read-cache.c: kill read_index() checkout: avoid the_index when possible repository.c: replace hold_locked_index() with repo_hold_locked_index() notes-utils.c: remove the_repository references grep: use grep_opt->repo instead of explict repo argument
2019-02-06Merge branch 'tt/bisect-in-c'Junio C Hamano1-0/+3
More code in "git bisect" has been rewritten in C. * tt/bisect-in-c: bisect--helper: `bisect_start` shell function partially in C bisect--helper: `get_terms` & `bisect_terms` shell function in C bisect--helper: `bisect_next_check` shell function in C bisect--helper: `check_and_set_terms` shell function in C wrapper: move is_empty_file() and rename it as is_empty_or_missing_file() bisect--helper: `bisect_write` shell function in C bisect--helper: `bisect_reset` shell function in C
2019-02-06Merge branch 'dt/cat-file-batch-ambiguous'Junio C Hamano1-1/+19
"git cat-file --batch" reported a dangling symbolic link by mistake, when it wanted to report that a given name is ambiguous. * dt/cat-file-batch-ambiguous: t1512: test ambiguous cat-file --batch and --batch-output Do not print 'dangling' for cat-file in case of ambiguity
2019-02-05Merge branch 'jk/add-ignore-errors-bit-assignment-fix'Junio C Hamano1-0/+1
"git add --ignore-errors" did not work as advertised and instead worked as an unintended synonym for "git add --renormalize", which has been fixed. * jk/add-ignore-errors-bit-assignment-fix: add: use separate ADD_CACHE_RENORMALIZE flag
2019-02-04config: allow giving separate author and committer identsWilliam Hubbs1-2/+11
The author.email, author.name, committer.email and committer.name settings are analogous to the GIT_AUTHOR_* and GIT_COMMITTER_* environment variables, but for the git config system. This allows them to be set separately for each repository. Git supports setting different authorship and committer information with environment variables. However, environment variables are set in the shell, so if different authorship and committer information is needed for different repositories an external tool is required. This adds support to git config for author.email, author.name, committer.email and committer.name settings so this information can be set per repository. Also, it generalizes the fmt_ident function so it can handle author vs committer identification. Signed-off-by: William Hubbs <williamh@gentoo.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-29Merge branch 'bc/tree-walk-oid'Junio C Hamano1-1/+1
The code to walk tree objects has been taught that we may be working with object names that are not computed with SHA-1. * bc/tree-walk-oid: cache: make oidcpy always copy GIT_MAX_RAWSZ bytes tree-walk: store object_id in a separate member match-trees: use hashcpy to splice trees match-trees: compute buffer offset correctly when splicing tree-walk: copy object ID before use
2019-01-29Merge branch 'bc/sha-256'Junio C Hamano1-18/+33
Add sha-256 hash and plug it through the code to allow building Git with the "NewHash". * bc/sha-256: hash: add an SHA-256 implementation using OpenSSL sha256: add an SHA-256 implementation using libgcrypt Add a base implementation of SHA-256 support commit-graph: convert to using the_hash_algo t/helper: add a test helper to compute hash speed sha1-file: add a constant for hash block size t: make the sha1 test-tool helper generic t: add basic tests for our SHA-1 implementation cache: make hashcmp and hasheq work with larger hashes hex: introduce functions to print arbitrary hashes sha1-file: provide functions to look up hash algorithms sha1-file: rename algorithm to "sha1"
2019-01-29Add `human` format to test-toolStephen P. Smith1-0/+2
Add the human format support to the test tool so that GIT_TEST_DATE_NOW can be used to specify the current time. The get_time() helper function was created and and checks the GIT_TEST_DATE_NOW environment variable. If GIT_TEST_DATE_NOW is set, then that date is used instead of the date returned by by gettimeofday(). All calls to gettimeofday() were replaced by calls to get_time(). Renamed occurances of TEST_DATE_NOW to GIT_TEST_DATE_NOW since the variable is now used in the get binary and not just in the test-tool. Signed-off-by: Stephen P. Smith <ischis2@cox.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-24show_date_relative(): drop unused "tz" parameterJeff King1-1/+1
The timestamp we receive is in epoch time, so there's no need for a timezone parameter to interpret it. The matching show_date() uses "tz" to show dates in author local time, but relative dates show only the absolute time difference. The author's location is irrelevant, barring relativistic effects from using Git close to the speed of light. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-24cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switchNguyễn Thái Ngọc Duy1-3/+3
By default, index compat macros are off from now on, because they could hide the_index dependency. Only those in builtin can use it. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-18Do not print 'dangling' for cat-file in case of ambiguityDavid Turner1-1/+19
The return values -1 and -2 from get_oid could mean two different things, depending on whether they were from an enum returned by get_tree_entry_follow_symlinks, or from a different code path. This caused 'dangling' to be printed from a git cat-file in the case of an ambiguous (-2) result. Unify the results of get_oid* and get_tree_entry_follow_symlinks to be one common type, with unambiguous values. Signed-off-by: David Turner <novalis@novalis.org> Reported-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-18Add 'human' date formatLinus Torvalds1-0/+1
This adds --date=human, which skips the timezone if it matches the current time-zone, and doesn't print the whole date if that matches (ie skip printing year for dates that are "this year", but also skip the whole date itself if it's in the last few days and we can just say what weekday it was). For really recent dates (same day), use the relative date stamp, while for old dates (year doesn't match), don't bother with time and timezone. Also add 'auto' date mode, which defaults to human if we're using the pager. So you can do git config --add log.date auto and your "git log" commands will show the human-legible format unless you're scripting things. Note that this time format still shows the timezone for recent enough events (but not so recent that they show up as relative dates). You can combine it with the "-local" suffix to never show timezones for an even more simplified view. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Stephen P. Smith <ischis2@cox.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17add: use separate ADD_CACHE_RENORMALIZE flagJeff King1-0/+1
Commit 9472935d81 (add: introduce "--renormalize", 2017-11-16) taught git-add to pass HASH_RENORMALIZE to add_to_index(), which then passes the flag along to index_path(). However, the flags taken by add_to_index() and the ones taken by index_path() are distinct namespaces. We cannot take HASH_* flags in add_to_index(), because they overlap with the ADD_CACHE_* flags we already take (in this case, HASH_RENORMALIZE conflicts with ADD_CACHE_IGNORE_ERRORS). We can solve this by adding a new ADD_CACHE_RENORMALIZE flag, and using it to set HASH_RENORMALIZE within add_to_index(). In order to make it clear that these two flags come from distinct sets, let's also change the name "newflags" in the function to "hash_flags". Reported-by: Dmitriy Smirnov <dmitriy.smirnov@jetbrains.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15cache: make oidcpy always copy GIT_MAX_RAWSZ bytesbrian m. carlson1-1/+1
There are some situations in which we want to store an object ID into struct object_id without the_hash_algo necessarily being set correctly. One such case is when cloning a repository, where we must read refs from the remote side without having a repository from which to read the preferred algorithm. In this cases, we may have the_hash_algo set to SHA-1, which is the default, but read refs into struct object_id that are SHA-256. When copying these values, we will want to copy them completely, not just the first 20 bytes. Consequently, make sure that oidcpy copies the maximum number of bytes at all times, regardless of the setting of the_hash_algo. Since oidcpy and hashcpy are no longer functionally identical, remove the Cocinelle object_id transformations that convert from one into the other. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14read-cache.c: remove the_* from index_has_changes()Nguyễn Thái Ngọc Duy1-3/+3
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14sha1-name.c: remove implicit dependency on the_indexNguyễn Thái Ngọc Duy1-1/+3
This kills the_index dependency in get_oid_with_context() but for get_oid() and friends, they still assume the_repository (which also means the_index). Unfortunately the widespread use of get_oid() will make it hard to make the conversion now. We probably will add repo_get_oid() at some point and limit the use of get_oid() in builtin/ instead of forcing all get_oid() call sites to carry struct repository. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14read-cache.c: replace update_index_if_able with repo_&Nguyễn Thái Ngọc Duy1-6/+0
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14read-cache.c: kill read_index()Nguyễn Thái Ngọc Duy1-8/+3
read_index() shares the same problem as hold_locked_index(): it assumes $GIT_DIR/index. Move all call sites to repo_read_index() instead. read_index_preload() and read_index_unmerged() are also killed as a consequence. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14repository.c: replace hold_locked_index() with repo_hold_locked_index()Nguyễn Thái Ngọc Duy1-1/+1
hold_locked_index() assumes the index path at $GIT_DIR/index. This is not good for places that take an arbitrary index_state instead of the_index, which is basically everywhere except builtin/. Replace it with repo_hold_locked_index(). hold_locked_index() remains as a wrapper around repo_hold_locked_index() to reduce changes in builtin/ Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-08sha1-file: modernize loose header/stream functionsJeff King1-2/+2
As with the open/map/close functions for loose objects that were recently converted, the functions for parsing the loose object stream use the name "sha1" and a bare "unsigned char *". Let's fix that so that unpack_sha1_header() becomes unpack_loose_header(), etc. These conversions are less clear-cut than the file access functions. You could argue that the they are parsing Git's canonical object format (i.e., "type size\0contents", over which we compute the hash), which is not strictly tied to loose storage. But in practice these functions are used only for loose objects, and using the term "loose_header" (instead of "object_header") distinguishes it from the object header found in packfiles (which contains the same information in a different format). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-08update comment references to sha1_object_info()Jeff King1-1/+1
Commit abef9020e3 (sha1_file: convert sha1_object_info* to object_id, 2018-03-12) renamed the function to oid_object_info(), but missed some comments which mention it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-02read-cache: add invalidate parameter to remove_marked_cache_entriesThomas Gummerer1-1/+1
When marking cache entries for removal, and later removing them all at once using 'remove_marked_cache_entries()', cache entries currently have to be invalidated manually in the cache tree and in the untracked cache. Add an invalidate flag to the function. With the flag set, the function will take care of invalidating the path in the cache tree and in the untracked cache. Note that the current callsites already do the invalidation properly in other places, so we're just passing 0 from there to keep the status quo. This will be useful in a subsequent commit. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-02entry: factor out unlink_entry functionThomas Gummerer1-0/+5
Factor out the 'unlink_entry()' function from unpack-trees.c to entry.c. It will be used in other places as well in subsequent steps. As it's no longer a static function, also move the documentation to the header file to make it more discoverable. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-02wrapper: move is_empty_file() and rename it as is_empty_or_missing_file()Pranit Bauva1-0/+3
is_empty_file() can help to refactor a lot of code. This will be very helpful in porting "git bisect" to C. Suggested-by: Torsten Bögershausen <tboegi@web.de> Mentored-by: Lars Schneider <larsxschneider@gmail.com> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-14Add a base implementation of SHA-256 supportbrian m. carlson1-3/+9
SHA-1 is weak and we need to transition to a new hash function. For some time, we have referred to this new function as NewHash. Recently, we decided to pick SHA-256 as NewHash. The reasons behind the choice of SHA-256 are outlined in the thread starting at [1] and in the commit history for the hash function transition document. Add a basic implementation of SHA-256 based off libtomcrypt, which is in the public domain. Optimize it and restructure it to meet our coding standards. Pull in the update and final functions from the SHA-1 block implementation, as we know these function correctly with all compilers. This implementation is slower than SHA-1, but more performant implementations will be introduced in future commits. Wire up SHA-256 in the list of hash algorithms, and add a test that the algorithm works correctly. Note that with this patch, it is still not possible to switch to using SHA-256 in Git. Additional patches are needed to prepare the code to handle a larger hash algorithm and further test fixes are needed. [1] https://public-inbox.org/git/20180609224913.GC38834@genre.crustytoothpaste.net/ Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-14sha1-file: add a constant for hash block sizebrian m. carlson1-0/+4
There is one place we need the hash algorithm block size: the HMAC code for push certs. Expose this constant in struct git_hash_algo and expose values for SHA-1 and for the largest value of any hash. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-14cache: make hashcmp and hasheq work with larger hashesbrian m. carlson1-10/+12
In 183a638b7d ("hashcmp: assert constant hash size", 2018-08-23), we modified hashcmp to assert that the hash size was always 20 to help it optimize and inline calls to memcmp. In a future series, we replaced many calls to hashcmp and oidcmp with calls to hasheq and oideq to improve inlining further. However, we want to support hash algorithms other than SHA-1, namely SHA-256. When doing so, we must handle the case where these values are 32 bytes long as well as 20. Adjust hashcmp to handle two cases: 20-byte matches, and maximum-size matches. Therefore, when we include SHA-256, we'll automatically handle it properly, while at the same time teaching the compiler that there are only two possible options to consider. This will allow the compiler to write the most efficient possible code. Copy similar code into hasheq and perform an identical transformation. At least with GCC 8.2.0, making hasheq defer to hashcmp when there are two branches prevents the compiler from inlining the comparison, while the code in this patch is inlined properly. Add a comment to avoid an accidental performance regression from well-intentioned refactoring. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-14hex: introduce functions to print arbitrary hashesbrian m. carlson1-6/+9
Currently, we have functions that turn an arbitrary SHA-1 value or an object ID into hex format, either using a static buffer or with a user-provided buffer. Add variants of these functions that can handle an arbitrary hash algorithm, specified by constant. Update the documentation as well. While we're at it, remove the "extern" declaration from this family of functions, since it's not needed and our style now recommends against it. We use the variant taking the algorithm structure pointer as the internal variant, since taking an algorithm pointer is the easiest way to handle all of the variants in use. Note that we maintain these functions because there are hashes which must change based on the hash algorithm in use but are not object IDs (such as pack checksums). Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-14checkout: print something when checking out pathsNguyễn Thái Ngọc Duy1-2/+2
One of the problems with "git checkout" is that it does so many different things and could confuse people specially when we fail to handle ambiguation correctly. One way to help with that is tell the user what sort of operation is actually carried out. When switching branches, we always print something unless --quiet, either - "HEAD is now at ..." - "Reset branch ..." - "Already on ..." - "Switched to and reset ..." - "Switched to a new branch ..." - "Switched to branch ..." Checking out paths however is silent. Print something so that if we got the user intention wrong, they won't waste too much time to find that out. For the remaining cases of checkout we now print either - "Checked out ... paths out of the index" - "Checked out ... paths out of <abbrev hash>" Since the purpose of printing this is to help disambiguate. Only do it when "--" is missing. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-13Merge branch 'bp/refresh-index-using-preload'Junio C Hamano1-0/+3
The helper function to refresh the cached stat information in the in-core index has learned to perform the lstat() part of the operation in parallel on multi-core platforms. * bp/refresh-index-using-preload: refresh_index: remove unnecessary calls to preload_index() speed up refresh_index() by utilizing preload_index()
2018-11-13Merge branch 'ao/submodule-wo-gitmodules-checked-out'Junio C Hamano1-0/+2
The submodule support has been updated to read from the blob at HEAD:.gitmodules when the .gitmodules file is missing from the working tree. * ao/submodule-wo-gitmodules-checked-out: t/helper: add test-submodule-nested-repo-config submodule: support reading .gitmodules when it's not in the working tree submodule: add a helper to check if it is safe to write to .gitmodules t7506: clean up .gitmodules properly before setting up new scenario submodule: use the 'submodule--helper config' command submodule--helper: add a new 'config' subcommand t7411: be nicer to future tests and really clean things up t7411: merge tests 5 and 6 submodule: factor out a config_set_in_gitmodules_file_gently function submodule: add a print_config_from_gitmodules() helper
2018-11-13Merge branch 'js/mingw-perl5lib'Junio C Hamano1-8/+0
Windows fix. * js/mingw-perl5lib: mingw: unset PERL5LIB by default config: move Windows-specific config settings into compat/mingw.c config: allow for platform-specific core.* config settings config: rename `dummy` parameter to `cb` in git_default_config()
2018-11-13Merge branch 'nd/per-worktree-config'Junio C Hamano1-0/+2
A fourth class of configuration files (in addition to the traditional "system wide", "per user in the $HOME directory" and "per repository in the $GIT_DIR/config") has been introduced so that different worktrees that share the same repository (hence the same $GIT_DIR/config file) can use different customization. * nd/per-worktree-config: worktree: add per-worktree config files t1300: extract and use test_cmp_config()
2018-11-02Merge branch 'ag/rebase-i-in-c'Junio C Hamano1-0/+1
Rewrite of the remaining "rebase -i" machinery in C. * ag/rebase-i-in-c: rebase -i: move rebase--helper modes to rebase--interactive rebase -i: remove git-rebase--interactive.sh rebase--interactive2: rewrite the submodes of interactive rebase in C rebase -i: implement the main part of interactive rebase as a builtin rebase -i: rewrite init_basic_state() in C rebase -i: rewrite write_basic_state() in C rebase -i: rewrite the rest of init_revisions_and_shortrevisions() in C rebase -i: implement the logic to initialize $revisions in C rebase -i: remove unused modes and functions rebase -i: rewrite complete_action() in C t3404: todo list with commented-out commands only aborts sequencer: change the way skip_unnecessary_picks() returns its result sequencer: refactor append_todo_help() to write its message to a buffer rebase -i: rewrite checkout_onto() in C rebase -i: rewrite setup_reflog_action() in C sequencer: add a new function to silence a command, except if it fails rebase -i: rewrite the edit-todo functionality in C editor: add a function to launch the sequence editor rebase -i: rewrite append_todo_help() in C sequencer: make three functions and an enum from sequencer.c public
2018-10-31config: move Windows-specific config settings into compat/mingw.cJohannes Schindelin1-8/+0
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-30speed up refresh_index() by utilizing preload_index()Ben Peart1-0/+3
Speed up refresh_index() by utilizing preload_index() to do most of the work spread across multiple threads. This works because most cache entries will get marked CE_UPTODATE so that refresh_cache_ent() can bail out early when called from within refresh_index(). On a Windows repo with ~200K files, this drops refresh times from 6.64 seconds to 2.87 seconds for a savings of 57%. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-26Merge branch 'sg/split-index-racefix'Junio C Hamano1-0/+2
The codepath to support the experimental split-index mode had remaining "racily clean" issues fixed. * sg/split-index-racefix: split-index: BUG() when cache entry refers to non-existing shared entry split-index: smudge and add racily clean cache entries to split index split-index: don't compare cached data of entries already marked for split index split-index: count the number of deleted entries t1700-split-index: date back files to avoid racy situations split-index: add tests to demonstrate the racy split index problem t1700-split-index: document why FSMONITOR is disabled in this test script
2018-10-22worktree: add per-worktree config filesNguyễn Thái Ngọc Duy1-0/+2
A new repo extension is added, worktreeConfig. When it is present: - Repository config reading by default includes $GIT_DIR/config _and_ $GIT_DIR/config.worktree. "config" file remains shared in multiple worktree setup. - The special treatment for core.bare and core.worktree, to stay effective only in main worktree, is gone. These config settings are supposed to be in config.worktree. This extension is most useful in multiple worktree setup because you now have an option to store per-worktree config (which is either .git/config.worktree for main worktree, or .git/worktrees/xx/config.worktree for linked ones). This extension can be used in single worktree mode, even though it's pretty much useless (but this can happen after you remove all linked worktrees and move back to single worktree). "git config" reads from both "config" and "config.worktree" by default (i.e. without either --user, --file...) when this extension is present. Default writes still go to "config", not "config.worktree". A new option --worktree is added for that (*). Since a new repo extension is introduced, existing git binaries should refuse to access to the repo (both from main and linked worktrees). So they will not misread the config file (i.e. skip the config.worktree part). They may still accidentally write to the config file anyway if they use with "git config --file <path>". This design places a bet on the assumption that the majority of config variables are shared so it is the default mode. A safer move would be default writes go to per-worktree file, so that accidental changes are isolated. (*) "git config --worktree" points back to "config" file when this extension is not present and there is only one worktree so that it works in any both single and multiple worktree setups. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-19Merge branch 'nd/status-refresh-progress'Junio C Hamano1-2/+5
"git status" learns to show progress bar when refreshing the index takes a long time. * nd/status-refresh-progress: status: show progress bar if refreshing the index takes too long
2018-10-19Merge branch 'nd/the-index'Junio C Hamano1-6/+8
Various codepaths in the core-ish part learn to work on an arbitrary in-core index structure, not necessarily the default instance "the_index". * nd/the-index: (23 commits) revision.c: reduce implicit dependency the_repository revision.c: remove implicit dependency on the_index ws.c: remove implicit dependency on the_index tree-diff.c: remove implicit dependency on the_index submodule.c: remove implicit dependency on the_index line-range.c: remove implicit dependency on the_index userdiff.c: remove implicit dependency on the_index rerere.c: remove implicit dependency on the_index sha1-file.c: remove implicit dependency on the_index patch-ids.c: remove implicit dependency on the_index merge.c: remove implicit dependency on the_index merge-blobs.c: remove implicit dependency on the_index ll-merge.c: remove implicit dependency on the_index diff-lib.c: remove implicit dependency on the_index read-cache.c: remove implicit dependency on the_index diff.c: remove implicit dependency on the_index grep.c: remove implicit dependency on the_index diff.c: remove the_index dependency in textconv() functions blame.c: rename "repo" argument to "r" combine-diff.c: remove implicit dependency on the_index ...
2018-10-12split-index: smudge and add racily clean cache entries to split indexSZEDER Gábor1-0/+2
Ever since the split index feature was introduced [1], refreshing a split index is prone to a variant of the classic racy git problem. Consider the following sequence of commands updating the split index when the shared index contains a racily clean cache entry, i.e. an entry whose cached stat data matches with the corresponding file in the worktree and the cached mtime matches that of the index: echo "cached content" >file git update-index --split-index --add file echo "dirty worktree" >file # size stays the same! # ... wait ... git update-index --add other-file Normally, when a non-split index is updated, then do_write_index() (the function responsible for writing all kinds of indexes, "regular", split, and shared) recognizes racily clean cache entries, and writes them with smudged stat data, i.e. with file size set to 0. When subsequent git commands read the index, they will notice that the smudged stat data doesn't match with the file in the worktree, and then go on to check the file's content and notice its dirtiness. In the above example, however, in the second 'git update-index' prepare_to_write_split_index() decides which cache entries stored only in the shared index should be replaced in the new split index. Alas, this function never looks out for racily clean cache entries, and since the file's stat data in the worktree hasn't changed since the shared index was written, it won't be replaced in the new split index. Consequently, do_write_index() doesn't even get this racily clean cache entry, and can't smudge its stat data. Subsequent git commands will then see that the index has more recent mtime than the file and that the (not smudged) cached stat data still matches with the file in the worktree, and, ultimately, will erroneously consider the file clean. Modify prepare_to_write_split_index() to recognize racily clean cache entries, and mark them to be added to the split index. Note that there are two places where it should check raciness: first those cache entries that are only stored in the shared index, and then those that have been copied by unpack_trees() from the shared index while it constructed a new index. This way do_write_index() will get these racily clean cache entries as well, and will then write them with smudged stat data to the new split index. This change makes all tests in 't1701-racy-split-index.sh' pass, so flip the two 'test_expect_failure' tests to success. Also add the '#' (as in nr. of trial) to those tests' description that were omitted when the tests expected failure. Note that after this change if the index is split when it contains a racily clean cache entry, then a smudged cache entry will be written both to the new shared and to the new split indexes. This doesn't affect regular git commands: as far as they are concerned this is just an entry in the split index replacing an outdated entry in the shared index. It did affect a few tests in 't1700-split-index.sh', though, because they actually check which entries are stored in the split index; a previous patch in this series has already made the necessary adjustments in 't1700'. And racily clean cache entries and index splitting are rare enough to not worry about the resulting duplicated smudged cache entries, and the additional complexity required to prevent them is not worth it. Several tests failed occasionally when the test suite was run with 'GIT_TEST_SPLIT_INDEX=yes'. Here are those that I managed to trace back to this racy split index problem, starting with those failing more frequently, with a link to a failing Travis CI build job for each. The highlighted line [2] shows when the racy file was written, which is not always in the failing test but in a preceeding setup test. t3903-stash.sh: https://travis-ci.org/git/git/jobs/385542084#L5858 t4024-diff-optimize-common.sh: https://travis-ci.org/git/git/jobs/386531969#L3174 t4015-diff-whitespace.sh: https://travis-ci.org/git/git/jobs/360797600#L8215 t2200-add-update.sh: https://travis-ci.org/git/git/jobs/382543426#L3051 t0090-cache-tree.sh: https://travis-ci.org/git/git/jobs/416583010#L3679 There might be others, e.g. perhaps 't1000-read-tree-m-3way.sh' and others using 'lib-read-tree-m-3way.sh', but I couldn't confirm yet. [1] In the branch leading to the merge commit v2.1.0-rc0~45 (Merge branch 'nd/split-index', 2014-07-16). [2] Note that those highlighted lines are in the 'after failure' fold, and your browser might unhelpfully fold it up before you could take a good look. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-09submodule: add a helper to check if it is safe to write to .gitmodulesAntonio Ospite1-0/+2
Introduce a helper function named is_writing_gitmodules_ok() to verify that the .gitmodules file is safe to write. The function name follows the scheme of is_staging_gitmodules_ok(). The two symbolic constants GITMODULES_INDEX and GITMODULES_HEAD are used to get help from the C preprocessor in preventing typos, especially for future users. This is in preparation for a future change which teaches git how to read .gitmodules from the index or from the current branch if the file is not available in the working tree. The rationale behind the check is that writing to .gitmodules requires the file to be present in the working tree, unless a brand new .gitmodules is being created (in which case the .gitmodules file would not exist at all: neither in the working tree nor in the index or in the current branch). Expose the functionality also via a "submodule-helper config --check-writeable" command, as git scripts may want to perform the check before modifying submodules configuration. Signed-off-by: Antonio Ospite <ao2@ao2.it> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21ws.c: remove implicit dependency on the_indexNguyễn Thái Ngọc Duy1-1/+1
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21sha1-file.c: remove implicit dependency on the_indexNguyễn Thái Ngọc Duy1-2/+2
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21merge.c: remove implicit dependency on the_indexNguyễn Thái Ngọc Duy1-2/+4
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21read-cache.c: remove 'const' from index_has_changes()Nguyễn Thái Ngọc Duy1-1/+1
This function calls do_diff_cache() which eventually needs to set this "istate" to unpack_options->src_index [1]. This is an unfortunate fact that unpack_trees() _will_ update [2] src_index so we can't really pass a const index_state there. Just remove 'const'. [1] Right now diff_cache() in diff-lib.c assigns the_index to src_index. But the plan is to get rid of the_index, so it should be 'istate' from here that gets assigned to src_index. [2] Some transient bits in the source index are touched. Optional extensions can also be removed. But other than that the source tree should still be valid. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17Merge branch 'jk/cocci'Junio C Hamano1-6/+16
spatch transformation to replace boolean uses of !hashcmp() to newly introduced oideq() is added, and applied, to regain performance lost due to support of multiple hash algorithms. * jk/cocci: show_dirstat: simplify same-content check read-cache: use oideq() in ce_compare functions convert hashmap comparison functions to oideq() convert "hashcmp() != 0" to "!hasheq()" convert "oidcmp() != 0" to "!oideq()" convert "hashcmp() == 0" to hasheq() convert "oidcmp() == 0" to oideq() introduce hasheq() and oideq() coccinelle: use <...> for function exclusion
2018-09-17Merge branch 'nd/clone-case-smashing-warning'Junio C Hamano1-0/+1
Running "git clone" against a project that contain two files with pathnames that differ only in cases on a case insensitive filesystem would result in one of the files lost because the underlying filesystem is incapable of holding both at the same time. An attempt is made to detect such a case and warn. * nd/clone-case-smashing-warning: clone: report duplicate entries on case-insensitive filesystems
2018-09-17status: show progress bar if refreshing the index takes too longNguyễn Thái Ngọc Duy1-2/+5
Refreshing the index is usually very fast, but it can still take a long time sometimes. Cold cache is one. Or copying a repo to a new place (*). It's good to show something to let the user know "git status" is not hanging, it's just busy doing something. (*) In this case, all stat info in the index becomes invalid and git falls back to rehashing all file content to see if there's any difference between updating stat info in the index. This is quite expensive. Even with a repo as small as git.git, it takes 3 seconds. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29convert "hashcmp() == 0" to hasheq()Jeff King1-4/+4
This is the partner patch to the previous one, but covering the "hash" variants instead of "oid". Note that our coccinelle rule is slightly more complex to avoid triggering the call in hasheq(). I didn't bother to add a new rule to convert: - hasheq(E1->hash, E2->hash) + oideq(E1, E2) Since these are new functions, there won't be any such existing callers. And since most of the code is already using oideq, we're not likely to introduce new ones. We might still see "!hashcmp(E1->hash, E2->hash)" from topics in flight. But because our new rule comes after the existing ones, that should first get converted to "!oidcmp(E1, E2)" and then to "oideq(E1, E2)". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29convert "oidcmp() == 0" to oideq()Jeff King1-2/+2
Using the more restrictive oideq() should, in the long run, give the compiler more opportunities to optimize these callsites. For now, this conversion should be a complete noop with respect to the generated code. The result is also perhaps a little more readable, as it avoids the "zero is equal" idiom. Since it's so prevalent in C, I think seasoned programmers tend not to even notice it anymore, but it can sometimes make for awkward double negations (e.g., we can drop a few !!oidcmp() instances here). This patch was generated almost entirely by the included coccinelle patch. This mechanical conversion should be completely safe, because we check explicitly for cases where oidcmp() is compared to 0, which is what oideq() is doing under the hood. Note that we don't have to catch "!oidcmp()" separately; coccinelle's standard isomorphisms make sure the two are treated equivalently. I say "almost" because I did hand-edit the coccinelle output to fix up a few style violations (it mostly keeps the original formatting, but sometimes unwraps long lines). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29introduce hasheq() and oideq()Jeff King1-0/+10
The main comparison functions we provide for comparing object ids are hashcmp() and oidcmp(). These are more flexible than a strict equality check, since they also express ordering. That makes them useful for sorting and binary searching. However, it also makes them potentially slower than a strict equality check. Consider this C code, which is traditionally what our hashcmp has looked like: #include <string.h> int hashcmp(const unsigned char *a, const unsigned char *b) { return memcmp(a, b, 20); } Compiling with "gcc -O2 -S -fverbose-asm", the generated assembly shows that we actually call memcmp(). But if we change this to a strict equality check: return !memcmp(a, b, 20); we get a faster inline version: movq (%rdi), %rax # MEM[(void *)a_4(D)], MEM[(void *)a_4(D)] movq 8(%rdi), %rdx # MEM[(void *)a_4(D)], tmp101 xorq (%rsi), %rax # MEM[(void *)b_5(D)], tmp94 xorq 8(%rsi), %rdx # MEM[(void *)b_5(D)], tmp93 orq %rax, %rdx # tmp94, tmp93 jne .L2 #, movl 16(%rsi), %eax # MEM[(void *)b_5(D)], tmp104 cmpl %eax, 16(%rdi) # tmp104, MEM[(void *)a_4(D)] je .L5 #, Obviously our hashcmp() doesn't include the "!". But because it's an inline function, optimizing compilers are able to see "!hashcmp(a,b)" in calling code and take advantage of this case. So there has been no value thus far in introducing a more restricted interface for doing strict equality checks. But as Git learns about more values for the_hash_algo, our hashcmp() will grow more complicated and may even delegate at runtime to functions optimized specifically for that hash size. That breaks the inline connection we have, and the compiler will have to assume that the caller really cares about the sign and magnitude of the memcmp() result, even though the vast majority don't. We can solve that by introducing a hasheq() function (and matching oideq() wrapper), which callers can use to make it clear that they only care about equality. For now, the implementation will literally be "!hashcmp()", but it frees us up later to introduce code optimized specifically for the equality check. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-23hashcmp: assert constant hash sizeJeff King1-0/+10
Prior to 509f6f62a4 (cache: update object ID functions for the_hash_algo, 2018-07-16), hashcmp() called memcmp() with a constant size of 20 bytes. Some compilers were able to turn that into a few quad-word comparisons, which is faster than actually calling memcmp(). In 509f6f62a4, we started using the_hash_algo->rawsz instead. Even though this will always be 20, the compiler doesn't know that while inlining hashcmp() and ends up just generating a call to memcmp(). Eventually we'll have to deal with multiple hash sizes, but for the upcoming v2.19, we can restore some of the original performance by asserting on the size. That gives the compiler enough information to know that the memcmp will always be called with a length of 20, and it performs the same optimization. Here are numbers for p0001.2 run against linux.git on a few versions. This is using -O2 with gcc 8.2.0. Test v2.18.0 v2.19.0-rc0 HEAD ------------------------------------------------------------------------------ 0001.2: 34.24(33.81+0.43) 34.83(34.42+0.40) +1.7% 33.90(33.47+0.42) -1.0% You can see that v2.19 is a little slower than v2.18. This commit ended up slightly faster than v2.18, but there's a fair bit of run-to-run noise (the generated code in the two cases is basically the same). This patch does seem to be consistently 1-2% faster than v2.19. I tried changing hashcpy(), which was also touched by 509f6f62a4, in the same way, but couldn't measure any speedup. Which makes sense, at least for this workload. A traversal of the whole commit graph requires looking up every entry of every tree via lookup_object(). That's many multiples of the numbers of objects in the repository (most of the lookups just return "yes, we already saw that object"). [jn: verified using "make object.s" that the memcmp call goes away.] Reported-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Jeff King <peff@peff.net Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20Merge branch 'en/incl-forward-decl'Junio C Hamano1-10/+0
Code hygiene improvement for the header files. * en/incl-forward-decl: Remove forward declaration of an enum compat/precompose_utf8.h: use more common include guard style urlmatch.h: fix include guard Move definition of enum branch_track from cache.h to branch.h alloc: make allocate_alloc_state and clear_alloc_state more consistent Add missing includes and forward declarations
2018-08-20Merge branch 'jk/for-each-object-iteration'Junio C Hamano1-56/+0
The API to iterate over all objects learned to optionally list objects in the order they appear in packfiles, which helps locality of access if the caller accesses these objects while as objects are enumerated. * jk/for-each-object-iteration: for_each_*_object: move declarations to object-store.h cat-file: use a single strbuf for all output cat-file: split batch "buf" into two variables cat-file: use oidset check-and-insert cat-file: support "unordered" output for --batch-all-objects cat-file: rename batch_{loose,packed}_object callbacks t1006: test cat-file --batch-all-objects with duplicates for_each_packed_object: support iterating in pack-order for_each_*_object: give more comprehensive docstrings for_each_*_object: take flag arguments as enum for_each_*_object: store flag definitions in a single location
2018-08-17clone: report duplicate entries on case-insensitive filesystemsDuy Nguyen1-0/+1
Paths that only differ in case work fine in a case-sensitive filesystems, but if those repos are cloned in a case-insensitive one, you'll get problems. The first thing to notice is "git status" will never be clean with no indication what exactly is "dirty". This patch helps the situation a bit by pointing out the problem at clone time. Even though this patch talks about case sensitivity, the patch makes no assumption about folding rules by the filesystem. It simply observes that if an entry has been already checked out at clone time when we're about to write a new path, some folding rules are behind this. In the case that we can't rely on filesystem (via inode number) to do this check, fall back to fspathcmp() which is not perfect but should not give false positives. This patch is tested with vim-colorschemes and Sublime-Gitignore repositories on a JFS partition with case insensitive support on Linux. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-15Merge branch 'js/vscode'Junio C Hamano1-11/+13
Add a script (in contrib/) to help users of VSCode work better with our codebase. * js/vscode: vscode: let cSpell work on commit messages, too vscode: add a dictionary for cSpell vscode: use 8-space tabs, no trailing ws, etc for Git's source code vscode: wrap commit messages at column 72 by default vscode: only overwrite C/C++ settings mingw: define WIN32 explicitly cache.h: extract enum declaration from inside a struct declaration vscode: hard-code a couple defines contrib: add a script to initialize VS Code configuration
2018-08-15Merge branch 'jk/core-use-replace-refs'Junio C Hamano1-4/+2
A new configuration variable core.usereplacerefs has been added, primarily to help server installations that want to ignore the replace mechanism altogether. * jk/core-use-replace-refs: add core.usereplacerefs config option check_replace_refs: rename to read_replace_refs check_replace_refs: fix outdated comment
2018-08-15Move definition of enum branch_track from cache.h to branch.hElijah Newren1-10/+0
'branch_track' feels more closely related to branching, and it is needed later in branch.h; rather than #include'ing cache.h in branch.h for this small enum, just move the enum and the external declaration for git_branch_track to branch.h. Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-14for_each_*_object: move declarations to object-store.hJeff King1-75/+0
The for_each_loose_object() and for_each_packed_object() functions are meant to be part of a unified interface: they use the same set of for_each_object_flags, and it's not inconceivable that we might one day add a single for_each_object() wrapper around them. Let's put them together in a single file, so we can avoid awkwardness like saying "the flags for this function are over in cache.h". Moving the loose functions to packfile.h is silly. Moving the packed functions to cache.h works, but makes the "cache.h is a kitchen sink" problem worse. The best place is the recently-created object-store.h, since these are quite obviously related to object storage. The for_each_*_in_objdir() functions do not use the same flags, but they are logically part of the same interface as for_each_loose_object(), and share callback signatures. So we'll move those, as well, as they also make sense in object-store.h. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13for_each_packed_object: support iterating in pack-orderJeff King1-0/+5
We currently iterate over objects within a pack in .idx order, which uses the object hashes. That means that it is effectively random with respect to the location of the object within the pack. If you're going to access the actual object data, there are two reasons to move linearly through the pack itself: 1. It improves the locality of access in the packfile. In the cold-cache case, this may mean fewer disk seeks, or better usage of disk cache. 2. We store related deltas together in the packfile. Which means that the delta base cache can operate much more efficiently if we visit all of those related deltas in sequence, as the earlier items are likely to still be in the cache. Whereas if we visit the objects in random order, our cache entries are much more likely to have been evicted by unrelated deltas in the meantime. So in general, if you're going to access the object contents pack order is generally going to end up more efficient. But if you're simply generating a list of object names, or if you're going to end up sorting the result anyway, you're better off just using the .idx order, as finding the pack order means generating the in-memory pack-revindex. According to the numbers in 8b8dfd5132 (pack-revindex: radix-sort the revindex, 2013-07-11), that takes about 200ms for linux.git, and 20ms for git.git (those numbers are a few years old but are still a good ballpark). That makes it a good optimization for some cases (we can save tens of seconds in git.git by having good locality of delta access, for a 20ms cost), but a bad one for others (e.g., right now "cat-file --batch-all-objects --batch-check="%(objectname)" is 170ms in git.git, so adding 20ms to that is noticeable). Hence this patch makes it an optional flag. You can't actually do any interesting timings yet, as it's not plumbed through to any user-facing tools like cat-file. That will come in a later patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13for_each_*_object: give more comprehensive docstringsJeff King1-3/+5
We already mention the local/alternate behavior of these functions, but we can help clarify a few other behaviors: - there's no need to mention LOCAL_ONLY specifically, since we already reference the flags by type (and as we add more flags, we don't want to have to mention each) - clarify that reachability doesn't matter here; this is all accessible objects - what ordering/uniqueness guarantees we give - how pack-specific flags are handled for the loose case Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13for_each_*_object: take flag arguments as enumJeff King1-1/+2
It's not wrong to pass our flags in an "unsigned", as we know it will be at least as large as the enum. However, using the enum in the declaration makes it more obvious where to find the list of flags. While we're here, let's also drop the "extern" noise-words from the declarations, per our modern coding style. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13for_each_*_object: store flag definitions in a single locationJeff King1-1/+12
These flags were split between cache.h and packfile.h, because some of the flags apply only to packs. However, they share a single numeric namespace, since both are respected for the packed variant. Let's make sure they're defined together so that nobody accidentally adds a new flag in one location that duplicates the other. While we're here, let's also put them in an enum (which helps debugger visibility) and use "(1<<n)" rather than counting powers of 2 manually. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>