aboutsummaryrefslogtreecommitdiffstats
path: root/t/t5616-partial-clone.sh
AgeCommit message (Collapse)AuthorFilesLines
2025-04-07t/lib-httpd: refactor "one-time-perl" CGI script to not depend on PerlPatrick Steinhardt1-22/+26
Our Apache HTTPD setup exposes an "one_time_perl" endpoint to access repositories. If used, we execute the "apply-one-time-perl.sh" CGI script that checks whether we have a "one-time-perl" script. If so, that script gets executed so that it can munge what would be served. Once done, the script gets removed so that it doesn't execute a second time. As the name says, this functionality expects the user to pass a Perl script. This isn't really necessary though: we can just as easily implement the same thing with arbitrary scripts. Refactor the code so that we instead expect an arbitrary script to exist and rename the functionality to "one-time-script". Adapt callers to use shell utilities instead of Perl so that we can drop the PERL_TEST_HELPERS prerequisite. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07t: introduce PERL_TEST_HELPERS prerequisitePatrick Steinhardt1-3/+3
In the early days of Git, Perl was used quite prominently throughout the project. This has changed significantly as almost all of the executables we ship nowadays have eventually been rewritten in C. Only a handful of subsystems remain that require Perl: - gitweb, a read-only web interface. - A couple of scripts that allow importing repositories from GNU Arch, CVS and Subversion. - git-send-email(1), which can be used to send mails. - git-request-pull(1), which is used to request somebody to pull from a URL by sending an email. - git-filter-branch(1), which uses Perl with the `--state-branch` option. This command is typically recommended against nowadays in favor of git-filter-repo(1). - Our Perl bindings for Git. - The netrc Git credential helper. None of these subsystems can really be considered to be part of the "core" of Git, and an installation without them is fully functional. It is more likely than not that an end user wouldn't even notice that any features are missing if those tools weren't installed. But while Perl nowadays very much is an optional dependency of Git, there is a significant limitation when Perl isn't available: developers cannot run our test suite. Preceding commits have started to lift this restriction by removing the strict dependency on Perl in many central parts of the test library. But there are still many tests that rely on small Perl helpers to do various different things. Introduce a new PERL_TEST_HELPERS prerequisite that guards all tests that require Perl. This prerequisite is explicitly different than the preexisting PERL prerequisite: - PERL records whether or not features depending on the Perl interpreter are built. - PERL_TEST_HELPERS records whether or not a Perl interpreter is available for our tests. By having these two separate prerequisites we can thus distinguish between tests that inherently depend on Perl because the underlying feature does, and those tests that depend on Perl because the test itself is using Perl. Adapt all tests to set the PERL_TEST_HELPERS prerequisite as needed. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04Merge branch 'ps/leakfixes-part-10'Junio C Hamano1-1/+0
Leakfixes. * ps/leakfixes-part-10: (27 commits) t: remove TEST_PASSES_SANITIZE_LEAK annotations test-lib: unconditionally enable leak checking t: remove unneeded !SANITIZE_LEAK prerequisites t: mark some tests as leak free t5601: work around leak sanitizer issue git-compat-util: drop now-unused `UNLEAK()` macro global: drop `UNLEAK()` annotation t/helper: fix leaking commit graph in "read-graph" subcommand builtin/branch: fix leaking sorting options builtin/init-db: fix leaking directory paths builtin/help: fix leaks in `check_git_cmd()` help: fix leaking return value from `help_unknown_cmd()` help: fix leaking `struct cmdnames` help: refactor to not use globals for reading config builtin/sparse-checkout: fix leaking sanitized patterns split-index: fix memory leak in `move_cache_to_base_index()` git: refactor builtin handling to use a `struct strvec` git: refactor alias handling to use a `struct strvec` strvec: introduce new `strvec_splice()` function line-log: fix leak when rewriting commit parents ...
2024-11-21t: remove TEST_PASSES_SANITIZE_LEAK annotationsPatrick Steinhardt1-1/+0
Now that the default value for TEST_PASSES_SANITIZE_LEAK is `true` there is no longer a need to have that variable declared in all of our tests. Drop it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-12index-pack: repack local links into promisor packsJonathan Tan1-0/+30
Teach index-pack to, when processing the objects in a pack with --promisor specified on the CLI, repack local objects (and the local objects that they refer to, recursively) referenced by these objects into promisor packs. This prevents the situation in which, when fetching from a promisor remote, we end up with promisor objects (newly fetched) referring to non-promisor objects (locally created prior to the fetch). This situation may arise if the client had previously pushed objects to the remote, for example. One issue that arises in this situation is that, if the non-promisor objects become inaccessible except through promisor objects (for example, if the branch pointing to them has moved to point to the promisor object that refers to them), then GC will garbage collect them. There are other ways to solve this, but the simplest seems to be to enforce the invariant that we don't have promisor objects referring to non-promisor objects. This repacking is done from index-pack to minimize the performance impact. During a fetch, the only time most objects are fully inflated in memory is when their object ID is computed, so we also scan the objects (to see which objects they refer to) during this time. Also to minimize the performance impact, an object is calculated to be local if it's a loose object or present in a non-promisor pack. (If it's also in a promisor pack or referred to by an object in a promisor pack, it is technically already a promisor object. But a misidentification of a promisor object as a non-promisor object is relatively benign here - we will thus repack that promisor object into a promisor pack, duplicating it in the object store, but there is no correctness issue, just an issue of inefficiency.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-20Merge branch 'ps/leakfixes-part-6'Junio C Hamano1-0/+1
More leakfixes. * ps/leakfixes-part-6: (22 commits) builtin/repack: fix leaking keep-pack list merge-ort: fix two leaks when handling directory rename modifications match-trees: fix leaking prefixes in `shift_tree()` builtin/fmt-merge-msg: fix leaking buffers builtin/grep: fix leaking object context builtin/pack-objects: plug leaking list of keep-packs builtin/repack: fix leaking line buffer when packing promisors negotiator/skipping: fix leaking commit entries shallow: fix leaking members of `struct shallow_info` shallow: free grafts when unregistering them object: clear grafts when clearing parsed object pool gpg-interface: fix misdesigned signing key interfaces send-pack: fix leaking push cert nonce remote: fix leak in reachability check of a remote-tracking ref remote: fix leaking tracking refs builtin/submodule--helper: fix leaking refs on push-check submodule: fix leaking fetch task data upload-pack: fix leaking child process data on reachability checks builtin/push: fix leaking refspec query result send-pack: fix leaking common object IDs ...
2024-09-05builtin/pack-objects: plug leaking list of keep-packsPatrick Steinhardt1-0/+1
The `--keep-pack` option of git-pack-objects(1) populates the arguments into a string list. And while the list is marked as `NODUP` and thus won't duplicate the strings, the list entries themselves still need to be free'd. We don't though, causing a leak. Plug it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-16run-command: fix detaching when running auto maintenancePatrick Steinhardt1-3/+3
In the past, we used to execute `git gc --auto` as part of our automatic housekeeping routines. As git-gc(1) may require quite some time to perform the housekeeping, it knows to detach itself and run in the background so that the user can continue their work. Eventually, we refactored our automatic housekeeping to instead use the more flexible git-maintenance(1) command. The upside of this new infra is that the user can configure which maintenance tasks are performed, at least to a certain degree. So while it continues to run git-gc(1) by default, it can also be adapted to e.g. use git-multi-pack-index(1) for maintenance of the object database. The auto-detach of the new infra is somewhat broken though once the user configures non-standard tasks. The problem is essentially that we detach at the wrong level in the process hierarchy: git-maintenance(1) never detaches itself, but instead it continues to be git-gc(1) which does. When configured to only run the git-gc(1) maintenance task, then the result is basically the same as before. But when configured to run other tasks, then git-maintenance(1) will wait for these to run to completion. Even worse, it may be that git-gc(1) runs concurrently with other housekeeping tasks, stomping on each others feet. Fix this bug by asking git-gc(1) to not detach when it is being invoked via git-maintenance(1). Instead, git-maintenance(1) now respects a new config "maintenance.autoDetach", the equivalent of "gc.autoDetach", and detaches itself into the background when running as part of our auto maintenance. This should continue to behave the same for all users which use the git-gc(1) task, only. For others though, it means that we now properly perform all tasks in the background. The default behaviour of git-maintenance(1) when executed by the user does not change, it will remain in the foreground unless they pass the `--detach` option. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-02tests: teach callers of test_i18ngrep to use test_grepJunio C Hamano1-11/+11
They are equivalents and the former still exists, so as long as the only change this commit makes are to rewrite test_i18ngrep to test_grep, there won't be any new bug, even if there still are callers of test_i18ngrep remaining in the tree, or when merged to other topics that add new uses of test_i18ngrep. This patch was produced more or less with git grep -l -e 'test_i18ngrep ' 't/t[0-9][0-9][0-9][0-9]-*.sh' | xargs perl -p -i -e 's/test_i18ngrep /test_grep /' and a good way to sanity check the result yourself is to run the above in a checkout of c4603c1c (test framework: further deprecate test_i18ngrep, 2023-10-31) and compare the resulting working tree contents with the result of applying this patch to the same commit. You'll see that test_i18ngrep in a few t/lib-*.sh files corrected, in addition to the manual reproduction. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-26config: don't BUG when both kvi and source are setGlen Choo1-2/+8
When iterating through config, we read config source metadata from global values - either a "struct config_source + enum config_scope" or a "struct key_value_info", using the current_config* functions. Prior to the series starting from 0c60285147 (config.c: create config_reader and the_reader, 2023-03-28), we weren't very picky about which values we should read in which situation; we did note that both groups of values generally shouldn't be set together, but if both were set, current_config* preferentially reads key_value_info. When that series added more structure, we enforced that either the former (when parsing a config source) can be set, or the latter (when iterating a config set), but *never* both at the same time. See 9828453ff0 (config.c: remove current_config_kvi, 2023-03-28) and 5cdf18e7cd (config.c: remove current_parsing_scope, 2023-03-28). That was a good simplifying constraint that helped us reason about the global state, but it turns out that there is at least one situation where we need both to be set at the same time: in a blobless partial clone where .gitmodules is missing. "git fetch" in such a repo will start a config parse over .gitmodules (setting the config_source), and Git will attempt to lazy-fetch it from the promisor remote. However, when we try to read the promisor configuration, we start iterating a config set (setting the key_value_info), and we BUG() out because that's not allowed any more. Teaching config_reader to gracefully handle this is somewhat complicated, but fortunately, there are proposed changes to the config.c machinery to get rid of this global state, and make the BUG() obsolete [1]. We should rely on that as the eventual solution, and avoid doing yet another refactor in the meantime. Therefore, fix the bug by removing the BUG() check. We're reverting to an older, less safe state, but that's generally okay since key_value_info is always preferentially read, so we'd always read the correct values when we iterate a config set in the middle of a config parse (like we are here). The reverse would be wrong, but extremely unlikely to happen since very few callers parse config without going through a config set. [1] https://lore.kernel.org/git/pull.1497.v3.git.git.1687290231.gitgitgadget@gmail.com Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-26read-tree: add "--super-prefix" option, eliminate globalÆvar Arnfjörð Bjarmason1-1/+1
The "--super-prefix" option to "git" was initially added in [1] for use with "ls-files"[2], and shortly thereafter "submodule--helper"[3] and "grep"[4]. It wasn't until [5] that "read-tree" made use of it. At the time [5] made sense, but since then we've made "ls-files" recurse in-process in [6], "grep" in [7], and finally "submodule--helper" in the preceding commits. Let's also remove it from "read-tree", which allows us to remove the option to "git" itself. We can do this because the only remaining user of it is the submodule API, which will now invoke "read-tree" with its new "--super-prefix" option. It will only do so when the "submodule_move_head()" function is called. That "submodule_move_head()" function was then only invoked by "read-tree" itself, but now rather than setting an environment variable to pass "--super-prefix" between cmd_read_tree() we: - Set a new "super_prefix" in "struct unpack_trees_options". The "super_prefixed()" function in "unpack-trees.c" added in [5] will now use this, rather than get_super_prefix() looking up the environment variable we set earlier in the same process. - Add the same field to the "struct checkout", which is only needed to ferry the "super_prefix" in the "struct unpack_trees_options" all the way down to the "entry.c" callers of "submodule_move_head()". Those calls which used the super prefix all originated in "cmd_read_tree()". The only other caller is the "unlink_entry()" caller in "builtin/checkout.c", which now passes a "NULL". 1. 74866d75793 (git: make super-prefix option, 2016-10-07) 2. e77aa336f11 (ls-files: optionally recurse into submodules, 2016-10-07) 3. 89c86265576 (submodule helper: support super prefix, 2016-12-08) 4. 0281e487fd9 (grep: optionally recurse into submodules, 2016-12-16) 5. 3d415425c7b (unpack-trees: support super-prefix option, 2017-01-17) 6. 188dce131fa (ls-files: use repository object, 2017-06-22) 7. f9ee2fcdfa0 (grep: recurse in-process using 'struct repository', 2017-08-02) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-26read-tree + fetch tests: test failing "--super-prefix" interactionGlen Choo1-0/+43
Ever since "git fetch --refetch" was introduced in 0f5e8851737 (Merge branch 'rc/fetch-refetch', 2022-04-04) the test being added here would fail. This is because "restore" will "read-tree .. --reset <hash>", which will in turn invoke "fetch". The "fetch" will then die with: fatal: fetch doesn't support --super-prefix This edge case and other "--super-prefix" bugs will be fixed in subsequent commits, but let's first add a "test_expect_failure" test for it. It passes until the very last command in the test. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-06Sync with 2.36.3Taylor Blau1-0/+2
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-06Sync with 2.35.5Taylor Blau1-0/+2
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-06Sync with 2.34.5Taylor Blau1-0/+2
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-06Sync with 2.31.5Taylor Blau1-0/+2
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-06Sync with 2.30.6Taylor Blau1-0/+2
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-01t/t5NNN: allow local submodulesTaylor Blau1-0/+2
To prepare for the default value of `protocol.file.allow` to change to "user", ensure tests that rely on local submodules can initialize them over the file protocol. Tests that only need to interact with submodules in a limited capacity have individual Git commands annotated with the appropriate configuration via `-c`. Tests that interact with submodules a handful of times use `test_config_global` instead. Test scripts that rely on submodules throughout use a `git config --global` during a setup test towards the beginning of the script. Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-09-13Merge branch 'jk/is-promisor-object-keep-tree-in-use' into maintJunio C Hamano1-0/+7
An earlier optimization discarded a tree-object buffer that is still in use, which has been corrected. * jk/is-promisor-object-keep-tree-in-use: is_promisor_object(): fix use-after-free of tree buffer
2022-08-14is_promisor_object(): fix use-after-free of tree bufferJeff King1-0/+7
Since commit fcc07e980b (is_promisor_object(): free tree buffer after parsing, 2021-04-13), we'll always free the buffers attached to a "struct tree" after searching them for promisor links. But there's an important case where we don't want to do so: if somebody else is already using the tree! This can happen during a "rev-list --missing=allow-promisor" traversal in a partial clone that is missing one or more trees or blobs. The backtrace for the free looks like this: #1 free_tree_buffer tree.c:147 #2 add_promisor_object packfile.c:2250 #3 for_each_object_in_pack packfile.c:2190 #4 for_each_packed_object packfile.c:2215 #5 is_promisor_object packfile.c:2272 #6 finish_object__ma builtin/rev-list.c:245 #7 finish_object builtin/rev-list.c:261 #8 show_object builtin/rev-list.c:274 #9 process_blob list-objects.c:63 #10 process_tree_contents list-objects.c:145 #11 process_tree list-objects.c:201 #12 traverse_trees_and_blobs list-objects.c:344 [...] We're in the middle of walking through the entries of a tree object via process_tree_contents(). We see a blob (or it could even be another tree entry) that we don't have, so we call is_promisor_object() to check it. That function loops over all of the objects in the promisor packfile, including the tree we're currently walking. When we're done with it there, we free the tree buffer. But as we return to the walk in process_tree_contents(), it's still holding on to a pointer to that buffer, via its tree_desc iterator, and it accesses the freed memory. Even a trivial use of "--missing=allow-promisor" triggers this problem, as the included test demonstrates (it's just a vanilla --blob:none clone). We can detect this case by only freeing the tree buffer if it was allocated on our behalf. This is a little tricky since that happens inside parse_object(), and it doesn't tell us whether the object was already parsed, or whether it allocated the buffer itself. But by checking for an already-parsed tree beforehand, we can distinguish the two cases. That feels a little hacky, and does incur an extra lookup in the object-hash table. But that cost is fairly minimal compared to actually loading objects (and since we're iterating the whole pack here, we're likely to be loading most objects, rather than reusing cached results). It may also be a good direction for this function in general, as there are other possible optimizations that rely on doing some analysis before parsing: - we could detect blobs and avoid reading their contents; they can't link to other objects, but parse_object() doesn't know that we don't care about checking their hashes. - we could avoid allocating object structs entirely for most objects (since we really only need them in the oidset), which would save some memory. - promisor commits could use the commit-graph rather than loading the object from disk This commit doesn't do any of those optimizations, but I think it argues that this direction is reasonable, rather than relying on parse_object() and trying to teach it to give us more information about whether it parsed. The included test fails reliably under SANITIZE=address just when running "rev-list --missing=allow-promisor". Checking the output isn't strictly necessary to detect the bug, but it seems like a reasonable addition given the general lack of coverage for "allow-promisor" in the test suite. Reported-by: Andrew Olsen <andrew.olsen@koordinates.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-28fetch: after refetch, encourage auto gc repackingRobert Coup1-0/+29
After invoking `fetch --refetch`, the object db will likely contain many duplicate objects. If auto-maintenance is enabled, invoke it with appropriate settings to encourage repacking/consolidation. * gc.autoPackLimit: unless this is set to 0 (disabled), override the value to 1 to force pack consolidation. * maintenance.incremental-repack.auto: unless this is set to 0, override the value to -1 to force incremental repacking. Signed-off-by: Robert Coup <robert@coup.net.nz> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-28t5615-partial-clone: add test for fetch --refetchRobert Coup1-1/+51
Add a test for doing a refetch to apply a changed partial clone filter under protocol v0 and v2. Signed-off-by: Robert Coup <robert@coup.net.nz> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13t5000-t5999: detect and signal failure within loopEric Sunshine1-6/+6
Failures within `for` and `while` loops can go unnoticed if not detected and signaled manually since the loop itself does not abort when a contained command fails, nor will a failure necessarily be detected when the loop finishes since the loop returns the exit code of the last command it ran on the final iteration, which may not be the command which failed. Therefore, detect and signal failures manually within loops using the idiom `|| return 1` (or `|| exit 1` within subshells). Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13tests: fix broken &&-chains in compound statementsEric Sunshine1-9/+9
The top-level &&-chain checker built into t/test-lib.sh causes tests to magically exit with code 117 if the &&-chain is broken. However, it has the shortcoming that the magic does not work within `{...}` groups, `(...)` subshells, `$(...)` substitutions, or within bodies of compound statements, such as `if`, `for`, `while`, `case`, etc. `chainlint.sed` partly fills in the gap by catching broken &&-chains in `(...)` subshells, but bugs can still lurk behind broken &&-chains in the other cases. Fix broken &&-chains in compound statements in order to reduce the number of possible lurking bugs. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-28repack: avoid loosening promisor objects in partial clonesRafael Silva1-0/+8
When `git repack -A -d` is run in a partial clone, `pack-objects` is invoked twice: once to repack all promisor objects, and once to repack all non-promisor objects. The latter `pack-objects` invocation is with --exclude-promisor-objects and --unpack-unreachable, which loosens all objects unused during this invocation. Unfortunately, this includes promisor objects. Because the -d argument to `git repack` subsequently deletes all loose objects also in packs, these just-loosened promisor objects will be immediately deleted. However, this extra disk churn is unnecessary in the first place. For example, in a newly-cloned partial repo that filters all blob objects (e.g. `--filter=blob:none`), `repack` ends up unpacking all trees and commits into the filesystem because every object, in this particular case, is a promisor object. Depending on the repo size, this increases the disk usage considerably: In my copy of the linux.git, the object directory peaked 26GB of more disk usage. In order to avoid this extra disk churn, pass the names of the promisor packfiles as --keep-pack arguments to the second invocation of `pack-objects`. This informs `pack-objects` that the promisor objects are already in a safe packfile and, therefore, do not need to be loosened. For testing, we need to validate whether any object was loosened. However, the "evidence" (loosened objects) is deleted during the process which prevents us from inspecting the object directory. Instead, let's teach `pack-objects` to count loosened objects and emit via trace2 thus allowing inspecting the debug events after the process is finished. This new event is used on the added regression test. Lastly, add a new perf test to evaluate the performance impact made by this changes (tested on git.git): Test HEAD^ HEAD ---------------------------------------------------------- 5600.3: gc 134.38(41.93+90.95) 7.80(6.72+1.35) -94.2% For a bigger repository, such as linux.git, the improvement is even bigger: Test HEAD^ HEAD ------------------------------------------------------------------- 5600.3: gc 6833.00(918.07+3162.74) 268.79(227.02+39.18) -96.1% These improvements are particular big because every object in the newly-cloned partial repository is a promisor object. Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Jeff King <peff@peff.net> Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-25Merge branch 'js/default-branch-name-tests-final-stretch'Junio C Hamano1-25/+28
Prepare tests not to be affected by the name of the default branch "git init" creates. * js/default-branch-name-tests-final-stretch: (28 commits) tests: drop prereq `PREPARE_FOR_MAIN_BRANCH` where no longer needed t99*: adjust the references to the default branch name "main" tests(git-p4): transition to the default branch name `main` t9[5-7]*: adjust the references to the default branch name "main" t9[0-4]*: adjust the references to the default branch name "main" t8*: adjust the references to the default branch name "main" t7[5-9]*: adjust the references to the default branch name "main" t7[0-4]*: adjust the references to the default branch name "main" t6[4-9]*: adjust the references to the default branch name "main" t64*: preemptively adjust alignment to prepare for `master` -> `main` t6[0-3]*: adjust the references to the default branch name "main" t5[6-9]*: adjust the references to the default branch name "main" t55[4-9]*: adjust the references to the default branch name "main" t55[23]*: adjust the references to the default branch name "main" t551*: adjust the references to the default branch name "main" t550*: adjust the references to the default branch name "main" t5503: prepare aligned comment for replacing `master` with `main` t5[0-4]*: adjust the references to the default branch name "main" t5323: prepare centered comment for `master` -> `main` t4*: adjust the references to the default branch name "main" ...
2020-12-17Merge branch 'tb/partial-clone-filters-fix'Junio C Hamano1-1/+9
Fix potential server side resource deallocation issues when responding to a partial clone request. * tb/partial-clone-filters-fix: upload-pack.c: don't free allowed_filters util pointers builtin/clone.c: don't ignore transport_fetch_refs() errors
2020-12-03upload-pack.c: don't free allowed_filters util pointersTaylor Blau1-1/+9
To keep track of which object filters are allowed or not, 'git upload-pack' stores the name of each filter in a string_list, and sets it ->util pointer to be either 0 or 1, indicating whether it is banned or allowed. Later on, we attempt to clear that list, but we incorrectly ask for the util pointers to be free()'d, too. This behavior (introduced back in 6dd3456a8c (upload-pack.c: allow banning certain object filter(s), 2020-08-03)) leads to an invalid free, and causes us to crash. In order to trigger this, one needs to fetch from a server that (a) has at least one object filter allowed, and (b) issue a fetch that contains a subset of the allowed filters (i.e., we cannot ask for a banned filter, since this causes us to die() before we hit the bogus string_list_clear()). In that case, whatever banned filters exist will cause a noop free() (since those ->util pointers are set to 0), but the first allowed filter we try to free will crash us. We never noticed this in the tests because we didn't have an example of setting 'uploadPackFilter' configuration variables and then following up with a valid fetch. The first new 'git clone' prevents further regression here. For good measure on top, add a test which checks the same behavior at a tree depth greater than 0. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-03upload-pack: propagate return value from object filter config callbackJeff King1-0/+8
If we encounter an error in parse_filter_object_config(), we'll complain to stderr but won't actually propagate the return value up the stack. This is unlike most of our config callbacks, which return the error to git_config() so it can die (this includes the call just below us to parse_hide_refs_config(), which can also produce errors). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-19t5[6-9]*: adjust the references to the default branch name "main"Johannes Schindelin1-26/+26
This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t5[6-9]*.sh) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-19tests: mark tests relying on the current default for `init.defaultBranch`Johannes Schindelin1-0/+3
In addition to the manual adjustment to let the `linux-gcc` CI job run the test suite with `master` and then with `main`, this patch makes sure that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts that currently rely on the initial branch name being `master by default. To determine which test scripts to mark up, the first step was to force-set the default branch name to `master` in - all test scripts that contain the keyword `master`, - t4211, which expects `t/t4211/history.export` with a hard-coded ref to initialize the default branch, - t5560 because it sources `t/t556x_common` which uses `master`, - t8002 and t8012 because both source `t/annotate-tests.sh` which also uses `master`) This trick was performed by this command: $ sed -i '/^ *\. \.\/\(test-lib\|lib-\(bash\|cvs\|git-svn\)\|gitweb-lib\)\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' $(git grep -l master t/t[0-9]*.sh) \ t/t4211*.sh t/t5560*.sh t/t8002*.sh t/t8012*.sh After that, careful, manual inspection revealed that some of the test scripts containing the needle `master` do not actually rely on a specific default branch name: either they mention `master` only in a comment, or they initialize that branch specificially, or they do not actually refer to the current default branch. Therefore, the aforementioned modification was undone in those test scripts thusly: $ git checkout HEAD -- \ t/t0027-auto-crlf.sh t/t0060-path-utils.sh \ t/t1011-read-tree-sparse-checkout.sh \ t/t1305-config-include.sh t/t1309-early-config.sh \ t/t1402-check-ref-format.sh t/t1450-fsck.sh \ t/t2024-checkout-dwim.sh \ t/t2106-update-index-assume-unchanged.sh \ t/t3040-subprojects-basic.sh t/t3301-notes.sh \ t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \ t/t3436-rebase-more-options.sh \ t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \ t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \ t/t5511-refspec.sh t/t5526-fetch-submodules.sh \ t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \ t/t5548-push-porcelain.sh \ t/t5552-skipping-fetch-negotiator.sh \ t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \ t/t5614-clone-submodules-shallow.sh \ t/t7508-status.sh t/t7606-merge-custom.sh \ t/t9302-fast-import-unpack-limit.sh We excluded one set of test scripts in these commands, though: the range of `git p4` tests. The reason? `git p4` stores the (foreign) remote branch in the branch called `p4/master`, which is obviously not the default branch. Manual analysis revealed that only five of these tests actually require a specific default branch name to pass; They were modified thusly: $ sed -i '/^ *\. \.\/lib-git-p4\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' t/t980[0167]*.sh t/t9811*.sh Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-03Merge branch 'jt/lazy-fetch'Junio C Hamano1-0/+20
Updates to on-demand fetching code in lazily cloned repositories. * jt/lazy-fetch: fetch: no FETCH_HEAD display if --no-write-fetch-head fetch-pack: remove no_dependents code promisor-remote: lazy-fetch objects in subprocess fetch-pack: do not lazy-fetch during ref iteration fetch: only populate existing_refs if needed fetch: avoid reading submodule config until needed fetch: allow refspecs specified through stdin negotiator/noop: add noop fetch negotiator
2020-08-20fetch-pack: in partial clone, pass --promisorJonathan Tan1-0/+16
When fetching a pack from a promisor remote, the corresponding .promisor file needs to be created. "fetch-pack" originally did this by passing "--promisor" to "index-pack", but in 5374a290aa ("fetch-pack: write fetched refs to .promisor", 2019-10-16), "fetch-pack" was taught to do this itself instead, because it needed to store ref information in the .promisor file. This causes a problem with superprojects when transfer.fsckobjects is set, because in the current implementation, it is "index-pack" that calls fsck_finish() to check the objects; before 5374a290aa, fsck_finish() would see that .gitmodules is a promisor object and tolerate it being missing, but after, there is no .promisor file (at the time of the invocation of fsck_finish() by "index-pack") to tell it that .gitmodules is a promisor object, so it returns an error. Therefore, teach "fetch-pack" to pass "--promisor" to index pack once again. "fetch-pack" will subsequently overwrite this file with the ref information. An alternative is to instead move object checking to "fetch-pack", and let "index-pack" only index the files. However, since "index-pack" has to inflate objects in order to index them, it seems reasonable to also let it check the objects (which also require inflated files). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18fetch-pack: do not lazy-fetch during ref iterationJonathan Tan1-0/+20
In order to determine negotiation tips, "fetch-pack" iterates over all refs and dereferences all annotated tags found. This causes the existence of targets of refs and annotated tags to be checked. Avoiding this is especially important when we use "git fetch" (which invokes "fetch-pack") to perform lazy fetches in a partial clone because a target of such a ref or annotated tag may need to be itself lazy-fetched (and otherwise causing an infinite loop). Therefore, teach "fetch-pack" not to lazy fetch whenever iterating over refs. This is done by using the raw form of ref iteration and by dereferencing tags ourselves. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-11Merge branch 'tb/upload-pack-filters'Junio C Hamano1-0/+33
The component to respond to "git fetch" request is made more configurable to selectively allow or reject object filtering specification used for partial cloning. * tb/upload-pack-filters: t5616: use test_i18ngrep for upload-pack errors upload-pack.c: introduce 'uploadpackfilter.tree.maxDepth' upload-pack.c: allow banning certain object filter(s) list_objects_filter_options: introduce 'list_object_filter_config_name'
2020-08-05t5616: use test_i18ngrep for upload-pack errorsJeff King1-4/+4
The tests added to t5616 in 6dd3456a8c (upload-pack.c: allow banning certain object filter(s), 2020-08-03) can fail racily, but only with GETTEXT_POISON enabled. The tests in question look something like this: test_must_fail ok=sigpipe git clone --filter=blob:none ... 2>err && grep "filter blob:none not supported' err The remote upload-pack process writes that error message both as an ERR packet, but also via a die() message. In theory we should see the message twice in the "err" file. The client relays the message from the packet to its stderr (with a "remote error:" prefix), and because this is a local-system clone, upload-pack's stderr goes to the same place. But because clone may be writing to the pipe when upload-pack calls die(), it may get SIGPIPE and fail to relay the message. That's why we need our "ok=sigpipe" trick. But our grep should still work reliably in that case. Either: - we got SIGPIPE on the client, which means upload-pack completed its die(), and we'll see that version of the message. - the client didn't get SIGPIPE, and so it successfully relays the message. In theory we'd see both copies of the message in the second case. But now always! As soon as the client sees ERR, it exits and we run grep. But we have no guarantee that the upload-pack process has exited at this point, or even written its die() message. We might only see the client version of the message. Normally that's OK. We only need to see one or the other to pass the test. But now consider GETTEXT_POISON. upload-pack doesn't translate the die() message nor the ERR packet. But once the client receives it, it calls: die(_("remote error: %s"), buffer + 4); That message _is_ marked for translation. Normally we'd just replace the "remote error:" portion of it, but in GETTEXT_POISON mode, we replace the whole thing with "# GETTEXT POISON #" and don't include the "%s" part at all. So the whole text from the ERR packet is dropped, and so we may racily see a test failure if upload-pack's die() call wasn't yet written. We can fix it by using test_i18ngrep, which just makes this grep a noop in the poison mode. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-03upload-pack.c: introduce 'uploadpackfilter.tree.maxDepth'Taylor Blau1-0/+9
In b79cf959b2 (upload-pack.c: allow banning certain object filter(s), 2020-02-26), we introduced functionality to disallow certain object filters from being chosen from within 'git upload-pack'. Traditionally, administrators use this functionality to disallow filters that are known to perform slowly, for e.g., those that do not have bitmap-level filtering. In the past, the '--filter=tree:<n>' was one such filter that does not have bitmap-level filtering support, and so was likely to be banned by administrators. However, in the previous couple of commits, we introduced bitmap-level filtering for the case when 'n' is equal to '0', i.e., as if we had a '--filter=tree:none' choice. While it would be sufficient to simply write $ git config uploadpackfilter.tree.allow true (since it would allow all values of 'n'), we would like to be able to allow this filter for certain values of 'n', i.e., those no greater than some pre-specified maximum. In order to do this, introduce a new configuration key, as follows: $ git config uploadpackfilter.tree.maxDepth <m> where '<m>' specifies the maximum allowed value of 'n' in the filter 'tree:n'. Administrators who wish to allow for only the value '0' can write: $ git config uploadpackfilter.tree.allow true $ git config uploadpackfilter.tree.maxDepth 0 which allows '--filter=tree:0', but no other values. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-03upload-pack.c: allow banning certain object filter(s)Taylor Blau1-0/+24
Git clients may ask the server for a partial set of objects, where the set of objects being requested is refined by one or more object filters. Server administrators can configure 'git upload-pack' to allow or ban these filters by setting the 'uploadpack.allowFilter' variable to 'true' or 'false', respectively. However, administrators using bitmaps may wish to allow certain kinds of object filters, but ban others. Specifically, they may wish to allow object filters that can be optimized by the use of bitmaps, while rejecting other object filters which aren't and represent a perceived performance degradation (as well as an increased load factor on the server). Allow configuring 'git upload-pack' to support object filters on a case-by-case basis by introducing two new configuration variables: - 'uploadpackfilter.allow' - 'uploadpackfilter.<kind>.allow' where '<kind>' may be one of 'blobNone', 'blobLimit', 'tree', and so on. Setting the second configuration variable for any valid value of '<kind>' explicitly allows or disallows restricting that kind of object filter. If a client requests the object filter <kind> and the respective configuration value is not set, 'git upload-pack' will default to the value of 'uploadpackfilter.allow', which itself defaults to 'true' to maintain backwards compatibility. Note that this differs from 'uploadpack.allowfilter', which controls whether or not the 'filter' capability is advertised. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16upload-pack: do not lazy-fetch "have" objectsJonathan Tan1-0/+38
When upload-pack receives a request containing "have" hashes, it (among other things) checks if the served repository has the corresponding objects. However, it does not do so with the OBJECT_INFO_SKIP_FETCH_OBJECT flag, so if serving a partial clone, a lazy fetch will be triggered first. This was discovered at $DAYJOB when a user fetched from a partial clone (into another partial clone - although this would also happen if the repo to be fetched into is not a partial clone). Therefore, whenever "have" hashes are checked for existence, pass the OBJECT_INFO_SKIP_FETCH_OBJECT flag. Also add the OBJECT_INFO_QUICK flag to improve performance, as it is typical that such objects do not exist in the serving repo, and the consequences of a false negative are minor (usually, a slightly larger pack sent). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-13Merge branch 'cc/upload-pack-v2-fetch-fix'Junio C Hamano1-4/+3
Serving a "git fetch" client over "git://" and "ssh://" protocols using the on-wire protocol version 2 was buggy on the server end when the client needs to make a follow-up request to e.g. auto-follow tags. * cc/upload-pack-v2-fetch-fix: upload-pack: clear filter_options for each v2 fetch command
2020-05-08upload-pack: clear filter_options for each v2 fetch commandChristian Couder1-4/+3
Because of the request/response model of protocol v2, the upload_pack_v2() function is sometimes called twice in the same process, while 'struct list_objects_filter_options filter_options' was declared as static at the beginning of 'upload-pack.c'. This made the check in list_objects_filter_die_if_populated(), which is called by process_args(), fail the second time upload_pack_v2() is called, as filter_options had already been populated the first time. To fix that, filter_options is not static any more. It's now owned directly by upload_pack(). It's now also part of 'struct upload_pack_data', so that it's owned indirectly by upload_pack_v2(). In the long term, the goal is to also have upload_pack() use 'struct upload_pack_data', so adding filter_options to this struct makes more sense than to have it owned directly by upload_pack_v2(). This fixes the first of the 2 bugs documented by d0badf8797 (partial-clone: demonstrate bugs in partial fetch, 2020-02-21). Helped-by: Derrick Stolee <dstolee@microsoft.com> Helped-by: Jeff King <peff@peff.net> Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-28Merge branch 'dd/test-with-busybox'Junio C Hamano1-1/+1
Various tests have been updated to work around issues found with shell utilities that come with busybox etc. * dd/test-with-busybox: t5703: feed raw data into test-tool unpack-sideband t4124: tweak test so that non-compliant diff(1) can also be used t7063: drop non-POSIX argument "-ls" from find(1) t5616: use rev-parse instead to get HEAD's object_id t5003: skip conversion test if unzip -a is unavailable t5003: drop the subshell in test_lazy_prereq test-lib-functions: test_cmp: eval $GIT_TEST_CMP t4061: use POSIX compliant regex(7)
2020-04-22Merge branch 'jk/use-quick-lookup-in-clone-for-tag-following'Junio C Hamano1-0/+8
The logic to auto-follow tags by "git clone --single-branch" was not careful to avoid lazy-fetching unnecessary tags, which has been corrected. * jk/use-quick-lookup-in-clone-for-tag-following: clone: use "quick" lookup while following tags
2020-04-01clone: use "quick" lookup while following tagsJeff King1-0/+8
When cloning with --single-branch, we implement git-fetch's usual tag-following behavior, grabbing any tag objects that point to objects we have locally. When we're a partial clone, though, our has_object_file() check will actually lazy-fetch each tag. That not only defeats the purpose of --single-branch, but it does it incredibly slowly, potentially kicking off a new fetch for each tag. This is even worse for a shallow clone, which implies --single-branch, because even tags which are supersets of each other will be fetched individually. We can fix this by passing OBJECT_INFO_SKIP_FETCH_OBJECT to the call, which is what git-fetch does in this case. Likewise, let's include OBJECT_INFO_QUICK, as that's what git-fetch does. The rationale is discussed in 5827a03545 (fetch: use "quick" has_sha1_file for tag following, 2016-10-13), but here the tradeoff would apply even more so because clone is very unlikely to be racing with another process repacking our newly-created repository. This may provide a very small speedup even in the non-partial case case, as we'd avoid calling reprepare_packed_git() for each tag (though in practice, we'd only have a single packfile, so that reprepare should be quite cheap). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-25t5616: use rev-parse instead to get HEAD's object_idĐoàn Trần Công Danh1-1/+1
Only HEAD's object_id is necessary, rev-list is an overkill. Despite POSIX requires grep(1) treat single pattern with <newline> as multiple patterns. busybox's grep(1) (as of v1.31.1) haven't implemented it yet. Use rev-parse to simplify the test and avoid busybox unimplemented features. Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-05Merge branch 'js/ci-windows-update'Junio C Hamano1-17/+21
Updates to the CI settings. * js/ci-windows-update: Azure Pipeline: switch to the latest agent pools ci: prevent `perforce` from being quarantined t/lib-httpd: avoid using macOS' sed
2020-03-02Merge branch 'ds/partial-clone-fixes'Junio C Hamano1-0/+31
Fix for a bug revealed by a recent change to make the protocol v2 the default. * ds/partial-clone-fixes: partial-clone: avoid fetching when looking for objects partial-clone: demonstrate bugs in partial fetch
2020-02-27t/lib-httpd: avoid using macOS' sedJohannes Schindelin1-17/+21
Among other differences relative to GNU sed, macOS' sed always ends its output with a trailing newline, even if the input did not have such a trailing newline. Surprisingly, this makes three httpd-based tests fail on macOS: t5616, t5702 and t5703. ("Surprisingly" because those tests have been around for some time, but apparently nobody runs them on macOS with a working Apache2 setup.) The reason is that we use `sed` in those tests to filter the response of the web server. Apart from the fact that we use GNU constructs (such as using a space after the `c` command instead of a backslash and a newline), we have another problem: macOS' sed LF-only newlines while webservers are supposed to use CR/LF ones. Even worse, t5616 uses `sed` to replace a binary part of the response with a new binary part (kind of hoping that the replaced binary part does not contain a 0x0a byte which would be interpreted as a newline). To that end, it calls on Perl to read the binary pack file and hex-encode it, then calls on `sed` to prefix every hex digit pair with a `\x` in order to construct the text that the `c` statement of the `sed` invocation is supposed to insert. So we call Perl and sed to construct a sed statement. The final nail in the coffin is that macOS' sed does not even interpret those `\x<hex>` constructs. Let's just replace all of that by Perl snippets. With Perl, at least, we do not have to deal with GNU vs macOS semantics, we do not have to worry about unwanted trailing newlines, and we do not have to spawn commands to construct arguments for other commands to be spawned (i.e. we can avoid a whole lot of shell scripting complexity). The upshot is that this fixes t5616, t5702 and t5703 on macOS with Apache2. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-22partial-clone: avoid fetching when looking for objectsDerrick Stolee1-1/+1
When using partial clone, find_non_local_tags() in builtin/fetch.c checks each remote tag to see if its object also exists locally. There is no expectation that the object exist locally, but this function nevertheless triggers a lazy fetch if the object does not exist. This can be extremely expensive when asking for a commit, as we are completely removed from the context of the non-existent object and thus supply no "haves" in the request. 6462d5eb9a (fetch: remove fetch_if_missing=0, 2019-11-05) removed a global variable that prevented these fetches in favor of a bitflag. However, some object existence checks were not updated to use this flag. Update find_non_local_tags() to use OBJECT_INFO_SKIP_FETCH_OBJECT in addition to OBJECT_INFO_QUICK. The _QUICK option only prevents repreparing the pack-file structures. We need to be extremely careful about supplying _SKIP_FETCH_OBJECT when we expect an object to not exist due to updated refs. This resolves a broken test in t5616-partial-clone.sh. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-22partial-clone: demonstrate bugs in partial fetchDerrick Stolee1-0/+31
While testing partial clone, I noticed some odd behavior. I was testing a way of running 'git init', followed by manually configuring the remote for partial clone, and then running 'git fetch'. Astonishingly, I saw the 'git fetch' process start asking the server for multiple rounds of pack-file downloads! When tweaking the situation a little more, I discovered that I could cause the remote to hang up with an error. Add two tests that demonstrate these two issues. In the first test, we find that when fetching with blob filters from a repository that previously did not have any tags, the 'git fetch --tags origin' command fails because the server sends "multiple filter-specs cannot be combined". This only happens when using protocol v2. In the second test, we see that a 'git fetch origin' request with several ref updates results in multiple pack-file downloads. This must be due to Git trying to fault-in the objects pointed by the refs. What makes this matter particularly nasty is that this goes through the do_oid_object_info_extended() method, so there are no "haves" in the negotiation. This leads the remote to send every reachable commit and tree from each new ref, providing a quadratic amount of data transfer! This test is fixed if we revert 6462d5eb9a (fetch: remove fetch_if_missing=0, 2019-11-05), but that revert causes other test failures. The real fix will need more care. The tests are ordered in this way because if I swap the test order the tag test will succeed instead of fail. I believe this is because somehow we need the srv.bare repo to not have any tags when we clone, but then have tags in our next fetch. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-27t5616: make robust to delta base changeJonathan Tan1-13/+23
Commit 6462d5eb9a ("fetch: remove fetch_if_missing=0", 2019-11-08) contains a test that relies on having to lazily fetch the delta base of a blob, but assumes that the tree being fetched (as part of the test) is sent as a non-delta object. This assumption may not hold in the future; for example, a change in the length of the object hash might result in the tree being sent as a delta instead. Make the test more robust by relying on having to lazily fetch the delta base of the tree instead, and by making no assumptions on whether the blobs are sent as delta or non-delta. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-01Merge branch 'jt/fetch-remove-lazy-fetch-plugging'Junio C Hamano1-0/+70
"git fetch" codepath had a big "do not lazily fetch missing objects when I ask if something exists" switch. This has been corrected by marking the "does this thing exist?" calls with "if not please do not lazily fetch it" flag. * jt/fetch-remove-lazy-fetch-plugging: promisor-remote: remove fetch_if_missing=0 clone: remove fetch_if_missing=0 fetch: remove fetch_if_missing=0
2019-11-08fetch: remove fetch_if_missing=0Jonathan Tan1-0/+70
In fetch_pack() (and all functions it calls), pass OBJECT_INFO_SKIP_FETCH_OBJECT whenever we query an object that could be a tree or blob that we do not want to be lazy-fetched even if it is absent. Thus, the only lazy-fetches occurring for trees and blobs are when resolving deltas. Thus, we can remove fetch_if_missing=0 from builtin/fetch.c. Remove this, and also add a test ensuring that such objects are not lazy-fetched. (We might be able to remove fetch_if_missing=0 from other places too, but I have limited myself to builtin/fetch.c in this commit because I have not written tests for the other commands yet.) Note that commits and tags may still be lazy-fetched. I limited myself to objects that could be trees or blobs here because Git does not support creating such commit- and tag-excluding clones yet, and even if such a clone were manually created, Git does not have good support for fetching a single commit (when fetching a commit, it and all its ancestors would be sent). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-16fetch-pack: write fetched refs to .promisorJonathan Tan1-0/+8
The specification of promisor packfiles (in partial-clone.txt) states that the .promisor files that accompany packfiles do not matter (just like .keep files), so whenever a packfile is fetched from the promisor remote, Git has been writing empty .promisor files. But these files could contain more useful information. So instead of writing empty files, write the refs fetched to these files. This makes it easier to debug issues with partial clones, as we can identify what refs (and their associated hashes) were fetched at the time the packfile was downloaded, and if necessary, compare those hashes against what the promisor remote reports now. This is implemented by teaching fetch-pack to write its own non-empty .promisor file whenever it knows the name of the pack's lockfile. This covers the case wherein the user runs "git fetch" with an internal protocol or HTTP protocol v2 (fetch_refs_via_pack() in transport.c sets lock_pack) and with HTTP protocol v0/v1 (fetch_git() in remote-curl.c passes "--lock-pack" to "fetch-pack"). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Acked-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07Merge branch 'jk/partial-clone-sparse-blob'Junio C Hamano1-0/+36
The name of the blob object that stores the filter specification for sparse cloning/fetching was interpreted in a wrong place in the code, causing Git to abort. * jk/partial-clone-sparse-blob: list-objects-filter: use empty string instead of NULL for sparse "base" list-objects-filter: give a more specific error sparse parsing error list-objects-filter: delay parsing of sparse oid t5616: test cloning/fetching with sparse:oid=<oid> filter
2019-09-18Merge branch 'md/list-objects-filter-combo'Junio C Hamano1-0/+19
The list-objects-filter API (used to create a sparse/lazy clone) learned to take a combined filter specification. * md/list-objects-filter-combo: list-objects-filter-options: make parser void list-objects-filter-options: clean up use of ALLOC_GROW list-objects-filter-options: allow mult. --filter strbuf: give URL-encoding API a char predicate fn list-objects-filter-options: make filter_spec a string_list list-objects-filter-options: move error check up list-objects-filter: implement composite filters list-objects-filter-options: always supply *errbuf list-objects-filter: put omits set in filter struct list-objects-filter: encapsulate filter components
2019-09-18Merge branch 'cc/multi-promisor'Junio C Hamano1-2/+2
Teach the lazy clone machinery that there can be more than one promisor remote and consult them in order when downloading missing objects on demand. * cc/multi-promisor: Move core_partial_clone_filter_default to promisor-remote.c Move repository_format_partial_clone to promisor-remote.c Remove fetch-object.{c,h} in favor of promisor-remote.{c,h} remote: add promisor and partial clone config to the doc partial-clone: add multiple remotes in the doc t0410: test fetching from many promisor remotes builtin/fetch: remove unique promisor remote limitation promisor-remote: parse remote.*.partialclonefilter Use promisor_remote_get_direct() and has_promisor_remote() promisor-remote: use repository_format_partial_clone promisor-remote: add promisor_remote_reinit() promisor-remote: implement promisor_remote_get_direct() Add initial support for many promisor remotes fetch-object: make functions return an error code t0410: remove pipes after git commands
2019-09-16list-objects-filter: give a more specific error sparse parsing errorJon Simons1-1/+1
The sparse:oid filter has two error modes: we might fail to resolve the name to an OID, or we might fail to parse the contents of that OID. In the latter case, let's give a less generic error message, and mention the OID we did find. While we're here, let's also mark both messages as translatable. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-16list-objects-filter: delay parsing of sparse oidJeff King1-2/+2
The list-objects-filter code has two steps to its initialization: 1. parse_list_objects_filter() makes sure the spec is a filter we know about and is syntactically correct. This step is done by "rev-list" or "upload-pack" that is going to apply a filter, but also by "git clone" or "git fetch" before they send the spec across the wire. 2. list_objects_filter__init() runs the type-specific initialization (using function pointers established in step 1). This happens at the start of traverse_commit_list_filtered(), when we're about to actually use the filter. It's a good idea to parse as much as we can in step 1, in order to catch problems early (e.g., a blob size limit that isn't a number). But one thing we _shouldn't_ do is resolve any oids at that step (e.g., for sparse-file contents specified by oid). In the case of a fetch, the oid has to be resolved on the remote side. The current code does resolve the oid during the parse phase, but ignores any error (which we must do, because we might just be sending the spec across the wire). This leads to two bugs: - if we're not in a repository (e.g., because it's git-clone parsing the spec), then we trigger a BUG() trying to resolve the name - if we did hit the error case, we still have to notice that later and bail. The code path in rev-list handles this, but the one in upload-pack does not, leading to a segfault. We can fix both by moving the oid resolution into the sparse-oid init function. At that point we know we have a repository (because we're about to traverse), and handling the error there fixes the segfault. As a bonus, we can drop the NULL sparse_oid_value check in rev-list, since this is now handled in the sparse-oid-filter init function. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-16t5616: test cloning/fetching with sparse:oid=<oid> filterJon Simons1-0/+36
We test in t5317 that "sparse:oid" filters work with rev-list, but there's no coverage at all confirming that they work with a fetch or clone (and in fact, there are several bugs). Let's do a basic test that a clone fetches the correct objects. [jk: extracted from Jon's earlier fix patches. I also simplified the setup down to a single sparse file, and I added checks that we got the right blobs] Signed-off-by: Jon Simons <jon@jonsimons.org> Signed-off-by: Jeff King <peff@peff.net> Acked-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-02t: warn against adding non-httpd-specific tests after sourcing 'lib-httpd'SZEDER Gábor1-0/+3
We have a couple of test scripts that are not completely httpd-specific, but do run a few httpd-specific tests at the end. These test scripts source 'lib-httpd.sh' somewhere mid-script, which then skips all the rest of the test script if the dependencies for running httpd tests are not fulfilled. As the previous two patches in this series show, already on two occasions non-httpd-specific tests were appended at the end of such test scripts, and, consequently, they were skipped as well when httpd tests couldn't be run. Add a comment at the end of these test scripts to warn against adding non-httpd-specific tests at the end, in the hope that they will help prevent similar issues in the future. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28list-objects-filter-options: allow mult. --filterMatthew DeVore1-0/+19
Allow combining of multiple filters by simply repeating the --filter flag. Before this patch, the user had to combine them in a single flag somewhat awkwardly (e.g. --filter=combine:FOO+BAR), including URL-encoding the individual filters. To make this work, in the --filter flag parsing callback, rather than error out when we detect that the filter_options struct is already populated, we modify it in-place to contain the added sub-filter. The existing sub-filter becomes the lhs of the combined filter, and the next sub-filter becomes the rhs. We also have to URL-encode the LHS and RHS sub-filters. We can simplify the operation if the LHS is already a combine: filter. In that case, we just append the URL-encoded RHS sub-filter to the LHS spec to get the new spec. Helped-by: Emily Shaffer <emilyshaffer@google.com> Helped-by: Jeff Hostetler <git@jeffhostetler.com> Helped-by: Jeff King <peff@peff.net> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-25promisor-remote: parse remote.*.partialclonefilterChristian Couder1-1/+1
This makes it possible to specify a different partial clone filter for each promisor remote. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-25Use promisor_remote_get_direct() and has_promisor_remote()Christian Couder1-1/+1
Instead of using the repository_format_partial_clone global and fetch_objects() directly, let's use has_promisor_remote() and promisor_remote_get_direct(). This way all the configured promisor remotes will be taken into account, not only the one specified by extensions.partialClone. Also when cloning or fetching using a partial clone filter, remote.origin.promisor will be set to "true" instead of setting extensions.partialClone to "origin". This makes it possible to use many promisor remote just by fetching from them. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-11t5616: cover case of client having delta baseJonathan Tan1-11/+28
When fetching into a partial clone, Git first prefetches missing REF_DELTA bases from the promisor remote. (This feature was introduced in [1].) But as can be seen in a recent test coverage report [2], the case in which a REF_DELTA base is already present is not covered by tests. Extend the tests slightly to cover this case. [1] 8a30a1efd1 ("index-pack: prefetch missing REF_DELTA bases", 2019-05-15). [2] https://public-inbox.org/git/396091fc-5572-19a5-4f18-61c258590dd5@gmail.com/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-11t5616: use correct flag to check object is missingJonathan Tan1-1/+1
If we want to check whether an object is missing, the correct flag to pass to rev-list is --ignore-missing; --exclude-promisor-objects will exclude any object that came from the promisor remote, whether it is present or missing. Use the correct flag. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-15index-pack: prefetch missing REF_DELTA basesJonathan Tan1-0/+61
When fetching, the client sends "have" commit IDs indicating that the server does not need to send any object referenced by those commits, reducing network I/O. When the client is a partial clone, the client still sends "have"s in this way, even if it does not have every object referenced by a commit it sent as "have". If a server omits such an object, it is fine: the client could lazily fetch that object before this fetch, and it can still do so after. The issue is when the server sends a thin pack containing an object that is a REF_DELTA against such a missing object: index-pack fails to fix the thin pack. When support for lazily fetching missing objects was added in 8b4c0103a9 ("sha1_file: support lazily fetching missing objects", 2017-12-08), support in index-pack was turned off in the belief that it accesses the repo only to do hash collision checks. However, this is not true: it also needs to access the repo to resolve REF_DELTA bases. Support for lazy fetching should still generally be turned off in index-pack because it is used as part of the lazy fetching process itself (if not, infinite loops may occur), but we do need to fetch the REF_DELTA bases. (When fetching REF_DELTA bases, it is unlikely that those are REF_DELTA themselves, because we do not send "have" when making such fetches.) To resolve this, prefetch all missing REF_DELTA bases before attempting to resolve them. This both ensures that all bases are attempted to be fetched, and ensures that we make only one request per index-pack invocation, and not one request per missing object. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-15t5616: refactor packfile replacementJonathan Tan1-13/+21
A subsequent patch will perform the same packfile replacement that is already done twice, so refactor it into its own function. Also, the same subsequent patch will use, in another way, part of the packfile replacement functionality, so extract those out too. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-14tests: use 'test_atexit' to stop httpdSZEDER Gábor1-2/+0
Use 'test_atexit' to run cleanup commands to stop httpd at the end of the test script or upon interrupt or failure, as it is shorter, simpler, and more robust than registering such cleanup commands in the trap on EXIT in the test scripts. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-04Merge branch 'nd/i18n'Junio C Hamano1-1/+1
More _("i18n") markings. * nd/i18n: fsck: mark strings for translation fsck: reduce word legos to help i18n parse-options.c: mark more strings for translation parse-options.c: turn some die() to BUG() parse-options: replace opterror() with optname() repack: mark more strings for translation remote.c: mark messages for translation remote.c: turn some error() or die() to BUG() reflog: mark strings for translation read-cache.c: add missing colon separators read-cache.c: mark more strings for translation read-cache.c: turn die("internal error") to BUG() attr.c: mark more string for translation archive.c: mark more strings for translation alias.c: mark split_cmdline_strerror() strings for translation git.c: mark more strings for translation
2018-10-30Merge branch 'md/filter-trees'Junio C Hamano1-0/+42
The "rev-list --filter" feature learned to exclude all trees via "tree:0" filter. * md/filter-trees: list-objects: support for skipping tree traversal filter-trees: code clean-up of tests list-objects-filter: implement filter tree:0 list-objects-filter-options: do not over-strbuf_init list-objects-filter: use BUG rather than die revision: mark non-user-given objects instead rev-list: handle missing tree objects properly list-objects: always parse trees gently list-objects: refactor to process_tree_contents list-objects: store common func args in struct
2018-10-19Merge branch 'jt/fetch-tips-in-partial-clone'Junio C Hamano1-0/+17
"git fetch $repo $object" in a partial clone did not correctly fetch the asked-for object that is referenced by an object in promisor packfile, which has been fixed. * jt/fetch-tips-in-partial-clone: fetch: in partial clone, check presence of targets connected: document connectivity in partial clones
2018-10-15filter-trees: code clean-up of testsMatthew DeVore1-1/+1
A few trivial updates to test to match the current best practices. - avoid "grep -q" that strips potentially useful output from tests running under "-v". - use test_write_lines to prepare multi-line expected output file. - reserve use of test_must_fail to "git" commands. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07list-objects-filter: implement filter tree:0Matthew DeVore1-0/+42
Teach list-objects the "tree:0" filter which allows for filtering out all tree and blob objects (unless other objects are explicitly specified by the user). The purpose of this patch is to allow smaller partial clones. The name of this filter - tree:0 - does not explicitly specify that it also filters out all blobs, but this should not cause much confusion because blobs are not at all useful without the trees that refer to them. I also considered only:commits as a name, but this is inaccurate because it suggests that annotated tags are omitted, but actually they are included. The name "tree:0" allows later filtering based on depth, i.e. "tree:1" would filter out all but the root tree and blobs. In order to avoid confusion between 0 and capital O, the documentation was worded in a somewhat round-about way that also hints at this future improvement to the feature. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07tests: order arguments to git-rev-list properlyMatthew DeVore1-11/+15
It is a common mistake to put positional arguments before flags when invoking git-rev-list. Order the positional arguments last. This patch skips git-rev-list invocations which include the --not flag, since the ordering of flags and positional arguments affects the behavior. This patch also skips invocations of git-rev-list that occur in command substitution in which the exit code is discarded, since fixing those properly will require a more involved cleanup. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07tests: don't swallow Git errors upstream of pipesMatthew DeVore1-6/+8
Some pipes in tests lose the exit code of git processes, which can mask unexpected behavior like crashes. Split these pipes up so that git commands are only at the end of pipes rather than the beginning or middle. The violations fixed in this patch were found in the process of fixing pipe placement in a prior patch. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07tests: standardize pipe placementMatthew DeVore1-13/+19
Instead of using a line-continuation and pipe on the second line, take advantage of the shell's implicit line continuation after a pipe character. So for example, instead of some long line \ | next line use some long line | next line And add a blank line before and after the pipe where it aids readability (it usually does). This better matches the coding style documented in Documentation/CodingGuidelines and used in shell scripts elsewhere in the tree. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21fetch: in partial clone, check presence of targetsJonathan Tan1-0/+17
When fetching an object that is known as a promisor object to the local repository, the connectivity check in quickfetch() in builtin/fetch.c succeeds, causing object transfer to be bypassed. However, this should not happen if that object is merely promised and not actually present. Because this happens, when a user invokes "git fetch origin <sha-1>" on the command-line, the <sha-1> object may not actually be fetched even though the command returns an exit code of 0. This is a similar issue (but with a different cause) to the one fixed by a0c9016abd ("upload-pack: send refs' objects despite "filter"", 2018-07-09). Therefore, update quickfetch() to also directly check for the presence of all objects to be fetched. Its documentation and name are also updated to better reflect what it does. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-16tag: don't warn if target is missing but promisedJonathan Tan1-2/+7
deref_tag() prints a warning if the object that a tag refers to does not exist. However, when a partial clone has an annotated tag from its promisor remote, but not the object that it refers to, printing a warning on such a tag is incorrect. This occurs, for example, when the checkout that happens after a partial clone causes some objects to be fetched - and as part of the fetch, all local refs are read. The test included in this patch demonstrates this situation. Therefore, do not print a warning in this case. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-16revision: tolerate promised targets of tagsJonathan Tan1-0/+39
In handle_commit(), it is fatal for an annotated tag to point to a non-existent object. --exclude-promisor-objects should relax this rule and allow non-existent objects that are promisor objects, but this is not the case. Update handle_commit() to tolerate this situation. This was observed when cloning from a repository with an annotated tag pointing to a blob. The test included in this patch demonstrates this case. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-09clone: check connectivity even if clone is partialJonathan Tan1-0/+48
The commit that introduced the partial clone feature - 548719fbdc ("clone: partial clone", 2017-12-08) - excluded connectivity checks for partial clones, but this also meant that it is possible for a clone to succeed, yet not have all objects either present or promised. Specifically, if cloning with --filter=blob:none from a repository that has a tag pointing to a blob, and the blob is not sent in the packfile, the clone will pass, even if the blob is not referenced by any tree in the packfile. Turn on connectivity checks for partial clone. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-09upload-pack: send refs' objects despite "filter"Jonathan Tan1-0/+16
A filter line in a request to upload-pack filters out objects regardless of whether they are directly referenced by a "want" line or not. This means that cloning with "--filter=blob:none" (or another filter that excludes blobs) from a repository with at least one ref pointing to a blob (for example, the Git repository itself) results in output like the following: error: missing object referenced by 'refs/tags/junio-gpg-pub' and if that particular blob is not referenced by a fetched tree, the resulting clone fails fsck because there is no object from the remote to vouch that the missing object is a promisor object. Update both the protocol and the upload-pack implementation to include all explicitly specified "want" objects in the packfile regardless of the filter specification. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15fetch-pack: do not check links for partial fetchJonathan Tan1-0/+11
When doing a partial clone or fetch with transfer.fsckobjects=1, use the --fsck-objects instead of the --strict flag when invoking index-pack so that links are not checked, only objects. This is because incomplete links are expected when doing a partial clone or fetch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08t5616: test bulk prefetch after partial fetchJeff Hostetler1-1/+31
Add test to t5616 to bulk fetch missing objects following a partial fetch. A technique like this could be used in a pre-command hook for example. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08fetch: inherit filter-spec from partial cloneJeff Hostetler1-1/+21
Teach (partial) fetch to inherit the filter-spec used by the partial clone. Extend --no-filter to override this inheritance. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08t5616: end-to-end tests for partial cloneJeff Hostetler1-0/+96
Additional end-to-end tests for partial clone. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>