aboutsummaryrefslogtreecommitdiffstats
path: root/builtin/upload-pack.c
AgeCommit message (Collapse)AuthorFilesLines
2024-04-29Sync with 2.44.1Johannes Schindelin1-0/+2
* maint-2.44: (41 commits) Git 2.44.1 Git 2.43.4 Git 2.42.2 Git 2.41.1 Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel ...
2024-04-19Sync with 2.41.1Johannes Schindelin1-0/+2
* maint-2.41: (38 commits) Git 2.41.1 Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' docs: document security issues around untrusted .git dirs ...
2024-04-19Sync with 2.40.2Johannes Schindelin1-0/+2
* maint-2.40: (39 commits) Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' docs: document security issues around untrusted .git dirs upload-pack: disable lazy-fetching by default ...
2024-04-17upload-pack: disable lazy-fetching by defaultJeff King1-0/+2
The upload-pack command tries to avoid trusting the repository in which it's run (e.g., by not running any hooks and not using any config that contains arbitrary commands). But if the server side of a fetch or a clone is a partial clone, then either upload-pack or its child pack-objects may run a lazy "git fetch" under the hood. And it is very easy to convince fetch to run arbitrary commands. The "server" side can be a local repository owned by someone else, who would be able to configure commands that are run during a clone with the current user's permissions. This issue has been designated CVE-2024-32004. The fix in this commit's parent helps in this scenario, as well as in related scenarios using SSH to clone, where the untrusted .git directory is owned by a different user id. But if you received one as a zip file, on a USB stick, etc, it may be owned by your user but still untrusted. This has been designated CVE-2024-32465. To mitigate the issue more completely, let's disable lazy fetching entirely during `upload-pack`. While fetching from a partial repository should be relatively rare, it is certainly not an unreasonable workflow. And thus we need to provide an escape hatch. This commit works by respecting a GIT_NO_LAZY_FETCH environment variable (to skip the lazy-fetch), and setting it in upload-pack, but only when the user has not already done so (which gives us the escape hatch). The name of the variable is specifically chosen to match what has already been added in 'master' via e6d5479e7a (git: extend --no-lazy-fetch to work across subprocesses, 2024-02-27). Since we're building this fix as a backport for older versions, we could cherry-pick that patch and its earlier steps. However, we don't really need the niceties (like a "--no-lazy-fetch" option) that it offers. By using the same name, everything should just work when the two are eventually merged, but here are a few notes: - the blocking of the fetch in e6d5479e7a is incomplete! It sets fetch_if_missing to 0 when we setup the repository variable, but that isn't enough. pack-objects in particular will call prefetch_to_pack() even if that variable is 0. This patch by contrast checks the environment variable at the lowest level before we call the lazy fetch, where we can be sure to catch all code paths. Possibly the setting of fetch_if_missing from e6d5479e7a can be reverted, but it may be useful to have. For example, some code may want to use that flag to change behavior before it gets to the point of trying to start the fetch. At any rate, that's all outside the scope of this patch. - there's documentation for GIT_NO_LAZY_FETCH in e6d5479e7a. We can live without that here, because for the most part the user shouldn't need to set it themselves. The exception is if they do want to override upload-pack's default, and that requires a separate documentation section (which is added here) - it would be nice to use the NO_LAZY_FETCH_ENVIRONMENT macro added by e6d5479e7a, but those definitions have moved from cache.h to environment.h between 2.39.3 and master. I just used the raw string literals, and we can replace them with the macro once this topic is merged to master. At least with respect to CVE-2024-32004, this does render this commit's parent commit somewhat redundant. However, it is worth retaining that commit as defense in depth, and because it may help other issues (e.g., symlink/hardlink TOCTOU races, where zip files are not really an interesting attack vector). The tests in t0411 still pass, but now we have _two_ mechanisms ensuring that the evil command is not run. Let's beef up the existing ones to check that they failed for the expected reason, that we refused to run upload-pack at all with an alternate user id. And add two new ones for the same-user case that both the restriction and its escape hatch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2024-02-28upload-pack: always turn off save_commit_bufferJeff King1-0/+2
When the client sends us "want $oid" lines, we call parse_object($oid) to get an object struct. It's important to parse the commits because we need to traverse them in the negotiation phase. But of course we don't need to hold on to the commit messages for each one. We've turned off the save_commit_buffer flag in get_common_commits() for a long time, since f0243f26f6 (git-upload-pack: More efficient usage of the has_sha1 array, 2005-10-28). That helps with the commits we see while actually traversing. But: 1. That function is only used by the v0 protocol. I think the v2 protocol's code path leaves the flag on (and thus pays the extra memory penalty), though I didn't measure it specifically. 2. If the client sends us a bunch of "want" lines, that happens before the negotiation phase. So we'll hold on to all of those commit messages. Generally the number of "want" lines scales with the refs, not with the number of objects in the repo. But a malicious client could send a lot in order to waste memory. As an example of (2), if I generate a request to fetch all commits in git.git like this: pktline() { local msg="$*" printf "%04x%s\n" $((1+4+${#msg})) "$msg" } want_commits() { pktline command=fetch printf 0001 git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' | while read oid type; do test "$type" = "commit" || continue pktline want $oid done pktline done printf 0000 } want_commits | GIT_PROTOCOL=version=2 valgrind --tool=massif git-upload-pack . >/dev/null before this patch upload-pack peaks at ~125MB, and after at ~35MB. The difference is not coincidentally about the same as the sum of all commit object sizes as computed by: git cat-file --batch-all-objects --batch-check='%(objecttype) %(objectsize)' | perl -alne '$v += $F[1] if $F[0] eq "commit"; END { print $v }' In a larger repository like linux.git, that number is ~1GB. In a repository with a full commit-graph file this will have no impact (and the commit graph would save us from parsing at all, so is a much better solution!). But it's easy to do, might help a little in real-world cases (where even if you have a commit graph it might not be fully up to date), and helps a lot for a worst-case malicious request. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-29Merge branch 'en/header-split-cache-h-part-3'Junio C Hamano1-1/+1
Header files cleanup. * en/header-split-cache-h-part-3: (28 commits) fsmonitor-ll.h: split this header out of fsmonitor.h hash-ll, hashmap: move oidhash() to hash-ll object-store-ll.h: split this header out of object-store.h khash: name the structs that khash declares merge-ll: rename from ll-merge git-compat-util.h: remove unneccessary include of wildmatch.h builtin.h: remove unneccessary includes list-objects-filter-options.h: remove unneccessary include diff.h: remove unnecessary include of oidset.h repository: remove unnecessary include of path.h log-tree: replace include of revision.h with simple forward declaration cache.h: remove this no-longer-used header read-cache*.h: move declarations for read-cache.c functions from cache.h repository.h: move declaration of the_index from cache.h merge.h: move declarations for merge.c from cache.h diff.h: move declaration for global in diff.c from cache.h preload-index.h: move declarations for preload-index.c from elsewhere sparse-index.h: move declarations for sparse-index.c from cache.h name-hash.h: move declarations for name-hash.c from cache.h run-command.h: move declarations for run-command.c from cache.h ...
2023-06-21repository: remove unnecessary include of path.hElijah Newren1-0/+1
This also made it clear that several .c files that depended upon path.h were missing a #include for it; add the missing includes while at it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21cache.h: remove this no-longer-used headerElijah Newren1-1/+0
Since this header showed up in some places besides just #include statements, update/clean-up/remove those other places as well. Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got away with violating the rule that all files must start with an include of git-compat-util.h (or a short-list of alternate headers that happen to include it first). This change exposed the violation and caused it to stop building correctly; fix it by having it include git-compat-util.h first, as per policy. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-12repository: create disable_replace_refs()Derrick Stolee1-1/+1
Several builtins depend on being able to disable the replace references so we actually operate on each object individually. These currently do so by directly mutating the 'read_replace_refs' global. A future change will move this global into a different place, so it will be necessary to change all of these lines. However, we can simplify that transition by abstracting the purpose of these global assignments with a method call. We will need to keep this read_replace_refs global forever, as we want to make sure that we never use replace refs throughout the life of the process if this method is called. Future changes may present a repository-scoped version of the variable to represent that repository's core.useReplaceRefs config value, but a zero-valued read_replace_refs will always override such a setting. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren1-0/+1
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23replace-object.h: move read_replace_refs declaration from cache.h to hereElijah Newren1-0/+1
Adjust several files to be more explicit about their dependency on replace-objects to accommodate this change. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-13doc txt & -h consistency: add missing options and labelsÆvar Arnfjörð Bjarmason1-1/+2
Fix various issues of SYNOPSIS and -h output syntax where: * Options such as --force were missing entirely * ...or the short option, such as -f * We said "opts" or "options", but could instead enumerate the (small) set of supported options * Options that were missing entirely (ls-remote's --sort=<key>) As we can specify "--sort" multiple times (it's backed by a string-list" it should really be "[(--sort=<key>)...]", which is what "git for-each-ref" lists it as, but let's leave that issue for a subsequent cleanup, and stop at making these consistent. Other "ref-filter.h" users share the same issue, e.g. "git-branch.txt". * For "verify-tag" and "verify-commit" we were missing the "--raw" option. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-05upload-pack: document and rename --advertise-refsÆvar Arnfjörð Bjarmason1-2/+3
The --advertise-refs documentation in git-upload-pack added in 9812f2136b3 (upload-pack.c: use parse-options API, 2016-05-31) hasn't been entirely true ever since v2 support was implemented in e52449b6722 (connect: request remote refs using v2, 2018-03-15). Under v2 we don't advertise the refs at all, but rather dump the capabilities header. This option has always been an obscure internal implementation detail, it wasn't even documented for git-receive-pack. Since it has exactly one user let's rename it to --http-backend-info-refs, which is more accurate and points the reader in the right direction. Let's also cross-link this from the protocol v1 and v2 documentation. I'm retaining a hidden --advertise-refs alias in case there's any external users of this, and making both options hidden to the bash completion (as with most other internal-only options). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-05serve.[ch]: remove "serve_options", split up --advertise-refs codeÆvar Arnfjörð Bjarmason1-13/+12
The "advertise capabilities" mode of serve.c added in ed10cb952d3 (serve: introduce git-serve, 2018-03-15) is only used by the http-backend.c to call {upload,receive}-pack with the --advertise-refs parameter. See 42526b478e3 (Add stateless RPC options to upload-pack, receive-pack, 2009-10-30). Let's just make cmd_upload_pack() take the two (v2) or three (v2) parameters the the v2/v1 servicing functions need directly, and pass those in via the function signature. The logic of whether daemon mode is implied by the timeout belongs in the v1 function (only used there). Once we split up the "advertise v2 refs" from "serve v2 request" it becomes clear that v2 never cared about those in combination. The only time it mattered was for v1 to emit its ref advertisement, in that case we wanted to emit the smart-http-only "no-done" capability. Since we only do that in the --advertise-refs codepath let's just have it set "do_done" itself in v1's upload_pack() just before send_ref(), at that point --advertise-refs and --stateless-rpc in combination are redundant (the only user is get_info_refs() in http-backend.c), so we can just pass in --advertise-refs only. Since we need to touch all the serve() and advertise_capabilities() codepaths let's rename them to less clever and obvious names, it's been suggested numerous times, the latest of which is [1]'s suggestion for protocol_v2_serve_loop(). Let's go with that. 1. https://lore.kernel.org/git/CAFQ2z_NyGb8rju5CKzmo6KhZXD0Dp21u-BbyCb2aNxLEoSPRJw@mail.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-13builtin: consistently pass cmd_* prefix to parse_optionsJeff King1-1/+1
If a builtin uses RUN_SETUP to request that git.c enter the repository directory, we'll get passed in a "prefix" variable with the path to the original directory. It's important to pass this to parse_options(), since we may use it to fix up relative OPT_FILENAME() options. Some builtins don't bother; let's make sure we do so consistently. There may not be any particular bugs fixed here; OPT_FILENAME is actually pretty rare, and none of these commands use it directly. However, this does future-proof us against somebody adding an option that uses it and creating a subtle bug that only shows up when you're in a subdirectory of the repository. In some cases, like hash-object and upload-pack, we don't specify RUN_SETUP, so we know the prefix will always be empty. It's still worth passing the variable along to keep the idiom consistent across all builtins (and of course it protects us if they ever _did_ switch to using RUN_SETUP). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-18check_replace_refs: rename to read_replace_refsJeff King1-1/+1
This was added as a NEEDSWORK in c3c36d7de2 (replace-object: check_replace_refs is safe in multi repo environment, 2018-04-11), waiting for a calmer period. Since doing so now doesn't conflict with anything in 'pu', it seems as good a time as any. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-08Merge branch 'bw/protocol-v2'Junio C Hamano1-0/+74
The beginning of the next-gen transfer protocol. * bw/protocol-v2: (35 commits) remote-curl: don't request v2 when pushing remote-curl: implement stateless-connect command http: eliminate "# service" line when using protocol v2 http: don't always add Git-Protocol header http: allow providing extra headers for http requests remote-curl: store the protocol version the server responded with remote-curl: create copy of the service name pkt-line: add packet_buf_write_len function transport-helper: introduce stateless-connect transport-helper: refactor process_connect_service transport-helper: remove name parameter connect: don't request v2 when pushing connect: refactor git_connect to only get the protocol version once fetch-pack: support shallow requests fetch-pack: perform a fetch using v2 upload-pack: introduce fetch server command push: pass ref prefixes when pushing fetch: pass ref prefixes when fetching ls-remote: pass ref prefixes when requesting a remote's refs transport: convert transport_get_remote_refs to take a list of ref prefixes ...
2018-03-15connect: request remote refs using v2Brandon Williams1-5/+5
Teach the client to be able to request a remote's refs using protocol v2. This is done by having a client issue a 'ls-refs' request to a v2 server. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14protocol: introduce enum protocol_version value protocol_v2Brandon Williams1-0/+7
Introduce protocol_v2, a new value for 'enum protocol_version'. Subsequent patches will fill in the implementation of protocol_v2. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14upload-pack: convert to a builtinBrandon Williams1-0/+67
In order to allow for code sharing with the server-side of fetch in protocol-v2 convert upload-pack to be a builtin. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>