path: root/builtin/fetch-pack.c
Age | Commit message | Author | Files | Lines
2023-07-05 | git-compat-util: move alloc macros to git-compat-util.h | Calvin Wan | 1 | -1/+0
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for dynamic array allocation. Moving these macros to git-compat-util.h with the other alloc macros focuses alloc.[ch] to allocation for Git objects and additionally allows us to remove inclusions to alloc.h from files that solely used the above macros. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
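For readers unfamiliar with these macros, a minimal stand-alone sketch of the growth pattern they implement (the GROW_ARRAY macro below is a simplified stand-in, not Git's exact ALLOC_GROW definition from git-compat-util.h):

    #include <stdio.h>
    #include <stdlib.h>

    /* Ensure "x" has room for at least "nr" elements, tracking capacity in "alloc". */
    #define GROW_ARRAY(x, nr, alloc) \
            do { \
                    if ((nr) > (alloc)) { \
                            (alloc) = ((alloc) + 16) * 3 / 2; \
                            if ((alloc) < (nr)) \
                                    (alloc) = (nr); \
                            (x) = realloc((x), (alloc) * sizeof(*(x))); \
                    } \
            } while (0)

    int main(void)
    {
            int *values = NULL;
            size_t nr = 0, alloc = 0;

            for (int i = 0; i < 100; i++) {
                    GROW_ARRAY(values, nr + 1, alloc); /* make room for one more entry */
                    values[nr++] = i * i;
            }
            printf("%zu entries, capacity %zu, last = %d\n", nr, alloc, values[nr - 1]);
            free(values);
            return 0;
    }

Git's real macros additionally die on allocation failure via xrealloc; the sketch omits that error handling.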
2023-04-25 | Merge branch 'en/header-split-cache-h' | Junio C Hamano | 1 | -0/+1
Header clean-up. * en/header-split-cache-h: (24 commits) protocol.h: move definition of DEFAULT_GIT_PORT from cache.h mailmap, quote: move declarations of global vars to correct unit treewide: reduce includes of cache.h in other headers treewide: remove double forward declaration of read_in_full cache.h: remove unnecessary includes treewide: remove cache.h inclusion due to pager.h changes pager.h: move declarations for pager.c functions from cache.h treewide: remove cache.h inclusion due to editor.h changes editor: move editor-related functions and declarations into common file treewide: remove cache.h inclusion due to object.h changes object.h: move some inline functions and defines from cache.h treewide: remove cache.h inclusion due to object-file.h changes object-file.h: move declarations for object-file.c functions from cache.h treewide: remove cache.h inclusion due to git-zlib changes git-zlib: move declarations for git-zlib functions from cache.h treewide: remove cache.h inclusion due to object-name.h changes object-name.h: move declarations for object-name.c functions from cache.h treewide: remove unnecessary cache.h inclusion treewide: be explicit about dependence on mem-pool.h treewide: be explicit about dependence on oid-array.h ...
2023-04-11 | object-file.h: move declarations for object-file.c functions from cache.h | Elijah Newren | 1 | -0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-06 | Merge branch 'en/header-split-cleanup' | Junio C Hamano | 1 | -0/+1
Split key function and data structure definitions out of cache.h to new header files and adjust the users. * en/header-split-cleanup: csum-file.h: remove unnecessary inclusion of cache.h write-or-die.h: move declarations for write-or-die.c functions from cache.h treewide: remove cache.h inclusion due to setup.h changes setup.h: move declarations for setup.c functions from cache.h treewide: remove cache.h inclusion due to environment.h changes environment.h: move declarations for environment.c functions from cache.h treewide: remove unnecessary includes of cache.h wrapper.h: move declarations for wrapper.c functions from cache.h path.h: move function declarations for path.c functions from cache.h cache.h: remove expand_user_path() abspath.h: move absolute path functions from cache.h environment: move comment_line_char from cache.h treewide: remove unnecessary cache.h inclusion from several sources treewide: remove unnecessary inclusion of gettext.h treewide: be explicit about dependence on gettext.h treewide: remove unnecessary cache.h inclusion from a few headers
2023-03-28 | builtins: mark unused prefix parameters | Jeff King | 1 | -1/+1
All builtins receive a "prefix" parameter, but it is only useful if they need to adjust filenames given by the user on the command line. For builtins that do not even call parse_options(), they often don't look at the prefix at all, and -Wunused-parameter complains. Let's annotate those to silence the compiler warning. I gave a quick scan of each of these cases, and it seems like they don't have anything they _should_ be using the prefix for (i.e., there is no hidden bug that we are missing). The only questionable cases I saw were: - in git-unpack-file, we create a tempfile which will always be at the root of the repository, even if the command is run from a subdir. Arguably this should be created in the subdir from which we're run (as we report the path only as a relative name). However, nobody has complained, and I'm hesitant to change something that is deep plumbing going back to April 2005 (though I think within our scripts, the sole caller in git-merge-one-file would be OK, as it moves to the toplevel itself). - in fetch-pack, local-filesystem remotes are taken as relative to the project root, not the current directory. So: git init server.git [...put stuff in server.git...] git init client.git cd client.git mkdir subdir cd subdir git fetch-pack ../../server.git ... won't work, as we quietly move to the top of the repository before interpreting the path (so "../server.git" would work). This is weird, but again, nobody has complained and this is how it has always worked. And this is how "git fetch" works, too. Plus it raises questions about how a configured remote like: git config remote.origin.url ../server.git should behave. I can certainly come up with a reasonable set of behavior, but it may not be worth stirring up complications in a plumbing tool. So I've left the behavior untouched in both of those cases. If anybody really wants to revisit them, it's easy enough to drop the UNUSED marker. This commit is just about removing them as obstacles to turning on -Wunused-parameter all the time. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
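A stand-alone sketch of the annotation pattern described above, using a locally defined PARAM_UNUSED macro as a stand-in (Git's own UNUSED macro lives in git-compat-util.h and may differ in detail):

    #include <stdio.h>

    #ifdef __GNUC__
    #define PARAM_UNUSED __attribute__((unused))
    #else
    #define PARAM_UNUSED
    #endif

    /* A builtin-style entry point that never looks at its prefix argument. */
    static int cmd_example(int argc, const char **argv, const char *prefix PARAM_UNUSED)
    {
            printf("argc=%d, argv[0]=%s\n", argc, argv[0]);
            return 0;
    }

    int main(void)
    {
            const char *argv[] = { "example", NULL };
            return cmd_example(1, argv, NULL);
    }

Marking the parameter keeps -Wunused-parameter quiet while preserving the common builtin signature.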
2023-03-28 | Merge branch 'jk/fix-proto-downgrade-to-v0' | Junio C Hamano | 1 | -2/+2
Transports that do not support protocol v2 did not correctly fall back to protocol v0 under certain conditions, which has been corrected. * jk/fix-proto-downgrade-to-v0: git_connect(): fix corner cases in downgrading v2 to v0
2023-03-21 | treewide: be explicit about dependence on gettext.h | Elijah Newren | 1 | -0/+1
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-17 | git_connect(): fix corner cases in downgrading v2 to v0 | Jeff King | 1 | -2/+2
There's code in git_connect() that checks whether we are doing a push with protocol_v2, and if so, drops us to protocol_v0 (since we know how to do v2 only for fetches). But it misses some corner cases: 1. it checks the "prog" variable, which is actually the path to receive-pack on the remote side. By default this is just "git-receive-pack", but it could be an arbitrary string (like "/path/to/git receive-pack", etc). We'd accidentally stay in v2 mode in this case. 2. besides "receive-pack" and "upload-pack", there's one other value we'd expect: "upload-archive" for handling "git archive --remote". Like receive-pack, this doesn't understand v2, and should use the v0 protocol. In practice, neither of these causes bugs in the real world so far. We do send a "we understand v2" probe to the server, but since no server implements v2 for anything but upload-pack, it's simply ignored. But this would eventually become a problem if we do implement v2 for those endpoints, as older clients would falsely claim to understand it, leading to a server response they can't parse. We can fix (1) by passing in both the program path and the "name" of the operation. I treat the name as a string here, because that's the pattern set in transport_connect(), which is one of our callers (we were simply throwing away the "name" value there before). We can fix (2) by allowing only known-v2 protocols ("upload-pack"), rather than blocking unknown ones ("receive-pack" and "upload-archive"). That will mean whoever eventually implements v2 push will have to adjust this list, but that's reasonable. We'll do the safe, conservative thing (sticking to v0) by default, and anybody working on v2 will quickly realize this spot needs to be updated. The new tests cover the receive-pack and upload-archive cases above, and re-confirm that we allow v2 with an arbitrary "--upload-pack" path (that already worked before this patch, of course, but it would be an easy thing to break if we flipped the allow/block logic without also handling "name" separately). Here are a few miscellaneous implementation notes, since I had to do a little head-scratching to understand who calls what: - transport_connect() is called only for git-upload-archive. For non-http git remotes, that resolves to the virtual connect_git() function (which then calls git_connect(); confused yet?). So plumbing through "name" in connect_git() covers that. - for regular fetches and pushes, callers use higher-level functions like transport_fetch_refs(). For non-http git remotes, that means calling git_connect() under the hood via connect_setup(). And that uses the "for_push" flag to decide which name to use. - likewise, plumbing like fetch-pack and send-pack may call git_connect() directly; they each know which name to use. - for remote helpers (including http), we already have separate parameters for "name" and "exec" (another name for "prog"). In process_connect_service(), we feed the "name" to the helper via "connect" or "stateless-connect" directives. There's also a "servpath" option, which can be used to tell the helper about the "exec" path. But no helpers we implement support it! For http it would be useless anyway (no reasonable server implementation will allow you to send a shell command to run the server). In theory it would be useful for more obscure helpers like remote-ext, but even there it is not implemented. 
It's tempting to get rid of it simply to reduce confusion, but we have publicly documented it since it was added in fa8c097cc9 (Support remote helpers implementing smart transports, 2009-12-09), so it's possible some helper in the wild is using it. - So for v2, helpers (again, including http) are mainly used via stateless-connect, driven by the main program. But they do still need to decide whether to do a v2 probe. And so there's similar logic in remote-curl.c's discover_refs() that looks for "git-receive-pack". But it's not buggy in the same way. Since it doesn't support servpath, it is always dealing with a "service" string like "git-receive-pack". And since it doesn't support straight "connect", it can't be used for "upload-archive". So we could leave that spot alone. But I've updated it here to match the logic we're changing in connect_git(). That seems like the least confusing thing for somebody who has to touch both of these spots later (say, to add v2 push support). I didn't add a new test to make sure this doesn't break anything; we already have several tests (in t5551 and elsewhere) that make sure we are using v2 over http. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
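A stand-alone sketch of the allow-list idea the commit describes: only a service known to speak v2 keeps v2, and everything else (receive-pack, upload-archive, anything unknown) is conservatively downgraded to v0. Function and enum names here are illustrative, not Git's actual identifiers, and the exact service-name spellings checked in Git's code are not asserted:

    #include <stdio.h>
    #include <string.h>

    enum proto { PROTO_V0, PROTO_V2 };

    /* Decide from the *service name*, not the remote program path, which
     * may be an arbitrary string like "/path/to/git upload-pack". */
    static enum proto choose_protocol(const char *name, enum proto wanted)
    {
            int is_upload_pack = !strcmp(name, "upload-pack") ||
                                 !strcmp(name, "git-upload-pack");

            if (wanted == PROTO_V2 && !is_upload_pack)
                    return PROTO_V0;        /* conservative fallback */
            return wanted;
    }

    int main(void)
    {
            printf("%d\n", choose_protocol("git-upload-pack", PROTO_V2));    /* stays v2 */
            printf("%d\n", choose_protocol("git-receive-pack", PROTO_V2));   /* drops to v0 */
            printf("%d\n", choose_protocol("git-upload-archive", PROTO_V2)); /* drops to v0 */
            return 0;
    }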
2023-02-23 | cache.h: remove dependence on hex.h; make other files include it explicitly | Elijah Newren | 1 | -0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23 | alloc.h: move ALLOC_GROW() functions from cache.h | Elijah Newren | 1 | -0/+1
This allows us to replace includes of cache.h with includes of the much smaller alloc.h in many places. It does mean that we also need to add includes of alloc.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-12 | list-objects-filter: add and use initializers | Jeff King | 1 | -0/+1
In 7e2619d8ff (list_objects_filter_options: plug leak of filter_spec strings, 2022-09-08), we noted that the filter_spec string_list was inconsistent in how it handled memory ownership of strings stored in the list. The fix there was a bit of a band-aid to set the "strdup_strings" variable right before adding anything. That works OK, and it lets the users of the API continue to zero-initialize the struct. But it makes the code a bit hard to follow and accident-prone, as any other spots appending the filter_spec need to think about whether to set the strdup_strings value, too (there's one such spot in partial_clone_get_default_filter_spec(), which is probably a possible memory leak). So let's do that full cleanup now. We'll introduce a LIST_OBJECTS_FILTER_INIT macro and matching function, and use them as appropriate (though it is for the "_options" struct, this matches the corresponding list_objects_filter_release() function). This is harder than it seems! Many other structs, like git_transport_data, embed the filter struct. So they need to initialize it themselves even if the rest of the enclosing struct is OK with zero-initialization. I found all of the relevant spots by grepping manually for declarations of list_objects_filter_options. And then doing so recursively for structs which embed it, and ones which embed those, and so on. I'm pretty sure I got everything, but there's no change that would alert the compiler if any topics in flight added new declarations. To catch this case, we now double-check in the parsing function that things were initialized as expected and BUG() if appropriate. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
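The general INIT-macro-plus-init-function pattern the commit adds is sketched below with a made-up struct so the snippet stands alone; LIST_OBJECTS_FILTER_INIT itself lives in Git's list-objects-filter-options code and its real fields differ:

    #include <stdio.h>
    #include <string.h>

    struct filter_opts {
            int choice;
            const char *spec;       /* a string_list in Git proper */
    };

    #define FILTER_OPTS_INIT { .choice = 0, .spec = NULL }

    static void filter_opts_init(struct filter_opts *opts)
    {
            struct filter_opts blank = FILTER_OPTS_INIT;
            memcpy(opts, &blank, sizeof(*opts));
    }

    /* Structs that embed filter_opts must initialize it explicitly, even if
     * the rest of the enclosing struct is fine with zero-initialization. */
    struct transport_data {
            int flags;
            struct filter_opts filter;
    };

    int main(void)
    {
            struct transport_data data = { .flags = 0, .filter = FILTER_OPTS_INIT };
            struct filter_opts opts;

            filter_opts_init(&opts);
            printf("%d %p\n", data.filter.choice, (void *)opts.spec);
            return 0;
    }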
2022-04-04 | Merge branch 'rc/fetch-refetch' | Junio C Hamano | 1 | -0/+4
"git fetch --refetch" learned to fetch everything without telling the other side what we already have, which is useful when you cannot trust what you have in the local object store. * rc/fetch-refetch: docs: mention --refetch fetch option fetch: after refetch, encourage auto gc repacking t5615-partial-clone: add test for fetch --refetch fetch: add --refetch option builtin/fetch-pack: add --refetch option fetch-pack: add refetch fetch-negotiator: add specific noop initializer
2022-03-28 | builtin/fetch-pack: add --refetch option | Robert Coup | 1 | -0/+4
Add a refetch option to fetch-pack to force a full fetch. Use when applying a new partial clone filter to refetch all matching objects. Signed-off-by: Robert Coup <robert@coup.net.nz> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-23 | list-objects-filter: remove CL_ARG__FILTER | Derrick Stolee | 1 | -2/+2
We have established the command-line interface for the --[no-]filter options for a while now, so we do not need a helper to make this editable in the future. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-05 | connect, transport: encapsulate arg in struct | Jonathan Tan | 1 | -1/+2
In a future patch we plan to return the name of an unborn current branch from deep in the callchain to a caller via a new pointer parameter that points at a variable in the caller when the caller calls get_remote_refs() and transport_get_remote_refs(). In preparation for that, encapsulate the existing ref_prefixes parameter into a struct. The aforementioned unborn current branch will go into this new struct in the future patch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18 | fetch-pack: remove no_dependents code | Jonathan Tan | 1 | -4/+0
Now that Git has switched to using a subprocess to lazy-fetch missing objects, remove the no_dependents code as it is no longer used. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-25 | Merge branch 'jt/cdn-offload' | Junio C Hamano | 1 | -6/+11
The "fetch/clone" protocol has been updated to allow the server to instruct the clients to grab pre-packaged packfile(s) in addition to the packed object data coming over the wire. * jt/cdn-offload: upload-pack: fix a sparse '0 as NULL pointer' warning upload-pack: send part of packfile response as uri fetch-pack: support more than one pack lockfile upload-pack: refactor reading of pack-objects out Documentation: add Packfile URIs design doc Documentation: order protocol v2 sections http-fetch: support fetching packfiles by URL http-fetch: refactor into function http: refactor finish_http_pack_request() http: use --stdin when indexing dumb HTTP pack
2020-06-10 | fetch-pack: support more than one pack lockfile | Jonathan Tan | 1 | -6/+11
Whenever a fetch results in a packfile being downloaded, a .keep file is generated, so that the packfile can be preserved (from, say, a running "git repack") until refs are written referring to the contents of the packfile. In a subsequent patch, a successful fetch using protocol v2 may result in more than one .keep file being generated. Therefore, teach fetch_pack() and the transport mechanism to support multiple .keep files. Implementation notes: - builtin/fetch-pack.c normally does not generate .keep files, and thus is unaffected by this or future changes. However, it has an undocumented "--lock-pack" feature, used by remote-curl.c when implementing the "fetch" remote helper command. In keeping with the remote helper protocol, only one "lock" line will ever be written; the rest will result in warnings to stderr. However, in practice, warnings will never be written because the remote-curl.c "fetch" is only used for protocol v0/v1 (which will not generate multiple .keep files). (Protocol v2 uses the "stateless-connect" command, not the "fetch" command.) - connected.c has an optimization in that connectivity checks on a ref need not be done if the target object is in a pack known to be self-contained and connected. If there are multiple packfiles, this optimization can no longer be done. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-24 | stateless-connect: send response end packet | Denton Liu | 1 | -1/+1
Currently, remote-curl acts as a proxy and blindly forwards packets between an HTTP server and fetch-pack. In the case of a stateless RPC connection where the connection is terminated before the transaction is complete, remote-curl will blindly forward the packets before waiting on more input from fetch-pack. Meanwhile, fetch-pack will read the transaction and continue reading, expecting more input to continue the transaction. This results in a deadlock between the two processes. This can be seen in the following command which does not terminate: $ git -c protocol.version=2 clone https://github.com/git/git.git --shallow-since=20151012 Cloning into 'git'... whereas the v1 version does terminate as expected: $ git -c protocol.version=1 clone https://github.com/git/git.git --shallow-since=20151012 Cloning into 'git'... fatal: the remote end hung up unexpectedly Instead of blindly forwarding packets, make remote-curl insert a response end packet after proxying the responses from the remote server when using stateless_connect(). On the RPC client side, ensure that each response ends as described. A separate control packet is chosen because we need to be able to differentiate between what the remote server sends and remote-curl's control packets. By ensuring in the remote-curl code that a server cannot send response end packets, we prevent a malicious server from being able to perform a denial of service attack in which they spoof a response end packet and cause the described deadlock to happen. Reported-by: Force Charlie <charlieio@outlook.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30 | oid_array: rename source file from sha1-array | Jeff King | 1 | -1/+1
We renamed the actual data structure in 910650d2f8 (Rename sha1_array to oid_array, 2017-03-31), but the file is still called sha1-array. Besides being slightly confusing, it makes it more annoying to grep for leftover occurrences of "sha1" in various files, because the header is included in so many places. Let's complete the transition by renaming the source and header files (and fixing up a few comment references). I kept the "-" in the name, as that seems to be our style; cf. fc1395f4a4 (sha1_file.c: rename to use dash in file name, 2018-04-10). We also have oidmap.h and oidset.h without any punctuation, but those are "struct oidmap" and "struct oidset" in the code. We _could_ make this "oidarray" to match, but somehow it looks uglier to me because of the length of "array" (plus it would be a very invasive patch for little gain). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
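For orientation, a hedged fragment showing what typical use of the renamed API looks like. It assumes Git-internal headers (it compiles inside the Git tree, not stand-alone), and the call sites in fetch-pack differ; only the settled struct and function names are shown:

    #include "git-compat-util.h"
    #include "hash.h"
    #include "oid-array.h"

    /* Collect two object IDs and query membership; oid_array_lookup()
     * returns an index when found and a negative value otherwise. */
    static void demo_oid_array(const struct object_id *a, const struct object_id *b)
    {
            struct oid_array ids = OID_ARRAY_INIT;

            oid_array_append(&ids, a);
            oid_array_append(&ids, b);
            if (oid_array_lookup(&ids, a) >= 0)
                    printf("found it\n");
            oid_array_clear(&ids);
    }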
2019-03-20 | fetch_pack(): drop unused parameters | Jeff King | 1 | -1/+1
We don't need the caller of fetch_pack() to pass in "dest", which is the remote URL. Since ba227857d2 (Reduce the number of connects when fetching, 2008-02-04), the caller is responsible for calling git_connect() itself, and our "dest" parameter is unused. That commit also started passing us the resulting "conn" child_process from git_connect(). But likewise, we do not need do anything with it. The descriptors in "fd" are enough for us, and the caller is responsible for cleaning up "conn". We can just drop both parameters. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-05 | Merge branch 'jt/fetch-v2-sideband' | Junio C Hamano | 1 | -1/+2
"git fetch" and "git upload-pack" learned to send all exchange over the sideband channel while talking the v2 protocol. * jt/fetch-v2-sideband: tests: define GIT_TEST_SIDEBAND_ALL {fetch,upload}-pack: sideband v2 fetch response sideband: reverse its dependency on pkt-line pkt-line: introduce struct packet_writer pack-protocol.txt: accept error packets in any context Use packet_reader instead of packet_read_line
2019-01-29 | Merge branch 'jt/fetch-pack-v2' | Junio C Hamano | 1 | -3/+6
"git fetch-pack" now can talk the version 2 protocol. * jt/fetch-pack-v2: fetch-pack: support protocol version 2
2019-01-10 | fetch-pack: support protocol version 2 | Jonathan Tan | 1 | -3/+6
The scaffolding for protocol version 2 was initially added in 8f6982b4e1 ("protocol: introduce enum protocol_version value protocol_v2", 2018-03-14). As seen in "git log -p -G'support for protocol v2 not implemented yet' --full-diff --reverse v2.17.0..v2.20.0", many of those scaffolding "die" placeholders were removed, but we hadn't gotten around to fetch-pack yet. The test here for "fetch refs from cmdline" is very minimal. There's much better coverage when running the entire test suite under the WIP GIT_TEST_PROTOCOL_VERSION=2 mode[1]; we should ideally have better coverage without needing to invoke a special test mode. 1. https://public-inbox.org/git/20181213155817.27666-1-avarab@gmail.com/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-02 | pack-protocol.txt: accept error packets in any context | Masaya Suzuki | 1 | -1/+2
In the Git pack protocol definition, an error packet may appear only in a certain context. However, servers can face a runtime error (e.g. I/O error) at an arbitrary timing. This patch changes the protocol to allow an error packet to be sent instead of any packet. Without this protocol spec change, when a server cannot process a request, there's no way to tell that to a client. Since the server cannot produce a valid response, it would be forced to cut a connection without telling why. With this protocol spec change, the server can be more gentle in this situation. An old client may see these error packets as an unexpected packet, but this is not worse than having an unexpected EOF. Following this protocol spec change, the error packet handling code is moved to pkt-line.c. Implementation wise, this implementation uses pkt-line to communicate with a subprocess. Since this is not a part of Git protocol, it's possible that a packet that is not supposed to be an error packet is mistakenly parsed as an error packet. This error packet handling is enabled only for the Git pack protocol parsing code considering this. Signed-off-by: Masaya Suzuki <masayasuzuki@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-15 | builtin/fetch-pack: remove constants with parse_oid_hex | brian m. carlson | 1 | -6/+7
Instead of using GIT_SHA1_HEXSZ, use parse_oid_hex to compute a pointer and use that in comparisons. This is both simpler to read and works independent of the hash length. Update references to SHA-1 in the same function to refer to object IDs instead. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
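A rough sketch of the shape of the conversion, assuming Git-internal headers (hex.h/hash.h in current layouts, cache.h historically); this is not the commit's actual diff. parse_oid_hex() consumes one object ID and reports where it stopped, so nothing needs to hard-code GIT_SHA1_HEXSZ:

    #include "git-compat-util.h"
    #include "hash.h"
    #include "hex.h"

    /* Return 1 if "line" begins "<hex-oid> <rest>", filling *oid and
     * pointing *rest just past the separating space. */
    static int split_oid_prefix(const char *line, struct object_id *oid,
                                const char **rest)
    {
            const char *p;

            if (parse_oid_hex(line, oid, &p) || *p != ' ')
                    return 0;
            *rest = p + 1;
            return 1;
    }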
2018-05-08 | Merge branch 'bw/protocol-v2' | Junio C Hamano | 1 | -2/+18
The beginning of the next-gen transfer protocol. * bw/protocol-v2: (35 commits) remote-curl: don't request v2 when pushing remote-curl: implement stateless-connect command http: eliminate "# service" line when using protocol v2 http: don't always add Git-Protocol header http: allow providing extra headers for http requests remote-curl: store the protocol version the server responded with remote-curl: create copy of the service name pkt-line: add packet_buf_write_len function transport-helper: introduce stateless-connect transport-helper: refactor process_connect_service transport-helper: remove name parameter connect: don't request v2 when pushing connect: refactor git_connect to only get the protocol version once fetch-pack: support shallow requests fetch-pack: perform a fetch using v2 upload-pack: introduce fetch server command push: pass ref prefixes when pushing fetch: pass ref prefixes when fetching ls-remote: pass ref prefixes when requesting a remote's refs transport: convert transport_get_remote_refs to take a list of ref prefixes ...
2018-03-15 | fetch-pack: perform a fetch using v2 | Brandon Williams | 1 | -1/+1
When communicating with a v2 server, perform a fetch by requesting the 'fetch' command. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 | protocol: introduce enum protocol_version value protocol_v2 | Brandon Williams | 1 | -0/+2
Introduce protocol_v2, a new value for 'enum protocol_version'. Subsequent patches will fill in the implementation of protocol_v2. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 | connect: discover protocol version outside of get_remote_heads | Brandon Williams | 1 | -1/+15
In order to prepare for the addition of protocol_v2 push the protocol version discovery outside of 'get_remote_heads()'. This will allow for keeping the logic for processing the reference advertisement for protocol_v1 and protocol_v0 separate from the logic for protocol_v2. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 | fetch: inherit filter-spec from partial clone | Jeff Hostetler | 1 | -1/+1
Teach (partial) fetch to inherit the filter-spec used by the partial clone. Extend --no-filter to override this inheritance. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 | fetch-pack: add --no-filter | Jeff Hostetler | 1 | -0/+4
Fixup fetch-pack to accept --no-filter to be consistent with rev-list and pack-objects. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 | fetch-pack, index-pack, transport: partial clone | Jeff Hostetler | 1 | -0/+4
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 | sha1_file: support lazily fetching missing objects | Jonathan Tan | 1 | -0/+2
Teach sha1_file to fetch objects from the remote configured in extensions.partialclone whenever an object is requested but missing. The fetching of objects can be suppressed through a global variable. This is used by fsck and index-pack. However, by default, such fetching is not suppressed. This is meant as a temporary measure to ensure that all Git commands work in such a situation. Future patches will update some commands to either tolerate missing objects (without fetching them) or be more efficient in fetching them. In order to determine the code changes in sha1_file.c necessary, I investigated the following: (1) functions in sha1_file.c that take in a hash, without the user regarding how the object is stored (loose or packed) (2) functions in packfile.c (because I need to check callers that know about the loose/packed distinction and operate on both differently, and ensure that they can handle the concept of objects that are neither loose nor packed) (1) is handled by the modification to sha1_object_info_extended(). For (2), I looked at for_each_packed_object and others. For for_each_packed_object, the callers either already work or are fixed in this patch: - reachable - only to find recent objects - builtin/fsck - already knows about missing objects - builtin/cat-file - warning message added in this commit Callers of the other functions do not need to be changed: - parse_pack_index - http - indirectly from http_get_info_packs - find_pack_entry_one - this searches a single pack that is provided as an argument; the caller already knows (through other means) that the sought object is in a specific pack - find_sha1_pack - fast-import - appears to be an optimization to not store a file if it is already in a pack - http-walker - to search through a struct alt_base - http-push - to search through remote packs - has_sha1_pack - builtin/fsck - already knows about promisor objects - builtin/count-objects - informational purposes only (check if loose object is also packed) - builtin/prune-packed - check if object to be pruned is packed (if not, don't prune it) - revision - used to exclude packed objects if requested by user - diff - just for optimization Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-05 | introduce fetch-object: fetch one promisor object | Jonathan Tan | 1 | -0/+8
Introduce fetch-object, providing the ability to fetch one object from a promisor remote. This uses fetch-pack. To do this, the transport mechanism has been updated with 2 flags, "from-promisor" to indicate that the resulting pack comes from a promisor remote (and thus should be annotated as such by index-pack), and "no-dependents" to indicate that only the objects themselves need to be fetched (but fetching additional objects is nevertheless safe). Whenever "no-dependents" is used, fetch-pack will refrain from using any object flags, because it is most likely invoked as part of a dynamic object fetch by another Git command (which may itself use object flags). An alternative to this is to leave fetch-pack alone, and instead update the allocation of flags so that fetch-pack's flags never overlap with any others, but this will end up shrinking the number of flags available to nearly every other Git command (that is, every Git command that accesses objects), so the approach in this commit was used instead. This will be tested in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-31 | Rename sha1_array to oid_array | brian m. carlson | 1 | -1/+1
Since this structure handles an array of object IDs, rename it to struct oid_array. Also rename the accessor functions and the initialization constant. This commit was produced mechanically by providing non-Documentation files to the following Perl one-liners: perl -pi -E 's/struct sha1_array/struct oid_array/g' perl -pi -E 's/\bsha1_array_/oid_array_/g' perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g' Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-02 | fetch-pack: move code to report unmatched refs to a function | Matt McCutchen | 1 | -6/+1
Prepare to reuse this code in transport.c for "git fetch". While we're here, internationalize the existing error message. Signed-off-by: Matt McCutchen <matt@mattmccutchen.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-10 | Merge branch 'nd/shallow-deepen' | Junio C Hamano | 1 | -6/+21
The existing "git fetch --depth=<n>" option was hard to use correctly when making the history of an existing shallow clone deeper. A new option, "--deepen=<n>", has been added to make this easier to use. "git clone" also learned "--shallow-since=<date>" and "--shallow-exclude=<tag>" options to make it easier to specify "I am interested only in the recent N months worth of history" and "Give me only the history since that version". * nd/shallow-deepen: (27 commits) fetch, upload-pack: --deepen=N extends shallow boundary by N commits upload-pack: add get_reachable_list() upload-pack: split check_unreachable() in two, prep for get_reachable_list() t5500, t5539: tests for shallow depth excluding a ref clone: define shallow clone boundary with --shallow-exclude fetch: define shallow boundary with --shallow-exclude upload-pack: support define shallow boundary by excluding revisions refs: add expand_ref() t5500, t5539: tests for shallow depth since a specific date clone: define shallow clone boundary based on time with --shallow-since fetch: define shallow boundary with --shallow-since upload-pack: add deepen-since to cut shallow repos based on time shallow.c: implement a generic shallow boundary finder based on rev-list fetch-pack: use a separate flag for fetch in deepening mode fetch-pack.c: mark strings for translating fetch-pack: use a common function for verbose printing fetch-pack: use skip_prefix() instead of starts_with() upload-pack: move rev-list code out of check_non_tip() upload-pack: make check_non_tip() clean things up on error upload-pack: tighten number parsing at "deepen" lines ...
2016-06-13 | fetch, upload-pack: --deepen=N extends shallow boundary by N commits | Nguyễn Thái Ngọc Duy | 1 | -0/+4
In git-fetch, --depth argument is always relative with the latest remote refs. This makes it a bit difficult to cover this use case, where the user wants to make the shallow history, say 3 levels deeper. It would work if remote refs have not moved yet, but nobody can guarantee that, especially when that use case is performed a couple months after the last clone or "git fetch --depth". Also, modifying shallow boundary using --depth does not work well with clones created by --since or --not. This patch fixes that. A new argument --deepen=<N> will add <N> more (*) parent commits to the current history regardless of where remote refs are. Have/Want negotiation is still respected. So if remote refs move, the server will send two chunks: one between "have" and "want" and another to extend shallow history. In theory, the client could send no "want"s in order to get the second chunk only. But the protocol does not allow that. Either you send no want lines, which means ls-remote; or you have to send at least one want line that carries deep-relative to the server.. The main work was done by Dongcan Jiang. I fixed it up here and there. And of course all the bugs belong to me. (*) We could even support --deepen=<N> where <N> is negative. In that case we can cut some history from the shallow clone. This operation (and --depth=<shorter depth>) does not require interaction with remote side (and more complicated to implement as a result). Helped-by: Duy Nguyen <pclouds@gmail.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Dongcan Jiang <dongcan.jiang@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-13 | fetch: define shallow boundary with --shallow-exclude | Nguyễn Thái Ngọc Duy | 1 | -0/+7
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-13 | fetch: define shallow boundary with --shallow-since | Nguyễn Thái Ngọc Duy | 1 | -0/+4
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-13 | fetch-pack: use skip_prefix() instead of starts_with() | Nguyễn Thái Ngọc Duy | 1 | -6/+6
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
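A stand-alone illustration of why skip_prefix() reads better than starts_with() when the caller also needs the text after the prefix; the helper below mirrors the semantics of Git's skip_prefix() but is a local re-implementation, not Git's code:

    #include <stdio.h>
    #include <string.h>

    /* Return 1 and point *out past "prefix" if "str" starts with it. */
    static int skip_prefix(const char *str, const char *prefix, const char **out)
    {
            size_t len = strlen(prefix);

            if (strncmp(str, prefix, len))
                    return 0;
            *out = str + len;
            return 1;
    }

    int main(void)
    {
            const char *line = "unshallow 1234abcd", *arg;

            /* One call both tests the prefix and yields the remainder,
             * instead of starts_with() followed by a hand-counted offset. */
            if (skip_prefix(line, "unshallow ", &arg))
                    printf("argument: %s\n", arg);
            return 0;
    }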
2016-03-01 | fetch-pack: fix object_id of exact sha1 | Gabriel Souza Franco | 1 | -3/+13
Commit 58f2ed0 (remote-curl: pass ref SHA-1 to fetch-pack as well, 2013-12-05) added support for specifying a SHA-1 as well as a ref name. Add support for specifying just a SHA-1 and set the ref name to the same value in this case. Signed-off-by: Gabriel Souza Franco <gabrielfrancosouza@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-26 | Merge branch 'jk/tighten-alloc' | Junio C Hamano | 1 | -18/+9
Update various codepaths to avoid manually-counted malloc(). * jk/tighten-alloc: (22 commits) ewah: convert to REALLOC_ARRAY, etc convert ewah/bitmap code to use xmalloc diff_populate_gitlink: use a strbuf transport_anonymize_url: use xstrfmt git-compat-util: drop mempcpy compat code sequencer: simplify memory allocation of get_message test-path-utils: fix normalize_path_copy output buffer size fetch-pack: simplify add_sought_entry fast-import: simplify allocation in start_packfile write_untracked_extension: use FLEX_ALLOC helper prepare_{git,shell}_cmd: use argv_array use st_add and st_mult for allocation size computation convert trivial cases to FLEX_ARRAY macros use xmallocz to avoid size arithmetic convert trivial cases to ALLOC_ARRAY convert manual allocations to argv_array argv-array: add detach function add helpers for allocating flex-array structs harden REALLOC_ARRAY and xcalloc against size_t overflow tree-diff: catch integer overflow in combine_diff_path allocation ...
2016-02-22 | fetch-pack: simplify add_sought_entry | Jeff King | 1 | -18/+9
We have two variants of this function, one that takes a string and one that takes a ptr/len combo. But we only call the latter with the length of a NUL-terminated string, so our first simplification is to drop it in favor of the string variant. Since we know we have a string, we can also replace the manual memory computation with a call to alloc_ref(). Furthermore, we can rely on get_oid_hex() to complain if it hits the end of the string. That means we can simplify the check for "<sha1> <ref>" versus just "<ref>". Rather than manage the ptr/len pair, we can just bump the start of our string forward. The original code over-allocated based on the original "namelen" (which wasn't _wrong_, but was simply wasteful and confusing). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-15 | strbuf: introduce strbuf_getline_{lf,nul}() | Junio C Hamano | 1 | -1/+1
The strbuf_getline() interface allows a byte other than LF or NUL as the line terminator, but this is only because I wrote these codepaths anticipating that there might be a value other than NUL and LF that could be useful when I introduced line_termination long time ago. No useful caller that uses other value has emerged. By now, it is clear that the interface is overly broad without a good reason. Many codepaths have hardcoded preference to read either LF terminated or NUL terminated records from their input, and then call strbuf_getline() with LF or NUL as the third parameter. This step introduces two thin wrappers around strbuf_getline(), namely, strbuf_getline_lf() and strbuf_getline_nul(), and mechanically rewrites these call sites to call either one of them. The changes contained in this patch are: * introduction of these two functions in strbuf.[ch] * mechanical conversion of all callers to strbuf_getline() with either '\n' or '\0' as the third parameter to instead call the respective thin wrapper. After this step, output from "git grep 'strbuf_getline('" would become a lot smaller. An interim goal of this series is to make this an empty set, so that we can have strbuf_getline_crlf() take over the shorter name strbuf_getline(). Signed-off-by: Junio C Hamano <gitster@pobox.com>
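A hedged usage sketch of the new thin wrapper, assuming Git's strbuf.h (it compiles inside the Git tree, not stand-alone): callers read LF-terminated records without passing the terminator explicitly, and the terminator is already stripped from the buffer:

    #include "git-compat-util.h"
    #include "strbuf.h"

    static void read_lines_from(FILE *fp)
    {
            struct strbuf line = STRBUF_INIT;

            while (strbuf_getline_lf(&line, fp) != EOF)
                    printf("got: %s\n", line.buf);  /* no trailing '\n' here */
            strbuf_release(&line);
    }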
2015-11-20 | add_sought_entry_mem: convert to struct object_id | brian m. carlson | 1 | -6/+8
Convert this function to use struct object_id. Express several hardcoded constants in terms of GIT_SHA1_HEXSZ. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 | Convert struct ref to use object_id. | brian m. carlson | 1 | -2/+2
Use struct object_id in three fields in struct ref and convert all the necessary places that use it. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>
2015-01-14 | standardize usage info string format | Alex Henrie | 1 | -1/+1
This patch puts the usage info strings that were not already in docopt-like format into docopt-like format, which will be a little easier for end users and a lot easier for translators. Changes include: placing angle brackets around fill-in-the-blank parameters; putting dashes in multiword parameter names; adding spaces to [-f|--foobar] to make [-f | --foobar]; and replacing <foobar>* with [<foobar>...]. Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-17 | Merge branch 'nd/shallow-clone' | Junio C Hamano | 1 | -3/+20
Fetching from a shallow-cloned repository used to be forbidden, primarily because the codepaths involved were not carefully vetted and we did not bother supporting such usage. This attempts to allow object transfer out of a shallow-cloned repository in a controlled way (i.e. the receiver become a shallow repository with truncated history). * nd/shallow-clone: (31 commits) t5537: fix incorrect expectation in test case 10 shallow: remove unused code send-pack.c: mark a file-local function static git-clone.txt: remove shallow clone limitations prune: clean .git/shallow after pruning objects clone: use git protocol for cloning shallow repo locally send-pack: support pushing from a shallow clone via http receive-pack: support pushing to a shallow clone via http smart-http: support shallow fetch/clone remote-curl: pass ref SHA-1 to fetch-pack as well send-pack: support pushing to a shallow clone receive-pack: allow pushes that update .git/shallow connected.c: add new variant that runs with --shallow-file add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses receive/send-pack: support pushing from a shallow clone receive-pack: reorder some code in unpack() fetch: add --update-shallow to accept refs that update .git/shallow upload-pack: make sure deepening preserves shallow roots fetch: support fetching from a shallow repository clone: support remote shallow repository ...
2013-12-17 | Merge branch 'tb/clone-ssh-with-colon-for-port' | Junio C Hamano | 1 | -3/+11
Be more careful when parsing remote repository URL given in the scp-style host:path notation. * tb/clone-ssh-with-colon-for-port: git_connect(): use common return point connect.c: refactor url parsing git_connect(): refactor the port handling for ssh git fetch: support host:/~repo t5500: add test cases for diag-url git fetch-pack: add --diag-url git_connect: factor out discovery of the protocol and its parts git_connect: remove artificial limit of a remote command t5601: add tests for ssh t5601: remove clear_ssh, refactor setup_ssh_wrapper
2013-12-10 | smart-http: support shallow fetch/clone | Nguyễn Thái Ngọc Duy | 1 | -3/+13
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 | remote-curl: pass ref SHA-1 to fetch-pack as well | Nguyễn Thái Ngọc Duy | 1 | -0/+7
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 | clone: support remote shallow repository | Nguyễn Thái Ngọc Duy | 1 | -1/+1
Cloning from a shallow repository does not follow the "8 steps for new .git/shallow" because if it does we need to get through step 6 for all refs. That means commit walking down to the bottom. Instead the rule to create .git/shallow is simpler and, more importantly, cheap: if a shallow commit is found in the pack, it's probably used (i.e. reachable from some refs), so we add it. Others are dropped. One may notice this method seems flawed by the word "probably". A shallow commit may not be reachable from any refs at all if it's attached to an object island (a group of objects that are not reachable by any refs). If that object island is not complete, a new fetch request may send more objects to connect it to some ref. At that time, because we incorrectly installed the shallow commit in this island, the user will not see anything after that commit (fsck is still ok). This is not desired. Given that object islands are rare (C Git never sends such islands for security reasons) and do not really harm the repository integrity, a tradeoff is made to surprise the user occasionally but work faster everyday. A new option --strict could be added later that follows exactly the 8 steps. "git prune" can also learn to remove dangling objects _and_ the shallow commits that are attached to them from .git/shallow. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 | connect.c: teach get_remote_heads to parse "shallow" lines | Nguyễn Thái Ngọc Duy | 1 | -1/+1
No callers pass a non-empty pointer as shallow_points at this stage. As a result, all clients still refuse to talk to shallow repository on the other end. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-09 | git fetch-pack: add --diag-url | Torsten Bögershausen | 1 | -3/+11
The main purpose is to trace the URL parser called by git_connect() in connect.c The main features of the parser can be listed as this: - parse out host and path for URLs with a scheme (git:// file:// ssh://) - parse host names embedded by [] correctly - extract the port number, if present - separate URLs like "file" (which are local) from URLs like "host:repo" which should use ssh Add the new parameter "--diag-url" to "git fetch-pack", which prints the value for protocol, host and path to stderr and exits. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05 | replace {pre,suf}fixcmp() with {starts,ends}_with() | Christian Couder | 1 | -3/+3
Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of the prefixcmp() and suffixcmp() with new API functions. The change can be recreated by mechanically applying this: $ git grep -l -e prefixcmp -e suffixcmp -- \*.c | grep -v strbuf\\.c | xargs perl -pi -e ' s|!prefixcmp\(|starts_with\(|g; s|prefixcmp\(|!starts_with\(|g; s|!suffixcmp\(|ends_with\(|g; s|suffixcmp\(|!ends_with\(|g; ' on the result of preparatory changes in this series. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
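A stand-alone restatement of the semantic change behind the perl rules above: prefixcmp() returned 0 on a match (strcmp-style), while starts_with() returns true on a match, so every converted call site flips its negation. The helpers below re-implement the semantics locally and are not Git's code:

    #include <stdio.h>
    #include <string.h>

    static int prefixcmp(const char *str, const char *prefix)
    {
            return strncmp(str, prefix, strlen(prefix));    /* 0 means "matches" */
    }

    static int starts_with(const char *str, const char *prefix)
    {
            return !prefixcmp(str, prefix);                 /* nonzero means "matches" */
    }

    int main(void)
    {
            const char *line = "shallow 1234abcd";

            if (!prefixcmp(line, "shallow "))       /* old style */
                    puts("old check matched");
            if (starts_with(line, "shallow "))      /* new style */
                    puts("new check matched");
            return 0;
    }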
2013-09-09 | Merge branch 'jc/push-cas' | Junio C Hamano | 1 | -0/+2
Allow a safer "rewind of the remote tip" push than blind "--force", by requiring that the overwritten remote ref to be unchanged since the new history to replace it was prepared. The machinery is more or less ready. The "--force" option is again the big red button to override any safety, thanks to J6t's sanity (the original round allowed --lockref to defeat --force). The logic to choose the default implemented here is fragile (e.g. "git fetch" after seeing a failure will update the remote-tracking branch and will make the next "push" pass, defeating the safety pretty easily). It is suitable only for the simplest workflows, and it may hurt users more than it helps them. * jc/push-cas: push: teach --force-with-lease to smart-http transport send-pack: fix parsing of --force-with-lease option t5540/5541: smart-http does not support "--force-with-lease" t5533: test "push --force-with-lease" push --force-with-lease: tie it all together push --force-with-lease: implement logic to populate old_sha1_expect[] remote.c: add command line option parser for "--force-with-lease" builtin/push.c: use OPT_BOOL, not OPT_BOOLEAN cache.h: move remote/connect API out of it
2013-07-23 | smart http: use the same connectivity check on cloning | Nguyễn Thái Ngọc Duy | 1 | -0/+9
This is an extension of c6807a4 (clone: open a shortcut for connectivity check - 2013-05-26) to reduce the cost of connectivity check at clone time, this time with smart http protocol. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-08 | cache.h: move remote/connect API out of it | Junio C Hamano | 1 | -0/+2
The definition of "struct ref" in "cache.h", a header file so central to the system, always confused me. This structure is not about the local ref used by sha1-name API to name local objects. It is what refspecs are expanded into, after finding out what refs the other side has, to define what refs are updated after object transfer succeeds to what values. It belongs to "remote.h" together with "struct refspec". While we are at it, also move the types and functions related to the Git transport connection to a new header file connect.h Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-01 | Merge branch 'jk/pkt-line-cleanup' | Junio C Hamano | 1 | -7/+4
Clean up pkt-line API, implementation and its callers to make them more robust. * jk/pkt-line-cleanup: do not use GIT_TRACE_PACKET=3 in tests remote-curl: always parse incoming refs remote-curl: move ref-parsing code up in file remote-curl: pass buffer straight to get_remote_heads teach get_remote_heads to read from a memory buffer pkt-line: share buffer/descriptor reading implementation pkt-line: provide a LARGE_PACKET_MAX static buffer pkt-line: move LARGE_PACKET_MAX definition from sideband pkt-line: teach packet_read_line to chomp newlines pkt-line: provide a generic reading function with options pkt-line: drop safe_write function pkt-line: move a misplaced comment write_or_die: raise SIGPIPE when we get EPIPE upload-archive: use argv_array to store client arguments upload-archive: do not copy repo name send-pack: prefer prefixcmp over memcmp in receive_status fetch-pack: fix out-of-bounds buffer offset in get_ack upload-pack: remove packet debugging harness upload-pack: do not add duplicate objects to shallow list upload-pack: use get_sha1_hex to parse "shallow" lines
2013-02-24 | teach get_remote_heads to read from a memory buffer | Jeff King | 1 | -1/+1
Now that we can read packet data from memory as easily as a descriptor, get_remote_heads can take either one as a source. This will allow further refactoring in remote-curl. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 | pkt-line: provide a LARGE_PACKET_MAX static buffer | Jeff King | 1 | -4/+3
Most of the callers of packet_read_line just read into a static 1000-byte buffer (callers which handle arbitrary binary data already use LARGE_PACKET_MAX). This works fine in practice, because: 1. The only variable-sized data in these lines is a ref name, and refs tend to be a lot shorter than 1000 characters. 2. When sending ref lines, git-core always limits itself to 1000 byte packets. However, the only limit given in the protocol specification in Documentation/technical/protocol-common.txt is LARGE_PACKET_MAX; the 1000 byte limit is mentioned only in pack-protocol.txt, and then only describing what we write, not as a specific limit for readers. This patch lets us bump the 1000-byte limit to LARGE_PACKET_MAX. Even though git-core will never write a packet where this makes a difference, there are two good reasons to do this: 1. Other git implementations may have followed protocol-common.txt and used a larger maximum size. We don't bump into it in practice because it would involve very long ref names. 2. We may want to increase the 1000-byte limit one day. Since packets are transferred before any capabilities, it's difficult to do this in a backwards-compatible way. But if we bump the size of buffer the readers can handle, eventually older versions of git will be obsolete enough that we can justify bumping the writers, as well. We don't have plans to do this anytime soon, but there is no reason not to start the clock ticking now. Just bumping all of the reading bufs to LARGE_PACKET_MAX would waste memory. Instead, since most readers just read into a temporary buffer anyway, let's provide a single static buffer that all callers can use. We can further wrap this detail away by having the packet_read_line wrapper just use the buffer transparently and return a pointer to the static storage. That covers most of the cases, and the remaining ones already read into their own LARGE_PACKET_MAX buffers. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 | pkt-line: teach packet_read_line to chomp newlines | Jeff King | 1 | -2/+0
The packets sent during ref negotiation are all terminated by newline; even though the code to chomp these newlines is short, we end up doing it in a lot of places. This patch teaches packet_read_line to auto-chomp the trailing newline; this lets us get rid of a lot of inline chomping code. As a result, some call-sites which are not reading line-oriented data (e.g., when reading chunks of packfiles alongside sideband) transition away from packet_read_line to the generic packet_read interface. This patch converts all of the existing callsites. Since the function signature of packet_read_line does not change (but its behavior does), there is a possibility of new callsites being introduced in later commits, silently introducing an incompatibility. However, since a later patch in this series will change the signature, such a commit would have to be merged directly into this commit, not to the tip of the series; we can therefore ignore the issue. This is an internal cleanup and should produce no change of behavior in the normal case. However, there is one corner case to note. Callers of packet_read_line have never been able to tell the difference between a flush packet ("0000") and an empty packet ("0004"), as both cause packet_read_line to return a length of 0. Readers treat them identically, even though Documentation/technical/protocol-common.txt says we must not; it also says that implementations should not send an empty pkt-line. By stripping out the newline before the result gets to the caller, we will now treat the newline-only packet ("0005\n") the same as an empty packet, which in turn gets treated like a flush packet. In practice this doesn't matter, as neither empty nor newline-only packets are part of git's protocols (at least not for the line-oriented bits, and readers who are not expecting line-oriented packets will be calling packet_read directly, anyway). But even if we do decide to care about the distinction later, it is orthogonal to this patch. The right place to tighten would be to stop treating empty packets as flush packets, and this change does not make doing so any harder. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
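A hedged sketch of the calling convention after this change, assuming Git's pkt-line.h (in-tree, not stand-alone): the returned line has its trailing newline already chomped, and a flush packet shows up as NULL, so callers no longer strip terminators themselves:

    #include "git-compat-util.h"
    #include "pkt-line.h"

    static void consume_pkt_lines(int fd)
    {
            const char *line;
            int len;

            while ((line = packet_read_line(fd, &len)) != NULL) {
                    /* line points into pkt-line's static buffer, newline removed */
                    printf("line: %s\n", line);
            }
            /* NULL here means a flush packet ("0000") ended the section */
    }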
2013-02-07 | fetch: use struct ref to represent refs to be fetched | Junio C Hamano | 1 | -8/+32
Even though "git fetch" has full infrastructure to parse refspecs to be fetched and match them against the list of refs to come up with the final list of refs to be fetched, the list of refs that are requested to be fetched were internally converted to a plain list of strings at the transport layer and then passed to the underlying fetch-pack driver. Stop this conversion and instead pass around an array of refs. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-29 | fetch-pack: move core code to libgit.a | Nguyễn Thái Ngọc Duy | 1 | -948/+0
fetch_pack() is used by transport.c, part of libgit.a while it stays in builtin/fetch-pack.c. Move it to fetch-pack.c so that we won't get undefined reference if a program that uses libgit.a happens to pull it in. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Jeff King <peff@peff.net>
2012-10-29 | fetch-pack: remove global (static) configuration variable "args" | Nguyễn Thái Ngọc Duy | 1 | -77/+83
This helps removes the hack in fetch_pack() that copies my_args to args. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Jeff King <peff@peff.net>
2012-09-12 | fetch-pack: eliminate spurious error messages | Michael Haggerty | 1 | -5/+5
It used to be that if "--all", "--depth", and also explicit references were sought, then the explicit references were not handled correctly in filter_refs() because the "--all --depth" code took precedence over the explicit reference handling, and the explicit references were never noted as having been found. So check for explicitly sought references before proceeding to the "--all --depth" logic. This fixes two test cases in t5500. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 | cmd_fetch_pack(): simplify computation of return value | Michael Haggerty | 1 | -11/+10
Set the final value at initialization rather than initializing it then sometimes changing it. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12fetch-pack: report missing refs even if no existing refs were receivedMichael Haggerty1-1/+1
This fixes a test in t5500. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12cmd_fetch_pack(): return early if finish_connect() failsMichael Haggerty1-3/+3
This simplifies the logic without changing the behavior. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12filter_refs(): simplify logicMichael Haggerty1-19/+18
Simplify flow within loop: first decide whether to keep the reference, then keep/free it. This makes it clearer that each ref has exactly two possible destinies, and removes duplication of the code for appending the reference to the linked list. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12filter_refs(): build refs list as we goMichael Haggerty1-27/+4
Instead of temporarily storing matched refs to temporary array "return_refs", simply append them to newlist as we go. This changes the order of references in newlist to strictly sorted if "--all" and "--depth" and named references are all specified, but that usage is broken anyway (see the last two tests in t5500). This changes the last test in t5500 from segfaulting into just emitting a spurious error (this will be fixed in a moment). Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12filter_refs(): delete matched refs from sought listMichael Haggerty1-8/+15
Remove any references that are available from the remote from the sought list (rather than overwriting their names with NUL characters, as previously). Mark matching entries by writing a non-NULL pointer to string_list_item::util during the iteration, then use filter_string_list() later to filter out the entries that have been marked. Document this aspect of fetch_pack() in a comment in the header file. (More documentation is obviously still needed.) Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
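A rough sketch of the mark-then-filter pattern described above, using simplified stand-in types rather than git's real string_list interface:

    struct sought_item { const char *string; void *util; };

    /*
     * Keep only entries whose util is still NULL, i.e. refs the remote
     * did not have; matched entries were marked with a non-NULL util
     * during the earlier iteration.
     */
    static int drop_matched(struct sought_item *items, int nr)
    {
        int i, dst = 0;
        for (i = 0; i < nr; i++)
            if (!items[i].util)
                items[dst++] = items[i];
        return dst; /* new number of entries */
    }

The real code performs the same compaction via filter_string_list(), keyed off the util pointer set while matching.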
2012-09-12fetch_pack(): update sought->nr to reflect number of unique entriesMichael Haggerty1-14/+1
fetch_pack() removes duplicates from the "sought" list, thereby shrinking the list. But previously, the caller was not informed about the shrinkage. This would cause a spurious error message to be emitted by cmd_fetch_pack() if "git fetch-pack" is called with duplicate refnames. Instead, remove duplicates using string_list_remove_duplicates(), which adjusts sought->nr to reflect the new length of the list. The last test of t5500 inexplicably *required* "git fetch-pack" to fail when fetching a list of references that contains duplicates; i.e., it insisted on the buggy behavior. So change the test to expect the correct behavior. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12filter_refs(): do not check the same sought_pos twiceMichael Haggerty1-1/+1
Once a match has been found at sought_pos, the entry is zeroed and no future attempts will match that entry. So increment sought_pos to avoid checking against the zeroed-out entry during the next iteration. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12Change fetch_pack() and friends to take string_list argumentsMichael Haggerty1-49/+39
Instead of juggling <nr_heads,heads> (sometimes called <nr_match,match>), pass around the list of references to be sought in a single string_list variable called "sought". Future commits will make more use of string_list functionality. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12fetch_pack(): reindent function decl and defnMichael Haggerty1-4/+4
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-13fetch-pack: mention server version with verbose outputJeff King1-1/+8
Fetch-pack's verbose mode is more of a debugging mode (and in fact takes two "-v" arguments to trigger via the porcelain layer). Let's mention the server version as another possible item of interest. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-10fetch-pack: do not ask for unadvertised capabilitiesJunio C Hamano1-0/+6
In the same spirit as the previous fix, stop asking for thin-pack, no-progress and include-tag capabilities when the other end does not claim to support them. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-10do not send client agent unless server does firstJeff King1-1/+6
Commit ff5effdf taught both clients and servers of the git protocol to send an "agent" capability that just advertises their version for statistics and debugging purposes. The protocol-capabilities.txt document, however, indicates that the client's advertisement is actually a response, and should never include capabilities not mentioned in the server's advertisement. Adding the unconditional advertisement in the server programs was OK, then, but the clients broke the protocol. The server implementation of git-core itself does not care, but at least one server does: the Google Code git server (or any server using Dulwich) will hang up with an internal error upon seeing an unknown capability. Instead, each client must record whether we saw an agent string from the server, and respond with its agent only if the server mentioned it first. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
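A minimal sketch of that rule (the variable and helper names are illustrative, not the actual fetch-pack code): remember whether the server's capability advertisement contained an agent entry, and only then include our own.

    #include <string.h>

    static int server_mentioned_agent;

    static void note_server_caps(const char *caps)
    {
        if (strstr(caps, "agent="))
            server_mentioned_agent = 1;
    }

    static const char *maybe_agent_cap(void)
    {
        /* respond with our agent only if the server advertised one first;
         * the version string here is just a placeholder */
        return server_mentioned_agent ? " agent=git/1.7.x" : "";
    }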
2012-08-03include agent identifier in capability stringJeff King1-0/+2
Instead of having the client advertise a particular version number in the git protocol, we have managed extensions and backwards compatibility by having clients and servers advertise capabilities that they support. This is far more robust than having each side consult a table of known versions, and provides sufficient information for the protocol interaction to complete. However, it does not allow servers to keep statistics on which client versions are being used. This information is not necessary to complete the network request (the capabilities provide enough information for that), but it may be helpful to conduct a general survey of client versions in use. We already send the client version in the user-agent header for http requests; adding it here allows us to gather similar statistics for non-http requests. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-29Merge branch 'jk/fetch-pack-remove-dups-optim'Junio C Hamano1-22/+30
The way "fetch-pack" that is given multiple references to fetch tried to remove duplicates was very inefficient. By Jeff King * jk/fetch-pack-remove-dups-optim: fetch-pack: sort incoming heads list earlier fetch-pack: avoid quadratic loop in filter_refs fetch-pack: sort the list of incoming refs add sorting infrastructure for list refs fetch-pack: avoid quadratic behavior in remove_duplicates fetch-pack: sort incoming heads
2012-05-29Merge branch 'mh/fetch-pack-constness'Junio C Hamano1-74/+71
Tighten constness of some local variables in a callchain. By Michael Haggerty * mh/fetch-pack-constness: cmd_fetch_pack(): respect constness of argv parameter cmd_fetch_pack(): combine the loop termination conditions cmd_fetch_pack(): handle non-option arguments outside of the loop cmd_fetch_pack(): declare dest to be const
2012-05-24fetch-pack: sort incoming heads list earlierJeff King1-1/+1
Commit 4435968 started sorting heads fed to fetch-pack so that later commits could use more optimized algorithms; commit 7db8d53 switched the remove_duplicates function to such an algorithm. Of course, the sorting is more effective if you do it _before_ the algorithm in question. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22fetch-pack: avoid quadratic loop in filter_refsJeff King1-6/+13
We have a list of refs that we want to compare against the "match" array. The current code searches the match list linearly, giving quadratic behavior over the number of refs when you want to fetch all of them. Instead, we can compare the lists as we go, giving us linear behavior. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
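A simplified sketch of the linear, merge-style walk (it assumes both the remote ref list and the "sought" names are sorted; the helper names are hypothetical):

    #include <string.h>

    struct ref { const char *name; struct ref *next; };

    static void match_sorted(struct ref *ref, const char **sought, int nr,
                             void (*found)(struct ref *, int))
    {
        int i = 0;
        while (ref && i < nr) {
            int cmp = strcmp(ref->name, sought[i]);
            if (cmp < 0)
                ref = ref->next;  /* remote ref nobody asked for */
            else if (cmp > 0)
                i++;              /* sought name not on the remote */
            else {
                found(ref, i++);  /* record the match */
                ref = ref->next;
            }
        }
    }

Each list is traversed once, so the cost is proportional to the combined length of the two lists rather than their product.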
2012-05-22fetch-pack: sort the list of incoming refsJeff King1-0/+2
Having the list sorted means we can avoid some quadratic algorithms when comparing lists. These should typically be sorted already, but they do come from the remote, so let's be extra careful. Our ref-sorting implementation does a mergesort, so we do not have to care about performance degrading in the common case that the list is already sorted. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22fetch-pack: avoid quadratic behavior in remove_duplicatesJeff King1-15/+6
We remove duplicate entries from the list of refs we are fed in fetch-pack. The original algorithm is quadratic over the number of refs, but since the list is now guaranteed to be sorted, we can do it in linear time. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22fetch-pack: sort incoming headsJeff King1-1/+9
There's no reason to preserve the incoming order of the heads we're requested to fetch. By having them sorted, we can replace some of the quadratic algorithms with linear ones. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22cmd_fetch_pack(): respect constness of argv parameterMichael Haggerty1-13/+10
The old code cast away the constness of the strings passed to the function in argument argv[], which could result in their being modified by filter_refs(). Fix by copying reference names from argv and putting them into our own array (similarly to how refnames passed to stdin were already handled). Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22cmd_fetch_pack(): combine the loop termination conditionsMichael Haggerty1-58/+55
If an argument that does not start with '-' is found, the loop is terminated. So move that check into the for-loop condition. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22cmd_fetch_pack(): handle non-option arguments outside of the loopMichael Haggerty1-5/+7
This makes it more obvious that the code is always executed unless there is an error, and that the first initialization of nr_heads is unnecessary. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22cmd_fetch_pack(): declare dest to be constMichael Haggerty1-3/+4
There is no need for it to be non-const, and this avoids the need for casting away the constness of an argv element. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-24Merge branch 'it/fetch-pack-many-refs'Junio C Hamano1-1/+41
When "git fetch" encounters repositories with too many references, the command line of "fetch-pack" that is run by a helper e.g. remote-curl, may fail to hold all of them. Now such an internal invocation can feed the references through the standard input of "fetch-pack". By Ivan Todoroski * it/fetch-pack-many-refs: remote-curl: main test case for the OS command line overflow fetch-pack: test cases for the new --stdin option remote-curl: send the refs to fetch-pack on stdin fetch-pack: new --stdin option to read refs from stdin
2012-04-02fetch-pack: new --stdin option to read refs from stdinIvan Todoroski1-1/+41
If a remote repo has too many tags (or branches), cloning it over the smart HTTP transport can fail because remote-curl.c puts all the refs from the remote repo on the fetch-pack command line. This can make the command line longer than the global OS command line limit, causing fetch-pack to fail. This is especially a problem on Windows, where the command line limit is orders of magnitude shorter than on Linux. There are already real repos out there that msysGit cannot clone over smart HTTP due to this problem. Here is an easy way to trigger it:

    git init too-many-refs
    cd too-many-refs
    echo bla > bla.txt
    git add .
    git commit -m test
    sha=$(git rev-parse HEAD)
    tag=$(perl -e 'print "bla" x 30')
    for i in `seq 50000`; do
        echo $sha refs/tags/$tag-$i >> .git/packed-refs
    done

Then share this repo over the smart HTTP protocol and try cloning it:

    $ git clone http://localhost/.../too-many-refs/.git
    Cloning into 'too-many-refs'...
    fatal: cannot exec 'fetch-pack': Argument list too long

50k tags is obviously an absurd number, but it is required to demonstrate the problem on Linux because it has a much more generous command line limit. On Windows the clone fails with as little as 500 tags in the above loop, which is getting uncomfortably close to the number of tags you might see in real long-lived repos. This is not just theoretical; msysGit is already failing to clone our company repo due to this. It's a large repo converted from CVS, with nearly 10 years of history. Four possible solutions were discussed on the Git mailing list (in no particular order):

1) Call fetch-pack multiple times with smaller batches of refs. This was dismissed as inefficient and inelegant.

2) Add option --refs-fd=$n to pass an fd from where to read the refs. This was rejected because inheriting descriptors other than stdin/stdout/stderr through exec() is apparently problematic on Windows, plus it would require changes to the run-command API to open extra pipes.

3) Add option --refs-from=$tmpfile to pass the refs using a temp file. This was not favored because of the temp file requirement.

4) Add option --stdin to pass the refs on stdin, one per line. In the end this option was chosen as the most efficient and most desirable from a scripting perspective.

There was however a small complication when using stdin to pass refs to fetch-pack. The --stateless-rpc option to fetch-pack also uses stdin for communication with the remote server. If we are going to sneak refs on stdin line by line, it would have to be done very carefully in the presence of --stateless-rpc, because when reading refs line by line we might read ahead too much data into our buffer and eat some of the remote protocol data which is also coming on stdin. One way to solve this would be to refactor get_remote_heads() in fetch-pack.c to accept a residual buffer from our stdin line parsing above, but this function is used in several places so other callers would be burdened by this residual buffer interface even when most of them don't need it. In the end we settled on the following solution: if --stdin is specified without --stateless-rpc, fetch-pack reads the refs from stdin one per line, in a script-friendly format. However, if --stdin is specified together with --stateless-rpc, fetch-pack reads the refs from stdin in packetized format (pkt-line) with a flush packet terminating the list of refs.
This way we can read the exact number of bytes that we need from stdin, and then get_remote_heads() can continue reading from the same fd without losing a single byte of remote protocol data. This way the --stdin option only loses generality and scriptability when used together with --stateless-rpc, which is not easily scriptable anyway because it also uses pkt-line when talking to the remote server. Signed-off-by: Ivan Todoroski <grnch@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-20Merge branch 'cb/transfer-no-progress'Junio C Hamano1-1/+1
* cb/transfer-no-progress: push/fetch/clone --no-progress suppresses progress output
2012-02-13push/fetch/clone --no-progress suppresses progress outputClemens Buchacher1-1/+1
By default, progress output is disabled if stderr is not a terminal. The --progress option can be used to force progress output anyway. Conversely, --no-progress does not force progress output off. In particular, if stderr is a terminal, progress output is enabled. This is unintuitive. Change --no-progress to force output off. Signed-off-by: Clemens Buchacher <drizzd@aon.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>
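The intended behavior can be summed up in a small sketch (the tri-state option value is an assumption for illustration, not necessarily how the option is actually stored):

    #include <unistd.h>

    /* -1 = not given, 0 = --no-progress, 1 = --progress */
    static int resolve_progress(int progress_opt)
    {
        if (progress_opt >= 0)
            return progress_opt; /* an explicit flag now wins either way */
        return isatty(2);        /* default: only when stderr is a terminal */
    }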
2012-02-12everything_local(): mark alternate refs as completeMichael Haggerty1-0/+6
Objects in an alternate object database are already available to the local repository and therefore don't need to be fetched. So mark them as complete in everything_local(). This fixes a test in t5700. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-12fetch-pack.c: inline insert_alternate_refs()Michael Haggerty1-6/+1
The logic of the (single) caller is clearer without encapsulating this one line in a function. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-12fetch-pack.c: rename some parameters from "path" to "refname"Michael Haggerty1-5/+5
The parameters denote reference names, which are no longer 1:1 with filesystem paths. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-13fetch-pack: match refs exactlyJeff King1-4/+9
When we are determining the list of refs to fetch via fetch-pack, we have two sets of refs to compare: those on the remote side, and a "match" list of things we want to fetch. We iterate through the remote refs alphabetically, seeing if each one is wanted by the "match" list. Since def88e9 (Commit first cut at "git-fetch-pack", 2005-07-04), we have used the "path_match" function to do a suffix match, where a remote ref is considered wanted if any of the "match" elements is a suffix of the remote refname. This enables callers of fetch-pack to specify unqualified refs and have them matched up with remote refs (e.g., ask for "A" and get remote's "refs/heads/A"). However, if you provide a fully qualified ref, then there are corner cases where we provide the wrong answer. For example, given a remote with two refs: refs/foo/refs/heads/master refs/heads/master asking for "refs/heads/master" will first match "refs/foo/refs/heads/master" by the suffix rule, and we will erroneously fetch it instead of refs/heads/master. As it turns out, all callers of fetch_pack do provide fully-qualified refs for the match list. There are two ways fetch_pack can get match lists: 1. Through the transport code (i.e., via git-fetch) 2. On the command-line of git-fetch-pack In the first case, we will always be providing the names of fully-qualified refs from "struct ref" objects. We will have pre-matched those ref objects already (since we have to handle more advanced matching, like wildcard refspecs), and are just providing a list of the refs whose objects we need. In the second case, users could in theory be providing non-qualified refs on the command-line. However, the fetch-pack documentation claims that refs should be fully qualified (and has always done so since it was written in 2005). Let's change this path_match call to simply check for string equality, matching what the callers of fetch_pack are expecting. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
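To make the failure mode concrete, here is a sketch contrasting a simplified suffix match (a stand-in for the old path_match() behavior, not its literal code) with the exact comparison the patch switches to:

    #include <string.h>

    static int suffix_match(const char *refname, const char *pattern)
    {
        size_t rlen = strlen(refname), plen = strlen(pattern);
        return rlen >= plen && !strcmp(refname + rlen - plen, pattern);
    }

    static int exact_match(const char *refname, const char *pattern)
    {
        return !strcmp(refname, pattern);
    }

With the two example refs above, suffix_match("refs/foo/refs/heads/master", "refs/heads/master") is true even though it names the wrong ref, while exact_match() accepts only refs/heads/master itself.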
2011-12-13drop "match" parameter from get_remote_headsJeff King1-1/+1
The get_remote_heads function reads the list of remote refs during git protocol session. It dates all the way back to def88e9 (Commit first cut at "git-fetch-pack", 2005-07-04). At that time, the idea was to come up with a list of refs we were interested in, and then filter the list as we got it from the remote side. Later, 1baaae5 (Make maximal use of the remote refs, 2005-10-28) stopped filtering at the get_remote_heads layer, letting us use the non-matching refs to find common history. As a result, all callers now simply pass an empty match list (and any future callers will want to do the same). So let's drop these now-useless parameters. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-10Merge branch 'mh/check-ref-format-3'Junio C Hamano1-1/+1
* mh/check-ref-format-3: (23 commits) add_ref(): verify that the refname is formatted correctly resolve_ref(): expand documentation resolve_ref(): also treat a too-long SHA1 as invalid resolve_ref(): emit warnings for improperly-formatted references resolve_ref(): verify that the input refname has the right format remote: avoid passing NULL to read_ref() remote: use xstrdup() instead of strdup() resolve_ref(): do not follow incorrectly-formatted symbolic refs resolve_ref(): extract a function get_packed_ref() resolve_ref(): turn buffer into a proper string as soon as possible resolve_ref(): only follow a symlink that contains a valid, normalized refname resolve_ref(): use prefixcmp() resolve_ref(): explicitly fail if a symlink is not readable Change check_refname_format() to reject unnormalized refnames Inline function refname_format_print() Make collapse_slashes() allocate memory for its result Do not allow ".lock" at the end of any refname component Refactor check_refname_format() Change check_ref_format() to take a flags argument Change bad_ref_char() to return a boolean value ...
2011-10-05Change check_ref_format() to take a flags argumentMichael Haggerty1-1/+1
Change check_ref_format() to take a flags argument that indicates what is acceptable in the reference name (analogous to "git check-ref-format"'s "--allow-onelevel" and "--refspec-pattern"). This is more convenient for callers and also fixes a failure in the test suite (and likely elsewhere in the code) by enabling "onelevel" and "refspec-pattern" to be allowed independently of each other. Also rename check_ref_format() to check_refname_format() to make it obvious that it deals with refnames rather than references themselves. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-05Merge branch 'jc/fetch-pack-fsck-objects'Junio C Hamano1-1/+19
* jc/fetch-pack-fsck-objects: test: fetch/receive with fsckobjects transfer.fsckobjects: unify fetch/receive.fsckobjects fetch.fsckobjects: verify downloaded objects Conflicts: Documentation/config.txt builtin/fetch-pack.c
2011-09-04transfer.fsckobjects: unify fetch/receive.fsckobjectsJunio C Hamano1-2/+12
This single variable can be set instead of setting the fetch.fsckobjects and receive.fsckobjects variables independently. Signed-off-by: Junio C Hamano <gitster@pobox.com>
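A sketch of the resulting precedence (the variable names mirror the config keys but are illustrative): the more specific fetch.fsckobjects wins when set, otherwise transfer.fsckobjects applies, otherwise checking stays off.

    static int fetch_fsck_objects = -1;    /* -1 = unset */
    static int transfer_fsck_objects = -1; /* -1 = unset */

    static int fsck_objects_enabled(void)
    {
        if (fetch_fsck_objects >= 0)
            return fetch_fsck_objects;
        if (transfer_fsck_objects >= 0)
            return transfer_fsck_objects;
        return 0;
    }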
2011-09-04fetch.fsckobjects: verify downloaded objectsJunio C Hamano1-0/+8
This corresponds to receive.fsckobjects configuration variable added (a lot) earlier in 20dc001 (receive-pack: allow using --strict mode for unpacking objects, 2008-02-25). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-28Merge branch 'nd/decorate-grafts'Junio C Hamano1-0/+30
* nd/decorate-grafts: log: Do not decorate replacements with --no-replace-objects log: decorate "replaced" on to replaced commits log: decorate grafted commits with "grafted" Move write_shallow_commits to fetch-pack.c Add for_each_commit_graft() to iterate all grafts decoration: do not mis-decorate refs with same prefix
2011-08-18fetch-pack: check for valid commit from serverNguyễn Thái Ngọc Duy1-0/+2
A malicious server can return an ACK with a non-existent SHA-1 or with an object that is not a commit, in which case lookup_commit() may return NULL. Do not let fetch-pack crash by accessing a NULL address. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-18Move write_shallow_commits to fetch-pack.cNguyễn Thái Ngọc Duy1-0/+30
This function produces network traffic and should be in fetch-pack. It has been in commit.c because it needs to iterate over the (private) graft list. It can now do so using for_each_commit_graft(). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-29Merge branch 'jk/haves-from-alternate-odb'Junio C Hamano1-1/+1
* jk/haves-from-alternate-odb: receive-pack: eliminate duplicate .have refs bisect: refactor sha1_array into a generic sha1 list refactor refs_from_alternate_cb to allow passing extra data
2011-05-19refactor refs_from_alternate_cb to allow passing extra dataJeff King1-1/+1
The foreach_alt_odb function triggers a callback for each alternate object db we have, with room for a single void pointer as data. Currently, we always call refs_from_alternate_cb as the callback function, and then pass another callback (to receive each ref individually) as the void pointer. This has two problems: 1. C technically forbids stuffing a function pointer into a "void *". In practice, this probably doesn't matter on any architectures git runs on, but it never hurts to follow the letter of the law. 2. There is no room for an extra data pointer. Indeed, the alternate_ref_fn that refs_from_alternate_cb calls takes a void* for data, but we always pass it NULL. Instead, let's properly stuff our function pointer into a data struct, which also leaves room for an extra caller-supplied data pointer. And to keep things simple for existing callers, let's make a for_each_alternate_ref function that takes care of creating the extra struct. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
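The wrap-the-callback-in-a-struct pattern described above can be sketched like this (type and function names are illustrative, not the exact git API):

    typedef void (*each_ref_fn)(const char *refname, void *cb_data);

    struct alternate_refs_cb {
        each_ref_fn fn; /* per-ref callback supplied by the caller */
        void *data;     /* caller-supplied context, now possible to pass */
    };

    static void alternate_odb_cb(void *vdata)
    {
        struct alternate_refs_cb *cb = vdata;
        /* ... for each ref found in the alternate ... */
        cb->fn("refs/heads/master", cb->data); /* example invocation */
    }

Passing a pointer to struct alternate_refs_cb as the void * keeps the code within the letter of the C standard and leaves room for extra context.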
2011-05-19fetch: avoid repeated commits in mark_completeJeff King1-2/+4
We add every local ref to a list so that we can mark them and all of their ancestors back to a certain cutoff point. However, if some refs point to the same commit, we will end up adding them to the list many times. Furthermore, since commit_lists are stored as linked lists, we must do an O(n) traversal of the list in order to find the right place to insert each commit. This makes building the list O(n^2) in the number of refs. For normal repositories, this isn't a big deal. We have a few hundred refs at most, and most of them are unique. But consider an "alternates" repo that serves as an object database for many other similar repos. For reachability, it needs to keep a copy of the refs in each child repo. This means it may have a large number of refs, many of which point to the same commits. By noting commits we have already added to the list, we can shrink the size of "n" in such a repo to the number of unique commits, which is on the order of what a normal repo would contain (it's actually more than a normal repo, since child repos may have branches at different states, but in practice it tends to be much smaller than the list with duplicates). Here are the results on one particular giant repo (containing objects for all Rails forks on GitHub):

    $ git for-each-ref | wc -l
    112514

    [before]
    $ git fetch --no-tags ../remote.git
    63.52user 0.12system 1:03.68elapsed 99%CPU (0avgtext+0avgdata 137648maxresident)k
    1856inputs+48outputs (11major+19603minor)pagefaults 0swaps

    $ git fetch --no-tags ../remote.git
    6.15user 0.08system 0:06.25elapsed 99%CPU (0avgtext+0avgdata 123856maxresident)k
    0inputs+40outputs (0major+18872minor)pagefaults 0swaps

Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
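The dedup idea can be sketched as follows (the flag bit and helper are stand-ins for illustration, not the literal mark_complete() change): a commit is inserted into the date-sorted list only the first time any ref pointing at it is seen.

    struct object { unsigned flags; };
    struct commit { struct object object; };
    #define ALREADY_LISTED 0x0800 /* an assumed-free flag bit, illustrative */

    static void add_commit_once(struct commit *c,
                                void (*insert_by_date)(struct commit *))
    {
        if (c->object.flags & ALREADY_LISTED)
            return;                 /* some other ref already added it */
        c->object.flags |= ALREADY_LISTED;
        insert_by_date(c);          /* the O(n) insertion now happens once per commit */
    }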
2011-03-29Merge branch 'jc/fetch-progressive-stride'Junio C Hamano1-4/+5
* jc/fetch-progressive-stride: Fix potential local deadlock during fetch-pack
2011-03-29Merge branches 'sp/maint-fetch-pack-stop-early' and 'sp/maint-upload-pack-stop-early'Junio C Hamano1-1/+2
* sp/maint-fetch-pack-stop-early: enable "no-done" extension only when fetching over smart-http * sp/maint-upload-pack-stop-early: enable "no-done" extension only when serving over smart-http
2011-03-29Revert two "no-done" revertsJunio C Hamano1-3/+15
Last night I had to make these two emergency reverts, but now that we have a better understanding of which part of the topic was broken, let's get rid of the reverts and fix it correctly. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-29Fix potential local deadlock during fetch-packJunio C Hamano1-4/+5
The fetch-pack/upload-pack protocol relies on the underlying transport (local pipe or TCP socket) to have enough slack to allow one window worth of data in flight without blocking the writer. Traditionally we always relied on being able to have two windows of 32 "have"s in flight (roughly 3k bytes) to stream. The recent "progressive-stride" change allows "fetch-pack" to send up to 1024 "have"s without reading any response from "upload-pack". The outgoing pipe of "upload-pack" can be clogged with many unread ACKs and NAKs while "fetch-pack" is still stuffing its outgoing pipe with more "have"s, leading to a deadlock. Revert the change unless we are in stateless rpc (aka smart-http) mode, as using a large window full of "have"s is still a good way to help reduce the number of back-and-forth exchanges, and there is no buffering issue there (it is strictly "ping-pong" without an overlap). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-29enable "no-done" extension only when fetching over smart-httpJunio C Hamano1-1/+2
When the 'no-done' protocol extension is used, the upload-pack (i.e. the server side) process stops listening to the fetch-pack after issuing the final NAK and starts sending the generated pack data back, but there may be more "have"s sent by the latter still in flight that the fetch-pack expects to be answered with ACK/NAK. This will typically result in a deadlock (both will block on a write that the other end never reads) or a SIGPIPE on the fetch-pack end (upload-pack will finish writing a small pack and go away). Disable it unless fetch-pack is running under smart-http, where there is no such streaming issue. Signed-off-by: Junio C Hamano <gitster@pobox.com> Acked-by: Shawn O. Pearce <spearce@spearce.org>
2011-03-28Revert "fetch-pack: Implement no-done capability"Junio C Hamano1-15/+3
This reverts commit 761ecf0bc7b6cddf311f00877c59e6381cdbdeea.
2011-03-26Merge branch 'jc/fetch-progressive-stride'Junio C Hamano1-3/+18
* jc/fetch-progressive-stride: fetch-pack: use smaller handshake window for initial request fetch-pack: progressively use larger handshake windows fetch-pack: factor out hardcoded handshake window size Conflicts: builtin/fetch-pack.c
2011-03-22Merge branch 'sp/maint-fetch-pack-stop-early'Junio C Hamano1-2/+16
* sp/maint-fetch-pack-stop-early: fetch-pack: Implement no-done capability fetch-pack: Finish negotiation if remote replies "ACK %s ready"
2011-03-22Merge branch 'jc/maint-fetch-alt'Junio C Hamano1-0/+12
* jc/maint-fetch-alt: fetch-pack: objects in our alternates are available to us refs_from_alternate: helper to use refs from alternates Conflicts: builtin/receive-pack.c
2011-03-22Fix sparse warningsStephen Boyd1-1/+1
Fix warnings from 'make check'. - These files don't include 'builtin.h' causing sparse to complain that cmd_* isn't declared: builtin/clone.c:364, builtin/fetch-pack.c:797, builtin/fmt-merge-msg.c:34, builtin/hash-object.c:78, builtin/merge-index.c:69, builtin/merge-recursive.c:22 builtin/merge-tree.c:341, builtin/mktag.c:156, builtin/notes.c:426 builtin/notes.c:822, builtin/pack-redundant.c:596, builtin/pack-refs.c:10, builtin/patch-id.c:60, builtin/patch-id.c:149, builtin/remote.c:1512, builtin/remote-ext.c:240, builtin/remote-fd.c:53, builtin/reset.c:236, builtin/send-pack.c:384, builtin/unpack-file.c:25, builtin/var.c:75 - These files have symbols which should be marked static since they're only file scope: submodule.c:12, diff.c:631, replace_object.c:92, submodule.c:13, submodule.c:14, trace.c:78, transport.c:195, transport-helper.c:79, unpack-trees.c:19, url.c:3, url.c:18, url.c:104, url.c:117, url.c:123, url.c:129, url.c:136, thread-utils.c:21, thread-utils.c:48 - These files redeclare symbols to be different types: builtin/index-pack.c:210, parse-options.c:564, parse-options.c:571, usage.c:49, usage.c:58, usage.c:63, usage.c:72 - These files use a literal integer 0 when they really should use a NULL pointer: daemon.c:663, fast-import.c:2942, imap-send.c:1072, notes-merge.c:362 While we're in the area, clean up some unused #includes in builtin files (mostly exec_cmd.h). Signed-off-by: Stephen Boyd <bebarino@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-20fetch-pack: use smaller handshake window for initial requestJunio C Hamano1-2/+4
Start the initial request small by halving the INITIAL_FLUSH (we will try to stay one window ahead of the server, so we would end up giving twice as many "have" in flight at the very beginning). We may want to tweak these values even more, taking MTU into account. Signed-off-by: Junio C Hamano <gitster@pobox.com> Acked-by: Shawn Pearce <spearce@spearce.org>
2011-03-20fetch-pack: progressively use larger handshake windowsJunio C Hamano1-1/+6
The client has to dig the history deeper when more recent parts of its history do not have any overlap with the server it is fetching from. Make the handshake window exponentially larger as we dig deeper, with a reasonable upper cap. Signed-off-by: Junio C Hamano <gitster@pobox.com> Acked-by: Shawn Pearce <spearce@spearce.org>
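A minimal sketch of such a growth schedule (the constants are assumptions for illustration, not necessarily the values fetch-pack uses):

    #define INITIAL_WINDOW 16
    #define MAX_WINDOW     1024

    static int next_window(int current)
    {
        if (current < INITIAL_WINDOW)
            return INITIAL_WINDOW;
        if (current < MAX_WINDOW)
            return current * 2; /* dig deeper, ask about more commits per round */
        return MAX_WINDOW;      /* the "reasonable upper cap" */
    }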
2011-03-20fetch-pack: factor out hardcoded handshake window sizeJunio C Hamano1-3/+11
The "git fetch" client presents the most recent 32 commits it has to the server and gives a chance to the server to say "ok, we heard enough", and continues reporting what it has in chunks of 32 commits, digging its history down to older commits. Move the hardcoded size of the handshake window outside the code, so that we can tweak it more easily. Signed-off-by: Junio C Hamano <gitster@pobox.com> Acked-by: Shawn Pearce <spearce@spearce.org>
2011-03-17fetch-pack: objects in our alternates are available to usJunio C Hamano1-0/+12
Use the helper function split from the receiving end of "git push" to allow the same optimization on the receiving end of "git fetch". Signed-off-by: Junio C Hamano <gitster@pobox.com> Acked-by: Shawn O. Pearce <spearce@spearce.org>
2011-03-15fetch-pack: Implement no-done capabilityShawn O. Pearce1-3/+15
If enabled on the connection, "multi_ack_detailed no-done" as a pair allows the remote upload-pack process to send a PACK down to the client as soon as an "ACK %s ready" message has been sent. Over git:// and ssh://, where a bi-directional stream is in place, this makes very little difference compared with the classical version that waits for the client to send a "done\n" line by itself. It does slightly reduce the latency involved in starting the pack stream, as one less round-trip from client to server is required. Over smart HTTP this avoids needing to send a final RPC that has all of the prior common objects. Instead the server is able to return a pack as soon as it is ready to. For many common users the smart HTTP fetch is now just 2 requests: a GET of .../info/refs, and a POST to .../git-upload-pack that not only negotiates but also receives the pack stream. Only users who have more than 32 local commits unshared with the remote will need additional requests to negotiate a common merge base. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-14fetch-pack: Finish negotiation if remote replies "ACK %s ready"Shawn O. Pearce1-0/+2
If multi_ack_detailed was selected in the protocol capabilities (both client and server are >= Git 1.6.6), the upload-pack side will send "ACK %s ready" when it knows how to safely cut the graph and produce a reasonable pack for the want list that was already sent on the connection. Upon receiving "ACK %s ready" there is no point in looking at the remaining commits inside of rev_list. Sending additional "have %s" lines to the remote will not construct a smaller pack. It is unlikely a commit older than the current cut point will have a better delta base than the cut point itself has. The original design of this code had fetch-pack empty rev_list by marking a commit and its transitive ancestors COMMON whenever the remote side said "ACK %s {continue,common}" and skipping over any already COMMON commits during get_rev(). This approach does not work when most of rev_list is actually COMMON_REF, commits that are pointed to by a reference on the remote, which exist locally, and which have not yet been sent to the remote as a "have %s" line. Most of the common references are tags in the refs/tags namespace, using points in the commit graph that are more than 1 commit apart. In git.git itself, this is currently 340 tags, 339 of which point to commits in the commit graph. fetch-pack pushes all of these into rev_list, but is unable to mark them COMMON and discard them during a remote's "ACK %s {continue,common}" because it does not parse through the entire parent chain. Not parsing the entire parent chain is an optimization to avoid walking back to the roots of the repository. Assuming the client is only following the remote (and does not make its own local commits), the client needs 11 rounds to spin through the entire list of tags (32 commits per round, ceil(339/32) == 11). Unfortunately the server knows on the first "have %s" line that it can produce a good pack, and does not need to see the remaining 320 tags in the other 10 rounds. Over git:// and ssh:// this isn't as bad as it sounds: the client is only transmitting an extra 16,000 bytes that it doesn't need to send. Over smart HTTP, the client must do an additional 10 HTTP POST requests, each of which incurs round-trip latency, and must upload the entire state vector of all known common objects. On the final POST request, this is 16 KiB worth of data. Fix all of this by clearing rev_list as soon as the remote side says it can construct a pack. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
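For illustration, a small sketch of spotting the "ACK %s ready" line during negotiation (simplified; the real loop parses the ACK status more carefully):

    #include <string.h>

    /* Returns 1 if the server says it can already build a good pack. */
    static int ack_says_ready(const char *line)
    {
        /* line looks like: "ACK <40-hex-object-name> ready" */
        if (strncmp(line, "ACK ", 4))
            return 0;
        return strstr(line, " ready") != NULL;
    }

Once this returns true there is no point sending further "have" lines, so the remaining rev_list entries can simply be dropped.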
2011-03-08add packet tracing debug codeJeff King1-0/+2
This shows a trace of all packets coming in or out of a given program. This can help with debugging object negotiation or other protocol issues. To keep the code changes simple, we operate at the lowest level, meaning we don't necessarily understand what's in the packets. The one exception is a packet starting with "PACK", which causes us to skip that packet and turn off tracing (since the gigantic pack data will not be interesting to read, at least not in the trace format). We show both written and read packets. In the local case, this may mean you will see packets twice (written by the sender and read by the receiver). However, for cases where the other end is remote, this allows you to see the full conversation. Packet tracing can be enabled with GIT_TRACE_PACKET=<foo>, where <foo> takes the same arguments as GIT_TRACE. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-29commit: Add commit_list prefix in two function names.Thiago Farina1-2/+2
Add the commit_list prefix to the insert_by_date and sort_by_date functions, so it's clear that these functions refer to the commit_list structure. Signed-off-by: Thiago Farina <tfransosi@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-22Move 'builtin-*' into a 'builtin/' subdirectoryLinus Torvalds1-0/+976
This shrinks the top-level directory a bit, and makes it much more pleasant to use auto-completion on the thing. Instead of [torvalds@nehalem git]$ em buil<tab> Display all 180 possibilities? (y or n) [torvalds@nehalem git]$ em builtin-sh builtin-shortlog.c builtin-show-branch.c builtin-show-ref.c builtin-shortlog.o builtin-show-branch.o builtin-show-ref.o [torvalds@nehalem git]$ em builtin-shor<tab> builtin-shortlog.c builtin-shortlog.o [torvalds@nehalem git]$ em builtin-shortlog.c you get [torvalds@nehalem git]$ em buil<tab> [type] builtin/ builtin.h [torvalds@nehalem git]$ em builtin [auto-completes to] [torvalds@nehalem git]$ em builtin/sh<tab> [type] shortlog.c shortlog.o show-branch.c show-branch.o show-ref.c show-ref.o [torvalds@nehalem git]$ em builtin/sho [auto-completes to] [torvalds@nehalem git]$ em builtin/shor<tab> [type] shortlog.c shortlog.o [torvalds@nehalem git]$ em builtin/shortlog.c which doesn't seem all that different, but not having that annoying break in "Display all 180 possibilities?" is quite a relief. NOTE! If you do this in a clean tree (no object files etc), or using an editor that has auto-completion rules that ignores '*.o' files, you won't see that annoying 'Display all 180 possibilities?' message - it will just show the choices instead. I think bash has some cut-off around 100 choices or something. So the reason I see this is that I'm using an odd editory, and thus don't have the rules to cut down on auto-completion. But you can simulate that by using 'ls' instead, or something similar. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>