path: root/builtin/index-pack.c
Age | Commit message | Author | Files | Lines
2020-04-02promisor-remote: accept 0 as oid_nr in functionJonathan Tan1-3/+2
There are 3 callers to promisor_remote_get_direct() that first check if the number of objects to be fetched is equal to 0. Fold that check into promisor_remote_get_direct(), and in doing so, be explicit as to what promisor_remote_get_direct() does if oid_nr is 0 (it returns 0, success, immediately). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
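A minimal sketch of the folded-in guard, assuming the promisor_remote_get_direct() signature implied by its callers; fetch_from_promisors() is a hypothetical stand-in for the real body:

    int promisor_remote_get_direct(struct repository *repo,
                                   const struct object_id *oids,
                                   int oid_nr)
    {
            if (!oid_nr)
                    return 0;       /* nothing to fetch: report success immediately */
            /* fetch_from_promisors() is hypothetical; the real body issues the fetch */
            return fetch_from_promisors(repo, oids, oid_nr);
    }

Callers can then drop their own "is oid_nr zero?" checks without changing behavior.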
2020-02-14Merge branch 'jk/index-pack-dupfix'Junio C Hamano1-1/+3
The index-pack code now diagnoses a bad input packstream that records the same object twice when it is used as delta base; the code used to declare a software bug when encountering such an input, but it is an input error. * jk/index-pack-dupfix: index-pack: downgrade twice-resolved REF_DELTA to die()
2020-02-04index-pack: downgrade twice-resolved REF_DELTA to die()Jeff King1-1/+3
When we're resolving a REF_DELTA, we compare-and-swap its type from REF_DELTA to whatever real type the base object has, as discussed in ab791dd138 (index-pack: fix race condition with duplicate bases, 2014-08-29). If the old type wasn't a REF_DELTA, we consider that a BUG(). But as discussed in that commit, we might see this case whenever we try to resolve an object twice, which may happen because we have multiple copies of the base object. So this isn't a bug at all, but rather a sign that the input pack is broken. And indeed, this case is triggered already in t5309.5 and t5309.6, which create packs with delta cycles and duplicate bases. But we never noticed because those tests are marked expect_failure. Those tests were added by b2ef3d9ebb (test index-pack on packs with recoverable delta cycles, 2013-08-23), which was leaving the door open for cases that we theoretically _could_ handle. And when we see an already-resolved object like this, in theory we could keep going after confirming that the previously resolved child->real_type matches base->obj->real_type. But: - enforcing the "only resolve once" rule here saves us from an infinite loop in other parts of the code. If we keep going, then the delta cycle in t5309.5 causes us to loop infinitely, as find_ref_delta_children() doesn't realize which objects have already been resolved. So there would be more changes needed to make this case work, and in the meantime we'd be worse off. - any pack that triggers this is broken anyway. It either has a duplicate base object, or it has a cycle which causes us to bring in a duplicate via --fix-thin. In either case, we'd end up rejecting the pack in write_idx_file(), which also detects duplicates. So the tests have little value in documenting what we _could_ be doing (and have been neglected for 6+ years). Let's switch them to confirming that we handle this case cleanly (and switch out the BUG() for a more informative die() so that we do so). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
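A hedged sketch of the downgrade described above; the helper and field names approximate builtin/index-pack.c and should not be read as the exact patch:

    if (!compare_and_swap_type(&child->real_type, OBJ_REF_DELTA,
                               base->obj->real_type))
            die(_("REF_DELTA at offset %"PRIuMAX" already resolved (duplicate base %s?)"),
                (uintmax_t)child->idx.offset,
                oid_to_hex(&base->obj->idx.oid));

The message points the user at the broken input pack instead of asking them to report an internal Git bug.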
2020-01-31sha1-file: allow check_object_signature() to handle any repoMatheus Tavares1-2/+3
Some callers of check_object_signature() can work on arbitrary repositories, but the repo does not get passed to this function. Instead, the_repository is always used internally. To fix possible inconsistencies, allow the function to receive a struct repository and make those callers pass on the repo being handled. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31sha1-file: pass git_hash_algo to hash_object_file()Matheus Tavares1-1/+1
Allow hash_object_file() to work on arbitrary repos by introducing a git_hash_algo parameter. Change callers which have a struct repository pointer in their scope to pass on the git_hash_algo from the said repo. For all other callers, pass on the_hash_algo, which was already being used internally at hash_object_file(). This functionality will be used in the following patch to make check_object_signature() be able to work on arbitrary repos (which, in turn, will be used to fix an inconsistency at object.c:parse_object()). Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31streaming: allow open_istream() to handle any repoMatheus Tavares1-1/+2
Some callers of open_istream() at archive-tar.c and archive-zip.c are capable of working on arbitrary repositories but the repo struct is not passed down to open_istream(), which uses the_repository internally. For now, that's not a problem since the said callers are only being called with the_repository. But to be consistent and avoid future problems, let's allow open_istream() to receive a struct repository and use that instead of the_repository. This parameter addition will also be used in a future patch to make sha1-file.c:check_object_signature() be able to work on arbitrary repos. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-11Merge branch 'bc/object-id-part17'Junio C Hamano1-2/+2
Preparation for SHA-256 upgrade continues. * bc/object-id-part17: (26 commits) midx: switch to using the_hash_algo builtin/show-index: replace sha1_to_hex rerere: replace sha1_to_hex builtin/receive-pack: replace sha1_to_hex builtin/index-pack: replace sha1_to_hex packfile: replace sha1_to_hex wt-status: convert struct wt_status to object_id cache: remove null_sha1 builtin/worktree: switch null_sha1 to null_oid builtin/repack: write object IDs of the proper length pack-write: use hash_to_hex when writing checksums sequencer: convert to use the_hash_algo bisect: switch to using the_hash_algo sha1-lookup: switch hard-coded constants to the_hash_algo config: use the_hash_algo in abbrev comparison combine-diff: replace GIT_SHA1_HEXSZ with the_hash_algo bundle: switch to use the_hash_algo connected: switch GIT_SHA1_HEXSZ to the_hash_algo show-index: switch hard-coded constants to the_hash_algo blame: remove needless comparison with GIT_SHA1_HEXSZ ...
2019-08-19builtin/index-pack: replace sha1_to_hexbrian m. carlson1-2/+2
Since sha1_to_hex is limited to SHA-1, replace it with hash_to_hex so this code works with other algorithms. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-25Use promisor_remote_get_direct() and has_promisor_remote()Christian Couder1-4/+4
Instead of using the repository_format_partial_clone global and fetch_objects() directly, let's use has_promisor_remote() and promisor_remote_get_direct(). This way all the configured promisor remotes will be taken into account, not only the one specified by extensions.partialClone. Also, when cloning or fetching using a partial clone filter, remote.origin.promisor will be set to "true" instead of setting extensions.partialClone to "origin". This makes it possible to use many promisor remotes just by fetching from them. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-15index-pack: prefetch missing REF_DELTA basesJonathan Tan1-2/+24
When fetching, the client sends "have" commit IDs indicating that the server does not need to send any object referenced by those commits, reducing network I/O. When the client is a partial clone, the client still sends "have"s in this way, even if it does not have every object referenced by a commit it sent as "have". If a server omits such an object, it is fine: the client could lazily fetch that object before this fetch, and it can still do so after. The issue is when the server sends a thin pack containing an object that is a REF_DELTA against such a missing object: index-pack fails to fix the thin pack. When support for lazily fetching missing objects was added in 8b4c0103a9 ("sha1_file: support lazily fetching missing objects", 2017-12-08), support in index-pack was turned off in the belief that it accesses the repo only to do hash collision checks. However, this is not true: it also needs to access the repo to resolve REF_DELTA bases. Support for lazy fetching should still generally be turned off in index-pack because it is used as part of the lazy fetching process itself (if not, infinite loops may occur), but we do need to fetch the REF_DELTA bases. (When fetching REF_DELTA bases, it is unlikely that those are REF_DELTA themselves, because we do not send "have" when making such fetches.) To resolve this, prefetch all missing REF_DELTA bases before attempting to resolve them. This both ensures that all bases are attempted to be fetched, and ensures that we make only one request per index-pack invocation, and not one request per missing object. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
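A hedged sketch of the prefetch pass; oid_array_*, has_object_file() and promisor_remote_get_direct() are real APIs, while the loop over ref_deltas[] is only an approximation of index-pack's bookkeeping (the original change predates the promisor-remote API and called the earlier fetch helper):

    struct oid_array to_fetch = OID_ARRAY_INIT;
    int i;

    for (i = 0; i < nr_ref_deltas; i++) {
            if (has_object_file(&ref_deltas[i].oid))
                    continue;       /* base already present locally */
            oid_array_append(&to_fetch, &ref_deltas[i].oid);
    }
    if (to_fetch.nr)
            promisor_remote_get_direct(the_repository,
                                       to_fetch.oid, to_fetch.nr);
    oid_array_clear(&to_fetch);

Batching the request keeps it to one fetch per index-pack invocation rather than one per missing base.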
2019-04-01index-pack: show progress while checking objectsSZEDER Gábor1-1/+9
When 'git index-pack' is run by 'git clone', its check_objects() function usually doesn't take long enough to be a concern, but I just ran into a situation where it took about a minute or so: I inadvertently put some memory pressure on my tiny laptop while cloning linux.git, and then there was quite a long silence between the "Resolving deltas" and "Checking connectivity" progress bars. Show a progress bar during the loop of check_objects() to let the user know that something is still going on. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
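A hedged sketch using Git's progress API; the title string and counter are illustrative:

    struct progress *progress = start_progress(_("Checking objects"), nr_objects);
    int i;

    for (i = 0; i < nr_objects; i++) {
            /* ... existing per-object sanity checks ... */
            display_progress(progress, i + 1);
    }
    stop_progress(&progress);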
2019-01-08convert has_sha1_file() callers to has_object_file()Jeff King1-1/+1
The only remaining callers of has_sha1_file() actually have an object_id already. They can use the "object" variant, rather than dereferencing the hash themselves. The code changes here were completely generated by the included coccinelle patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-19Merge branch 'tb/print-size-t-with-uintmax-format'Junio C Hamano1-4/+5
Code preparation to replace ulong vars with size_t vars where appropriate. * tb/print-size-t-with-uintmax-format: Upcast size_t variables to uintmax_t when printing
2018-11-12Upcast size_t variables to uintmax_t when printingTorsten Bögershausen1-4/+5
When printing variables which contain a size, today "unsigned long" is used in many places. In order to be able to change the type from "unsigned long" into size_t some day in the future, we need to have a way to print 64-bit variables on a system that has "unsigned long" defined to be 32 bits, like Win64. Upcast all those variables into uintmax_t before they are printed. This is to prepare for a bigger change, when "unsigned long" will be converted into size_t for variables which may be > 4 GiB. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
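The casting pattern in isolation, as a minimal self-contained sketch:

    #include <inttypes.h>
    #include <stdio.h>

    /* "size" may become size_t later; upcasting keeps the format portable. */
    static void report_size(unsigned long size)
    {
            printf("object size: %" PRIuMAX "\n", (uintmax_t)size);
    }

Once such variables become size_t, a plain "%lu" would be wrong on LLP64 platforms such as Win64; the uintmax_t cast stays correct for any width.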
2018-11-05index-pack: remove #ifdef NO_PTHREADSNguyễn Thái Ngọc Duy1-49/+14
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29convert "hashcmp() != 0" to "!hasheq()"Jeff King1-2/+2
This rounds out the previous three patches, covering the inequality logic for the "hash" variant of the functions. As with the previous three, the accompanying code changes are the mechanical result of applying the coccinelle patch; see those patches for more discussion. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29convert "oidcmp() == 0" to oideq()Jeff King1-2/+2
Using the more restrictive oideq() should, in the long run, give the compiler more opportunities to optimize these callsites. For now, this conversion should be a complete noop with respect to the generated code. The result is also perhaps a little more readable, as it avoids the "zero is equal" idiom. Since it's so prevalent in C, I think seasoned programmers tend not to even notice it anymore, but it can sometimes make for awkward double negations (e.g., we can drop a few !!oidcmp() instances here). This patch was generated almost entirely by the included coccinelle patch. This mechanical conversion should be completely safe, because we check explicitly for cases where oidcmp() is compared to 0, which is what oideq() is doing under the hood. Note that we don't have to catch "!oidcmp()" separately; coccinelle's standard isomorphisms make sure the two are treated equivalently. I say "almost" because I did hand-edit the coccinelle output to fix up a few style violations (it mostly keeps the original formatting, but sometimes unwraps long lines). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
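A before/after illustration of the conversion; oidcmp() and oideq() are real helpers, while mark_duplicate() is a hypothetical caller:

    /* Before: the "zero means equal" idiom. */
    if (!oidcmp(&a->oid, &b->oid))
            mark_duplicate(a, b);

    /* After: the intent is explicit, and the compiler can optimize the
     * equality check more aggressively. */
    if (oideq(&a->oid, &b->oid))
            mark_duplicate(a, b);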
2018-08-15Merge branch 'jk/core-use-replace-refs'Junio C Hamano1-1/+1
A new configuration variable core.usereplacerefs has been added, primarily to help server installations that want to ignore the replace mechanism altogether. * jk/core-use-replace-refs: add core.usereplacerefs config option check_replace_refs: rename to read_replace_refs check_replace_refs: fix outdated comment
2018-07-18check_replace_refs: rename to read_replace_refsJeff King1-1/+1
This was added as a NEEDSWORK in c3c36d7de2 (replace-object: check_replace_refs is safe in multi repo environment, 2018-04-11), waiting for a calmer period. Since doing so now doesn't conflict with anything in 'pu', it seems as good a time as any. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29blob: add repository argument to lookup_blobStefan Beller1-1/+1
Add a repository argument to allow the callers of lookup_blob to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
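A hedged approximation of the compile-time guard mentioned above (the exact macro shape in the series may differ slightly):

    /* Declared in blob.h during the transition. */
    struct blob *lookup_blob_the_repository(const struct object_id *oid);
    #define lookup_blob(r, oid) lookup_blob_##r(oid)

    /* Compiles: the token-pasted name resolves to the declaration above. */
    struct blob *b = lookup_blob(the_repository, &oid);
    /* Does not compile: lookup_blob_some_other_repo() does not exist. */

The token-pasting means only callers that literally pass the_repository build, until the implementation really learns to handle arbitrary repositories.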
2018-06-29object: add repository argument to parse_object_bufferStefan Beller1-1/+2
Add a repository argument to allow the callers of parse_object_buffer to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-13Merge branch 'jk/index-pack-maint'Junio C Hamano1-2/+6
"index-pack --strict" has been taught to make sure that it runs the final object integrity checks after making the freshly indexed packfile available to itself. * jk/index-pack-maint: index-pack: correct install_packed_git() args index-pack: handle --strict checks of non-repo packs prepare_commit_graft: treat non-repository as a noop
2018-06-11index-pack: correct install_packed_git() argsJunio C Hamano1-1/+1
The function does not take the repository object as a parameter before the v2.18 track; make the topic mergeable to the v2.17 maintenance track by dropping it. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-01index-pack: handle --strict checks of non-repo packsJeff King1-2/+6
Commit 73c3f0f704 (index-pack: check .gitmodules files with --strict, 2018-05-04) added a call to add_packed_git(), with the intent that the newly-indexed objects would be available to the process when we run fsck_finish(). But that's not what add_packed_git() does. It only allocates the struct, and you must install_packed_git() on the result. So that call was effectively doing nothing (except leaking a struct). But wait, we passed all of the tests! Does that mean we don't need the call at all? For normal cases, no. When we run "index-pack --stdin" inside a repository, we write the new pack into the object directory. If fsck_finish() needs to access one of the new objects, then our initial lookup will fail to find it, but we'll follow up by running reprepare_packed_git() and looking again. That logic was meant to handle somebody else repacking simultaneously, but it ends up working for us here. But there is a case that does need this, that we were not testing. You can run "git index-pack foo.pack" on any file, even when it is not inside the object directory. Or you may not even be in a repository at all! This case fails without doing the proper install_packed_git() call. We can make this work by adding the install call. Note that we should be prepared to handle add_packed_git() failing. We can just silently ignore this case, though. If fsck_finish() later needs the objects and they're not available, it will complain itself. And if it doesn't (because we were able to resolve the whole fsck in the first pass), then it actually isn't an interesting error at all. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
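A hedged sketch of the fix: add the pack, then install it, and tolerate failure quietly. add_packed_git() and install_packed_git() are real functions, but their exact signatures differed between the maintenance tracks (see the 2018-06-11 entry above), so treat the call as illustrative:

    struct packed_git *p = add_packed_git(final_index_name,
                                          strlen(final_index_name), 0);
    if (p)
            install_packed_git(the_repository, p);
    /*
     * If p is NULL we stay silent: fsck_finish() will complain on its
     * own if it later needs one of the newly indexed objects.
     */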
2018-05-30Merge branch 'bc/object-id'Junio C Hamano1-1/+2
Conversion from uchar[20] to struct object_id continues. * bc/object-id: (42 commits) merge-one-file: compute empty blob object ID add--interactive: compute the empty tree value Update shell scripts to compute empty tree object ID sha1_file: only expose empty object constants through git_hash_algo dir: use the_hash_algo for empty blob object ID sequencer: use the_hash_algo for empty tree object ID cache-tree: use is_empty_tree_oid sha1_file: convert cached object code to struct object_id builtin/reset: convert use of EMPTY_TREE_SHA1_BIN builtin/receive-pack: convert one use of EMPTY_TREE_SHA1_HEX wt-status: convert two uses of EMPTY_TREE_SHA1_HEX submodule: convert several uses of EMPTY_TREE_SHA1_HEX sequencer: convert one use of EMPTY_TREE_SHA1_HEX merge: convert empty tree constant to the_hash_algo builtin/merge: switch tree functions to use object_id builtin/am: convert uses of EMPTY_TREE_SHA1_BIN to the_hash_algo sha1-file: add functions for hex empty tree and blob OIDs builtin/receive-pack: avoid hard-coded constants for push certs diff: specify abbreviation size in terms of the_hash_algo upload-pack: replace use of several hard-coded constants ...
2018-05-30Merge branch 'js/use-bug-macro'Junio C Hamano1-2/+2
Developer support update, by using BUG() macro instead of die() to mark codepaths that should not happen more clearly. * js/use-bug-macro: BUG_exit_code: fix sparse "symbol not declared" warning Convert remaining die*(BUG) messages Replace all die("BUG: ...") calls by BUG() ones run-command: use BUG() to report bugs, not die() test-tool: help verifying BUG() code paths
2018-05-29Sync with Git 2.17.1Junio C Hamano1-1/+11
* maint: (25 commits) Git 2.17.1 Git 2.16.4 Git 2.15.2 Git 2.14.4 Git 2.13.7 fsck: complain when .gitmodules is a symlink index-pack: check .gitmodules files with --strict unpack-objects: call fsck_finish() after fscking objects fsck: call fsck_finish() after fscking objects fsck: check .gitmodules content fsck: handle promisor objects in .gitmodules check fsck: detect gitmodules files fsck: actually fsck blob data fsck: simplify ".git" check index-pack: make fsck error message more specific verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_dotfile: mention case-insensitivity in comment verify_path: drop clever fallthrough skip_prefix: add case-insensitive variant ...
2018-05-23Merge branch 'sb/oid-object-info'Junio C Hamano1-2/+2
The codepath around object-info API has been taught to take the repository object (which in turn tells the API which object store the objects are to be located). * sb/oid-object-info: cache.h: allow oid_object_info to handle arbitrary repositories packfile: add repository argument to cache_or_unpack_entry packfile: add repository argument to unpack_entry packfile: add repository argument to read_object packfile: add repository argument to packed_object_info packfile: add repository argument to packed_to_object_type packfile: add repository argument to retry_bad_packed_offset cache.h: add repository argument to oid_object_info cache.h: add repository argument to oid_object_info_extended
2018-05-21index-pack: check .gitmodules files with --strictJeff King1-0/+10
Now that the internal fsck code has all of the plumbing we need, we can start checking incoming .gitmodules files. Naively, it seems like we would just need to add a call to fsck_finish() after we've processed all of the objects. And that would be enough to cover the initial test included here. But there are two extra bits: 1. We currently don't bother calling fsck_object() at all for blobs, since it has traditionally been a noop. We'd actually catch these blobs in fsck_finish() at the end, but it's more efficient to check them when we already have the object loaded in memory. 2. The second pass done by fsck_finish() needs to access the objects, but we're actually indexing the pack in this process. In theory we could give the fsck code a special callback for accessing the in-pack data, but it's actually quite tricky: a. We don't have an internal efficient index mapping oids to packfile offsets. We only generate it on the fly as part of writing out the .idx file. b. We'd still have to reconstruct deltas, which means we'd basically have to replicate all of the reading logic in packfile.c. Instead, let's avoid running fsck_finish() until after we've written out the .idx file, and then just add it to our internal packed_git list. This does mean that the objects are "in the repository" before we finish our fsck checks. But unpack-objects already exhibits this same behavior, and it's an acceptable tradeoff here for the same reason: the quarantine mechanism means that pushes will be fully protected. In addition to a basic push test in t7415, we add a sneaky pack that reverses the usual object order in the pack, requiring that index-pack access the tree and blob during the "finish" step. This already works for unpack-objects (since it will have written out loose objects), but we'll check it with this sneaky pack for good measure. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21index-pack: make fsck error message more specificJeff King1-1/+1
If fsck reports an error, we say only "Error in object". This isn't quite as bad as it might seem, since the fsck code would have dumped some errors to stderr already. But it might help to give a little more context. The earlier output would not have even mentioned "fsck", and that may be a clue that the "fsck.*" or "*.fsckObjects" config may be relevant. Signed-off-by: Jeff King <peff@peff.net>
2018-05-08Merge branch 'ds/commit-graph'Junio C Hamano1-1/+1
Precompute and store information necessary for ancestry traversal in a separate file to optimize graph walking. * ds/commit-graph: commit-graph: implement "--append" option commit-graph: build graph from starting commits commit-graph: read only from specific pack-indexes commit: integrate commit graph with commit parsing commit-graph: close under reachability commit-graph: add core.commitGraph setting commit-graph: implement git commit-graph read commit-graph: implement git-commit-graph write commit-graph: implement write_commit_graph() commit-graph: create git-commit-graph builtin graph: add commit graph design document commit-graph: add format document csum-file: refactor finalize_hashfile() method csum-file: rename hashclose() to finalize_hashfile()
2018-05-06Replace all die("BUG: ...") calls by BUG() onesJohannes Schindelin1-2/+2
In d8193743e08 (usage.c: add BUG() function, 2017-05-12), a new macro was introduced to use for reporting bugs instead of die(). It was then subsequently used to convert one single caller in 588a538ae55 (setup_git_env: convert die("BUG") to BUG(), 2017-05-12). The cover letter of the patch series containing this patch (cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not terribly clear why only one call site was converted, or what the plan is for other, similar calls to die() to report bugs. Let's just convert all remaining ones in one fell swoop. This trick was performed by this invocation: sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c) Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02index-pack: abstract away hash function constantbrian m. carlson1-1/+2
The code for reading certain pack v2 offsets had a hard-coded 5 representing the number of uint32_t words that we needed to skip over. Specify this value in terms of a value from the_hash_algo. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
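A minimal hedged sketch of the abstraction: derive the word count from the hash algorithm in use instead of hard-coding 5 (a 20-byte SHA-1 measured in uint32_t words):

    /* 5 for SHA-1 (20 / 4), 8 for SHA-256 (32 / 4). */
    const unsigned hash_words = the_hash_algo->rawsz / sizeof(uint32_t);

the_hash_algo->rawsz is the raw digest size in bytes, so the same expression keeps working when the hash function changes.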
2018-04-26cache.h: add repository argument to oid_object_infoStefan Beller1-2/+2
Add a repository argument to allow the callers of oid_object_info to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11sha1_file.c: rename to use dash in file nameStefan Beller1-1/+1
This is more consistent with the project style. The majority of Git's source files use dashes in preference to underscores in their file names. Signed-off-by: Stefan Beller <sbeller@google.com>
2018-04-11exec_cmd: rename to use dash in file nameStefan Beller1-1/+1
This is more consistent with the project style. The majority of Git's source files use dashes in preference to underscores in their file names. Signed-off-by: Stefan Beller <sbeller@google.com>
2018-04-11Merge branch 'sb/object-store'Junio C Hamano1-0/+1
Refactoring the internal global data structure to make it possible to open multiple repositories, work with and then close them. Rerolled by Duy on top of a separate preliminary clean-up topic. The resulting structure of the topics looked very sensible. * sb/object-store: (27 commits) sha1_file: allow sha1_loose_object_info to handle arbitrary repositories sha1_file: allow map_sha1_file to handle arbitrary repositories sha1_file: allow map_sha1_file_1 to handle arbitrary repositories sha1_file: allow open_sha1_file to handle arbitrary repositories sha1_file: allow stat_sha1_file to handle arbitrary repositories sha1_file: allow sha1_file_name to handle arbitrary repositories sha1_file: add repository argument to sha1_loose_object_info sha1_file: add repository argument to map_sha1_file sha1_file: add repository argument to map_sha1_file_1 sha1_file: add repository argument to open_sha1_file sha1_file: add repository argument to stat_sha1_file sha1_file: add repository argument to sha1_file_name sha1_file: allow prepare_alt_odb to handle arbitrary repositories sha1_file: allow link_alt_odb_entries to handle arbitrary repositories sha1_file: add repository argument to prepare_alt_odb sha1_file: add repository argument to link_alt_odb_entries sha1_file: add repository argument to read_info_alternates sha1_file: add repository argument to link_alt_odb_entry sha1_file: add raw_object_store argument to alt_odb_usable pack: move approximate object count to object store ...
2018-04-10Merge branch 'bc/object-id'Junio C Hamano1-21/+22
Conversion from uchar[20] to struct object_id continues. * bc/object-id: (36 commits) convert: convert to struct object_id sha1_file: introduce a constant for max header length Convert lookup_replace_object to struct object_id sha1_file: convert read_sha1_file to struct object_id sha1_file: convert read_object_with_reference to object_id tree-walk: convert tree entry functions to object_id streaming: convert istream internals to struct object_id tree-walk: convert get_tree_entry_follow_symlinks internals to object_id builtin/notes: convert static functions to object_id builtin/fmt-merge-msg: convert remaining code to object_id sha1_file: convert sha1_object_info* to object_id Convert remaining callers of sha1_object_info_extended to object_id packfile: convert unpack_entry to struct object_id sha1_file: convert retry_bad_packed_offset to struct object_id sha1_file: convert assert_sha1_type to object_id builtin/mktree: convert to struct object_id streaming: convert open_istream to use struct object_id sha1_file: convert check_sha1_signature to struct object_id sha1_file: convert read_loose_object to use struct object_id builtin/index-pack: convert struct ref_delta_entry to object_id ...
2018-04-02csum-file: rename hashclose() to finalize_hashfile()Derrick Stolee1-1/+1
The hashclose() method behaves very differently depending on the flags parameter. In particular, the file descriptor is not always closed. Perform a simple rename of "hashclose()" to "finalize_hashfile()" in preparation for functional changes. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26object-store: move packed_git and packed_git_mru to object storeStefan Beller1-0/+1
In a process with multiple repositories open, packfile accessors should be associated with a single repository and not shared globally. Move packed_git and packed_git_mru into the_repository and adjust callers to reflect this. [nd: while we're at it, wrap access to these two fields in get_packed_git() and get_packed_git_mru(). This allows us to lazily initialize these fields without the caller doing that explicitly] Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15index-pack: support checking objects but not linksJonathan Tan1-2/+4
The index-pack command currently supports the --check-self-contained-and-connected argument, for internal use only, that instructs it to only check for broken links and not broken objects. For partial clones, we need the inverse, so add a --fsck-objects argument that checks for broken objects and not broken links, also for internal use only. This will be used by fetch-pack in a subsequent patch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14Merge branch 'nd/object-allocation-comments'Junio C Hamano1-0/+1
Code doc update. * nd/object-allocation-comments: object.h: realign object flag allocation comment object.h: update flag allocation comment
2018-03-14sha1_file: convert read_sha1_file to struct object_idbrian m. carlson1-2/+3
Convert read_sha1_file to take a pointer to struct object_id and rename it read_object_file. Do the same for read_sha1_file_extended. Convert one use in grep.c to use the new function without any other code change, since the pointer being passed is a void pointer that is already initialized with a pointer to struct object_id. Update the declaration and definitions of the modified functions, and apply the following semantic patch to convert the remaining callers: @@ expression E1, E2, E3; @@ - read_sha1_file(E1.hash, E2, E3) + read_object_file(&E1, E2, E3) @@ expression E1, E2, E3; @@ - read_sha1_file(E1->hash, E2, E3) + read_object_file(E1, E2, E3) @@ expression E1, E2, E3, E4; @@ - read_sha1_file_extended(E1.hash, E2, E3, E4) + read_object_file_extended(&E1, E2, E3, E4) @@ expression E1, E2, E3, E4; @@ - read_sha1_file_extended(E1->hash, E2, E3, E4) + read_object_file_extended(E1, E2, E3, E4) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14sha1_file: convert sha1_object_info* to object_idbrian m. carlson1-2/+2
Convert sha1_object_info and sha1_object_info_extended to take pointers to struct object_id and rename them to use "oid" instead of "sha1" in their names. Update the declaration and definition and apply the following semantic patch, plus the standard object_id transforms: @@ expression E1, E2; @@ - sha1_object_info(E1.hash, E2) + oid_object_info(&E1, E2) @@ expression E1, E2; @@ - sha1_object_info(E1->hash, E2) + oid_object_info(E1, E2) @@ expression E1, E2, E3; @@ - sha1_object_info_extended(E1.hash, E2, E3) + oid_object_info_extended(&E1, E2, E3) @@ expression E1, E2, E3; @@ - sha1_object_info_extended(E1->hash, E2, E3) + oid_object_info_extended(E1, E2, E3) Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14streaming: convert open_istream to use struct object_idbrian m. carlson1-1/+1
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14sha1_file: convert check_sha1_signature to struct object_idbrian m. carlson1-1/+1
Convert this function to take a pointer to struct object_id and rename it check_object_signature. Introduce temporaries to convert the return values of lookup_replace_object and lookup_replace_object_extended into struct object_id. The temporaries are needed because in order to convert lookup_replace_object, open_istream needs to be converted, and open_istream needs check_sha1_signature to be converted, causing a loop of dependencies. The temporaries will be removed in a future patch. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14builtin/index-pack: convert struct ref_delta_entry to object_idbrian m. carlson1-17/+17
Convert this struct to use a member of type object_id. Convert various static functions as well. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-06Merge branch 'bw/c-plus-plus'Junio C Hamano1-6/+6
Avoid using identifiers that clash with C++ keywords. Even though it is not a goal to compile Git with C++ compilers, changes like this help use of code analysis tools that targets C++ on our codebase. * bw/c-plus-plus: (37 commits) replace: rename 'new' variables trailer: rename 'template' variables tempfile: rename 'template' variables wrapper: rename 'template' variables environment: rename 'namespace' variables diff: rename 'template' variables environment: rename 'template' variables init-db: rename 'template' variables unpack-trees: rename 'new' variables trailer: rename 'new' variables submodule: rename 'new' variables split-index: rename 'new' variables remote: rename 'new' variables ref-filter: rename 'new' variables read-cache: rename 'new' variables line-log: rename 'new' variables imap-send: rename 'new' variables http: rename 'new' variables entry: rename 'new' variables diffcore-delta: rename 'new' variables ...
2018-03-06object.h: update flag allocation commentNguyễn Thái Ngọc Duy1-0/+1
Since the "flags" is shared, it's a good idea to keep track of who uses what bit. When we need to use more flags in library code, we can be sure it won't be re-used for another purpose by some caller. While at there, fix the location of "5" (should be in a different column than "4" two lines down) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-15Merge branch 'bc/hash-algo'Junio C Hamano1-57/+57
More abstraction of hash function from the codepath. * bc/hash-algo: hash: update obsolete reference to SHA1_HEADER bulk-checkin: abstract SHA-1 usage csum-file: abstract uses of SHA-1 csum-file: rename sha1file to hashfile read-cache: abstract away uses of SHA-1 pack-write: switch various SHA-1 values to abstract forms pack-check: convert various uses of SHA-1 to abstract forms fast-import: switch various uses of SHA-1 to the_hash_algo sha1_file: switch uses of SHA-1 to the_hash_algo builtin/unpack-objects: switch uses of SHA-1 to the_hash_algo builtin/index-pack: improve hash function abstraction hash: create union for hash context allocation hash: move SHA-1 macros to hash.h
2018-02-15Merge branch 'po/object-id'Junio C Hamano1-3/+2
Conversion from uchar[20] to struct object_id continues. * po/object-id: sha1_file: rename hash_sha1_file_literally sha1_file: convert write_loose_object to object_id sha1_file: convert force_object_loose to object_id sha1_file: convert write_sha1_file to object_id notes: convert write_notes_tree to object_id notes: convert combine_notes_* to object_id commit: convert commit_tree* to object_id match-trees: convert splice_tree to object_id cache: clear whole hash buffer with oidclr sha1_file: convert hash_sha1_file to object_id dir: convert struct sha1_stat to use object_id sha1_file: convert pretend_sha1_file to object_id
2018-02-14object: rename function 'typename' to 'type_name'Brandon Williams1-6/+6
Rename C++ keyword in order to bring the codebase closer to being able to be compiled with a C++ compiler. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-13Merge branch 'jh/fsck-promisors'Junio C Hamano1-45/+66
In preparation for implementing narrow/partial clone, the machinery for checking object connectivity used by gc and fsck has been taught that a missing object is OK when it is referenced by a packfile specially marked as coming from trusted repository that promises to make them available on-demand and lazily. * jh/fsck-promisors: gc: do not repack promisor packfiles rev-list: support termination at promisor objects sha1_file: support lazily fetching missing objects introduce fetch-object: fetch one promisor object index-pack: refactor writing of .keep files fsck: support promisor objects as CLI argument fsck: support referenced promisor objects fsck: support refs pointing to promisor objects fsck: introduce partialclone extension extension.partialclone: introduce partial clone extension
2018-02-02csum-file: rename sha1file to hashfilebrian m. carlson1-10/+10
Rename struct sha1file to struct hashfile, along with all of its related functions. The transformation in this commit was made by global search-and-replace. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02builtin/index-pack: improve hash function abstractionbrian m. carlson1-45/+45
Convert several uses of unsigned char [20] to struct object_id and convert various hard-coded constants and uses of SHA-1 functions to use the_hash_algo. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30sha1_file: convert hash_sha1_file to object_idPatryk Obara1-3/+2
Convert the declaration and definition of hash_sha1_file to use struct object_id and adjust all function calls. Rename this function to hash_object_file. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-11index-pack: use skip_to_optional_arg()Christian Couder1-8/+3
Let's simplify index-pack option parsing using skip_to_optional_arg(). Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08sha1_file: support lazily fetching missing objectsJonathan Tan1-0/+6
Teach sha1_file to fetch objects from the remote configured in extensions.partialclone whenever an object is requested but missing. The fetching of objects can be suppressed through a global variable. This is used by fsck and index-pack. However, by default, such fetching is not suppressed. This is meant as a temporary measure to ensure that all Git commands work in such a situation. Future patches will update some commands to either tolerate missing objects (without fetching them) or be more efficient in fetching them. In order to determine the code changes in sha1_file.c necessary, I investigated the following: (1) functions in sha1_file.c that take in a hash, without the user regarding how the object is stored (loose or packed) (2) functions in packfile.c (because I need to check callers that know about the loose/packed distinction and operate on both differently, and ensure that they can handle the concept of objects that are neither loose nor packed) (1) is handled by the modification to sha1_object_info_extended(). For (2), I looked at for_each_packed_object and others. For for_each_packed_object, the callers either already work or are fixed in this patch: - reachable - only to find recent objects - builtin/fsck - already knows about missing objects - builtin/cat-file - warning message added in this commit Callers of the other functions do not need to be changed: - parse_pack_index - http - indirectly from http_get_info_packs - find_pack_entry_one - this searches a single pack that is provided as an argument; the caller already knows (through other means) that the sought object is in a specific pack - find_sha1_pack - fast-import - appears to be an optimization to not store a file if it is already in a pack - http-walker - to search through a struct alt_base - http-push - to search through remote packs - has_sha1_pack - builtin/fsck - already knows about promisor objects - builtin/count-objects - informational purposes only (check if loose object is also packed) - builtin/prune-packed - check if object to be pruned is packed (if not, don't prune it) - revision - used to exclude packed objects if requested by user - diff - just for optimization Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
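A hedged sketch of how a command opts out of the lazy fetch; fetch_if_missing is the global flag described above, and the elided body is the command's normal work:

    int cmd_index_pack(int argc, const char **argv, const char *prefix)
    {
            /*
             * index-pack is itself part of the lazy-fetch machinery, so
             * suppress fetching of missing objects in this process to
             * avoid recursive fetches.
             */
            fetch_if_missing = 0;

            /* ... option parsing and pack indexing proceed as before ... */
            return 0;
    }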
2017-12-05introduce fetch-object: fetch one promisor objectJonathan Tan1-3/+13
Introduce fetch-object, providing the ability to fetch one object from a promisor remote. This uses fetch-pack. To do this, the transport mechanism has been updated with 2 flags, "from-promisor" to indicate that the resulting pack comes from a promisor remote (and thus should be annotated as such by index-pack), and "no-dependents" to indicate that only the objects themselves need to be fetched (but fetching additional objects is nevertheless safe). Whenever "no-dependents" is used, fetch-pack will refrain from using any object flags, because it is most likely invoked as part of a dynamic object fetch by another Git command (which may itself use object flags). An alternative to this is to leave fetch-pack alone, and instead update the allocation of flags so that fetch-pack's flags never overlap with any others, but this will end up shrinking the number of flags available to nearly every other Git command (that is, every Git command that accesses objects), so the approach in this commit was used instead. This will be tested in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-05index-pack: refactor writing of .keep filesJonathan Tan1-46/+53
In a subsequent commit, index-pack will be taught to write ".promisor" files which are similar to the ".keep" files it knows how to write. Refactor the writing of ".keep" files, so that the implementation of writing ".promisor" files becomes easier. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-10cleanup: fix possible overflow errors in binary searchDerrick Stolee1-2/+2
A common mistake when writing binary search is to allow possible integer overflow by using the simple average: mid = (min + max) / 2; Instead, use the overflow-safe version: mid = min + (max - min) / 2; This translation is safe since the operation occurs inside a loop conditioned on "min < max". The included changes were found using the following git grep: git grep '/ *2;' '*.c' This cleanup will prevent future review friction when a new binary search is constructed based on existing code. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
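The overflow-safe midpoint in a small self-contained search over object IDs; oidcmp() is real, the function itself is illustrative:

    static int find_oid(const struct object_id *oids, int nr,
                        const struct object_id *key)
    {
            int min = 0, max = nr;                    /* search [min, max) */

            while (min < max) {
                    int mid = min + (max - min) / 2;  /* was: (min + max) / 2 */
                    int cmp = oidcmp(oids + mid, key);

                    if (!cmp)
                            return mid;
                    if (cmp > 0)
                            max = mid;
                    else
                            min = mid + 1;
            }
            return -1;
    }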
2017-08-23pack: move pack name-related functionsJonathan Tan1-0/+1
Currently, sha1_file.c and cache.h contain many functions, both related to and unrelated to packfiles. This makes both files very large and causes an unclear separation of concerns. Create a new file, packfile.c, to hold all packfile-related functions currently in sha1_file.c. It has a corresponding header packfile.h. In this commit, the pack name-related functions are moved. Subsequent commits will move the other functions. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-05Merge branch 'jt/unify-object-info'Junio C Hamano1-1/+2
Code clean-ups. * jt/unify-object-info: sha1_file: refactor has_sha1_file_with_flags sha1_file: do not access pack if unneeded sha1_file: teach sha1_object_info_extended more flags sha1_file: refactor read_object sha1_file: move delta base cache code up sha1_file: rename LOOKUP_REPLACE_OBJECT sha1_file: rename LOOKUP_UNKNOWN_OBJECT sha1_file: teach packed_object_info about typename
2017-06-26sha1_file: refactor has_sha1_file_with_flagsJonathan Tan1-1/+2
has_sha1_file_with_flags() implements many mechanisms in common with sha1_object_info_extended(). Make has_sha1_file_with_flags() a convenience function for sha1_object_info_extended() instead. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-24Merge branch 'ab/free-and-null'Junio C Hamano1-4/+2
A common pattern to free a piece of memory and assign NULL to the pointer that used to point at it has been replaced with a new FREE_AND_NULL() macro. * ab/free-and-null: *.[ch] refactoring: make use of the FREE_AND_NULL() macro coccinelle: make use of the "expression" FREE_AND_NULL() rule coccinelle: add a rule to make "expression" code use FREE_AND_NULL() coccinelle: make use of the "type" FREE_AND_NULL() rule coccinelle: add a rule to make "type" code use FREE_AND_NULL() git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
2017-06-24Merge branch 'bw/config-h'Junio C Hamano1-0/+1
Fix configuration codepath to pay proper attention to commondir that is used in multi-worktree situation, and isolate config API into its own header file. * bw/config-h: config: don't implicitly use gitdir or commondir config: respect commondir setup: teach discover_git_directory to respect the commondir config: don't include config.h by default config: remove git_config_iter config: create config.h
2017-06-16coccinelle: make use of the "type" FREE_AND_NULL() ruleÆvar Arnfjörð Bjarmason1-4/+2
Apply the result of the just-added coccinelle rule. This manually excludes a few occurrences, mostly things that resulted in many FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent change. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
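The wrapper itself (added to git-compat-util.h by the companion patch in this series) and a representative before/after:

    #define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

    /* Before: */
    free(objects);
    objects = NULL;

    /* After: */
    FREE_AND_NULL(objects);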
2017-06-15config: don't include config.h by defaultBrandon Williams1-0/+1
Stop including config.h by default in cache.h. Instead only include config.h in those files which require use of the config system. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08object: convert parse_object* to take struct object_idbrian m. carlson1-1/+2
Make parse_object, parse_object_or_die, and parse_object_buffer take a pointer to struct object_id. Remove the temporary variables inserted earlier, since they are no longer necessary. Transform all of the callers using the following semantic patch: @@ expression E1; @@ - parse_object(E1.hash) + parse_object(&E1) @@ expression E1; @@ - parse_object(E1->hash) + parse_object(E1) @@ expression E1, E2; @@ - parse_object_or_die(E1.hash, E2) + parse_object_or_die(&E1, E2) @@ expression E1, E2; @@ - parse_object_or_die(E1->hash, E2) + parse_object_or_die(E1, E2) @@ expression E1, E2, E3, E4, E5; @@ - parse_object_buffer(E1.hash, E2, E3, E4, E5) + parse_object_buffer(&E1, E2, E3, E4, E5) @@ expression E1, E2, E3, E4, E5; @@ - parse_object_buffer(E1->hash, E2, E3, E4, E5) + parse_object_buffer(E1, E2, E3, E4, E5) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08Convert lookup_blob to struct object_idbrian m. carlson1-1/+1
Convert lookup_blob to take a pointer to struct object_id. The commit was created with manual changes to blob.c and blob.h, plus the following semantic patch: @@ expression E1; @@ - lookup_blob(E1.hash) + lookup_blob(&E1) @@ expression E1; @@ - lookup_blob(E1->hash) + lookup_blob(E1) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08Convert remaining callers of lookup_blob to object_idbrian m. carlson1-14/+14
All but a few callers of lookup_blob have been converted to struct object_id. Introduce a temporary, which will be removed later, into parse_object to ease the transition, and convert the remaining callers so that we can update lookup_blob to take struct object_id *. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08pack: convert struct pack_idx_entry to struct object_idbrian m. carlson1-14/+19
Convert struct pack_idx_entry to use struct object_id by changing the definition and applying the following semantic patch, plus the standard object_id transforms: @@ struct pack_idx_entry E1; @@ - E1.sha1 + E1.oid.hash @@ struct pack_idx_entry *E1; @@ - E1->sha1 + E1->oid.hash Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-16Merge branch 'jk/loose-object-info-report-error'Junio C Hamano1-0/+2
Update error handling for codepath that deals with corrupt loose objects. * jk/loose-object-info-report-error: index-pack: detect local corruption in collision check sha1_loose_object_info: return error for corrupted objects
2017-04-01index-pack: detect local corruption in collision checkJeff King1-0/+2
When we notice that we have a local copy of an incoming object, we compare the two objects to make sure we haven't found a collision. Before we get to the actual object bytes, though, we compare the type and size from sha1_object_info(). If our local object is corrupted, then the type will be OBJ_BAD, which obviously will not match the incoming type, and we'll report "SHA1 COLLISION FOUND" (with capital letters and everything). This is confusing, as the problem is not a collision but rather local corruption. We should report that instead (just like we do if reading the rest of the object content fails a few lines later). Note that we _could_ just ignore the error and mark it as a non-collision. That would let you "git fetch" to replace a corrupted object. But it's not a very reliable method for repairing a repository. The earlier want/have negotiation tries to get the other side to omit objects we already have, and it would not realize that we are "missing" this corrupted object. So we're better off complaining loudly when we see corruption, and letting the user take more drastic measures to repair (like making a full clone elsewhere and copying the pack into place). Note that the test sets transfer.unpackLimit in the receiving repository so that we use index-pack (which is what does the collision check). Normally for such a small push we'd use unpack-objects, which would simply try to write the loose object, and discard the new one when we see that there's already an old one. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-30replace unchecked snprintf calls with heap buffersJeff King1-4/+5
We'd prefer to avoid unchecked snprintf calls because truncation can lead to unexpected results. These are all cases where truncation shouldn't ever happen, because the input to snprintf is fixed in size. That makes them candidates for xsnprintf(), but it's simpler still to just use the heap, and then nobody has to wonder if "100" is big enough. We'll use xstrfmt() where possible, and a strbuf when we need the resulting size or to reuse the same buffer in a loop. Signed-off-by: Jeff King <peff@peff.net>
2017-03-28odb_mkstemp: write filename into strbufJeff King1-3/+3
The odb_mkstemp() function expects the caller to provide a fixed buffer to write the resulting tempfile name into. But it creates the template using snprintf without checking the return value. This means we could silently truncate the filename. In practice, it's unlikely that the truncation would end in the template-pattern that mkstemp needs to open the file. So we'd probably end up failing either way, unless the path was specially crafted. The simplest fix would be to notice the truncation and die. However, we can observe that most callers immediately xstrdup() the result anyway. So instead, let's switch to using a strbuf, which is easier for them (and isn't a big deal for the other 2 callers, who can just strbuf_release when they're done with it). Note that many of the callers used static buffers, but this was purely to avoid putting a large buffer on the stack. We never passed the static buffers out of the function, so there's no complicated memory handling we need to change. Signed-off-by: Jeff King <peff@peff.net>
2017-03-28do not check odb_mkstemp return value for errorsJeff King1-3/+4
The odb_mkstemp function does not return an error; it dies on failure instead. But many of its callers compare the resulting descriptor against -1 and die themselves. Mostly this is just pointless, but it does raise a question when looking at the callers: if they show the results of the "template" buffer after a failure, what's in it? The answer is: it doesn't matter, because it cannot happen. So let's make that clear by removing the bogus error checks. In bitmap_writer_finish(), we can drop the error-handling code entirely. In the other two cases, it's shared with the open() in another code path; we can just move the error-check next to that open() call. And while we're at it, let's flesh out the function's docstring a bit to make the error behavior clear. Signed-off-by: Jeff King <peff@peff.net>
2017-03-16index-pack: make pointer-alias fallbacks saferJeff King1-8/+12
The final() function accepts a NULL value for certain parameters, and falls back to writing into a reusable "name" buffer, and then either: 1. For "keep_name", requiring all uses to do "keep_name ? keep_name : name.buf". This is awkward, and it's easy to accidentally look at the maybe-NULL keep_name. 2. For "final_index_name" and "final_pack_name", aliasing those pointers to the "name" buffer. This is easier to use, but the aliased pointers become invalid after the buffer is reused (this isn't a bug now, but it's a potential pitfall). One way to make this safer would be to introduce an extra pointer to do the aliasing, and have its lifetime match the validity of the "name" buffer. But it's still easy to accidentally use the wrong name (i.e., to use "final_pack_name" instead of the aliased pointer). Instead, let's use three separate buffers that will remain valid through the function. That makes it safe to alias the pointers and use them consistently. The extra allocations shouldn't matter, as this function is not performance sensitive. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-16replace snprintf with odb_pack_name()Jeff King1-16/+11
In several places we write the name of the pack filename into a fixed-size buffer using snprintf(), but do not check the return value. As a result, a very long object directory could cause us to quietly truncate the pack filename (potentially leading to a corrupted repository, as a newly written packfile could be missing its .pack extension). We can use odb_pack_name() to do this with a strbuf (and shorten the code, as well). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
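A hedged before/after of the replacement; odb_pack_name() takes a strbuf, the raw hash and an extension, though the exact call sites shown here are illustrative:

    /* Before: fixed-size buffer, return value of snprintf unchecked. */
    char name[PATH_MAX];
    snprintf(name, sizeof(name), "%s/pack/pack-%s.keep",
             get_object_directory(), sha1_to_hex(sha1));

    /* After: build the path into a reusable strbuf, no truncation risk. */
    struct strbuf buf = STRBUF_INIT;
    const char *keep_name = odb_pack_name(&buf, sha1, "keep");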
2017-03-16odb_pack_keep(): stop generating keepfile nameJeff King1-3/+3
The odb_pack_keep() function generates the name of a .keep file and opens it. This has two problems: 1. It requires a fixed-size buffer to create the filename and doesn't notice when the result is truncated. 2. Of the two callers, one sometimes wants to open a filename it already has, which makes things awkward (it has to do so manually, and skips the leading-directory creation). Instead, let's have odb_pack_keep() just open the file. Generating the name isn't hard, and a future patch will switch callers over to odb_pack_name() anyway. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-16index-pack: skip collision check when not in repositoryJeff King1-4/+6
You can run "git index-pack path/to/foo.pack" outside of a repository to generate an index file, or just to verify the contents. There's no point in doing a collision check, since we obviously do not have any objects to collide with. The current code will blindly look in .git/objects based on the result of setup_git_env(). That effectively gives us the right answer (since we won't find any objects), but it's a waste of time, and it conflicts with our desire to eventually get rid of the "fallback to .git" behavior of setup_git_env(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-16index-pack: complain when --stdin is used outside of a repoJeff King1-0/+2
The index-pack builtin is marked as RUN_SETUP_GENTLY, because it's perfectly fine to index a pack in the filesystem outside of any repository. However, --stdin mode will write the result to the object database, which does not make sense outside of a repository. Doing so creates a bogus ".git" directory with nothing in it except the newly-created pack and its index. Instead, let's flag this as an error and abort. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
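The guard itself is small; roughly (a sketch of the added check, assuming the existing from_stdin flag in builtin/index-pack.c):

    if (from_stdin && !startup_info->have_repository)
        die(_("--stdin requires a git repository"));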
2016-09-29use QSORT, part 2René Scharfe1-2/+1
Convert two more qsort(3) calls to QSORT to reduce code size and for better safety and consistency. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-29use QSORTRené Scharfe1-5/+3
Apply the semantic patch contrib/coccinelle/qsort.cocci to the code base, replacing calls of qsort(3) with QSORT. The resulting code is shorter and supports empty arrays with NULL pointers. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-24index-pack: add --max-input-size=<size> optionJeff King1-0/+5
When receiving a pack-file, it can be useful to abort `git index-pack` if the pack-file is too big. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-03Merge branch 'jk/push-progress'Junio C Hamano1-1/+9
"git push" and "git clone" learned to give better progress meters to the end user who is waiting on the terminal. * jk/push-progress: receive-pack: send keepalives during quiet periods receive-pack: turn on connectivity progress receive-pack: relay connectivity errors to sideband receive-pack: turn on index-pack resolving progress index-pack: add flag for showing delta-resolution progress clone: use a real progress meter for connectivity check check_connected: add progress flag check_connected: relay errors to alternate descriptor check_everything_connected: use a struct with named options check_everything_connected: convert to argv_array rev-list: add optional progress reporting check_everything_connected: always pass --quiet to rev-list
2016-07-20receive-pack: send keepalives during quiet periodsJeff King1-0/+5
After a client has sent us the complete pack, we may spend some time processing the data and running hooks. If the client asked us to be quiet, receive-pack won't send any progress data during the index-pack or connectivity-check steps. And hooks may or may not produce their own progress output. In these cases, the network connection is totally silent from both ends. Git itself doesn't care about this (it will wait forever), but other parts of the system (e.g., firewalls, load-balancers, etc) might hang up the connection. So we'd like to send some sort of keepalive to let the network and the client side know that we're still alive and processing. We can use the same trick we did in 05e9515 (upload-pack: send keepalive packets during pack computation, 2013-09-08). Namely, we will send an empty sideband data packet every `N` seconds that we do not relay any stderr data over the sideband channel. As with 05e9515, this means that we won't bother sending keepalives when there's actual progress data, but will kick in when it has been disabled (or if there is a lull in the progress data). The concept is simple, but the details are subtle enough that they need discussing here. Before the client sends us the pack, we don't want to do any keepalives. We'll have sent our ref advertisement, and we're waiting for them to send us the pack (and tell us that they support sidebands at all). While we're receiving the pack from the client (or waiting for it to start), there's no need for keepalives; it's up to them to keep the connection active by sending data. Moreover, it would be wrong for us to do so. When we are the server in the smart-http protocol, we must treat our connection as half-duplex. So any keepalives we send while receiving the pack would potentially be buffered by the webserver. Not only does this make them useless (since they would not be delivered in a timely manner), but it could actually cause a deadlock if we fill up the buffer with keepalives. (It wouldn't be wrong to send keepalives in this phase for a full-duplex connection like ssh; it's simply pointless, as it is the client's responsibility to speak). As soon as we've gotten all of the pack data, then the client is waiting for us to speak, and we should start keepalives immediately. From here until the end of the connection, we send one any time we are not otherwise sending data. But there's a catch. Receive-pack doesn't know the moment we've gotten all the data. It passes the descriptor to index-pack, who reads all of the data, and then starts resolving the deltas. We have to communicate that back. To make this work, we instruct the sideband muxer to enable keepalives in three phases: 1. In the beginning, not at all. 2. While reading from index-pack, wait for a signal indicating end-of-input, and then start them. 3. Afterwards, always. The signal from index-pack in phase 2 has to come over the stderr channel which the muxer is reading. We can't use an extra pipe because the portable run-command interface only gives us stderr and stdout. Stdout is already used to pass the .keep filename back to receive-pack. We could also send a signal there, but then we would find out about it in the main thread. And the keepalive needs to be done by the async muxer thread (since it's the one writing sideband data back to the client). And we can't reliably signal the async thread from the main thread, because the async code sometimes uses threads and sometimes uses forked processes. 
Therefore the signal must come over the stderr channel, where it may be interspersed with other random human-readable messages from index-pack. This patch makes the signal a single NUL byte. This is easy to parse, should not appear in any normal stderr output, and we don't have to worry about any timing issues (like seeing half the signal bytes in one read(), and half in a subsequent one). This is a bit ugly, but it's simple to code and should work reliably. Another option would be to stop using an async thread for muxing entirely, and just poll() both stderr and stdout of index-pack from the main thread. This would work for index-pack (because we aren't doing anything useful in the main thread while it runs anyway). But it would make the connectivity check and the hook muxers much more complicated, as they need to simultaneously feed the sub-programs while reading their stderr. The index-pack phase is the only one that needs this signaling, so it could simply behave differently than the other two. That would mean having two separate implementations of copy_to_sideband (and the keepalive code), though. And it still doesn't get rid of the signaling; it just means we can write a nicer message like "END_OF_INPUT" or something on stdout, since we don't have to worry about separating it from the stderr cruft. One final note: this signaling trick is only done with index-pack, not with unpack-objects. There's no point in doing it for the latter, because by definition it only kicks in for a small number of objects, where keepalives are not as useful (and this conveniently lets us avoid duplicating the implementation). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-20index-pack: add flag for showing delta-resolution progressJeff King1-1/+4
The index-pack command has two progress meters: one for "receiving objects", and one for "resolving deltas". You get neither by default, or both with "-v". But for a push through receive-pack, we would want only the "resolving deltas" phase, _not_ the "receiving objects" progress. There are two reasons for this. One is simply that existing clients are already printing "writing objects" progress at the same time. Arguably "receiving" from the far end is more useful, because it tells you what has actually gotten there, as opposed to what might be stuck in a buffer somewhere between the client and server. But that would require a protocol extension to tell clients not to print their progress. Possible, but complexity for little gain. The second reason is much more important. In a full-duplex connection like git-over-ssh, we can print progress while the pack is incoming, and it will immediately get to the client. But for a half-duplex connection like git-over-http, we should not say anything until we have received the full request. Anything we write is subject to being stuck in a buffer by the webserver. Worse, we can end up in a deadlock if that buffer fills up. So our best bet is to avoid writing anything that isn't a small fixed size until we've received the full pack. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-13index-pack: correct "offset" type in unpack_entry_data()Nguyễn Thái Ngọc Duy1-1/+1
unpack_entry_data() receives an off_t value from unpack_raw_entry(), which could be larger than unsigned long on 32-bit systems with large file support. Correct the type so truncation does not happen. This only affects bad object reporting though. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-13index-pack: report correct bad object offsets even if they are largeNguyễn Thái Ngọc Duy1-3/+4
Use the right type for offsets in this case, off_t, which makes a difference on 32-bit systems with large file support, and change formatting code accordingly. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-13index-pack: correct "len" type in unpack_data()Nguyễn Thái Ngọc Duy1-7/+7
On 32-bit systems with large file support, one entry could be larger than 4GB and overflow "len". Correct it so we can unpack a full entry. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-26Merge branch 'va/i18n-misc-updates' into maintJunio C Hamano1-1/+3
Mark several messages for translation. * va/i18n-misc-updates: i18n: unpack-trees: avoid substituting only a verb in sentences i18n: builtin/pull.c: split strings marked for translation i18n: builtin/pull.c: mark placeholders for translation i18n: git-parse-remote.sh: mark strings for translation i18n: branch: move comment for translators i18n: branch: unmark string for translation i18n: builtin/rm.c: remove a comma ',' from string i18n: unpack-trees: mark strings for translation i18n: builtin/branch.c: mark option for translation i18n: index-pack: use plural string instead of normal one
2016-05-17Merge branch 'va/i18n-misc-updates'Junio C Hamano1-1/+3
Mark several messages for translation. * va/i18n-misc-updates: i18n: unpack-trees: avoid substituting only a verb in sentences i18n: builtin/pull.c: split strings marked for translation i18n: builtin/pull.c: mark placeholders for translation i18n: git-parse-remote.sh: mark strings for translation i18n: branch: move comment for translators i18n: branch: unmark string for translation i18n: builtin/rm.c: remove a comma ',' from string i18n: unpack-trees: mark strings for translation i18n: builtin/branch.c: mark option for translation i18n: index-pack: use plural string instead of normal one
2016-04-14Merge branch 'jc/index-pack' into maintJunio C Hamano1-18/+17
Code clean-up. * jc/index-pack: index-pack: add a helper function to derive .idx/.keep filename index-pack: correct --keep[=<msg>]
2016-04-08i18n: index-pack: use plural string instead of normal oneVasco Almeida1-1/+3
Git could output "completed with 1 local objects", but in this case using "object" instead of "objects" is the correct form. Use Q_() instead of _(). Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-03Merge branch 'jc/index-pack'Junio C Hamano1-18/+17
Code clean-up. * jc/index-pack: index-pack: add a helper function to derive .idx/.keep filename
2016-04-03Merge branch 'jc/maint-index-pack-keep'Junio C Hamano1-1/+1
"git index-pack --keep[=<msg>] pack-$name.pack" simply did not work. * jc/maint-index-pack-keep: index-pack: correct --keep[=<msg>]
2016-03-04Merge branch 'jk/pack-idx-corruption-safety'Junio C Hamano1-0/+1
The code to read the pack data using the offsets stored in the pack idx file has been made to check the validity of the data in the idx more carefully. * jk/pack-idx-corruption-safety: sha1_file.c: mark strings for translation use_pack: handle signed off_t overflow nth_packed_object_offset: bounds-check extended offset t5313: test bounds-checks of corrupted/malicious pack/idx files
2016-03-03index-pack: add a helper function to derive .idx/.keep filenameJunio C Hamano1-18/+17
These are automatically named by replacing the .pack suffix in the name of the packfile. Add a small helper to do so, as I'll be adding another one soonish. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-03Merge branch 'jc/maint-index-pack-keep' into jc/index-packJunio C Hamano1-1/+1
* jc/maint-index-pack-keep: index-pack: correct --keep[=<msg>]
2016-03-03index-pack: correct --keep[=<msg>]Junio C Hamano1-1/+1
When 592ce208 (index-pack: use strip_suffix to avoid magic numbers, 2014-06-30) refactored the code to derive names of .idx and .keep files from the name of the .pack file, a copy-and-paste typo crept in, mistakenly attempting to create and store the keep message file in the .idx file we just created, instead of the .keep file. As we create the .keep file with O_CREAT|O_EXCL, and we do so after we write the .idx file, we luckily do not clobber the .idx file, but because we deliberately ignored EEXIST when creating the .keep file (which is justifiable because only the existence of the .keep file matters), nobody noticed this mistake so far. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-25nth_packed_object_offset: bounds-check extended offsetJeff King1-0/+1
If a pack .idx file has a corrupted offset for an object, we may try to access an offset in the .idx or .pack file that is larger than the file's size. For the .pack case, we have use_pack() to protect us, which realizes the access is out of bounds. But if the corrupted value asks us to look in the .idx file's secondary 64-bit offset table, we blindly add it to the mmap'd index data and access arbitrary memory. We can fix this with a simple bounds-check compared to the size we found when we opened the .idx file. Note that there's similar code in index-pack that is triggered only during "index-pack --verify". To support both, we pull the bounds-check into a separate function, which dies when it sees a corrupted file. It would be nice if we could return an error, so that the pack code could try to find a good copy of the object elsewhere. Currently nth_packed_object_offset doesn't have any way to return an error, but it could probably use "0" as a sentinel value (since no object can start there). This is the minimal fix, and we can improve the resilience later on top. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22use st_add and st_mult for allocation size computationJeff King1-2/+2
If our size computation overflows size_t, we may allocate a much smaller buffer than we expected and overflow it. It's probably impossible to trigger an overflow in most of these sites in practice, but it is easy enough to convert their additions and multiplications into overflow-checking variants. This may be fixing real bugs, and it makes auditing the code easier. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
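A sketch of the conversion pattern (variable names are illustrative, not the exact call sites touched by the patch):

    /* Before: the arithmetic can silently wrap around size_t. */
    buf = xmalloc(sizeof(*entries) * nr + extra);

    /* After: st_mult()/st_add() die() if the computation would overflow,
     * so the allocation can never end up smaller than intended. */
    buf = xmalloc(st_add(st_mult(sizeof(*entries), nr), extra));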
2016-02-22convert trivial cases to ALLOC_ARRAYJeff King1-2/+2
Each of these cases can be converted to use ALLOC_ARRAY or REALLOC_ARRAY, which has two advantages: 1. It automatically checks the array-size multiplication for overflow. 2. It always uses sizeof(*array) for the element-size, so that it can never go out of sync with the declared type of the array. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
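A sketch of what the converted allocations look like (variable names are illustrative):

    /* Expands to roughly: objects = xmalloc(st_mult(sizeof(*objects), nr)); */
    ALLOC_ARRAY(objects, nr);

    /* Same idea for growing an existing array; the element size is always
     * taken from sizeof(*objects), so it cannot drift from the type. */
    REALLOC_ARRAY(objects, nr + nr_extra);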
2015-11-20Remove get_object_hash.brian m. carlson1-1/+1
Convert all instances of get_object_hash to use an appropriate reference to the hash member of the oid member of struct object. This provides no functional change, as it is essentially a macro substitution. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>
2015-11-20Convert struct object to object_idbrian m. carlson1-4/+4
struct object is one of the major data structures dealing with object IDs. Convert it to use struct object_id instead of an unsigned char array. Convert get_object_hash to refer to the new member as well. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>
2015-11-20Add several uses of get_object_hash.brian m. carlson1-1/+1
Convert most instances where the sha1 member of struct object is dereferenced to use get_object_hash. Most instances that are passed to functions that have versions taking struct object_id, such as get_sha1_hex/get_oid_hex, or instances that can be trivially converted to use struct object_id instead, are not converted. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>
2015-09-25use xsnprintf for generating git object headersJeff King1-1/+1
We generally use 32-byte buffers to format git's "type size" header fields. These should not generally overflow unless you can produce some truly gigantic objects (and our types come from our internal array of constant strings). But it is a good idea to use xsnprintf to make sure this is the case. Note that we slightly modify the interface to write_sha1_file_prepare, which now uses "hdrlen" as an "in" parameter as well as an "out" (on the way in it stores the allocated size of the header, and on the way out it returns the ultimate size of the header). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
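A sketch of the header formatting after the change, where type is the object type string and len its size (the exact variable names in write_sha1_file_prepare() differ):

    char hdr[32];
    int hdrlen;

    /* xsnprintf() die()s if the formatted result would not fit, instead
     * of silently truncating the way plain snprintf() would. */
    hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu",
                       type, (unsigned long)len) + 1;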
2015-08-19Merge branch 'jc/finalize-temp-file'Junio C Hamano1-2/+2
Long overdue micro clean-up. * jc/finalize-temp-file: sha1_file.c: rename move_temp_to_file() to finalize_object_file()
2015-08-10sha1_file.c: rename move_temp_to_file() to finalize_object_file()Junio C Hamano1-2/+2
Since 5a688fe4 ("core.sharedrepository = 0mode" should set, not loosen, 2009-03-25), we kept reminding ourselves: NEEDSWORK: this should be renamed to finalize_temp_file() as "moving" is only a part of what it does, when no patch between master to pu changes the call sites of this function. without doing anything about it. Let's do so. The purpose of this function was not to move but to finalize. The detail of the primarily implementation of finalizing was to link the temporary file to its final name and then to unlink, which wasn't even "moving". The alternative implementation did "move" by calling rename(2), which is a fun tangent. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-03Merge branch 'js/fsck-opt'Junio C Hamano1-4/+9
Allow ignoring fsck errors on specific set of known-to-be-bad objects, and also tweaking warning level of various kinds of non critical breakages reported. * js/fsck-opt: fsck: support ignoring objects in `git fsck` via fsck.skiplist fsck: git receive-pack: support excluding objects from fsck'ing fsck: introduce `git fsck --connectivity-only` fsck: support demoting errors to warnings fsck: document the new receive.fsck.<msg-id> options fsck: allow upgrading fsck warnings to errors fsck: optionally ignore specific fsck issues completely fsck: disallow demoting grave fsck errors to warnings fsck: add a simple test for receive.fsck.<msg-id> fsck: make fsck_tag() warn-friendly fsck: handle multiple authors in commits specially fsck: make fsck_commit() warn-friendly fsck: make fsck_ident() warn-friendly fsck: report the ID of the error/warning fsck (receive-pack): allow demoting errors to warnings fsck: offer a function to demote fsck errors to warnings fsck: provide a function to parse fsck message IDs fsck: introduce identifiers for fsck messages fsck: introduce fsck options
2015-07-27Merge branch 'jk/index-pack-reduce-recheck' into maintJunio C Hamano1-1/+1
Disable "have we lost a race with competing repack?" check while receiving a huge object transfer that runs index-pack. * jk/index-pack-reduce-recheck: index-pack: avoid excessive re-reading of pack directory
2015-07-09Merge branch 'jc/fix-alloc-sortbuf-in-index-pack'Junio C Hamano1-8/+8
A hotfix for what is in 2.5-rc but not in 2.4. * jc/fix-alloc-sortbuf-in-index-pack: index-pack: fix allocation of sorted_by_pos array
2015-07-04index-pack: fix allocation of sorted_by_pos arrayJunio C Hamano1-8/+8
When c6458e60 (index-pack: kill union delta_base to save memory, 2015-04-18) attempted to reduce the memory footprint of index-pack, one of the key things it did was to keep track of ref-deltas and ofs-deltas separately. In fix_unresolved_deltas(), however, it forgot that it now wants to look only at ref deltas in one place. The code allocated an array for nr_unresolved, which is the sum of the number of ref- and ofs-deltas minus nr_resolved, which may be larger or smaller than the number of ref-deltas. Depending on nr_resolved, this was either under- or over-allocating. Also, the old code before this change had to use 'i' and 'n' because some of the things we see in the (old) deltas[] array we scanned with 'i' would not make it into the sorted_by_pos[] array in the old world order, but now, because only ref-deltas are kept in the separate ref_deltas[] array, the two increment in lockstep. We no longer need separate variables. And most importantly, we shouldn't pass the nr_unresolved parameter, as this number does not play a role in the working of this helper function. Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-24Merge branch 'jk/index-pack-reduce-recheck'Junio C Hamano1-1/+1
Disable "have we lost a race with competing repack?" check while receiving a huge object transfer that runs index-pack. * jk/index-pack-reduce-recheck: index-pack: avoid excessive re-reading of pack directory
2015-06-23fsck (receive-pack): allow demoting errors to warningsJohannes Schindelin1-0/+4
For example, missing emails in commit and tag objects can be demoted to mere warnings with

    git config receive.fsck.missingemail=warn

The value is actually a comma-separated list. If the same key is listed in multiple receive.fsck.<msg-id> lines in the config, the latter configuration wins (this can happen for example when both $HOME/.gitconfig and .git/config contain message type settings). As git receive-pack does not actually perform the checks, it hands off the setting to index-pack or unpack-objects in the form of an optional argument to the --strict option. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-22fsck: introduce fsck optionsJohannes Schindelin1-4/+5
Just like the diff machinery, we are about to introduce more settings, therefore it makes sense to carry them around as a (pointer to a) struct containing all of them. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-16Merge branch 'nd/slim-index-pack-memory-usage'Junio C Hamano1-2/+6
An earlier optimization broke index-pack for a large object transfer; this fixes it before the breakage hits any released version. * nd/slim-index-pack-memory-usage: index-pack: fix truncation of off_t in comparison
2015-06-09index-pack: avoid excessive re-reading of pack directoryJeff King1-1/+1
Since 45e8a74 (has_sha1_file: re-check pack directory before giving up, 2013-08-30), we spend extra effort for has_sha1_file to give the right answer when somebody else is repacking. Usually this effort does not matter, because after finding that the object does not exist, the next step is usually to die(). However, some code paths make a large number of has_sha1_file checks which are _not_ expected to return 1. The collision test in index-pack.c is such a case. On a local system, this can cause a performance slowdown of around 5%. But on a system with high-latency system calls (like NFS), it can be much worse. This patch introduces a "quick" flag to has_sha1_file which callers can use when they would prefer high performance at the cost of false negatives during repacks. There may be other code paths that can use this, but the index-pack one is the most obviously critical, so we'll start with switching that one. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
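The collision-test call site in sha1_object() ends up looking roughly like this (a sketch; the wrapper and flag were added by this change, and the surrounding names are approximate):

    read_lock();
    /* A miss is the expected case here, so accept a possibly stale
     * answer rather than re-scanning the pack directory on every miss. */
    collision_test_needed =
        has_sha1_file_with_flags(sha1, HAS_SHA1_QUICK);
    read_unlock();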
2015-06-04index-pack: fix truncation of off_t in comparisonJeff King1-2/+6
Commit c6458e6 (index-pack: kill union delta_base to save memory, 2015-04-18) refactored the comparison functions used in sorting and binary searching our delta list. The resulting code does something like:

    int cmp_offsets(off_t a, off_t b)
    {
        return a - b;
    }

This works most of the time, but produces nonsensical results when the difference between the two offsets is larger than what can be stored in an "int". This can lead to unresolved deltas if the packsize is larger than 2G (even on 64-bit systems, an int is still typically 32 bits):

    $ git clone git://github.com/mozilla/gecko-dev
    Cloning into 'gecko-dev'...
    remote: Counting objects: 4800161, done.
    remote: Compressing objects: 100% (178/178), done.
    remote: Total 4800161 (delta 88), reused 0 (delta 0), pack-reused 4799978
    Receiving objects: 100% (4800161/4800161), 2.21 GiB | 3.26 MiB/s, done.
    Resolving deltas: 99% (3808820/3811944), completed with 0 local objects.
    fatal: pack has 3124 unresolved deltas
    fatal: index-pack failed

We can fix it by doing direct comparisons between the offsets and returning constants; the callers only care about the sign of the comparison, not the magnitude. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
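The fixed comparators return only the sign, along these lines (a sketch close to, but not necessarily identical to, the patch):

    static int compare_ofs_delta_entry(const void *a, const void *b)
    {
        const struct ofs_delta_entry *delta_a = a;
        const struct ofs_delta_entry *delta_b = b;

        /* Compare directly instead of subtracting: the difference of two
         * off_t values can exceed the range of an int. */
        return delta_a->offset < delta_b->offset ? -1 :
               delta_a->offset > delta_b->offset ?  1 :
               0;
    }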
2015-05-11Merge branch 'nd/slim-index-pack-memory-usage'Junio C Hamano1-111/+179
Memory usage of "git index-pack" has been trimmed by tens of per-cent. * nd/slim-index-pack-memory-usage: index-pack: kill union delta_base to save memory index-pack: reduce object_entry size to save memory
2015-04-18index-pack: kill union delta_base to save memoryNguyễn Thái Ngọc Duy1-100/+160
Once we know the number of objects in the input pack, we allocate an array of nr_objects of struct delta_entry. On x86-64, this struct is 32 bytes long. The union delta_base, which is part of struct delta_entry, provides enough space to store either ofs-delta (8 bytes) or ref-delta (20 bytes). Because ofs-delta encoding is more efficient space-wise and more performant at runtime than ref-delta encoding, Git packers try to use ofs-delta whenever possible, and it is expected that objects encoded as ref-delta are minority. In the best clone case where no ref-delta object is present, we waste (20-8) * nr_objects bytes because of this union. That's about 38MB out of 100MB for deltas[] with 3.4M objects, or 38%. deltas[] would be around 62MB without the waste. This patch attempts to eliminate that. deltas[] array is split into two: one for ofs-delta and one for ref-delta. Many functions are also duplicated because of this split. With this patch, ofs_deltas[] array takes 51MB. ref_deltas[] should remain unallocated in clone case (0 bytes). This array grows as we see ref-delta. We save about half in this case, or 25% of total bookkeeping. The saving is more than the calculation above because some padding in the old delta_entry struct is removed. ofs_delta_entry is 16 bytes, including the 4 bytes padding. That's 13MB for padding, but packing the struct could break platforms that do not support unaligned access. If someone on 32-bit is really low on memory and only deals with packs smaller than 2G, using 32-bit off_t would eliminate the padding and save 27MB on top. A note about ofs_deltas allocation. We could use ref_deltas memory allocation strategy for ofs_deltas. But that probably just adds more overhead on top. ofs-deltas are generally the majority (1/2 to 2/3) in any pack. Incremental realloc may lead to too many memcpy. And if we preallocate, say 1/2 or 2/3 of nr_objects initially, the growth rate of ALLOC_GROW() could make this array larger than nr_objects, wasting more memory. Brought-up-by: Matthew Sporleder <msporleder@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
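In outline, the bookkeeping changes from one array of union-carrying entries to two type-specific arrays (a sketch of the shapes involved, not the verbatim declarations):

    /* Before: every delta pays for the larger union member. */
    union delta_base {
        unsigned char sha1[20];
        off_t offset;
    };
    struct delta_entry {
        union delta_base base;
        int obj_no;
    };

    /* After: each kind of delta only stores what it needs. */
    struct ofs_delta_entry {
        off_t offset;
        int obj_no;
    };
    struct ref_delta_entry {
        unsigned char sha1[20];
        int obj_no;
    };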
2015-03-17Merge branch 'rs/deflate-init-cleanup'Junio C Hamano1-1/+0
Code simplification. * rs/deflate-init-cleanup: zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}
2015-03-05zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}René Scharfe1-1/+0
Clear the git_zstream variable at the start of git_deflate_init() etc. so that callers don't have to do that. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-27index-pack: reduce object_entry size to save memoryNguyễn Thái Ngọc Duy1-11/+19
For each object in the input pack, we need one struct object_entry. On x86-64, this struct is 64 bytes long. Although: - The 8 bytes for delta_depth and base_object_no are only useful when show_stat is set. And it's never set unless someone is debugging. - The three fields hdr_size, type and real_type take 4 bytes each even though they never use more than 4 bits. By moving delta_depth and base_object_no out of struct object_entry and making the other 3 fields one byte long instead of 4, we shrink this struct by 25%. On a 3.4M object repo (*) that's about 53MB. The saving is less impressive compared to index-pack memory use for basic bookkeeping (**), about 16%. (*) linux-2.6.git already has 4M objects as of v3.19-rc7 so this is not an unrealistic number of objects that we have to deal with. (**) 3.4M * (sizeof(object_entry) + sizeof(delta_entry)) = 311MB Brought-up-by: Matthew Sporleder <msporleder@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
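Roughly, the struct goes from the first shape below to the slimmer second one (a sketch; field order and exact types in the patch may differ slightly):

    /* Before: ~64 bytes per object on x86-64. */
    struct object_entry {
        struct pack_idx_entry idx;
        unsigned long size;
        unsigned int hdr_size;
        enum object_type type;
        enum object_type real_type;
        unsigned delta_depth;
        int base_object_no;
    };

    /* After: stat-only fields moved out, small fields shrunk to a byte. */
    struct object_entry {
        struct pack_idx_entry idx;
        unsigned long size;
        unsigned char hdr_size;
        signed char type;
        signed char real_type;
    };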
2014-12-22Merge branch 'js/fsck-tag-validation'Junio C Hamano1-2/+2
New tag object format validation added in 2.2 showed garbage after a tagname it reported in its error message. * js/fsck-tag-validation: index-pack: terminate object buffers with NUL fsck: properly bound "invalid tag name" error message
2014-12-09index-pack: terminate object buffers with NULDuy Nguyen1-2/+2
We have some tricky checks in fsck that rely on a side effect of require_end_of_header(), and would otherwise easily run outside non-NUL-terminated buffers. This is a bit brittle, so let's make sure that only NUL-terminated buffers are passed around to begin with. Jeff "Peff" King contributed the detailed analysis which call paths are involved and pointed out that we also have to patch the get_data() function in unpack-objects.c, which is what Johannes "Dscho" Schindelin implemented. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Analyzed-by: Jeff King <peff@peff.net> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-13index-pack: fix compilation with NO_PTHREADSEtienne Buira1-0/+3
type_cas_lock/unlock() should be defined as no-op for NO_PTHREADS build, just like all the other locking primitives. Signed-off-by: Etienne Buira <etienne.buira@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-26Merge branch 'rs/realloc-array'Junio C Hamano1-3/+1
Code cleanup. * rs/realloc-array: use REALLOC_ARRAY for changing the allocation size of arrays add macro REALLOC_ARRAY
2014-09-26Merge branch 'js/fsck-tag-validation'Junio C Hamano1-1/+2
Teach "git fsck" to inspect the contents of annotated tag objects. * js/fsck-tag-validation: Make sure that index-pack --strict checks tag objects Add regression tests for stricter tag fsck'ing fsck: check tag objects' headers Make sure fsck_commit_buffer() does not run out of the buffer fsck_object(): allow passing object data separately from the object itself Refactor type_from_string() to allow continuing after detecting an error
2014-09-19Merge branch 'jk/index-pack-threading-races'Junio C Hamano1-2/+31
When receiving an invalid pack stream that records the same object twice, multiple threads got confused due to a race. We should reject or correct such a stream upon receiving, but that will be a larger change. * jk/index-pack-threading-races: index-pack: fix race condition with duplicate bases
2014-09-18use REALLOC_ARRAY for changing the allocation size of arraysRené Scharfe1-3/+1
Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-10fsck_object(): allow passing object data separately from the object itselfJohannes Schindelin1-1/+2
When fsck'ing an incoming pack, we need to fsck objects that cannot be read via read_sha1_file() because they are not local yet (and might even be rejected if transfer.fsckobjects is set to 'true'). For commits, there is a hack in place: we basically cache commit objects' buffers anyway, but the same is not true, say, for tag objects. By refactoring fsck_object() to take the object buffer and size as optional arguments -- optional, because we still fall back to the previous method to look at the cached commit objects if the caller passes NULL -- we prepare the machinery for the upcoming handling of tag objects. The assumption that such buffers are inherently NUL terminated is now wrong, of course, hence we pass the size of the buffer so that we can add a sanity check later, to prevent running past the end of the buffer. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-29index-pack: fix race condition with duplicate basesJeff King1-2/+31
When we are resolving deltas in an indexed pack, we do it by first selecting a potential base (either one stored in full in the pack, or one created by resolving another delta), and then resolving any deltas that use that base. When we resolve a particular delta, we flip its "real_type" field from OBJ_{REF,OFS}_DELTA to whatever the real type is. We assume that traversing the objects this way will visit each delta only once. This is correct for most packs; we visit the delta only when we process its base, and each object (and thus each base) appears only once. However, if a base object appears multiple times in the pack, we will try to resolve any deltas based on it once for each instance. We can detect this case by noting that a delta we are about to resolve has already had its real_type field flipped, and we already do so with an assert(). However, if multiple threads are in use, we may race with another thread on comparing and flipping the field. We need to synchronize the access. The right mechanism for doing this is a compare-and-swap (we atomically "claim" the delta for our own and find out whether our claim was successful). We can implement this in C by using a pthread mutex to protect the operation. This is not the fastest way of doing a compare-and-swap; many processors provide instructions for this, and gcc and other compilers provide builtins to access them. However, some experiments showed that lock contention does not cause a significant slowdown here. Adding c-a-s support for many compilers would increase the maintenance burden (and we would still end up including the pthread version as a fallback). Note that we only need to touch the OBJ_REF_DELTA codepath here. An OBJ_OFS_DELTA object points to its base using an offset, and therefore has only one base, even if another copy of that base object appears in the pack (we do still touch it briefly because the setting of real_type is factored out of resolve_data). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
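The mutex-backed compare-and-swap is essentially the following (a sketch of the helper; type_cas_lock()/type_cas_unlock() wrap a pthread mutex and are no-ops when threading is inactive):

    static int compare_and_swap_type(signed char *type,
                                     enum object_type want,
                                     enum object_type set)
    {
        enum object_type old;

        type_cas_lock();
        old = *type;
        if (old == want)
            *type = set;       /* we won the race; claim the delta */
        type_cas_unlock();

        return old == want;
    }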
2014-07-21Merge branch 'maint'Junio C Hamano1-2/+1
* maint: use xmemdupz() to allocate copies of strings given by start and length use xcalloc() to allocate zero-initialized memory
2014-07-21use xcalloc() to allocate zero-initialized memoryRené Scharfe1-2/+1
Use xcalloc() instead of xmalloc() followed by memset() to allocate and zero out memory because it's shorter and avoids duplicating the function parameters. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-16Merge branch 'jk/strip-suffix'Junio C Hamano1-15/+14
* jk/strip-suffix: prepare_packed_git_one: refactor duplicate-pack check verify-pack: use strbuf_strip_suffix strbuf: implement strbuf_strip_suffix index-pack: use strip_suffix to avoid magic numbers use strip_suffix instead of ends_with in simple cases replace has_extension with ends_with implement ends_with via strip_suffix add strip_suffix function sha1_file: replace PATH_MAX buffer with strbuf in prepare_packed_git_one()
2014-07-16Merge branch 'jk/commit-buffer-length' into maintJunio C Hamano1-1/+2
A handful of code paths had to read the commit object more than once when showing header fields that are usually not parsed. The internal data structure to keep track of the contents of the commit object has been updated to reduce the need for this double-reading, and to allow the caller find the length of the object. * jk/commit-buffer-length: reuse cached commit buffer when parsing signatures commit: record buffer length in cache commit: convert commit->buffer to a slab commit-slab: provide a static initializer use get_commit_buffer everywhere convert logmsg_reencode to get_commit_buffer use get_commit_buffer to avoid duplicate code use get_cached_commit_buffer where appropriate provide helpers to access the commit buffer provide a helper to set the commit buffer provide a helper to free commit buffer sequencer: use logmsg_reencode in get_message logmsg_reencode: return const buffer do not create "struct commit" with xcalloc commit: push commit_index update into alloc_commit_node alloc: include any-object allocations in alloc_report replace dangerous uses of strbuf_attach commit_tree: take a pointer/len pair rather than a const strbuf
2014-07-02Merge branch 'jk/commit-buffer-length'Junio C Hamano1-1/+2
Move "commit->buffer" out of the in-core commit object and keep track of their lengths. Use this to optimize the code paths to validate GPG signatures in commit objects. * jk/commit-buffer-length: reuse cached commit buffer when parsing signatures commit: record buffer length in cache commit: convert commit->buffer to a slab commit-slab: provide a static initializer use get_commit_buffer everywhere convert logmsg_reencode to get_commit_buffer use get_commit_buffer to avoid duplicate code use get_cached_commit_buffer where appropriate provide helpers to access the commit buffer provide a helper to set the commit buffer provide a helper to free commit buffer sequencer: use logmsg_reencode in get_message logmsg_reencode: return const buffer do not create "struct commit" with xcalloc commit: push commit_index update into alloc_commit_node alloc: include any-object allocations in alloc_report replace dangerous uses of strbuf_attach commit_tree: take a pointer/len pair rather than a const strbuf
2014-06-30index-pack: use strip_suffix to avoid magic numbersJeff King1-15/+14
We also switch to using strbufs, which lets us avoid the potentially dangerous combination of a manual malloc followed by a strcpy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
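A sketch of the pattern after the conversion (the exact message and variable names differ from the patch):

    struct strbuf idx_name = STRBUF_INIT;
    size_t len;

    /* strip_suffix() both checks the suffix and reports the remaining
     * length, so there is no hard-coded strlen(".pack") constant. */
    if (!strip_suffix(curr_pack, ".pack", &len))
        die("packfile name '%s' does not end with '.pack'", curr_pack);
    strbuf_add(&idx_name, curr_pack, len);
    strbuf_addstr(&idx_name, ".idx");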
2014-06-30replace has_extension with ends_withJeff King1-2/+2
These two are almost the same function, with the exception that has_extension only matches if there is content before the suffix. So ends_with(".exe", ".exe") is true, but has_extension would not be. This distinction does not matter to any of the callers, though, and we can just replace uses of has_extension with ends_with. We prefer the "ends_with" name because it is more generic, and there is nothing about the function that requires it to be used for file extensions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-25Merge branch 'ym/fix-opportunistic-index-update-race' into maintJunio C Hamano1-1/+1
"git status", even though it is a read-only operation, tries to update the index with refreshed lstat(2) info to optimize future accesses to the working tree opportunistically, but this could race with a "read-write" operation that modify the index while it is running. Detect such a race and avoid overwriting the index. * ym/fix-opportunistic-index-update-race: read-cache.c: verify index file before we opportunistically update it wrapper.c: add xpread() similar to xread()
2014-06-25Merge branch 'jk/index-pack-report-missing' into maintJunio C Hamano1-2/+7
The error reporting from "git index-pack" has been improved to distinguish missing objects from type errors. * jk/index-pack-report-missing: index-pack: distinguish missing objects from type errors
2014-06-25Merge branch 'nd/index-pack-one-fd-per-thread' into maintJunio C Hamano1-10/+17
We used to disable threaded "git index-pack" on platforms without thread-safe pread(); use a different workaround for such platforms to allow threaded "git index-pack". * nd/index-pack-one-fd-per-thread: index-pack: work around thread-unsafe pread()
2014-06-13commit: record buffer length in cacheJeff King1-1/+1
Most callsites which use the commit buffer try to use the cached version attached to the commit, rather than re-reading from disk. Unfortunately, that interface provides only a pointer to the NUL-terminated buffer, with no indication of the original length. For the most part, this doesn't matter. People do not put NULs in their commit messages, and the log code is happy to treat it all as a NUL-terminated string. However, some code paths do care. For example, when checking signatures, we want to be very careful that we verify all the bytes to avoid malicious trickery. This patch just adds an optional "size" out-pointer to get_commit_buffer and friends. The existing callers all pass NULL (there did not seem to be any obvious sites where we could avoid an immediate strlen() call, though perhaps with some further refactoring we could). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13provide a helper to free commit bufferJeff King1-1/+2
This converts two lines into one at each caller. But more importantly, it abstracts the concept of freeing the buffer, which will make it easier to change later. Note that we also need to provide a "detach" mechanism for a tricky case in index-pack. We are passed a buffer for the object generated by processing the incoming pack. If we are not using --strict, we just calculate the sha1 on that buffer and return, leaving the caller to free it. But if we are using --strict, we actually attach that buffer to an object, pass the object to the fsck functions, and then detach the buffer from the object again (so that the caller can free it as usual). In this case, we don't want to free the buffer ourselves, but just make sure it is no longer associated with the commit. Note that we are making the assumption here that the attach/detach process does not impact the buffer at all (e.g., it is never reallocated or modified). That holds true now, and we have no plans to change that. However, as we abstract the commit_buffer code, this dependency becomes less obvious. So when we detach, let's also make sure that we get back the same buffer that we gave to the commit_buffer code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-06Merge branch 'jk/index-pack-report-missing'Junio C Hamano1-2/+7
* jk/index-pack-report-missing: index-pack: distinguish missing objects from type errors
2014-06-03Merge branch 'nd/index-pack-one-fd-per-thread'Junio C Hamano1-10/+17
Enable threaded index-pack on platforms without thread-unsafe pread() emulation. * nd/index-pack-one-fd-per-thread: index-pack: work around thread-unsafe pread()
2014-06-03Merge branch 'ym/fix-opportunistic-index-update-race'Junio C Hamano1-1/+1
Read-only operations such as "git status" that internally refresh the index write out the refreshed index to the disk to optimize future accesses to the working tree, but this could race with a "read-write" operation that modifies the index while it is running. Detect such a race and avoid overwriting the index. Duy raised a good point that we may need to do the same for the normal writeout codepath, not just the "opportunistic" update codepath. While that is true, nobody sane would be running two simultaneous operations that are clearly write-oriented competing with each other against the same index file. So in that sense that can be done as a less urgent follow-up for this topic. * ym/fix-opportunistic-index-update-race: read-cache.c: verify index file before we opportunistically update it wrapper.c: add xpread() similar to xread()
2014-05-12index-pack: distinguish missing objects from type errorsJeff King1-2/+7
When we fetch a pack that does not contain an object we expected to receive, we get an error like:

    $ git init --bare tmp.git && cd tmp.git
    $ git fetch ../parent.git
    [...]
    error: Could not read 964953ec7bcc0245cb1d0db4095455edd21a2f2e
    fatal: Failed to traverse parents of commit b8247b40caf6704fe52736cdece6d6aae87471aa
    error: ../parent.git did not send all necessary objects

This comes from the check_everything_connected rev-list. If we try cloning the same repo (rather than a fetch), we end up using index-pack's --check-self-contained-and-connected option instead, which produces output like:

    $ git clone --no-local --bare parent.git tmp.git
    [...]
    fatal: object of unexpected type
    fatal: index-pack failed

Not only is the sha1 missing, but it's a misleading message. There's no type problem, but rather a missing object problem; we don't notice the difference because we simply compare OBJ_BAD != OBJ_BLOB. Let's provide a different message for this case:

    $ git clone --no-local --bare parent.git tmp.git
    fatal: did not receive expected object 6b00a8c61ed379d5f925a72c1987c9c52129d364
    fatal: index-pack failed

While we're at it, let's also improve a true type mismatch error to look like

    fatal: object 6b00a8c61ed379d5f925a72c1987c9c52129d364: expected type blob, got tree

Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-16index-pack: work around thread-unsafe pread()Nguyễn Thái Ngọc Duy1-10/+17
Multi-threading of index-pack was disabled with c0f8654 (index-pack: Disable threading on cygwin - 2012-06-26), because pread() implementations for Cygwin and MSYS were not thread safe. Recent Cygwin does offer usable pread() and we enabled multi-threading with 103d530f (Cygwin 1.7 has thread-safe pread, 2013-07-19). Work around this problem on platforms with a thread-unsafe pread() emulation by opening one file handle per thread; it would prevent parallel pread() on different file handles from stepping on each other. Also remove NO_THREAD_SAFE_PREAD that was introduced in c0f8654 because it's no longer used anywhere. This workaround is unconditional, even for platforms with thread-safe pread() because the overhead is small (a couple file handles more) and not worth fragmenting the code. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Tested-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-10wrapper.c: add xpread() similar to xread()Yiannis Marangos1-1/+1
It is a common mistake to call read(2)/pread(2) and forget to anticipate that they may return error with EAGAIN/EINTR when the system call is interrupted. We have xread() helper to relieve callers of read(2) from having to worry about it; add xpread() helper to do the same for pread(2). Update the caller in the builtin/index-pack.c and the mmap emulation in compat/. Signed-off-by: Yiannis Marangos <yiannis.marangos@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
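The wrapper is essentially a retry loop around pread(2); roughly (a simplified sketch of the wrapper.c helper):

    ssize_t xpread(int fd, void *buf, size_t len, off_t offset)
    {
        ssize_t nr;

        while (1) {
            nr = pread(fd, buf, len, offset);
            if (nr < 0 && (errno == EAGAIN || errno == EINTR))
                continue;      /* interrupted; just retry */
            return nr;
        }
    }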
2014-04-03Merge branch 'nd/index-pack-error-message' into maintJunio C Hamano1-2/+2
* nd/index-pack-error-message: index-pack: report error using the correct variable
2014-03-25Merge branch 'nd/index-pack-error-message'Junio C Hamano1-2/+2
* nd/index-pack-error-message: index-pack: report error using the correct variable
2014-03-17index-pack: report error using the correct variableJunio C Hamano1-2/+2
We feed a string pointer that is potentially NULL to die() when showing the message. Don't. Noticed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-20rename read_replace_refs to check_replace_refsMichael Haggerty1-1/+1
The semantics of this flag was changed in commit e1111cef23 (inline lookup_replace_object() calls), but the flag wasn't renamed at the time to minimize code churn. Rename it now, and add a comment explaining its use. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05replace {pre,suf}fixcmp() with {starts,ends}_with()Christian Couder1-4/+4
Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of prefixcmp() and suffixcmp() with new API functions. The change can be recreated by mechanically applying this:

    $ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
      grep -v strbuf\\.c |
      xargs perl -pi -e '
        s|!prefixcmp\(|starts_with\(|g;
        s|prefixcmp\(|!starts_with\(|g;
        s|!suffixcmp\(|ends_with\(|g;
        s|suffixcmp\(|!ends_with\(|g;
      '

on the result of preparatory changes in this series. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-17Merge branch 'jk/free-tree-buffer'Junio C Hamano1-0/+1
* jk/free-tree-buffer: clear parsed flag when we free tree buffers
2013-06-06clear parsed flag when we free tree buffersJeff King1-0/+1
Many code paths will free a tree object's buffer and set it to NULL after finishing with it in order to keep memory usage down during a traversal. However, out of 8 sites that do this, only one actually unsets the "parsed" flag back. Those sites that don't are setting a trap for later users of the tree object; even after calling parse_tree, the buffer will remain NULL, causing potential segfaults. It is not known whether this is triggerable in the current code. Most commands do not do an in-memory traversal followed by actually using the objects again. However, it does not hurt to be safe for future callers. In most cases, we can abstract this out to a "free_tree_buffer" helper. However, there are two exceptions: 1. The fsck code relies on the parsed flag to know that we were able to parse the object at one point. We can switch this to using a flag in the "flags" field. 2. The index-pack code sets the buffer to NULL but does not free it (it is freed by a caller). We should still unset the parsed flag here, but we cannot use our helper, as we do not want to free the buffer. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
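The helper bundles the related steps so no caller can forget one (a sketch matching the description above):

    void free_tree_buffer(struct tree *tree)
    {
        free(tree->buffer);
        tree->buffer = NULL;
        tree->size = 0;
        /* Without this, a later parse_tree() would see "parsed" set and
         * happily return a tree whose buffer is NULL. */
        tree->object.parsed = 0;
    }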
2013-05-28clone: open a shortcut for connectivity checkNguyễn Thái Ngọc Duy1-9/+26
In order to make sure the cloned repository is good, we run "rev-list --objects --not --all $new_refs" on the repository. This is expensive on large repositories. This patch attempts to mitigate the impact in this special case.

In the "good" clone case, we only have one pack. If all of the following are met, we can be sure that all objects reachable from the new refs exist, which is the intention of running "rev-list ...":

    - all refs point to an object in the pack
    - there are no dangling pointers in any object in the pack
    - no objects in the pack point to objects outside the pack

The second and third checks can be done with the help of index-pack as a slight variation of --strict check (which introduces a new condition for the shortcut: pack transfer must be used and the number of objects large enough to call index-pack). The first is checked in check_everything_connected after we get an "ok" from index-pack. "index-pack + new checks" is still faster than the current "index-pack + rev-list", which is the whole point of this patch. If any of the conditions fail, we fall back to the good old but expensive "rev-list ..". In that case it's even more expensive because we have to pay for the new checks in index-pack. But that should only happen when the other side is either buggy or malicious.

Cloning linux-2.6 over file://

            before        after
    real    3m25.693s     2m53.050s
    user    5m2.037s      4m42.396s
    sys     0m13.750s     0m16.574s

A more realistic test with ssh:// over wireless

            before        after
    real    11m26.629s    10m4.213s
    user    5m43.196s     5m19.444s
    sys     0m35.812s     0m37.630s

This shortcut is not applied to shallow clones, partly because shallow clones should have no more objects than a usual fetch and the cost of rev-list is acceptable, partly to avoid dealing with corner cases when grafting is involved. This shortcut does not apply to unpack-objects code path either because the number of objects must be small in order to trigger that code path. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-28index-pack: remove dead code (it should never happen)Nguyễn Thái Ngọc Duy1-2/+1
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-01Merge branch 'nd/index-pack-threaded-fixes'Junio C Hamano1-8/+26
"index-pack --verify-stat" used a few counters outside protection of mutex, possibly showing incorrect numbers. * nd/index-pack-threaded-fixes: index-pack: guard nr_resolved_deltas reads by lock index-pack: protect deepest_delta in multithread code
2013-04-01Merge branch 'jk/index-pack-correct-depth-fix'Junio C Hamano1-0/+2
"index-pack --fix-thin" used uninitialize value to compute delta depths of objects it appends to the resulting pack. * jk/index-pack-correct-depth-fix: index-pack: always zero-initialize object_entry list
2013-03-20index-pack: always zero-initialize object_entry listJeff King1-0/+2
Commit 38a4556 (index-pack: start learning to emulate "verify-pack -v", 2011-06-03) added a "delta_depth" counter to each "struct object_entry". Initially, all object entries have their depth set to 0; in resolve_delta, we then set the depth of each delta to "base + 1". Base entries never have their depth touched, and remain at 0. To ensure that all depths start at 0, that commit changed calls to xmalloc the object_entry list into calls to xcalloc. However, it forgot that we grow the list with xrealloc later. These extra entries are used when we add an object from elsewhere to complete a thin pack. If we add a non-delta object, its depth value will just be uninitialized heap data. This patch fixes it by zero-initializing entries we add to the objects list via the xrealloc. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-19index-pack: guard nr_resolved_deltas reads by lockThomas Rast1-1/+3
The threaded parts of index-pack increment the number of resolved deltas in nr_resolved_deltas guarded by counter_mutex. However, the per-thread outer loop accessed nr_resolved_deltas without any locks. This is not wrong as such, since it doesn't matter all that much whether we get an outdated value. However, unless someone proves that this one lock makes all the performance difference, it would be much cleaner to guard _all_ accesses to the variable with the lock. The only such use is display_progress() in the threaded section (all others are in the conclude_pack() callchain outside the threaded part). To make it obvious that it cannot deadlock, move it out of work_mutex. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Reviewed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-19index-pack: protect deepest_delta in multithread codeNguyễn Thái Ngọc Duy1-7/+23
deepest_delta is a global variable but is updated without protection in resolve_delta(), a multithreaded function. Add a new mutex for it, but only protect and update when it's actually used (i.e. show_stat is non-zero). Another variable that will not be updated is delta_depth in "struct object_entry" as it's only useful when show_stat is 1. Putting it in "if (show_stat)" makes it clearer. The local variable "stat" is renamed to "show_stat" after moving to global scope because the name "stat" conflicts with stat(2) syscall. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
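A sketch of the guarded update (lock_mutex()/unlock_mutex() are index-pack's own wrappers that do nothing when threading is inactive):

    static pthread_mutex_t deepest_delta_mutex;
    #define deepest_delta_lock()    lock_mutex(&deepest_delta_mutex)
    #define deepest_delta_unlock()  unlock_mutex(&deepest_delta_mutex)

    /* in resolve_delta(), only when statistics were requested: */
    if (show_stat) {
        deepest_delta_lock();
        if (deepest_delta < delta_obj->delta_depth)
            deepest_delta = delta_obj->delta_depth;
        deepest_delta_unlock();
    }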
2013-03-16index-pack: fix buffer overflow caused by translationsNguyễn Thái Ngọc Duy1-4/+5
The translation of "completed with %d local objects" is put in a 48-byte buffer, which may be enough for the English string but not for its translations. Convert it to use strbuf (i.e. no hard limit on translation length). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
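The underlying idea is to size the buffer to the formatted result instead of guessing. Git does this with strbuf; the same effect can be sketched with plain snprintf() (the helper below is illustrative only):

    #include <stdio.h>
    #include <stdlib.h>

    /* Format a translated message into a buffer that grows to fit,
     * rather than truncating it into a fixed 48-byte array. */
    static char *format_completed_msg(const char *translated_fmt, int nr_local)
    {
            int len = snprintf(NULL, 0, translated_fmt, nr_local);
            char *buf = malloc(len + 1);
            if (!buf)
                    abort();
            snprintf(buf, len + 1, translated_fmt, nr_local);
            return buf;
    }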
2012-08-31i18n: mark more index-pack strings for translationNguyễn Thái Ngọc Duy1-10/+11
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-15Merge branch 'jk/index-pack-streaming-fix'Junio C Hamano1-10/+13
The streaming index-pack introduced in 1.7.11 had a data corruption bug, and this should fix it. * jk/index-pack-streaming-fix: index-pack: loop while inflating objects in unpack_data
2012-07-10index-pack: loop while inflating objects in unpack_dataJeff King1-10/+13
When the unpack_data function is given a consume() callback, it unpacks only 64K of the input at a time, feeding it to git_inflate along with a 64K output buffer. However, because we are inflating, there is a good chance that the output buffer will fill before all of the input has been consumed. In this case, we need to loop on git_inflate until we have fed the whole input buffer, feeding each chunk of output to the consume() callback. The current code does not do this, and as a result will fail the loop condition and trigger a fatal "serious inflate inconsistency" error in this case.

While we're rearranging the loop, let's get rid of the extra last_out pointer. It is meant to point to the beginning of the buffer that we feed to git_inflate, but in practice this is always the beginning of our same 64K buffer, because:

 1. At the beginning of the loop, we are feeding the buffer.
 2. At the end of the loop, if we are using a consume() function, we reset git_inflate's pointer to the beginning of the buffer. If we are not using a consume() function, then we do not care about the value of last_out at all.

Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
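The corrected loop structure, sketched against the raw zlib API rather than git's git_inflate wrapper (function and parameter names are illustrative):

    #include <zlib.h>
    #include <stddef.h>

    /* Keep calling inflate() until this input chunk is fully consumed,
     * handing every filled output window to consume(); one inflate()
     * call may well fill the output buffer before the input runs out. */
    static int inflate_chunk(z_stream *stream,
                             unsigned char *in, size_t in_len,
                             unsigned char *out, size_t out_len,
                             int (*consume)(const unsigned char *, size_t))
    {
            stream->next_in = in;
            stream->avail_in = (uInt)in_len;
            do {
                    stream->next_out = out;
                    stream->avail_out = (uInt)out_len;
                    int status = inflate(stream, Z_NO_FLUSH);
                    if (status != Z_OK && status != Z_STREAM_END)
                            return -1;
                    if (consume(out, out_len - stream->avail_out))
                            return -1;
                    if (status == Z_STREAM_END)
                            break;
            } while (stream->avail_in);
            return 0;
    }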
2012-07-09Merge branch 'rj/platform-pread-may-be-thread-unsafe'Junio C Hamano1-2/+2
On Cygwin, the platform pread(3) is not thread safe, just like our own compat/ emulation, and cannot be used in the index-pack program. * rj/platform-pread-may-be-thread-unsafe: index-pack: Disable threading on cygwin
2012-06-26index-pack: Disable threading on cygwinJunio C Hamano1-2/+2
The Cygwin implementation of pread() is not thread-safe since, just like the emulation provided by compat/pread.c, it uses a sequence of seek-read-seek calls. In order to avoid failures due to thread-safety issues, commit b038a61 disables threading when NO_PREAD is defined (i.e. when using the emulation code in compat/pread.c). We introduce a new build variable, NO_THREAD_SAFE_PREAD, which allows us to disable the threaded index-pack code on Cygwin, in addition to the above NO_PREAD case. Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-24index-pack: use streaming interface for collision test on large blobsNguyễn Thái Ngọc Duy1-5/+77
When putting whole objects in core is unavoidable, try to match object type and size first before actually inflating. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-23index-pack: factor out unpack core from get_data_from_packNguyễn Thái Ngọc Duy1-3/+24
This allows the caller to consume a large inflated object with a fixed amount of memory. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-23index-pack: use streaming interface on large blobs (most of the time)Nguyễn Thái Ngọc Duy1-8/+44
unpack_raw_entry() will not allocate and return decompressed blobs if they are larger than core.bigFileThreshold, so sha1_object() cannot check those objects directly because there is no actual content in memory. When sha1_object() is called later on those objects, we can safely use get_data_from_pack() to retrieve the blob content for checking. However, we only do that when we definitely need the blob content, and we often don't.

There are two cases when we may need the object content. The first is when we find an in-repo blob with the same SHA-1: we need to do a collision test, byte by byte, and for that test the blob must be loaded in memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this does not happen because git avoids sending objects that the client already has. The other case is when --strict is specified and the object in question is not a blob, which can't happen in reality because we deal with large _blobs_ here.

Note: running --verify (or git-verify-pack) on a pack from the current repository will trigger the collision test on every object in the pack, which effectively disables this patch. That could easily be worked around by setting GIT_DIR to an imaginary place with no packs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-23index-pack: hash non-delta objects while reading from streamNguyễn Thái Ngọc Duy1-11/+30
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-14Merge branch 'nd/threaded-index-pack'Junio C Hamano1-66/+275
Enables threading in index-pack to resolve base data in parallel. By Nguyễn Thái Ngọc Duy (3) and Ramsay Jones (1) * nd/threaded-index-pack: index-pack: disable threading if NO_PREAD is defined index-pack: support multithreaded delta resolving index-pack: restructure pack processing into three main functions compat/win32/pthread.h: Add an pthread_key_delete() implementation
2012-05-07index-pack: disable threading if NO_PREAD is definedNguyễn Thái Ngọc Duy1-0/+5
NO_PREAD simulates pread() as a sequence of seek, read, seek in compat/pread.c. The simulation is not thread-safe because another thread could move the file offset away in the middle of a pread operation. Do not allow threading in that case. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
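A seek-read-seek emulation in the spirit of compat/pread.c (not its exact code); with a shared descriptor, a second thread can move the offset between any of these calls:

    #include <unistd.h>
    #include <sys/types.h>

    /* Save, move, read, restore the shared file offset -- three
     * lseek/read steps that are not atomic, hence not thread-safe. */
    static ssize_t emulated_pread(int fd, void *buf, size_t count, off_t offset)
    {
            off_t orig = lseek(fd, 0, SEEK_CUR);
            if (orig < 0 || lseek(fd, offset, SEEK_SET) < 0)
                    return -1;
            ssize_t n = read(fd, buf, count);
            if (lseek(fd, orig, SEEK_SET) < 0)
                    return -1;
            return n;
    }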
2012-05-07index-pack: support multithreaded delta resolvingNguyễn Thái Ngọc Duy1-11/+193
This puts delta resolving of each base on a separate thread, with one base cache per thread. Per-thread data is grouped in struct thread_local. When running with nr_threads == 1, no pthread calls are made; the system essentially runs in non-threaded mode. An experiment on a Xeon 24-core machine with git.git shows that performance does not increase proportionally to the number of cores, so by default we use at most 3 cores. Some numbers with --threads from 1 to 16:

    1..4
    real    0m8.003s    0m5.307s    0m4.321s    0m3.830s
    user    0m7.720s    0m8.009s    0m8.133s    0m8.305s
    sys     0m0.224s    0m0.372s    0m0.360s    0m0.360s

    5..8
    real    0m3.727s    0m3.604s    0m3.332s    0m3.369s
    user    0m9.361s    0m9.817s    0m9.525s    0m9.769s
    sys     0m0.584s    0m0.624s    0m0.540s    0m0.560s

    9..12
    real    0m3.036s    0m3.139s    0m3.177s    0m2.961s
    user    0m8.977s    0m10.205s   0m9.737s    0m10.073s
    sys     0m0.596s    0m0.680s    0m0.684s    0m0.680s

    13..16
    real    0m2.985s    0m2.894s    0m2.975s    0m2.971s
    user    0m9.825s    0m10.573s   0m10.833s   0m11.361s
    sys     0m0.788s    0m0.732s    0m0.904s    0m1.016s

On an Intel dual core and linux-2.6.git:

    1..4
    real    2m37.789s   2m7.963s    2m0.920s    1m58.213s
    user    2m28.415s   2m52.325s   2m50.176s   2m41.187s
    sys     0m7.808s    0m11.181s   0m11.224s   0m10.731s

Thanks to Ramsay Jones for troubleshooting and support on the MinGW platform. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
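A sketch of the dispatch described above; the struct and function names are illustrative, and the real code keeps much more per-thread state:

    #include <pthread.h>

    struct thread_data {
            pthread_t thread;
            /* per-thread base cache, input position, scratch buffers, ... */
    };

    static int nr_threads = 3;  /* returns diminish past ~3 cores, per the numbers above */

    static void resolve_deltas_threaded(void *(*worker)(void *),
                                        struct thread_data *data)
    {
            int i;
            if (nr_threads <= 1) {
                    /* --threads=1: run the worker inline, no pthread calls at all */
                    worker(&data[0]);
                    return;
            }
            for (i = 0; i < nr_threads; i++)
                    pthread_create(&data[i].thread, NULL, worker, &data[i]);
            for (i = 0; i < nr_threads; i++)
                    pthread_join(data[i].thread, NULL);
    }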
2012-05-07index-pack: restructure pack processing into three main functionsNguyễn Thái Ngọc Duy1-53/+75
The second pass in parse_pack_objects() is split out into resolve_deltas(). The final phase, fixing a thin pack or just sealing the pack, is now in the conclude_pack() function. Main pack processing is now a sequence of these functions:

 - parse_pack_objects() reads through the input pack
 - resolve_deltas() makes sure all deltas can be resolved
 - conclude_pack() seals the output pack
 - write_idx_file() writes the companion index file
 - final() moves the pack/index to their proper place

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-24i18n: index-pack: mark strings for translationNguyễn Thái Ngọc Duy1-57/+68
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-16index-pack: eliminate unlimited recursion in get_base_data()Nguyễn Thái Ngọc Duy1-9/+44
Reverse the order of delta application so that by the time a delta is applied, its base is either a non-delta or already inflated. get_base_data() is still recursive, but because the base's data is always ready, the inner get_base_data() call never has any chance to call itself again. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-16index-pack: eliminate recursion in find_unresolved_deltasNguyễn Thái Ngọc Duy1-41/+70
The current find_unresolved_deltas() links all bases together in the form of a tree, using struct base_data, with a prev_base pointer pointing to the parent node. It then traverses down from parent to children in a recursive manner, with all base_data allocated on the stack. To eliminate the recursion, we simply need to put them all on the heap (in parse_pack_objects and fix_unresolved_deltas). After that, it's a simple non-recursive depth-first traversal loop (see the sketch after this message). Each node also maintains its own state (ofs and ref indices) to iterate over all of its children. So we process one node:

 - if it returns a new (child) node (a parent base), we link it into our tree, then process the new node.
 - if it returns nothing, the node is done: free it, go back to the parent node and resume whatever it was doing.

and keep going until we have no nodes left to process. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
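The loop described above, in generic form; struct base_node and next_child() are stand-ins for git's struct base_data machinery, not its actual API:

    #include <stdlib.h>

    struct base_node {
            struct base_node *parent;       /* plays the role of prev_base */
            int ofs_index, ref_index;       /* per-node iteration state    */
            /* ... base object data ... */
    };

    /* Hypothetical helper: return the next unresolved child of "node",
     * or NULL once the node has no more work. */
    struct base_node *next_child(struct base_node *node);

    static void walk_bases(struct base_node *root)
    {
            struct base_node *node = root;
            while (node) {
                    struct base_node *child = next_child(node);
                    if (child) {
                            child->parent = node;
                            node = child;                   /* descend        */
                    } else {
                            struct base_node *parent = node->parent;
                            free(node);                     /* node is done   */
                            node = parent;                  /* resume parent  */
                    }
            }
    }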
2011-12-21Appease Sun Studio by renaming "tmpfile"Ævar Arnfjörð Bjarmason1-3/+3
On Solaris the system headers define the "tmpfile" name, which'll cause Git compiled with Sun Studio 12 Update 1 to whine about us redefining the name: "pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) Just renaming the "tmpfile" variable to "tmp_file" in the relevant places is the easiest way to fix this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-16receive-pack, fetch-pack: reject bogus pack that records objects twiceJunio C Hamano1-1/+3
When receive-pack & fetch-pack are run and store the pack obtained over the wire to a local repository, they internally run the index-pack command with the --strict option. Make sure that we reject incoming packfile that records objects twice to avoid spreading such a damage. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-19Merge branch 'jc/index-pack'Junio C Hamano1-42/+208
* jc/index-pack: verify-pack: use index-pack --verify index-pack: show histogram when emulating "verify-pack -v" index-pack: start learning to emulate "verify-pack -v" index-pack: a miniscule refactor index-pack --verify: read anomalous offsets from v2 idx file write_idx_file: need_large_offset() helper function index-pack: --verify write_idx_file: introduce a struct to hold idx customization options index-pack: group the delta-base array entries also by type Conflicts: builtin/verify-pack.c cache.h sha1_file.c
2011-07-19Merge branch 'jc/zlib-wrap'Junio C Hamano1-6/+6
* jc/zlib-wrap: zlib: allow feeding more than 4GB in one go zlib: zlib can only process 4GB at a time zlib: wrap deflateBound() too zlib: wrap deflate side of the API zlib: wrap inflateInit2 used to accept only for gzip format zlib: wrap remaining calls to direct inflate/inflateEnd zlib wrapper: refactor error message formatter Conflicts: sha1_file.c
2011-06-10zlib: zlib can only process 4GB at a timeJunio C Hamano1-3/+3
The size of objects we read from the repository and data we try to put into the repository are represented in "unsigned long", so that on larger architectures we can handle objects that weigh more than 4GB. But the interface defined in zlib.h to communicate with inflate/deflate limits avail_in (how many bytes of input are we calling zlib with) and avail_out (how many bytes of output from zlib are we ready to accept) fields effectively to 4GB by defining their type to be uInt. In many places in our code, we allocate a large buffer (e.g. mmap'ing a large loose object file) and tell zlib its size by assigning the size to avail_in field of the stream, but that will truncate the high octets of the real size. The worst part of this story is that we often pass around z_stream (the state object used by zlib) to keep track of the number of used bytes in input/output buffer by inspecting these two fields, which practically limits our callchain to the same 4GB limit. Wrap z_stream in another structure git_zstream that can express avail_in and avail_out in unsigned long. For now, just die() when the caller gives a size that cannot be given to a single zlib call. In later patches in the series, we would make git_inflate() and git_deflate() internally loop to give callers an illusion that our "improved" version of zlib interface can operate on a buffer larger than 4GB in one go. Signed-off-by: Junio C Hamano <gitster@pobox.com>
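The shape of the wrapper, sketched; the names below are illustrative (the real type is called git_zstream), and the hard failure mirrors the "for now, just die()" behaviour described above:

    #include <stdlib.h>
    #include <zlib.h>

    /* Track sizes in unsigned long; the raw z_stream stays inside. */
    struct git_zstream_sketch {
            z_stream z;
            unsigned long avail_in;
            unsigned long avail_out;
            unsigned long total_in;
            unsigned long total_out;
    };

    /* Refuse anything a single zlib call cannot express in its uInt
     * fields; later patches loop instead of giving up. */
    static uInt zlib_buf_cap_sketch(unsigned long len)
    {
            if (len > (uInt)-1)
                    abort();        /* the real code die()s with a message */
            return (uInt)len;
    }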
2011-06-10zlib: wrap deflate side of the APIJunio C Hamano1-3/+3
Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use of deflateInit2 in remote-curl.c to tell the library to use gzip header and trailer in git_deflate_init_gzip(). There is only one caller that cares about the status from deflateEnd(). Introduce git_deflate_end_gently() to let that sole caller retrieve the status and act on it (i.e. die) for now, but we would probably want to make inflate_end/deflate_end die when they ran out of memory and get rid of the _gently() kind. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-05index-pack: show histogram when emulating "verify-pack -v"Junio C Hamano1-3/+23
The histogram produced by "verify-pack -v" always had an artificial limit of 50, but index-pack knows what the maximum delta depth is, so we do not have to limit it. Signed-off-by: Junio C Hamano <gitster@pobox.com>
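A sketch of such an unbounded histogram; the function and output format are illustrative, not verify-pack's exact output:

    #include <stdio.h>
    #include <stdlib.h>

    /* Size the histogram from the observed maximum depth instead of
     * capping it at an arbitrary 50 bins. */
    static void show_depth_histogram(const int *depth, int nr_objects)
    {
            int i, max_depth = 0;
            unsigned long *bins;

            for (i = 0; i < nr_objects; i++)
                    if (depth[i] > max_depth)
                            max_depth = depth[i];
            bins = calloc(max_depth + 1, sizeof(*bins));
            if (!bins)
                    abort();
            for (i = 0; i < nr_objects; i++)
                    bins[depth[i]]++;
            for (i = 0; i <= max_depth; i++)
                    if (bins[i])
                            printf("chain length = %d: %lu object(s)\n", i, bins[i]);
            free(bins);
    }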
2011-06-05index-pack: start learning to emulate "verify-pack -v"Junio C Hamano1-3/+40
The "index-pack" machinery already has almost enough knowledge to produce the same output as "verify-pack -v". Fill small gaps in its bookkeeping, and teach it to show what it knows. Add a few more command line options that do not have to be advertised to the end users. They will be used internally when verify-pack calls this. The eventual goal is to remove verify-pack implementation and redo it as a thin wrapper around the index-pack, so that we can remove the rather expensive packed_object_info_detail() API. This still does not do the delta-chain-depth histogram yet but that part is easy. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-05index-pack: a miniscule refactorJunio C Hamano1-3/+8
Introduce a helper function that takes the type of an object and tells whether it is a delta, as we seem to use this check in many places. Signed-off-by: Junio C Hamano <gitster@pobox.com>
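The helper presumably looks something like this, sketched against the OBJ_* constants and enum object_type that cache.h provides inside git's tree:

    #include "cache.h"      /* enum object_type, OBJ_REF_DELTA, OBJ_OFS_DELTA */

    /* One place that answers "is this a delta?" instead of repeating
     * the two-way comparison at every call site. */
    static inline int is_delta_type(enum object_type type)
    {
            return type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA;
    }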
2011-04-11sparse: Fix an "symbol 'cmd_index_pack' not declared" warningRamsay Jones1-1/+1
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-03sparse: Fix errors and silence warningsStephen Boyd1-1/+1
 * load_file() returns a void pointer but is using 0 for the return value
 * builtin/receive-pack.c forgot to include builtin.h
 * packet_trace_prefix can be marked static
 * ll_merge takes a pointer for its last argument, not an int
 * crc32 expects a pointer as the second argument but Z_NULL is defined to be 0 (see 38f4d13 sparse fix: Using plain integer as NULL pointer, 2006-11-18 for more info)

Signed-off-by: Stephen Boyd <bebarino@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-22Fix sparse warningsStephen Boyd1-1/+1
Fix warnings from 'make check'.

 - These files don't include 'builtin.h', causing sparse to complain that cmd_* isn't declared: builtin/clone.c:364, builtin/fetch-pack.c:797, builtin/fmt-merge-msg.c:34, builtin/hash-object.c:78, builtin/merge-index.c:69, builtin/merge-recursive.c:22, builtin/merge-tree.c:341, builtin/mktag.c:156, builtin/notes.c:426, builtin/notes.c:822, builtin/pack-redundant.c:596, builtin/pack-refs.c:10, builtin/patch-id.c:60, builtin/patch-id.c:149, builtin/remote.c:1512, builtin/remote-ext.c:240, builtin/remote-fd.c:53, builtin/reset.c:236, builtin/send-pack.c:384, builtin/unpack-file.c:25, builtin/var.c:75

 - These files have symbols which should be marked static since they're only file scope: submodule.c:12, diff.c:631, replace_object.c:92, submodule.c:13, submodule.c:14, trace.c:78, transport.c:195, transport-helper.c:79, unpack-trees.c:19, url.c:3, url.c:18, url.c:104, url.c:117, url.c:123, url.c:129, url.c:136, thread-utils.c:21, thread-utils.c:48

 - These files redeclare symbols to be different types: builtin/index-pack.c:210, parse-options.c:564, parse-options.c:571, usage.c:49, usage.c:58, usage.c:63, usage.c:72

 - These files use a literal integer 0 when they really should use a NULL pointer: daemon.c:663, fast-import.c:2942, imap-send.c:1072, notes-merge.c:362

While we're in the area, clean up some unused #includes in builtin files (mostly exec_cmd.h). Signed-off-by: Stephen Boyd <bebarino@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-16standardize brace placement in struct definitionsJonathan Nieder1-4/+2
In struct definitions, unlike functions, the prevailing style is for the opening brace to go on the same line as the struct name, like so:

    struct foo {
            int bar;
            char *baz;
    };

Indeed, grepping for 'struct [a-z_]* {$' yields about 5 times as many matches as 'struct [a-z_]*$'. Linus sayeth: Heretic people all over the world have claimed that this inconsistency is ... well ... inconsistent, but all right-thinking people know that (a) K&R are _right_ and (b) K&R are right. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-27index-pack --verify: read anomalous offsets from v2 idx fileJunio C Hamano1-0/+48
A pack v2 .idx file usually records an offset using the 64-bit representation only when the offset does not fit within 31 bits, but you can handcraft your .idx file to record a smaller offset using 64-bit, storing all zeroes in the upper 4 bytes. When running "index-pack --verify", inspect the original idx file and encode such low offsets, which do not need to be in 64-bit, using 64-bit just like the original idx file did, so that we can still validate the pack/idx pair by comparing the recomputed idx file with the original. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-27index-pack: --verifyJunio C Hamano1-6/+40
Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-indexes the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in SHA-1 order, index-pack efficiently exploits the delta chain to avoid rebuilding the objects that are used as the bases of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version, however, cannot verify a .pack/.idx pair with a handcrafted v2 index that uses the 64-bit offset representation for offsets that would fit within 31 bits. You can create such an .idx file by giving a custom offset to the --index-version option of the command. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-27write_idx_file: introduce a struct to hold idx customization optionsJunio C Hamano1-10/+13
Remove two globals, pack_idx_default_version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com>
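Roughly the shape of the new interface; the field names follow the message, while the defaults and the _sketch suffix are illustrative rather than the exact code in pack.h:

    #include <stdint.h>

    struct pack_idx_option_sketch {
            uint32_t version;       /* replaces pack_idx_default_version */
            uint32_t off32_limit;   /* replaces pack_idx_off32_limit     */
    };

    /* Callers fill one of these in and hand it to write_idx_file()
     * instead of poking at globals. */
    static void reset_pack_idx_option_sketch(struct pack_idx_option_sketch *opts)
    {
            opts->version = 2;              /* usual default idx version         */
            opts->off32_limit = 0x7fffffff; /* offsets above this go 64-bit      */
    }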
2011-02-27index-pack: group the delta-base array entries also by typeJunio C Hamano1-21/+40
Entries in the delta_base array are grouped only by the byte pattern in the delta_base union, some of which hold the 20-byte object name of the base object (i.e. the base for REF_DELTA objects), while others hold sizeof(off_t) bytes followed by enough NULs to fill 20 bytes. The loops that iterate through a range inside this array still need to inspect the type of the delta and skip over false hits. Group the entries also by type to eliminate the potential for false hits. Signed-off-by: Junio C Hamano <gitster@pobox.com>