aboutsummaryrefslogtreecommitdiffstats
path: root/t/helper/test-read-midx.c
AgeCommit message (Collapse)AuthorFilesLines
2025-08-11midx: stop duplicating info redundant with its owning sourcePatrick Steinhardt1-1/+1
Multi-pack indices store some information that is redundant with their owning source: - The locality bit that tracks whether the source is the primary object source or an alternate. - The object directory path the multi-pack index is located in. - The pointer to the owning parent directory. All of this information is already contained in `struct odb_source`. So now that we always have that struct available when loading a multi-pack index we have it readily accessible. Drop the redundant information and instead store a pointer to the object source. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-11midx: load multi-pack indices via their sourcePatrick Steinhardt1-10/+15
To load a multi-pack index the caller is expected to pass both the repository and the object directory where the multi-pack index is located. While this works, this layout has a couple of downsides: - We need to pass in information reduntant with the owning source, namely its object directory and whether the source is local or not. - We don't have access to the source when loading the multi-pack index. If we had that access, we could store a pointer to the owning source in the MIDX and thus deduplicate some information. - Multi-pack indices are inherently specific to the object source and its format. With the goal of pluggable object backends in mind we will eventually want the backends to own the logic of reading and writing multi-pack indices. Making the logic work on top of object sources is a step into that direction. Refactor loading of multi-pack indices accordingly. This surfaces one small problem though: git-multi-pack-index(1) and our MIDX test helper both know to read and write multi-pack-indices located in a different object directory. This issue is addressed by adding the user-provided object directory as an in-memory alternate. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-11midx: drop redundant `struct repository` parameterPatrick Steinhardt1-2/+2
There are a couple of functions that take both a `struct repository` and a `struct multi_pack_index`. This provides redundant information though without much benefit given that the multi-pack index already has a pointer to its owning repository. Drop the `struct repository` parameter from such functions. While at it, reorder the list of parameters of `fill_midx_entry()` so that the MIDX comes first to better align with our coding guidelines. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01object-store: rename files to "odb.{c,h}"Patrick Steinhardt1-1/+1
In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15object-store: merge "object-store-ll.h" and "object-store.h"Patrick Steinhardt1-1/+1
The "object-store-ll.h" header has been introduced to keep transitive header dependendcies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04midx: pass `repository` to `load_multi_pack_index`Karthik Nayak1-4/+4
The `load_multi_pack_index` function in midx uses `the_repository` variable to access the `repository` struct. Modify the function and its callee's to send the `repository` field. This moves usage of `the_repository` to the `test-read-midx.c` file. While that is not optimal, it is okay, since the upcoming commits will slowly move the usage of `the_repository` up the layers and remove it eventually. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22t/helper: fix leaking multi-pack-indices in "read-midx"Patrick Steinhardt1-1/+7
Several of the subcommands of `test-helper read-midx` do not close the MIDX that they have opened, leading to memory leaks. Fix those. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-06midx: support reading incremental MIDX chainsTaylor Blau1-7/+17
Now that the MIDX machinery's internals have been taught to understand incremental MIDXs over the previous handful of commits, the MIDX machinery itself can begin reading incremental MIDXs. (Note that while the on-disk format for incremental MIDXs has been defined, the writing end has not been implemented. This will take place in the commit after next.) The core of this change involves following the order specified in the MIDX chain in reverse and opening up MIDXs in the chain one-by-one, adding them to the previous layer's `->base_midx` pointer at each step. In order to implement this, the `load_multi_pack_index()` function is taught to call a new `load_multi_pack_index_chain()` function if loading a non-incremental MIDX failed via `load_multi_pack_index_one()`. When loading a MIDX chain, `load_midx_chain_fd_st()` reads each line in the file one-by-one and dispatches calls to `load_multi_pack_index_one()` to read each layer of the MIDX chain. When a layer was successfully read, it is added to the MIDX chain by calling `add_midx_to_chain()` which validates the contents of the `BASE` chunk, performs some bounds checks on the number of combined packs and objects, and attaches the new MIDX by assigning its `base_midx` pointer to the existing part of the chain. As a supplement to this, introduce a new mode in the test-read-midx test-tool which allows us to read the information for a specific MIDX in the chain by specifying its trailing checksum via the command-line arguments like so: $ test-tool read-midx .git/objects [checksum] Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14global: introduce `USE_THE_REPOSITORY_VARIABLE` macroPatrick Steinhardt1-0/+2
Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14midx: implement `midx_preferred_pack()`Taylor Blau1-8/+5
When performing a binary search over the objects in a MIDX's bitmap (i.e. in pseudo-pack order), the reader reconstructs the pseudo-pack ordering using a combination of (a) the preferred pack, (b) the pack's lexical position in the MIDX based on pack names, and (c) the object offset within the pack. In order to perform this binary search, the reader must know the identity of the preferred pack. This could be stored in the MIDX, but isn't for historical reasons, mostly because it can easily be inferred at read-time by looking at the object in the first bit position and finding out which pack it was selected from in the MIDX, like so: nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0)); In midx_to_pack_pos() which performs this binary search, we look up the identity of the preferred pack before each search. This is relatively quick, since it involves two table-driven lookups (one in the MIDX's revindex for `pack_pos_to_midx()`, and another in the MIDX's object table for `nth_midxed_pack_int_id()`). But since the preferred pack does not change after the MIDX is written, it is safe to cache this value on the MIDX itself. Write a helper to do just that, and rewrite all of the existing call-sites that care about the identity of the preferred pack in terms of this new helper. This will prepare us for a subsequent patch where we will need to binary search through the MIDX's pseudo-pack order multiple times. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14midx: implement `BTMP` chunkTaylor Blau1-1/+29
When a multi-pack bitmap is used to implement verbatim pack reuse (that is, when verbatim chunks from an on-disk packfile are copied directly[^1]), it does so by using its "preferred pack" as the source for pack-reuse. This allows repositories to pack the majority of their objects into a single (often large) pack, and then use it as the single source for verbatim pack reuse. This increases the amount of objects that are reused verbatim (and consequently, decrease the amount of time it takes to generate many packs). But this performance comes at a cost, which is that the preferred packfile must pace its growth with that of the entire repository in order to maintain the utility of verbatim pack reuse. As repositories grow beyond what we can reasonably store in a single packfile, the utility of verbatim pack reuse diminishes. Or, at the very least, it becomes increasingly more expensive to maintain as the pack grows larger and larger. It would be beneficial to be able to perform this same optimization over multiple packs, provided some modest constraints (most importantly, that the set of packs eligible for verbatim reuse are disjoint with respect to the subset of their objects being sent). If we assume that the packs which we treat as candidates for verbatim reuse are disjoint with respect to any of their objects we may output, we need to make only modest modifications to the verbatim pack-reuse code itself. Most notably, we need to remove the assumption that the bits in the reachability bitmap corresponding to objects from the single reuse pack begin at the first bit position. Future patches will unwind these assumptions and reimplement their existing functionality as special cases of the more general assumptions (e.g. that reuse bits can start anywhere within the bitset, but happen to start at 0 for all existing cases). This patch does not yet relax any of those assumptions. Instead, it implements a foundational data-structure, the "Bitampped Packs" (`BTMP`) chunk of the multi-pack index. The `BTMP` chunk's contents are described in detail here. Importantly, the `BTMP` chunk contains information to map regions of a multi-pack index's reachability bitmap to the packs whose objects they represent. For now, this chunk is only written, not read (outside of the test-tool used in this patch to test the new chunk's behavior). Future patches will begin to make use of this new chunk. [^1]: Modulo patching any `OFS_DELTA`'s that cross over a region of the pack that wasn't used verbatim. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21object-store-ll.h: split this header out of object-store.hElijah Newren1-1/+1
The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store | sort | uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24treewide: remove cache.h inclusion due to previous changesElijah Newren1-1/+0
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24packfile.h: move pack_window and pack_entry from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21setup.h: move declarations for setup.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-27t/helper/test-read-midx.c: free MIDX within read_midx_file()Taylor Blau1-1/+2
When calling `read_midx_file()` to show information about a MIDX or list the objects contained within it we fail to call `close_midx()`, leaking the memory allocated to store that MIDX. Fix this by calling `close_midx()` before exiting the function. We can drop the "early" return when `show_objects` is non-zero, since the next instruction is also a return. (We could just as easily put a `cleanup` label here as with previous patches. But the only other time we terminate the function early is when we fail to load a MIDX in the first place. `close_midx()` does handle a NULL argument, but the extra complexity is likely not warranted). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-07test-read-midx: fix leak of bitmap_index structJeff King1-2/+6
In read_midx_preferred_pack(), we open the bitmap index but never free it. This isn't a big deal since this is just a test helper, and we exit immediately after, but since we're trying to keep our leak-checking tidy now, it's worth fixing. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-28builtin/repack.c: make largest pack preferredTaylor Blau1-1/+24
When repacking into a geometric series and writing a multi-pack bitmap, it is beneficial to have the largest resulting pack be the preferred object source in the bitmap's MIDX, since selecting the large packs can lead to fewer broken delta chains and better compression. Teach 'git repack' to identify this pack and pass it to the MIDX write machinery in order to mark it as preferred. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01t/helper/test-read-midx.c: add --checksum modeTaylor Blau1-1/+15
Subsequent tests will want to check for the existence of a multi-pack bitmap which matches the multi-pack-index stored in the pack directory. The multi-pack bitmap includes the hex checksum of the MIDX it corresponds to in its filename (for example, '$packdir/multi-pack-index-<checksum>.bitmap'). As a result, some tests want a way to learn what '<checksum>' is. This helper addresses that need by printing the checksum of the repository's multi-pack-index. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30t/helper/test-read-midx.c: add '--show-objects'Taylor Blau1-4/+20
The 'read-midx' helper is used in places like t5319 to display basic information about a multi-pack-index. In the next patch, the MIDX writing machinery will learn a new way to choose from which pack an object is selected when multiple copies of that object exist. To disambiguate which pack introduces an object so that this feature can be tested, add a '--show-objects' option which displays additional information about each object in the MIDX. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-17multi-pack-index: use hash version byteDerrick Stolee1-2/+6
Similar to the commit-graph format, the multi-pack-index format has a byte in the header intended to track the hash version used to write the file. This allows one to interpret the hash length without having the context of the repository config specifying the hash length. This was not modified as part of the SHA-256 work because the hash length was automatically up-shifted due to that config. Since we have this byte available, we can make the file formats more obviously incompatible instead of relying on other context from the repository. Add a new oid_version() method in midx.c similar to the one in commit-graph.c. This is specifically made separate from that implementation to avoid artificially linking the formats. The test impact requires a few more things than the corresponding change in the commit-graph format. Specifically, 'test-tool read-midx' was not writing anything about this header value to output. Since the value available in 'struct multi_pack_index' is hash_len instead of a version value, we output "20" or "32" instead of "1" or "2". Since we want a user to not have their Git commands fail if their multi-pack-index has the incorrect hash version compared to the repository's hash version, we relax the die() to an error() in load_multi_pack_index(). This has some effect on 'git multi-pack-index verify' as we need to check that a failed parse of a file that exists is actually a verify error. For that test that checks the hash version matches, we change the corrupted byte from "2" to "3" to ensure the test fails for both hash algorithms. Helped-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20multi-pack-index: store local propertyDerrick Stolee1-1/+1
A pack-file is 'local' if it is stored within the usual object directory. If it is stored in an alternate, it is non-local. Pack-files are stored using a 'pack_local' member in the packed_git struct. Add a similar 'local' member to the multi_pack_index struct and 'local' parameters to the methods that load and prepare multi- pack-indexes. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20midx: write object offsetsDerrick Stolee1-0/+4
The final pair of chunks for the multi-pack-index file stores the object offsets. We default to using 32-bit offsets as in the pack-index version 1 format, but if there exists an offset larger than 32-bits, we use a trick similar to the pack-index version 2 format by storing all offsets at least 2^31 in a 64-bit table; we use the 32-bit table to point into that 64-bit table as necessary. We only store these 64-bit offsets if necessary, so create a test that manipulates a version 2 pack-index to fake a large offset. This allows us to test that the large offset table is created, but the data does not match the actual packfile offsets. The multi-pack-index offset does match the (corrupted) pack-index offset, so a future feature will compare these offsets during a 'verify' step. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20midx: write object id fanout chunkDerrick Stolee1-1/+3
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20midx: write object ids in a chunkDerrick Stolee1-0/+2
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20midx: read pack names into arrayDerrick Stolee1-0/+5
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20multi-pack-index: write pack names in chunkDerrick Stolee1-0/+7
The multi-pack-index needs to track which packfiles it indexes. Store these in our first required chunk. Since filenames are not well structured, add padding to keep good alignment in later chunks. Modify the 'git multi-pack-index read' subcommand to output the existence of the pack-file name chunk. Modify t5319-multi-pack-index.sh to reflect this new output and the new expected number of chunks. Defense in depth: A pattern we are using in the multi-pack-index feature is to verify the data as we write it. We want to ensure we never write invalid data to the multi-pack-index. There are many checks that verify that the values we are writing fit the format definitions. This mainly helps developers while working on the feature, but it can also identify issues that only appear when dealing with very large data sets. These large sets are hard to encode into test cases. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20multi-pack-index: load into memoryDerrick Stolee1-0/+31
Create a new multi_pack_index struct for loading multi-pack-indexes into memory. Create a test-tool builtin for reading basic information about that multi-pack-index to verify the correct data is written. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>