| Age | Commit message (Collapse) | Author | Files | Lines |
|
The logic to close an object database is currently contained in the
packfile subsystem. That choice is somewhat relatable, as most of the
logic really is to close resources associated with the packfile store
itself. But we also end up handling object sources and commit graphs,
which certainly is not related to packfiles.
Move the function into the object database subsystem and rename it to
`odb_close()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Over the past several dozen commits, we have moved a large amount of
functionality out of the repack builtin and into other files like
repack.c, repack-cruft.c, repack-filtered.c, repack-midx.c, and
repack-promisor.c.
These files specify the minimal set of `#include`s that they need to
compile successfully, but we did not change the set of `#include`s in
the repack builtin itself.
Now that the code movement is complete, let's clean up that set of
`#include`s and trim down the builtin to include the minimal amount of
external headers necessary to compile.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In an identical fashion as the previous commit, move the function
`write_cruft_pack()` into its own compilation unit, and make the
function visible through the repack.h API.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In a similar fashion as in previous commits, move the function
`write_filtered_pack()` out of the builtin and into its own compilation
unit.
This function is now part of the repack.h API, but implemented in its
own "repack-filtered.c" unit as it is a separate component from other
kinds of repacking operations.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "pack_kept_objects" variable is defined as static to the repack
builtin, but is inherently related to the pack-objects arguments that
the builtin uses when generating new packs.
Move that field into the "struct pack_objects_args", and shuffle around
where we append the corresponding command-line option when preparing a
pack-objects process. Specifically:
- `write_cruft_pack()` always wants to pass "--honor-pack-keep", so
explicitly set the `pack_kept_objects` field to "0" when initializing
the `write_pack_opts` struct before calling `write_cruft_pack()`.
- `write_filtered_pack()` no longer needs to handle writing the
command-line option "--honor-pack-keep" when preparing a pack-objects
process, since its call to `prepare_pack_objects()` will have already
taken care of that.
`write_filtered_pack()` also reads the `pack_kept_objects` field to
determine whether to write the existing kept packs with a leading "^"
character, so update that to read through the `po_args` pointer
instead.
- `cmd_repack()` also no longer has to write the "--honor-pack-keep"
flag explicitly, since this is also handled via its call to
`prepare_pack_objects()`.
Since there is a default value for "pack_kept_objects" that relies on
whether or not we are writing a bitmap (and not writing a MIDX), extract
a default initializer for `struct pack_objects_args` that keeps this
conditional default behavior.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In a similar spirit as the previous commit(s), now that the function
`finish_pack_objects_cmd()` has no explicit dependencies within the
repack builtin, let's extract it.
This prepares us to extract the remaining two functions within the
repack builtin that explicitly write packfiles, which are
`write_cruft_pack()` and `write_filtered_pack()`, which will be done in
the future commits.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
To prepare to move the `finish_pack_objects_cmd()` function out of the
builtin and into the repack.h API, there are a couple of things we need
to do first:
- First, let's take advantage of `write_pack_opts_is_local()` function
introduced in the previous commit instead of passing "local"
explicitly.
- Let's also avoid referring to the static 'packtmp' field within
builtin/repack.c by instead accessing it through the write_pack_opts
argument.
There are three callers which need to adjust themselves in order to
account for this change. The callers which reside in write_cruft_pack()
and write_filtered_pack() both already have an "opts" in scope, so they
can pass it through transparently.
The other call (at the bottom of `cmd_repack()`) needs to initialize its
own write_pack_opts to pass the necessary fields over to the direct call
to `finish_pack_objects_cmd()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Similar to the previous commit, the functions `write_cruft_pack()` and
`write_filtered_pack()` both compute a "local" variable via the exact
same mechanism:
const char *scratch;
int local = skip_prefix(opts->destination, opts->packdir, &scratch);
Not only does this cause us to repeat the same pair of lines, it also
introduces an unnecessary "scratch" variable that is common between both
functions.
Instead of repeating ourselves, let's extract that functionality into a
new function in the repack.h API called "write_pack_opts_is_local()".
That function takes a pointer to a "struct write_pack_opts" (which has
as fields both "destination" and "packdir"), and can encapsulate the
dangling "scratch" field.
Extract that function and make it visible within the repack.h API, and
use it within both `write_cruft_pack()` and `write_filtered_pack()`.
While we're at it, match our modern conventions by returning a "bool"
instead of "int", and use `starts_with()` instead of `skip_prefix()` to
avoid storing the dummy "scratch" variable.
The remaining duplication (that is, that both `write_cruft_pack()` and
`write_filtered_pack()` still both call `write_pack_opts_is_local()`)
will be addressed in the following commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Both callers within the repack builtin which call functions that take a
'write_pack_opts' structure have the following pattern:
struct write_pack_opts opts = {
.packdir = packdir,
.packtmp = packtmp,
.pack_prefix = find_pack_prefix(packdir, packtmp),
/* ... */
};
int ret = write_some_kind_of_pack(&opts, /* ... */);
, but both "packdir" and "packtmp" are fields within the write_pack_opts
struct itself!
Instead of also computing the pack_prefix ahead of time, let's have the
callees compute it themselves by moving `find_pack_prefix()` out of the
repack builtin, and have it take a write_pack_opts pointer instead of
the "packdir" and "packtmp" fields directly.
This avoids the callers having to do some prep work that is common
between the two of them, but also avoids the potential pitfall of
accidentally writing:
.pack_prefix = find_pack_prefix(packtmp, packdir),
(which is well-typed) when the caller meant to instead write:
.pack_prefix = find_pack_prefix(packdir, packtmp),
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Similar to the changes made in the previous commit to
`write_filtered_pack()`, teach `write_cruft_pack()` to take a
`write_pack_opts` struct and use that where possible.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are various functions within the 'repack' builtin which are
responsible for writing different kinds of packs. They include:
- `static int write_filtered_pack(...)`
- `static int write_cruft_pack(...)`
as well as the function `finish_pack_objects_cmd()`, which is
responsible for finalizing a new pack write, and recording the checksum
of its contents in the 'names' list.
Both of these `write_` functions have a few things in common. They both
take a pointer to the 'pack_objects_args' struct, as well as a pair of
character pointers for `destination` and `pack_prefix`.
Instead of repeating those arguments for each function, let's extract an
options struct called "write_pack_opts" which has these three parameters
as member fields. While we're at it, add fields for "packdir," and
"packtmp", both of which are static variables within the builtin, and
need to be read from within these two functions.
This will shorten the list of parameters that callers have to provide to
`write_filtered_pack()`, avoid ambiguity when passing multiple variables
of the same type, and provide a unified interface for the two functions
mentioned earlier.
(Note that "pack_prefix" can be derived on the fly as a function of
"packdir" and "packtmp", making it unnecessary to store "pack_prefix"
explicitly. This commit ignores that potential cleanup in the name of
doing as few things as possible, but a later commit will make that
change.)
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Now that we have sufficiently cleaned up the write_midx_included_packs()
function, we can move it (along with the struct repack_write_midx_opts)
out of the builtin, and into the repack.h header.
Since this function (and the static ones that it depends on) are
MIDX-specific details of the repacking process, move them to the
repack-midx.c compilation unit instead of the general repack.c one.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
To write a MIDX at the end of a repack operation, 'git repack' presently
computes the set of packs to write into the MIDX, before invoking
`write_midx_included_packs()` with a `string_list` containing those
packs.
The logic for computing which packs are supposed to appear in the
resulting MIDX is within `midx_included_packs()`, where it is aware of
details like which cruft pack(s) were written/combined, if/how we did a
geometric repack, etc.
Computing this list ourselves before providing it to the sole function
to make use of that list `write_midx_included_packs()` is somewhat
awkward. In the future, repack will learn how to write incremental
MIDXs, which will use a very different pack selection routine.
Instead of doing something like:
struct string_list included_packs = STRING_LIST_INIT_DUP;
if (incremental) {
midx_incremental_included_packs(&included_packs, ...):
write_midx_incremental_included_packs(&included_packs, ...);
} else {
midx_included_packs(&included_packs, ...):
write_midx_included_packs(&included_packs, ...);
}
in the future, let's have each function that writes a MIDX be
responsible for itself computing the list of included packs. Inline the
declaration and initialization of `included_packs` into the
`write_midx_included_packs()` function itself, and repeat that pattern
in the future when we introduce new ways to write MIDXs.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Instead of passing individual parameters (in this case, "existing",
"names", and "geometry") to `midx_included_packs()`, pass a pointer to a
`repack_write_midx_opts` structure instead.
Besides reducing the number of parameters necessary to call the
`midx_included_packs` function, this refactoring sets us up nicely to
inline the call to `midx_included_packs()` into
`write_midx_included_packs()`, thus making the caller (in this case,
`cmd_repack()`) oblivious to the set of packs being written into the
MIDX.
In order to do this, `repack_write_midx_opts` has to keep track of the
set of existing packs, so add an additional field to point to that set.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
After writing a new MIDX, the repack command removes any bitmaps
belonging to packs which were written into the MIDX.
This is currently done in a separate function outside of
`write_midx_included_packs()`, which forces the caller to keep track of
the set of packs written into the MIDX.
Prepare to no longer require the caller to keep track of such
information by inlining the clean-up into `write_midx_included_packs()`.
Future commits will make the caller oblivious to the set of packs
included in the MIDX altogether.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The next commit will inline the call to `remove_redundant_bitmaps()`
into `write_midx_included_packs()`. Reorder these two functions to avoid
a forward declaration to `remove_redundant_bitmaps()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Instead of storing the list of MIDX pack names separately, let's inline
it into the existing_packs struct, further reducing the number of
parameters we have to pass around.
This amounts to adding a new string_list to the existing_packs struct,
and populating it via `existing_packs_collect()`. This is fairly
straightforward to do, since we are already looping over all packs, all
we need to do is:
if (p->multi_pack_index)
string_list_append(&existing->midx_packs, pack_basename(p));
Note, however, that this check *must* come before other conditions where
we discard and do not keep track of a pack, including the condition "if
(!p->pack_local)" immediately below. This is because the existing
routine which collects MIDX pack names does so blindly, and does not
discard, for example, non-local packs.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When writing a new MIDX, repack must determine whether or not there are
any packs in the MIDX it is replacing (if one exists) that are not
somehow represented in the new MIDX (e.g., either by preserving the pack
verbatim, or rolling it up as part of a geometric repack, etc.).
In order to do this, it keeps track of a list of pack names from the
MIDX present in the repository at the start of the repack operation.
Since we manipulate and close the object store, we cannot rely on the
repository's in-core representation of the MIDX, since this is subject
to change and/or go away.
When this behavior was introduced in 5ee86c273b (repack: exclude cruft
pack(s) from the MIDX where possible, 2025-06-23), we maintained an
array of character pointers instead of using a convenience API, such as
string-list.h.
Store the list of MIDX pack names in a string_list, thereby reducing the
number of parameters we have to pass to `midx_has_unknown_packs()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The function 'write_midx_included_packs()', which is responsible for
writing a new MIDX with a given set of included packs, currently takes a
list of six arguments.
In order to extract this function out of the builtin, we have to pass
in a few additional parameters, like 'midx_must_contain_cruft' and
'packdir', which are currently declared as static variables within the
builtin/repack.c compilation unit.
Instead of adding additional parameters to `write_midx_included_packs()`
extract out an "opts" struct that names these parameters, and pass a
pointer to that, making it less cumbersome to add additional parameters.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When writing a MIDX, 'git repack' takes a snapshot of the repository's
references and writes the result out to a file, which it then passes to
'git multi-pack-index write' via the '--refs-snapshot'.
This is done in order to make bitmap selections with respect to what we
are packing, thus avoiding a race where an incoming reference update
causes us to try and write a bitmap for a commit not present in the
MIDX.
Extract this functionality out into a new repack-midx.c compilation
unit, and expose the necessary functions via the repack.h API.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Now that the pack_geometry API is fully factored and isolated from the
rest of the builtin, declare it within repack.h and move its
implementation to "repack-geometry.c" as a separate component.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
For similar reasons as the preceding commit, pass the "packdir" variable
directly to `pack_geometry_remove_redundant()` as a parameter to the
function.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Prepare to move pack_geometry-related APIs to their own compilation unit
by passing in the static "pack_kept_objects" variable directly as a
parameter to the 'pack_geometry_init()' function.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Rename functions which work with 'struct pack_geometry' to begin with
"pack_geometry_". While we're at it, change `free_pack_geometry()` to
instead be named `pack_geometry_release()` to match our conventions, and
make clear that that function frees the contents of the struct, not the
memory allocated to hold the struct itself.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Now that we have properly factored the portion of the builtin which is
responsible for repacking promisor objects, we can move that function
(and associated dependencies) out of the builtin entirely.
Similar to previous extractions, this function is declared in repack.h,
but implemented in a separate repack-promisor.c file. This is done to
separate promisor-specific repacking functionality from generic repack
utilities (like "existing_packs", and "generated_pack" APIs).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In a similar spirit as previous commit(s), pass the "packtmp" variable
to "repack_promisor_objects()" as an explicit parameter of the function,
preparing us to move this function in a following commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Now that we have factored the "generated_pack" API, we can move it to
repack.ch, further slimming down builtin/repack.c.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Repeat what was done in the preceding commit for the
`generated_pack_install()` function, which needs both "packdir" and
"packtmp".
(As an aside, it is somewhat unfortunate that the final three parameters
to this function are all "const char *", making errors like passing
"packdir" and "packtmp" in the wrong order easy. We could define a new
structure here, but that may be too heavy-handed.)
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In a similar spirit as previous commits, this function needs to know the
temporary pack prefix, which it currently accesses through the static
"packtmp" variable within builtin/repack.c.
Pass it explicitly as a function parameter to facilitate moving this
function out of builtin/repack.c entirely.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Once all new packs are known to exist, 'repack' installs their contents
from their temporary location into their permanent one. This is a
semi-involved procedure for each pack, since for each extension (e.g.,
".idx", ".pack", ".mtimes", and so on) we have to either:
- adjust the filemode of the temporary file before renaming it into
place, or
- die() if we are missing a non-optional extension, or
- unlink() any existing file for extensions that we did not generate
(e.g., if a non-cruft pack we generated was identical to, say, a
cruft pack which existed at the beginning of the process, we have to
remove the ".mtimes" file).
Extract this procedure into its own function, and call it
"generated_pack_install"(). This will set us up for pulling this
function out of the builtin entirely and making it part of the repack.h
API, which will be done in a future commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The name "generated_pack_data" is somewhat redundant, since the contents
of the struct *is* the data associated with the generated pack.
Rename the structure to just "generated_pack", resulting in less awkward
function names, like "generated_pack_has_ext()" which is preferable to
"generated_pack_data_has_ext()".
Rename a few related functions to align with the convention that
functions to do with a struct "S" should be prefixed with "S_".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The repack builtin defines an API for keeping track of which packs
were found in the repository at the beginning of the repack operation.
This is used to classify what state a pack was in (kept, non-kept, or
cruft), and is also used to mark which packs to delete (or keep) at the
end of a repack operation.
Now that the prerequisite refactoring is complete, this API is isolated
enough that it can be moved out to repack.[ch] and removed from the
builtin entirely.
As a result, some of its functions become static within repack.c,
cleaning up the visible API.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are a couple of spots that cause warnings within the
existing_packs API without DISABLE_SIGN_COMPARE_WARNINGS under
DEVELOPER=1 mode.
In both cases, we have int values that are being compared against size_t
ones. Neither of these two cases are incorrect, and the cast is
completely OK in practice. But both are unnecessary, since:
- in existing_packs_mark_for_deletion_1(), 'hexsz' should be defined as
a size_t anyway, since algop->hexsz is.
- in existing_packs_collect(), 'i' should be defined as a size_t since
it is counting up to the value of a string_list's 'nr' field.
(This patch is a little bit of noise, but I would rather see us squelch
these warnings ahead of moving the existing_packs API into a separate
compilation unit to avoid having to define DISABLE_SIGN_COMPARE_WARNINGS
in repack.c.)
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
builtin/repack.c defines a static "packdir" to instruct pack-objects on
where to write any new packfiles. This is also the directory scanned
when removing any packfiles which were made redundant by the latest
repack.
Prepare to move the "existing_packs_remove_redundant" function to its
own compilation unit by passing in this information as a parameter to
that function.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Extract "remove_redundant_pack()" as generic repack-related
functionality by moving its implementation to the repack.[ch]
compilation unit.
This is a prerequisite to moving the "existing_packs" API, which is one
of the callers of this function. (The remaining caller in the pack
geometry code will eventually move to its own compilation unit as well,
and will likewise rely on this function.)
While moving it over, prefix the function name with "repack_" to
indicate that it belongs to the repack-subsystem.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Rename many of the 'struct existing_packs'-related functions according
to the convention introduced in and described by 541204aabe
(Documentation: document naming schema for structs and their functions,
2024-07-30).
Note that some functions which operate over an individual entry in the
list of existing packs are prefixed with "existing_pack_" instead of the
plural form.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Now that the 'prepare_pack_objects' function no longer refers to
external, static variables, move it out to repack.h as generic
functionality.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The static variable 'delta_base_offset' determines whether or not we
pass the "--delta-base-offset" command-line argument when spawning
pack-objects as a child process. Its introduction dates back to when
repack was rewritten in C, all the way back in a1bbc6c017 (repack:
rewrite the shell script in C, 2013-09-15).
'struct pack_objects_args' was introduced much later on in 4571324b99
(builtin/repack.c: allow configuring cruft pack generation, 2022-05-20),
but did not move the 'delta_base_offset' variable.
Since the 'delta_base_offset' is a property of an individual
pack-objects command, re-introduce that variable as a member of 'struct
pack_objects_args', which will enable further code movement in the
subsequent commits.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A subsequent commit will remove 'delta_base_offset' as a static variable
within builtin/repack.c, and reintroduce it as a member of the 'struct
pack_objects_args'.
As a result, the repack_config callback will need to have both the
cruft- and non-cruft 'struct pack_objects_args's in scope. Introduce a
new 'struct repack_config_ctx' to allow the callee to provide both
pointers to the callback.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Over the years, builtin/repack.c has turned into a grab-bag of
functionality powering the 'git repack' builtin. Among its many
capabilities, it:
- can build and spawn 'git pack-objects' commands, which in turn
generate new packs
- has infrastructure to manage the set of existing packs in a
repository
- has infrastructure to split a sequence of packs into a geometric
progression based on object size
- can manage both generating and combining cruft packs together
- can write new MIDXs
to name a few.
As a result, this builtin has accumulated a lot of code, making adding
new functionality difficult. In the future, 'repack' will learn how to
manage a chain of incremental MIDXs, adding yet more functionality into
the builtin.
As a prerequisite step, let's first move some of the functionality in
the builtin into its own repack.[ch].
This will be done over the course of many steps, since there are many
individual components, some of which will end up in other, yet-to-exist
compilation units of their own. Some of the code movement here is also
non-trivial, so performing it in individual steps will make it easier to
verify.
Let's start by migrating 'struct pack_objects_args' (and the related
corresponding pack_objects_args_release() function) into repack.h, and
teach both the Makefile and Meson how to build the new compilation unit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In previous commits, we started passing either repository or
git_hash_algo pointers around to various spots within builtin/repack.c
to reduce our dependency on the_repository in the hope of undef'ing
USE_THE_REPOSITORY_VARIABLE.
This commit takes us as far as we can (easily) go in that direction by
removing the only use of a convenience function that only exists when
USE_THE_REPOSITORY_VARIABLE is defined.
Unfortunately, the only other such function is "is_bare_repository()",
which is less than straightforward to convert into, say,
"repo_is_bare()", the latter of the two accepting a repository pointer.
Punt on that for now, and declare this commit as the stopping point for
our efforts in the direction of undef'ing USE_THE_REPOSITORY_VARIABLE.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In a similar spirit as previous commits, avoid referring directly to
"the_hash_algo" in builtin/repack.c::finish_pack_objects_cmd() and
instead accept one as a parameter to the function.
Since this function has a number of callers throughout the builtin, the
diff is a little noisier than previous commits. However, each hunk is
limited to passing the hash_algo parameter from a repository pointer
that is already in scope.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In a similar spirit as the previous commits, avoid referring directly to
"the_hash_algo" within builtin/repack.c::repack_promisor_objects().
Since there is already a repository pointer in scope, use its hash_algo
value instead.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In a similar spirit as the previous commit, avoid referring directly to
"the_hash_algo" within builtin/repack.c::write_oid().
Unlike the previous commit, we are within a callback function, so must
introduce a new struct to pass additional data through its "data"
pointer.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "mark_packs_for_deletion_1" function uses "the_hash_algo->hexsz" to
isolate a pack's checksum before deleting it to avoid deleting a newly
written pack having the same checksum (that is, some generated pack
wound up identical to an existing pack).
Avoid this by passing down a "struct git_hash_algo" pointer, and refer to
the hash algorithm through it instead.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Pass a "struct repository" pointer to the 'repack_promisor_objects()'
function to avoid using "the_repository".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The 'remove_redundant_pack()' function uses "the_repository" to obtain,
and optionally remove, the repository's MIDX. Instead of relying on
"the_repository", pass around a "struct repository *" parameter through
its callers, and use that instead.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Avoid using "the_repository" in various MIDX-related ref snapshotting
functions.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are a number of spots within builtin/repack.c which refer to
"the_repository", and either make use of the "existing packs" API
or otherwise have a 'struct existing_packs *' in scope.
Add a "repo" member to "struct existing_packs" and use that instead of
"the_repository" in such locations.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Reduce builtin/repack.c's reliance on `the_repository` by using the
currently-UNUSED "repo" parameter within cmd_repack().
The following commits will continue to reduce the usage of
the_repository in other places within builtin/repack.c.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `get_all_packs()` function prepares the packfile store and then
returns its packfiles. Refactor it to accept a packfile store instead of
a repository to clarify its scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In `reprepare_packed_git()` we perform a couple of operations:
- We reload alternate object directories.
- We clear the loose object cache.
- We reprepare packfiles.
While the logic is hosted in "packfile.c", it clearly reaches into other
subsystems that aren't related to packfiles.
Split up the responsibility and introduce `odb_reprepare()` which now
becomes responsible for repreparing the whole object database. The
existing `reprepare_packed_git()` function is refactored accordingly and
only cares about reloading the packfile store now.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* ps/object-store-midx-dedup-info:
midx: compute paths via their source
midx: stop duplicating info redundant with its owning source
midx: write multi-pack indices via their source
midx: load multi-pack indices via their source
midx: drop redundant `struct repository` parameter
odb: simplify calling `link_alt_odb_entry()`
odb: return newly created in-memory sources
odb: consistently use "dir" to refer to alternate's directory
odb: allow `odb_find_source()` to fail
odb: store locality in object database sources
|
|
Multi-pack indices store some information that is redundant with their
owning source:
- The locality bit that tracks whether the source is the primary
object source or an alternate.
- The object directory path the multi-pack index is located in.
- The pointer to the owning parent directory.
All of this information is already contained in `struct odb_source`. So
now that we always have that struct available when loading a multi-pack
index we have it readily accessible.
Drop the redundant information and instead store a pointer to the object
source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Similar to the preceding commit, refactor the writing side of multi-pack
indices so that we pass in the object database source where the index
should be written to.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The config API had a set of convenience wrapper functions that
implicitly use the_repository instance; they have been removed and
inlined at the calling sites.
* ps/config-wo-the-repository: (21 commits)
config: fix sign comparison warnings
config: move Git config parsing into "environment.c"
config: remove unused `the_repository` wrappers
config: drop `git_config_set_multivar()` wrapper
config: drop `git_config_get_multivar_gently()` wrapper
config: drop `git_config_set_multivar_in_file_gently()` wrapper
config: drop `git_config_set_in_file_gently()` wrapper
config: drop `git_config_set()` wrapper
config: drop `git_config_set_gently()` wrapper
config: drop `git_config_set_in_file()` wrapper
config: drop `git_config_get_bool()` wrapper
config: drop `git_config_get_ulong()` wrapper
config: drop `git_config_get_int()` wrapper
config: drop `git_config_get_string()` wrapper
config: drop `git_config_get_string()` wrapper
config: drop `git_config_get_string_multi()` wrapper
config: drop `git_config_get_value()` wrapper
config: drop `git_config_get_value()` wrapper
config: drop `git_config_get()` wrapper
config: drop `git_config_clear()` wrapper
...
|
|
Redefine where the multi-pack-index sits in the object subsystem,
which recently was restructured to allow multiple backends that
support a single object source that belongs to one repository. A
midx does span mulitple "object sources".
* ps/object-store-midx:
midx: remove now-unused linked list of multi-pack indices
packfile: stop using linked MIDX list in `get_all_packs()`
packfile: stop using linked MIDX list in `find_pack_entry()`
packfile: refactor `get_multi_pack_index()` to work on sources
midx: stop using linked list when closing MIDX
packfile: refactor `prepare_packed_git_one()` to work on sources
midx: start tracking per object database source
|
|
* ps/object-store-midx:
midx: remove now-unused linked list of multi-pack indices
packfile: stop using linked MIDX list in `get_all_packs()`
packfile: stop using linked MIDX list in `find_pack_entry()`
packfile: refactor `get_multi_pack_index()` to work on sources
midx: stop using linked list when closing MIDX
packfile: refactor `prepare_packed_git_one()` to work on sources
midx: start tracking per object database source
|
|
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config()`. All callsites
are adjusted so that they use `repo_config(the_repository, ...)`
instead. While some callsites might already have a repository available,
this mechanical conversion is the exact same as the current situation
and thus cannot cause any regression. Those sites should eventually be
cleaned up in a later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"pack-objects" has been taught to avoid pointing into objects in
cruft packs from midx.
* tb/midx-avoid-cruft-packs:
repack: exclude cruft pack(s) from the MIDX where possible
pack-objects: introduce '--stdin-packs=follow'
pack-objects: swap 'show_{object,commit}_pack_hint'
pack-objects: fix typo in 'show_object_pack_hint()'
pack-objects: perform name-hash traversal for unpacked objects
pack-objects: declare 'rev_info' for '--stdin-packs' earlier
pack-objects: factor out handling '--stdin-packs'
pack-objects: limit scope in 'add_object_entry_from_pack()'
pack-objects: use standard option incompatibility functions
|
|
Code clean-up around object access API.
* ps/object-store:
odb: rename `read_object_with_reference()`
odb: rename `pretend_object_file()`
odb: rename `has_object()`
odb: rename `repo_read_object_file()`
odb: rename `oid_object_info()`
odb: trivial refactorings to get rid of `the_repository`
odb: get rid of `the_repository` when handling submodule sources
odb: get rid of `the_repository` when handling the primary source
odb: get rid of `the_repository` in `for_each()` functions
odb: get rid of `the_repository` when handling alternates
odb: get rid of `the_repository` in `odb_mkstemp()`
odb: get rid of `the_repository` in `assert_oid_type()`
odb: get rid of `the_repository` in `find_odb()`
odb: introduce parent pointers
object-store: rename files to "odb.{c,h}"
object-store: rename `object_directory` to `odb_source`
object-store: rename `raw_object_store` to `object_database`
|
|
The function `get_multi_pack_index()` loads multi-pack indices via
`prepare_packed_git()` and then returns the linked list of multi-pack
indices that is stored in `struct object_database`. That list is in the
process of being removed though in favor of storing the MIDX as part of
the object database source it belongs to.
Refactor `get_multi_pack_index()` so that it returns the multi-pack
index for a single object source. Callers are now expected to call this
function for each source they are interested in. This requires them to
iterate through alternates, so we have to prepare alternate object
sources before doing so.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* tb/midx-avoid-cruft-packs:
repack: exclude cruft pack(s) from the MIDX where possible
pack-objects: introduce '--stdin-packs=follow'
pack-objects: swap 'show_{object,commit}_pack_hint'
pack-objects: fix typo in 'show_object_pack_hint()'
pack-objects: perform name-hash traversal for unpacked objects
pack-objects: declare 'rev_info' for '--stdin-packs' earlier
pack-objects: factor out handling '--stdin-packs'
pack-objects: limit scope in 'add_object_entry_from_pack()'
pack-objects: use standard option incompatibility functions
|
|
* ps/object-store:
odb: rename `read_object_with_reference()`
odb: rename `pretend_object_file()`
odb: rename `has_object()`
odb: rename `repo_read_object_file()`
odb: rename `oid_object_info()`
odb: trivial refactorings to get rid of `the_repository`
odb: get rid of `the_repository` when handling submodule sources
odb: get rid of `the_repository` when handling the primary source
odb: get rid of `the_repository` in `for_each()` functions
odb: get rid of `the_repository` when handling alternates
odb: get rid of `the_repository` in `odb_mkstemp()`
odb: get rid of `the_repository` in `assert_oid_type()`
odb: get rid of `the_repository` in `find_odb()`
odb: introduce parent pointers
object-store: rename files to "odb.{c,h}"
object-store: rename `object_directory` to `odb_source`
object-store: rename `raw_object_store` to `object_database`
|
|
The 'extensions.preciousObjects' setting when set true, prevents
operations that might drop objects from the object storage. This setting
is populated in the global variable
'repository_format_precious_objects'.
Move this global variable to repo scope by adding it to 'struct
repository and also refactor all the occurences accordingly.
This change is part of an ongoing effort to eliminate global variables,
improve modularity and help libify the codebase.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Rename `oid_object_info()` to `odb_read_object_info()` as well as their
`_extended()` variant to match other functions related to the object
database and our modern coding guidelines.
Introduce compatibility wrappers so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The functions to manage alternates all depend on `the_repository`.
Refactor them to accept an object database as a parameter and adjust all
callers. The functions are renamed accordingly.
Note that right now the situation is still somewhat weird because we end
up using the object store path provided by the object store's repository
anyway. Consequently, we could have instead passed in a pointer to the
repository instead of passing in the pointer to the object store. This
will be addressed in subsequent commits though, where we will start to
use the path owned by the object store itself.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
MIDX with '--write-midx' to ensure that the resulting MIDX was always
closed under reachability in order to generate reachability bitmaps.
While the previous patch added the '--stdin-packs=follow' option to
pack-objects, it is not yet on by default. Given that, suppose you have
a once-unreachable object packed in a cruft pack, which later becomes
reachable from one or more objects in a geometrically repacked pack.
That once-unreachable object *won't* appear in the new pack, since the
cruft pack was not specified as included or excluded when the
geometrically repacked pack was created with 'pack-objects
--stdin-packs' (*not* '--stdin-packs=follow', which is not on). If that
new pack is included in a MIDX without the cruft pack, then trying to
generate bitmaps for that MIDX may fail. This happens when the bitmap
selection process picks one or more commits which reach the
once-unreachable objects.
To mitigate this failure mode, commit ddee3703b3 ensures that the MIDX
will be closed under reachability by including cruft pack(s). If cruft
pack(s) were not included, we would fail to generate a MIDX bitmap. But
ddee3703b3 alludes to the fact that this is sub-optimal by saying
[...] it's desirable to avoid including cruft packs in the MIDX
because it causes the MIDX to store a bunch of objects which are
likely to get thrown away.
, which is true, but hides an even larger problem. If repositories
rarely prune their unreachable objects and/or have many of them, the
MIDX must keep track of a large number of objects which bloats the MIDX
and slows down object lookup.
This is doubly unfortunate because the vast majority of objects in cruft
pack(s) are unlikely to be read. But any object lookups that go through
the MIDX must binary search over them anyway, slowing down object
lookups using the MIDX.
This patch causes geometrically-repacked packs to contain a copy of any
once-unreachable object(s) with 'git pack-objects --stdin-packs=follow',
allowing us to avoid including any cruft packs in the MIDX. This is
because a sequence of geometrically-repacked packs that were all
generated with '--stdin-packs=follow' are guaranteed to have their union
be closed under reachability.
Note that you cannot guarantee that a collection of packs is closed
under reachability if not all of them were generated with "following" as
above. One tell-tale sign that not all geometrically-repacked packs in
the MIDX were generated with "following" is to see if there is a pack in
the existing MIDX that is not going to be somehow represented (either
verbatim or as part of a geometric rollup) in the new MIDX.
If there is, then starting to generate packs with "following" during
geometric repacking won't work, since it's open to the same race as
described above.
But if you're starting from scratch (e.g., building the first MIDX after
an all-into-one '--cruft' repack), then you can guarantee that the union
of subsequently generated packs from geometric repacking *is* closed
under reachability.
(One exception here is when "starting from scratch" results in a noop
repack, e.g., because the non-cruft pack(s) in a repository already form
a geometric progression. Since we can't tell whether or not those were
generated with '--stdin-packs=follow', they may depend on
once-unreachable objects, so we have to include the cruft pack in the
MIDX in this case.)
Detect when this is the case and avoid including cruft packs in the MIDX
where possible. The existing behavior remains the default, and the new
behavior is available with the config 'repack.midxMustIncludeCruft' set
to 'false'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git pack-objects" learns to find delta bases from blobs at the
same path, using the --path-walk API.
* ds/path-walk-2:
pack-objects: allow --shallow and --path-walk
path-walk: add new 'edge_aggressive' option
pack-objects: thread the path-based compression
pack-objects: refactor path-walk delta phase
scalar: enable path-walk during push via config
pack-objects: enable --path-walk via config
repack: add --path-walk option
t5538: add tests to confirm deltas in shallow pushes
pack-objects: introduce GIT_TEST_PACK_PATH_WALK
p5313: add performance tests for --path-walk
pack-objects: update usage to match docs
pack-objects: add --path-walk option
pack-objects: extract should_attempt_deltas()
|
|
Since 'git pack-objects' supports a --path-walk option, allow passing it
through in 'git repack'. This presents interesting testing opportunities for
comparing the different repacking strategies against each other.
Add the --path-walk option to the performance tests in p5313.
For the microsoft/fluentui repo [1] checked out at a specific commit [2],
the --path-walk tests in p5313 look like this:
Test this tree
-------------------------------------------------------------------------
5313.18: thin pack with --path-walk 0.08(0.06+0.02)
5313.19: thin pack size with --path-walk 18.4K
5313.20: big pack with --path-walk 2.10(7.80+0.26)
5313.21: big pack size with --path-walk 19.8M
5313.22: shallow fetch pack with --path-walk 1.62(3.38+0.17)
5313.23: shallow pack size with --path-walk 33.6M
5313.24: repack with --path-walk 81.29(96.08+0.71)
5313.25: repack size with --path-walk 142.5M
[1] https://github.com/microsoft/fluentui
[2] e70848ebac1cd720875bccaa3026f4a9ed700e08
Along with the earlier tests in p5313, I'll instead reformat the
comparison as follows:
Repack Method Pack Size Time
---------------------------------------
Hash v1 439.4M 87.24s
Hash v2 161.7M 21.51s
Path Walk 142.5M 81.29s
There are a few things to notice here:
1. The benefits of --name-hash-version=2 over --name-hash-version=1 are
significant, but --path-walk still compresses better than that
option.
2. The --path-walk command is still using --name-hash-version=1 for the
second pass of delta computation, using the increased name hash
collisions as a potential method for opportunistic compression on
top of the path-focused compression.
3. The --path-walk algorithm is currently sequential and does not use
multiple threads for delta compression. Threading will be
implemented in a future change so the computation time will improve
to better compete in this metric.
There are small benefits in size for my copy of the Git repository:
Repack Method Pack Size Time
---------------------------------------
Hash v1 248.8M 30.44s
Hash v2 249.0M 30.15s
Path Walk 213.2M 142.50s
As well as in the nodejs/node repository [3]:
Repack Method Pack Size Time
---------------------------------------
Hash v1 739.9M 71.18s
Hash v2 764.6M 67.82s
Path Walk 698.1M 208.10s
[3] https://github.com/nodejs/node
This benefit also repeats in my copy of the Linux kernel repository:
Repack Method Pack Size Time
---------------------------------------
Hash v1 2.5G 554.41s
Hash v2 2.5G 549.62s
Path Walk 2.2G 1562.36s
It is important to see that even when the repository shape does not have
many name-hash collisions, there is a slight space boost to be found
using this method.
As this repacking strategy was released in Git for Windows 2.47.0, some
users have reported cases where the --path-walk compression is slightly
worse than the --name-hash-version=2 option. In those cases, it may be
beneficial to combine the two options. However, there has not been a
released version of Git that has both options and I don't have access to
these repos for testing.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Update parse-options API to catch mistakes to pass address of an
integral variable of a wrong type/size.
* ps/parse-options-integers:
parse-options: detect mismatches in integer signedness
parse-options: introduce precision handling for `OPTION_UNSIGNED`
parse-options: introduce precision handling for `OPTION_INTEGER`
parse-options: rename `OPT_MAGNITUDE()` to `OPT_UNSIGNED()`
parse-options: support unit factors in `OPT_INTEGER()`
global: use designated initializers for options
parse: fix off-by-one for minimum signed values
|
|
With the preceding commit, `OPT_INTEGER()` has learned to support unit
factors. Consequently, the major differencen between `OPT_INTEGER()` and
`OPT_MAGNITUDE()` isn't the support of unit factors anymore, as both of
them do support them now. Instead, the difference is that one handles
signed and the other handles unsigned integers.
Adapt the name of `OPT_MAGNITUDE()` accordingly by renaming it to
`OPT_UNSIGNED()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.
Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The previous commit changed the behavior of repack's '--max-cruft-size'
to specify a cruft pack-specific override for '--max-pack-size'.
Introduce a new flag, '--combine-cruft-below-size' which is a
replacement for the old behavior of '--max-cruft-size'. This new flag
does explicitly what it says: it combines together cruft packs which are
smaller than a given threshold, and leaves alone ones which are
larger.
This accomplishes the original intent of '--max-cruft-size', which was
to avoid repacking cruft packs larger than the given threshold.
The new behavior is slightly different. Instead of building up small
packs together until the threshold is met, '--combine-cruft-below-size'
packs up *all* cruft packs smaller than the threshold. This means that
we may make a pack much larger than the given threshold (e.g., if you
aggregate 5 packs which are each 99 MiB in size with a threshold of 100
MiB).
But that's OK: the point isn't to restrict the size of the cruft packs
we generate, it's to avoid working with ones that have already grown too
large. If repositories still want to limit the size of the generated
cruft pack(s), they may use '--max-cruft-size'.
There's some minor test fallout as a result of the slight differences in
behavior between the old meaning of '--max-cruft-size' and the behavior
of '--combine-cruft-below-size'. In the test which is now called
"--combine-cruft-below-size combines packs", we need to use the new flag
over the old one to exercise that test's intended behavior. The
remainder of the changes there are to improve the clarity of the
comments.
Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 37dc6d8104 (builtin/repack.c: implement support for
`--max-cruft-size`, 2023-10-02), we exposed new functionality that
allowed repositories to specify the behavior of when we should combine
multiple cruft packs together.
This feature was designed to ensure that we never repacked cruft packs
which were larger than the given threshold in order to provide tighter
I/O bounds for repositories that have many unreachable objects. In
essence, specifying '--max-cruft-size=N' instructed 'repack' to
aggregate cruft packs together (in order of ascending size) until the
combine size grows past 'N', and then make a new cruft pack whose
contents includes the packs we rolled up.
But this isn't quite how it works in practice. Suppose for example that
we have two cruft packs which are each 100MiB in size. One might expect
specifying "--max-cruft-size=200M" would combine these two packs
together, and then avoid repacking them until a pruning GC takes place.
In reality, 'repack' would try and aggregate these together, but writing
a pack that is strictly smaller than 200 MiB (since pack-objects'
"--max-pack-size" provides a strict bound for packs containing more than
one object).
So instead we'll write out a pack that is, say, 199 MiB in size, and
then another 1 MiB pack containing the balance. If we later repack the
repository without adding any new unreachable objects, we'll repeat the
same exercise again, making the same 199 MiB and 1 MiB packs each time.
This happens because of a poor choice to bolt the '--max-cruft-size'
functionality onto pack-objects' '--max-pack-size', forcing us to
generate packs which are always smaller than the provided threshold and
thus subject to repacking.
The following commit will introduce a new flag that implements something
similar to the behavior above. Let's prepare for that by making repack's
'--max-cruft-size' flag behave as an cruft pack-specific override for
'--max-pack-size'.
Do so by temporarily repurposing the 'collapse_small_cruft_packs()'
function to instead generate a cruft pack using the same instructions as
if we didn't specify any maximum pack size. The calling code looks
something like:
if (args->max_pack_size && !cruft_expiration) {
collapse_small_cruft_packs(in, args->max_pack_size, existing);
} else {
for_each_string_list_item(item, &existing->non_kept_packs)
fprintf(in, "-%s.pack\n", item->string);
for_each_string_list_item(item, &existing->cruft_packs)
fprintf(in, "-%s.pack\n", item->string);
}
This patch makes collapse_small_cruft_packs() behave identically to the
'else' arm of the conditional above. This repurposing of
'collapse_small_cruft_packs()' is intentional, since it will set us up
nicely to introduce the new behavior in the following commit.
Naturally, there is some test fallout in the test which exercises the
old meaning of '--max-cruft-size'. Mark that test as failing for now to
be dealt with in the following commit. Likewise, add a new test which
explicitly tests the behavior of '--max-cruft-size' to place a hard
limit on the size of any generated cruft pack(s).
Note that this is a breaking change, as it alters the user-visible
behavior of '--max-cruft-size'. But I'm OK changing this behavior in
this instance, since the behavior wasn't accurate to begin with.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git repack --keep-unreachable" to send unreachable objects to the
main pack "git repack -ad" produces did not work when there is no
existing packs, which has been corrected.
* ps/repack-keep-unreachable-in-unpacked-repo:
builtin/repack: fix `--keep-unreachable` when there are no packs
|
|
"git pack-objects" and its wrapper "git repack" learned an option
to use an alternative path-hash function to improve delta-base
selection to produce a packfile with deeper history than window
size.
* ds/name-hash-tweaks:
pack-objects: prevent name hash version change
test-tool: add helper for name-hash values
p5313: add size comparison test
pack-objects: add GIT_TEST_NAME_HASH_VERSION
repack: add --name-hash-version option
pack-objects: add --name-hash-version option
pack-objects: create new name-hash function version
|
|
The "--keep-unreachable" flag is supposed to append any unreachable
objects to the newly written pack. This flag is explicitly documented as
appending both packed and loose unreachable objects to the new packfile.
And while this works alright when repacking with preexisting packfiles,
it stops working when the repository does not have any packfiles at all.
The root cause are the conditions used to decide whether or not we want
to append "--pack-loose-unreachable" to git-pack-objects(1). There are
a couple of conditions here:
- `has_existing_non_kept_packs()` checks whether there are existing
packfiles. This condition makes sense to guard "--keep-pack=",
"--unpack-unreachable" and "--keep-unreachable", because all of
these flags only make sense in combination with existing packfiles.
But it does not make sense to disable `--pack-loose-unreachable`
when there aren't any preexisting packfiles, as loose objects can be
packed into the new packfile regardless of that.
- `delete_redundant` checks whether we want to delete any objects or
packs that are about to become redundant. The documentation of
`--keep-unreachable` explicitly says that `git repack -ad` needs to
be executed for the flag to have an effect.
It is not immediately obvious why such redundant objects need to be
deleted in order for "--pack-unreachable-objects" to be effective.
But as things are working as documented this is nothing we'll change
for now.
- `pack_everything & PACK_CRUFT` checks that we're not creating a
cruft pack. This condition makes sense in the context of
"--pack-loose-unreachable", as unreachable objects would end up in
the cruft pack anyway.
So while the second and third condition are sensible, it does not make
any sense to condition `--pack-loose-unreachable` on the existence of
packfiles.
Fix the bug by splitting out the "--pack-loose-unreachable" and only
making it depend on the second and third condition. Like this, loose
unreachable objects will be packed regardless of any preexisting
packfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The new '--name-hash-version' option for 'git repack' is a simple
pass-through to the underlying 'git pack-objects' subcommand. However,
this subcommand may have other options and a temporary filename as part
of the subcommand execution that may not be predictable or could change
over time.
The existing test_subcommand method requires an exact list of arguments
for the subcommand. This is too rigid for our needs here, so create a
new method, test_subcommand_flex. Use it to check that the
--name-hash-version option is passing through.
Since we are modifying the 'git repack' command, let's bring its usage
in line with the Documentation's synopsis. This removes it from the
allow list in t0450 so it will remain in sync in the future.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Start working to make the codebase buildable with -Wsign-compare.
* ps/build-sign-compare:
t/helper: don't depend on implicit wraparound
scalar: address -Wsign-compare warnings
builtin/patch-id: fix type of `get_one_patchid()`
builtin/blame: fix type of `length` variable when emitting object ID
gpg-interface: address -Wsign-comparison warnings
daemon: fix type of `max_connections`
daemon: fix loops that have mismatching integer types
global: trivial conversions to fix `-Wsign-compare` warnings
pkt-line: fix -Wsign-compare warning on 32 bit platform
csum-file: fix -Wsign-compare warning on 32-bit platform
diff.h: fix index used to loop through unsigned integer
config.mak.dev: drop `-Wno-sign-compare`
global: mark code units that generate warnings with `-Wsign-compare`
compat/win32: fix -Wsign-compare warning in "wWinMain()"
compat/regex: explicitly ignore "-Wsign-compare" warnings
git-compat-util: introduce macros to disable "-Wsign-compare" warnings
|
|
Stop using `the_repository` in the "server-info" subsystem by passing in
a repository when updating server info and storing the repository in the
`update_info_ctx` structure to make it accessible to other functions.
Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* ps/build-sign-compare:
t/helper: don't depend on implicit wraparound
scalar: address -Wsign-compare warnings
builtin/patch-id: fix type of `get_one_patchid()`
builtin/blame: fix type of `length` variable when emitting object ID
gpg-interface: address -Wsign-comparison warnings
daemon: fix type of `max_connections`
daemon: fix loops that have mismatching integer types
global: trivial conversions to fix `-Wsign-compare` warnings
pkt-line: fix -Wsign-compare warning on 32 bit platform
csum-file: fix -Wsign-compare warning on 32-bit platform
diff.h: fix index used to loop through unsigned integer
config.mak.dev: drop `-Wno-sign-compare`
global: mark code units that generate warnings with `-Wsign-compare`
compat/win32: fix -Wsign-compare warning in "wWinMain()"
compat/regex: explicitly ignore "-Wsign-compare" warnings
git-compat-util: introduce macros to disable "-Wsign-compare" warnings
|
|
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In a previous commit, we passed the repository field to all
subcommands in the `builtin/` directory. Utilize this to pass the
repository field down to the `write_midx_file[_only]` functions to
remove the usage of `the_repository` global variables.
With this, all usage of global variables in `midx-write.c` is removed,
hence, remove the `USE_THE_REPOSITORY_VARIABLE` guard from the file.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The function `for_each_packed_object` currently relies on the global
variable `the_repository`. To eliminate global variable usage in
`packfile.c`, we should progressively shift the dependency on
the_repository to higher layers. Let's remove its usage from this
function and closely related function `is_promisor_object`.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
More leak-fixes.
* ps/leakfixes-part-7: (23 commits)
diffcore-break: fix leaking filespecs when merging broken pairs
revision: fix leaking parents when simplifying commits
builtin/maintenance: fix leak in `get_schedule_cmd()`
builtin/maintenance: fix leaking config string
promisor-remote: fix leaking partial clone filter
grep: fix leaking grep pattern
submodule: fix leaking submodule ODB paths
trace2: destroy context stored in thread-local storage
builtin/difftool: plug several trivial memory leaks
builtin/repack: fix leaking configuration
diffcore-order: fix leaking buffer when parsing orderfiles
parse-options: free previous value of `OPTION_FILENAME`
diff: fix leaking orderfile option
builtin/pull: fix leaking "ff" option
dir: fix off by one errors for ignored and untracked entries
builtin/submodule--helper: fix leaking remote ref on errors
t/helper: fix leaking subrepo in nested submodule config helper
builtin/submodule--helper: fix leaking error buffer
builtin/submodule--helper: clear child process when not running it
submodule: fix leaking update strategy
...
|
|
When repacking, we assemble git-pack-objects(1) arguments both for the
"normal" pack and for the cruft pack. This configuration gets populated
with a bunch of `OPT_PASSTHRU` options that we end up passing to the
child process. These options are allocated, but never free'd.
Create a new `pack_objects_args_release()` function that releases the
memory for us and call it for both sets of options.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The convention to calling into built-in command implementation has
been updated to pass the repository, if known, together with the
prefix value.
* jc/pass-repo-to-builtins:
add: pass in repo variable instead of global the_repository
builtin: remove USE_THE_REPOSITORY for those without the_repository
builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h
builtin: add a repository parameter for builtin functions
|
|
Code clean-up.
* ps/environ-wo-the-repository: (21 commits)
environment: stop storing "core.notesRef" globally
environment: stop storing "core.warnAmbiguousRefs" globally
environment: stop storing "core.preferSymlinkRefs" globally
environment: stop storing "core.logAllRefUpdates" globally
refs: stop modifying global `log_all_ref_updates` variable
branch: stop modifying `log_all_ref_updates` variable
repo-settings: track defaults close to `struct repo_settings`
repo-settings: split out declarations into a standalone header
environment: guard state depending on a repository
environment: reorder header to split out `the_repository`-free section
environment: move `set_git_dir()` and related into setup layer
environment: make `get_git_namespace()` self-contained
environment: move object database functions into object layer
config: make dependency on repo in `read_early_config()` explicit
config: document `read_early_config()` and `read_very_early_config()`
environment: make `get_git_work_tree()` accept a repository
environment: make `get_graft_file()` accept a repository
environment: make `get_index_file()` accept a repository
environment: make `get_object_directory()` accept a repository
environment: make `get_git_common_dir()` accept a repository
...
|
|
More leakfixes.
* ps/leakfixes-part-6: (22 commits)
builtin/repack: fix leaking keep-pack list
merge-ort: fix two leaks when handling directory rename modifications
match-trees: fix leaking prefixes in `shift_tree()`
builtin/fmt-merge-msg: fix leaking buffers
builtin/grep: fix leaking object context
builtin/pack-objects: plug leaking list of keep-packs
builtin/repack: fix leaking line buffer when packing promisors
negotiator/skipping: fix leaking commit entries
shallow: fix leaking members of `struct shallow_info`
shallow: free grafts when unregistering them
object: clear grafts when clearing parsed object pool
gpg-interface: fix misdesigned signing key interfaces
send-pack: fix leaking push cert nonce
remote: fix leak in reachability check of a remote-tracking ref
remote: fix leaking tracking refs
builtin/submodule--helper: fix leaking refs on push-check
submodule: fix leaking fetch task data
upload-pack: fix leaking child process data on reachability checks
builtin/push: fix leaking refspec query result
send-pack: fix leaking common object IDs
...
|
|
* ps/leakfixes-part-6: (22 commits)
builtin/repack: fix leaking keep-pack list
merge-ort: fix two leaks when handling directory rename modifications
match-trees: fix leaking prefixes in `shift_tree()`
builtin/fmt-merge-msg: fix leaking buffers
builtin/grep: fix leaking object context
builtin/pack-objects: plug leaking list of keep-packs
builtin/repack: fix leaking line buffer when packing promisors
negotiator/skipping: fix leaking commit entries
shallow: fix leaking members of `struct shallow_info`
shallow: free grafts when unregistering them
object: clear grafts when clearing parsed object pool
gpg-interface: fix misdesigned signing key interfaces
send-pack: fix leaking push cert nonce
remote: fix leak in reachability check of a remote-tracking ref
remote: fix leaking tracking refs
builtin/submodule--helper: fix leaking refs on push-check
submodule: fix leaking fetch task data
upload-pack: fix leaking child process data on reachability checks
builtin/push: fix leaking refspec query result
send-pack: fix leaking common object IDs
...
|
|
Instead of including USE_THE_REPOSITORY_VARIABLE by default on every
builtin, remove it from builtin.h and add it to all the builtins that
include builtin.h (by definition, that means all builtins/*.c).
Also, remove the include statement for repository.h since it gets
brought in through builtin.h.
The next step will be to migrate each builtin
from having to use the_repository.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In order to reduce the usage of the global the_repository, add a
parameter to builtin functions that will get passed a repository
variable.
This commit uses UNUSED on most of the builtin functions, as subsequent
commits will modify the actual builtins to pass the repository parameter
down.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `get_object_directory()` function retrieves the path to the object
directory for `the_repository`. Make it accept a `struct repository`
such that it can work on arbitrary repositories and make it part of the
repository subsystem. This reduces our reliance on `the_repository` and
clarifies scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The list of packs to keep is populated via a command line option but
never free'd. Plug this memory leak.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In `repack_promisor_objects()` we read output from git-pack-objects(1)
line by line, using `strbuf_getline_lf()`. We never free the line
buffer, causing a memory leak. Plug it.
This leak is being hit in t5616, but plugging it alone is not
sufficient to make the whole test suite leak free.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When writing an MIDX in git-repack(1) we first collect all the pack
names that we want to add to it in a string list. This list is marked as
`NODUP`, which indicates that it will neither duplicate nor own strings
added to it. In `write_midx_included_packs()` we then `insert()` strings
via `xstrdup()` or `strbuf_detach()`, but the resulting strings will not
be owned by anything and thus leak.
Fix this issue by marking the list as `DUP` and using a local buffer to
compute the pack names.
This leak is hit in t5319, but plugging it is not sufficient to make the
whole test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Incremental updates of multi-pack index files.
* tb/incremental-midx-part-1:
midx: implement support for writing incremental MIDX chains
t/t5313-pack-bounds-checks.sh: prepare for sub-directories
t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
midx: implement verification support for incremental MIDXs
midx: support reading incremental MIDX chains
midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
midx: teach `midx_preferred_pack()` about incremental MIDXs
midx: teach `midx_contains_pack()` about incremental MIDXs
midx: remove unused `midx_locate_pack()`
midx: teach `fill_midx_entry()` about incremental MIDXs
midx: teach `nth_midxed_offset()` about incremental MIDXs
midx: teach `bsearch_midx()` about incremental MIDXs
midx: introduce `bsearch_one_midx()`
midx: teach `nth_bitmapped_pack()` about incremental MIDXs
midx: teach `nth_midxed_object_oid()` about incremental MIDXs
midx: teach `prepare_midx_pack()` about incremental MIDXs
midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
midx: add new fields for incremental MIDX chains
Documentation: describe incremental MIDX format
|
|
Add a parameter to each_ref_fn so that callers to the ref APIs
that use this function as a callback can have acess to the
unresolved value of a symbolic ref.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Now that the rest of the MIDX subsystem and relevant callers have been
updated to learn about how to read and process incremental MIDX chains,
let's finally update the implementation in `write_midx_internal()` to be
able to write incremental MIDX chains.
This new feature is available behind the `--incremental` option for the
`multi-pack-index` builtin, like so:
$ git multi-pack-index write --incremental
The implementation for doing so is relatively straightforward, and boils
down to a handful of different kinds of changes implemented in this
patch:
- The `compute_sorted_entries()` function is taught to reject objects
which appear in any existing MIDX layer.
- Functions like `write_midx_revindex()` are adjusted to write
pack_order values which are offset by the number of objects in the
base MIDX layer.
- The end of `write_midx_internal()` is adjusted to move
non-incremental MIDX files when necessary (i.e. when creating an
incremental chain with an existing non-incremental MIDX in the
repository).
There are a handful of other changes that are introduced, like new
functions to clear incremental MIDX files that are unrelated to the
current chain (using the same "keep_hash" mechanism as in the
non-incremental case).
The tests explicitly exercising the new incremental MIDX feature are
relatively limited for two reasons:
1. Most of the "interesting" behavior is already thoroughly covered in
t5319-multi-pack-index.sh, which handles the core logic of reading
objects through a MIDX.
The new tests in t5334-incremental-multi-pack-index.sh are mostly
focused on creating and destroying incremental MIDXs, as well as
stitching their results together across layers.
2. A new GIT_TEST environment variable is added called
"GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL", which modifies the
entire test suite to write incremental MIDXs after repacking when
combined with the "GIT_TEST_MULTI_PACK_INDEX" variable.
This exercises the long tail of other interesting behavior that is
defined implicitly throughout the rest of the CI suite. It is
likewise added to the linux-TEST-vars job.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Two years ago, commit ff1e653c8e2 (midx: respect
'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP', 2021-08-31) introduced a new
environment variable which caused the test suite to write MIDX bitmaps
after any 'git repack' invocation.
At the time, this was done to help flush out any bugs with MIDX bitmaps
that weren't explicitly covered in the t5326-multi-pack-bitmap.sh
script.
Two years later, that flag has served us well and is no longer providing
meaningful coverage, as the script in t5326 has matured substantially
and covers many more interesting cases than it did back when ff1e653c8e2
was originally written.
Remove the 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' environment variable
as it is no longer serving a useful purpose. More importantly, removing
this variable clears the way for us to introduce a new one to help
similarly flush out bugs related to incremental MIDX chains.
Because these incremental MIDX chains are (for now) incompatible with
MIDX bitmaps, we cannot have both.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Leakfixes.
* ps/leakfixes:
builtin/mv: fix leaks for submodule gitfile paths
builtin/mv: refactor to use `struct strvec`
builtin/mv duplicate string list memory
builtin/mv: refactor `add_slash()` to always return allocated strings
strvec: add functions to replace and remove strings
submodule: fix leaking memory for submodule entries
commit-reach: fix memory leak in `ahead_behind()`
builtin/credential: clear credential before exit
config: plug various memory leaks
config: clarify memory ownership in `git_config_string()`
builtin/log: stop using globals for format config
builtin/log: stop using globals for log config
convert: refactor code to clarify ownership of check_roundtrip_encoding
diff: refactor code to clarify memory ownership of prefixes
config: clarify memory ownership in `git_config_pathname()`
http: refactor code to clarify memory ownership
checkout: clarify memory ownership in `unique_tracking_name()`
strbuf: fix leak when `appendwholeline()` fails with EOF
transport-helper: fix leaking helper name
|
|
The out parameter of `git_config_string()` is a `const char **` even
though we transfer ownership of memory to the caller. This is quite
misleading and has led to many memory leaks all over the place. Adapt
the parameter to instead be `char **`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Both `peel_object()` and `peel_iterated_oid()` implicitly rely on
`the_repository` to look up objects. Despite the fact that we want to
get rid of `the_repository`, it also leads to some restrictions in our
ref iterators when trying to retrieve the peeled value for a repository
other than `the_repository`.
Refactor these functions such that both take a repository as argument
and remove the now-unnecessary restrictions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Apply the rules that rewrite callers of "refs" interfaces to explicitly
pass `struct ref_store`. The resulting patch has been applied with the
`--whitespace=fix` option.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When "git repack" repacks promisor objects, it starts a pack-objects
subprocess and uses xwrite() to send object names over the pipe to
it, but without any error checking. An I/O error or short write
(even though a short write is unlikely for such a small amount of
data) can result in a packfile that lacks certain objects we wanted
to put in there, leading to a silent repository corruption.
Use write_in_full(), instead of xwrite(), to mitigate short write
risks, check errors from it, and abort if we see a failure.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove unused header "#include".
* en/header-cleanup:
treewide: remove unnecessary includes in source files
treewide: add direct includes currently only pulled in transitively
trace2/tr2_tls.h: remove unnecessary include
submodule-config.h: remove unnecessary include
pkt-line.h: remove unnecessary include
line-log.h: remove unnecessary include
http.h: remove unnecessary include
fsmonitor--daemon.h: remove unnecessary includes
blame.h: remove unnecessary includes
archive.h: remove unnecessary include
treewide: remove unnecessary includes in source files
treewide: remove unnecessary includes from header files
|
|
Each of these were checked with
gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).
...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file. These cases were:
* builtin/credential-cache.c
* builtin/pull.c
* builtin/send-pack.c
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The repack option --keep-unreachable is incompatible with -A, --cruft is
incompatible with -A and -k, and -k is short for --keep-unreachable. So
they are all incompatible with each other.
Use the function for checking three mutually incompatible options,
die_for_incompatible_opt3(), to perform this check in one place and
without repetition. This is shorter and clearer.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git repack" learned "--max-cruft-size" to prevent cruft packs from
growing without bounds.
* tb/repack-max-cruft-size:
repack: free existing_cruft array after use
builtin/repack.c: avoid making cruft packs preferred
builtin/repack.c: implement support for `--max-cruft-size`
builtin/repack.c: parse `--max-pack-size` with OPT_MAGNITUDE
t7700: split cruft-related tests to t7704
|
|
We allocate an array of packed_git pointers so that we can sort the list
of cruft packs, but we never free the array, causing a small leak. Note
that we don't need to free the packed_git structs themselves; they're
owned by the repository object.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When doing a `--geometric` repack, we make sure that the preferred pack
(if writing a MIDX) is the largest pack that we *didn't* repack. That
has the effect of keeping the preferred pack in sync with the pack
containing a majority of the repository's reachable objects.
But if the repository happens to double in size, we'll repack
everything. Here we don't specify any `--preferred-pack`, and instead
let the MIDX code choose.
In the past, that worked fine, since there would only be one pack to
choose from: the one we just wrote. But it's no longer necessarily the
case that there is one pack to choose from. It's possible that the
repository also has a cruft pack, too.
If the cruft pack happens to come earlier in lexical order (and has an
earlier mtime than any non-cruft pack), we'll pick that pack as
preferred. This makes it impossible to reuse chunks of the reachable
pack verbatim from pack-objects, so is sub-optimal.
Luckily, this is a somewhat rare circumstance to be in, since we would
have to repack the entire repository during a `--geometric` repack, and
the cruft pack would have to sort ahead of the pack we just created.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Cruft packs are an alternative mechanism for storing a collection of
unreachable objects whose mtimes are recent enough to avoid being
pruned out of the repository.
When cruft packs were first introduced back in b757353676
(builtin/pack-objects.c: --cruft without expiration, 2022-05-20) and
a7d493833f (builtin/pack-objects.c: --cruft with expiration,
2022-05-20), the recommended workflow consisted of:
- Repacking periodically, either by packing anything loose in the
repository (via `git repack -d`) or producing a geometric sequence
of packs (via `git repack --geometric=<d> -d`).
- Every so often, splitting the repository into two packs, one cruft
to store the unreachable objects, and another non-cruft pack to
store the reachable objects.
Repositories may (out of band with the above) choose periodically to
prune out some unreachable objects which have aged out of the grace
period by generating a pack with `--cruft-expiration=<approxidate>`.
This allowed repositories to maintain relatively few packs on average,
and quarantine unreachable objects together in a cruft pack, avoiding
the pitfalls of holding unreachable objects as loose while they age out
(for more, see some of the details in 3d89a8c118
(Documentation/technical: add cruft-packs.txt, 2022-05-20)).
This all works, but can be costly from an I/O-perspective when
frequently repacking a repository that has many unreachable objects.
This problem is exacerbated when those unreachable objects are rarely
(if every) pruned.
Since there is at most one cruft pack in the above scheme, each time we
update the cruft pack it must be rewritten from scratch. Because much of
the pack is reused, this is a relatively inexpensive operation from a
CPU-perspective, but is very costly in terms of I/O since we end up
rewriting basically the same pack (plus any new unreachable objects that
have entered the repository since the last time a cruft pack was
generated).
At the time, we decided against implementing more robust support for
multiple cruft packs. This patch implements that support which we were
lacking.
Introduce a new option `--max-cruft-size` which allows repositories to
accumulate cruft packs up to a given size, after which point a new
generation of cruft packs can accumulate until it reaches the maximum
size, and so on. To generate a new cruft pack, the process works like
so:
- Sort a list of any existing cruft packs in ascending order of pack
size.
- Starting from the beginning of the list, group cruft packs together
while the accumulated size is smaller than the maximum specified
pack size.
- Combine the objects in these cruft packs together into a new cruft
pack, along with any other unreachable objects which have since
entered the repository.
Once a cruft pack grows beyond the size specified via `--max-cruft-size`
the pack is effectively frozen. This limits the I/O churn up to a
quadratic function of the value specified by the `--max-cruft-size`
option, instead of behaving quadratically in the number of total
unreachable objects.
When pruning unreachable objects, we bypass the new code paths which
combine small cruft packs together, and instead start from scratch,
passing in the appropriate `--max-pack-size` down to `pack-objects`,
putting it in charge of keeping the resulting set of cruft packs sized
correctly.
This may seem like further I/O churn, but in practice it isn't so bad.
We could prune old cruft packs for whom all or most objects are removed,
and then generate a new cruft pack with just the remaining set of
objects. But this additional complexity buys us relatively little,
because most objects end up being pruned anyway, so the I/O churn is
well contained.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The repack builtin takes a `--max-pack-size` command-line argument which
it uses to feed into any of the pack-objects children that it may spawn
when generating a new pack.
This option is parsed with OPT_STRING, meaning that we'll accept
anything as input, punting on more fine-grained validation until we get
down into pack-objects.
This is fine, but it's wasteful to spend an entire sub-process just to
figure out that one of its option is bogus. Instead, parse the value of
`--max-pack-size` with OPT_MAGNITUDE in 'git repack', and then pass the
known-good result down to pack-objects.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A previous commit has implemented `git repack --filter=<filter-spec>` to
allow users to filter out some objects from the main pack and move them
into a new different pack.
It would be nice if this new different pack could be created in a
different directory than the regular pack. This would make it possible
to move large blobs into a pack on a different kind of storage, for
example cheaper storage.
Even in a different directory, this pack can be accessible if, for
example, the Git alternates mechanism is used to point to it. In fact
not using the Git alternates mechanism can corrupt a repo as the
generated pack containing the filtered objects might not be accessible
from the repo any more. So setting up the Git alternates mechanism
should be done before using this feature if the user wants the repo to
be fully usable while this feature is used.
In some cases, like when a repo has just been cloned or when there is no
other activity in the repo, it's Ok to setup the Git alternates
mechanism afterwards though. It's also Ok to just inspect the generated
packfile containing the filtered objects and then just move it into the
'.git/objects/pack/' directory manually. That's why it's not necessary
for this command to check that the Git alternates mechanism has been
already setup.
While at it, as an example to show that `--filter` and `--filter-to`
work well with other options, let's also add a test to check that these
options work well with `--max-pack-size`.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This new option puts the objects specified by `<filter-spec>` into a
separate packfile.
This could be useful if, for example, some blobs take up a lot of
precious space on fast storage while they are rarely accessed. It could
make sense to move them into a separate cheaper, though slower, storage.
It's possible to find which new packfile contains the filtered out
objects using one of the following:
- `git verify-pack -v ...`,
- `test-tool find-pack ...`, which a previous commit added,
- `--filter-to=<dir>`, which a following commit will add to specify
where the pack containing the filtered out objects will be.
This feature is implemented by running `git pack-objects` twice in a
row. The first command is run with `--filter=<filter-spec>`, using the
specified filter. It packs objects while omitting the objects specified
by the filter. Then another `git pack-objects` command is launched using
`--stdin-packs`. We pass it all the previously existing packs into its
stdin, so that it will pack all the objects in the previously existing
packs. But we also pass into its stdin, the pack created by the previous
`git pack-objects --filter=<filter-spec>` command as well as the kept
packs, all prefixed with '^', so that the objects in these packs will be
omitted from the resulting pack. The result is that only the objects
filtered out by the first `git pack-objects` command are in the pack
resulting from the second `git pack-objects` command.
As the interactions with kept packs are a bit tricky, a few related
tests are added.
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Create a new find_pack_prefix() to refactor code that handles finding
the pack prefix from the packtmp and packdir global variables, as we are
going to need this feature again in following commit.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Create a new finish_pack_objects_cmd() to refactor duplicated code
that handles reading the packfile names from the output of a
`git pack-objects` command and putting it into a string_list, as well as
calling finish_command().
While at it, beautify a code comment a bit in the new function.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The code to keep track of existing packs in the repository while
repacking has been refactored.
* tb/repack-existing-packs-cleanup:
builtin/repack.c: extract common cruft pack loop
builtin/repack.c: avoid directly inspecting "util"
builtin/repack.c: store existing cruft packs separately
builtin/repack.c: extract `has_existing_non_kept_packs()`
builtin/repack.c: extract redundant pack cleanup for existing packs
builtin/repack.c: extract redundant pack cleanup for --geometric
builtin/repack.c: extract marking packs for deletion
builtin/repack.c: extract structure to store existing packs
|
|
When generating the list of packs to store in a MIDX (when given the
`--write-midx` option), we include any cruft packs both during
--geometric and non-geometric repacks.
But the rules for when we do and don't have to check whether any of
those cruft packs were queued for deletion differ slightly between the
two cases.
But the two can be unified, provided there is a little bit of extra
detail added in the comment to clarify when it is safe to avoid checking
for any pending deletions (and why it is OK to do so even when not
required).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `->util` field corresponding to each string_list_item is used to
track the existence of some pack at the beginning of a repack operation
was originally intended to be used as a bitfield.
This bitfield tracked:
- (1 << 0): whether or not the pack should be deleted
- (1 << 1): whether or not the pack is cruft
The previous commit removed the use of the second bit, but a future
patch (from a different series than this one) will introduce a new use
of it.
So we could stop treating the util pointer as a bitfield and instead
start treating it as if it were a boolean. But this would require some
backtracking when that later patch is applied.
Instead, let's avoid touching the ->util field directly, and instead
introduce convenience functions like:
- pack_mark_for_deletion()
- pack_is_marked_for_deletion()
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When repacking with the `--write-midx` option, we invoke the function
`midx_included_packs()` in order to produce the list of packs we want to
include in the resulting MIDX.
This list is comprised of:
- existing .keep packs
- any pack(s) which were written earlier in the same process
- any unchanged packs when doing a `--geometric` repack
- any cruft packs
Prior to this patch, we stored pre-existing cruft and non-cruft packs
together (provided those packs are non-kept). This meant we needed an
additional bit to indicate which non-kept pack(s) were cruft versus
those that aren't.
But alternatively we can store cruft packs in a separate list, avoiding
the need for this extra bit, and simplifying the code below.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When there is:
- at least one pre-existing packfile (which is not marked as kept),
- repacking with the `-d` flag, and
- not doing a cruft repack
, then we pass a handful of additional options to the inner
`pack-objects` process, like `--unpack-unreachable`,
`--keep-unreachable`, and `--pack-loose-unreachable`, in addition to
marking any packs we just wrote for promisor remotes as kept in-core
(with `--keep-pack`, as opposed to the presence of a ".keep" file on
disk).
Because we store both cruft and non-cruft packs together in the same
`existing.non_kept_packs` list, it suffices to check its `nr` member to
see if it is zero or not.
But a following change will store cruft- and non-cruft packs separately,
meaning this check would break as a result. Prepare for this by
extracting this part of the check into a new helper function called
`has_existing_non_kept_packs()`.
This patch does not introduce any functional changes, but prepares us to
make a more isolated change in a subsequent patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
To remove redundant packs at the end of a repacking operation, Git uses
its `remove_redundant_pack()` function in a loop over the set of
pre-existing, non-kept packs.
In a later commit, we will split this list into two, one for
pre-existing cruft pack(s), and another for non-cruft pack(s). Prepare
for this by factoring out the routine to loop over and delete redundant
packs into its own function.
Instead of calling `remove_redundant_pack()` directly, we now will call
`remove_redundant_existing_packs()`, which itself dispatches a call to
`remove_redundant_packs_1()`. Note that the geometric repacking code
will still call `remove_redundant_pack()` directly, but see the previous
commit for more details.
Having `remove_redundant_packs_1()` exist as a separate function may
seem like overkill in this patch. However, a later patch will call
`remove_redundant_packs_1()` once over two separate lists, so this
refactoring sets us up for that.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
To reduce the complexity of the already quite-long `cmd_repack()`
implementation, extract out the parts responsible for deleting redundant
packs from a geometric repack out into its own sub-routine.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
At the end of a repack (when given `-d`), Git attempts to remove any
packs which have been made "redundant" as a result of the repacking
operation. For example, an all-into-one (`-A` or `-a`) repack makes
every pre-existing pack which is not marked as kept redundant. Geometric
repacks (with `--geometric=<n>`) make any packs which were rolled up
redundant, and so on.
But before deleting the set of packs we think are redundant, we first
check to see whether or not we just wrote a pack which is identical to
any one of the packs we were going to delete. When this is the case, Git
must avoid deleting that pack, since it matches a pack we just wrote
(so deleting it may cause the repository to become corrupt).
Right now we only process the list of non-kept packs in a single pass.
But a future change will split the existing non-kept packs further into
two lists: one for cruft packs, and another for non-cruft packs.
Factor out this routine to prepare for calling it twice on two separate
lists in a future patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The repack machinery needs to keep track of which packfiles were present
in the repository at the beginning of a repack, segmented by whether or
not each pack is marked as kept.
The names of these packs are stored in two `string_list`s, corresponding
to kept- and non-kept packs, respectively. As a consequence, many
functions within the repack code need to take both `string_list`s as
arguments, leading to code like this:
ret = write_cruft_pack(&cruft_po_args, packtmp, pack_prefix,
cruft_expiration, &names,
&existing_nonkept_packs, /* <- */
&existing_kept_packs); /* <- */
Wrap up this pair of `string_list`s into a single structure that stores
both. This saves us from having to pass both string lists separately,
and prepares for adding additional fields to this structure.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When pack-objects learned the `--cruft` option back in b757353676
(builtin/pack-objects.c: --cruft without expiration, 2022-05-20), we
explicitly forbade `--cruft` with `--max-pack-size`.
At the time, there was no specific rationale given in the patch for not
supporting the `--max-pack-size` option with `--cruft`. (As best I can
remember, it's because we were trying to push users towards only ever
having a single cruft pack, but I cannot be sure).
However, `--max-pack-size` is flexible enough that it already works with
`--cruft` and can shard unreachable objects across multiple cruft packs,
creating separate ".mtimes" files as appropriate. In fact, the
`--max-pack-size` option worked with `--cruft` as far back as
b757353676!
This is because we overwrite the `written_list`, and pass down the
appropriate length, i.e. the number of objects written in each pack
shard.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `pack_geometry` struct is used to maintain and partition a list of
packfiles into a "frozen" set (to be left alone), and a non-frozen set
(to be combined into a single new pack). In the previous commit, we
removed a leak caused by neglecting to free() the heap allocated space
used to store the structure itself.
But there is no need for this structure to live on the heap anyway.
Instead, let's move it to be stack allocated, eliminating the
possibility of a direct leak like the one addressed in the previous
patch.
The one minor hitch is that we use the NULL-ness of the pack_geometry's
struct pointer to determine whether or not we are performing a geometric
repack with `--geometric=<d>`. But since we only initialize the
pack_geometry structure when the `geometric_factor` is non-zero, we can
use that variable (based on whether or not it is equal to zero) to
determine whether or not we are performing a geometric repack.
There are a couple of spots that have access to a pointer to the
pack_geometry struct, but not the geometric_factor itself. Instead of
passing in an additional variable, let's make the geometric_factor a
field of the pack_geometry struct.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When the program is ending, we call clear_pack_geometry() to free any
resources in the pack_geometry struct. But the struct itself is
allocated on the heap, and leak-checkers will complain about the
resulting small leak.
This one was marked by Coverity as a "new" leak, though it has existed
since 0fabafd0b9 (builtin/repack.c: add '--geometric' option,
2021-02-22). This might be because recent unrelated changes in the file
confused it about what is new and what is not. But regardless, it is
worth addressing.
We can fix it easily by free-ing the struct. We'll convert our "clear"
function to "free", since the allocation happens in the matching init()
function (though since there is only one call to each, and the struct is
local to this file, it's mostly academic).
Another option would be to put the struct on the stack rather than the
heap. However, this gets tricky, as we check the pointer against NULL in
several places to decide whether we're in geometric mode.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The recent change to "git repack" made it react less nicely when a
leftover .idx file that no longer has the corresponding .pack file
in the repository, which has been corrected.
* tb/repack-cleanup:
builtin/repack.c: avoid dir traversal in `collect_pack_filenames()`
builtin/repack.c: only repack `.pack`s that exist
|
|
Further shuffling of declarations across header files to streamline
file dependencies.
* cw/compat-util-header-cleanup:
git-compat-util: move alloc macros to git-compat-util.h
treewide: remove unnecessary includes for wrapper.h
kwset: move translation table from ctype
sane-ctype.h: create header for sane-ctype macros
git-compat-util: move wrapper.c funcs to its header
git-compat-util: move strbuf.c funcs to its header
|
|
When repacking, the function `collect_pack_filenames()` is responsible
for collecting the set of existing packs in the repository, and
partitioning them into "kept" (if the pack has a ".keep" file or was
given via `--keep-pack`) and "nonkept" (otherwise) lists.
This function comes from the original C port of git-repack.sh from back
in a1bbc6c0176 (repack: rewrite the shell script in C, 2013-09-15),
where it first appears as `get_non_kept_pack_filenames()`. At the time,
the implementation was a fairly direct translation from the relevant
portion of git-repack.sh, which looped over the results of
find "$PACKDIR" -type f -name '*.pack'
either ignoring the pack as kept, or adding it to the list of existing
packs.
So the choice to directly translate this function in terms of
`readdir()` in a1bbc6c0176 made sense. At the time, it was possible to
refine the C version in terms of packed_git structs, but was never done.
However, manually enumerating a repository's packs via `readdir()` is
confusing and error-prone. It leads to frustrating inconsistencies
between which packs Git considers to be part of a repository (i.e.,
could be found in the list of packs from `get_all_packs()`), and which
packs `collect_pack_filenames()` considers to meet the same criteria.
This bit us in 73320e49ad (builtin/repack.c: only collect fully-formed
packs, 2023-06-07), and again in the previous commit.
Prevent these issues from biting us in the future by implementing the
`collect_pack_filenames()` function by looping over an array of pointers
to `packed_git` structs, ensuring that we use the same criteria to
determine the set of available packs.
One gotcha here is that we have to ignore non-local packs, since the
original version of `collect_pack_filenames()` only looks at the local
pack directory to collect existing packs.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 73320e49add (builtin/repack.c: only collect fully-formed packs,
2023-06-07), we switched the check for which packs to collect by
starting at the .idx files and looking for matching .pack files. This
avoids trying to repack pack-files that have not had their pack-indexes
installed yet.
However, it does cause maintenance to halt if we find the (problematic,
but not insurmountable) case of a .idx file without a corresponding
.pack file. In an environment where packfile maintenance is a critical
function, such a hard stop is costly and requires human intervention to
resolve (by deleting the .idx file).
This was not the case before. We successfully repacked through this
scenario until the recent change to scan for .idx files.
Further, if we are actually in a case where objects are missing, we
detect this at a different point during the reachability walk.
In other cases, Git prepares its list of packfiles by scanning .idx
files and then only adds it to the packfile list if the corresponding
.pack file exists. It even does so without a warning! (See
add_packed_git() in packfile.c for details.)
This case is much less likely to occur than the failures seen before
73320e49add. Packfiles are "installed" by writing the .pack file before
the .idx and that process can be interrupted. Packfiles _should_ be
deleted by deleting the .idx first, followed by the .pack file, but
unlink_pack_path() does not do this: it deletes the .pack _first_,
allowing a window where this process could be interrupted. We leave the
consideration of changing this order as a separate concern. Knowing that
this condition is possible from interrupted Git processes and not other
tools lends some weight that Git should be more flexible around this
scenario.
Add a check to see if the .pack file exists before adding it to the list
for repacking. This will stop a number of maintenance failures seen in
production but fixed by deleting the .idx files.
This brings us closer to the case before 73320e49add in that 'git
repack' will not fail when there is an orphaned .idx file, at least, not
due to the way we scan for packfiles. In the case that the .pack file
was erroneously deleted without copies of its objects in other installed
packfiles, then 'git repack' will fail due to the reachable object walk.
This does resolve the case where automated repacks will no longer be
halted on this case. The tests in t7700 show both these successful
scenarios and the case of failing if the .pack was truly required.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Reduce reliance on a global state in the config reading API.
* gc/config-context:
config: pass source to config_parser_event_fn_t
config: add kvi.path, use it to evaluate includes
config.c: remove config_reader from configsets
config: pass kvi to die_bad_number()
trace2: plumb config kvi
config.c: pass ctx with CLI config
config: pass ctx with config files
config.c: pass ctx in configsets
config: add ctx arg to config_fn_t
urlmatch.h: use config_fn_t type
config: inline git_color_default_config
|
|
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for
dynamic array allocation. Moving these macros to git-compat-util.h with
the other alloc macros focuses alloc.[ch] to allocation for Git objects
and additionally allows us to remove inclusions to alloc.h from files
that solely used the above macros.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Header files cleanup.
* en/header-split-cache-h-part-3: (28 commits)
fsmonitor-ll.h: split this header out of fsmonitor.h
hash-ll, hashmap: move oidhash() to hash-ll
object-store-ll.h: split this header out of object-store.h
khash: name the structs that khash declares
merge-ll: rename from ll-merge
git-compat-util.h: remove unneccessary include of wildmatch.h
builtin.h: remove unneccessary includes
list-objects-filter-options.h: remove unneccessary include
diff.h: remove unnecessary include of oidset.h
repository: remove unnecessary include of path.h
log-tree: replace include of revision.h with simple forward declaration
cache.h: remove this no-longer-used header
read-cache*.h: move declarations for read-cache.c functions from cache.h
repository.h: move declaration of the_index from cache.h
merge.h: move declarations for merge.c from cache.h
diff.h: move declaration for global in diff.c from cache.h
preload-index.h: move declarations for preload-index.c from elsewhere
sparse-index.h: move declarations for sparse-index.c from cache.h
name-hash.h: move declarations for name-hash.c from cache.h
run-command.h: move declarations for run-command.c from cache.h
...
|
|
Add a new "const struct config_context *ctx" arg to config_fn_t to hold
additional information about the config iteration operation.
config_context has a "struct key_value_info kvi" member that holds
metadata about the config source being read (e.g. what kind of config
source it is, the filename, etc). In this series, we're only interested
in .kvi, so we could have just used "struct key_value_info" as an arg,
but config_context makes it possible to add/adjust members in the future
without changing the config_fn_t signature. We could also consider other
ways of organizing the args (e.g. moving the config name and value into
config_context or key_value_info), but in my experiments, the
incremental benefit doesn't justify the added complexity (e.g. a
config_fn_t will sometimes invoke another config_fn_t but with a
different config value).
In subsequent commits, the .kvi member will replace the global "struct
config_reader" in config.c, making config iteration a global-free
operation. It requires much more work for the machinery to provide
meaningful values of .kvi, so for now, merely change the signature and
call sites, pass NULL as a placeholder value, and don't rely on the arg
in any meaningful way.
Most of the changes are performed by
contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every
config_fn_t:
- Modifies the signature to accept "const struct config_context *ctx"
- Passes "ctx" to any inner config_fn_t, if needed
- Adds UNUSED attributes to "ctx", if needed
Most config_fn_t instances are easily identified by seeing if they are
called by the various config functions. Most of the remaining ones are
manually named in the .cocci patch. Manual cleanups are still needed,
but the majority of it is trivial; it's either adjusting config_fn_t
that the .cocci patch didn't catch, or adding forward declarations of
"struct config_context ctx" to make the signatures make sense.
The non-trivial changes are in cases where we are invoking a config_fn_t
outside of config machinery, and we now need to decide what value of
"ctx" to pass. These cases are:
- trace2/tr2_cfg.c:tr2_cfg_set_fl()
This is indirectly called by git_config_set() so that the trace2
machinery can notice the new config values and update its settings
using the tr2 config parsing function, i.e. tr2_cfg_cb().
- builtin/checkout.c:checkout_main()
This calls git_xmerge_config() as a shorthand for parsing a CLI arg.
This might be worth refactoring away in the future, since
git_xmerge_config() can call git_default_config(), which can do much
more than just parsing.
Handle them by creating a KVI_INIT macro that initializes "struct
key_value_info" to a reasonable default, and use that to construct the
"ctx" arg.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The vast majority of files including object-store.h did not need dir.h
nor khash.h. Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.
After this patch:
$ git grep -h include..object-store | sort | uniq -c
2 #include "object-store.h"
129 #include "object-store-ll.h"
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This also made it clear that several .c files that depended upon path.h
were missing a #include for it; add the missing includes while at it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
To partition the set of packs based on which ones are "kept" (either
they have a .keep file, or were otherwise marked via the `--keep-pack`
option) and "non-kept" ones (anything else), `git repack` uses its
`collect_pack_filenames()` function.
Ordinarily, we would rely on a convenience function such as
`get_all_packs()` to enumerate and partition the set of packs. But
`collect_pack_filenames()` uses `readdir()` directly to read the
contents of the "$GIT_DIR/objects/pack" directory, and adds each entry
ending in ".pack" to the appropriate list (either kept, or non-kept as
above).
This is subtly racy, since `collect_pack_filenames()` may see a pack
that is not fully staged (i.e., it is missing its ".idx" file).
Ordinarily, this doesn't cause a problem. But it can cause issues when
generating a cruft pack.
This is because `git repack` feeds (among other things) the list of
existing kept packs down to `git pack-objects --cruft` to indicate that
any kept packs will not be removed from the repository (so that the
cruft pack machinery can avoid packing objects that appear in those
packs as cruft).
But `read_cruft_objects()` lists packfiles by calling `get_all_packs()`.
So if a ".pack" file exists (necessary to get that pack to appear to
`collect_pack_filenames()`), but doesn't have a corresponding ".idx"
file (necessary to get that pack to appear via `get_all_packs()`), we'll
complain with:
fatal: could not find pack '.tmp-5841-pack-a6b0150558609c323c496ced21de6f4b66589260.pack'
Fix the above by teaching `collect_pack_filenames()` to only collect
packs with their corresponding `*.idx` files in place, indicating that
those packs have been fully staged.
There are a couple of things worth noting:
- Since each entry in the `extra_keep` list (which contains the
`--keep-pack` names) has a `*.pack` suffix, we'll have to swap the
suffix from ".pack" to ".idx", and compare that instead.
- Since we use the the `fname_kept_list` to figure out which packs to
delete (with `git repack -d`), we would have previously deleted a
`*.pack` with no index (since the existince of a ".pack" file is
necessary and sufficient to include that pack in the list of
existing non-kept packs).
Now we will leave it alone (since that pack won't appear in the
list). This is far more correct behavior, since we don't want
to race with a pack being staged. Deleting a partially staged pack
is unlikely, however, since the window of time between staging a
pack and moving its .idx file into place is miniscule.
Note that this window does *not* include the time it takes to
receive and index the pack, since the incoming data goes into
"$GIT_DIR/objects/tmp_pack_XXXXXX", which does not end in ".pack"
and is thus ignored by collect_pack_filenames().
In the future, this function should probably be rewritten as a callback
to `for_each_file_in_pack_dir()`, but this is the simplest change we
could do in the short-term.
Reported-by: Michael Haggerty <mhagger@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
More header clean-up.
* en/header-split-cache-h-part-2: (22 commits)
reftable: ensure git-compat-util.h is the first (indirect) include
diff.h: reduce unnecessary includes
object-store.h: reduce unnecessary includes
commit.h: reduce unnecessary includes
fsmonitor: reduce includes of cache.h
cache.h: remove unnecessary headers
treewide: remove cache.h inclusion due to previous changes
cache,tree: move basic name compare functions from read-cache to tree
cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c
hash-ll.h: split out of hash.h to remove dependency on repository.h
tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h
dir.h: move DTYPE defines from cache.h
versioncmp.h: move declarations for versioncmp.c functions from cache.h
ws.h: move declarations for ws.c functions from cache.h
match-trees.h: move declarations for match-trees.c functions from cache.h
pkt-line.h: move declarations for pkt-line.c functions from cache.h
base85.h: move declarations for base85.c functions from cache.h
copy.h: move declarations for copy.c functions from cache.h
server-info.h: move declarations for server-info.c functions from cache.h
packfile.h: move pack_window and pack_entry from cache.h
...
|
|
When "gc" needs to retain unreachable objects, packing them into
cruft packs (instead of exploding them into loose object files) has
been offered as a more efficient option for some time. Now the use
of cruft packs has been made the default and no longer considered
an experimental feature.
* tb/enable-cruft-packs-by-default:
repository.h: drop unused `gc_cruft_packs`
builtin/gc.c: make `gc.cruftPacks` enabled by default
t/t9300-fast-import.sh: prepare for `gc --cruft` by default
t/t6500-gc.sh: add additional test cases
t/t6500-gc.sh: refactor cruft pack tests
t/t6501-freshen-objects.sh: prepare for `gc --cruft` by default
t/t5304-prune.sh: prepare for `gc --cruft` by default
builtin/gc.c: ignore cruft packs with `--keep-largest-pack`
builtin/repack.c: fix incorrect reference to '-C'
pack-write.c: plug a leak in stage_tmp_packfiles()
|
|
Geometric repacking ("git repack --geometric=<n>") in a repository
that borrows from an alternate object database had various corner
case bugs, which have been corrected.
* ps/fix-geom-repack-with-alternates:
repack: disable writing bitmaps when doing a local repack
repack: honor `-l` when calculating pack geometry
t/helper: allow chmtime to print verbosely without modifying mtime
pack-objects: extend test coverage of `--stdin-packs` with alternates
pack-objects: fix error when same packfile is included and excluded
pack-objects: fix error when packing same pack twice
pack-objects: split out `--stdin-packs` tests into separate file
repack: fix generating multi-pack-index with only non-local packs
repack: fix trying to use preferred pack in alternates
midx: fix segfault with no packs and invalid preferred pack
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When cruft packs were originally being developed, `-C` was designated as
the short-form for `--cruft` (as in `git repack -C`).
This was dropped due to confusion with Git's top-level `-C` option
before submitting to the list. But the reference to it in
`--cruft-expiration`'s help text was never updated. Fix that dangling
reference in this patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In order to write a bitmap, we need to have full coverage of all objects
that are about to be packed. In the traditional non-multi-pack-index
world this meant we need to do a full repack of all objects into a
single packfile. But in the new multi-pack-index world we can get away
with writing bitmaps when we have multiple packfiles as long as the
multi-pack-index covers all objects.
This is not always the case though. When asked to perform a repack of
local objects, only, then we cannot guarantee to have full coverage of
all objects regardless of whether we do a full repack or a repack with a
multi-pack-index. The end result is that writing the bitmap will fail in
both worlds:
$ git multi-pack-index write --stdin-packs --bitmap <packfiles
warning: Failed to write bitmap index. Packfile doesn't have full closure (object 1529341d78cf45377407369acb0f4ff2b5cdae42 is missing)
error: could not write multi-pack bitmap
Now there are two different ways to fix this. The first one would be to
amend git-multi-pack-index(1) to disable writing bitmaps when we notice
that we don't have full object coverage.
- We don't have enough information in git-multi-pack-index(1) in
order to tell whether the local repository _should_ have full
coverage. Because even when connected to an alternate object
directory, it may be the case that we still have all objects
around in the main object database.
- git-multi-pack-index(1) is quite a low-level tool. Automatically
disabling functionality that it was asked to provide does not feel
like the right thing to do.
We can easily fix it at a higher level in git-repack(1) though. When
asked to only include local objects via `-l` and when connected to an
alternate object directory then we will override the user's ask and
disable writing bitmaps with a warning. This is similar to what we do in
git-pack-objects(1), where we also disable writing bitmaps in case we
omit an object from the pack.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When the user passes `-l` to git-repack(1), then they essentially ask us
to only repack objects part of the local object database while ignoring
any packfiles part of an alternate object database. And we in fact honor
this bit when doing a geometric repack as the resulting packfile will
only ever contain local objects.
What we're missing though is that we don't take locality of packfiles
into account when computing whether the geometric sequence is intact or
not. So even though we would only ever roll up local packfiles anyway,
we could end up trying to repack because of non-local packfiles. This
does not make much sense, and in the worst case it can cause us to try
and do the geometric repack over and over again because we're never able
to restore the geometric sequence.
Fix this bug by honoring whether the user has passed `-l`. If so, we
skip adding any non-local packfiles to the pack geometry.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When writing the multi-pack-index with geometric repacking we will add
all packfiles to the index that are part of the geometric sequence. This
can potentially also include packfiles borrowed from an alternate object
directory. But given that a multi-pack-index can only ever include packs
that are part of the main object database this does not make much sense
whatsoever.
In the edge case where all packfiles are contained in the alternate
object database and the local repository has none itself this bug can
cause us to invoke git-multi-pack-index(1) with only non-local packfiles
that it ultimately cannot find. This causes it to return an error and
thus causes the geometric repack to fail.
Fix the code to skip non-local packfiles.
Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When doing a geometric repack with multi-pack-indices, then we ask
git-multi-pack-index(1) to use the largest packfile as the preferred
pack. It can happen though that the largest packfile is not part of the
main object database, but instead part of an alternate object database.
The result is that git-multi-pack-index(1) will not be able to find the
preferred pack and print a warning. It then falls back to use the first
packfile that the multi-pack-index shall reference.
Fix this bug by only considering packfiles as preferred pack that are
local. This is the right thing to do given that a multi-pack-index
should never reference packfiles borrowed from an alternate.
While at it, rename the function `get_largest_active_packfile()` to
`get_preferred_pack()` to better document its intent.
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Split key function and data structure definitions out of cache.h to
new header files and adjust the users.
* en/header-split-cleanup:
csum-file.h: remove unnecessary inclusion of cache.h
write-or-die.h: move declarations for write-or-die.c functions from cache.h
treewide: remove cache.h inclusion due to setup.h changes
setup.h: move declarations for setup.c functions from cache.h
treewide: remove cache.h inclusion due to environment.h changes
environment.h: move declarations for environment.c functions from cache.h
treewide: remove unnecessary includes of cache.h
wrapper.h: move declarations for wrapper.c functions from cache.h
path.h: move function declarations for path.c functions from cache.h
cache.h: remove expand_user_path()
abspath.h: move absolute path functions from cache.h
environment: move comment_line_char from cache.h
treewide: remove unnecessary cache.h inclusion from several sources
treewide: remove unnecessary inclusion of gettext.h
treewide: be explicit about dependence on gettext.h
treewide: remove unnecessary cache.h inclusion from a few headers
|
|
Code clean-up around the use of the_repository.
* ab/remove-implicit-use-of-the-repository:
libs: use "struct repository *" argument, not "the_repository"
post-cocci: adjust comments for recent repo_* migration
cocci: apply the "revision.h" part of "the_repository.pending"
cocci: apply the "rerere.h" part of "the_repository.pending"
cocci: apply the "refs.h" part of "the_repository.pending"
cocci: apply the "promisor-remote.h" part of "the_repository.pending"
cocci: apply the "packfile.h" part of "the_repository.pending"
cocci: apply the "pretty.h" part of "the_repository.pending"
cocci: apply the "object-store.h" part of "the_repository.pending"
cocci: apply the "diff.h" part of "the_repository.pending"
cocci: apply the "commit.h" part of "the_repository.pending"
cocci: apply the "commit-reach.h" part of "the_repository.pending"
cocci: apply the "cache.h" part of "the_repository.pending"
cocci: add missing "the_repository" macros to "pending"
cocci: sort "the_repository" rules by header
cocci: fix incorrect & verbose "the_repository" rules
cocci: remove dead rule from "the_repository.pending.cocci"
|
|
en/header-split-cache-h
* ab/remove-implicit-use-of-the-repository:
libs: use "struct repository *" argument, not "the_repository"
post-cocci: adjust comments for recent repo_* migration
cocci: apply the "revision.h" part of "the_repository.pending"
cocci: apply the "rerere.h" part of "the_repository.pending"
cocci: apply the "refs.h" part of "the_repository.pending"
cocci: apply the "promisor-remote.h" part of "the_repository.pending"
cocci: apply the "packfile.h" part of "the_repository.pending"
cocci: apply the "pretty.h" part of "the_repository.pending"
cocci: apply the "object-store.h" part of "the_repository.pending"
cocci: apply the "diff.h" part of "the_repository.pending"
cocci: apply the "commit.h" part of "the_repository.pending"
cocci: apply the "commit-reach.h" part of "the_repository.pending"
cocci: apply the "cache.h" part of "the_repository.pending"
cocci: add missing "the_repository" macros to "pending"
cocci: sort "the_repository" rules by header
cocci: fix incorrect & verbose "the_repository" rules
cocci: remove dead rule from "the_repository.pending.cocci"
|
|
Apply the part of "the_repository.pending.cocci" pertaining to
"promisor-remote.h".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Dozens of files made use of gettext functions, without explicitly
including gettext.h. This made it more difficult to find which files
could remove a dependence on cache.h. Make C files explicitly include
gettext.h if they are using it.
However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
More work towards -Wunused.
* jk/unused-post-2.39-part2: (21 commits)
help: mark unused parameter in git_unknown_cmd_config()
run_processes_parallel: mark unused callback parameters
userformat_want_item(): mark unused parameter
for_each_commit_graft(): mark unused callback parameter
rewrite_parents(): mark unused callback parameter
fetch-pack: mark unused parameter in callback function
notes: mark unused callback parameters
prio-queue: mark unused parameters in comparison functions
for_each_object: mark unused callback parameters
list-objects: mark unused callback parameters
mark unused parameters in signal handlers
run-command: mark error routine parameters as unused
mark "pointless" data pointers in callbacks
ref-filter: mark unused callback parameters
http-backend: mark unused parameters in virtual functions
http-backend: mark argc/argv unused
object-name: mark unused parameters in disambiguate callbacks
serve: mark unused parameters in virtual functions
serve: use repository pointer to get config
ls-refs: drop config caching
...
|
|
The for_each_{loose,packed}_object interface uses callback functions,
but not every callback needs all of the parameters. Mark the unused ones
to satisfy -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This allows us to replace includes of cache.h with includes of the much
smaller alloc.h in many places. It does mean that we also need to add
includes of alloc.h in a number of C files.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In cmd_repack() when we hit an error, replace "return ret" with "goto
cleanup" to ensure we free the necessary data structures.
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Various leak fixes.
* ab/various-leak-fixes:
built-ins: use free() not UNLEAK() if trivial, rm dead code
revert: fix parse_options_concat() leak
cherry-pick: free "struct replay_opts" members
rebase: don't leak on "--abort"
connected.c: free the "struct packed_git"
sequencer.c: fix "opts->strategy" leak in read_strategy_opts()
ls-files: fix a --with-tree memory leak
revision API: call graph_clear() in release_revisions()
unpack-file: fix ancient leak in create_temp_file()
built-ins & libs & helpers: add/move destructors, fix leaks
dir.c: free "ident" and "exclude_per_dir" in "struct untracked_cache"
read-cache.c: clear and free "sparse_checkout_patterns"
commit: discard partial cache before (re-)reading it
{reset,merge}: call discard_index() before returning
tests: mark tests as passing with SANITIZE=leak
|
|
Fix various leaks in built-ins, libraries and a test helper here we
were missing a call to strbuf_release(), string_list_clear() etc, or
were calling them after a potential "return".
Comments on individual changes:
- builtin/checkout.c: Fix a memory leak that was introduced in [1]. A
sibling leak introduced in [2] was recently fixed in [3]. As with [3]
we should be using the wt_status_state_free_buffers() API introduced
in [4].
- builtin/repack.c: Fix a leak that's been here since this use of
"strbuf_release()" was added in a1bbc6c0176 (repack: rewrite the shell
script in C, 2013-09-15). We don't use the variable for anything
except this loop, so we can instead free it right afterwards.
- builtin/rev-parse: Fix a leak that's been here since this code was
added in 21d47835386 (Add a parseopt mode to git-rev-parse to bring
parse-options to shell scripts., 2007-11-04).
- builtin/stash.c: Fix a couple of leaks that have been here since
this code was added in d4788af875c (stash: convert create to builtin,
2019-02-25), we strbuf_release()'d only some of the "struct strbuf" we
allocated earlier in the function, let's release all of them.
- ref-filter.c: Fix a leak in 482c1191869 (gpg-interface: improve
interface for parsing tags, 2021-02-11), we don't use the "payload"
variable that we ask parse_signature() to populate for us, so let's
free it.
- t/helper/test-fake-ssh.c: Fix a leak that's been here since this
code was added in 3064d5a38c7 (mingw: fix t5601-clone.sh,
2016-01-27). Let's free the "struct strbuf" as soon as we don't need
it anymore.
1. c45f0f525de (switch: reject if some operation is in progress,
2019-03-29)
2. 2708ce62d21 (branch: sort detached HEAD based on a flag,
2021-01-07)
3. abcac2e19fa (ref-filter.c: fix a leak in get_head_description,
2022-09-25)
4. 962dd7ebc3e (wt-status: introduce wt_status_state_free_buffers(),
2020-09-27).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
|
|
"git repack" learns to send cruft objects out of the way into
packfiles outside the repository.
* tb/repack-expire-to:
builtin/repack.c: implement `--expire-to` for storing pruned objects
builtin/repack.c: write cruft packs to arbitrary locations
builtin/repack.c: pass "cruft_expiration" to `write_cruft_pack`
builtin/repack.c: pass "out" to `prepare_pack_objects`
|
|
The way "git repack" creared temporary files when it received a
signal was prone to deadlocking, which has been corrected.
* jk/repack-tempfile-cleanup:
t7700: annotate cruft-pack failure with ok=sigpipe
repack: drop remove_temporary_files()
repack: use tempfiles for signal cleanup
repack: expand error message for missing pack files
repack: populate extension bits incrementally
repack: convert "names" util bitfield to array
|
|
When creating a multi-pack bitmap, remove per-pack bitmap files
unconditionally as they will never be consulted.
* tb/remove-unused-pack-bitmap:
builtin/repack.c: remove redundant pack-based bitmaps
|
|
When pruning objects with `--cruft`, `git repack` offers some
flexibility when selecting the set of which objects are pruned via the
`--cruft-expiration` option.
This is useful for expiring objects which are older than the grace
period, making races where to-be-pruned objects become reachable and
then ancestors of freshly pushed objects, leaving the repository in a
corrupt state after pruning substantially less likely [1].
But in practice, such races are impossible to avoid entirely, no matter
how long the grace period is. To prevent this race, it is often
advisable to temporarily put a repository into a read-only state. But in
practice, this is not always practical, and so some middle ground would
be nice.
This patch introduces a new option, `--expire-to`, which teaches `git
repack` to write an additional cruft pack containing just the objects
which were pruned from the repository. The caller can specify a
directory outside of the current repository as the destination for this
second cruft pack.
This makes it possible to prune objects from a repository, while still
holding onto a supplemental copy of them outside of the original
repository. Having this copy on-disk makes it substantially easier to
recover objects when the aforementioned race is encountered.
`--expire-to` is implemented in a somewhat convoluted manner, which is
to take advantage of the fact that the first time `write_cruft_pack()`
is called, it adds the name of the cruft pack to the `names` string
list. That means the second time we call `write_cruft_pack()`, objects
in the previously-written cruft pack will be excluded.
As long as the caller ensures that no objects are expired during the
second pass, this is sufficient to generate a cruft pack containing all
objects which don't appear in any of the new packs written by `git
repack`, including the cruft pack. In other words, all of the objects
which are about to be pruned from the repository.
It is important to note that the destination in `--expire-to` does not
necessarily need to be a Git repository (though it can be) Notably, the
expired packs do not contain all ancestors of expired objects. So if the
source repository contains something like:
<unreachable>
/
C1 --- C2
\
refs/heads/master
where C2 is unreachable, but has a parent (C1) which is reachable, and
C2 would be pruned, then the expiry pack will contain only C2, not C1.
[1]: https://lore.kernel.org/git/20190319001829.GL29661@sigill.intra.peff.net/
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the following commit, a new write_cruft_pack() caller will be added
which wants to write a cruft pack to an arbitrary location. Prepare for
this by adding a parameter which controls the destination of the cruft
pack.
For now, provide "packtmp" so that this commit does not change any
behavior.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
`builtin/repack.c`'s `write_cruft_pack()` is used to generate the cruft
pack when `--cruft` is supplied. It uses a static variable
"cruft_expiration" which is filled in by option parsing.
A future patch will add an `--expire-to` option which allows `git
repack` to write a cruft pack containing the pruned objects out to a
separate repository. In order to implement this functionality, some
callers will have to pass a value for `cruft_expiration` different than
the one filled out by option parsing.
Prepare for this by teaching `write_cruft_pack` to take a
"cruft_expiration" parameter, instead of reading a single static
variable.
The (sole) existing caller of `write_cruft_pack()` will pass the value
for "cruft_expiration" filled in by option parsing, retaining existing
behavior. This means that we can make the variable local to
`cmd_repack()`, and eliminate the static declaration.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
`builtin/repack.c`'s `prepare_pack_objects()` is used to prepare a set
of arguments to a `pack-objects` process which will generate a desired
pack.
A future patch will add an `--expire-to` option which allows `git
repack` to write a cruft pack containing the pruned objects out to a
separate repository. Prepare for this by teaching that function to write
packs to an arbitrary location specified by the caller.
All existing callers of `prepare_pack_objects()` will pass `packtmp` for
`out`, retaining the existing behavior.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
After we've successfully finished the repack, we call
remove_temporary_files(), which looks for and removes any files matching
".tmp-$$-pack-*", where $$ is the pid of the current process. But this
is pointless. If we make it this far in the process, we've already
renamed these tempfiles into place, and there is nothing left to delete.
Nor is there a point in trying to call it to clean up when we _aren't_
successful. It's not safe for using in a signal handler, and the
previous commit already handed that job over to the tempfile API.
It might seem like it would be useful to clean up stray .tmp files left
by other invocations of git-repack. But it won't clean those files; it
only matches ones with its pid, and leaves the rest. Fortunately, those
are cleaned up naturally by successive calls to git-repack; we'll
consider .tmp-*.pack the same as normal packfiles, so "repack -ad", etc,
will roll up their contents and eventually delete them.
The one case that could matter is if pack-objects generates an extension
we don't know about, like ".tmp-pack-$$-$hash.some-new-ext". The current
code will quietly delete such a file, while after this patch we'd leave
it in place. In practice this doesn't happen, and would be indicative of
a bug. Leaving the file as cruft is arguably a better behavior, as it
means somebody is more likely to eventually notice and fix the bug. If
we really wanted to be paranoid, we could scan for and warn about such
files, but that seems like overkill.
There's nothing to test with regard to the removal of this function. It
was doing nothing, so the behavior should be the same. However, we can
verify (and protect) our assumption that "repack -ad" will eventually
remove stray files by adding a test for that.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When git-repack exits due to a signal, it tries to clean up by calling
its remove_temporary_files() function, which walks through the packs dir
looking for ".tmp-$$-pack-*" files to delete (where "$$" is the pid of
the current process).
The biggest problem here is that remove_temporary_files() is not safe to
call in a signal handler. It uses opendir(), which isn't on the POSIX
async-signal-safe list. The details will be platform-specific, but a
likely issue is that it needs to allocate memory; if we receive a signal
while inside malloc(), etc, we'll conflict on the allocator lock and
deadlock with ourselves.
We can fix this by just cleaning up the files directly, without walking
the directory. We already know the complete list of .tmp-* files that
were generated, because we recorded them via populate_pack_exts(). When
we find files there, we can use register_tempfile() to record the
filenames. If we receive a signal, then the tempfile API will clean them
up for us, and it's async-safe and pretty battle-tested.
Note that this is slightly racier than the existing scheme. We don't
record the filenames until pack-objects tells us the hash over stdout.
So during the period between it generating the file and reporting the
hash, we'd fail to clean up. However, that period is very small. During
most of the pack generation process pack-objects is using its own
internal tempfiles. It's only at the very end that it moves them into
the names git-repack expects, and then it immediately reports the name
to us. Given that cleanup like this is best effort (after all, we may
get SIGKILL), this level of race is acceptable.
When we register the tempfiles, we'll record them locally and use the
result to call rename_tempfile(), rather than renaming by hand. This
isn't strictly necessary, as once we've renamed the files they're gone,
and the tempfile API's cleanup unlink() would simply become a pointless
noop. But managing the lifetimes of the tempfile objects is the cleanest
thing to do, and the tempfile pointers naturally fill the same role as
the old booleans.
This patch also fixes another small problem. We only hook signals, and
don't set up an atexit handler. So if we see an error that causes us to
die(), we'll leave the .tmp-* files in place. But since the tempfile API
handles this for us, this is now fixed for free. The new test covers
this by stimulating a failure of pack-objects when generating a cruft
pack. Before this patch, the .tmp-* file for the main pack would have
been left, but now we correctly clean it up.
Two small subtleties on the implementation:
- in the renaming loop, we can stop re-constructing fname_old; we only
use it when we have a tempfile to rename, so we can just ask the
tempfile for its path (which, barring bugs, should be identical)
- when renaming fails, our error message mentions fname_old. But since
a failed rename_tempfile() invalidates the tempfile struct, we'll
lose access to that string. Instead, let's mention the destination
filename, which is what most other callers do.
Reported-by: Jan Pokorný <poki@fnusa.cz>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If pack-objects tells us it generated pack $hash, we expect to find
.tmp-$$-pack-$hash.pack, .idx, .rev, and so on. Some of these files are
optional, but others are not. For the required ones, we'll bail with an
error if any of them is missing.
The error message is just "missing required file", which is a bit vague.
We should be more clear that it is not the user's fault, but rather that
the sub-pgoram we called is not operating as expected. In practice,
nobody should ever see this message, as it would generally only be
caused by a bug in Git.
It probably doesn't make sense to convert this to a BUG(), though, as
there are other (unlikely) possibilities, such as somebody else racily
deleting the files, filesystem errors causing stat() to fail, and so on.
A nice side effect here is that we stop relying on fname_old in this
code path, which will let us deal with it only in the first part of the
conditional.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
After generating the main pack and then any additional cruft packs, we
iterate over the "names" list (which contains hashes of packs generated
by pack-objects), and call populate_pack_exts() for each.
There's one small problem with this. In repack_promisor_objects(), we
may add entries to "names" and call populate_pack_exts() for them.
Calling it again is mostly just wasteful, as we'll stat() the filename
with each possible extension, get the same result, and just overwrite
our bits.
So we could drop the call there, and leave the final loop to populate
all of the bits. But instead, this patch does the reverse: drops the
final loop, and teaches the other two sites to populate the bits as they
add entries.
This makes the code easier to reason about, as you never have to worry
about when the util field is valid; it is always valid for each entry.
It also serves my ulterior purpose: recording the generated filenames as
soon as possible will make it easier for a future patch to use them for
cleaning up from a failed operation.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We keep a string_list "names" containing the hashes of packs generated
on our behalf by pack-objects. The util field of each item is treated as
a bitfield that tells us which extensions (.pack, .idx, .rev, etc) are
present for each name.
Let's switch this to allocating a real array. That will give us room in
a future patch to store more data than just a single bit per extension.
And it makes the code a little easier to read, as we avoid casting back
and forth between uintptr_t and a void pointer.
Since the only thing we're storing is an array, we could just allocate
it directly. But instead I've put it into a named struct here. That
further increases readability around the casts, and in particular helps
differentiate us from other string_lists in the same file which use
their util field differently. E.g., the existing_*_packs lists still do
bit-twiddling, but their bits have different meaning than the ones in
"names". This makes it hard to grep around the code to see how the util
fields are used; now you can look for "generated_pack_data".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
`git repack` supports a `--pack-kept-objects` flag which more or less
translates to whether or not we pass `--honor-pack-keep` down to `git
pack-objects` when assembling a new pack.
This behavior has existed since ee34a2bead (repack: add
`repack.packKeptObjects` config var, 2014-03-03). In that commit, the
documentation was extended to say:
[...] Note that we still do not delete `.keep` packs after
`pack-objects` finishes.
Unfortunately, this is not the case when `--pack-kept-objects` is
combined with a `--geometric` repack. When doing a geometric repack, we
include `.keep` packs when enumerating available packs only when
`pack_kept_objects` is set.
So this all works fine when `--no-pack-kept-objects` (or similar) is
given. Kept packs are excluded from the geometric roll-up, so when we go
to delete redundant packs (with `-d`), no `.keep` packs appear "below
the split" in our geometric progression.
But when `--pack-kept-objects` is given, things can go awry. Namely,
when a kept pack is included in the list of packs tracked by the
`pack_geometry` struct *and* part of the pack roll-up, we will delete
the `.keep` pack when we shouldn't.
Note that this *doesn't* result in object corruption, since the `.keep`
pack's objects are still present in the new pack. But the `.keep` pack
itself is removed, which violates our promise from back in ee34a2bead.
But there's more. Because `repack` computes the geometric roll-up
independently from selecting which packs belong in a MIDX (with
`--write-midx`), this can lead to odd behavior. Consider when a `.keep`
pack appears below the geometric split (ie., its objects will be part of
the new pack we generate).
We'll write a MIDX containing the new pack along with the existing
`.keep` pack. But because the `.keep` pack appears below the geometric
split line, we'll (incorrectly) try to remove it. While this doesn't
corrupt the repository, it does cause us to remove the MIDX we just
wrote, since removing that pack would invalidate the new MIDX.
Funny enough, this behavior became far less noticeable after e4d0c11c04
(repack: respect kept objects with '--write-midx -b', 2021-12-20), which
made `pack_kept_objects` be enabled by default only when we were writing
a non-MIDX bitmap.
But e4d0c11c04 didn't resolve this bug, it just made it harder to notice
unless callers explicitly passed `--pack-kept-objects`.
The solution is to avoid trying to remove `.keep` packs during
`--geometric` repacks, even when they appear below the geometric split
line, which is the approach this patch implements.
Co-authored-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When we write a MIDX bitmap after repacking, it is possible that the
repository would be left in a state with both pack- and multi-pack
reachability bitmaps.
This can occur, for instance, if a pack that was kept (either by having
a .keep file, or during a geometric repack in which it is not rolled up)
has a bitmap file, and the repack wrote a multi-pack index and bitmap.
When loading a reachability bitmap for the repository, the multi-pack
one is always preferred, so the pack-based one is redundant. Let's
remove it unconditionally, even if '-d' isn't passed, since there is no
practical reason to keep both around. The patch below does just that.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
As reported in [1] the "UNUSED(var)" macro introduced in
2174b8c75de (Merge branch 'jk/unused-annotation' into next,
2022-08-24) breaks coccinelle's parsing of our sources in files where
it occurs.
Let's instead partially go with the approach suggested in [2] of
making this not take an argument. As noted in [1] "coccinelle" will
ignore such tokens in argument lists that it doesn't know about, and
it's less of a surprise to syntax highlighters.
This undoes the "help us notice when a parameter marked as unused is
actually use" part of 9b240347543 (git-compat-util: add UNUSED macro,
2022-08-19), a subsequent commit will further tweak the macro to
implement a replacement for that functionality.
1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Functions used with for_each_ref(), etc, need to conform to the
each_ref_fn interface. But most of them don't need every parameter;
let's annotate the unused ones to quiet -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Generalize the newly added "unused.cocci" rule to find more than just
"struct strbuf", let's have it find the same unused patterns for
"struct string_list", as well as other code that uses
similar-looking *_{release,clear,free}() and {release,clear,free}_*()
functions.
We're intentionally loose in accepting e.g. a "strbuf_init(&sb)"
followed by a "string_list_clear(&sb, 0)". It's assumed that the
compiler will catch any such invalid code, i.e. that our
constructors/destructors don't take a "void *".
See [1] for example of code that would be covered by the
"get_worktrees()" part of this rule. We'd still need work that the
series is based on (we were passing "worktrees" to a function), but
could now do the change in [1] automatically.
1. https://lore.kernel.org/git/Yq6eJFUPPTv%2Fzc0o@coredump.intra.peff.net/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Some config variables are combinations of multiple words, and we
typically write them in camelCase forms in manpage and translatable
strings. It's not easy to find mismatches for these camelCase config
variables during code reviews, but occasionally they are identified
during localization translations.
To check for mismatched config variables, I introduced a new feature
in the helper program for localization[^1]. The following mismatched
config variables have been identified by running the helper program,
such as "git-po-helper check-pot".
Lowercase in manpage should use camelCase:
* Documentation/config/http.txt: http.pinnedpubkey
Lowercase in translable strings should use camelCase:
* builtin/fast-import.c: pack.indexversion
* builtin/gc.c: gc.logexpiry
* builtin/index-pack.c: pack.indexversion
* builtin/pack-objects.c: pack.indexversion
* builtin/repack.c: pack.writebitmaps
* commit.c: i18n.commitencoding
* gpg-interface.c: user.signingkey
* http.c: http.postbuffer
* submodule-config.c: submodule.fetchjobs
Mismatched camelCases, choose the former:
* Documentation/config/transfer.txt: transfer.credentialsInUrl
remote.c: transfer.credentialsInURL
[^1]: https://github.com/git-l10n/git-po-helper
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A mechanism to pack unreachable objects into a "cruft pack",
instead of ejecting them into loose form to be reclaimed later, has
been introduced.
* tb/cruft-packs:
sha1-file.c: don't freshen cruft packs
builtin/gc.c: conditionally avoid pruning objects via loose
builtin/repack.c: add cruft packs to MIDX during geometric repack
builtin/repack.c: use named flags for existing_packs
builtin/repack.c: allow configuring cruft pack generation
builtin/repack.c: support generating a cruft pack
builtin/pack-objects.c: --cruft with expiration
reachable: report precise timestamps from objects in cruft packs
reachable: add options to add_unseen_recent_objects_to_traversal
builtin/pack-objects.c: --cruft without expiration
builtin/pack-objects.c: return from create_object_entry()
t/helper: add 'pack-mtimes' test-tool
pack-mtimes: support writing pack .mtimes files
chunk-format.h: extract oid_version()
pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles'
pack-mtimes: support reading .mtimes files
Documentation/technical: add cruft-packs.txt
|
|
When using cruft packs, the following race can occur when a geometric
repack that writes a MIDX bitmap takes place afterwords:
- First, create an unreachable object and do an all-into-one cruft
repack which stores that object in the repository's cruft pack.
- Then make that object reachable.
- Finally, do a geometric repack and write a MIDX bitmap.
Assuming that we are sufficiently unlucky as to select a commit from the
MIDX which reaches that object for bitmapping, then the `git
multi-pack-index` process will complain that that object is missing.
The reason is because we don't include cruft packs in the MIDX when
doing a geometric repack. Since the "make that object reachable" doesn't
necessarily mean that we'll create a new copy of that object in one of
the packs that will get rolled up as part of a geometric repack, it's
possible that the MIDX won't see any copies of that now-reachable
object.
Of course, it's desirable to avoid including cruft packs in the MIDX
because it causes the MIDX to store a bunch of objects which are likely
to get thrown away. But excluding that pack does open us up to the above
race.
This patch demonstrates the bug, and resolves it by including cruft
packs in the MIDX even when doing a geometric repack.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We use the `util` pointer for items in the `existing_packs` string list
to indicate which packs are going to be deleted. Since that has so far
been the only use of that `util` pointer, we just set it to 0 or 1.
But we're going to add an additional state to this field in the next
patch, so prepare for that by adding a #define for the first bit so we
can more expressively inspect the flags state.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In servers which set the pack.window configuration to a large value, we
can wind up spending quite a lot of time finding new bases when breaking
delta chains between reachable and unreachable objects while generating
a cruft pack.
Introduce a handful of `repack.cruft*` configuration variables to
control the parameters used by pack-objects when generating a cruft
pack.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Expose a way to split the contents of a repository into a main and cruft
pack when doing an all-into-one repack with `git repack --cruft -d`, and
a complementary configuration variable.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
To store the individual mtimes of objects in a cruft pack, introduce a
new `.mtimes` format that can optionally accompany a single pack in the
repository.
The format is defined in Documentation/technical/pack-format.txt, and
stores a 4-byte network order timestamp for each object in name (index)
order.
This patch prepares for cruft packs by defining the `.mtimes` format,
and introducing a basic API that callers can use to read out individual
mtimes.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The previous patch demonstrates a scenario where the list of packs
written by `pack-objects` (and stored in the `names` string_list) is
out-of-order, and can thus cause us to delete packs we shouldn't.
This patch resolves that bug by ensuring that `names` is sorted in all
cases, not just when
delete_redundant && pack_everything & ALL_INTO_ONE
is true.
Because we did sort `names` in that case (which, prior to `--geometric`
repacks, was the only time we would actually delete packs, this is only
a bug for `--geometric` repacks.
It would be sufficient to only sort `names` when `delete_redundant` is
set to a non-zero value. But sorting a small list of strings is cheap,
and it is defensive against future calls to `string_list_has_string()`
on this list.
Co-discovered-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Update 'repack' to ignore packs named on the command line with the
'--keep-pack' option. Specifically, modify 'init_pack_geometry()' to treat
command line-kept packs the same way it treats packs with an on-disk '.keep'
file (that is, skip the pack and do not include it in the 'geometry'
structure).
Without this handling, a '--keep-pack' pack would be included in the
'geometry' structure. If the pack is *before* the geometry split line (with
at least one other pack and/or loose objects present), 'repack' assumes the
pack's contents are "rolled up" into another pack via 'pack-objects'.
However, because the internally-invoked 'pack-objects' properly excludes
'--keep-pack' objects, any new pack it creates will not contain the kept
objects. Finally, 'repack' deletes the '--keep-pack' as "redundant" (since
it assumes 'pack-objects' created a new pack with its contents), resulting
in possible object loss and repository corruption.
Add a test ensuring that '--keep-pack' packs are now appropriately handled.
Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
By default, git-repack(1) will update server info that is required by
the dumb HTTP transport. This can be skipped by passing the `-n` flag,
but what we're noticably missing is a config option to permanently
disable updating this information.
Add a new option "repack.updateServerInfo" which can be used to disable
the logic. Most hosting providers have turned off the dumb HTTP protocol
anyway, and on the client-side it woudln't typically be useful either.
Giving a persistent way to disable this feature thus makes quite some
sense to avoid wasting compute cycles and storage.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
By default, git-repack(1) runs `update_server_info()` to generate info
required for the dumb HTTP protocol. This can be disabled via the `-n`
flag, which then sets the `no_update_server_info` flag. Further down the
code this leads to some double-negation logic, which is about to become
more confusing as we're about to add a new config which allows the user
to permanently disable generation of the info.
Refactor the code to avoid the double-negation and add some tests which
verify that the flag continues to work as expected.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Similar message templates have been consolidated so that
translators need to work on fewer number of messages.
* ja/i18n-similar-messages:
i18n: turn even more messages into "cannot be used together" ones
i18n: ref-filter: factorize "%(foo) atom used without %(bar) atom"
i18n: factorize "--foo outside a repository"
i18n: refactor "unrecognized %(foo) argument" strings
i18n: factorize "no directory given for --foo"
i18n: factorize "--foo requires --bar" and the like
i18n: tag.c factorize i18n strings
i18n: standardize "cannot open" and "cannot read"
i18n: turn "options are incompatible" into "cannot be used together"
i18n: refactor "%s, %s and %s are mutually exclusive"
i18n: refactor "foo and bar are mutually exclusive"
|
|
Two fixes around "git repack".
* ds/repack-fixlets:
repack: make '--quiet' disable progress
repack: respect kept objects with '--write-midx -b'
|
|
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
While testing some ideas in 'git repack', I ran it with '--quiet' and
discovered that some progress output was still shown. Specifically, the
output for writing the multi-pack-index showed the progress.
The 'show_progress' variable in cmd_repack() is initialized with
isatty(2) and is not modified at all by the '--quiet' flag. The
'--quiet' flag modifies the po_args.quiet option which is translated
into a '--quiet' flag for the 'git pack-objects' child process. However,
'show_progress' is used to directly send progress information to the
multi-pack-index writing logic which does not use a child process.
The fix here is to modify 'show_progress' to be false if po_opts.quiet
is true, and isatty(2) otherwise. This new expectation simplifies a
later condition that checks both.
Update the documentation to make it clear that '-q' will disable all
progress in addition to ensuring the 'git pack-objects' child process
will receive the flag.
Use 'test_terminal' to check that this works to get around the isatty(2)
check.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Historically, we needed a single packfile in order to have reachability
bitmaps. This introduced logic that when 'git repack' had a '-b' option
that we should stop sending the '--honor-pack-keep' option to the 'git
pack-objects' child process, ensuring that we create a packfile
containing all reachable objects.
In the world of multi-pack-index bitmaps, we no longer need to repack
all objects into a single pack to have valid bitmaps. Thus, we should
continue sending the '--honor-pack-keep' flag to 'git pack-objects'.
The fix is very simple: only disable the flag when writing bitmaps but
also _not_ writing the multi-pack-index.
This opens the door to new repacking strategies that might want to keep
some historical set of objects in a stable pack-file while only
repacking more recent objects.
To test, create a new 'test_subcommand_inexact' helper that is more
flexible than 'test_subcommand'. This allows us to look for the
--honor-pack-keep flag without over-indexing on the exact set of
arguments.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
On platforms where ulong is shorter than size_t, code paths that
shifted 1 or 1U to the left lacked the necessary cast to size_t,
which have been corrected.
* po/size-t-for-vs:
object-file.c: LLP64 compatibility, upcast unity for left shift
diffcore-delta.c: LLP64 compatibility, upcast unity for left shift
repack.c: LLP64 compatibility, upcast unity for left shift
|
|
Visual Studio reports C4334 "was 64-bit shift intended" warning
because of size mismatch.
Promote unity to the matching type to fit with the `&` operator.
Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
`git repack` invokes a handful of child processes: one to write the
actual pack, and optionally ones to repack promisor objects and update
the MIDX.
Most of these are freed automatically by calling `start_command()` (which
invokes it on error) and `finish_command()` which calls it
automatically.
But repack_promisor_objects() can initialize a child_process, populate
its array of arguments, and then return from the function before even
calling start_command().
Make sure that the prepared list of arguments is freed by calling
child_process_clear() ourselves to avoid leaking memory along this path.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|