| Age | Commit message (Collapse) | Author | Files | Lines |
|
Docfix.
* ja/doc-placeholders-fix:
doc: enforce placeholders in documentation
doc: enforce dashes in placeholders
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The next commit will remove a bunch of unnecessary includes, but to do
so, we need some of the lower level direct includes that files rely on
to be explicitly specified.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Each of these were checked with
gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).
...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file. These cases were:
* builtin/credential-cache.c
* builtin/pull.c
* builtin/send-pack.c
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The CodingGuidelines documents stipulates that multi-word placeholders
are to be separated by dashes, not underscores nor spaces.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Leakfix.
* jk/commit-graph-leak-fixes:
commit-graph: clear oidset after finishing write
commit-graph: free write-context base_graph_name during cleanup
commit-graph: free write-context entries before overwriting
commit-graph: free graph struct that was not added to chain
commit-graph: delay base_graph assignment in add_graph_to_chain()
commit-graph: free all elements of graph chain
commit-graph: move slab-clearing to close_commit_graph()
merge: free result of repo_get_merge_bases()
commit-reach: free temporary list in get_octopus_merge_bases()
t6700: mark test as leak-free
|
|
In graph_write() we store commits in an oidset, but never clean it up,
leaking the contents. We should clear it in the cleanup section.
The oidset comes from 6830c36077 (commit-graph.h: replace 'commit_hex'
with 'commits', 2020-04-13), but it was just replacing a string_list
that was also leaked. Curiously, we fixed the leak of some adjacent
variables in commit fa8953cb40 (builtin/commit-graph.c: extract
'read_one_commit()', 2020-05-18), but the oidset wasn't included for
some reason.
In combination with the preceding commits, this lets us mark t5324 as
leak-free.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The load_commit_graph_chain_fd_st() function will stop loading chains
when it sees an error. But if it has loaded any graph slice at all, it
will return it. This is a good thing for normal use (we use what data we
can, and this is just an optimization). But it's a bad thing for
"commit-graph verify", which should be careful about finding any
irregularities. We do complain to stderr with a warning(), but the
verify command still exits with a successful return code.
The new tests here cover corruption of both the base and tip slices of
the chain. The corruption of the base file already works (it is the
first file we look at, so when we see the error we return NULL). The
"tip" case is what is fixed by this patch (it complains to stderr but
still returns the base slice).
Likewise the existing tests for corruption of the commit-graph-chain
file itself need to be updated. We already exited non-zero correctly for
the "base" case, but the "tip" case can now do so, too.
Note that this also causes us to adjust a test later in the file that
similarly corrupts a tip (though confusingly the test script calls this
"base"). It checks stderr but erroneously expects the whole "verify"
command to exit with a successful code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Because it's OK to not have a graph file at all, the graph_verify()
function needs to tell the difference between a missing file and a real
error. So when loading a traditional graph file, we call
open_commit_graph() separately from load_commit_graph_chain_fd_st(), and
don't complain if the first one fails with ENOENT.
When the function learned about chain files in 3da4b609bb (commit-graph:
verify chains with --shallow mode, 2019-06-18), we couldn't be as
careful, since the only way to load a chain was with
read_commit_graph_one(), which did both the open/load as a single unit.
So we'll miss errors in chain files we load, thinking instead that there
was just no chain file at all.
Note that we do still report some of these problems to stderr, as the
loading function calls error() and warning(). But we'd exit with a
successful exit code, which is wrong.
We can fix that by using the recently split open/load functions for
chains. That lets us treat the chain file just like a single file with
respect to error handling here.
An existing test (from 3da4b609bb) shows off the problem; we were
expecting "commit-graph verify" to report success, but that makes no
sense. We did not even verify the contents of the graph data, because we
couldn't load it! I don't think this was an intentional exception, but
rather just the test covering what happened to occur.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Reduce reliance on a global state in the config reading API.
* gc/config-context:
config: pass source to config_parser_event_fn_t
config: add kvi.path, use it to evaluate includes
config.c: remove config_reader from configsets
config: pass kvi to die_bad_number()
trace2: plumb config kvi
config.c: pass ctx with CLI config
config: pass ctx with config files
config.c: pass ctx in configsets
config: add ctx arg to config_fn_t
urlmatch.h: use config_fn_t type
config: inline git_color_default_config
|
|
Header files cleanup.
* en/header-split-cache-h-part-3: (28 commits)
fsmonitor-ll.h: split this header out of fsmonitor.h
hash-ll, hashmap: move oidhash() to hash-ll
object-store-ll.h: split this header out of object-store.h
khash: name the structs that khash declares
merge-ll: rename from ll-merge
git-compat-util.h: remove unneccessary include of wildmatch.h
builtin.h: remove unneccessary includes
list-objects-filter-options.h: remove unneccessary include
diff.h: remove unnecessary include of oidset.h
repository: remove unnecessary include of path.h
log-tree: replace include of revision.h with simple forward declaration
cache.h: remove this no-longer-used header
read-cache*.h: move declarations for read-cache.c functions from cache.h
repository.h: move declaration of the_index from cache.h
merge.h: move declarations for merge.c from cache.h
diff.h: move declaration for global in diff.c from cache.h
preload-index.h: move declarations for preload-index.c from elsewhere
sparse-index.h: move declarations for sparse-index.c from cache.h
name-hash.h: move declarations for name-hash.c from cache.h
run-command.h: move declarations for run-command.c from cache.h
...
|
|
Plumb "struct key_value_info" through all code paths that end in
die_bad_number(), which lets us remove the helper functions that read
analogous values from "struct config_reader". As a result, nothing reads
config_reader.config_kvi any more, so remove that too.
In config.c, this requires changing the signature of
git_configset_get_value() to 'return' "kvi" in an out parameter so that
git_configset_get_<type>() can pass it to git_config_<type>(). Only
numeric types will use "kvi", so for non-numeric types (e.g.
git_configset_get_string()), pass NULL to indicate that the out
parameter isn't needed.
Outside of config.c, config callbacks now need to pass "ctx->kvi" to any
of the git_config_<type>() functions that parse a config string into a
number type. Included is a .cocci patch to make that refactor.
The only exceptional case is builtin/config.c, where git_config_<type>()
is called outside of a config callback (namely, on user-provided input),
so config source information has never been available. In this case,
die_bad_number() defaults to a generic, but perfectly descriptive
message. Let's provide a safe, non-NULL for "kvi" anyway, but make sure
not to change the message.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a new "const struct config_context *ctx" arg to config_fn_t to hold
additional information about the config iteration operation.
config_context has a "struct key_value_info kvi" member that holds
metadata about the config source being read (e.g. what kind of config
source it is, the filename, etc). In this series, we're only interested
in .kvi, so we could have just used "struct key_value_info" as an arg,
but config_context makes it possible to add/adjust members in the future
without changing the config_fn_t signature. We could also consider other
ways of organizing the args (e.g. moving the config name and value into
config_context or key_value_info), but in my experiments, the
incremental benefit doesn't justify the added complexity (e.g. a
config_fn_t will sometimes invoke another config_fn_t but with a
different config value).
In subsequent commits, the .kvi member will replace the global "struct
config_reader" in config.c, making config iteration a global-free
operation. It requires much more work for the machinery to provide
meaningful values of .kvi, so for now, merely change the signature and
call sites, pass NULL as a placeholder value, and don't rely on the arg
in any meaningful way.
Most of the changes are performed by
contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every
config_fn_t:
- Modifies the signature to accept "const struct config_context *ctx"
- Passes "ctx" to any inner config_fn_t, if needed
- Adds UNUSED attributes to "ctx", if needed
Most config_fn_t instances are easily identified by seeing if they are
called by the various config functions. Most of the remaining ones are
manually named in the .cocci patch. Manual cleanups are still needed,
but the majority of it is trivial; it's either adjusting config_fn_t
that the .cocci patch didn't catch, or adding forward declarations of
"struct config_context ctx" to make the signatures make sense.
The non-trivial changes are in cases where we are invoking a config_fn_t
outside of config machinery, and we now need to decide what value of
"ctx" to pass. These cases are:
- trace2/tr2_cfg.c:tr2_cfg_set_fl()
This is indirectly called by git_config_set() so that the trace2
machinery can notice the new config values and update its settings
using the tr2 config parsing function, i.e. tr2_cfg_cb().
- builtin/checkout.c:checkout_main()
This calls git_xmerge_config() as a shorthand for parsing a CLI arg.
This might be worth refactoring away in the future, since
git_xmerge_config() can call git_default_config(), which can do much
more than just parsing.
Handle them by creating a KVI_INIT macro that initializes "struct
key_value_info" to a reasonable default, and use that to construct the
"ctx" arg.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The vast majority of files including object-store.h did not need dir.h
nor khash.h. Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.
After this patch:
$ git grep -h include..object-store | sort | uniq -c
2 #include "object-store.h"
129 #include "object-store-ll.h"
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This also made it clear that a few .c files under builtin/ were
depending upon some headers but had forgotten to #include them. Add the
missing direct includes while at it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Several builtins depend on being able to disable the replace references
so we actually operate on each object individually. These currently do
so by directly mutating the 'read_replace_refs' global.
A future change will move this global into a different place, so it will
be necessary to change all of these lines. However, we can simplify that
transition by abstracting the purpose of these global assignments with a
method call.
We will need to keep this read_replace_refs global forever, as we want
to make sure that we never use replace refs throughout the life of the
process if this method is called. Future changes may present a
repository-scoped version of the variable to represent that repository's
core.useReplaceRefs config value, but a zero-valued read_replace_refs
will always override such a setting.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Dozens of files made use of trace and trace2 functions, without
explicitly including trace.h or trace2.h. This made it more difficult
to find which files could remove a dependence on cache.h. Make C files
explicitly include trace.h or trace2.h if they are using them.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Dozens of files made use of gettext functions, without explicitly
including gettext.h. This made it more difficult to find which files
could remove a dependence on cache.h. Make C files explicitly include
gettext.h if they are using it.
However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Adjust several files to be more explicit about their dependency on
replace-objects to accommodate this change.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When the parse_options_concat() was added to this file in
84e4484f128 (commit-graph: use parse_options_concat(), 2021-08-23) we
wouldn't free() it if we returned early in these cases.
Since "result" is 0 by default we can "goto cleanup" in both cases,
and only need to set "result" if write_commit_graph_reachable() fails.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 0bfb48e6723 (builtin/commit-graph.c: UNLEAK variables, 2018-10-03)
this was made to UNLEAK(), but we can just as easily invoke the
free_commit_graph() function added in c3756d5b7fc (commit-graph: add
free_commit_graph, 2018-07-11) instead.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The short-help text shown by "git cmd -h" and the synopsis text
shown at the beginning of "git help cmd" have been made more
consistent.
* ab/doc-synopsis-and-cmd-usage: (34 commits)
tests: assert consistent whitespace in -h output
tests: start asserting that *.txt SYNOPSIS matches -h output
doc txt & -h consistency: make "worktree" consistent
worktree: define subcommand -h in terms of command -h
reflog doc: list real subcommands up-front
doc txt & -h consistency: make "commit" consistent
doc txt & -h consistency: make "diff-tree" consistent
doc txt & -h consistency: use "[<label>...]" for "zero or more"
doc txt & -h consistency: make "annotate" consistent
doc txt & -h consistency: make "stash" consistent
doc txt & -h consistency: add missing options
doc txt & -h consistency: use "git foo" form, not "git-foo"
doc txt & -h consistency: make "bundle" consistent
doc txt & -h consistency: make "read-tree" consistent
doc txt & -h consistency: make "rerere" consistent
doc txt & -h consistency: add missing options and labels
doc txt & -h consistency: make output order consistent
doc txt & -h consistency: add or fix optional "--" syntax
doc txt & -h consistency: fix mismatching labels
doc SYNOPSIS & -h: use "-" to separate words in labels, not "_"
...
|
|
Fix various inconsistencies between command SYNOPSIS and the
corresponding -h output where our translatable labels didn't match
up.
In some cases we need to adjust the prose that follows the SYNOPSIS
accordingly, as it refers back to the changed label.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The whitespace padding of alternatives should be of the form "[-f |
--force]" not "[-f|--force]". Likewise we should not have padding
before the first option, so "(--all | <pack-filename>...)" is correct,
not "( --all | <pack-filename>... )".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change the documentation and -h output for those built-in commands
where both the -h output and *.txt were lacking in word-wrapping.
There are many more built-ins that could use this treatment, this
change is narrowed to those where this whitespace change is needed to
make the -h and *.txt consistent in the end.
In the case of "Documentation/git-hash-object.txt" and
"builtin/hash-object.c" this is not a "doc txt & -h consistency"
change, as we're changing both versions, doing so here makes a
subsequent change smaller.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Undoes 'jk/unused-annotation' topic and redoes it to work around
Coccinelle rules misfiring false positives in unrelated codepaths.
* ab/unused-annotation:
git-compat-util.h: use "deprecated" for UNUSED variables
git-compat-util.h: use "UNUSED", not "UNUSED(var)"
|
|
Annotate function parameters that are not used (but cannot be
removed for structural reasons), to prepare us to later compile
with -Wunused warning turned on.
* jk/unused-annotation:
is_path_owned_by_current_uid(): mark "report" parameter as unused
run-command: mark unused async callback parameters
mark unused read_tree_recursive() callback parameters
hashmap: mark unused callback parameters
config: mark unused callback parameters
streaming: mark unused virtual method parameters
transport: mark bundle transport_options as unused
refs: mark unused virtual method parameters
refs: mark unused reflog callback parameters
refs: mark unused each_ref_fn parameters
git-compat-util: add UNUSED macro
|
|
As reported in [1] the "UNUSED(var)" macro introduced in
2174b8c75de (Merge branch 'jk/unused-annotation' into next,
2022-08-24) breaks coccinelle's parsing of our sources in files where
it occurs.
Let's instead partially go with the approach suggested in [2] of
making this not take an argument. As noted in [1] "coccinelle" will
ignore such tokens in argument lists that it doesn't know about, and
it's less of a surprise to syntax highlighters.
This undoes the "help us notice when a parameter marked as unused is
actually use" part of 9b240347543 (git-compat-util: add UNUSED macro,
2022-08-19), a subsequent commit will further tweak the macro to
implement a replacement for that functionality.
1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Recent commits such as bf0a6b65fc (builtin/multi-pack-index.c: let
parse-options parse subcommands, 2022-08-19) converted a few functions
to match our usual argc/argv/prefix conventions, but the prefix argument
remains unused.
However, there is a good use for it: they should pass it to their own
parse_options() functions, where it may be used to adjust the value of
any filename options. In all but one of these functions, there's no
behavior change, since they don't use OPT_FILENAME. But this is an
actual fix for one option, which you can see by modifying the test suite
like so:
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 4fe57414c1..d0974d4371 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -186,7 +186,11 @@ test_expect_success 'writing a bitmap with --refs-snapshot' '
# Then again, but with a refs snapshot which only sees
# refs/tags/one.
- git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
+ (
+ mkdir subdir &&
+ cd subdir &&
+ git multi-pack-index write --bitmap --refs-snapshot=../snapshot
+ ) &&
test_path_is_file $midx &&
test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
I'd emphasize that this wasn't broken by bf0a6b65fc; it has been broken
all along, because the sub-function never got to see the prefix. It is
that commit which is actually enabling us to fix it (and which also
brought attention to the problem because it triggers -Wunused-parameter!)
The other functions changed here don't use OPT_FILENAME at all. In their
cases this isn't fixing anything visible, but it's following the usual
pattern and future-proofing them against somebody adding new options and
being surprised.
I didn't include a test for the one visible case above. We don't
generally test routine parse-options behavior for individual options.
The challenge here was finding the problem, and now that this has been
done, it's not likely to regress. Likewise, we could apply the patch
above to cover it "for free" but it makes reading the rest of the test
unnecessarily complicated.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The callback passed to git_config() must conform to a particular
interface. But most callbacks don't actually look at the extra "void
*data" parameter. Let's mark the unused parameters to make
-Wunused-parameter happy.
Note there's one unusual case here in get_remote_default() where we
actually ignore the "value" parameter. That's because it's only checking
whether the option is found at all, and not parsing its value.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
'git commit-graph' parses its subcommands with an if-else if
statement. parse-options has just learned to parse subcommands, so
let's use that facility instead, with the benefits of shorter code,
handling missing or unknown subcommands, and listing subcommands for
Bash completion.
Note that the functions implementing each subcommand only accept the
'argc' and '**argv' parameters, so add a (unused) '*prefix' parameter
to make them match the type expected by parse-options, and thus avoid
casting function pointers.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When this code was migrated to the string_list API in
d88b14b3fd6 (commit-graph: use string-list API for input, 2018-06-27)
it was made to use used both STRING_LIST_INIT_NODUP and a
strbuf_detach() pattern.
Those should not be used together if string_list_clear() is expected
to free the memory, instead we need to either use STRING_LIST_INIT_DUP
with a string_list_append_nodup(), or a STRING_LIST_INIT_NODUP and
manually fiddle with the "strdup_strings" member before calling
string_list_clear(). Let's do the former.
Since "strdup_strings = 1" is set now other code might be broken by
relying on "pack_indexes" not to duplicate it strings, but that
doesn't happen. When we pass this down to write_commit_graph() that
code uses the "struct string_list" without modifying it. Let's add a
"const" to the variable to have the compiler enforce that assumption.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Teach "git commit-graph" command not to allow using replace objects
at all, as we do not use the commit-graph at runtime when we see
object replacement.
* ab/ignore-replace-while-working-on-commit-graph:
commit-graph: don't consider "replace" objects with "verify"
commit-graph tests: fix another graph_git_two_modes() helper
commit-graph tests: fix error-hiding graph_git_two_modes() helper
|
|
Random changes to parse-options implementation.
* ab/parse-options-cleanup:
parse-options: change OPT_{SHORT,UNSET} to an enum
parse-options tests: test optname() output
parse-options.[ch]: make opt{bug,name}() "static"
commit-graph: stop using optname()
parse-options.c: move optname() earlier in the file
parse-options.h: make the "flags" in "struct option" an enum
parse-options.c: use exhaustive "case" arms for "enum parse_opt_result"
parse-options.[ch]: consistently use "enum parse_opt_result"
parse-options.[ch]: consistently use "enum parse_opt_flags"
parse-options.h: move PARSE_OPT_SHELL_EVAL between enums
|
|
Extend the code added in d6538246d3d (commit-graph: not compatible
with replace objects, 2018-08-20) which ignored replace objects in the
"write" command to ignore it in the "verify" command too.
We can just move this assignment to the cmd_commit_graph(), it
dispatches to "write" and "verify", and we're unlikely to ever get a
sub-command that would like to consider replace refs.
This will make tests added in eddc1f556cd (mktag tests: test
update-ref and reachable fsck, 2021-06-17) pass in combination with
the "GIT_TEST_COMMIT_GRAPH" mode added in 859fdc0c3cf (commit-graph:
define GIT_TEST_COMMIT_GRAPH, 2018-08-29), except that mode is
currently broken (but is being fixed concurrently). See the discussion
starting at [1].
1. https://lore.kernel.org/git/87wnmihswp.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Stop using optname() in builtin/commit-graph.c to emit an error with
the --max-new-filters option. This changes code added in 809e0327f57
(builtin/commit-graph.c: introduce '--max-new-filters=<n>',
2020-09-18).
See 9440b831ad5 (parse-options: replace opterror() with optname(),
2018-11-10) for why using optname() like this is considered bad,
i.e. it's assembling human-readable output piecemeal, and the "option
`X'" at the start can't be translated.
It didn't matter in this case, but this code was also buggy in its use
of "opt->flags" to optname(), that function expects flags, but not
*those* flags.
Let's pass "max-new-filters" to the new error because the option name
isn't translatable, and because we can re-use a translation added in
f7e68a08780 (parse-options: check empty value in OPT_INTEGER and
OPT_ABBREV, 2019-05-29).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Regression in "git commit-graph" command line parsing has been
corrected.
* tb/commit-graph-usage-fix:
builtin/multi-pack-index.c: disable top-level --[no-]progress
builtin/commit-graph.c: don't accept common --[no-]progress
|
|
The reachability bitmap file used to be generated only for a single
pack, but now we've learned to generate bitmaps for history that
span across multiple packfiles.
* tb/multi-pack-bitmaps: (29 commits)
pack-bitmap: drop bitmap_index argument from try_partial_reuse()
pack-bitmap: drop repository argument from prepare_midx_bitmap_git()
p5326: perf tests for MIDX bitmaps
p5310: extract full and partial bitmap tests
midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
t7700: update to work with MIDX bitmap test knob
t5319: don't write MIDX bitmaps in t5319
t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
t5326: test multi-pack bitmap behavior
t/helper/test-read-midx.c: add --checksum mode
t5310: move some tests to lib-bitmap.sh
pack-bitmap: write multi-pack bitmaps
pack-bitmap: read multi-pack bitmaps
pack-bitmap.c: avoid redundant calls to try_partial_reuse
pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
pack-bitmap.c: introduce 'bitmap_num_objects()'
midx: avoid opening multiple MIDXs when writing
midx: close linked MIDXs, avoid leaking memory
...
|
|
In 84e4484f12 (commit-graph: use parse_options_concat(), 2021-08-23) we
unified common options of commit-graph's subcommands into a single
"common_opts" array.
But 84e4484f12 introduced a behavior change which is to accept the
"--[no-]progress" option before any sub-commands, e.g.,
git commit-graph --progress write ...
Prior to that commit, the above would error out with "unknown option".
There are two issues with this behavior change. First is that the
top-level --[no-]progress is not always respected. This is because
isatty(2) is performed in the sub-commands, which unconditionally
overwrites any --[no-]progress that was given at the top-level.
But the second issue is that the existing sub-commands of commit-graph
only happen to both have a sensible interpretation of what `--progress`
or `--no-progress` means. If we ever added a sub-command which didn't
have a notion of progress, we would be forced to ignore the top-level
`--[no-]progress` altogether.
Since we haven't released a version of Git that supports --[no-]progress
as a top-level option for `git commit-graph`, let's remove it.
Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Opening multiple instance of the same MIDX can lead to problems like two
separate packed_git structures which represent the same pack being added
to the repository's object store.
The above scenario can happen because prepare_midx_pack() checks if
`m->packs[pack_int_id]` is NULL in order to determine if a pack has been
opened and installed in the repository before. But a caller can
construct two copies of the same MIDX by calling get_multi_pack_index()
and load_multi_pack_index() since the former manipulates the
object store directly but the latter is a lower-level routine which
allocates a new MIDX for each call.
So if prepare_midx_pack() is called on multiple MIDXs with the same
pack_int_id, then that pack will be installed twice in the object
store's packed_git pointer.
This can lead to problems in, for e.g., the pack-bitmap code, which does
something like the following (in pack-bitmap.c:open_pack_bitmap()):
struct bitmap_index *bitmap_git = ...;
for (p = get_all_packs(r); p; p = p->next) {
if (open_pack_bitmap_1(bitmap_git, p) == 0)
ret = 0;
}
which is a problem if two copies of the same pack exist in the
packed_git list because pack-bitmap.c:open_pack_bitmap_1() contains a
conditional like the following:
if (bitmap_git->pack || bitmap_git->midx) {
/* ignore extra bitmap file; we can only handle one */
warning("ignoring extra bitmap file: %s", packfile->pack_name);
close(fd);
return -1;
}
Avoid this scenario by not letting write_midx_internal() open a MIDX
that isn't also pointed at by the object store. So long as this is the
case, other routines should prefer to open MIDXs with
get_multi_pack_index() or reprepare_packed_git() instead of creating
instances on their own. Because get_multi_pack_index() returns
`r->object_store->multi_pack_index` if it is non-NULL, we'll only have
one instance of a MIDX open at one time, avoiding these problems.
To encourage this, drop the `struct multi_pack_index *` parameter from
`write_midx_internal()`, and rely instead on the `object_dir` to find
(or initialize) the correct MIDX instance.
Likewise, replace the call to `close_midx()` with
`close_object_store()`, since we're about to replace the MIDX with a new
one and should invalidate the object store's memory of any MIDX that
might have existed beforehand.
Note that this now forbids passing object directories that don't belong
to alternate repositories over `--object-dir`, since before we would
have happily opened a MIDX in any directory, but now restrict ourselves
to only those reachable by `r->objects->multi_pack_index` (and alternate
MIDXs that we can see by walking the `next` pointer).
As far as I can tell, supporting arbitrary directories with
`--object-dir` was a historical accident, since even the documentation
says `<alt>` when referring to the value passed to this option.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Bring the "commit-graph" command in line with the error output and
general pattern in cmd_multi_pack_index().
Let's test for that output, and also cover the same potential bug as
was fixed in the multi-pack-index command in
88617d11f9d (multi-pack-index: fix potential segfault without
sub-command, 2021-07-19).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change the parse_options() invocation in the commit-graph code to
error on unknown leftover argv elements, in addition to the existing
and implicit erroring via parse_options() on unknown options.
We'd already error in cmd_commit_graph() on e.g.:
git commit-graph unknown verify
git commit-graph --unknown verify
But here we're calling parse_options() twice more for the "write" and
"verify" subcommands. We did not do the same checking for leftover
argv elements there. As a result we'd silently accept garbage in these
subcommands, let's not do that.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Rather than guarding all of the !argc with an additional "if" arm
let's do an early goto to "usage". This also makes it clear that
"save_commit_buffer" is not needed in this case.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Make use of the parse_options_concat() so we don't need to copy/paste
common options like --object-dir.
This is inspired by a similar change to "checkout" in 2087182272
(checkout: split options[] array in three pieces, 2019-03-29), and the
same pattern in the multi-pack-index command, see
60ca94769ce (builtin/multi-pack-index.c: split sub-commands,
2021-03-30).
A minor behavior change here is that now we're going to list both
--object-dir and --progress first, before we'd list --progress along
with other options.
Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If we don't handle the -h option here like most parse_options() users
we'll fall through and it'll do the right thing for us.
I think this code added in 4ce58ee38d (commit-graph: create
git-commit-graph builtin, 2018-04-02) was always redundant,
parse_options() did this at the time, and the commit-graph code never
used PARSE_OPT_NO_INTERNAL_HELP.
We don't need a test for this, it's tested by the t0012-help.sh test
added in d691551192a (t0012: test "-h" with builtins, 2017-05-30).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Share the usage message between these three variables by using a
macro. Before this new options needed to copy/paste the usage
information, see e.g. 809e0327f5 (builtin/commit-graph.c: introduce
'--max-new-filters=<n>', 2020-09-18).
See b25b727494f (builtin/multi-pack-index.c: define common usage with
a macro, 2021-03-30) for another use of this pattern (but on-list this
one came first).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
According to the guidelines in parse-options.h,
we should not end in a full stop or start with
a capital letter. Fix old error and usage
messages to match this expectation.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the spirit of 517fe807d6 (assert NOARG/NONEG behavior of
parse-options callbacks, 2018-11-05), let's cover some parse-options
callbacks which expect to be used with PARSE_OPT_NONEG but don't
explicitly assert that this is the case. These callbacks are all used
correctly in the current code, but this will help document their
expectations and future-proof the code.
As a bonus, it also silences -Wunused-parameters (these were added since
the initial sweep of 517fe807d6, and we can't yet turn on
-Wunused-parameters to remind people because it has too many existing
false positives).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce a configuration variable to specify a default value for the
recently-introduce '--max-new-filters' option of 'git commit-graph
write'.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce a command-line flag to specify the maximum number of new Bloom
filters that a 'git commit-graph write' is willing to compute from
scratch.
Prior to this patch, a commit-graph write with '--changed-paths' would
compute Bloom filters for all selected commits which haven't already
been computed (i.e., by a previous commit-graph write with '--split'
such that a roll-up or replacement is performed).
This behavior can cause prohibitively-long commit-graph writes for a
variety of reasons:
* There may be lots of filters whose diffs take a long time to
generate (for example, they have close to the maximum number of
changes, diffing itself takes a long time, etc).
* Old-style commit-graphs (which encode filters with too many entries
as not having been computed at all) cause us to waste time
recomputing filters that appear to have not been computed only to
discover that they are too-large.
This can make the upper-bound of the time it takes for 'git commit-graph
write --changed-paths' to be rather unpredictable.
To make this command behave more predictably, introduce
'--max-new-filters=<n>' to allow computing at most '<n>' Bloom filters
from scratch. This lets "computing" already-known filters proceed
quickly, while bounding the number of slow tasks that Git is willing to
do.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the subsequent commit, additional options will be added to the
commit-graph API which have nothing to do with splitting.
Rename the 'split_commit_graph_opts' structure to the more-generic
'commit_graph_opts' to encompass both. Likewise, rename the 'flags'
member to instead be 'split_flags' to clarify that it only has to do
with the behavior implied by '--split'.
Suggested-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In a future commit, some commit-graph internals will want access to
'r->settings', but we only have the 'struct object_directory *'
corresponding to that repository.
Add an additional parameter to pass the repository around in more
places.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Updates to the changed-paths bloom filter.
* ds/commit-graph-bloom-updates:
commit-graph: check all leading directories in changed path Bloom filters
revision: empty pathspecs should not use Bloom filters
revision.c: fix whitespace
commit-graph: check chunk sizes after writing
commit-graph: simplify chunk writes into loop
commit-graph: unify the signatures of all write_graph_chunk_*() functions
commit-graph: persist existence of changed-paths
bloom: fix logic in get_bloom_filter()
commit-graph: change test to die on parse, not load
commit-graph: place bloom_settings in context
|
|
Fix to the code to produce progress bar, which is new in the
upcoming release.
* tb/commit-graph-no-check-oids:
commit-graph: fix "Collecting commits from input" progress line
|
|
To display a progress line while reading commits from standard input
and looking them up, 5b6653e523 (builtin/commit-graph.c: dereference
tags in builtin, 2020-05-13) should have added a pair of
start_delayed_progress() and stop_progress() calls around the loop
reading stdin. Alas, the stop_progress() call ended up at the wrong
place, after write_commit_graph(), which does all the commit-graph
computation and writing, and has several progress lines of its own.
Consequently, that new
Collecting commits from input: 1234
progress line is overwritten by the first progress line shown by
write_commit_graph(), and its final "done" line is shown last, after
everything is finished:
$ { sleep 3 ; git rev-list -3 HEAD ; sleep 1 ; } | ~/src/git/git commit-graph write --stdin-commits
Expanding reachable commits in commit graph: 873402, done.
Writing out commit graph in 4 passes: 100% (3493608/3493608), done.
Collecting commits from input: 3, done.
Furthermore, that stop_progress() call was added after the 'cleanup'
label, where that loop reading stdin jumps in case of an error. In
case of invalid input this then results in the "done" line shown after
the error message:
$ { sleep 3 ; git rev-list -3 HEAD ; echo junk ; } | ~/src/git/git commit-graph write --stdin-commits
error: unexpected non-hex object ID: junk
Collecting commits from input: 3, done.
Move that stop_progress() call to the right place.
While at it, drop the unnecessary 'if (progress)' condition protecting
the stop_progress() call, because that function is prepared to handle
a NULL progress struct.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The changed-path Bloom filters were released in v2.27.0, but have a
significant drawback. A user can opt-in to writing the changed-path
filters using the "--changed-paths" option to "git commit-graph write"
but the next write will drop the filters unless that option is
specified.
This becomes even more important when considering the interaction with
gc.writeCommitGraph (on by default) or fetch.writeCommitGraph (part of
features.experimental). These config options trigger commit-graph writes
that the user did not signal, and hence there is no --changed-paths
option available.
Allow a user that opts-in to the changed-path filters to persist the
property of "my commit-graph has changed-path filters" automatically. A
user can drop filters using the --no-changed-paths option.
In the process, we need to be extremely careful to match the Bloom
filter settings as specified by the commit-graph. This will allow future
versions of Git to customize these settings, and the version with this
change will persist those settings as commit-graphs are rewritten on
top.
Use the trace2 API to signal the settings used during the write, and
check that output in a test after manually adjusting the correct bytes
in the commit-graph file.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
14ba97f8 (alloc: allow arbitrary repositories for alloc functions,
2018-05-15) introduced parsed_object_pool->commit_count to keep count of
commits per repository and was used to assign commit->index.
However, commit-slab code requires commit->index values to be unique
and a global count would be correct, rather than a per-repo count.
Let's introduce a static counter variable, `parsed_commits_count` to
keep track of parsed commits so far.
As commit_count has no use anymore, let's also drop it from the struct.
Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since 7c5c9b9c57 (commit-graph: error out on invalid commit oids in
'write --stdin-commits', 2019-08-05), the commit-graph builtin dies on
receiving non-commit OIDs as input to '--stdin-commits'.
This behavior can be cumbersome to work around in, say, the case of
piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if
the caller does not want to cull out non-commits themselves. In this
situation, it would be ideal if 'git commit-graph write' wrote the graph
containing the inputs that did pertain to commits, and silently ignored
the remainder of the input.
Some options have been proposed to the effect of '--[no-]check-oids'
which would allow callers to have the commit-graph builtin do just that.
After some discussion, it is difficult to imagine a caller who wouldn't
want to pass '--no-check-oids', suggesting that we should get rid of the
behavior of complaining about non-commit inputs altogether.
If callers do wish to retain this behavior, they can easily work around
this change by doing the following:
git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
awk '
!/commit/ { print "not-a-commit:"$1 }
/commit/ { print $1 }
' |
git commit-graph write --stdin-commits
To make it so that valid OIDs that refer to non-existent objects are
indeed an error after loosening the error handling, perform an extra
lookup to make sure that object indeed exists before sending it to the
commit-graph internals.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When given a list of commits, the commit-graph machinery calls
'lookup_commit_reference_gently()' on each element in the set and treats
the resulting set of OIDs as the base over which to close for
reachability.
In an earlier collection of commits, the 'git commit-graph write
--reachable' case made the inner-most call to
'lookup_commit_reference_gently()' by peeling references before they
were passed over to the commit-graph internals.
Do the analog for 'git commit-graph write --stdin-commits' by calling
'lookup_commit_reference_gently()' outside of the commit-graph
machinery, making the inner-most call a noop.
Since this may incur additional processing time, surround
'read_one_commit' with a progress meter to provide output to the caller.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
With either '--stdin-commits' or '--stdin-packs', the commit-graph
builtin will read line-delimited input, and interpret it either as a
series of commit OIDs, or pack names.
In a subsequent commit, we will begin handling '--stdin-commits'
differently by processing each line as it comes in, instead of in one
shot at the end. To make adequate room for this additional logic, split
the '--stdin-commits' case from '--stdin-packs' by only storing the
input when '--stdin-packs' is given.
In the case of '--stdin-commits', feed each line to a new
'read_one_commit' helper, which (for now) will merely call
'parse_oid_hex'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce an extension to the commit-graph to make it efficient to
check for the paths that were modified at each commit using Bloom
filters.
* gs/commit-graph-path-filter:
bloom: ignore renames when computing changed paths
commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag
t4216: add end to end tests for git log with Bloom filters
revision.c: add trace2 stats around Bloom filter usage
revision.c: use Bloom filters to speed up path based revision walks
commit-graph: add --changed-paths option to write subcommand
commit-graph: reuse existing Bloom filters during write
commit-graph: write Bloom filters to commit graph file
commit-graph: examine commits by generation number
commit-graph: examine changed-path objects in pack order
commit-graph: compute Bloom filters for changed paths
diff: halt tree-diff early after max_changes
bloom.c: core Bloom filter implementation for changed paths.
bloom.c: introduce core Bloom filter constructs
bloom.c: add the murmur3 hash implementation
commit-graph: define and use MAX_NUM_CHUNKS
|
|
"git commit-graph write" learned different ways to write out split
files.
* tb/commit-graph-split-strategy:
Revert "commit-graph.c: introduce '--[no-]check-oids'"
commit-graph.c: introduce '--[no-]check-oids'
commit-graph.h: replace 'commit_hex' with 'commits'
oidset: introduce 'oidset_size'
builtin/commit-graph.c: introduce split strategy 'replace'
builtin/commit-graph.c: introduce split strategy 'no-merge'
builtin/commit-graph.c: support for '--split[=<strategy>]'
t/helper/test-read-graph.c: support commit-graph chains
|
|
This reverts commit 7a9ce0269bc0f4ef230f930b3910b70ac3142552,
which has not yet gained consensus.
|
|
"git commit-graph write --expire-time=<timestamp>" did not use the
given timestamp correctly, which has been corrected.
* ds/commit-graph-expiry-fix:
commit-graph: fix buggy --expire-time option
|
|
When operating on a stream of commit OIDs on stdin, 'git commit-graph
write' checks that each OID refers to an object that is indeed a commit.
This is convenient to make sure that the given input is well-formed, but
can sometimes be undesirable.
For example, server operators may wish to feed the refnames that were
updated during a push to 'git commit-graph write --input=stdin-commits',
and silently discard refs that don't point at commits. This can be done
by combing the output of 'git for-each-ref' with '--format
%(*objecttype)', but this requires opening up a potentially large number
of objects. Instead, it is more convenient to feed the updated refs to
the commit-graph machinery, and let it throw out refs that don't point
to commits.
Introduce '--[no-]check-oids' to make such a behavior possible. With
'--check-oids' (the default behavior to retain backwards compatibility),
'git commit-graph write' will barf on a non-commit line in its input.
With 'no-check-oids', such lines will be silently ignored, making the
above possible by specifying this option.
No matter which is supplied, 'git commit-graph write' retains the
behavior from the previous commit of rejecting non-OID inputs like
"HEAD" and "refs/heads/foo" as before.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The 'write_commit_graph()' function takes in either a string list of
pack indices, or a string list of hexadecimal commit OIDs. These
correspond to the '--stdin-packs' and '--stdin-commits' mode(s) from
'git commit-graph write'.
Using a string_list of hexadecimal commit IDs is not the most efficient
use of memory, since we can instead use the 'struct oidset', which is
more well-suited for this case.
This has another benefit which will become apparent in the following
commit. This is that we are about to disambiguate the kinds of errors we
produce with '--stdin-commits' into "non-hex input" and "hex-input, but
referring to a non-commit object". By having 'write_commit_graph' take
in a 'struct oidset *' of commits, we place the burden on the caller (in
this case, the builtin) to handle the first case, and the commit-graph
machinery can handle the second case.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When using split commit-graphs, it is sometimes useful to completely
replace the commit-graph chain with a new base.
For example, consider a scenario in which a repository builds a new
commit-graph incremental for each push. Occasionally (say, after some
fixed number of pushes), they may wish to rebuild the commit-graph chain
with all reachable commits.
They can do so with
$ git commit-graph write --reachable
but this removes the chain entirely and replaces it with a single
commit-graph in 'objects/info/commit-graph'. Unfortunately, this means
that the next push will have to move this commit-graph into the first
layer of a new chain, and then write its new commits on top.
Avoid such copying entirely by allowing the caller to specify that they
wish to replace the entirety of their commit-graph chain, while also
specifying that the new commit-graph should become the basis of a fresh,
length-one chain.
This addresses the above situation by making it possible for the caller
to instead write:
$ git commit-graph write --reachable --split=replace
which writes a new length-one chain to 'objects/info/commit-graphs',
making the commit-graph incremental generated by the subsequent push
relatively cheap by avoiding the aforementioned copy.
In order to do this, remove an assumption in 'write_commit_graph_file'
that chains are always at least two incrementals long.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the previous commit, we laid the groundwork for supporting different
splitting strategies. In this commit, we introduce the first splitting
strategy: 'no-merge'.
Passing '--split=no-merge' is useful for callers which wish to write a
new incremental commit-graph, but do not want to spend effort condensing
the incremental chain [1]. Previously, this was possible by passing
'--size-multiple=0', but this no longer the case following 63020f175f
(commit-graph: prefer default size_mult when given zero, 2020-01-02).
When '--split=no-merge' is given, the commit-graph machinery will never
condense an existing chain, and it will always write a new incremental.
[1]: This might occur when, for example, a server administrator running
some program after each push may want to ensure that each job runs
proportional in time to the size of the push, and does not "jump" when
the commit-graph machinery decides to trigger a merge.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
With '--split', the commit-graph machinery writes new commits in another
incremental commit-graph which is part of the existing chain, and
optionally decides to condense the chain into a single commit-graph.
This is done to ensure that the asymptotic behavior of looking up a
commit in an incremental chain is not dominated by the number of
incrementals in that chain. It can be controlled by the '--max-commits'
and '--size-multiple' options.
In the next two commits, we will introduce additional splitting
strategies that can exert additional control over:
- when a split commit-graph is and isn't written, and
- when the existing commit-graph chain is discarded completely and
replaced with another graph
To prepare for this, make '--split' take an optional strategy (as in
'--split[=<strategy>]'), and add a new enum to describe which strategy
is being used. For now, no strategies are given, and the only enumerated
value is 'COMMIT_GRAPH_SPLIT_UNSPECIFIED', indicating the absence of a
strategy.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag to the test setup suite
in order to toggle writing Bloom filters when running any of the git tests.
If set to true, we will compute and write Bloom filters every time a test
calls `git commit-graph write`, as if the `--changed-paths` option was
passed in.
The test suite passes when GIT_TEST_COMMIT_GRAPH and
GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS are enabled.
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add --changed-paths option to git commit-graph write. This option will
allow users to compute information about the paths that have changed
between a commit and its first parent, and write it into the commit graph
file. If the option is passed to the write subcommand we set the
COMMIT_GRAPH_WRITE_BLOOM_FILTERS flag and pass it down to the
commit-graph logic.
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The commit-graph builtin has an --expire-time option that takes a
datetime using OPT_EXPIRY_DATE(). However, the implementation inside
expire_commit_graphs() was treating a non-zero value as a number of
seconds to subtract from "now".
Update t5323-split-commit-graph.sh to demonstrate the correct value
of the --expire-time option by actually creating a crud .graph file
with mtime earlier than the expire time. Instead of using a super-
early time (1980) we use an explicit, and recent, time. Using
test-tool chmtime to create two files on either end of an exact
second, we create a test that catches this failure no matter the
current time. Using a fixed date is more portable than trying to
format a relative date string into the --expiry-date input.
I noticed this when inspecting some Scalar repos that had an excess
number of commit-graph files. In Scalar, we were using this second
interpretation by using "--expire-time=3600" to mean "delete graphs
older than one hour ago" to avoid deleting a commit-graph that a
foreground process may be trying to load.
Also I noticed that the help text was copied from the --max-commits
option. Fix that help text.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Returning a shared buffer invites very subtle bugs due to reentrancy or
multi-threading, as demonstrated by the previous patch.
There was an unfinished effort to abolish this [1].
Let's finally rid of `real_path()`, using `strbuf_realpath()` instead.
This patch uses a local `strbuf` for most places where `real_path()` was
previously called.
However, two places return the value of `real_path()` to the caller. For
them, a `static` local `strbuf` was added, effectively pushing the
problem one level higher:
read_gitfile_gently()
get_superproject_working_tree()
[1] https://lore.kernel.org/git/1480964316-99305-1-git-send-email-bmwill@google.com/
Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Apply a similar treatment as in the previous patch to pass a 'struct
object_directory *' through the 'load_commit_graph_one_fd_st'
initializer, too.
This prevents a potential bug where a pointer comparison is made to a
NULL 'g->odb', which would cause the commit-graph machinery to think
that a pair of commit-graphs belonged to different alternates when in
fact they do not (i.e., in the case of no '--object-dir').
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
As of the previous patch, all calls to 'commit-graph.c' functions which
perform path normalization (for e.g., 'get_commit_graph_filename()') are
of the form 'ctx->odb->path', which is always in normalized form.
Now that there are no callers passing non-normalized paths to these
functions, ensure that future callers are bound by the same restrictions
by making these functions take a 'struct object_directory *' instead of
a 'const char *'. To match, replace all calls with arguments of the form
'ctx->odb->path' with 'ctx->odb' To recover the path, functions that
perform path manipulation simply use 'odb->path'.
Further, avoid string comparisons with arguments of the form
'odb->path', and instead prefer raw pointer comparisons, which
accomplish the same effect, but are far less brittle.
This has a pleasant side-effect of making these functions much more
robust to paths that cannot be normalized by 'normalize_path_copy()',
i.e., because they are outside of the current working directory.
For example, prior to this patch, Valgrind reports that the following
uninitialized memory read [1]:
$ ( cd t && GIT_DIR=../.git valgrind git rev-parse HEAD^ )
because 'normalize_path_copy()' can't normalize '../.git' (since it's
relative to but above of the current working directory) [2].
By using a 'struct object_directory *' directly,
'get_commit_graph_filename()' does not need to normalize, because all
paths are relative to the current working directory since they are
always read from the '->path' of an object directory.
[1]: https://lore.kernel.org/git/20191027042116.GA5801@sigill.intra.peff.net.
[2]: The bug here is that 'get_commit_graph_filename()' returns the
result of 'normalize_path_copy()' without checking the return
value.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In a previous patch, the 'char *object_dir' in 'struct commit_graph' was
replaced with a 'struct object_directory'. This patch applies the same
treatment to 'struct commit_graph', which is another intermediate step
towards getting rid of all path normalization in 'commit-graph.c'.
Instead of taking a 'char *object_dir', functions that construct a
'struct commit_graph' now take a 'struct object_directory *'. Any code
that needs an object directory path use '->path' instead.
This ensures that all calls to functions that perform path normalization
are given arguments which do not themselves require normalization. This
prepares those functions to drop their normalization entirely, which
will occur in the subsequent patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are lots of places in 'commit-graph.h' where a function either has
(or almost has) a full 'struct object_directory *', accesses '->path',
and then throws away the rest of the struct.
This can cause headaches when comparing the locations of object
directories across alternates (e.g., in the case of deciding if two
commit-graph layers can be merged). These paths are normalized with
'normalize_path_copy()' which mitigates some comparison issues, but not
all [1].
Replace usage of 'char *object_dir' with 'odb->path' by storing a
'struct object_directory *' in the 'write_commit_graph_context'
structure. This is an intermediate step towards getting rid of all path
normalization in 'commit-graph.c'.
Resolving a user-provided '--object-dir' argument now requires that we
compare it to the known alternates for equality. Prior to this patch,
an unknown '--object-dir' argument would silently exit with status zero.
This can clearly lead to unintended behavior, such as verifying
commit-graphs that aren't in a repository's own object store (or one of
its alternates), or causing a typo to mask a legitimate commit-graph
verification failure. Make this error non-silent by 'die()'-ing when the
given '--object-dir' does not match any known alternate object store.
[1]: In my testing, for example, I can get one side of the commit-graph
code to fill object_dir with "./objects" and the other with just
"objects".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The 'git commit-graph read' subcommand is used in test scripts to check
that the commit-graph contents match the expected data. Mostly, this
helps check the header information and the list of chunks. Users do not
need this information, so move the functionality to a test helper.
Reported-by: Bryan Turner <bturner@atlassian.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Message fix.
* sg/commit-graph-usage-fix:
builtin/commit-graph.c: remove subcommand-less usage string
|
|
The first line in 'git commit-graph's usage string indicates that this
command can be invoked without specifying a subcommand. However, this
is not the case:
$ git commit-graph
usage: git commit-graph [--object-dir <objdir>]
or: git commit-graph read [--object-dir <objdir>]
[...]
$ echo $?
129
Remove this line from the usage string.
The synopsis in the manpage doesn't contain this line.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Dev support.
* gs/commit-graph-trace-with-cmd:
commit-graph: emit trace2 cmd_mode for each sub-command
|
|
* gs/commit-graph-progress:
commit-graph: add --[no-]progress to write and verify
|
|
A pair of small fixups to "git commit-graph" have been applied.
* jk/commit-graph-cleanup:
commit-graph: turn off save_commit_buffer
commit-graph: don't show progress percentages while expanding reachable commits
|
|
Emit trace2_cmd_mode() messages for each commit-graph
sub-command.
The commit graph commands were in flux when trace2 was
making it's way to git. Now that we have enough sub-commands
in commit-graph, we can label the various modes within them.
Distinguishing between read, write and verify is a great
start.
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add --[no-]progress to git commit-graph write and verify.
The progress feature was introduced in 7b0f229
("commit-graph write: add progress output", 2018-09-17) but
the ability to opt-out was overlooked.
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The commit-graph tool may read a lot of commits, but it only cares about
parsing their metadata (parents, trees, etc) and doesn't ever show the
messages to the user. And so it should not need save_commit_buffer,
which is meant for holding onto the object data of parsed commits so
that we can show them later. In fact, it's quite harmful to do so.
According to massif, the max heap of "git commit-graph write
--reachable" in linux.git before/after this patch (removing the commit
graph file in between) goes from ~1.1GB to ~270MB.
Which isn't surprising, since the difference is about the sum of the
uncompressed sizes of all commits in the repository, and this was
equivalent to leaking them.
This obviously helps if you're under memory pressure, but even without
it, things go faster. My before/after times for that command (without
massif) went from 12.521s to 11.874s, a speedup of ~5%.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
While 'git commit-graph write --stdin-commits' expects commit object
ids as input, it accepts and silently skips over any invalid commit
object ids, and still exits with success:
# nonsense
$ echo not-a-commit-oid | git commit-graph write --stdin-commits
$ echo $?
0
# sometimes I forgot that refs are not good...
$ echo HEAD | git commit-graph write --stdin-commits
$ echo $?
0
# valid tree OID, but not a commit OID
$ git rev-parse HEAD^{tree} | git commit-graph write --stdin-commits
$ echo $?
0
$ ls -l .git/objects/info/commit-graph
ls: cannot access '.git/objects/info/commit-graph': No such file or directory
Check that all input records are indeed valid commit object ids and
return with error otherwise, the same way '--stdin-packs' handles
invalid input; see e103f7276f (commit-graph: return with errors during
write, 2019-06-12).
Note that it should only return with error when encountering an
invalid commit object id coming from standard input. However,
'--reachable' uses the same code path to process object ids pointed to
by all refs, and that includes tag object ids as well, which should
still be skipped over. Therefore add a new flag to 'enum
commit_graph_write_flags' and a corresponding field to 'struct
write_commit_graph_context', so we can differentiate between those two
cases.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If we wrote a commit-graph chain, we only modified the tip file in
the chain. It is valuable to verify what we wrote, but not waste
time checking files we did not write.
Add a '--shallow' option to the 'git commit-graph verify' subcommand
and check that it does not read the base graph in a two-file chain.
Making the verify subcommand read from a chain of commit-graphs takes
some rearranging of the builtin code.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The split commit-graph feature is now fully implemented, but needs
some more run-time configurability. Allow direct callers to 'git
commit-graph write --split' to specify the values used in the
merge strategy and the expire time.
Update the documentation to specify these values.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a new "--split" option to the 'git commit-graph write' subcommand. This
option allows the optional behavior of writing a commit-graph chain.
The current behavior will add a tip commit-graph containing any commits that
are not in the existing commit-graph or commit-graph chain. Later changes
will allow merging the chain and expiring out-dated files.
Add a new test script (t5324-split-commit-graph.sh) that demonstrates this
behavior.
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The write_commit_graph() and write_commit_graph_reachable() methods
currently take two boolean parameters: 'append' and 'report_progress'.
As we update these methods, adding more parameters this way becomes
cluttered and hard to maintain.
Collapse these parameters into a 'flags' parameter, and adjust the
callers to provide flags as necessary.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The write_commit_graph() method uses die() to report failure and
exit when confronted with an unexpected condition. This use of
die() in a library function is incorrect and is now replaced by
error() statements and an int return type. Return zero on success
and a negative value on failure.
Now that we use 'goto cleanup' to jump to the terminal condition
on an error, we have new paths that could lead to uninitialized
values. New initializers are added to correct for this.
The builtins 'commit-graph', 'gc', and 'commit' call these methods,
so update them to check the return value. Test that 'git commit-graph
write' returns a proper error code when hitting a failure condition
in write_commit_graph().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change "commit-graph verify" to error on open() failures other than
ENOENT. As noted in the third paragraph of 283e68c72f ("commit-graph:
add 'verify' subcommand", 2018-06-27) and the test it added it's
intentional that "commit-graph verify" doesn't error out when the file
doesn't exist.
But let's not be overly promiscuous in what we accept. If we can't
read the file for other reasons, e.g. permission errors, bad file
descriptor etc. we'd like to report an error to the user.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
An earlier change implemented load_commit_graph_one_fd_st() in a way
that was bug-compatible with earlier code in terms of the "graph file
%s is too small" error message printing out the path to the
commit-graph (".git/objects/info/commit-graph").
But change that, because:
* A function that takes an already-open file descriptor also needing
the filename isn't very intuitive.
* The vast majority of errors we might emit when loading the graph
come from parse_commit_graph(), which doesn't report the
filename. Let's not do that either in this case for consistency.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Make the commit-graph loading code work as a library that returns an
error code instead of calling exit(1) when the commit-graph is
corrupt. This means that e.g. "status" will now report commit-graph
corruption as an "error: [...]" at the top of its output, but then
proceed to work normally.
This required splitting up the load_commit_graph_one() function so
that the code that deals with open()-ing and stat()-ing the graph can
now be called independently as open_commit_graph().
This is needed because "commit-graph verify" where the graph doesn't
exist isn't an error. See the third paragraph in
283e68c72f ("commit-graph: add 'verify' subcommand",
2018-06-27). There's a bug in that logic where we conflate the
intended ENOENT with other errno values (e.g. EACCES), but this change
doesn't address that. That'll be addressed in a follow-up change.
I'm then splitting most of the logic out of load_commit_graph_one()
into load_commit_graph_one_fd_st(), which allows for providing an
existing file descriptor and stat information to the loading
code. This isn't strictly needed, but it would be redundant and
confusing to open() and stat() the file twice for some of the
codepaths, this allows for calling open_commit_graph() followed by
load_commit_graph_one_fd_st(). The "graph_file" still needs to be
passed to that function for the the "graph file %s is too small" error
message.
This leaves load_commit_graph_one() unused by everything except the
internal prepare_commit_graph_one() function, so let's mark it as
"static". If someone needs it in the future we can remove the "static"
attribute. I could also rewrite its sole remaining
user ("prepare_commit_graph_one()") to use
load_commit_graph_one_fd_st() instead, but let's leave it at this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The optional 'Large Edge List' chunk of the commit graph file stores
parent information for commits with more than two parents, and the
names of most of the macros, variables, struct fields, and functions
related to this chunk contain the term "large edges", e.g.
write_graph_chunk_large_edges(). However, it's not a really great
term, as the edges to the second and subsequent parents stored in this
chunk are not any larger than the edges to the first and second
parents stored in the "main" 'Commit Data' chunk. It's the number of
edges, IOW number of parents, that is larger compared to non-merge and
"regular" two-parent merge commits. And indeed, two functions in
'commit-graph.c' have a local variable called 'num_extra_edges' that
refer to the same thing, and this "extra edges" term is much better at
describing these edges.
So let's rename all these references to "large edges" in macro,
variable, function, etc. names to "extra edges". There is a
GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency
rename it to GRAPH_EXTRA_EDGES_NEEDED.
We can do so safely without causing any incompatibility issues,
because the term "large edges" doesn't come up in the file format
itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there
is no 'L' in there), but only in the specification text. The string
"large edges", however, does come up in the output of 'git
commit-graph read' and in tests looking at its input, but that command
is explicitly documented as debugging aid, so we can change its output
and the affected tests safely.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up.
* ds/commit-graph-leakfix:
commit-graph: reduce initial oid allocation
builtin/commit-graph.c: UNLEAK variables
commit-graph: clean up leaked memory during write
|
|
The recently introduced commit-graph auxiliary data is incompatible
with mechanisms such as replace & grafts that "breaks" immutable
nature of the object reference relationship. Disable optimizations
based on its use (and updating existing commit-graph) when these
incompatible features are in use in the repository.
* ds/commit-graph-with-grafts:
commit-graph: close_commit_graph before shallow walk
commit-graph: not compatible with uninitialized repo
commit-graph: not compatible with grafts
commit-graph: not compatible with replace objects
test-repository: properly init repo
commit-graph: update design document
refs.c: upgrade for_each_replace_ref to be a each_repo_ref_fn callback
refs.c: migrate internal ref iteration to pass thru repository argument
|
|
`graph_verify()`, `graph_read()` and `graph_write()` do the hard work of
`cmd_commit_graph()`. As soon as these return, so does
`cmd_commit_graph()`.
`strbuf_getline()` may allocate memory in the strbuf, yet return EOF.
We need to release the strbuf or UNLEAK it. Go for the latter since we
are close to returning from `graph_write()`.
`graph_write()` also fails to free the strings in the string list. They
have been added to the list with `strdup_strings` set to 0. We could
flip `strdup_strings` before clearing the list, which is our usual hack
in situations like this. But since we are about to exit, let's just
UNLEAK the whole string list instead.
UNLEAK `graph` in `graph_verify`. While at it, and for consistency,
UNLEAK in `graph_read()` as well, and remove an unnecessary UNLEAK just
before dying.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Before this change the "commit-graph write" command didn't report any
progress. On my machine this command takes more than 10 seconds to
write the graph for linux.git, and around 1m30s on the
2015-04-03-1M-git.git[1] test repository (a test case for a large
monorepository).
Furthermore, since the gc.writeCommitGraph setting was added in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
there was no indication at all from a "git gc" run that anything was
different. This why one of the progress bars being added here uses
start_progress() instead of start_delayed_progress(), so that it's
guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
repository:
$ git -c gc.writeCommitGraph=true gc
Enumerating objects: 2821, done.
[...]
Computing commit graph generation numbers: 100% (867/867), done.
On larger repositories, such as linux.git the delayed progress bar(s)
will kick in, and we'll show what's going on instead of, as was
previously happening, printing nothing while we write the graph:
$ git -c gc.writeCommitGraph=true gc
[...]
Annotating commits in commit graph: 1565573, done.
Computing commit graph generation numbers: 100% (782484/782484), done.
Note that here we don't show "Finding commits for commit graph", this
is because under "git gc" we seed the search with the commit
references in the repository, and that set is too small to show any
progress, but would e.g. on a smaller repo such as git.git with
--stdin-commits:
$ git rev-list --all | git -c gc.writeCommitGraph=true write --stdin-commits
Finding commits for commit graph: 100% (162576/162576), done.
Computing commit graph generation numbers: 100% (162576/162576), done.
With --stdin-packs we don't show any estimation of how much is left to
do. This is because we might be processing more than one pack. We
could be less lazy here and show progress, either by detecting that
we're only processing one pack, or by first looping over the packs to
discover how many commits they have. I don't see the point in doing
that work. So instead we get (on 2015-04-03-1M-git.git):
$ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs
Finding commits for commit graph: 13064614, done.
Annotating commits in commit graph: 3001341, done.
Computing commit graph generation numbers: 100% (1000447/1000447), done.
No GC mode uses --stdin-packs. It's what they use at Microsoft to
manually compute the generation numbers for their collection of large
packs which are never coalesced.
The reason we need a "report_progress" variable passed down from "git
gc" is so that we don't report this output when we're running in the
process "git gc --auto" detaches from the terminal.
Since we write the commit graph from the "git gc" process itself (as
opposed to what we do with say the "git repack" phase), we'd end up
writing the output to .git/gc.log and reporting it to the user next
time as part of the "The last gc run reported the following[...]"
error, see 329e6e8794 ("gc: save log from daemonized gc --auto and
print it next time", 2015-09-19).
So we must keep track of whether or not we're running in that
demonized mode, and if so print no progress.
See [2] and subsequent replies for a discussion of an approach not
taken in compute_generation_numbers(). I.e. we're saying "Computing
commit graph generation numbers", even though on an established
history we're mostly skipping over all the work we did in the
past. This is similar to the white lie we tell in the "Writing
objects" phase (not all are objects being written).
Always showing progress is considered more important than
accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang
for 6 seconds with no output on the second "git gc" if no changes were
made to any objects in the interim if we'd take the approach in [2].
1. https://github.com/avar/2015-04-03-1M-git
2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com>
(https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Create new method commit_graph_compatible(r) to check if a given
repository r is compatible with the commit-graph feature. Fill the
method with a check to see if replace-objects exist. Test this
interaction succeeds, including ignoring an existing commit-graph and
failing to write a new commit-graph. However, we do ensure that
we write a new commit-graph by setting read_replace_refs to 0, thereby
ignoring the replace refs.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When writing commit-graph files, it can be convenient to ask for all
reachable commits (starting at the ref set) in the resulting file. This
is particularly helpful when writing to stdin is complicated, such as a
future integration with 'git gc'.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If the commit-graph file becomes corrupt, we need a way to verify
that its contents match the object database. In the manner of
'git fsck' we will implement a 'git commit-graph verify' subcommand
to report all issues with the file.
Add the 'verify' subcommand to the 'commit-graph' builtin and its
documentation. The subcommand is currently a no-op except for
loading the commit-graph into memory, which may trigger run-time
errors that would be caught by normal use. Add a simple test that
ensures the command returns a zero error code.
If no commit-graph file exists, this is an acceptable state. Do
not report any errors.
Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Teach git-commit-graph to add all commits from the existing
commit-graph file to the file about to be written. This should be
used when adding new commits without performing garbage collection.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Teach git-commit-graph to read commits from stdin when the
--stdin-commits flag is specified. Commits reachable from these
commits are added to the graph. This is a much faster way to construct
the graph than inspecting all packed objects, but is restricted to
known tips.
For the Linux repository, 700,000+ commits were added to the graph
file starting from 'master' in 7-9 seconds, depending on the number
of packfiles in the repo (1, 24, or 120).
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Teach git-commit-graph to inspect the objects only in a certain list
of pack-indexes within the given pack directory. This allows updating
the commit graph iteratively.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Teach git-commit-graph to read commit graph files and summarize their contents.
Use the read subcommand to verify the contents of a commit graph file in the
tests.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Teach git-commit-graph to write graph files. Create new test script to verify
this command succeeds without failure.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Teach git the 'commit-graph' builtin that will be used for writing and
reading packed graph files. The current implementation is mostly
empty, except for an '--object-dir' option.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|