From 7ed76b4eb25f03f3bec6abad219d4dda651fc30f Mon Sep 17 00:00:00 2001 From: Jeff King Date: Thu, 28 Sep 2023 00:38:07 -0400 Subject: commit-graph: factor out chain opening function The load_commit_graph_chain() function opens the chain file and all of the slices of graph that it points to. If there is no chain file (which is a totally normal condition), we return NULL. But if we run into errors with the chain file or loading the actual graph data, we also return NULL, and the caller cannot tell the difference. The caller can check for ENOENT for the unremarkable "no such file" case. But I'm hesitant to assume that the rest of the function would never accidentally set errno to ENOENT itself, since it is opening the slice files (and that would mean the caller fails to notice a real error). So let's break this into two functions: one to open the file, and one to actually load it. This matches the interface we provide for the non-chain graph file, which will also come in handy in a moment when we fix some bugs in the "git commit-graph verify" code. Some notes: - I've kept the "1 is good, 0 is bad" return convention (and the weird "fd" out-parameter) used by the matching open_commit_graph() function and other parts of the commit-graph code. This is unlike most of the rest of Git (which would just return the fd, with -1 for error), but it makes sense to stay consistent with the adjacent bits of the API here. - The existing chain loading function will quietly return if the file is too small to hold a single entry. I've retained that behavior (and explicitly set ENOENT in the opener function) for now, under the notion that it's probably valid (though I'd imagine unusual) to have an empty chain file. There are two small behavior changes here, but I think both are strictly positive: 1. The original blindly did a stat() before checking if fopen() succeeded, meaning we were making a pointless extra stat call. 2. We now use fstat() to check the file size. The previous code using a regular stat() on the pathname meant we could technically race with somebody updating the chain file, and end up with a size that does not match what we just opened with fopen(). I doubt anybody ever hit this in practice, but it may have caused an out-of-bounds read. We'll retain the load_commit_graph_chain() function which does both the open and reading steps (most existing callers do not care about seeing errors anyway, since loading commit-graphs is optimistic). Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- commit-graph.c | 58 +++++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 39 insertions(+), 19 deletions(-) (limited to 'commit-graph.c') diff --git a/commit-graph.c b/commit-graph.c index 5e8a3a5085..12cdd9af8e 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -513,31 +513,34 @@ static int add_graph_to_chain(struct commit_graph *g, return 1; } -static struct commit_graph *load_commit_graph_chain(struct repository *r, - struct object_directory *odb) +int open_commit_graph_chain(const char *chain_file, + int *fd, struct stat *st) +{ + *fd = git_open(chain_file); + if (*fd < 0) + return 0; + if (fstat(*fd, st)) { + close(*fd); + return 0; + } + if (st->st_size <= the_hash_algo->hexsz) { + close(*fd); + errno = ENOENT; + return 0; + } + return 1; +} + +struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r, + int fd, struct stat *st) { struct commit_graph *graph_chain = NULL; struct strbuf line = STRBUF_INIT; - struct stat st; struct object_id *oids; int i = 0, valid = 1, count; - char *chain_name = get_commit_graph_chain_filename(odb); - FILE *fp; - int stat_res; + FILE *fp = xfdopen(fd, "r"); - fp = fopen(chain_name, "r"); - stat_res = stat(chain_name, &st); - free(chain_name); - - if (!fp) - return NULL; - if (stat_res || - st.st_size <= the_hash_algo->hexsz) { - fclose(fp); - return NULL; - } - - count = st.st_size / (the_hash_algo->hexsz + 1); + count = st->st_size / (the_hash_algo->hexsz + 1); CALLOC_ARRAY(oids, count); prepare_alt_odb(r); @@ -585,6 +588,23 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, return graph_chain; } +static struct commit_graph *load_commit_graph_chain(struct repository *r, + struct object_directory *odb) +{ + char *chain_file = get_commit_graph_chain_filename(odb); + struct stat st; + int fd; + struct commit_graph *g = NULL; + + if (open_commit_graph_chain(chain_file, &fd, &st)) { + /* ownership of fd is taken over by load function */ + g = load_commit_graph_chain_fd_st(r, fd, &st); + } + + free(chain_file); + return g; +} + /* * returns 1 if and only if all graphs in the chain have * corrected commit dates stored in the generation_data chunk. -- cgit 1.2.3-korg From 8298b54317772ca28fec2673f59a8c2048552ae6 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Thu, 28 Sep 2023 00:38:17 -0400 Subject: commit-graph: check mixed generation validation when loading chain file In read_commit_graph_one(), we call validate_mixed_generation_chain() after loading the graph. Even though we don't check the return value, this has the side effect of clearing the read_generation_data flag, which is important when working with mixed generation numbers. But doing this in load_commit_graph_chain_fd_st() makes more sense: 1. We are calling it even when we did not load a chain at all, which is pointless (you cannot have mixed generations in a single file). 2. For now, all callers load the graph via read_commit_graph_one(). But the point of factoring out the open/load in the previous commit was to let "commit-graph verify" call them separately. So it needs to trigger this function as part of the load. Without this patch, the mixed-generation tests in t5324 would start failing on "git commit-graph verify" calls, once we switch to using a separate open/load call there. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- commit-graph.c | 54 +++++++++++++++++++++++++++--------------------------- 1 file changed, 27 insertions(+), 27 deletions(-) (limited to 'commit-graph.c') diff --git a/commit-graph.c b/commit-graph.c index 12cdd9af8e..8b29c6de24 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -473,6 +473,31 @@ static struct commit_graph *load_commit_graph_v1(struct repository *r, return g; } +/* + * returns 1 if and only if all graphs in the chain have + * corrected commit dates stored in the generation_data chunk. + */ +static int validate_mixed_generation_chain(struct commit_graph *g) +{ + int read_generation_data = 1; + struct commit_graph *p = g; + + while (read_generation_data && p) { + read_generation_data = p->read_generation_data; + p = p->base_graph; + } + + if (read_generation_data) + return 1; + + while (g) { + g->read_generation_data = 0; + g = g->base_graph; + } + + return 0; +} + static int add_graph_to_chain(struct commit_graph *g, struct commit_graph *chain, struct object_id *oids, @@ -581,6 +606,8 @@ struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r, } } + validate_mixed_generation_chain(graph_chain); + free(oids); fclose(fp); strbuf_release(&line); @@ -605,31 +632,6 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, return g; } -/* - * returns 1 if and only if all graphs in the chain have - * corrected commit dates stored in the generation_data chunk. - */ -static int validate_mixed_generation_chain(struct commit_graph *g) -{ - int read_generation_data = 1; - struct commit_graph *p = g; - - while (read_generation_data && p) { - read_generation_data = p->read_generation_data; - p = p->base_graph; - } - - if (read_generation_data) - return 1; - - while (g) { - g->read_generation_data = 0; - g = g->base_graph; - } - - return 0; -} - struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb) { @@ -638,8 +640,6 @@ struct commit_graph *read_commit_graph_one(struct repository *r, if (!g) g = load_commit_graph_chain(r, odb); - validate_mixed_generation_chain(g); - return g; } -- cgit 1.2.3-korg From 7754a565e2e78e4163dbf597bba5fc729cc3bbc7 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Thu, 28 Sep 2023 00:39:10 -0400 Subject: commit-graph: tighten chain size check When we open a commit-graph-chain file, if it's smaller than a single entry, we just quietly treat that as ENOENT. That make some sense if the file is truly zero bytes, but it means that "commit-graph verify" will quietly ignore a file that contains garbage if that garbage happens to be short. Instead, let's only simulate ENOENT when the file is truly empty, and otherwise return EINVAL. The normal graph-loading routines don't care, but "commit-graph verify" will notice and complain about the difference. It's not entirely clear to me that the 0-is-ENOENT case actually happens in real life, so we could perhaps just eliminate this special-case altogether. But this is how we've always behaved, so I'm preserving it in the name of backwards compatibility (though again, it really only matters for "verify", as the regular routines are happy to load what they can). Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- commit-graph.c | 10 ++++++++-- t/t5324-split-commit-graph.sh | 12 ++++++++++++ 2 files changed, 20 insertions(+), 2 deletions(-) (limited to 'commit-graph.c') diff --git a/commit-graph.c b/commit-graph.c index 8b29c6de24..b1d3e5bf94 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -548,9 +548,15 @@ int open_commit_graph_chain(const char *chain_file, close(*fd); return 0; } - if (st->st_size <= the_hash_algo->hexsz) { + if (st->st_size < the_hash_algo->hexsz) { close(*fd); - errno = ENOENT; + if (!st->st_size) { + /* treat empty files the same as missing */ + errno = ENOENT; + } else { + warning("commit-graph chain file too small"); + errno = EINVAL; + } return 0; } return 1; diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index a5ac0440f1..be58545810 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -342,6 +342,18 @@ test_expect_success 'verify after commit-graph-chain corruption (tip)' ' ) ' +test_expect_success 'verify notices too-short chain file' ' + git clone --no-hardlinks . verify-chain-short && + ( + cd verify-chain-short && + git commit-graph verify && + echo "garbage" >$graphdir/commit-graph-chain && + test_must_fail git commit-graph verify 2>test_err && + grep -v "^+" test_err >err && + grep "commit-graph chain file too small" err + ) +' + test_expect_success 'verify across alternates' ' git clone --no-hardlinks . verify-alt && ( -- cgit 1.2.3-korg From 5f259197eea0a3acc48f46748778f33c935476cb Mon Sep 17 00:00:00 2001 From: Jeff King Date: Thu, 28 Sep 2023 00:39:51 -0400 Subject: commit-graph: report incomplete chains during verification The load_commit_graph_chain_fd_st() function will stop loading chains when it sees an error. But if it has loaded any graph slice at all, it will return it. This is a good thing for normal use (we use what data we can, and this is just an optimization). But it's a bad thing for "commit-graph verify", which should be careful about finding any irregularities. We do complain to stderr with a warning(), but the verify command still exits with a successful return code. The new tests here cover corruption of both the base and tip slices of the chain. The corruption of the base file already works (it is the first file we look at, so when we see the error we return NULL). The "tip" case is what is fixed by this patch (it complains to stderr but still returns the base slice). Likewise the existing tests for corruption of the commit-graph-chain file itself need to be updated. We already exited non-zero correctly for the "base" case, but the "tip" case can now do so, too. Note that this also causes us to adjust a test later in the file that similarly corrupts a tip (though confusingly the test script calls this "base"). It checks stderr but erroneously expects the whole "verify" command to exit with a successful code. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- builtin/commit-graph.c | 10 +++++++++- commit-graph.c | 7 +++++-- commit-graph.h | 3 ++- t/t5324-split-commit-graph.sh | 32 +++++++++++++++++++++++++++++--- 4 files changed, 45 insertions(+), 7 deletions(-) (limited to 'commit-graph.c') diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 50c15d9bfe..f5e66e9969 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -74,6 +74,7 @@ static int graph_verify(int argc, const char **argv, const char *prefix) int fd; struct stat st; int flags = 0; + int incomplete_chain = 0; int ret; static struct option builtin_commit_graph_verify_options[] = { @@ -122,13 +123,20 @@ static int graph_verify(int argc, const char **argv, const char *prefix) else if (opened == OPENED_GRAPH) graph = load_commit_graph_one_fd_st(the_repository, fd, &st, odb); else - graph = load_commit_graph_chain_fd_st(the_repository, fd, &st); + graph = load_commit_graph_chain_fd_st(the_repository, fd, &st, + &incomplete_chain); if (!graph) return 1; ret = verify_commit_graph(the_repository, graph, flags); free_commit_graph(graph); + + if (incomplete_chain) { + error("one or more commit-graph chain files could not be loaded"); + ret |= 1; + } + return ret; } diff --git a/commit-graph.c b/commit-graph.c index b1d3e5bf94..1a56efcf69 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -563,7 +563,8 @@ int open_commit_graph_chain(const char *chain_file, } struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r, - int fd, struct stat *st) + int fd, struct stat *st, + int *incomplete_chain) { struct commit_graph *graph_chain = NULL; struct strbuf line = STRBUF_INIT; @@ -618,6 +619,7 @@ struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r, fclose(fp); strbuf_release(&line); + *incomplete_chain = !valid; return graph_chain; } @@ -630,8 +632,9 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, struct commit_graph *g = NULL; if (open_commit_graph_chain(chain_file, &fd, &st)) { + int incomplete; /* ownership of fd is taken over by load function */ - g = load_commit_graph_chain_fd_st(r, fd, &st); + g = load_commit_graph_chain_fd_st(r, fd, &st, &incomplete); } free(chain_file); diff --git a/commit-graph.h b/commit-graph.h index 3b662fd2b5..20ada7e891 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -107,7 +107,8 @@ struct commit_graph *load_commit_graph_one_fd_st(struct repository *r, int fd, struct stat *st, struct object_directory *odb); struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r, - int fd, struct stat *st); + int fd, struct stat *st, + int *incomplete_chain); struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb); diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index be58545810..06bb897f02 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -285,6 +285,32 @@ test_expect_success 'verify hashes along chain, even in shallow' ' ) ' +test_expect_success 'verify notices chain slice which is bogus (base)' ' + git clone --no-hardlinks . verify-chain-bogus-base && + ( + cd verify-chain-bogus-base && + git commit-graph verify && + base_file=$graphdir/graph-$(sed -n 1p $graphdir/commit-graph-chain).graph && + echo "garbage" >$base_file && + test_must_fail git commit-graph verify 2>test_err && + grep -v "^+" test_err >err && + grep "commit-graph file is too small" err + ) +' + +test_expect_success 'verify notices chain slice which is bogus (tip)' ' + git clone --no-hardlinks . verify-chain-bogus-tip && + ( + cd verify-chain-bogus-tip && + git commit-graph verify && + tip_file=$graphdir/graph-$(sed -n 2p $graphdir/commit-graph-chain).graph && + echo "garbage" >$tip_file && + test_must_fail git commit-graph verify 2>test_err && + grep -v "^+" test_err >err && + grep "commit-graph file is too small" err + ) +' + test_expect_success 'verify --shallow does not check base contents' ' git clone --no-hardlinks . verify-shallow && ( @@ -306,7 +332,7 @@ test_expect_success 'warn on base graph chunk incorrect' ' git commit-graph verify && base_file=$graphdir/graph-$(tail -n 1 $graphdir/commit-graph-chain).graph && corrupt_file "$base_file" $(test_oid base) "\01" && - git commit-graph verify --shallow 2>test_err && + test_must_fail git commit-graph verify --shallow 2>test_err && grep -v "^+" test_err >err && test_i18ngrep "commit-graph chain does not match" err ) @@ -332,11 +358,11 @@ test_expect_success 'verify after commit-graph-chain corruption (tip)' ' ( cd verify-chain-tip && corrupt_file "$graphdir/commit-graph-chain" 70 "G" && - git commit-graph verify 2>test_err && + test_must_fail git commit-graph verify 2>test_err && grep -v "^+" test_err >err && test_i18ngrep "invalid commit-graph chain" err && corrupt_file "$graphdir/commit-graph-chain" 70 "A" && - git commit-graph verify 2>test_err && + test_must_fail git commit-graph verify 2>test_err && grep -v "^+" test_err >err && test_i18ngrep "unable to find all commit-graph files" err ) -- cgit 1.2.3-korg