| Age | Commit message (Collapse) | Author | Files | Lines |
|
A corner case bug in "git log -L..." has been corrected.
* sg/line-log-boundary-fixes:
line-log: show all line ranges touched by the same diff range
line-log: fix assertion error
|
|
In process_ranges_arbitrary_commit() the condition deciding whether
the given commit is not a merge, i.e. that it doesn't have more than
one parent, is head-scratchingly backwards, flip it.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
process_ranges_ordinary_commit() uses a local diff queue variable,
which it leaves uninitialized before passing its address to
queue_diffs(). This is not an issue, because at the end of that
function the contents of an other diff queue is moved into it by
simply overwriting whatever is in there, i.e. without reading any
uninitialized memory.
Still, seeing the uninitialized diff queue being passed around scared
me more than once, so out of caution let's make sure that it's
initialized.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We can easily iterate through the parents of a merge commit without
turning the list of parents into a dynamically allocated array of
parents, so let's do so. This way we can avoid a memory allocation
for each processed merge commit, though its effect on runtime seems to
be unmeasurable.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In process_ranges_merge_commit(), the line-level log first creates an
array of diff queues by iterating over all parents of a merge commit
and computing a tree diff for each. Then in a second loop it iterates
over those diff queues, and if it finds that none of the interesting
paths were modified in one of them, then it will return early. This
means that when none of the interesting paths were modified between a
merge and its first parent, then the tree diff between the merge and
its second (Nth...) parent was computed in vain.
Unify these two loops, so when it iterates over all parents of a merge
commit, then it first computes the tree diff between the merge and
that particular parent and then processes the resulting diff queue
right away. This way we can spare some tree diff computing, thereby
speeding up line-level log in repositories with mergy history:
# git.git, 25.8% of commits are merges:
Benchmark 1: ./git_v2.51.0 -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0
Time (mean ± σ): 1.001 s ± 0.009 s [User: 0.906 s, System: 0.095 s]
Range (min … max): 0.991 s … 1.023 s 10 runs
Benchmark 2: ./git -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0
Time (mean ± σ): 445.5 ms ± 3.4 ms [User: 358.8 ms, System: 84.3 ms]
Range (min … max): 440.1 ms … 450.3 ms 10 runs
Summary
'./git -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0' ran
2.25 ± 0.03 times faster than './git_v2.51.0 -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0'
# linux.git, 7.5% of commits are merges:
Benchmark 1: ./git_v2.51.0 -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16
Time (mean ± σ): 3.246 s ± 0.007 s [User: 2.835 s, System: 0.409 s]
Range (min … max): 3.232 s … 3.255 s 10 runs
Benchmark 2: ./git -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16
Time (mean ± σ): 2.467 s ± 0.014 s [User: 2.113 s, System: 0.353 s]
Range (min … max): 2.455 s … 2.505 s 10 runs
Summary
'./git -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16' ran
1.32 ± 0.01 times faster than './git_v2.51.0 -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16'
And since now each iteration computes a tree diff and processes its
result, there is no reason to store the diff queues for each merge
parent anymore, so replace that diff queue array with a loop-local
diff queue variable. With this change the static free_diffqueues()
helper function in 'line-log.c' has no more callers left, remove it.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When line-level log is invoked with more than one disjoint line range
in the same file, and one of the commits happens to change that file
such that one diff range modifies more than one line range, then
changes to all modified line ranges should be shown, but only the
changes in the first modified line range are:
$ git log --oneline -p
80ca903 (HEAD -> master) Initial
diff --git a/file b/file
new file mode 100644
index 0000000..00935f1
--- /dev/null
+++ b/file
@@ -0,0 +1,10 @@
+Line 1
+Line 2
+Line 3
+Line 4
+Line 5
+Line 6
+Line 7
+Line 8
+Line 9
+Line 10
$ git log --oneline -L1,2:file -L4,5:file -L7,8:file
80ca903 (HEAD -> master) Initial
diff --git a/file b/file
--- /dev/null
+++ b/file
@@ -0,0 +1,2 @@
+Line 1
+Line 2
The line-log-specific diff printer is already clever enough to handle
the case when one line range covers multiple diff ranges, but the
possibility of one diff range touching multiple disjoint line ranges
was apparently overlooked.
Add the necessary condition to dump_diff_hacky_one() to handle this case
as well, and show all modified line ranges:
$ git log --oneline -L1,2:file -L4,5:file -L7,8:file
0f9a5b4 (HEAD -> master) Initial
diff --git a/file b/file
--- /dev/null
+++ b/file
@@ -0,0 +1,2 @@
+Line 1
+Line 2
@@ -0,0 +4,2 @@
+Line 4
+Line 5
@@ -0,0 +7,2 @@
+Line 7
+Line 8
This bug was already present in the initial line-log implementation
added in 2da1d1f6f (Implement line-history search (git log -L),
2013-03-28). Interestingly, that commit already contained a canned
test case covering a similar scenario:
"-L '/long f/',/^}/:a.c -L /main/,/^}/:a.c simple"
This test case looks for two line ranges in the same file, and both
trace back disjointly to the test repository's inital commit,
therefore changes to both line ranges should have been shown for the
initial commit, but only changes for the first line range are shown.
So this test case should have failed from the very beginning, but it
never did, because, unfortunately, the canned expected result is
incorrect, as it doesn't include changes for the second line range.
A similar test with a similarly incorrect canned expected result was
added later in 209618860c (log -L: fix overlapping input ranges,
2013-04-05).
Correct these two canned expected results to contain the changes for
the second line range for the initial commit as well.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When line-level log is invoked with more than one disjoint line range
in the same file, and one of the commits happens to change that file
such that:
- the last line of a line range R(n) immediately preceeds the first line
modified or added by a hunk H, and
- subtracting the number of lines added by hunk H from the start and
end of the subsequent line range R(n+1) would result in a range
overlapping with line range R(n),
then git aborts with an assertion error, because those overlapping
line ranges violate the invariants:
$ git log --oneline -p
73e4e2f (HEAD -> master) Add lines 6 7 8 9 10
diff --git a/file b/file
index 572d5d9..00935f1 100644
--- a/file
+++ b/file
@@ -3,3 +3,8 @@ Line 2
Line 3
Line 4
Line 5
+Line 6
+Line 7
+Line 8
+Line 9
+Line 10
66e3561 Add lines 1 2 3 4 5
diff --git a/file b/file
new file mode 100644
index 0000000..572d5d9
--- /dev/null
+++ b/file
@@ -0,0 +1,5 @@
+Line 1
+Line 2
+Line 3
+Line 4
+Line 5
$ git log --oneline -L3,5:file -L7,8:file
git: line-log.c:73: range_set_append: Assertion `rs->nr == 0 || rs->ranges[rs->nr-1].end <= a' failed.
Aborted (core dumped)
The line-log machinery encodes line and diff ranges internally as
[start, end) pairs, i.e. include 'start' but exclude 'end', and line
numbering starts at 0 (as opposed to the -LX,Y option, where it starts
at 1, IOW the parameter -L3,5 is represented internally as { start =
2, end = 5 }).
The reason for this assertion error and some related issues is that
there are a couple of places where 'end' is mistakenly considered to
be part of the range:
- When a commit modifies an interesting path, the line-log machinery
first checks which diff range (i.e. hunk) modify any line ranges.
This is done in diff_ranges_filter_touched(), where the outer loop
iterates over the diff ranges, and in each iteration the inner
loop advances the line ranges supposedly until the current line
range ends at or after the current diff range starts, and then the
current diff and line ranges are checked for overlap.
For HEAD in the above example the first line range [2, 5) ends
just before the diff range [5, 10) starts, so the inner loop
should advance, and then the second line range [6, 8) and the diff
range should be checked for overlap.
Unfortunately, the condition of the inner loop mistakenly
considers 'end' as part of the line range, and, seeing the diff
range starting at 5 and the line range ending at 5, it doesn't
skip the first range. Consequently, the diff range and the first
line range are checked for overlap, and after that the outer loop
runs out of diff ranges, and then the processing goes on in the
false belief that this commit didn't touch any of the interesting
line ranges.
The line-log machinery later shifts the line ranges to account for
any added/removed lines in the diff ranges preceeding each line
range. This leaves the first line range intact, but attempts to
shift the second line range [6, 8) by 5 lines towards the
beginning of the file, resulting in [1, 3), triggering the
assertion error, because the two overlapping line ranges violate
the invariants.
Fix that loop condition in diff_ranges_filter_touched() to not
treat 'end' as part of the line range.
- With the above fix the assertion error is gone... but, alas, we
now get stuck in an endless loop!
This happens in range_set_difference(), where a couple of nested
loops iterate over the line and diff ranges, and a condition is
supposed to break the middle loop when the current line range ends
before the current diff range, so processing could continue with
the next line range.
For HEAD in the above example the first line range [2, 5) ends
just before the diff range [5, 10) starts, so this condition
should trigger and break the middle loop.
Unfortunately, just like in the case of the assertion error, this
conditions mistakenly considers 'end' as part of the line range,
and, seeing the line range ending at 5 and the diff range starting
at 5, it doesn't break the loop, which will then go on and on.
Fix this condition in range_set_difference() to not treat 'end' as
part of the line range.
- With the above fix the endless loop is gone... but, alas, the
output is now wrong, as it shows both line ranges for HEAD, even
though the first line range is not modified by that commit:
$ git log --oneline -L3,5:file -L7,8:file
73e4e2f (HEAD -> master) Add lines 6 7 8 9 10
diff --git a/file b/file
--- a/file
+++ b/file
@@ -3,3 +3,3 @@
Line 3
Line 4
Line 5
@@ -6,0 +7,2 @@
+Line 7
+Line 8
66e3561 Add lines 1 2 3 4 5
diff --git a/file b/file
--- /dev/null
+++ b/file
@@ -0,0 +3,3 @@
+Line 3
+Line 4
+Line 5
In dump_diff_hacky_one() a couple of nested loops are responsible
for finding and printing the modified line ranges: the big outer
loop iterates over all line ranges, and the first inner loop skips
over the diff ranges that end before the start of the current line
range. This is followed by a condition checking whether the
current diff range starts after the end of the current line range,
which, when fulfilled, continues and advances the outer loop to
the next line range.
For HEAD in the above example the first line range [2, 5) ends
just before the diff range [5, 10), so this condition should
trigger, and the outer loop should advance to the second line
range.
Unfortunately, just like in the previous cases, this condition
mistakenly considers 'end' as part of the line range, and, seeing
the first line range ending at 5 and the diff range starting at 5,
it doesn't continue to advance the outher loop, but goes on to
show the (unmodified) first line range.
Fix this condition to not treat 'end' as part of the line range,
just like in the previous cases.
After all this the command in the above example finally finishes and
produces the right output:
$ git log --oneline -L3,5:file -L7,8:file
73e4e2f (HEAD -> master) Add lines 6 7 8 9 10
diff --git a/file b/file
--- a/file
+++ b/file
@@ -6,0 +7,2 @@
+Line 7
+Line 8
66e3561 Add lines 1 2 3 4 5
diff --git a/file b/file
--- /dev/null
+++ b/file
@@ -0,0 +3,3 @@
+Line 3
+Line 4
+Line 5
Add a canned test similar to the above example, with the line ranges
adjusted to the test repository's history.
Reported-by: Evgeni Chasnovski <evgeni.chasnovski@gmail.com>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
git code style requires that functions operating on a struct S
should be named in the form S_verb. However, the functions operating
on struct bloom_key do not follow this convention. Therefore,
fill_bloom_key() and clear_bloom_key() are renamed to bloom_key_fill()
and bloom_key_clear(), respectively.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In `process_ranges_merge_commit()` we try to figure out which of the
parents can be blamed for the given line changes. When we figure out
that none of the files in the line-log have changed we assign the
complete blame to that commit and rewrite the parents of the current
commit to only use that single parent.
This is done via `commit_list_append()`, which is misleadingly _not_
appending to the list of parents. Instead, we overwrite the parents with
the blamed parent. This makes us lose track of the old pointers,
creating a memory leak.
Fix this issue by freeing the parents before we overwrite them.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up.
* jk/output-prefix-cleanup:
diff: store graph prefix buf in git_graph struct
diff: return line_prefix directly when possible
diff: return const char from output_prefix callback
diff: drop line_prefix_length field
line-log: use diff_line_prefix() instead of custom helper
|
|
More leakfixes.
* ps/leakfixes-part-8: (23 commits)
builtin/send-pack: fix leaking list of push options
remote: fix leaking push reports
t/helper: fix leaks in proc-receive helper
pack-write: fix return parameter of `write_rev_file_order()`
revision: fix leaking saved parents
revision: fix memory leaks when rewriting parents
midx-write: fix leaking buffer
pack-bitmap-write: fix leaking OID array
pseudo-merge: fix leaking strmap keys
pseudo-merge: fix various memory leaks
line-log: fix several memory leaks
diff: improve lifecycle management of diff queues
builtin/revert: fix leaking `gpg_sign` and `strategy` config
t/helper: fix leaking repository in partial-clone helper
builtin/clone: fix leaking repo state when cloning with bundle URIs
builtin/pack-redundant: fix various memory leaks
builtin/stash: fix leaking `pathspec_from_file`
submodule: fix leaking submodule entry list
wt-status: fix leaking buffer with sparse directories
shell: fix leaking strings
...
|
|
Our local output_prefix() is exactly the same as the public
diff_line_prefix() function. Let's just use that one, saving us a little
bit of code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The output_prefix() method in line-log.c may call a function pointer via
the diff_options struct. This function pointer returns a strbuf struct
and then its buffer is passed back. However, that implies that the
consumer is responsible to free the string. This is especially true
because the default behavior is to duplicate the empty string.
The existing functions used in the output_prefix pointer include:
1. idiff_prefix_cb() in diff-lib.c. This returns the data pointer, so
the value exists across multiple calls.
2. diff_output_prefix_callback() in graph.c. This uses a static strbuf
struct, so it reuses buffers across calls. These should not be
freed.
3. output_prefix_cb() in range-diff.c. This is similar to the
diff-lib.c case.
In each case, we should not be freeing this buffer. We can convert the
output_prefix() function to return a const char pointer and stop freeing
the result.
This choice is essentially the opposite of what was done in 394affd46d
(line-log: always allocate the output prefix, 2024-06-07).
This was discovered via 'valgrind' while investigating a public report
of a bug in 'git log --graph -L' [1].
[1] https://github.com/git-for-windows/git/issues/5185
This issue would have been caught by the new test, when Git is compiled
with ASan to catch these double frees.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
As described in "line-log.c" itself, the code is "leaking like a sieve".
These leaks are all of rather trivial nature, so this commit plugs them
without going too much into details for each of those leaks.
The leaks are hit by t4211, but plugging them alone does not make the
full test suite pass. The remaining leaks are unrelated to the line-log
subsystem.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The lifecycle management of diff queues is somewhat confusing:
- For most of the part this can be attributed to `DIFF_QUEUE_CLEAR()`,
which does not release any memory but rather initializes the queue,
only. This is in contrast to our common naming schema, where
"clearing" means that we release underlying memory and then
re-initialize the data structure such that it is ready to use.
- A second offender is `diff_free_queue()`, which does not free the
queue structure itself. It is rather a release-style function.
Refactor the code to make things less confusing. `DIFF_QUEUE_CLEAR()` is
replaced by `DIFF_QUEUE_INIT` and `diff_queue_init()`, while
`diff_free_queue()` is replaced by `diff_queue_release()`. While on it,
adapt callsites where we call `DIFF_QUEUE_CLEAR()` with the intent to
release underlying memory to instead call `diff_queue_clear()` to fix
memory leaks.
This memory leak is exposed by t4211, but plugging it alone does not
make the whole test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The returned string by `output_prefix()` is sometimes a string constant
and sometimes an allocated string. This has been fine until now because
we always leak the allocated strings, and thus we never tried to free
the string constant.
Fix the code to always return an allocated string and free the returned
value at all callsites.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Stop assigning a string constant to the file parent buffer and instead
assign an allocated string. While the code is fine in practice, it will
break once we compile with `-Wwrite-strings`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The unnecessary include in the header transitively pulled in some
other headers actually needed by source files, though. Have those
source files explicitly include the headers they need.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Each of these were checked with
gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).
...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file. These cases were:
* builtin/credential-cache.c
* builtin/pull.c
* builtin/send-pack.c
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The point of release_revisions() is to free memory associated with the
rev_info struct, but we have several "struct decoration" members that
are left untouched. Since the previous commit introduced a function to
do that, we can just call it.
We do have to provide some specialized callbacks to map the void
pointers onto real ones (the alternative would be casting the existing
function pointers; this generally works because "void *" is usually
interchangeable with a struct pointer, but it is technically forbidden
by the standard).
Since the line-log code does not expose the type it stores in the
decoration (nor of course the function to free it), I put this behind a
generic line_log_free() entry point. It's possible we may need to add
more line-log specific bits anyway (running t4211 shows a number of
other leaks in the line-log code).
While this doubtless cleans up many leaks triggered by the test suite,
the only script which becomes leak-free is t4217, as it does very little
beyond a simple traversal (its existing leak was from the use of
--children, which is now fixed).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for
dynamic array allocation. Moving these macros to git-compat-util.h with
the other alloc macros focuses alloc.[ch] to allocation for Git objects
and additionally allows us to remove inclusions to alloc.h from files
that solely used the above macros.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This also made it clear that several .c files depended upon various
things that oidset included, but had omitted the direct #include for
those headers. Add those now.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
By moving several declarations to setup.h, the previous patch made it
possible to remove the include of cache.h in several source files. Do
so.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
More work towards -Wunused.
* jk/unused-post-2.39-part2: (21 commits)
help: mark unused parameter in git_unknown_cmd_config()
run_processes_parallel: mark unused callback parameters
userformat_want_item(): mark unused parameter
for_each_commit_graft(): mark unused callback parameter
rewrite_parents(): mark unused callback parameter
fetch-pack: mark unused parameter in callback function
notes: mark unused callback parameters
prio-queue: mark unused parameters in comparison functions
for_each_object: mark unused callback parameters
list-objects: mark unused callback parameters
mark unused parameters in signal handlers
run-command: mark error routine parameters as unused
mark "pointless" data pointers in callbacks
ref-filter: mark unused callback parameters
http-backend: mark unused parameters in virtual functions
http-backend: mark argc/argv unused
object-name: mark unused parameters in disambiguate callbacks
serve: mark unused parameters in virtual functions
serve: use repository pointer to get config
ls-refs: drop config caching
...
|
|
The rewrite_parents() function takes a callback, but not every callback
needs the "rev" parameter. Mark the unused one so -Wunused-parameter
will be happy.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This allows us to replace includes of cache.h with includes of the much
smaller alloc.h in many places. It does mean that we also need to add
includes of alloc.h in a number of C files.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When processing merge commits, the line-level log first creates an
array of diff queues, each comparing the merge commit with one of its
parents, to check whether any of the files in the given line ranges
were modified. Alas, when freeing these queues it only frees the
filepairs in the queues, but not the queues' internal arrays holding
pointers to those filepairs.
Use the diff_free_queue() helper function introduced in the previous
commit to free the diff queues' internal arrays as well.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
|
|
When processing a non-merge commit, the line-level log first asks the
tree-diff machinery whether any of the files in the given line ranges
were modified between the current commit and its parent, and if some
of them were, then it loads the contents of those files from both
commits to see whether their line ranges were modified and/or need to
be adjusted. Alas, it doesn't free() the diff queue holding the
results of that query and the contents of those files once its done.
This can add up to a substantial amount of leaked memory, especially
when the file in question is big and is frequently modified: a user
reported "Out of memory, malloc failed" errors with a 2MB text file
that was modified ~2800 times [1] (I estimate the leak would use up
almost 11GB memory in that case).
Free that diff queue to plug this memory leak. However, instead of
simply open-coding the necessary three lines, add them as a helper
function to the diff API, because it will be useful elsewhere as well.
[1] https://public-inbox.org/git/CAFOPqVXz2XwzX8vGU7wLuqb2ZuwTuOFAzBLRM_QPk+NJa=eC-g@mail.gmail.com/
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
|
|
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead. It shortens the code and infers the
element size automatically.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git commit-graph write" learned to limit the number of bloom
filters that are computed from scratch with the --max-new-filters
option.
* tb/bloom-improvements:
commit-graph: introduce 'commitGraph.maxNewFilters'
builtin/commit-graph.c: introduce '--max-new-filters=<n>'
commit-graph: rename 'split_commit_graph_opts'
bloom: encode out-of-bounds filters as non-empty
bloom/diff: properly short-circuit on max_changes
bloom: use provided 'struct bloom_filter_settings'
bloom: split 'get_bloom_filter()' in two
commit-graph.c: store maximum changed paths
commit-graph: respect 'commitGraph.readChangedPaths'
t/helper/test-read-graph.c: prepare repo settings
commit-graph: pass a 'struct repository *' in more places
t4216: use an '&&'-chain
commit-graph: introduce 'get_bloom_filter_settings()'
|
|
'get_bloom_filter' takes a flag to control whether it will compute a
Bloom filter if the requested one is missing. In the next patch, we'll
add yet another parameter to this method, which would force all but one
caller to specify an extra 'NULL' parameter at the end.
Instead of doing this, split 'get_bloom_filter' into two functions:
'get_bloom_filter' and 'get_or_compute_bloom_filter'. The former only
looks up a Bloom filter (and does not compute one if it's missing,
thus dropping the 'compute_if_not_present' flag). The latter does
compute missing Bloom filters, with an additional parameter to store
whether or not it needed to do so.
This simplifies many call-sites, since the majority of existing callers
to 'get_bloom_filter' do not want missing Bloom filters to be computed
(so they can drop the parameter entirely and use the simpler version of
the function).
While we're at it, instrument the new 'get_or_compute_bloom_filter()'
with counters in the 'write_commit_graph_context' struct which store
the number of filters that we did and didn't compute, as well as filters
that were truncated.
It would be nice to drop the 'compute_if_not_present' flag entirely,
since all remaining callers of 'get_or_compute_bloom_filter' pass it as
'1', but this will change in a future patch and hence cannot be removed.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We eventually want to drop the argv_array name and just use strvec
consistently. There's no particular reason we have to do it all at once,
or care about interactions between converted and unconverted bits.
Because of our preprocessor compat layer, the names are interchangeable
to the compiler (so even a definition and declaration using different
names is OK).
This patch converts remaining files from the first half of the alphabet,
to keep the diff to a manageable size.
The conversion was done purely mechanically with:
git ls-files '*.c' '*.h' |
xargs perl -i -pe '
s/ARGV_ARRAY/STRVEC/g;
s/argv_array/strvec/g;
'
and then selectively staging files with "git add '[abcdefghjkl]*'".
We'll deal with any indentation/style fallouts separately.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This requires updating #include lines across the code-base, but that's
all fairly mechanical, and was done with:
git ls-files '*.c' '*.h' |
xargs perl -i -pe 's/argv-array.h/strvec.h/'
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git log -L..." now takes advantage of the "which paths are touched
by this commit?" info stored in the commit-graph system.
* ds/line-log-on-bloom:
line-log: integrate with changed-path Bloom filters
line-log: try to use generation number-based topo-ordering
line-log: more responsive, incremental 'git log -L'
t4211-line-log: add tests for parent oids
line-log: remove unused fields from 'struct line_log_data'
|
|
The previous changes to the line-log machinery focused on making the
first result appear faster. This was achieved by no longer walking the
entire commit history before returning the early results. There is still
another way to improve the performance: walk most commits much faster.
Let's use the changed-path Bloom filters to reduce time spent computing
diffs.
Since the line-log computation requires opening blobs and checking the
content-diff, there is still a lot of necessary computation that cannot
be replaced with changed-path Bloom filters. The part that we can reduce
is most effective when checking the history of a file that is deep in
several directories and those directories are modified frequently. In
this case, the computation to check if a commit is TREESAME to its first
parent takes a large fraction of the time. That is ripe for improvement
with changed-path Bloom filters.
We must ensure that prepare_to_use_bloom_filters() is called in
revision.c so that the bloom_filter_settings are loaded into the struct
rev_info from the commit-graph. Of course, some cases are still
forbidden, but in the line-log case the pathspec is provided in a
different way than normal.
Since multiple paths and segments could be requested, we compute the
struct bloom_key data dynamically during the commit walk. This could
likely be improved, but adds code complexity that is not valuable at
this time.
There are two cases to care about: merge commits and "ordinary" commits.
Merge commits have multiple parents, but if we are TREESAME to our first
parent in every range, then pass the blame for all ranges to the first
parent. Ordinary commits have the same condition, but each is done
slightly differently in the process_ranges_[merge|ordinary]_commit()
methods. By checking if the changed-path Bloom filter can guarantee
TREESAME, we can avoid that tree-diff cost. If the filter says "probably
changed", then we need to run the tree-diff and then the blob-diff if
there was a real edit.
The Linux kernel repository is a good testing ground for the performance
improvements claimed here. There are two different cases to test. The
first is the "entire history" case, where we output the entire history
to /dev/null to see how long it would take to compute the full line-log
history. The second is the "first result" case, where we find how long
it takes to show the first value, which is an indicator of how quickly a
user would see responses when waiting at a terminal.
To test, I selected the paths that were changed most frequently in the
top 10,000 commits using this command (stolen from StackOverflow [1]):
git log --pretty=format: --name-only -n 10000 | sort | \
uniq -c | sort -rg | head -10
which results in
121 MAINTAINERS
63 fs/namei.c
60 arch/x86/kvm/cpuid.c
59 fs/io_uring.c
58 arch/x86/kvm/vmx/vmx.c
51 arch/x86/kvm/x86.c
45 arch/x86/kvm/svm.c
42 fs/btrfs/disk-io.c
42 Documentation/scsi/index.rst
(along with a bogus first result). It appears that the path
arch/x86/kvm/svm.c was renamed, so we ignore that entry. This leaves the
following results for the real command time:
| | Entire History | First Result |
| Path | Before | After | Before | After |
|------------------------------|--------|--------|--------|--------|
| MAINTAINERS | 4.26 s | 3.87 s | 0.41 s | 0.39 s |
| fs/namei.c | 1.99 s | 0.99 s | 0.42 s | 0.21 s |
| arch/x86/kvm/cpuid.c | 5.28 s | 1.12 s | 0.16 s | 0.09 s |
| fs/io_uring.c | 4.34 s | 0.99 s | 0.94 s | 0.27 s |
| arch/x86/kvm/vmx/vmx.c | 5.01 s | 1.34 s | 0.21 s | 0.12 s |
| arch/x86/kvm/x86.c | 2.24 s | 1.18 s | 0.21 s | 0.14 s |
| fs/btrfs/disk-io.c | 1.82 s | 1.01 s | 0.06 s | 0.05 s |
| Documentation/scsi/index.rst | 3.30 s | 0.89 s | 1.46 s | 0.03 s |
It is worth noting that the least speedup comes for the MAINTAINERS file
which is
* edited frequently,
* low in the directory heirarchy, and
* quite a large file.
All of those points lead to spending more time doing the blob diff and
less time doing the tree diff. Still, we see some improvement in that
case and significant improvement in other cases. A 2-4x speedup is
likely the more typical case as opposed to the small 5% change for that
file.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The current line-level log implementation performs a preprocessing
step in prepare_revision_walk(), during which the line_log_filter()
function filters and rewrites history to keep only commits modifying
the given line range. This preprocessing affects both responsiveness
and correctness:
- Git doesn't produce any output during this preprocessing step.
Checking whether a commit modified the given line range is
somewhat expensive, so depending on the size of the given revision
range this preprocessing can result in a significant delay before
the first commit is shown.
- Limiting the number of displayed commits (e.g. 'git log -3 -L...')
doesn't limit the amount of work during preprocessing, because
that limit is applied during history traversal. Alas, by that
point this expensive preprocessing step has already churned
through the whole revision range to find all commits modifying the
revision range, even though only a few of them need to be shown.
- It rewrites parents, with no way to turn it off. Without the user
explicitly requesting parent rewriting any parent object ID shown
should be that of the immediate parent, just like in case of a
pathspec-limited history traversal without parent rewriting.
However, after that preprocessing step rewrote history, the
subsequent "regular" history traversal (i.e. get_revision() in a
loop) only sees commits modifying the given line range.
Consequently, it can only show the object ID of the last ancestor
that modified the given line range (which might happen to be the
immediate parent, but many-many times it isn't).
This patch addresses both the correctness and, at least for the common
case, the responsiveness issues by integrating line-level log
filtering into the regular revision walking machinery:
- Make process_ranges_arbitrary_commit(), the static function in
'line-log.c' deciding whether a commit modifies the given line
range, public by removing the static keyword and adding the
'line_log_' prefix, so it can be called from other parts of the
revision walking machinery.
- If the user didn't explicitly ask for parent rewriting (which, I
believe, is the most common case):
- Call this now-public function during regular history traversal,
namely from get_commit_action() to ignore any commits not
modifying the given line range.
Note that while this check is relatively expensive, it must be
performed before other, much cheaper conditions, because the
tracked line range must be adjusted even when the commit will
end up being ignored by other conditions.
- Skip the line_log_filter() call, i.e. the expensive
preprocessing step, in prepare_revision_walk(), because, thanks
to the above points, the revision walking machinery is now able
to filter out commits not modifying the given line range while
traversing history.
This way the regular history traversal sees the unmodified
history, and is therefore able to print the object ids of the
immediate parents of the listed commits. The eliminated
preprocessing step can greatly reduce the delay before the first
commit is shown, see the numbers below.
- However, if the user did explicitly ask for parent rewriting via
'--parents' or a similar option, then stick with the current
implementation for now, i.e. perform that expensive filtering and
history rewriting in the preprocessing step just like we did
before, leaving the initial delay as long as it was.
I tried to integrate line-level log filtering with parent rewriting
into the regular history traversal, but, unfortunately, several
subtleties resisted... :) Maybe someday we'll figure out how to do
that, but until then at least the simple and common (i.e. without
parent rewriting) 'git log -L:func:file' commands can benefit from the
reduced delay.
This change makes the failing 'parent oids without parent rewriting'
test in 't4211-line-log.sh' succeed.
The reduced delay is most noticable when there's a commit modifying
the line range near the tip of a large-ish revision range:
# no parent rewriting requested, no commit-graph present
$ time git --no-pager log -L:read_alternate_refs:sha1-file.c -1 v2.23.0
Before:
real 0m9.570s
user 0m9.494s
sys 0m0.076s
After:
real 0m0.718s
user 0m0.674s
sys 0m0.044s
A significant part of the remaining delay is spent reading and parsing
commit objects in limit_list(). With the help of the commit-graph we
can eliminate most of that reading and parsing overhead, so here are
the timing results of the same command as above, but this time using
the commit-graph:
Before:
real 0m8.874s
user 0m8.816s
sys 0m0.057s
After:
real 0m0.107s
user 0m0.091s
sys 0m0.013s
The next patch will further reduce the remaining delay.
To be clear: this patch doesn't actually optimize the line-level log,
but merely moves most of the work from the preprocessing step to the
history traversal, so the commits modifying the line range can be
shown as soon as they are processed, and the traversal can be
terminated as soon as the given number of commits are shown.
Consequently, listing the full history of a line range, potentially
all the way to the root commit, will take the same time as before (but
at least the user might start reading the output earlier).
Furthermore, if the most recent commit modifying the line range is far
away from the starting revision, then that initial delay will still be
significant.
Additional testing by Derrick Stolee: In the Linux kernel repository,
the MAINTAINERS file was changed ~3,500 times across the ~915,000
commits. In addition to that edit frequency, the file itself is quite
large (~18,700 lines). This means that a significant portion of the
computation is taken up by computing the patch-diff of the file. This
patch improves the real time it takes to output the first result quite
a bit:
Command: git log -L 100,200:MAINTAINERS -n 1 >/dev/null
Before: 3.88 s
After: 0.71 s
If we drop the "-n 1" in the command, then there is no change in
end-to-end process time. This is because the command still needs to
walk the entire commit history, which negates the point of this
patch. This is expected.
As a note for future reference, the ~4.3 seconds in the old code
spends ~2.6 seconds computing the patch-diffs, and the rest of the
time is spent walking commits and computing diffs for which paths
changed at each commit. The changed-path Bloom filters could improve
the end-to-end computation time (i.e. no "-n 1" in the command).
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The behavior of diff_populate_filespec() currently can be customized
through a bitflag, but a subsequent patch requires it to support a
non-boolean option. Replace the bitflag with an options struct.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Optimize unnecessary full-tree diff away from "git log -L" machinery.
* sg/line-log-tree-diff-optim:
line-log: avoid unnecessary full tree diffs
line-log: extract pathspec parsing from line ranges into a helper function
|
|
With rename detection enabled the line-level log is able to trace the
evolution of line ranges across whole-file renames [1]. Alas, to
achieve that it uses the diff machinery very inefficiently, making the
operation very slow [2]. And since rename detection is enabled by
default, the line-level log is very slow by default.
When the line-level log processes a commit with rename detection
enabled, it currently does the following (see queue_diffs()):
1. Computes a full tree diff between the commit and (one of) its
parent(s), i.e. invokes diff_tree_oid() with an empty
'diffopt->pathspec'.
2. Checks whether any paths in the line ranges were modified.
3. Checks whether any modified paths in the line ranges are missing
in the parent commit's tree.
4. If there is such a missing path, then calls diffcore_std() to
figure out whether the path was indeed renamed based on the
previously computed full tree diff.
5. Continues doing stuff that are unrelated to the slowness.
So basically the line-level log computes a full tree diff for each
commit-parent pair in step (1) to be used for rename detection in step
(4) in the off chance that an interesting path is missing from the
parent.
Avoid these expensive and mostly unnecessary full tree diffs by
limiting the diffs to paths in the line ranges. This is much cheaper,
and makes step (2) unnecessary. If it turns out that an interesting
path is missing from the parent, then fall back and compute a full
tree diff, so the rename detection will still work.
Care must be taken when to update the pathspec used to limit the diff
in case of renames. A path might be renamed on one branch and
modified on several parallel running branches, and while processing
commits on these branches the line-level log might have to alternate
between looking at a path's new and old name. However, at any one
time there is only a single 'diffopt->pathspec'.
So add a step (0) to the above to ensure that the paths in the
pathspec match the paths in the line ranges associated with the
currently processed commit, and re-parse the pathspec from the paths
in the line ranges if they differ.
The new test cases include a specially crafted piece of history with
two merged branches and two files, where each branch modifies both
files, renames on of them, and then modifies both again. Then two
separate 'git log -L' invocations check the line-level log of each of
those two files, which ensures that at least one of those invocations
have to do that back-and-forth between the file's old and new name (no
matter which branch is traversed first). 't/t4211-line-log.sh'
already contains two tests involving renames, they don't don't trigger
this back-and-forth.
Avoiding these unnecessary full tree diffs can have huge impact on
performance, especially in big repositories with big trees and mergy
history. Tracing the evolution of a function through the whole
history:
# git.git
$ time git --no-pager log -L:read_alternate_refs:sha1-file.c v2.23.0
Before:
real 0m8.874s
user 0m8.816s
sys 0m0.057s
After:
real 0m2.516s
user 0m2.456s
sys 0m0.060s
# linux.git
$ time ~/src/git/git --no-pager log \
-L:build_restore_work_registers:arch/mips/mm/tlbex.c v5.2
Before:
real 3m50.033s
user 3m48.041s
sys 0m0.300s
After:
real 0m2.599s
user 0m2.466s
sys 0m0.157s
That's just over 88x speedup.
[1] Line-level log's rename following is quite similar to 'git log
--follow path', with the notable differences that it does handle
multiple paths at once as well, and that it doesn't show the
commit performing the rename if it's an exact rename.
[2] This slowness might not have been apparent initially, because back
when the line-level log feature was introduced rename detection
was not yet enabled by default; 12da1d1f6f (Implement line-history
search (git log -L), 2013-03-28) and 5404c116aa (diff: activate
diff.renames by default, 2016-02-25).
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A helper function to parse the paths involved in the line ranges and
to turn them into a pathspec will be useful in the next patch.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git merge-recursive" backend recently learned a new heuristics to
infer file movement based on how other files in the same directory
moved. As this is inherently less robust heuristics than the one
based on the content similarity of the file itself (rather than
based on what its neighbours are doing), it sometimes gives an
outcome unexpected by the end users. This has been toned down to
leave the renamed paths in higher/conflicted stages in the index so
that the user can examine and confirm the result.
* en/merge-directory-renames:
merge-recursive: switch directory rename detection default
merge-recursive: give callers of handle_content_merge() access to contents
merge-recursive: track information associated with directory renames
t6043: fix copied test description to match its purpose
merge-recursive: switch from (oid,mode) pairs to a diff_filespec
merge-recursive: cleanup handle_rename_* function signatures
merge-recursive: track branch where rename occurred in rename struct
merge-recursive: remove ren[12]_other fields from rename_conflict_info
merge-recursive: shrink rename_conflict_info
merge-recursive: move some struct declarations together
merge-recursive: use 'ci' for rename_conflict_info variable name
merge-recursive: rename locals 'o' and 'a' to 'obuf' and 'abuf'
merge-recursive: rename diff_filespec 'one' to 'o'
merge-recursive: rename merge_options argument from 'o' to 'opt'
Use 'unsigned short' for mode, like diff_filespec does
|
|
struct diff_filespec defines mode to be an 'unsigned short'. Several
other places in the API which we'd like to interact with using a
diff_filespec used a plain unsigned (or unsigned int). This caused
problems when taking addresses, so switch to unsigned short.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When "-L" is in use, we ignore any diff output format that the user
provides to us, and just always print a patch (with extra context lines
covering the whole area of interest). It's not entirely clear what we
should do with all formats (e.g., should "--stat" show just the diffstat
of the touched lines, or the stat for the whole file?).
But "-s" is pretty clear: the user probably wants to see just the
commits that touched those lines, without any diff at all. Let's at
least make that work.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
diff and textconv code has so widespread use that it's hard to simply
update their api and all call sites at once because it would result in
a big patch. For now reduce the_index references to two places:
diff_setup() and fill_textconv().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
lookup_commit_reference() and friends have been updated to find
in-core object for a specific in-core repository instance.
* sb/object-store-lookup: (32 commits)
commit.c: allow lookup_commit_reference to handle arbitrary repositories
commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories
tag.c: allow deref_tag to handle arbitrary repositories
object.c: allow parse_object to handle arbitrary repositories
object.c: allow parse_object_buffer to handle arbitrary repositories
commit.c: allow get_cached_commit_buffer to handle arbitrary repositories
commit.c: allow set_commit_buffer to handle arbitrary repositories
commit.c: migrate the commit buffer to the parsed object store
commit-slabs: remove realloc counter outside of slab struct
commit.c: allow parse_commit_buffer to handle arbitrary repositories
tag: allow parse_tag_buffer to handle arbitrary repositories
tag: allow lookup_tag to handle arbitrary repositories
commit: allow lookup_commit to handle arbitrary repositories
tree: allow lookup_tree to handle arbitrary repositories
blob: allow lookup_blob to handle arbitrary repositories
object: allow lookup_object to handle arbitrary repositories
object: allow object_as_type to handle arbitrary repositories
tag: add repository argument to deref_tag
tag: add repository argument to parse_tag_buffer
tag: add repository argument to lookup_tag
...
|
|
Parsing of -L[<N>][,[<M>]] parameters "git blame" and "git log"
take has been tweaked.
* is/parsing-line-range:
log: prevent error if line range ends past end of file
blame: prevent error if range ends past end of file
|
|
Add a repository argument to allow the callers of deref_tag
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If the -L option is used to specify a line range in git log, and the end
of the range is past the end of the file, git will fail with a fatal
error. This commit prevents such behaviour - instead we perform the log
for existing lines within the specified range.
This commit also fixes a corner case where -L ,-n:file would be treated
as a log over the whole file. Now we treat this as -L 1,-n:file and
blame the first line of the file instead.
Signed-off-by: Isabella Stephens <istephens@atlassian.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The code has been taught to use the duplicated information stored
in the commit-graph file to learn the tree object name for a commit
to avoid opening and parsing the commit object when it makes sense
to do so.
* ds/lazy-load-trees:
coccinelle: avoid wrong transformation suggestions from commit.cocci
commit-graph: lazy-load trees for commits
treewide: replace maybe_tree with accessor methods
commit: create get_commit_tree() method
treewide: rename tree to maybe_tree
|
|
In anticipation of making trees load lazily, create a Coccinelle
script (contrib/coccinelle/commit.cocci) to ensure that all
references to the 'maybe_tree' member of struct commit are either
mutations or accesses through get_commit_tree() or
get_commit_tree_oid().
Apply the Coccinelle script to create the rest of the patch.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Using the commit-graph file to walk commit history removes the large
cost of parsing commits during the walk. This exposes a performance
issue: lookup_tree() takes a large portion of the computation time,
even when Git never uses those trees.
In anticipation of lazy-loading these trees, rename the 'tree' member
of struct commit to 'maybe_tree'. This serves two purposes: it hints
at the future role of possibly being NULL even if the commit has a
valid tree, and it allows for unambiguous transformation from simple
member access (i.e. commit->maybe_tree) to method access.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert get_tree_entry and find_tree_entry to take pointers to struct
object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A common pattern to free a piece of memory and assign NULL to the
pointer that used to point at it has been replaced with a new
FREE_AND_NULL() macro.
* ab/free-and-null:
*.[ch] refactoring: make use of the FREE_AND_NULL() macro
coccinelle: make use of the "expression" FREE_AND_NULL() rule
coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
coccinelle: make use of the "type" FREE_AND_NULL() rule
coccinelle: add a rule to make "type" code use FREE_AND_NULL()
git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
|
|
Replace occurrences of `free(ptr); ptr = NULL` which weren't caught by
the coccinelle rule. These fall into two categories:
- free/NULL assignments one after the other which coccinelle all put
on one line, which is functionally equivalent code, but very ugly.
- manually spotted occurrences where the NULL assignment isn't right
after the free() call.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Apply the result of the just-added coccinelle rule. This manually
excludes a few occurrences, mostly things that resulted in many
FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent
change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Discovered by Coverity.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The command-line parsing of "git log -L" copied internal data
structures using incorrect size on ILP32 systems.
* vn/line-log-memcpy-size-fix:
line-log: use COPY_ARRAY to fix mis-sized memcpy
|
|
The code to parse "git log -L..." command line was buggy when there
are many ranges specified with -L; overrun of the allocated buffer
has been fixed.
* ax/line-log-range-merge-fix:
line-log.c: prevent crash during union of too many ranges
|
|
This memcpy meant to get the sizeof a "struct range", not a
"range_set", as the former is what our array holds. Rather
than swap out the types, let's convert this site to
COPY_ARRAY, which avoids the problem entirely (and confirms
that the src and dst types match).
Note for curiosity's sake that this bug doesn't trigger on
I32LP64 systems, but does on ILP32 systems. The mistaken
"struct range_set" has two ints and a pointer. That's 16
bytes on LP64, or 12 on ILP32. The correct "struct range"
type has two longs, which is also 16 on LP64, but only 8 on
ILP32.
Likewise an IL32P64 system would experience the bug.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The existing implementation of range_set_union does not correctly
reallocate memory, leading to a heap overflow when it attempts to union
more than 24 separate line ranges.
For struct range_set *out to grow correctly it must have out->nr set to
the current size of the buffer when it is passed to range_set_grow.
However, the existing implementation of range_set_union only updates
out->nr at the end of the function, meaning that it is always zero
before this. This results in range_set_grow never growing the buffer, as
well as some of the union logic itself being incorrect as !out->nr is
always true.
The reason why 24 is the limit is that the first allocation of size 1
ends up allocating a buffer of size 24 (due to the call to alloc_nr in
ALLOC_GROW). This goes some way to explain why this hasn't been
caught before.
Fix the problem by correctly updating out->nr after reallocating the
range_set. As this results in out->nr containing the same value as the
variable o, replace o with out->nr as well.
Finally, add a new test to help prevent the problem reoccurring in the
future. Thanks to Vegard Nossum for writing the test.
Signed-off-by: Allan Xavier <allan.x.xavier@oracle.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Apply the semantic patch contrib/coccinelle/qsort.cocci to the code
base, replacing calls of qsort(3) with QSORT. The resulting code is
shorter and supports empty arrays with NULL pointers.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Conversion from unsigned char sha1[20] to struct object_id
continues.
* bc/cocci:
diff: convert prep_temp_blob() to struct object_id
merge-recursive: convert merge_recursive_generic() to object_id
merge-recursive: convert leaf functions to use struct object_id
merge-recursive: convert struct merge_file_info to object_id
merge-recursive: convert struct stage_data to use object_id
diff: rename struct diff_filespec's sha1_valid member
diff: convert struct diff_filespec to struct object_id
coccinelle: apply object_id Coccinelle transformations
coccinelle: convert hashcpy() with null_sha1 to hashclr()
contrib/coccinelle: add basic Coccinelle transforms
hex: add oid_to_hex_r()
|
|
The commands in the "log/diff" family have had an FILE* pointer in the
data structure they pass around for a long time, but some codepaths
used to always write to the standard output. As a preparatory step
to make "git format-patch" available to the internal callers, these
codepaths have been updated to consistently write into that FILE*
instead.
* js/log-to-diffopt-file:
mingw: fix the shortlog --output=<file> test
diff: do not color output when --color=auto and --output=<file> is given
t4211: ensure that log respects --output=<file>
shortlog: respect the --output=<file> setting
format-patch: use stdout directly
format-patch: avoid freopen()
format-patch: explicitly switch off color when writing to files
shortlog: support outputting to streams other than stdout
graph: respect the diffopt.file setting
line-log: respect diffopt's configured output file stream
log-tree: respect diffopt's configured output file stream
log: prepare log/log-tree to reuse the diffopt.close_file attribute
|
|
Now that this struct's sha1 member is called "oid", update the comment
and the sha1_valid member to be called "oid_valid" instead. The
following Coccinelle semantic patch was used to implement this, followed
by the transformations in object_id.cocci:
@@
struct diff_filespec o;
@@
- o.sha1_valid
+ o.oid_valid
@@
struct diff_filespec *p;
@@
- p->sha1_valid
+ p->oid_valid
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert struct diff_filespec's sha1 member to use a struct object_id
called "oid" instead. The following Coccinelle semantic patch was used
to implement this, followed by the transformations in object_id.cocci:
@@
struct diff_filespec o;
@@
- o.sha1
+ o.oid.hash
@@
struct diff_filespec *p;
@@
- p->sha1
+ p->oid.hash
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up.
* jc/deref-tag:
blame, line-log: do not loop around deref_tag()
|
|
The diff machinery can optionally output to a file stream other than
stdout, by overriding diffopt.file. In such a case, the rest of the
log tree machinery should also write to that stream.
Currently, there is no user of the line level log that wants to
redirect output to a file. Therefore, one might argue that it is
superfluous to support that now. However, it is better to be
consistent now, rather than to face hard-to-debug problems later.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
These callers appear to expect that deref_tag() is to peel one layer
of a tag, but the function does not work that way; it has its own
loop to unwrap tags until an object that is not a tag appears.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Each of these cases can be converted to use ALLOC_ARRAY or
REALLOC_ARRAY, which has two advantages:
1. It automatically checks the array-size multiplication
for overflow.
2. It always uses sizeof(*array) for the element-size,
so that it can never go out of sync with the declared
type of the array.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are many manual argv allocations that predate the
argv_array API. Switching to that API brings a few
advantages:
1. We no longer have to manually compute the correct final
array size (so it's one less thing we can screw up).
2. In many cases we had to make a separate pass to count,
then allocate, then fill in the array. Now we can do it
in one pass, making the code shorter and easier to
follow.
3. argv_array handles memory ownership for us, making it
more obvious when things should be free()d and and when
not.
Most of these cases are pretty straightforward. In some, we
switch from "run_command_v" to "run_command" which lets us
directly use the argv_array embedded in "struct
child_process".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert all instances of get_object_hash to use an appropriate reference
to the hash member of the oid member of struct object. This provides no
functional change, as it is essentially a macro substitution.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
|
|
Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash. Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
|
|
|
|
When we call into xdiff to perform a diff, we generally lose
the return code completely. Typically by ignoring the return
of our xdi_diff wrapper, but sometimes we even propagate
that return value up and then ignore it later. This can
lead to us silently producing incorrect diffs (e.g., "git
log" might produce no output at all, not even a diff header,
for a content-level diff).
In practice this does not happen very often, because the
typical reason for xdiff to report failure is that it
malloc() failed (it uses straight malloc, and not our
xmalloc wrapper). But it could also happen when xdiff
triggers one our callbacks, which returns an error (e.g.,
outf() in builtin/rerere.c tries to report a write failure
in this way). And the next patch also plans to add more
failure modes.
Let's notice an error return from xdiff and react
appropriately. In most of the diff.c code, we can simply
die(), which matches the surrounding code (e.g., that is
what we do if we fail to load a file for diffing in the
first place). This is not that elegant, but we are probably
better off dying to let the user know there was a problem,
rather than simply generating bogus output.
We could also just die() directly in xdi_diff, but the
callers typically have a bit more context, and can provide a
better message (and if we do later decide to pass errors up,
we're one step closer to doing so).
There is one interesting case, which is in diff_grep(). Here
if we cannot generate the diff, there is nothing to match,
and we silently return "no hits". This is actually what the
existing code does already, but we make it a little more
explicit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"color.diff.plain" was a misnomer; give it 'color.diff.context' as
a more logical synonym.
* jk/color-diff-plain-is-context:
diff.h: rename DIFF_PLAIN color slot to DIFF_CONTEXT
diff: accept color.diff.context as a synonym for "plain"
|
|
The latter is a much more descriptive name (and we support
"color.diff.context" now). This also updates the name of any
local variables which were used to store the color.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* sb/line-log-plug-pairdiff-leak:
line-log.c: fix a memleak
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The old message did not mention the :regex:file form.
To avoid overly long lines, split the message into two lines (in case
item->string is long, it will be the only part truncated in a narrow
terminal).
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `filepair` is assigned new memory with any iteration via
process_diff_filepair, so free it before the current iteration ends.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
No external callers exist.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git log --first-parent -L..." used to crash.
* tm/line-log-first-parent:
line-log: fix crash when --first-parent is used
|
|
line-log tries to access all parents of a commit, but only the first
parent has been loaded if "--first-parent" is specified, resulting
in a crash.
Limit the number of parents to one if "--first-parent" is specified.
Reported-by: Eric N. Vander Weele <ericvw@gmail.com>
Signed-off-by: Tzvetan Mikov <tmikov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Call commit_list_count() instead of open-coding it repeatedly.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since diff_tree_sha1() can now accept empty trees via NULL sha1, we
could just call it without manually reading trees into tree_desc and
duplicating code.
Cc: Thomas Rast <tr@thomasrast.ch>
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
All callers to parse_pathspec() must choose between getting no
pathspec or one path that is limited to the current directory
when there is no paths given on the command line, but there were
two callers that violated this rule, triggering a BUG().
* nd/magic-pathspec:
Fix calling parse_pathspec with no paths nor PATHSPEC_PREFER_* flags
|
|
When parse_pathspec() is called with no paths, the behavior could be
either return no paths, or return one path that is cwd. Some commands
do the former, some the latter. parse_pathspec() itself does not make
either the default and requires the caller to specify either flag if
it may run into this situation.
I've grep'd through all parse_pathspec() call sites. Some pass
neither, but those are guaranteed never pass empty path to
parse_pathspec(). There are two call sites that may pass empty path
and are fixed with this patch.
[jc: added a test from Antoine's bug report]
Reported-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git mv A B" when moving a submodule A does "the right thing",
inclusing relocating its working tree and adjusting the paths in
the .gitmodules file.
* jl/submodule-mv: (53 commits)
rm: delete .gitmodules entry of submodules removed from the work tree
mv: update the path entry in .gitmodules for moved submodules
submodule.c: add .gitmodules staging helper functions
mv: move submodules using a gitfile
mv: move submodules together with their work trees
rm: do not set a variable twice without intermediate reading.
t6131 - skip tests if on case-insensitive file system
parse_pathspec: accept :(icase)path syntax
pathspec: support :(glob) syntax
pathspec: make --literal-pathspecs disable pathspec magic
pathspec: support :(literal) syntax for noglob pathspec
kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
parse_pathspec: make sure the prefix part is wildcard-free
rename field "raw" to "_raw" in struct pathspec
tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
remove match_pathspec() in favor of match_pathspec_depth()
remove init_pathspec() in favor of parse_pathspec()
remove diff_tree_{setup,release}_paths
convert common_prefix() to use struct pathspec
...
|
|
This is complicated slightly by having to remember the previous -L range
for each file specified via -L<range>:file.
The existing implementation coalesces ranges for each file as each -L is
parsed which makes it impossible to refer back to the previous -L range
for any particular file. Re-implement to instead store each file's set
of -L ranges verbatim, and then coalesce the ranges in a post-processing
step.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Range specification -L/RE/ for blame/log unconditionally begins
searching at line one. Mailing list discussion [1] suggests that, in the
presence of multiple -L options, -L/RE/ should search relative to the
endpoint of the previous -L range, if any.
Teach the parsing machinery underlying blame's and log's -L options to
accept a start point for -L/RE/ searches. Follow-up patches will upgrade
blame and log to take advantage of this ability.
[1]: http://thread.gmane.org/gmane.comp.version-control.git/229755/focus=229966
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
git-blame is slated to accept multiple -L ranges. git-log already
accepts multiple -L's but its implementation of range-set, which
organizes and normalizes -L ranges, is private. Publish the small
subset of range-set API which is needed for git-blame multiple -L
support.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When 12da1d1f added -L support to git-log, a broken bounds check was
copied from git-blame -L which incorrectly allows -LX to extend one line
past end of file without reporting an error. Instead, it generates an
empty range. Fix this bug.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
range-set invariants are: ranges must be (1) non-empty, (2) disjoint,
(3) sorted in ascending order.
line_log_data_insert() breaks the non-empty invariant under the
following conditions: the incoming range is empty and the pathname
attached to the range has not yet been encountered. In this case,
line_log_data_insert() assigns the empty range to a new line_log_data
record without taking any action to ensure that the empty range is
eventually folded out. Subsequent range-set functions crash or throw an
assertion failure upon encountering such an anomaly. Fix this bug.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Acked-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
range-set invariants are: ranges must be (1) non-empty, (2) disjoint,
(3) sorted in ascending order.
During processing, various range-set utility functions break the
invariants (for instance, by adding empty ranges), with the
expectation that a finalizing sort_and_merge_range_set() will restore
sanity.
sort_and_merge_range_set(), however, neglects to fold out empty
ranges, thus it fails to satisfy the non-empty constraint. Subsequent
range-set functions crash or throw an assertion failure upon
encountering such an anomaly. Rectify the situation by having
sort_and_merge_range_set() fold out empty ranges.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Acked-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When handed an empty range_set (range_set.nr == 0),
sort_and_merge_range_set() incorrectly sets range_set.nr to 1 at exit.
Subsequent range_set functions then access the bogus range at element
zero and crash or throw an assertion failure. Fix this bug.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Acked-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When coalescing ranges, sort_and_merge_range_set() unconditionally
assumes that the end of a range being folded into a preceding range
should become the end of the coalesced range. This assumption, however,
is invalid when one range is a subset of another. For example, given
ranges 1-5 and 2-3 added via range_set_append_unsafe(),
sort_and_merge_range_set() incorrectly coalesces them to range 1-3
rather than the correct union range 1-5. Fix this bug.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The funny range assignment in process_all_files() had me sidetracked
while investigating what led to the previous commit. Let's improve
the comments.
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
line_log_data has held a diff_filespec* since the very early versions
of the code. However, the only place in the code where we actually
need the full filespec is parse_range_arg(); in all other cases, we
are only interested in the path, so there is hardly a reason to store
a filespec. Even worse, it causes a lot of redundant ->spec->path
pointer dereferencing.
And *even* worse, it caused the following bug. If you merge a rename
with a modification to the old filename, like so:
* Merge
| \
| * Modify foo
| |
* | Rename foo->bar
| /
* Create foo
we internally -- in process_ranges_merge_commit() -- scan all parents.
We are mainly looking for one that doesn't have any modifications, so
that we can assign all the blame to it and simplify away the merge.
In doing so, we run the normal machinery on all parents in a loop.
For each parent, we prepare a "working set" line_log_data by making a
copy with line_log_data_copy(), which does *not* make a copy of the
spec.
Now suppose the rename is the first parent. The diff machinery tells
us that the filepair is ('foo', 'bar'). We duly update the path we
are interested in:
rg->spec->path = xstrdup(pair->one->path);
But that 'struct spec' is shared between the output line_log_data and
the original input line_log_data. So we just wrecked the state of
process_ranges_merge_commit(). When we get around to the second
parent, the ranges tell us we are interested in a file 'foo' while the
commits touch 'bar'.
So most of this patch is just s/->spec->path/->path/ and associated
management changes. This implicitly fixes the bug because we removed
the shared parts between input and output of line_log_data_copy(); it
is now safe to overwrite the path in the copy.
There's one only somewhat related change: the comment in
process_all_files() explains the reasoning behind using 'range' there.
That bit of half-correct code had me sidetracked for a while.
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The existing code was too defensive, and would trigger the assert in
range_set_append() if the user gave overlapping ranges.
The intent was always to define overlapping ranges as just the union
of all of them, as evidenced by the call to sort_and_merge_range_set().
(Which was already used, unlike what the comment said.)
Fix by splitting out the meat of range_set_append() to a new _unsafe()
function that lacks the paranoia. sort_and_merge_range_set will fix
up the ranges, so we don't need the checks there.
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
lookup_line_range() is a good place to check that the range sets
satisfy the invariants: they have been computed and set in earlier
iterations, and we now start working with them.
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
So far log -L only used the implicit diff filtering by pathspec. If
the user specifies -M, we cannot do that, and so we simply handed the
whole diff queue (which is approximately 'git show --raw') to
diffcore_std().
Unfortunately this is very slow. We can optimize a lot if we throw
out files that we know cannot possibly be interesting, in the same
spirit that the pathspec filtering reduces the number of files.
However, in this case, we have to be more careful. Because we want to
look out for renames, we need to keep all filepairs where something
was deleted.
This is a bit hacky and should really be replaced by equivalent
support in --follow, and just using that. However, in the meantime it
speeds up 'log -M -L' by an order of magnitude.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This new syntax finds a funcname matching /pattern/, and then takes from there
up to (but not including) the next funcname. So you can say
git log -L:main:main.c
and it will dig up the main() function and show its line-log, provided
there are no other funcnames matching 'main'.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This is a rewrite of much of Bo's work, mainly in an effort to split
it into smaller, easier to understand routines.
The algorithm is built around the struct range_set, which encodes a
series of line ranges as intervals [a,b). This is used in two
contexts:
* A set of lines we are tracking (which will change as we dig through
history).
* To encode diffs, as pairs of ranges.
The main routine is range_set_map_across_diff(). It processes the
diff between a commit C and some parent P. It determines which diff
hunks are relevant to the ranges tracked in C, and computes the new
ranges for P.
The algorithm is then simply to process history in topological order
from newest to oldest, computing ranges and (partial) diffs. At
branch points, we need to merge the ranges we are watching. We will
find that many commits do not affect the chosen ranges, and mark them
TREESAME (in addition to those already filtered by pathspec limiting).
Another pass of history simplification then gets rid of such commits.
This is wired as an extra filtering pass in the log machinery. This
currently only reduces code duplication, but should allow for other
simplifications and options to be used.
Finally, we hook a diff printer into the output chain. Ideally we
would wire directly into the diff logic, to optionally use features
like word diff. However, that will require some major reworking of
the diff chain, so we completely replace the output with our own diff
for now.
As this was a GSoC project, and has quite some history by now, many
people have helped. In no particular order, thanks go to
Jakub Narebski <jnareb@gmail.com>
Jens Lehmann <Jens.Lehmann@web.de>
Jonathan Nieder <jrnieder@gmail.com>
Junio C Hamano <gitster@pobox.com>
Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Will Palmer <wmpalmer@gmail.com>
Apologies to everyone I forgot.
Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|