| Age | Commit message (Collapse) | Author | Files | Lines |
|
In show_line(), we check to see if colors are desired with just:
if (opt->color)
...we want colors...
But this is incorrect. The color field here is really a git_colorbool,
so it may be "true" for GIT_COLOR_UNKNOWN or GIT_COLOR_AUTO. Either of
those _might_ end up true eventually (once we apply default fallbacks
and check stdout's tty), but they may not. E.g.:
git grep foo | cat
will enter the conditional even though we're not going to show colors.
We should collapse it into a true boolean by calling want_color().
It turns out that this does not produce a user-visible bug. We do some
extra processing to isolate the matched portion of the line in order to
colorize it, but ultimately we pass it to our output_color() helper,
which does correctly check want_color(). So we end up with no colors.
But dropping the extra processing saves a measurable amount of time. For
example, running under hyperfine (which redirects to /dev/null, and thus
does not colorize):
Benchmark 1: ./git.old grep a
Time (mean ± σ): 58.7 ms ± 3.5 ms [User: 580.6 ms, System: 74.3 ms]
Range (min … max): 53.5 ms … 67.1 ms 48 runs
Benchmark 2: ./git.new grep a
Time (mean ± σ): 35.5 ms ± 0.9 ms [User: 276.8 ms, System: 73.8 ms]
Range (min … max): 34.3 ms … 39.3 ms 79 runs
Summary
./git.new grep a ran
1.65 ± 0.11 times faster than ./git.old grep a
That's a fairly extreme benchmark, just because it will come up with a
ton of small matches, but it shows that this really does matter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Rename `repo_read_object_file()` to `odb_read_object()` to match other
functions related to the object database and our modern coding
guidelines.
Introduce a compatibility wrapper so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.
Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `null_oid()` function returns the object ID that only consists of
zeroes. Naturally, this ID also depends on the hash algorithm used, as
the number of zeroes is different between SHA1 and SHA256. Consequently,
the function returns the hash-algorithm-specific null object ID.
This is currently done by depending on `the_hash_algo`, which implicitly
makes us depend on `the_repository`. Refactor the function to instead
pass in the hash algorithm for which we want to retrieve the null object
ID. Adapt callsites accordingly by passing in `the_repository`, thus
bubbling up the dependency on that global variable by one layer.
There are a couple of trivial exceptions for subsystems that already got
rid of `the_repository`. These subsystems instead use the repository
that is available via the calling context:
- "builtin/grep.c"
- "grep.c"
- "refs/debug.c"
There are also two non-trivial exceptions:
- "diff-no-index.c": Here we know that we may not have a repository
initialized at all, so we cannot rely on `the_repository`. Instead,
we adapt `diff_no_index()` to get a `struct git_hash_algo` as
parameter. The only caller is located in "builtin/diff.c", where we
know to call `repo_set_hash_algo()` in case we're running outside of
a Git repository. Consequently, it is fine to continue passing
`the_repository->hash_algo` even in this case.
- "builtin/ls-files.c": There is an in-flight patch series that drops
`USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic
conflict because we use `null_oid()` in `show_submodule()`. The
value is passed to `repo_submodule_init()`, which may use the object
ID to resolve a tree-ish in the superproject from which we want to
read the submodule config. As such, the object ID should refer to an
object in the superproject, and consequently we need to use its hash
algorithm.
This means that we could in theory just not bother about this edge
case at all and just use `the_repository` in "diff-no-index.c". But
doing so would feel misdesigned.
Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in
"hash.c".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
More leakfixes.
* ps/leakfixes-part-9: (22 commits)
list-objects-filter-options: work around reported leak on error
builtin/merge: release output buffer after performing merge
dir: fix leak when parsing "status.showUntrackedFiles"
t/helper: fix leaking buffer in "dump-untracked-cache"
t/helper: stop re-initialization of `the_repository`
sparse-index: correctly free EWAH contents
dir: release untracked cache data
combine-diff: fix leaking lost lines
builtin/tag: fix leaking key ID on failure to sign
transport-helper: fix leaking import/export marks
builtin/commit: fix leaking cleanup config
trailer: fix leaking strbufs when formatting trailers
trailer: fix leaking trailer values
builtin/commit: fix leaking change data contents
upload-pack: fix leaking URI protocols
pretty: clear signature check
diff-lib: fix leaking diffopts in `do_diff_cache()`
revision: fix leaking bloom filters
builtin/grep: fix leak with `--max-count=0`
grep: fix leak in `grep_splice_or()`
...
|
|
In `grep_splice_or()` we search for the next `TRUE` node in our tree of
grep expressions and replace it with the given new expression. But we
don't free the old node, which causes a memory leak. Plug it.
This leak is exposed by t7810, but plugging it alone isn't sufficient to
make the test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
regexec(3) can fail. E.g. on macOS it fails if it is used with an UTF-8
locale to match a valid regex against a buffer containing invalid UTF-8
characters.
git grep has two ways to search for matches in a file: Either it splits
its contents into lines and matches them separately, or it matches the
whole content and figures out line boundaries later. The latter is done
by look_ahead() and it's quicker in the common case where most files
don't contain a match.
Fall back to line-by-line matching if look_ahead() encounters an
regexec(3) error by propagating errors out of patmatch() and bailing out
of look_ahead() if there is one. This way we at least can find matches
in lines that contain only valid characters. That matches the behavior
of grep(1) on macOS.
pcre2match() dies if pcre2_jit_match() or pcre2_match() fail, but since
we use the flag PCRE2_MATCH_INVALID_UTF it handles invalid UTF-8
characters gracefully. So implement the fall-back only for regexec(3)
and leave the PCRE2 matching unchanged.
Reported-by: David Gstir <david@sigma-star.at>
Signed-off-by: René Scharfe <l.s.r@web.de>
Tested-by: David Gstir <david@sigma-star.at>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
|
|
When creating a pattern via `create_grep_pat()` we allocate the pattern
member of the structure regardless of the token type. But later, when we
try to free the structure, we free the pattern member conditionally on
the token type and thus leak memory.
Plug this leak. The leak is exposed by t7814, but plugging it alone does
not make the whole test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We provide custom malloc/free callbacks for the pcre library to use.
Those take an extra "data" parameter, but we don't use it. Back when
these were added in 513f2b0bbd (grep: make PCRE2 aware of custom
allocator, 2019-10-16), we only had MAYBE_UNUSED.
But these days we have UNUSED, which we should prefer, as it will
let the compiler inform us if the code changes to actually use the
parameters.
I also moved the annotations to come after the variable name, which is
how we typically spell it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
4aa2c4753d (grep: -W: don't extend context to trailing empty lines,
2016-05-28) stopped showing empty lines at the end of function context
when using -W. Do the same for trailing empty lines at the end of
files, for consistency -- it doesn't matter whether a function section
is ended by the next function or the end of the file.
Test it by adding a trailing empty line to the file used by the test
"grep -W" and leave its expected output the same.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Imagine you want to grep for (. Easy:
$ git grep '('
fatal: unmatched parenthesis
uhoh. This is plainly wrong. Unless you know specifically that
(a) git grep has expression groups and '(' ... ')' are used for them.
(b) you can use -e '(' to explicitly say '(' is what you are looking
for, not the beginning of a group.
Similarly,
$ git grep ')'
fatal: incomplete pattern expression: )
is somehow worse. ")" is a complete regular expression pattern.
Of course, the error wants to say "group" here.
In this case it is also not "incomplete", it is unmatched.
Make them say
$ ./git grep '('
fatal: unmatched ( for expression group
$ ./git grep ')'
fatal: incomplete pattern expression group: )
which are clearer in indicating that it is not the expression that
is wrong (since no pattern had been parsed at all), but rather that
it is been misconstrued as a grouping operator.
Link: https://bugs.debian.org/1051205
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Each of these were checked with
gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).
...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file. These cases were:
* builtin/credential-cache.c
* builtin/pull.c
* builtin/send-pack.c
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When USE_LIBPCRE2 is not defined, we compile several noop fallbacks.
These need to have their parameters annotated to avoid
-Wunused-parameter warnings (and obviously we cannot remove the
parameters, since the functions must match the non-fallback versions).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This is a callback used with grep_options.output, but we don't look at
the grep_opt parameter, as we're just writing the output to stdout.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Further shuffling of declarations across header files to streamline
file dependencies.
* cw/compat-util-header-cleanup:
git-compat-util: move alloc macros to git-compat-util.h
treewide: remove unnecessary includes for wrapper.h
kwset: move translation table from ctype
sane-ctype.h: create header for sane-ctype macros
git-compat-util: move wrapper.c funcs to its header
git-compat-util: move strbuf.c funcs to its header
|
|
Reduce reliance on a global state in the config reading API.
* gc/config-context:
config: pass source to config_parser_event_fn_t
config: add kvi.path, use it to evaluate includes
config.c: remove config_reader from configsets
config: pass kvi to die_bad_number()
trace2: plumb config kvi
config.c: pass ctx with CLI config
config: pass ctx with config files
config.c: pass ctx in configsets
config: add ctx arg to config_fn_t
urlmatch.h: use config_fn_t type
config: inline git_color_default_config
|
|
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a new "const struct config_context *ctx" arg to config_fn_t to hold
additional information about the config iteration operation.
config_context has a "struct key_value_info kvi" member that holds
metadata about the config source being read (e.g. what kind of config
source it is, the filename, etc). In this series, we're only interested
in .kvi, so we could have just used "struct key_value_info" as an arg,
but config_context makes it possible to add/adjust members in the future
without changing the config_fn_t signature. We could also consider other
ways of organizing the args (e.g. moving the config name and value into
config_context or key_value_info), but in my experiments, the
incremental benefit doesn't justify the added complexity (e.g. a
config_fn_t will sometimes invoke another config_fn_t but with a
different config value).
In subsequent commits, the .kvi member will replace the global "struct
config_reader" in config.c, making config iteration a global-free
operation. It requires much more work for the machinery to provide
meaningful values of .kvi, so for now, merely change the signature and
call sites, pass NULL as a placeholder value, and don't rely on the arg
in any meaningful way.
Most of the changes are performed by
contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every
config_fn_t:
- Modifies the signature to accept "const struct config_context *ctx"
- Passes "ctx" to any inner config_fn_t, if needed
- Adds UNUSED attributes to "ctx", if needed
Most config_fn_t instances are easily identified by seeing if they are
called by the various config functions. Most of the remaining ones are
manually named in the .cocci patch. Manual cleanups are still needed,
but the majority of it is trivial; it's either adjusting config_fn_t
that the .cocci patch didn't catch, or adding forward declarations of
"struct config_context ctx" to make the signatures make sense.
The non-trivial changes are in cases where we are invoking a config_fn_t
outside of config machinery, and we now need to decide what value of
"ctx" to pass. These cases are:
- trace2/tr2_cfg.c:tr2_cfg_set_fl()
This is indirectly called by git_config_set() so that the trace2
machinery can notice the new config values and update its settings
using the tr2 config parsing function, i.e. tr2_cfg_cb().
- builtin/checkout.c:checkout_main()
This calls git_xmerge_config() as a shorthand for parsing a CLI arg.
This might be worth refactoring away in the future, since
git_xmerge_config() can call git_default_config(), which can do much
more than just parsing.
Handle them by creating a KVI_INIT macro that initializes "struct
key_value_info" to a reasonable default, and use that to construct the
"ctx" arg.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The vast majority of files including object-store.h did not need dir.h
nor khash.h. Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.
After this patch:
$ git grep -h include..object-store | sort | uniq -c
2 #include "object-store.h"
129 #include "object-store-ll.h"
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
More header clean-up.
* en/header-split-cache-h-part-2: (22 commits)
reftable: ensure git-compat-util.h is the first (indirect) include
diff.h: reduce unnecessary includes
object-store.h: reduce unnecessary includes
commit.h: reduce unnecessary includes
fsmonitor: reduce includes of cache.h
cache.h: remove unnecessary headers
treewide: remove cache.h inclusion due to previous changes
cache,tree: move basic name compare functions from read-cache to tree
cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c
hash-ll.h: split out of hash.h to remove dependency on repository.h
tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h
dir.h: move DTYPE defines from cache.h
versioncmp.h: move declarations for versioncmp.c functions from cache.h
ws.h: move declarations for ws.c functions from cache.h
match-trees.h: move declarations for match-trees.c functions from cache.h
pkt-line.h: move declarations for pkt-line.c functions from cache.h
base85.h: move declarations for base85.c functions from cache.h
copy.h: move declarations for copy.c functions from cache.h
server-info.h: move declarations for server-info.c functions from cache.h
packfile.h: move pack_window and pack_entry from cache.h
...
|
|
Header clean-up.
* en/header-split-cache-h: (24 commits)
protocol.h: move definition of DEFAULT_GIT_PORT from cache.h
mailmap, quote: move declarations of global vars to correct unit
treewide: reduce includes of cache.h in other headers
treewide: remove double forward declaration of read_in_full
cache.h: remove unnecessary includes
treewide: remove cache.h inclusion due to pager.h changes
pager.h: move declarations for pager.c functions from cache.h
treewide: remove cache.h inclusion due to editor.h changes
editor: move editor-related functions and declarations into common file
treewide: remove cache.h inclusion due to object.h changes
object.h: move some inline functions and defines from cache.h
treewide: remove cache.h inclusion due to object-file.h changes
object-file.h: move declarations for object-file.c functions from cache.h
treewide: remove cache.h inclusion due to git-zlib changes
git-zlib: move declarations for git-zlib functions from cache.h
treewide: remove cache.h inclusion due to object-name.h changes
object-name.h: move declarations for object-name.c functions from cache.h
treewide: remove unnecessary cache.h inclusion
treewide: be explicit about dependence on mem-pool.h
treewide: be explicit about dependence on oid-array.h
...
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
cache.h's nature of a dumping ground of includes prevented it from
being included in some compat/ files, forcing us into a workaround
of having a double forward declaration of the read_in_full() function
(see commit 14086b0a13 ("compat/pread.c: Add a forward declaration to
fix a warning", 2007-11-17)). Now that we have moved functions like
read_in_full() from cache.h to wrapper.h, and wrapper.h isn't littered
with unrelated and scary #defines, get rid of the extra forward
declaration and just have compat/pread.c include wrapper.h.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Split key function and data structure definitions out of cache.h to
new header files and adjust the users.
* en/header-split-cleanup:
csum-file.h: remove unnecessary inclusion of cache.h
write-or-die.h: move declarations for write-or-die.c functions from cache.h
treewide: remove cache.h inclusion due to setup.h changes
setup.h: move declarations for setup.c functions from cache.h
treewide: remove cache.h inclusion due to environment.h changes
environment.h: move declarations for environment.c functions from cache.h
treewide: remove unnecessary includes of cache.h
wrapper.h: move declarations for wrapper.c functions from cache.h
path.h: move function declarations for path.c functions from cache.h
cache.h: remove expand_user_path()
abspath.h: move absolute path functions from cache.h
environment: move comment_line_char from cache.h
treewide: remove unnecessary cache.h inclusion from several sources
treewide: remove unnecessary inclusion of gettext.h
treewide: be explicit about dependence on gettext.h
treewide: remove unnecessary cache.h inclusion from a few headers
|
|
A recent-ish change to allow unicode character classes to be used
with "grep -P" triggered a JIT bug in older pcre2 libraries.
The problematic change in Git built with these older libraries has
been disabled to work around the bug.
* mk/workaround-pcre-jit-ucp-bug:
grep: work around UTF-8 related JIT bug in PCRE2 <= 10.34
|
|
Stephane is reporting[1] a regression introduced in git v2.40.0 that leads
to 'git grep' segfaulting in his CI pipeline. It turns out, he's using an
older version of libpcre2 that triggers a wild pointer dereference in
the generated JIT code that was fixed in PCRE2 10.35.
Instead of completely disabling the JIT compiler for the buggy version,
just mask out the Unicode property handling as we used to do prior to
commit acabd2048ee0 ("grep: correctly identify utf-8 characters with
\{b,w} in -P").
[1] https://lore.kernel.org/git/7E83DAA1-F9A9-4151-8D07-D80EA6D59EEA@clumio.com/
Reported-by: Stephane Odul <stephane@clumio.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A number of files were apparently including cache.h solely to get
gettext.h. By making those files explicitly include gettext.h, we can
already drop the include of cache.h in these files. On top of that,
there were some files using cache.h that didn't need to for any reason.
Remove these unnecessary includes.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Dozens of files made use of gettext functions, without explicitly
including gettext.h. This made it more difficult to find which files
could remove a dependence on cache.h. Make C files explicitly include
gettext.h if they are using it.
However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Leak fixes.
* ab/various-leak-fixes:
push: free_refs() the "local_refs" in set_refspecs()
push: refactor refspec_append_mapped() for subsequent leak-fix
receive-pack: release the linked "struct command *" list
grep API: plug memory leaks by freeing "header_list"
grep.c: refactor free_grep_patterns()
builtin/merge.c: free "&buf" on "Your local changes..." error
builtin/merge.c: use fixed strings, not "strbuf", fix leak
show-branch: free() allocated "head" before return
commit-graph: fix a parse_options_concat() leak
http-backend.c: fix cmd_main() memory leak, refactor reg{exec,free}()
http-backend.c: fix "dir" and "cmd_arg" leaks in cmd_main()
worktree: fix a trivial leak in prune_worktrees()
repack: fix leaks on error with "goto cleanup"
name-rev: don't xstrdup() an already dup'd string
various: add missing clear_pathspec(), fix leaks
clone: use free() instead of UNLEAK()
commit-graph: use free_commit_graph() instead of UNLEAK()
bundle.c: don't leak the "args" in the "struct child_process"
tests: mark tests as passing with SANITIZE=leak
|
|
In an environment where dynamically generated code is prohibited to
run (e.g. SELinux), failure to JIT pcre patterns is expected. Fall
back to interpreted execution in such a case.
* cb/grep-fallback-failing-jit:
grep: fall back to interpreter if JIT memory allocation fails
|
|
When the "header_list" struct member was added in [1], freeing this
field was neglected. Fix that now, so that commands like
./git -P log -1 --color=always --author=A origin/master
will run leak-free.
1. 80235ba79ef ("log --author=me --grep=it" should find intersection,
not union, 2010-01-17)
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Refactor the free_grep_patterns() function to split out the freeing of
the "struct grep_pat" it contains. Right now we're only freeing the
"pattern_list", but we should be freeing another member of the same
type, which we'll do in the subsequent commit.
Let's also replace the "return" if we don't have an
"opt->pattern_expression" with a conditional call of
free_pattern_expr().
Before db84376f981 (grep.c: remove "extended" in favor of
"pattern_expression", fix segfault, 2022-10-11) the pattern here was:
if (!x)
return;
free_pattern_expr(y);
While at it, instead of:
if (!x)
return;
free_pattern_expr(x);
Let's instead do:
if (x)
free_pattern_expr(x);
This will make it easier to free additional members from
free_grep_patterns() in the future.
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Under Linux systems with SELinux's 'deny_execmem' or PaX's MPROTECT
enabled, the allocation of PCRE2's JIT rwx memory may be prohibited,
making pcre2_jit_compile() fail with PCRE2_ERROR_NOMEMORY (-48):
[user@fedora git]$ git grep -c PCRE2_JIT
grep.c:1
[user@fedora git]$ # Enable SELinux's W^X policy
[user@fedora git]$ sudo semanage boolean -m -1 deny_execmem
[user@fedora git]$ # JIT memory allocation fails, breaking 'git grep'
[user@fedora git]$ git grep -c PCRE2_JIT
fatal: Couldn't JIT the PCRE2 pattern 'PCRE2_JIT', got '-48'
Instead of failing hard in this case and making 'git grep' unusable on
such systems, simply fall back to interpreter mode, leading to a much
better user experience.
As having a functional PCRE2 JIT compiler is a legitimate use case for
performance reasons, we'll only do the fallback if the supposedly
available JIT is found to be non-functional by attempting to JIT compile
a very simple pattern. If this fails, JIT is deemed to be non-functional
and we do the interpreter fallback. For all other cases, i.e. the simple
pattern can be compiled but the user provided cannot, we fail hard as we
do now as the reason for the failure must be the pattern itself. To aid
users in helping themselves change the error message to include a hint
about the '(*NO_JIT)' prefix. Also clip the pattern at 64 characters to
ensure the hint will be seen by the user and not internally truncated by
the die() function.
Cc: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When UTF is enabled for a PCRE match, the corresponding flags are
added to the pcre2_compile() call, but PCRE2_UCP wasn't included.
This prevents extending the meaning of the character classes to
include those new valid characters and therefore result in failed
matches for expressions that rely on that extention, for ex:
$ git grep -P '\bÆvar'
Add PCRE2_UCP so that \w will include Æ and therefore \b could
correctly match the beginning of that word.
This has an impact on performance that has been estimated to be
between 20% to 40% and that is shown through the added performance
test.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Giving "--invert-grep" and "--all-match" without "--grep" to the
"git log" command resulted in an attempt to access grep pattern
expression structure that has not been allocated, which has been
corrected.
* ab/grep-simplify-extended-expression:
grep.c: remove "extended" in favor of "pattern_expression", fix segfault
|
|
Since 79d3696cfb4 (git-grep: boolean expression on pattern matching.,
2006-06-30) the "pattern_expression" member has been used for complex
queries (AND/OR...), with "pattern_list" being used for the simple OR
queries. Since then we've used both "pattern_expression" and its
associated boolean "extended" member to see if we have a complex
expression.
Since f41fb662f57 (revisions API: have release_revisions() release
"grep_filter", 2022-04-13) we've had a subtle bug relating to that: If
we supplied options that were only used for "complex queries", but
didn't supply the query itself we'd set "opt->extended", but would
have a NULL "pattern_expression". As a result these would segfault as
we tried to call "free_grep_patterns()" from "release_revisions()":
git -P log -1 --invert-grep
git -P log -1 --all-match
The root cause of this is that we were conflating the state management
we needed in "compile_grep_patterns()" itself with whether or not we
had an "opt->pattern_expression" later on.
In this cases as we're going through "compile_grep_patterns()" we have
no "opt->pattern_list" but have "opt->no_body_match" or
"opt->all_match". So we'd set "opt->extended = 1", but not "return" on
"opt->extended" as that's an "else if" in the same "if" statement.
That behavior is intentional and required, as the common case is that
we have an "opt->pattern_list" that we're about to parse into the
"opt->pattern_expression".
But we don't need to keep track of this "extended" flag beyond the
state management in compile_grep_patterns() itself. It needs it, but
once we're out of that function we can rely on
"opt->pattern_expression" being non-NULL instead for using these
extended patterns.
As 79d3696cfb4 itself shows we've assumed that there's a one-to-one
mapping between the two since the very beginning. I.e. "match_line()"
would check "opt->extended" to see if it should call "match_expr()",
and the first thing we do in that function is assume that we have a
"opt->pattern_expression". We'd then call "match_expr_eval()", which
would have died if that "opt->pattern_expression" was NULL.
The "die" was added in c922b01f54c (grep: fix segfault when "git grep
'('" is given, 2009-04-27), and can now be removed as it's now clearly
unreachable. We still do the right thing in the case that prompted
that fix:
git grep '('
fatal: unmatched parenthesis
Arguably neither the "--invert-grep" option added in [1] nor the
earlier "--all-match" option added in [2] were intended to be used
stand-alone, and another approach[3] would be to error out in those
cases. But since we've been treating them as a NOOP when given without
--grep for a long time let's keep doing that.
We could also return in "free_pattern_expr()" if the argument is
non-NULL, as an alternative fix for this segfault does [4]. That would
be more elegant in making the "free_*()" function behave like
"free()", but it would also remove a sanity check: The
"free_pattern_expr()" function calls itself recursively, and only the
top-level is allowed to be NULL, let's not conflate those two
conditions.
1. 22dfa8a23de (log: teach --invert-grep option, 2015-01-12)
2. 0ab7befa31d (grep --all-match, 2006-09-27)
3. https://lore.kernel.org/git/patch-1.1-f4b90799fce-20221010T165711Z-avarab@gmail.com/
4. http://lore.kernel.org/git/7e094882c2a71894416089f894557a9eae07e8f8.1665423686.git.me@ttaylorr.com
Reported-by: orygaw <orygaw@protonmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This patch adds a command line option analogous to that of GNU
grep(1)'s -m / --max-count, which users might already be used to.
This makes it possible to limit the amount of matches shown in the
output while keeping the functionality of other options such as -C
(show code context) or -p (show containing function), which would be
difficult to do with a shell pipeline (e.g. head(1)).
Signed-off-by: Carlos López 00xc@protonmail.com
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Workaround we have for versions of PCRE2 before their version 10.36
were in effect only for their versions newer than 10.36 by mistake,
which has been corrected.
* rs/pcre-invalid-utf8-fix-fix:
grep: fix triggering PCRE2_NO_START_OPTIMIZE workaround
|
|
Some code clean-up in the "git grep" machinery.
* ab/grep-patterntype:
grep: simplify config parsing and option parsing
grep.c: do "if (bool && memchr())" not "if (memchr() && bool)"
grep.h: make "grep_opt.pattern_type_option" use its enum
grep API: call grep_config() after grep_init()
grep.c: don't pass along NULL callback value
built-ins: trust the "prefix" from run_builtin()
grep tests: add missing "grep.patternType" config tests
grep tests: create a helper function for "BRE" or "ERE"
log tests: check if grep_config() is called by "log"-like cmds
grep.h: remove unused "regex_t regexp" from grep_opt
|
|
PCRE2 bug 2642 was fixed in version 10.36. Our 95ca1f987e (grep/pcre2:
better support invalid UTF-8 haystacks, 2021-01-24) worked around it on
older versions by setting the flag PCRE2_NO_START_OPTIMIZE. 797c359978
(grep/pcre2: use compile-time PCREv2 version test, 2021-02-18) switched
it around to set the flag on 10.36 and higher instead, while it claimed
to use "the same test done at compile-time".
Switch the condition back to apply the workaround on PCRE2 versions
_before_ 10.36.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Simplify the parsing of "grep.patternType" and
"grep.extendedRegexp". This changes no behavior, but gets rid of
complex parsing logic that isn't needed anymore.
When "grep.patternType" was introduced in 84befcd0a4a (grep: add a
grep.patternType configuration setting, 2012-08-03) we promised that:
1. You can set "grep.patternType", and "[setting it to] 'default'
will return to the default matching behavior".
In that context "the default" meant whatever the configuration
system specified before that change, i.e. via grep.extendedRegexp.
2. We'd support the existing "grep.extendedRegexp" option, but ignore
it when the new "grep.patternType" option is set. We said we'd
only ignore the older "grep.extendedRegexp" option "when the
`grep.patternType` option is set to a value other than
'default'".
In a preceding commit we changed grep_config() to be called after
grep_init(), which means that much of the complexity here can go
away.
As before both "grep.patternType" and "grep.extendedRegexp" are
last-one-wins variable, with "grep.extendedRegexp" yielding to
"grep.patternType", except when "grep.patternType=default".
Note that as the previously added tests indicate this cannot be done
on-the-fly as we see the config variables, without introducing more
state keeping. I.e. if we see:
-c grep.extendedRegexp=false
-c grep.patternType=default
-c extendedRegexp=true
We need to select ERE, since grep.patternType=default unselects that
variable, which normally has higher precedence, but we also need to
select BRE in cases of:
-c grep.extendedRegexp=true \
-c grep.extendedRegexp=false
Which would not be the case for this, which select ERE:
-c grep.patternType=extended \
-c grep.extendedRegexp=false
Therefore we cannot do this on-the-fly in grep_config without also
introducing tracking variables for not only the pattern type, but what
the source of that pattern type was.
So we need to decide on the pattern after our config was fully
parsed. Let's do that by deferring the decision on the pattern type
until it's time to compile it in compile_regexp().
By that time we've not only parsed the config, but also handled the
command-line options. Those will set "opt.pattern_type_option" (*not*
"opt.extended_regexp_option"!).
At that point all we need to do is see if "grep.patternType" was
UNSPECIFIED in the end (including an explicit "=default"), if so we'll
use the "grep.extendedRegexp" configuration, if any.
See my 07a3d411739 (grep: remove regflags from the public grep_opt
API, 2017-06-29) for addition of the two comments being removed here,
i.e. the complexity noted in that commit is now going away.
1. https://lore.kernel.org/git/patch-v8-09.10-c211bb0c69d-20220118T155211Z-avarab@gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change code in compile_regexp() to check the cheaper boolean
"!opt->pcre2" condition before the "memchr()" search.
This doesn't noticeably optimize anything, but makes the code more
obvious and conventional. The line wrapping being added here also
makes a subsequent commit smaller.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The grep_init() function used the odd pattern of initializing the
passed-in "struct grep_opt" with a statically defined "grep_defaults"
struct, which would be modified in-place when we invoked
grep_config().
So we effectively (b) initialized config, (a) then defaults, (c)
followed by user options. Usually those are ordered as "a", "b" and
"c" instead.
As the comments being removed here show the previous behavior needed
to be carefully explained as we'd potentially share the populated
configuration among different instances of grep_init(). In practice we
didn't do that, but now that it can't be a concern anymore let's
remove those comments.
This does not change the behavior of any of the configuration
variables or options. That would have been the case if we didn't move
around the grep_config() call in "builtin/log.c". But now that we call
"grep_config" after "git_log_config" and "git_format_config" we'll
need to pass in the already initialized "struct grep_opt *".
See 6ba9bb76e02 (grep: copy struct in one fell swoop, 2020-11-29) and
7687a0541e0 (grep: move the configuration parsing logic to grep.[ch],
2012-10-09) for the commits that added the comments.
The memcpy() pattern here will be optimized away and follows the
convention of other *_init() functions. See 5726a6b4012 (*.c *_init():
define in terms of corresponding *_INIT macro, 2021-07-01).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change code in "builtin/grep.c" and "builtin/ls-tree.c" to trust the
"prefix" passed from "run_builtin()". The "prefix" we get from setup.c
is either going to be NULL or a string of length >0, never "".
So we can drop the "prefix && *prefix" checks added for
"builtin/grep.c" in 0d042fecf2f (git-grep: show pathnames relative to
the current directory, 2006-08-11), and for "builtin/ls-tree.c" in
a69dd585fca (ls-tree: chomp leading directories when run from a
subdirectory, 2005-12-23).
As seen in code in revision.c that was added in cd676a51367 (diff
--relative: output paths as relative to the current subdirectory,
2008-02-12) we already have existing code that does away with this
assertion.
This makes it easier to reason about a subsequent change to the
"prefix_length" code in grep.c in a subsequent commit, and since we're
going to the trouble of doing that let's leave behind an assert() to
promise this to any future callers.
For "builtin/grep.c" it would be painful to pass the "prefix" down the
callchain of:
cmd_grep -> grep_tree -> grep_submodule -> grep_cache -> grep_oid ->
grep_source_name
So for the code that needs it in grep_source_name() let's add a
"grep_prefix" variable similar to the existing "ls_tree_prefix".
While at it let's move the code in cmd_ls_tree() around so that we
assign to the "ls_tree_prefix" right after declaring the variables,
and stop assigning to "prefix". We only subsequently used that
variable later in the function after clobbering it. Let's just use our
own "grep_prefix" instead.
Let's also add an assert() in git.c, so that we'll make this promise
about the "prefix" to any current and future callers, as well as to
any readers of the code.
Code history:
* The strlen() in "grep.c" hasn't been used since 493b7a08d80 (grep:
accept relative paths outside current working directory, 2009-09-05).
When that code was added in 0d042fecf2f (git-grep: show pathnames
relative to the current directory, 2006-08-11) we used the length.
But since 493b7a08d80 we haven't used it for anything except a
boolean check that we could have done on the "prefix" member
itself.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up.
* rs/grep-expr-cleanup:
grep: use grep_and_expr() in compile_pattern_and()
grep: extract grep_binexp() from grep_or_expr()
grep: use grep_not_expr() in compile_pattern_not()
grep: use grep_or_expr() in compile_pattern_or()
|
|
The color palette used by "git grep" has been updated to match that
of GNU grep.
* lh/use-gnu-color-in-grep:
grep: align default colors with GNU grep ones
|
|
In a similar spirit as a previous commit, use the new `grep_and_expr()`
to construct the AND node in `compile_pattern_and()`.
Unlike the aforementioned previous commit, this is not about code
duplication, since this is the only spot in grep.c where an AND node is
constructed. Rather, this is about visual consistency with the other
`compile_pattern_xyz()` functions.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When constructing an OR node, the grep.c code uses `grep_or_expr()` to
make a node, assign its kind, and set its left and right children. The
same is not done for AND nodes.
Prepare to introduce a new `grep_and_expr()` function which will share
code with the existing implementation of `grep_or_expr()` by introducing
a new function which compiles either kind of binary expression, and
reimplement `grep_or_expr()` in terms of it.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move the definition of grep_not_expr() up and use this function in
compile_pattern_not() to simplify the code and reduce duplication.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move the definition of grep_or_expr() up and use this function in
compile_pattern_or() to reduce code duplication.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git grep --perl-regexp" failed to match UTF-8 characters with
wildcard when the pattern consists only of ASCII letters, which has
been corrected.
* rs/pcre2-utf:
grep/pcre2: factor out literal variable
grep/pcre2: use PCRE2_UTF even with ASCII patterns
|
|
git-grep shares a lot of options with the standard grep tool.
Like GNU grep, it has coloring options to highlight the matching text.
And like it, it has options to customize the various colored parts.
This patch updates the default git-grep colors to make them match the
GNU grep default ones [1].
It was possible to get the same result by setting the various `color.grep.<slot>`
options, but this patch makes `git grep --color` share the same color scheme as
`grep --color` by default without any user configuration.
[1] https://www.man7.org/linux/man-pages/man1/grep.1.html#ENVIRONMENT
Signed-off-by: Lénaïc Huard <lenaic@lhuard.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Patterns that contain no wildcards and don't have to be case-folded are
literal. Give this condition a name to increase the readability of the
boolean expression for enabling the option PCRE2_UTF.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
compile_pcre2_pattern() currently uses the option PCRE2_UTF only for
patterns with non-ASCII characters. Patterns with ASCII wildcards can
match non-ASCII strings, though. Without that option PCRE2 mishandles
UTF-8 input, though -- it matches parts of multi-byte characters. Fix
that by using PCRE2_UTF even for ASCII-only patterns.
This is a remake of the reverted ae39ba431a (grep/pcre2: fix an edge
case concerning ascii patterns and UTF-8 data, 2021-10-15). The change
to the condition and the test are simplified and more targeted.
Original-patch-by: Hamza Mahfooz <someguy@effective-light.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The option --invert-grep is documented to filter out commits whose
messages match the --grep filters. However, it also affects the
header matches (--author, --committer), which is not intended.
Move the handling of that option to grep.c, as only the code there can
distinguish between matches in the header from those in the message
body. If --invert-grep is given then enable extended expressions (not
the regex type, we just need git grep's --not to work), negate the body
patterns and check if any of them match by piggy-backing on the
collect_hits mechanism of grep_source_1().
Collecting the matches in struct grep_opt is a bit iffy, but with
"last_shown" we have a precedent for writing state information to that
struct.
Reported-by: Dotan Cohen <dotancohen@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This reverts commit ae39ba431ab861548eb60b4bd2e1d8b8813db76f, as it
breaks "grep" when looking for a string in non UTF-8 haystack, when
linked with certain versions of PCREv2 library.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If we attempt to grep non-ascii log message text with an ascii pattern, we
run into the following issue:
$ git log --color --author='.var.*Bjar' -1 origin/master | grep ^Author
grep: (standard input): binary file matches
So, to fix this teach the grep code to use PCRE2_UTF, as long as the log
output is encoded in UTF-8.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
These changes are made in preparation of, the colorization support for the
"git log" subcommands that, rely on regex functionality (i.e. "--author",
"--committer" and "--grep"). These changes are necessary primarily because
match_one_pattern() expects header lines to be prefixed, however, in
pretty, the prefixes are stripped from the lines because the name-email
pairs need to go through additional parsing, before they can be printed and
because next_match() doesn't handle the case of
"ctx == GREP_CONTEXT_HEAD" at all. So, teach next_match() how to handle the
new case and move match_one_pattern()'s core logic to
headerless_match_one_pattern() while preserving match_one_pattern()'s uses
that depend on the additional processing.
Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Our grep_buffer() function takes a non-const buffer, which is confusing:
we don't take ownership of nor write to the buffer.
This mostly comes from the fact that the underlying grep_source struct
in which we store the buffer uses non-const pointer. The memory pointed
to by the struct is sometimes owned by us (for FILE or OID sources), and
sometimes not (for BUF sources).
Let's store it as const, which lets us err on the side of caution (i.e.,
the compiler will warn us if any of our code writes to or tries to free
it).
As a result, we must annotate the one place where we do free it by
casting away the constness. But that's a small price to pay for the
extra safety and clarity elsewhere (and indeed, it already had a comment
explaining why GREP_SOURCE_BUF _didn't_ free it).
And then we can mark grep_buffer() as taking a const buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When we're grepping in a buffer, we don't need to modify it. So we can
take "const char *" buffers, rather than "char *". This can avoid some
awkward casts in our callers, and make our expectations more clear (we
will not take ownership of the memory, nor will we ever write to it).
These spots don't all necessarily have to be converted at the same time,
but some of them are dependent on others, because we pass
pointers-to-pointers in a few cases. And none of this should change any
behavior, since we're just adding "const" qualifiers (and likewise, the
compiler will let us know if we missed any spots). So it's relatively
low-risk to just do this all at once.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We find the end of each matching line of a buffer, and then temporarily
write a NUL to turn it into a regular C string. But we don't need to do
so, because the only thing we do in the interim is pass the line and its
length (via an "eol" pointer) to match_line(). And that function should
only look at the bytes we passed it, whether it has a terminating NUL or
not.
We can drop this temporary write in order to simplify the code and make
it easier to use const buffers in more of grep.c.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When showing lines via grep (or looking for funcnames), we call
show_line() on a multi-line buffer. It finds the end of line and marks
it with a NUL. However, we don't need to do so, as the resulting line is
only used along with its "eol" marker:
- we pass both to next_match(), which takes care to look at only the
bytes we specified
- we pass the line to output_color() without its matching eol marker.
However, we do use the "match" struct we got from next_match() to
tell it how many bytes to look at (which can never exceed the string
we passed it).
So we can stop setting and restoring this NUL marker. That makes the
code simpler, and will allow us to take a const buffer in a future
patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When grepping for headers in commit objects, we receive individual
lines (e.g., "author Name <email> 1234 -0000"), and then strip off the
timestamp to do our match. We do so by writing a NUL byte over the
whitespace separator, and then remembering to restore it later.
We had to do it this way when this was added back in a4d7d2c6db (log
--author/--committer: really match only with name part, 2008-09-04),
because we fed the result directly to regexec(), which expects a
NUL-terminated string. But since b7d36ffca0 (regex: use regexec_buf(),
2016-09-21), we have a function which can match part of a buffer.
So instead of modifying the string, we can instead just move the "eol"
pointer, and the rest of the code will do the right thing. This will let
further patches mark more buffers as "const".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Record the repository whenever an OID grep source is created, and teach
the worker threads to explicitly provide the repository when accessing
objects.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
grep_source_init() can create "struct grep_source" objects and,
depending on the value of the type passed, some void-pointer parameters have
different meanings. Because one of these types (GREP_SOURCE_OID) will
require an additional parameter in a subsequent patch, take the
opportunity to increase clarity and type safety by replacing this
function with individual functions for each type.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git grep --and -e foo" ought to have been diagnosed as an error
but instead segfaulted, which has been corrected.
* rs/grep-parser-fix:
grep: report missing left operand of --and
|
|
Git grep allows combining two patterns with --and. It checks and
reports if the second pattern is missing when compiling the expression.
A missing first pattern, however, is only reported later at match time.
Thus no error is returned if no matching is done, e.g. because no file
matches the also given pathspec.
When that happens we get an expression tree with an GREP_NODE_AND node
and a NULL pointer to the missing left child. free_pattern_expr()
tries to dereference it during the cleanup at the end, which results
in a segmentation fault.
Fix this by verifying the presence of the left operand at expression
compilation time.
Reported-by: Matthew Hughes <matthewhughes934@gmail.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Up until recently, object IDs did not have an algorithm member, only a
hash. Consequently, it was possible to share one null (all-zeros)
object ID among all hash algorithms. Now that we're going to be
handling objects from multiple hash algorithms, it's important to make
sure that all object IDs have a correct algorithm field.
Introduce a per-algorithm null OID, and add it to struct hash_algo.
Introduce a wrapper function as well, and use it everywhere we used to
use the null_oid constant.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Updates to memory allocation code around the use of pcre2 library.
* ab/grep-pcre2-allocfix:
grep/pcre2: move definitions of pcre2_{malloc,free}
grep/pcre2: move back to thread-only PCREv2 structures
grep/pcre2: actually make pcre2 use custom allocator
grep/pcre2: use pcre2_maketables_free() function
grep/pcre2: use compile-time PCREv2 version test
grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode
grep/pcre2: prepare to add debugging to pcre2_malloc()
grep/pcre2: correct reference to grep_init() in comment
grep/pcre2: drop needless assignment to NULL
grep/pcre2: drop needless assignment + assert() on opt->pcre2
|
|
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead. It shortens the code and infers the
element size automatically.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move the definitions of the pcre2_{malloc,free} functions above the
compile_pcre2_pattern() function they're used in.
Before the preceding commit they used to be needed earlier, but now we
can move them to be adjacent to the other PCREv2 functions.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change the setup of the "pcre2_general_context" to happen per-thread
in compile_pcre2_pattern() instead of in grep_init().
This change brings it in line with how the rest of the pcre2_* members
in the grep_pat structure are set up.
As noted in the preceding commit the approach 513f2b0bbd4 (grep: make
PCRE2 aware of custom allocator, 2019-10-16) took to allocate the
pcre2_general_context seems to have been initially based on a
misunderstanding of how PCREv2 memory allocation works.
The approach of creating a global context in grep_init() is just added
complexity for almost zero gain. On my system it's 24 bytes saved
per-thread. For comparison PCREv2 will then go on to allocate at least
a kilobyte for its own thread-local state.
As noted in 6d423dd542f (grep: don't redundantly compile throwaway
patterns under threading, 2017-05-25) the grep code is intentionally
not trying to micro-optimize allocations by e.g. sharing some PCREv2
structures globally, while making others thread-local.
So let's remove this special case and make all of them thread-local
again for simplicity. With this change we could move the
pcre2_{malloc,free} functions around to live closer to their current
use. I'm not doing that here to keep this change small, that cleanup
will be done in a follow-up commit.
See also the discussion in 94da9193a6 (grep: add support for PCRE v2,
2017-06-01) about thread safety, and Johannes's comments[1] to the
effect that we should be doing what this patch is doing.
1. https://lore.kernel.org/git/nycvar.QRO.7.76.6.1908052120302.46@tvgsbejvaqbjf.bet/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Continue work started in 513f2b0bbd4 (grep: make PCRE2 aware of custom
allocator, 2019-10-16) and make PCREv2 use our pcre2_{malloc,free}().
functions for allocation. We'll now use it for all PCREv2 allocations.
The reason 513f2b0bbd4 worked as a bugfix for the USE_NED_ALLOCATOR
issue is because it targeted the allocation freed via free(), as
opposed to by a pcre2_*free() function. I.e. the pcre2_maketables()
and pcre2_maketables_free() pair.
For most of the rest we continued allocating with stock malloc()
inside PCREv2 itself, but didn't segfault because we'd use its
corresponding free().
In a preceding commit of mine I changed the free() to
pcre2_maketables_free() on versions of PCREv2 10.34 and newer. So as
far as fixing the segfault goes we could revert 513f2b0bbd4. But then
we wouldn't use the desired allocator, let's just use it instead.
Before this patch we'd on e.g.:
grep --threads=1 -iP æ.*var.*xyz
Only use pcre2_{malloc,free}() for 2 malloc() calls and 2
corresponding free() calls. Now it's 12 calls to each. This can be
observed with the GREP_PCRE2_DEBUG_MALLOC debug mode.
Reading the history of how this bug got introduced it wasn't present
in Johannes's original patch[1] to fix the issue.
My reading of that thread is that the approach the follow-up patches
to Johannes's original pursued were based on misunderstanding of how
the PCREv2 API works. In particular this part of [2]:
"most of the time (like when using UTF-8) the chartable (and
therefore the global context) is not needed (even when using
alternate allocators)"
That's simply not how PCREv2 memory allocation works. It's easy to see
how the misunderstanding came about. It's because (as noted above) the
issue was noticed because of our use of free() in our own grep.c for
freeing the memory allocated by pcre2_maketables().
Thus the misunderstanding that PCREv2's compile context is something
only needed for pcre2_maketables(), and e.g. an aborted earlier
attempt[3] to only set it up when we ourselves called
pcre2_maketables().
That's not what PCREv2's compile context is. To quote PCREv2's
documentation:
"This context just contains pointers to (and data for) external
memory management functions that are called from several places in
the PCRE2 library."
Thus the failed attempts to go down the route of only creating the
general context in cases where we ourselves call pcre2_maketables(),
before finally settling on the approach 513f2b0bbd4 took of always
creating it, but then mostly not using it.
Instead we should always create it, and then pass the general context
to those functions that accept it, so that they'll consistently use
our preferred memory allocation functions.
1. https://lore.kernel.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/
2. https://lore.kernel.org/git/CAPUEsphMh_ZqcH3M7PXC9jHTfEdQN3mhTAK2JDkdvKBp53YBoA@mail.gmail.com/
3. https://lore.kernel.org/git/20190806085014.47776-3-carenas@gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Make use of the pcre2_maketables_free() function to free the memory
allocated by pcre2_maketables().
At first sight it's strange that 10da030ab75 (grep: avoid leak of
chartables in PCRE2, 2019-10-16) which added the free() call here
doesn't make use of the pcre2_free() the author introduced in the
preceding commit in 513f2b0bbd4 (grep: make PCRE2 aware of custom
allocator, 2019-10-16).
The reason is that at the time the function didn't exist. It was first
introduced in PCREv2 version 10.34, released on 2019-11-21.
Let's make use of it behind a macro. I don't think this matters for
anything to do with custom allocators, but it makes our use of PCREv2
more discoverable.
At some distant point in the future we'll be able to drop the version
guard, as nobody will be running a version older than 10.34.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Replace a use of pcre2_config(PCRE2_CONFIG_VERSION, ...) which I added
in 95ca1f987ed (grep/pcre2: better support invalid UTF-8 haystacks,
2021-01-24) with the same test done at compile-time.
It might be cuter to do this at runtime since we don't have to do the
"major >= 11 || (major >= 10 && ...)" test. But in the next commit
we'll add another version comparison that absolutely needs to be done
at compile-time, so we're better of being consistent across the board.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add optional printing of PCREv2 allocations to stderr for a developer
who manually changes the GREP_PCRE2_DEBUG_MALLOC definition to "1".
You need to manually change the definition in the source file similar
to the DEBUG_MAILMAP, there's no Makefile knob for this.
This will be referenced a subsequent commit, and is generally useful
to manually see what's going on with PCREv2 allocations while working
on that code.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change pcre2_malloc() in a way that'll make it easier for a debugging
fprintf() to spew out the allocated pointer.
This doesn't introduce any functional change, it just makes a
subsequent commit's diff easier to read. Changes code added in
513f2b0bbd4 (grep: make PCRE2 aware of custom allocator, 2019-10-16).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Correct a comment added in 513f2b0bbd4 (grep: make PCRE2 aware of
custom allocator, 2019-10-16). This comment was never correct in
git.git, but was consistent with an older version of the patch[1].
1. https://lore.kernel.org/git/20190806163658.66932-3-carenas@gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove a redundant assignment of pcre2_compile_context dating back to
my 94da9193a6e (grep: add support for PCRE v2, 2017-06-01).
In create_grep_pat() we xcalloc() the "grep_pat" struct, so there's no
need to NULL out individual members here.
I think this was probably something left over from an earlier
development version of mine.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Drop an assignment added in b65abcafc7a (grep: use PCRE v2 for
optimized fixed-string search, 2019-07-01) and the overly cautious
assert() I added in 94da9193a6e (grep: add support for PCRE v2,
2017-06-01).
There was never a good reason for this, it's just a relic from when I
initially wrote the PCREv2 support. We're not going to have confusion
about compile_pcre2_pattern() being called when it shouldn't just
because we forgot to cargo-cult this opt->pcre2 option.
Furthermore the "struct grep_opt" is (mostly) used for the options the
user supplied, let's avoid the pattern of needlessly assigning to it.
With my recent removal of the PCREv1 backend in 7599730b7e2 (Remove
support for v1 of the PCRE library, 2021-01-24) there's even less
confusion around what we call where in these codepaths, which is one
more reason to remove this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Update support for invalid UTF-8 in PCRE2.
* ab/grep-pcre-invalid-utf8:
grep/pcre2: better support invalid UTF-8 haystacks
grep/pcre2 tests: don't rely on invalid UTF-8 data test
|
|
The support for deprecated PCRE1 library has been dropped.
* ab/retire-pcre1:
Remove support for v1 of the PCRE library
config.mak.uname: remove redundant NO_LIBPCRE1_JIT flag
|
|
Remove the hidden "grep --debug" and "log --grep-debug" options added
in 17bf35a3c7b (grep: teach --debug option to dump the parse tree,
2012-09-13).
At the time these options seem to have been intended to go along with
a documentation discussion and to help the author of relevant tests to
perform ad-hoc debugging on them[1].
Reasons to want this gone:
1. They were never documented, and the only (rather trivial) use of
them in our own codebase for testing is something I removed back
in e01b4dab01e (grep: change non-ASCII -i test to stop using
--debug, 2017-05-20).
2. Googling around doesn't show any in-the-wild uses I could dig up,
and on the Git ML the only mentions after the original discussion
seem to have been when they came up in unrelated diff contexts, or
that test commit of mine.
3. An exception to that is c581e4a7499 (grep: under --debug, show
whether PCRE JIT is enabled, 2019-08-18) where we added the
ability to dump out when PCREv2 has the JIT in effect.
The combination of that and my earlier b65abcafc7a (grep: use PCRE
v2 for optimized fixed-string search, 2019-07-01) means Git prints
this out in its most common in-the-wild configuration:
$ git log --grep-debug --grep=foo --grep=bar --grep=baz --all-match
pcre2_jit_on=1
pcre2_jit_on=1
pcre2_jit_on=1
[all-match]
(or
pattern_body<body>foo
(or
pattern_body<body>bar
pattern_body<body>baz
)
)
$ git grep --debug \( -e foo --and -e bar \) --or -e baz
pcre2_jit_on=1
pcre2_jit_on=1
pcre2_jit_on=1
(or
(and
patternfoo
patternbar
)
patternbaz
)
I.e. for each pattern we're considering for the and/or/--all-match
etc. debugging we'll now diligently spew out another identical line
saying whether the PCREv2 JIT is on or not.
I think that nobody's complained about that rather glaringly obviously
bad output says something about how much this is used, i.e. it's
not.
The need for this debugging aid for the composed grep/log patterns
seems to have passed, and the desire to dump the JIT config seems to
have been another one-off around the time we had JIT-related issues on
the PCREv2 codepath. That the original author of this debugging
facility seemingly hasn't noticed the bad output since then[2] is
probably some indicator.
1. https://lore.kernel.org/git/cover.1347615361.git.git@drmicha.warpmail.net/
2. https://lore.kernel.org/git/xmqqk1b8x0ac.fsf@gitster-ct.c.googlers.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Improve the support for invalid UTF-8 haystacks given a non-ASCII
needle when using the PCREv2 backend.
This is a more complete fix for a bug I started to fix in
870eea8166 (grep: do not enter PCRE2_UTF mode on fixed matching,
2019-07-26), now that PCREv2 has the PCRE2_MATCH_INVALID_UTF mode we
can make use of it.
This fixes the sort of case described in 8a5999838e (grep: stess test
PCRE v2 on invalid UTF-8 data, 2019-07-26), i.e.:
- The subject string is non-ASCII (e.g. "ævar")
- We're under a is_utf8_locale(), e.g. "en_US.UTF-8", not "C"
- We are using --ignore-case, or we're a non-fixed pattern
If those conditions were satisfied and we matched found non-valid
UTF-8 data PCREv2 might bark on it, in practice this only happened
under the JIT backend (turned on by default on most platforms).
Ultimately this fixes a "regression" in b65abcafc7 ("grep: use PCRE v2
for optimized fixed-string search", 2019-07-01), I'm putting that in
scare-quotes because before then we wouldn't properly support these
complex case-folding, locale etc. cases either, it just broke in
different ways.
There was a bug related to this the PCRE2_NO_START_OPTIMIZE flag fixed
in PCREv2 10.36. It can be worked around by setting the
PCRE2_NO_START_OPTIMIZE flag. Let's do that in those cases, and add
tests for the bug.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove support for using version 1 of the PCRE library. Its use has
been discouraged by upstream for a long time, and it's in a
bugfix-only state.
Anyone who was relying on v1 in particular got a nudge to move to v2
in e6c531b808 (Makefile: make USE_LIBPCRE=YesPlease mean v2, not v1,
2018-03-11), which was first released as part of v2.18.0.
With this the LIBPCRE2 test prerequisites is redundant to PCRE. But
I'm keeping it for self-documentation purposes, and to avoid conflict
with other in-flight PCRE patches.
I'm also not changing all of our own "pcre2" names to "pcre", i.e. the
inverse of 6d4b5747f0 (grep: change internal *pcre* variable &
function names to be *pcre1*, 2017-05-25). I don't see the point, and
it makes the history/blame harder to read. Maybe if there's ever a
PCRE v3...
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We have a `struct grep_opt` with our defaults which we then copy into
the caller's struct. Rather than zeroing the target struct and copying
each element one by one, just copy everything at once. This leaves the
code simpler and more maintainable.
We don't have any ownership issues with what we're copying now and can
just greedily copy the whole thing. If and when we do need to handle
such elements (`char *`?), we must and can handle it appropriately. Make
sure to leave a comment to our future selves.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 15fabd1bbd ("builtin/grep.c: make configuration callback more
reusable", 2012-10-09), we learned to fill a `static struct grep_opt
grep_defaults` which we can use as a blueprint for other such structs.
At the time, we didn't consider designated initializers to be widely
useable, but these days, we do. (See, e.g., cbc0f81d96 ("strbuf: use
designated initializers in STRBUF_INIT", 2017-07-10).)
Use designated initializers to let the compiler set up the struct and so
that we don't need to remember to call `init_grep_defaults()`.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
`init_grep_defaults()` fills a `static struct grep_opt grep_defaults`.
This struct is then used by `grep_init()` as a blueprint for other such
structs. Notably, `grep_init()` takes a `struct repo *` and assigns it
into the target struct.
As a result, it is unnecessary for us to take a `struct repo *` in
`init_grep_defaults()` as well. We assign it into the default struct and
never look at it again. And in light of how we return early if we have
already set up the default struct, it's not just unnecessary, but is
also a bit confusing: If we are called twice and with different repos,
is it a bug or a feature that we ignore the second repo?
Drop the repo parameter for `init_grep_defaults()`.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This commit fixes a couple of minor spelling mistakes inside
comments.
Signed-off-by: Steve Kemp <steve@steve.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
git-grep uses 'grep_read_mutex' to protect its calls to object reading
operations. But these have their own internal lock now, which ensures a
better performance (allowing parallel access to more regions). So, let's
remove the former and, instead, activate the latter with
enable_obj_read_lock().
Sections that are currently protected by 'grep_read_mutex' but are not
internally protected by the object reading lock should be surrounded by
obj_read_lock() and obj_read_unlock(). These guarantee mutual exclusion
with object reading operations, keeping the current behavior and
avoiding race conditions. Namely, these places are:
In grep.c:
- fill_textconv() at fill_textconv_grep().
- userdiff_get_textconv() at grep_source_1().
In builtin/grep.c:
- parse_object_or_die() and the submodule functions at
grep_submodule().
- deref_tag() and gitmodules_config_oid() at grep_objects().
If these functions become thread-safe, in the future, we might remove
the locking and probably get some speedup.
Note that some of the submodule functions will already be thread-safe
(or close to being thread-safe) with the internal object reading lock.
However, as some of them will require additional modifications to be
removed from the critical section, this will be done in its own patch.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
git-grep uses an internal grep_read_mutex to protect object reading
operations. Similarly, there's a grep_attr_mutex to protect calls to the
gitattributes machinery. However, two of the three functions protected
by the last mutex may also perform object reading, as seen below:
- userdiff_get_textconv() > notes_cache_init() >
notes_cache_match_validity() > lookup_commit_reference_gently() >
parse_object() > repo_has_object_file() >
repo_has_object_file_with_flags() > oid_object_info_extended()
- userdiff_find_by_path() > git_check_attr() > collect_some_attrs() >
prepare_attr_stack() > read_attr() > read_attr_from_index() >
read_blob_data_from_index() > read_object_file()
As these calls are not protected by grep_read_mutex, there might be race
conditions with other threads performing object reading (e.g. threads
calling fill_textconv() at grep.c:fill_textconv_grep()). To prevent
that, let's make sure to acquire the lock before both of these calls.
Note: this patch might slow down the threaded grep in worktree, for the
sake of thread-safeness. However, in the following patches, we should
regain performance by replacing grep_read_mutex for an internal object
reading lock and allowing parallel inflation during object reading.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Previously, the void pcre2_free() function in grep.c returned free().
While free() itself is void, afaict it's still an expression as per
section A.2.3, subsection 6.8.6 (jump-statement) in both C99 [1] and C11
[2]:
> return expression
Section 6.8.6.4 in C99 [1] and C11 [2] says that:
> A return statement with an expression shall not appear in a function
> whose return type is void.
The consequence of the old behavior was that developer builds with
pedantic errors enabled broke Git if PCRE2 was enabled and a
smart-enough compiler to detect these errors was used. This commit
fixes pedantic builds of Git that enables --with-libpcre.
[1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
[2] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
Signed-off-by: Hans Jerry Illikainen <hji@dyntopia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Leakfix.
* cb/pcre2-chartables-leakfix:
grep: avoid leak of chartables in PCRE2
grep: make PCRE2 aware of custom allocator
grep: make PCRE1 aware of custom allocator
|
|
94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) introduced
a small memory leak visible with valgrind in t7813.
Complete the creation of a PCRE2 specific variable that was missing from
the original change and free the generated table just like it is done
for PCRE1.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
94da9193a6 (grep: add support for PCRE v2, 2017-06-01) didn't include
a way to override the system allocator, and so it is incompatible with
custom allocators (e.g. nedmalloc). This problem became obvious when we
tried to plug a memory leak by `free()`ing a data structure allocated by
PCRE2, triggering a segfault in Windows (where we use nedmalloc by
default).
PCRE2 requires the use of a general context to override the allocator
and therefore, there is a lot more code needed than in PCRE1, including
a couple of wrapper functions.
Extend the grep API with a "destructor" that could be called to cleanup
any objects that were created and used globally.
Update `builtin/grep.c` to use that new API, but any other future users
should make sure to have matching `grep_init()`/`grep_destroy()` calls
if they are using the pattern matching functionality.
Move some of the logic that was before done per thread (in the workers)
into an earlier phase to avoid degrading performance, but as the use of
PCRE2 with custom allocators is better understood it is expected more of
its functions will be instructed to use the custom allocator as well as
was done in the original code[1] this work was based on.
[1] https://public-inbox.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/
Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) didn't include a way
to override the system alocator, and so it is incompatible with
USE_NED_ALLOCATOR as reported by Dscho[1] (in similar code from PCRE2)
Note that nedmalloc, as well as other custom allocators like jemalloc
and mi-malloc, can be configured at runtime (via `LD_PRELOAD`),
therefore we cannot know at compile time whether a custom allocator is
used or not.
Make the minimum change possible to ensure this combination is supported
by extending `grep_init()` to set the PCRE1 specific functions to Git's
idea of `malloc()` and `free()` and therefore making sure all
allocations are done inside PCRE1 with the same allocator than the rest
of Git.
This change has negligible performance impact: PCRE needs to allocate
memory once per program run for the character table and for each pattern
compilation. These are both rare events compared to matching patterns
against lines. Actual measurements[2] show that the impact is lost in
the noise.
[1] https://public-inbox.org/git/pull.306.git.gitgitgadget@gmail.com
[2] https://public-inbox.org/git/7f42007f-911b-c570-17f6-1c6af0429586@web.de
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
PCRE fixes.
* cb/pcre1-cleanup:
grep: refactor and simplify PCRE1 support
grep: make sure NO_LIBPCRE1_JIT disable JIT in PCRE1
|
|
A few simplification and bugfixes to PCRE interface.
* ab/pcre-jit-fixes:
grep: under --debug, show whether PCRE JIT is enabled
grep: do not enter PCRE2_UTF mode on fixed matching
grep: stess test PCRE v2 on invalid UTF-8 data
grep: create a "is_fixed" member in "grep_pat"
grep: consistently use "p->fixed" in compile_regexp()
grep: stop using a custom JIT stack with PCRE v1
grep: stop "using" a custom JIT stack with PCRE v2
grep: remove overly paranoid BUG(...) code
grep: use PCRE v2 for optimized fixed-string search
grep: remove the kwset optimization
grep: drop support for \0 in --fixed-strings <pattern>
grep: make the behavior for NUL-byte in patterns sane
grep tests: move binary pattern tests into their own file
grep tests: move "grep binary" alongside the rest
grep: inline the return value of a function call used only once
t4210: skip more command-line encoding tests on MinGW
grep: don't use PCRE2?_UTF8 with "log --encoding=<non-utf8>"
log tests: test regex backends in "--encode=<enc>" tests
|
|
Make sure the grep machinery does not abort when seeing a payload
that is not UTF-8 even when JIT is not in use with PCRE1.
* cb/skip-utf8-check-with-pcre1:
grep: skip UTF8 checks explicitly
|
|
18547aacf5 ("grep/pcre: support utf-8", 2016-06-25) that was released
with git 2.10 added the PCRE_UTF8 flag to PCRE1 matching including a
call to has_non_ascii() to try to avoid breakage if there was non-utf8
encoded content in the haystack.
Usually PCRE is compiled with JIT support (even if is not the default),
and therefore the codepath used includes calling pcre_jit_exec, which
skips UTF-8 validation by design (which might result in crashes or hangs)
but when JIT support wasn't compiled we use pcre_exec instead with the
posibility that grep might be aborted if invalid UTF-8 is found in the
haystack.
PCRE1 provides a flag since Mar 5, 2007 that could be used to skip the
checks explicitly so use that to make both codepaths equivalent (the
flag is ignored by pcre1_jit_exec)
this fix is only implemented for PCRE1 because PCRE2 is likely to have
a better solution (without the risks) instead in the future
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The code used both a macro and a variable to keep track if JIT
support was desired and relied on the fact that a non JIT
enabled library will ignore a request for JIT compilation
(as defined by the second parameter of the call to pcre_study)
Cleanup the multiple levels of macros used and call pcre_study
with the right parameter after JIT support has been confirmed
and unless it was requested to be disabled with NO_LIBPCRE1_JIT
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This information is useful and not visible anywhere else, so show it.
Signed-off-by: Beat Bolli <dev+git@drbeat.li>
Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
BUG() message fix.
The codepath may want to just simply be removed, though.
* bb/grep-pcre2-bug-message-fix:
grep: print the pcre2_jit_on value
|
|
As discussed in the last commit partially fix a bug introduced in
b65abcafc7 ("grep: use PCRE v2 for optimized fixed-string search",
2019-07-01). Because PCRE v2, unlike kwset, validates its UTF-8 input
we'd die on e.g.:
fatal: pcre2_match failed with error code -22: UTF-8 error:
isolated byte with 0x80 bit set
When grepping a non-ASCII fixed string. This is a more general problem
that's hard to fix, but we can at least fix the most common case of
grepping for a fixed string without "-i". I can't think of a reason
for why we'd turn on PCRE2_UTF when matching byte-for-byte like that.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since my b65abcafc7 ("grep: use PCRE v2 for optimized fixed-string
search", 2019-07-01) we've been dying on invalid UTF-8 data when
grepping for fixed strings if the following are all true:
* The subject string is non-ASCII (e.g. "ævar")
* We're under a is_utf8_locale(), e.g. "en_US.UTF-8", not "C"
* We compiled with PCRE v2
* That PCRE v2 did not have JIT support
The last of those is why this wasn't caught earlier, per pcre2jit(3):
"unless PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested
for validity. In the interests of speed, these checks do not
happen on the JIT fast path, and if invalid data is passed, the
result is undefined."
I.e. the subject being matched against our pattern was invalid, but we
were lucky and getting away with it on the JIT path, but the non-JIT
one is stricter.
This patch does nothing to fix that, instead we sneak in support for
fixed patterns starting with "(*NO_JIT)", this disables the PCRE v2
jit with implicit fixed-string matching for testing, see
pcre2syntax(3) the syntax.
This is technically a change in behavior, but it's so obscure that I
figured it was OK. We'd previously consider this an invalid regular
expression as regcomp() would die on it, now we feed it to the PCRE v2
fixed-string path. I thought this was better than introducing yet
another GIT_TEST_* environment variable.
We're also relying on a behavior of PCRE v2 that technically could
change, but I think the test coverage is worth dipping our toe into
some somewhat undefined behavior.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This change paves the way for later using this value the regex compile
functions themselves.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
At the start of this function we do:
p->fixed = opt->fixed;
It's less confusing to use that variable consistently that switch back
& forth between the two.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Simplify the PCRE v1 code for the same reasons as for the PCRE v2 code
in the last commit. Unlike with v2 we actually used the custom stack
in v1, but let's use PCRE's built-in 32 KB one instead, since
experience with v2 shows that's enough. Most distros are already using
v2 as a default, and the underlying sljit code is the same.
Unfortunately we can't just pass a NULL to pcre_jit_exec() as with
pcre2_jit_match(). Unlike the v2 function it doesn't support
that. Instead we need to use the fatter pcre_exec() if we'd like the
same behavior.
This will make things slightly slower than on the fast-path function,
but it's OK since we care less about v1 performance these days since
we have and recommend v2. Running a similar performance test as what I
ran in fbaceaac47 ("grep: add support for the PCRE v1 JIT API",
2017-05-25) via:
GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE1=Y CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst' ./run HEAD~ HEAD p7820-grep-engines.sh
Gives us this, just the /perl/ results:
Test HEAD~ HEAD
---------------------------------------------------------------------------------------
7820.3: perl grep 'how.to' 0.19(0.67+0.52) 0.19(0.65+0.52) +0.0%
7820.7: perl grep '^how to' 0.19(0.78+0.44) 0.19(0.72+0.49) +0.0%
7820.11: perl grep '[how] to' 0.39(2.13+0.43) 0.40(2.10+0.46) +2.6%
7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 0.44(2.55+0.37) 0.45(2.47+0.41) +2.3%
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.23(1.06+0.42) 0.22(1.03+0.43) -4.3%
It will also implicitly re-enable UTF-8 validation for PCRE v1. As
noted in [1] we now have cases as a result where PCRE v1 is more eager
to error out. Subsequent patches will fix that for v2, and I think
it's fair to tell v1 users "just upgrade" and not worry about that
edge case for v1.
1. https://public-inbox.org/git/CAPUEsphZJ_Uv9o1-yDpjNLA_q-f7gWXz9g1gCY2pYAYN8ri40g@mail.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
As reported in [1] the code I added in 94da9193a6 ("grep: add support
for PCRE v2", 2017-06-01) to use a custom JIT stack has never
worked. It was incorrectly copy/pasted from code I added in
fbaceaac47 ("grep: add support for the PCRE v1 JIT API", 2017-05-25),
which did work.
Thus our intention of starting with 1 byte of stack at a maximum of 1
MB didn't happen, we'd always use the 32 KB stack provided by PCRE
v2's jit_machine_stack_exec()[2]. The reason I allocated a custom
stack at all was this advice in pcrejit(3) (same in pcre2jit(3)):
"By default, it uses 32KiB on the machine stack. However, some
large or complicated patterns need more than this"
Since we've haven't had any reports of users running into
PCRE2_ERROR_JIT_STACKLIMIT in the wild I think we can safely assume
that we can just use the library defaults instead and drop this
code. This won't change with the wider use of PCRE v2 in
ed0479ce3d ("Merge branch 'ab/no-kwset' into next", 2019-07-15), a
fixed string search is not a "large or complicated pattern".
For good measure I ran the performance test noted in 94da9193a6,
although the command is simpler now due to my 0f50c8e32c ("Makefile:
remove the NO_R_TO_GCC_LINKER flag", 2019-05-17):
GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE2=Y CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst' ./run HEAD~ HEAD p7820-grep-engines.sh
Just the /perl/ results are:
Test HEAD~ HEAD
---------------------------------------------------------------------------------------
7820.3: perl grep 'how.to' 0.17(0.27+0.65) 0.17(0.24+0.68) +0.0%
7820.7: perl grep '^how to' 0.16(0.23+0.66) 0.16(0.23+0.67) +0.0%
7820.11: perl grep '[how] to' 0.18(0.35+0.62) 0.18(0.33+0.65) +0.0%
7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 0.17(0.45+0.54) 0.17(0.49+0.50) +0.0%
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.16(0.33+0.58) 0.16(0.29+0.62) +0.0%
So, as expected there's no change, and running with valgrind reveals
that we have fewer allocations now.
As noted in [3] there are known regexes that will fail with the lower
stack limit, the way GNU grep fixed it is interesting, although I
believe the implementation is overly verbose, they could make PCRE v2
handle that gradual re-allocation, that's what min/max memory is
for.
So we might end up bringing this back, I'm more inclined to just kick
such cases upstairs to PCRE maintainers as a bug, perhaps they'll add
some overall "just allocate more then" flag to make this easier. In
any case there's no functional change here, we didn't have a custom
stack, so let's apply this first, we can always revert it later.
1. https://public-inbox.org/git/20190721194052.15440-1-carenas@gmail.com/
2. I didn't really intend to start with 1 byte, looking at the PCRE v2
code again what happened is that I cargo-culted some of PCRE v2's
own test code which was meant to test re-allocations. It's more
sane to start with say 32 KB with a max of 1 MB, as pcre2grep.c
does.
3. https://public-inbox.org/git/CAPUEspjj+fG8QDmf=bZXktfpLgkgiu34HTjKLhm-cmEE04FE-A@mail.gmail.com/
Reported-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove code that would trigger if pcre_config() or pcre2_config() was
so broken that "do we have JIT?" wouldn't return a boolean.
I added this code back in fbaceaac47 ("grep: add support for the PCRE
v1 JIT API", 2017-05-25) and then as noted in f002532784 ("grep: print
the pcre2_jit_on value", 2019-07-22) incorrectly copy/pasted some of
it in 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01).
Let's just remove this code. Being this paranoid about the
pcre2?_config() function itself being broken is crossing the line into
unreasonable paranoia.
Reported-by: Beat Bolli <dev+git@drbeat.li>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When pcre2_jit_on is neither 1 nor 0, the BUG() call printed the value
of pcre1_jit_on.
Print the value of pcre2_jit_on instead.
Signed-off-by: Beat Bolli <dev+git@drbeat.li>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Bring back optimized fixed-string search for "grep", this time with
PCRE v2 as an optional backend. As noted in [1] with kwset we were
slower than PCRE v1 and v2 JIT with the kwset backend, so that
optimization was counterproductive.
This brings back the optimization for "--fixed-strings", without
changing the semantics of having a NUL-byte in patterns. As seen in
previous commits in this series we could support it now, but I'd
rather just leave that edge-case aside so we don't have one behavior
or the other depending what "--fixed-strings" backend we're using. It
makes the behavior harder to understand and document, and makes tests
for the different backends more painful.
This does change the behavior under non-C locales when "log"'s
"--encoding" option is used and the heystack/needle in the
content/command-line doesn't have a matching encoding. See the recent
change in "t4210: skip more command-line encoding tests on MinGW" in
this series. I think that's OK. We did nothing sensible before
then (just compared raw bytes that had no hope of matching). At least
now the user will get some idea why their grep/log never matches in
that edge case.
I could also support the PCRE v1 backend here, but that would make the
code more complex. I'd rather aim for simplicity here and in future
changes to the diffcore. We're not going to have someone who
absolutely must have faster search, but for whom building PCRE v2
isn't acceptable.
The difference between this series of commits and the current "master"
is, using the same t/perf commands shown in the last commit:
plain grep:
Test origin/master HEAD
-------------------------------------------------------------------------
7821.1: fixed grep int 0.55(1.67+0.56) 0.41(0.98+0.60) -25.5%
7821.2: basic grep int 0.58(1.65+0.52) 0.41(0.96+0.57) -29.3%
7821.3: extended grep int 0.57(1.66+0.49) 0.42(0.93+0.60) -26.3%
7821.4: perl grep int 0.54(1.67+0.50) 0.43(0.88+0.65) -20.4%
7821.6: fixed grep uncommon 0.21(0.52+0.42) 0.16(0.24+0.51) -23.8%
7821.7: basic grep uncommon 0.20(0.49+0.45) 0.17(0.28+0.47) -15.0%
7821.8: extended grep uncommon 0.20(0.54+0.39) 0.16(0.25+0.50) -20.0%
7821.9: perl grep uncommon 0.20(0.58+0.36) 0.16(0.23+0.50) -20.0%
7821.11: fixed grep æ 0.35(1.24+0.43) 0.16(0.23+0.50) -54.3%
7821.12: basic grep æ 0.36(1.29+0.38) 0.16(0.20+0.54) -55.6%
7821.13: extended grep æ 0.35(1.23+0.44) 0.16(0.24+0.50) -54.3%
7821.14: perl grep æ 0.35(1.33+0.34) 0.16(0.28+0.46) -54.3%
grep with -i:
Test origin/master HEAD
----------------------------------------------------------------------------
7821.1: fixed grep -i int 0.62(1.81+0.70) 0.47(1.11+0.64) -24.2%
7821.2: basic grep -i int 0.67(1.90+0.53) 0.46(1.07+0.62) -31.3%
7821.3: extended grep -i int 0.62(1.92+0.53) 0.53(1.12+0.58) -14.5%
7821.4: perl grep -i int 0.66(1.85+0.58) 0.45(1.10+0.59) -31.8%
7821.6: fixed grep -i uncommon 0.21(0.54+0.43) 0.17(0.20+0.55) -19.0%
7821.7: basic grep -i uncommon 0.20(0.52+0.45) 0.17(0.29+0.48) -15.0%
7821.8: extended grep -i uncommon 0.21(0.52+0.44) 0.17(0.26+0.50) -19.0%
7821.9: perl grep -i uncommon 0.21(0.53+0.44) 0.17(0.20+0.56) -19.0%
7821.11: fixed grep -i æ 0.26(0.79+0.44) 0.16(0.29+0.46) -38.5%
7821.12: basic grep -i æ 0.26(0.79+0.42) 0.16(0.20+0.54) -38.5%
7821.13: extended grep -i æ 0.26(0.84+0.39) 0.16(0.24+0.50) -38.5%
7821.14: perl grep -i æ 0.16(0.24+0.49) 0.17(0.25+0.51) +6.3%
plain log:
Test origin/master HEAD
--------------------------------------------------------------------------------
4221.1: fixed log --grep='int' 7.24(6.95+0.28) 7.20(6.95+0.18) -0.6%
4221.2: basic log --grep='int' 7.31(6.97+0.22) 7.20(6.93+0.21) -1.5%
4221.3: extended log --grep='int' 7.37(7.04+0.24) 7.22(6.91+0.25) -2.0%
4221.4: perl log --grep='int' 7.31(7.04+0.21) 7.19(6.89+0.21) -1.6%
4221.6: fixed log --grep='uncommon' 6.93(6.59+0.32) 7.04(6.66+0.37) +1.6%
4221.7: basic log --grep='uncommon' 6.92(6.58+0.29) 7.08(6.75+0.29) +2.3%
4221.8: extended log --grep='uncommon' 6.92(6.55+0.31) 7.00(6.68+0.31) +1.2%
4221.9: perl log --grep='uncommon' 7.03(6.59+0.33) 7.12(6.73+0.34) +1.3%
4221.11: fixed log --grep='æ' 7.41(7.08+0.28) 7.05(6.76+0.29) -4.9%
4221.12: basic log --grep='æ' 7.39(6.99+0.33) 7.00(6.68+0.25) -5.3%
4221.13: extended log --grep='æ' 7.34(7.00+0.25) 7.15(6.81+0.31) -2.6%
4221.14: perl log --grep='æ' 7.43(7.13+0.26) 7.01(6.60+0.36) -5.7%
log with -i:
Test origin/master HEAD
------------------------------------------------------------------------------------
4221.1: fixed log -i --grep='int' 7.31(7.07+0.24) 7.23(7.00+0.22) -1.1%
4221.2: basic log -i --grep='int' 7.40(7.08+0.28) 7.19(6.92+0.20) -2.8%
4221.3: extended log -i --grep='int' 7.43(7.13+0.25) 7.27(6.99+0.21) -2.2%
4221.4: perl log -i --grep='int' 7.34(7.10+0.24) 7.10(6.90+0.19) -3.3%
4221.6: fixed log -i --grep='uncommon' 7.07(6.71+0.32) 7.11(6.77+0.28) +0.6%
4221.7: basic log -i --grep='uncommon' 6.99(6.64+0.28) 7.12(6.69+0.38) +1.9%
4221.8: extended log -i --grep='uncommon' 7.11(6.74+0.32) 7.10(6.77+0.27) -0.1%
4221.9: perl log -i --grep='uncommon' 6.98(6.60+0.29) 7.05(6.64+0.34) +1.0%
4221.11: fixed log -i --grep='æ' 7.85(7.45+0.34) 7.03(6.68+0.32) -10.4%
4221.12: basic log -i --grep='æ' 7.87(7.49+0.29) 7.06(6.69+0.31) -10.3%
4221.13: extended log -i --grep='æ' 7.87(7.54+0.31) 7.09(6.69+0.31) -9.9%
4221.14: perl log -i --grep='æ' 7.06(6.77+0.28) 6.91(6.57+0.31) -2.1%
So as with e05b027627 ("grep: use PCRE v2 for optimized fixed-string
search", 2019-06-26) there's a huge improvement in performance for
"grep", but in "log" most of our time is spent elsewhere, so we don't
notice it that much.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A later change will replace this optimization with optimistic use of
PCRE v2. I'm completely removing it as an intermediate step, as
opposed to replacing it with PCRE v2, to demonstrate that no grep
semantics depend on this (or any other) optimization for the fixed
backend anymore.
For now this is mostly (but not entirely) a performance regression, as
shown by this hacky one-liner:
for opt in '' ' -i'
do
GIT_PERF_7821_GREP_OPTS=$opt GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 CFLAGS=-O3 USE_LIBPCRE=YesPlease' ./run origin/master HEAD -- p7821-grep-engines-fixed.sh
done &&
for opt in '' ' -i'
do GIT_PERF_4221_LOG_OPTS=$opt GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 CFLAGS=-O3 USE_LIBPCRE=YesPlease' ./run origin/master HEAD -- p4221-log-grep-engines-fixed.sh
done
Which produces:
plain grep:
Test origin/master HEAD
-------------------------------------------------------------------------
7821.1: fixed grep int 0.55(1.60+0.63) 0.82(3.11+0.51) +49.1%
7821.2: basic grep int 0.62(1.68+0.49) 0.85(3.02+0.52) +37.1%
7821.3: extended grep int 0.61(1.63+0.53) 0.91(3.09+0.44) +49.2%
7821.4: perl grep int 0.55(1.60+0.57) 0.41(0.93+0.57) -25.5%
7821.6: fixed grep uncommon 0.20(0.50+0.44) 0.35(1.27+0.42) +75.0%
7821.7: basic grep uncommon 0.20(0.49+0.45) 0.35(1.29+0.41) +75.0%
7821.8: extended grep uncommon 0.20(0.45+0.48) 0.35(1.25+0.44) +75.0%
7821.9: perl grep uncommon 0.20(0.53+0.41) 0.16(0.24+0.49) -20.0%
7821.11: fixed grep æ 0.35(1.27+0.40) 0.25(0.82+0.39) -28.6%
7821.12: basic grep æ 0.35(1.28+0.38) 0.25(0.75+0.44) -28.6%
7821.13: extended grep æ 0.36(1.21+0.46) 0.25(0.86+0.35) -30.6%
7821.14: perl grep æ 0.35(1.33+0.34) 0.16(0.26+0.47) -54.3%
grep with -i:
Test origin/master HEAD
-----------------------------------------------------------------------------
7821.1: fixed grep -i int 0.61(1.84+0.64) 1.11(4.12+0.64) +82.0%
7821.2: basic grep -i int 0.72(1.86+0.57) 1.15(4.48+0.49) +59.7%
7821.3: extended grep -i int 0.94(1.83+0.60) 1.53(4.12+0.58) +62.8%
7821.4: perl grep -i int 0.66(1.82+0.59) 0.55(1.08+0.58) -16.7%
7821.6: fixed grep -i uncommon 0.21(0.51+0.44) 0.44(1.74+0.34) +109.5%
7821.7: basic grep -i uncommon 0.21(0.55+0.41) 0.44(1.72+0.40) +109.5%
7821.8: extended grep -i uncommon 0.21(0.57+0.39) 0.42(1.64+0.45) +100.0%
7821.9: perl grep -i uncommon 0.21(0.48+0.48) 0.17(0.30+0.45) -19.0%
7821.11: fixed grep -i æ 0.25(0.73+0.45) 0.25(0.75+0.45) +0.0%
7821.12: basic grep -i æ 0.25(0.71+0.49) 0.26(0.77+0.44) +4.0%
7821.13: extended grep -i æ 0.25(0.75+0.44) 0.25(0.74+0.46) +0.0%
7821.14: perl grep -i æ 0.17(0.26+0.48) 0.16(0.20+0.52) -5.9%
plain log:
Test origin/master HEAD
---------------------------------------------------------------------------------
4221.1: fixed log --grep='int' 7.31(7.06+0.21) 8.11(7.85+0.20) +10.9%
4221.2: basic log --grep='int' 7.30(6.94+0.27) 8.16(7.89+0.19) +11.8%
4221.3: extended log --grep='int' 7.34(7.05+0.21) 8.08(7.76+0.25) +10.1%
4221.4: perl log --grep='int' 7.27(6.94+0.24) 7.05(6.76+0.25) -3.0%
4221.6: fixed log --grep='uncommon' 6.97(6.62+0.32) 7.86(7.51+0.30) +12.8%
4221.7: basic log --grep='uncommon' 7.05(6.69+0.29) 7.89(7.60+0.28) +11.9%
4221.8: extended log --grep='uncommon' 6.89(6.56+0.32) 7.99(7.66+0.24) +16.0%
4221.9: perl log --grep='uncommon' 7.02(6.66+0.33) 6.97(6.54+0.36) -0.7%
4221.11: fixed log --grep='æ' 7.37(7.03+0.33) 7.67(7.30+0.31) +4.1%
4221.12: basic log --grep='æ' 7.41(7.00+0.31) 7.60(7.28+0.26) +2.6%
4221.13: extended log --grep='æ' 7.35(6.96+0.38) 7.73(7.31+0.34) +5.2%
4221.14: perl log --grep='æ' 7.43(7.10+0.32) 6.95(6.61+0.27) -6.5%
log with -i:
Test origin/master HEAD
------------------------------------------------------------------------------------
4221.1: fixed log -i --grep='int' 7.40(7.05+0.23) 8.66(8.38+0.20) +17.0%
4221.2: basic log -i --grep='int' 7.39(7.09+0.23) 8.67(8.39+0.20) +17.3%
4221.3: extended log -i --grep='int' 7.29(6.99+0.26) 8.69(8.31+0.26) +19.2%
4221.4: perl log -i --grep='int' 7.42(7.16+0.21) 7.14(6.80+0.24) -3.8%
4221.6: fixed log -i --grep='uncommon' 6.94(6.58+0.35) 8.43(8.04+0.30) +21.5%
4221.7: basic log -i --grep='uncommon' 6.95(6.62+0.31) 8.34(7.93+0.32) +20.0%
4221.8: extended log -i --grep='uncommon' 7.06(6.75+0.25) 8.32(7.98+0.31) +17.8%
4221.9: perl log -i --grep='uncommon' 6.96(6.69+0.26) 7.04(6.64+0.32) +1.1%
4221.11: fixed log -i --grep='æ' 7.92(7.55+0.33) 7.86(7.44+0.34) -0.8%
4221.12: basic log -i --grep='æ' 7.88(7.49+0.32) 7.84(7.46+0.34) -0.5%
4221.13: extended log -i --grep='æ' 7.91(7.51+0.32) 7.87(7.48+0.32) -0.5%
4221.14: perl log -i --grep='æ' 7.01(6.59+0.35) 6.99(6.64+0.28) -0.3%
Some of those, as noted in [1] are because PCRE is faster at finding
fixed strings. This looks bad for some engines, but in the next change
we'll optimistically use PCRE v2 for all of these, so it'll look
better.
1. https://public-inbox.org/git/87v9x793qi.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change "-f <file>" to not support patterns with a NUL-byte in them
under --fixed-strings. We'll now only support these under
"--perl-regexp" with PCRE v2.
A previous change to grep's documentation changed the description of
"-f <file>" to be vague enough as to not promise that this would work.
By dropping support for this we make it a whole lot easier to move
away from the kwset backend, which we'll do in a subsequent change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The behavior of "grep" when patterns contained a NUL-byte has always
been haphazard, and has served the vagaries of the implementation more
than anything else. A pattern containing a NUL-byte can only be
provided via "-f <file>". Since pickaxe (log search) has no such flag
the NUL-byte in patterns has only ever been supported by "grep" (and
not "log --grep").
Since 9eceddeec6 ("Use kwset in grep", 2011-08-21) patterns containing
"\0" were considered fixed. In 966be95549 ("grep: add tests to fix
blind spots with \0 patterns", 2017-05-20) I added tests for this
behavior.
Change the behavior to do the obvious thing, i.e. don't silently
discard a regex pattern and make it implicitly fixed just because they
contain a NUL-byte. Instead die if the backend in question can't
handle them, e.g. --basic-regexp is combined with such a pattern.
This is desired because from a user's point of view it's the obvious
thing to do. Whether we support BRE/ERE/Perl syntax is different from
whether our implementation is limited by C-strings. These patterns are
obscure enough that I think this behavior change is OK, especially
since we never documented the old behavior.
Doing this also makes it easier to replace the kwset backend with
something else, since we'll no longer strictly need it for anything we
can't easily use another fixed-string backend for.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since e944d9d932 ("grep: rewrite an if/else condition to avoid
duplicate expression", 2016-06-25) the "ascii_only" variable has only
been used once in compile_regexp(), let's just inline it there.
This makes the code easier to read, and might make it marginally
faster depending on compiler optimizations.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Fix a bug introduced in 18547aacf5 ("grep/pcre: support utf-8",
2016-06-25) that was missed due to a blindspot in our tests, as
discussed in the previous commit. I then blindly copied the same bug
in 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) when
adding the PCRE v2 code.
We should not tell PCRE that we're processing UTF-8 just because we're
dealing with non-ASCII. In the case of e.g. "log --encoding=<...>"
under is_utf8_locale() the haystack might be in ISO-8859-1, and the
needle might be in a non-UTF-8 encoding.
Maybe we should be more strict here and die earlier? Should we also be
converting the needle to the encoding in question, and failing if it's
not a string that's valid in that encoding? Maybe.
But for now matching this as non-UTF8 at least has some hope of
producing sensible results, since we know that our default heuristic
of assuming the text to be matched is in the user locale encoding
isn't true when we've explicitly encoded it to be in a different
encoding.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
grep_source(), which performs much of the work for Git's grep library,
allows passing an arbitrary struct grep_source which represents the text
which grep_source() should search to match a pattern in the provided
struct grep_opt. In most callers, the grep_source::name field is set to
an appropriate prefix to print before a colon when a result matches:
README:Git is an Open Source project covered by the GNU General
One caller, grep_buffer(), leaves the grep_source::name field set to
NULL because there isn't enough context to determine an appropriate name
for this kind of output line. In practice, this has been fine: the only
caller of grep_buffer() is "git log --grep", and that caller sets
grep_opt::status_only, which disables output and only checks whether a
match exists. But this is brittle: a future caller can call
grep_buffer() without grep_opt::status_only set, and as soon as it hits
a match, grep_source() will try to print the match and segfault:
(null):Git is an Open Source project covered by the GNU General
For example, a future caller might want to print all matching lines from
commits which match a regex.
Futureproof by diagnosing early a use of the API that could trigger that
condition, before we know whether the pattern matches:
BUG: grep.c:1783: grep call which could print a name requires
grep_source.name be non-NULL
Aborted
This way, the caller's author gets an indication of how to fix the issue
- by providing grep_source::name or setting grep_opt::status_only - and
they are warned of the potential for a segfault unconditionally, rather
than only if there is a match.
Noticed while adding such a call to a tutorial on revision walks.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
More codepaths become aware of working with in-core repository
instance other than the default "the_repository".
* nd/the-index: (22 commits)
rebase-interactive.c: remove the_repository references
rerere.c: remove the_repository references
pack-*.c: remove the_repository references
pack-check.c: remove the_repository references
notes-cache.c: remove the_repository references
line-log.c: remove the_repository reference
diff-lib.c: remove the_repository references
delta-islands.c: remove the_repository references
cache-tree.c: remove the_repository references
bundle.c: remove the_repository references
branch.c: remove the_repository reference
bisect.c: remove the_repository reference
blame.c: remove implicit dependency the_repository
sequencer.c: remove implicit dependency on the_repository
sequencer.c: remove implicit dependency on the_index
transport.c: remove implicit dependency on the_index
notes-merge.c: remove implicit dependency the_repository
notes-merge.c: remove implicit dependency on the_index
list-objects.c: reduce the_repository references
list-objects-filter.c: remove implicit dependency on the_index
...
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This is a faithful conversion without attempting to improve
anything. That comes later.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
[jc: squashed in missing forward decl in userdiff.h found by Ramsay]
Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git grep" learned the "--only-matching" option.
* tb/grep-only-matching:
grep.c: teach 'git grep --only-matching'
grep.c: extract show_line_header()
|
|
"git grep" learned the "--column" option that gives not just the
line number but the column number of the hit.
* tb/grep-column:
contrib/git-jump/git-jump: jump to exact location
grep.c: add configuration variables to show matched option
builtin/grep.c: add '--column' option to 'git-grep(1)'
grep.c: display column number of first match
grep.[ch]: extend grep_opt to allow showing matched column
grep.c: expose {,inverted} match column in match_line()
Documentation/config.txt: camel-case lineNumber for consistency
|
|
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.
* sb/object-store-grafts:
commit: allow lookup_commit_graft to handle arbitrary repositories
commit: allow prepare_commit_graft to handle arbitrary repositories
shallow: migrate shallow information into the object parser
path.c: migrate global git_path_* to take a repository argument
cache: convert get_graft_file to handle arbitrary repositories
commit: convert read_graft_file to handle arbitrary repositories
commit: convert register_commit_graft to handle arbitrary repositories
commit: convert commit_graft_pos() to handle arbitrary repositories
shallow: add repository argument to is_repository_shallow
shallow: add repository argument to check_shallow_file_for_update
shallow: add repository argument to register_shallow
shallow: add repository argument to set_alternate_shallow_file
commit: add repository argument to lookup_commit_graft
commit: add repository argument to prepare_commit_graft
commit: add repository argument to read_graft_file
commit: add repository argument to register_commit_graft
commit: add repository argument to commit_graft_pos
object: move grafts to object parser
object-store: move object access functions to object-store.h
|
|
Teach 'git grep --only-matching', a new option to only print the
matching part(s) of a line.
For instance, a line containing the following (taken from README.md:27):
(`man gitcvs-migration` or `git help cvs-migration` if git is
Is printed as follows:
$ git grep --line-number --column --only-matching -e git -- \
README.md | grep ":27"
README.md:27:7:git
README.md:27:16:git
README.md:27:38:git
The patch works mostly as one would expect, with the exception of a few
considerations that are worth mentioning here.
Like GNU grep, this patch ignores --only-matching when --invert (-v) is
given. There is a sensible answer here, but parity with the behavior of
other tools is preferred.
Because a line might contain more than one match, there are special
considerations pertaining to when to print line headers, newlines, and
how to increment the match column offset. The line header and newlines
are handled as a special case within the main loop to avoid polluting
the surrounding code with conditionals that have large blocks.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The grep code invokes show_line() to display the contents of a matched
or context line in its output. Part of this execution is to print a line
header that includes information such as the kind, the line- and
column-number and etc. of that match.
To prepare for the addition of an option to print only the matching
component(s) of a non-context line, we must prepare for the possibility
that a single line may contain multiple matching parts, and thus will
need multiple headers printed for a single line.
Extracting show_line_header allows us to do just that. In the subsequent
commit, it will be used within the colorization loop to print out only
the matching parts of a line, optionally with LFs delimiting
sub-matches.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Continuing with the idea to programatically enumerate various
pieces of data required for command line completion, teach the
codebase to report the list of configuration variables
subcommands care about to help complete them.
* nd/complete-config-vars:
completion: complete general config vars in two steps
log-tree: allow to customize 'grafted' color
completion: support case-insensitive config vars
completion: keep other config var completion in camelCase
completion: drop the hard coded list of config vars
am: move advice.amWorkDir parsing back to advice.c
advice: keep config name in camelCase in advice_config[]
fsck: produce camelCase config key names
help: add --config to list all available config
fsck: factor out msg_id_info[] lazy initialization code
grep: keep all colors in an array
Add and use generic name->id mapping code for color slot parsing
|
|
To support git-grep(1)'s new option, '--column', document and teach
grep.c how to interpret relevant configuration options, similar to those
associated with '--line-number'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
To prepare for 'git grep' learning '--column', teach grep.c's
show_line() how to show the column of the first match on non-context
lines.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
To support showing the matched column when calling 'git-grep(1)', teach
'grep_opt' the normal set of options to configure the default behavior
and colorization of this feature.
Now that we have opt->columnnum, use it to disable short-circuiting over
ORs and ANDs so that col and icol are always filled with the earliest
matches on each line. In addition, don't return the first match from
match_line(), for the same reason.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When calling match_line(), callers presently cannot determine the
relative offset of the match because match_line() discards the
'regmatch_t' that contains this information.
Instead, teach match_line() to take in two 'ssize_t's. Fill the first
with the offset of the match produced by the given expression. If
extended, fill the later with the offset of the match produced as if
--invert were given.
For instance, matching "--not -e x" on this line produces a columnar
offset of 0, (i.e., the whole line does not contain an x), but "--invert
--not -e -x" will fill the later ssize_t of the column containing an
"x", because this expression is semantically equivalent to "-e x".
To determine the column for the inverted and non-inverted case, do the
following:
- If matching an atom, the non-inverted column is as given from
match_one_pattern(), and the inverted column is unset.
- If matching a --not, the inverted column and non-inverted column
swap.
- If matching an --and, or --or, the non-inverted column is the
minimum of the two children.
Presently, the existing short-circuiting logic for AND and OR applies as
before. This will change in the following commit when we add options to
configure the --column flag. Taken together, this and the forthcoming
change will always yield the earlier column on a given line.
This patch will become useful when we later pick between the two new
results in order to display the column number of the first match on a
line with --column.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We used to call regfree() after regcomp() failed in some codepaths,
which have been corrected.
* ma/regex-no-regfree-after-comp-fail:
regex: do not call `regfree()` if compilation fails
|
|
Sometimes it helps to list all available config vars so the user can
search for something they want. The config man page can also be used
but it's harder to search if you want to focus on the variable name,
for example.
This is not the best way to collect the available config since it's
not precise. Ideally we should have a centralized list of config in C
code (pretty much like 'struct option'), but that's a lot more work.
This will do for now.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This is more inline with how we handle color slots in other code. It
also allows us to get the list of configurable color slots later.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It is apparently undefined behavior to call `regfree()` on a regex where
`regcomp()` failed. The language in [1] is a bit muddy, at least to me,
but the clearest hint is this (`preg` is the `regex_t *`):
Upon successful completion, the regcomp() function shall return 0.
Otherwise, it shall return an integer value indicating an error as
described in <regex.h>, and the content of preg is undefined.
Funnily enough, there is also the `regerror()` function which should be
given a pointer to such a "failed" `regex_t` -- the content of which
would supposedly be undefined -- and which may investigate it to come up
with a detailed error message.
In any case, the example in that document shows how `regfree()` is not
called after `regcomp()` fails.
We have quite a few users of this API and most get this right. These
three users do not.
Several implementations can handle this just fine [2] and these code paths
supposedly have not wreaked havoc or we'd have heard about it. (These
are all in code paths where git got bad input and is just about to die
anyway.) But let's just avoid the issue altogether.
[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/regcomp.html
[2] https://www.redhat.com/archives/libvir-list/2013-September/msg00262.html
Researched-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-byi Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This should make these functions easier to find and cache.h less
overwhelming to read.
In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file
As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise. It would be better to #include wherever
identifiers from the header are used. That can happen later
when we have better tooling for it.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In d8193743e08 (usage.c: add BUG() function, 2017-05-12), a new macro
was introduced to use for reporting bugs instead of die(). It was then
subsequently used to convert one single caller in 588a538ae55
(setup_git_env: convert die("BUG") to BUG(), 2017-05-12).
The cover letter of the patch series containing this patch
(cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not
terribly clear why only one call site was converted, or what the plan
is for other, similar calls to die() to report bugs.
Let's just convert all remaining ones in one fell swoop.
This trick was performed by this invocation:
sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert read_sha1_file to take a pointer to struct object_id and rename
it read_object_file. Do the same for read_sha1_file_extended.
Convert one use in grep.c to use the new function without any other code
change, since the pointer being passed is a void pointer that is already
initialized with a pointer to struct object_id. Update the declaration
and definitions of the modified functions, and apply the following
semantic patch to convert the remaining callers:
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1.hash, E2, E3)
+ read_object_file(&E1, E2, E3)
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1->hash, E2, E3)
+ read_object_file(E1, E2, E3)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1.hash, E2, E3, E4)
+ read_object_file_extended(&E1, E2, E3, E4)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1->hash, E2, E3, E4)
+ read_object_file_extended(E1, E2, E3, E4)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add documentation explaining the functions in color.h.
While at it, migrate the function `color_set` into grep.c,
where the only callers are.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git grep" compiled with libpcre2 sometimes triggered a segfault,
which is being fixed.
* ab/pcre2-grep:
grep: fix segfault under -P + PCRE2 <=10.30 + (*NO_JIT)
test-lib: add LIBPCRE1 & LIBPCRE2 prerequisites
|
|
Fix a bug in the compilation of PCRE2 patterns under JIT (the most
common runtime configuration). Any pattern with a (*NO_JIT) verb would
segfault in any currently released PCRE2 version:
$ git grep -P '(*NO_JIT)hi.*there'
Segmentation fault
That this segfaulted was a bug in PCRE2 itself, after reporting it[1]
on pcre-dev it's been fixed in a yet-to-be-released version of
PCRE (presumably released first as 10.31). Now it'll die with:
$ git grep -P '(*NO_JIT)hi.*there'
fatal: pcre2_jit_match failed with error code -45: bad JIT option
But the cause of the bug is in our own code dating back to my
94da9193a6 ("grep: add support for PCRE v2", 2017-06-01).
As explained at more length in the comment being added here, it isn't
sufficient to just check pcre2_config() to see whether the JIT should
be used, pcre2_pattern_info() also has to be asked.
This is something I discovered myself when fiddling around with PCRE2
verbs in patterns passed to git. I don't expect that any user of git
has encountered this given the obscurity of passing PCRE2 verbs
through to the library, along with the relative obscurity of (*NO_JIT)
itself.
1. "How am I supposed to use PCRE2 JIT in the face of (*NO_JIT) ?"
(<CACBZZX5mMqDuWuFmi7sRBp3wH6CFyd-ghACukd=v0NN=rBMnJg@mail.gmail.com> &
https://lists.exim.org/lurker/thread/20171123.101502.7f0d38ca.en.html)
on the pcre-dev mailing list
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Non-empty lines before a function definition are most likely comments
for that function and thus relevant. Include them in function context.
Such a non-empty line might also belong to the preceding function if
there is no separating blank line. Stop extending the context upwards
also at the next function line to make sure only one extra function body
is shown at most.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Function context can be bigger than -A/-B/-C context. To find the
beginning of the combined context we search backwards. Currently we
check at each loop iteration what we're looking for and determine the
effective upper boundary based on that.
Simplify this a bit by setting the variable "from" to the lowest unshown
line number up front if we're looking for a function line and set it
back to the required -B/-C context line number when we find one. This
prepares the ground for the next patch; no functional change intended.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Building with NO_LIBPCRE1_JIT did not disable it, which has been fixed.
* ab/pcre-v2:
grep: fix NO_LIBPCRE1_JIT to fully disable JIT
|
|
If you have a pcre1 library which is compiled with JIT enabled then
PCRE_STUDY_JIT_COMPILE will be defined whether or not the
NO_LIBPCRE1_JIT configuration is set.
This means that we enable JIT functionality when calling pcre_study
even if NO_LIBPCRE1_JIT has been explicitly set and we just use plain
pcre_exec later.
Fix this by using own macro (GIT_PCRE_STUDY_JIT_COMPILE) which we set to
PCRE_STUDY_JIT_COMPILE only if NO_LIBPCRE1_JIT is not set and define to
0 otherwise, as before.
Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git grep -L" and "git grep --quiet -L" reported different exit
codes; this has been corrected.
* as/grep-quiet-no-match-exit-code-fix:
git-grep: correct exit code with --quiet and -L
|
|
The handling of `status_only` no longer interferes with the handling of
`unmatch_name_only`. `--quiet` no longer affects the exit code when using
`-L`/`--files-without-match`.
Signed-off-by: Anthony Sottile <asottile@umich.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert grep to use 'struct repository' which enables recursing into
submodules to be handled in-process.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove the redundant REG_NEWLINE regcomp() flag from the code that
compiles a fixed-string regular-expression.
The REG_NEWLINE causes metacharacters such as "." to match a newline,
since the basic_regex_quote_buf() function being called here escapes
all metacharacters using REG_NEWLINE is confusing and redundant.
The use of this flag was introduced as an unintended emergent property
of 793dc676e0 ("grep/icase: avoid kwsset when -F is specified",
2016-06-25).
That change amended the existing regflags, which were initialized to
REG_NEWLINE in init_grep_defaults() assuming a subsequent non-fixed
regcomp().
Manual testing reveals that this was always redundant, since no flags
of any use were inherited from opt->regflags even back
then. 793dc676e0 passes all tests with this on top:
diff --git a/grep.c b/grep.c
index 627ae3e3e8..89e84ed7fd 100644
--- a/grep.c
+++ b/grep.c
@@ -407,3 +407,3 @@ static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
basic_regex_quote_buf(&sb, p->pattern);
- regflags = opt->regflags & ~REG_EXTENDED;
+ regflags = 0;
if (opt->ignore_case)
Since this isn't used for anything and never was, remove it to reduce
confusion when reading this code.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Refactor calls to the grep machinery to always pass opt.ignore_case &
opt.extended_regexp_option instead of setting the equivalent regflags
bits.
The bug fixed when making -i work with -P in commit 9e3cbc59d5 ("log:
make --regexp-ignore-case work with --perl-regexp", 2017-05-20) was
really just plastering over the code smell which this change fixes.
The reason for adding the extensive commentary here is that I
discovered some subtle complexity in implementing this that really
should be called out explicitly to future readers.
Before this change we'd rely on the difference between
`extended_regexp_option` and `regflags` to serve as a membrane between
our preliminary parsing of grep.extendedRegexp and grep.patternType,
and what we decided to do internally.
Now that those two are the same thing, it's necessary to unset
`extended_regexp_option` just before we commit in cases where both of
those config variables are set. See 84befcd0a4 ("grep: add a
grep.patternType configuration setting", 2012-08-03) for the code and
documentation related to that.
The explanation of why the if/else branches in
grep_commit_pattern_type() are ordered the way they are exists in that
commit message, but I think it's worth calling this subtlety out
explicitly with a comment for future readers.
Even though grep_commit_pattern_type() is the only caller of
grep_set_pattern_type_option() it's simpler to reset the
extended_regexp_option flag in the latter, since 2/3 branches in the
former would otherwise need to reset it, this way we can do it in one
place.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove the redundant re-assignments of the fixed/pcre1/pcre2 fields to
zero right after the entire struct has been set to zero via
memset(...).
See an earlier related cleanup commit e0b9f8ae09 ("grep: remove
redundant regflags assignments", 2017-05-25) for an explanation of why
the code was structured like this to begin with.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove the redundant re-assignment of the fixed field to zero right
after the entire struct has been set to zero via memset(...).
Unlike some nearby commits this pattern doesn't date back to the
pattern described in e0b9f8ae09 ("grep: remove redundant regflags
assignments", 2017-05-25), instead it was apparently cargo-culted in
9eceddeec6 ("Use kwset in grep", 2011-08-21).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Adjust a now-redundant assignment to extended_regexp_option to make it
zero if grep.extendedRegexp is not set. This is always called right
after init_grep_defaults() which memsets the entire structure to 0, so
there's no need to set it again to zero.
However the reason for the if/else pattern is a holdover from[1] where
this was adjusted from a bitfield assignment to a boolean. Rather than
getting rid of the assignment to 0 in all cases, let's just use the
value returned by git_config_bool(), which is more idiomatic and in
sync with the rest of the boolean handling in this function.
This is a logical follow-up to my commit to remove redundant regflags
assignments[2]. This logic was originally introduced in [3], but as
explained in the former commit it's working around a pattern in our
code that no longer exists, and is now confusing as it leads the
reader to think that this needs to be flipped back & forth.
1. 84befcd0a4 ("grep: add a grep.patternType configuration setting",
2012-08-03)
2. e0b9f8ae09 ("grep: remove redundant regflags assignments",
2017-05-25)
3. b22520a37c ("grep: allow -E and -n to be turned on by default via
configuration", 2011-03-30)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Stop assigning 0 to the extended_regexp_option field right after we've
zeroed out the entire struct with memset() just a few lines earlier.
Unlike some of the code being refactored in subsequent commits, this
was always completely redundant. See the original code introduced in
84befcd0a4 ("grep: add a grep.patternType configuration setting",
2012-08-03).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A common pattern to free a piece of memory and assign NULL to the
pointer that used to point at it has been replaced with a new
FREE_AND_NULL() macro.
* ab/free-and-null:
*.[ch] refactoring: make use of the FREE_AND_NULL() macro
coccinelle: make use of the "expression" FREE_AND_NULL() rule
coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
coccinelle: make use of the "type" FREE_AND_NULL() rule
coccinelle: add a rule to make "type" code use FREE_AND_NULL()
git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
|
|
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
|
|
Hotfix for a topic already in 'master'.
* ab/pcre-v2:
grep: fix erroneously copy/pasted variable in check/assert pattern
|
|
Fix an erroneously copy/pasted check for the pcre2_jit_stack variable
to check pcre2_match_context instead. The former was already checked
in the preceding "if" statement.
This is a trivial and obvious error introduced in my commit
94da9193a6 ("grep: add support for PCRE v2", 2017-06-01).
In practice if pcre2_match_context_create() returned NULL we were
likely in a situation where malloc() was returning NULL, and were thus
screwed anyway, but if only the pcre2_match_context_create() call
returned NULL (through some transitory bug) PCRE v2 would just
allocate and supply its own context object when matching, and we'd run
normally at the trivial expense of not getting a slight speedup by
sharing the context object between successive matches.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Conversion from uchar[20] to struct object_id continues.
* bw/object-id: (33 commits)
diff: rename diff_fill_sha1_info to diff_fill_oid_info
diffcore-rename: use is_empty_blob_oid
tree-diff: convert path_appendnew to object_id
tree-diff: convert diff_tree_paths to struct object_id
tree-diff: convert try_to_follow_renames to struct object_id
builtin/diff-tree: cleanup references to sha1
diff-tree: convert diff_tree_sha1 to struct object_id
notes-merge: convert write_note_to_worktree to struct object_id
notes-merge: convert verify_notes_filepair to struct object_id
notes-merge: convert find_notes_merge_pair_ps to struct object_id
notes-merge: convert merge_from_diffs to struct object_id
notes-merge: convert notes_merge* to struct object_id
tree-diff: convert diff_root_tree_sha1 to struct object_id
combine-diff: convert find_paths_* to struct object_id
combine-diff: convert diff_tree_combined to struct object_id
diff: convert diff_flush_patch_id to struct object_id
patch-ids: convert to struct object_id
diff: finish conversion for prepare_temp_file to struct object_id
diff: convert reuse_worktree_file to struct object_id
diff: convert fill_filespec to struct object_id
...
|
|
Replace occurrences of `free(ptr); ptr = NULL` which weren't caught by
the coccinelle rule. These fall into two categories:
- free/NULL assignments one after the other which coccinelle all put
on one line, which is functionally equivalent code, but very ugly.
- manually spotted occurrences where the NULL assignment isn't right
after the free() call.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Apply the result of the just-added coccinelle rule. This manually
excludes a few occurrences, mostly things that resulted in many
FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent
change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Stop including config.h by default in cache.h. Instead only include
config.h in those files which require use of the config system.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert the remaining parts of grep to use struct object_id.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add support for v2 of the PCRE API. This is a new major version of
PCRE that came out in early 2015[1].
The regular expression syntax is the same, but while the API is
similar, pretty much every function is either renamed or takes
different arguments. Thus using it via entirely new functions makes
sense, as opposed to trying to e.g. have one compile_pcre_pattern()
that would call either PCRE v1 or v2 functions.
Git can now be compiled with either USE_LIBPCRE1=YesPlease or
USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a
synonym for the former. Providing both is a compile-time error.
With earlier patches to enable JIT for PCRE v1 the performance of the
release versions of both libraries is almost exactly the same, with
PCRE v2 being around 1% slower.
However after I reported this to the pcre-dev mailing list[2] I got a
lot of help with the API use from Zoltán Herczeg, he subsequently
optimized some of the JIT functionality in v2 of the library.
Running the p7820-grep-engines.sh performance test against the latest
Subversion trunk of both, with both them and git compiled as -O3, and
the test run against linux.git, gives the following results. Just the
/perl/ tests shown:
$ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~5 HEAD~ HEAD p7820-grep-engines.sh
[...]
Test HEAD~5 HEAD~ HEAD
-----------------------------------------------------------------------------------------------------------------
7820.3: perl grep 'how.to' 0.31(1.10+0.48) 0.21(0.35+0.56) -32.3% 0.21(0.34+0.55) -32.3%
7820.7: perl grep '^how to' 0.56(2.70+0.40) 0.24(0.64+0.52) -57.1% 0.20(0.28+0.60) -64.3%
7820.11: perl grep '[how] to' 0.56(2.66+0.38) 0.29(0.95+0.45) -48.2% 0.23(0.45+0.54) -58.9%
7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 1.02(5.77+0.42) 0.31(1.02+0.54) -69.6% 0.23(0.50+0.54) -77.5%
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.38(1.57+0.42) 0.27(0.85+0.46) -28.9% 0.21(0.33+0.57) -44.7%
See commit ("perf: add a comparison test of grep regex engines",
2017-04-19) for details on the machine the above test run was executed
on.
Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with
JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine
mentioning p7820-grep-engines.sh for more details on the test setup.
For ease of readability, a different run just of HEAD~ (PCRE v1 with
JIT v.s. PCRE v2), again with just the /perl/ tests shown:
[...]
Test HEAD~ HEAD
----------------------------------------------------------------------------------------
7820.3: perl grep 'how.to' 0.21(0.42+0.52) 0.21(0.31+0.58) +0.0%
7820.7: perl grep '^how to' 0.25(0.65+0.50) 0.20(0.31+0.57) -20.0%
7820.11: perl grep '[how] to' 0.30(0.90+0.50) 0.23(0.46+0.53) -23.3%
7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 0.30(1.19+0.38) 0.23(0.51+0.51) -23.3%
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.27(0.84+0.48) 0.21(0.34+0.57) -22.2%
I.e. the two are either neck-to-neck, but PCRE v2 usually pulls ahead,
when it does it's around 20% faster.
A brief note on thread safety: As noted in pcre2api(3) & pcre2jit(3)
the compiled pattern can be shared between threads, but not some of
the JIT context, however the grep threading support does all pattern &
JIT compilation in separate threads, so this code doesn't need to
concern itself with thread safety.
See commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) for the
initial addition of PCRE v1. This change follows some of the same
patterns it did (and which were discussed on list at the time),
e.g. mocking up types with typedef instead of ifdef-ing them out when
USE_LIBPCRE2 isn't defined. This adds some trivial memory use to the
program, but makes the code look nicer.
1. https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html
2. https://lists.exim.org/lurker/thread/20170419.172322.833ee099.en.html
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Amend my change earlier in this series ("grep: add support for the
PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1
versions earlier than 8.32.
The JIT support was added in version 8.20 released on 2011-10-21, but
it wasn't until 8.32 released on 2012-11-30 that the fast code path to
use the JIT via pcre_jit_exec() was added[1] (see also [2]).
This means that versions 8.20 through 8.31 could still use the JIT,
but supporting it on those versions would add to the already verbose
macro soup around JIT support it, and I don't expect that the use-case
of compiling a brand new git against a 5 year old PCRE is particularly
common, and if someone does that they can just get the existing
pre-JIT slow codepath.
So just take the easy way out and disable the JIT on any version older
than 8.32.
The reason this change isn't part of the initial change PCRE JIT
support is to have a cleaner history showing which parts of the
implementation are only used for ancient PCRE versions. This also
makes it easier to revert this change if we ever decide to stop
supporting those old versions.
1. http://www.pcre.org/original/changelog.txt ("28. Introducing a
native interface for JIT. Through this interface, the
compiled[...]")
2. https://bugs.exim.org/show_bug.cgi?id=2121
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change the grep PCRE v1 code to use JIT when available. When PCRE
support was initially added in commit 63e7e9d8b6 ("git-grep: Learn
PCRE", 2011-05-09) PCRE had no JIT support, it was integrated into
8.20 released on 2011-10-21.
Enabling JIT support usually improves performance by more than
40%. The pattern compilation times are relatively slower, but those
relative numbers are tiny, and are easily made back in all but the
most trivial cases of grep. Detailed benchmarks & overview of
compilation times is at: http://sljit.sourceforge.net/pcre.html
With this change the difference in a t/perf/p7820-grep-engines.sh run
is, with just the /perl/ tests shown:
$ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~ HEAD p7820-grep-engines.sh
Test HEAD~ HEAD
---------------------------------------------------------------------------------------
7820.3: perl grep 'how.to' 0.35(1.11+0.43) 0.23(0.42+0.46) -34.3%
7820.7: perl grep '^how to' 0.64(2.71+0.36) 0.27(0.66+0.44) -57.8%
7820.11: perl grep '[how] to' 0.63(2.51+0.42) 0.33(0.98+0.39) -47.6%
7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 1.17(5.61+0.35) 0.34(1.08+0.46) -70.9%
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.43(1.52+0.44) 0.30(0.88+0.42) -30.2%
The conditional support for JIT is implemented as suggested in the
pcrejit(3) man page. E.g. defining PCRE_STUDY_JIT_COMPILE to 0 if it's
not present.
The implementation is relatively verbose because even if
PCRE_CONFIG_JIT is defined only a call to pcre_config() can determine
if the JIT is available, and if so the faster pcre_jit_exec() function
should be called instead of pcre_exec(), and a different (but not
complimentary!) function needs to be called to free pcre1_extra_info.
There's no graceful fallback if pcre_jit_stack_alloc() fails under
PCRE_CONFIG_JIT, instead the program will simply abort. I don't think
this is worth handling gracefully, it'll only fail in cases where
malloc() doesn't work, in which case we're screwed anyway.
That there's no assignment of `p->pcre1_jit_on = 0` when
PCRE_CONFIG_JIT isn't defined isn't a bug. The create_grep_pat()
function allocates the grep_pat allocates it with calloc(), so it's
guaranteed to be 0 when PCRE_CONFIG_JIT isn't defined.
I you're bisecting and find this change, check that your PCRE isn't
older than 8.32. This change intentionally broke really old versions
of PCRE, but that's fixed in follow-up commits.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move the is_fixed() function which are currently only used in
compile_regexp() earlier so it can be used in the PCRE family of
functions in a later change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change the internal PCRE variable & function names to have a "1"
suffix. This is for preparation for libpcre2 support, where having
non-versioned names would be confusing.
An earlier change in this series ("grep: change the internal PCRE
macro names to be PCRE1", 2017-04-07) elaborates on the motivations
behind this change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change the internal USE_LIBPCRE define, & build options flag to use a
naming convention ending in PCRE1, without changing the long-standing
USE_LIBPCRE Makefile flag which enables this code.
This is for preparation for libpcre2 support where having things like
USE_LIBPCRE and USE_LIBPCRE2 in any more places than we absolutely
need to for backwards compatibility with old Makefile arguments would
be confusing.
In some ways it would be better to change everything that now uses
USE_LIBPCRE to use USE_LIBPCRE1, and to make specifying
USE_LIBPCRE (or --with-pcre) an error. This would impose a one-time
burden on packagers of git to s/USE_LIBPCRE/USE_LIBPCRE1/ in their
build scripts.
However I'd like to leave the door open to making
USE_LIBPCRE=YesPlease eventually mean USE_LIBPCRE2=YesPlease,
i.e. once PCRE v2 is ubiquitous enough that it makes sense to make it
the default.
This code and the USE_LIBPCRE Makefile argument was added in commit
63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09). At the time there was
no indication that the PCRE project would release an entirely new &
incompatible API around 3 years later.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Factor the test for \0 in grep patterns into a function. Since commit
9eceddeec6 ("Use kwset in grep", 2011-08-21) any pattern containing a
\0 is considered fixed as regcomp() can't handle it.
This change makes later changes that make use of either has_null() or
is_fixed() (but not both) smaller.
While I'm at it make the comment conform to the style guide, i.e. add
an opening "/*\n".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove redundant assignments to the "regflags" variable. This variable
is only used set under GREP_PATTERN_TYPE_ERE, so there's no need to
un-set it under GREP_PATTERN_TYPE_{FIXED,BRE,PCRE}.
Back in 5010cb5fcc[1], we did do "opt.regflags &= ~REG_EXTENDED" upon
seeing "-G" on the command line and flipped the bit on upon seeing
"-E", but I think that was perfectly sensible and it would have been a
bug if we didn't. They were part of the command line parsing that
could have seen "-E" on the command line earlier.
When cca2c172 ("git-grep: do not die upon -F/-P when
grep.extendedRegexp is set.", 2011-05-09) switched the command line
parsing to "read into a 'tentatively this is what we saw the last'
variable and then finally commit just once", we didn't touch
opt.regflags for PCRE and FIXED, but we still had to flip regflags
between BRE and ERE, because parsing of grep.extendedregexp
configuration variable directly touched opt.regflags back then, which
was done by b22520a3 ("grep: allow -E and -n to be turned on by
default via configuration", 2011-03-30).
When 84befcd0 ("grep: add a grep.patternType configuration setting",
2012-08-03) introduced extended_regexp_option field, we stopped
flipping regflags while reading the configuration, and that was when
we should have noticed and stopped dropping REG_EXTENDED bit in the
"now we can commit what type to use" helper function.
There is no reason to do this anymore, so stop doing it, more to
reduce "wait this is used under fixed/BRE/PCRE how?" confusion when
reading the code, than to to save ourselves trivial CPU cycles by
removing one assignment.
1. "built-in "git grep"", 2006-04-30.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
These calls to snprintf should always succeed, because their
input is small and fixed. Let's use xsnprintf to make sure
this is the case (and to make auditing for actual truncation
easier).
These could be candidates for turning into heap buffers, but
they fall into a few broad categories that make it not worth
doing:
- formatting single numbers is simple enough that we can
see the result should fit
- the size of a sha1 is likewise well-known, and I didn't
want to cause unnecessary conflicts with the ongoing
process to convert these constants to GIT_MAX_HEXSZ
- the interface for curl_errorstr is dictated by curl
Signed-off-by: Jeff King <peff@peff.net>
|
|
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add `GREP_SOURCE_SUBMODULE` as a grep_source type and cases for this new
type in the various switch statements in grep.c.
When initializing a grep_source with type `GREP_SOURCE_SUBMODULE` the
identifier can either be NULL (to indicate that the working tree will be
used) or a SHA1 (the REV of the submodule to be grep'd). If the
identifier is a SHA1 then we want to fall through to the
`GREP_SOURCE_SHA1` case to handle the copying of the SHA1.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Some codepaths in "git diff" used regexec(3) on a buffer that was
mmap(2)ed, which may not have a terminating NUL, leading to a read
beyond the end of the mapped region. This was fixed by introducing
a regexec_buf() helper that takes a <ptr,len> pair with REG_STARTEND
extension.
* js/regexec-buf:
regex: use regexec_buf()
regex: add regexec_buf() that can work on a non NUL-terminated string
regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails
|
|
The new regexec_buf() function operates on buffers with an explicitly
specified length, rather than NUL-terminated strings.
We need to use this function whenever the buffer we want to pass to
regexec(3) may have been mmap(2)ed (and is hence not NUL-terminated).
Note: the original motivation for this patch was to fix a bug where
`git diff -G <regex>` would crash. This patch converts more callers,
though, some of which allocated to construct NUL-terminated strings,
or worse, modified buffers to temporarily insert NULs while calling
regexec(3). By converting them to use regexec_buf(), the code has
become much cleaner.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git am -3" calls "git merge-recursive" when it needs to fall back
to a three-way merge; this call has been turned into an internal
subroutine call instead of spawning a separate subprocess.
* js/am-3-merge-recursive-direct:
merge-recursive: flush output buffer even when erroring out
merge_trees(): ensure that the callers release output buffer
merge-recursive: offer an option to retain the output in 'obuf'
merge-recursive: write the commit title in one go
merge-recursive: flush output buffer before printing error messages
am -3: use merge_recursive() directly again
merge-recursive: switch to returning errors instead of dying
merge-recursive: handle return values indicating errors
merge-recursive: allow write_tree_from_memory() to error out
merge-recursive: avoid returning a wholesale struct
merge_recursive: abort properly upon errors
prepare the builtins for a libified merge_recursive()
merge-recursive: clarify code in was_tracked()
die(_("BUG")): avoid translating bug messages
die("bug"): report bugs consistently
t5520: verify that `pull --rebase` shows the helpful advice when failing
|
|
"git -c grep.patternType=extended log --basic-regexp" misbehaved
because the internal API to access the grep machinery was not
designed well.
* jc/grep-commandline-vs-configuration:
grep: further simplify setting the pattern type
|
|
The vast majority of error messages in Git's source code which report a
bug use the convention to prefix the message with "BUG:".
As part of cleaning up merge-recursive to stop die()ing except in case of
detected bugs, let's just make the remainder of the bug reports consistent
with the de facto rule.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When c5c31d33 (grep: move pattern-type bits support to top-level
grep.[ch], 2012-10-03) introduced grep_commit_pattern_type() helper
function, the intention was to allow the users of grep API to having
to fiddle only with .pattern_type_option (which can be set to "fixed",
"basic", "extended", and "pcre"), and then immediately before compiling
the pattern strings for use, call grep_commit_pattern_type() to have
it prepare various bits in the grep_opt structure (like .fixed,
.regflags, etc.).
However, grep_set_pattern_type_option() helper function the grep API
internally uses were left as an external function by mistake. This
function shouldn't have been made callable by the users of the API.
Later when the grep API was used in revision traversal machinery,
the caller then mistakenly started calling the function around
34a4ae55 (log --grep: use the same helper to set -E/-F options as
"git grep", 2012-10-03), instead of setting the .pattern_type_option
field and letting the grep_commit_pattern_type() to take care of the
details.
This caused an unnecessary bug that made a configured
grep.patternType take precedence over the command line options
(e.g. --basic-regexp, --fixed-strings) in "git log" family of
commands.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git grep -i" has been taught to fold case in non-ascii locales
correctly.
* nd/icase:
grep.c: reuse "icase" variable
diffcore-pickaxe: support case insensitive match on non-ascii
diffcore-pickaxe: Add regcomp_or_die()
grep/pcre: support utf-8
gettext: add is_utf8_locale()
grep/pcre: prepare locale-dependent tables for icase matching
grep: rewrite an if/else condition to avoid duplicate expression
grep/icase: avoid kwsset when -F is specified
grep/icase: avoid kwsset on literal non-ascii strings
test-regex: expose full regcomp() to the command line
test-regex: isolate the bug test code
grep: break down an "if" stmt in preparation for next changes
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the previous change in this function, we add locale support for
single-byte encodings only. It looks like pcre only supports utf-* as
multibyte encodings, the others are left in the cold (which is
fine).
We need to enable PCRE_UTF8 so pcre can find character boundary
correctly. It's needed for case folding (when --ignore-case is used)
or '*', '+' or similar syntax is used.
The "has_non_ascii()" check is to be on the conservative side. If
there's non-ascii in the pattern, the searched content could still be
in utf-8, but we can treat it just like a byte stream and everything
should work. If we force utf-8 based on locale only and pcre validates
utf-8 and the file content is in non-utf8 encoding, things break.
Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: Plamen Totev <plamen.totev@abv.bg>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The default tables are usually built with C locale and only suitable
for LANG=C or similar. This should make case insensitive search work
correctly for all single-byte charsets.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"!icase || ascii_only" is repeated twice in this if/else chain as this
series evolves. Rewrite it (and basically revert the first if
condition back to before the "grep: break down an "if" stmt..." commit).
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Similar to the previous commit, we can't use kws on icase search
outside ascii range. But we can't simply pass the pattern to
regcomp/pcre like the previous commit because it may contain regex
special characters, so we need to quote the regex first.
To avoid misquote traps that could lead to undefined behavior, we
always stick to basic regex engine in this case. We don't need fancy
features for grepping a literal string anyway.
basic_regex_quote_buf() assumes that if the pattern is in a multibyte
encoding, ascii chars must be unambiguously encoded as single
bytes. This is true at least for UTF-8. For others, let's wait until
people yell up. Chances are nobody uses multibyte, non utf-8 charsets
anymore.
Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When we detect the pattern is just a literal string, we avoid heavy
regex engine and use fast substring search implemented in kwsset.c.
But kws uses git-ctype which is locale-independent so it does not know
how to fold case properly outside ascii range. Let regcomp or pcre
take care of this case instead. Slower, but accurate.
Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git show -W" (extend hunks to cover the entire function, delimited
by lines that match the "funcname" pattern) used to show the entire
file when a change added an entire function at the end of the file,
which has been fixed.
* rs/xdiff-hunk-with-func-line:
xdiff: fix merging of appended hunk with -W
grep: -W: don't extend context to trailing empty lines
t7810: add test for grep -W and trailing empty context lines
xdiff: don't trim common tail with -W
xdiff: -W: don't include common trailing empty lines in context
xdiff: ignore empty lines before added functions with -W
xdiff: handle appended chunks better with -W
xdiff: factor out match_func_rec()
t4051: rewrite, add more tests
|
|
Empty lines between functions are shown by grep -W, as it considers them
to be part of the function preceding them. They are not interesting in
most languages. The previous patches stopped showing them for diff -W.
Stop showing empty lines trailing a function with grep -W. Grep scans
the lines of a buffer from top to bottom and prints matching lines
immediately. Thus we need to peek ahead in order to determine if an
empty line is part of a function body and worth showing or not.
Remember how far ahead we peeked in order to avoid having to do so
repeatedly when handling multiple consecutive empty lines.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
While at there, improve the error message a bit (what operation failed?)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We frequently allocate strings as xmalloc(len + 1), where
the extra 1 is for the NUL terminator. This can be done more
simply with xmallocz, which also checks for integer
overflow.
There's no case where switching xmalloc(n+1) to xmallocz(n)
is wrong; the result is the same length, and malloc made no
guarantees about what was in the buffer anyway. But in some
cases, we can stop manually placing NUL at the end of the
allocated buffer. But that's only safe if it's clear that
the contents will always fill the buffer.
In each case where this patch does so, I manually examined
the control flow, and I tried to err on the side of caution.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
To set up default colors, we sometimes strcpy() from the
default string literals into our color buffers. This isn't a
bug (assuming the destination is COLOR_MAXLEN bytes), but
makes it harder to audit the code for problematic strcpy
calls.
Let's introduce a color_set which copies under the
assumption that there are COLOR_MAXLEN bytes in the
destination (of course you can call it on a smaller buffer,
so this isn't providing a huge amount of safety, but it's
more convenient than calling xsnprintf yourself).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|