aboutsummaryrefslogtreecommitdiffstats
path: root/grep.h
AgeCommit message (Collapse)AuthorFilesLines
2025-09-16color: use git_colorbool enum type to store colorboolsJeff King1-1/+1
We traditionally used "int" to store and pass around the values defined by "enum git_colorbool" (which were originally just #define macros). Using an int doesn't produce incorrect results, but using the actual enum makes the intent of the code more clear. It would be nice if the compiler could catch cases where we used the enum and an int interchangeably, since it's very easy to accidentally check the boolean true/false of a colorbool like: if (branch_use_color) This is wrong because GIT_COLOR_UNKNOWN and GIT_COLOR_AUTO evaluate to true in C, even though we may ultimately decide not to use color. But C is pretty happy to convert between ints and enums (even with various -Wenum-* warnings). So this sadly doesn't protect us from such mistakes, but it hopefully does make the code easier to read. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-16color: use GIT_COLOR_* instead of numeric constantsJeff King1-1/+1
Long ago Git's decision to show color for a subsytem was stored in a tri-state variable: it could be true (1), false (0), or unknown (-1). But since daa0c3d971 (color: delay auto-color decision until point of use, 2011-08-17) we want to carry around a new state, "auto", which bases the decision on the tty-ness of stdout (rather than collapsing that "auto" state to a true/false immediately). That commit introduced a set of GIT_COLOR_* defines to represent each state: UNKNOWN, ALWAYS, NEVER, and AUTO. But it only used the AUTO value, and left alone code using bare 0/1/-1 values. And of course since then we've grown many new spots that use those bare values. Let's switch all of these to use the named constants. That should make the code a bit easier to read, as it is more obvious that we're representing a color decision. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28config: add ctx arg to config_fn_tGlen Choo1-1/+3
Add a new "const struct config_context *ctx" arg to config_fn_t to hold additional information about the config iteration operation. config_context has a "struct key_value_info kvi" member that holds metadata about the config source being read (e.g. what kind of config source it is, the filename, etc). In this series, we're only interested in .kvi, so we could have just used "struct key_value_info" as an arg, but config_context makes it possible to add/adjust members in the future without changing the config_fn_t signature. We could also consider other ways of organizing the args (e.g. moving the config name and value into config_context or key_value_info), but in my experiments, the incremental benefit doesn't justify the added complexity (e.g. a config_fn_t will sometimes invoke another config_fn_t but with a different config value). In subsequent commits, the .kvi member will replace the global "struct config_reader" in config.c, making config iteration a global-free operation. It requires much more work for the machinery to provide meaningful values of .kvi, so for now, merely change the signature and call sites, pass NULL as a placeholder value, and don't rely on the arg in any meaningful way. Most of the changes are performed by contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every config_fn_t: - Modifies the signature to accept "const struct config_context *ctx" - Passes "ctx" to any inner config_fn_t, if needed - Adds UNUSED attributes to "ctx", if needed Most config_fn_t instances are easily identified by seeing if they are called by the various config functions. Most of the remaining ones are manually named in the .cocci patch. Manual cleanups are still needed, but the majority of it is trivial; it's either adjusting config_fn_t that the .cocci patch didn't catch, or adding forward declarations of "struct config_context ctx" to make the signatures make sense. The non-trivial changes are in cases where we are invoking a config_fn_t outside of config machinery, and we now need to decide what value of "ctx" to pass. These cases are: - trace2/tr2_cfg.c:tr2_cfg_set_fl() This is indirectly called by git_config_set() so that the trace2 machinery can notice the new config values and update its settings using the tr2 config parsing function, i.e. tr2_cfg_cb(). - builtin/checkout.c:checkout_main() This calls git_xmerge_config() as a shorthand for parsing a CLI arg. This might be worth refactoring away in the future, since git_xmerge_config() can call git_default_config(), which can do much more than just parsing. Handle them by creating a KVI_INIT macro that initializes "struct key_value_info" to a reasonable default, and use that to construct the "ctx" arg. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-23grep: work around UTF-8 related JIT bug in PCRE2 <= 10.34Mathias Krause1-0/+3
Stephane is reporting[1] a regression introduced in git v2.40.0 that leads to 'git grep' segfaulting in his CI pipeline. It turns out, he's using an older version of libpcre2 that triggers a wild pointer dereference in the generated JIT code that was fixed in PCRE2 10.35. Instead of completely disabling the JIT compiler for the buggy version, just mask out the Unicode property handling as we used to do prior to commit acabd2048ee0 ("grep: correctly identify utf-8 characters with \{b,w} in -P"). [1] https://lore.kernel.org/git/7E83DAA1-F9A9-4151-8D07-D80EA6D59EEA@clumio.com/ Reported-by: Stephane Odul <stephane@clumio.com> Signed-off-by: Mathias Krause <minipli@grsecurity.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-21Merge branch 'ab/grep-simplify-extended-expression'Junio C Hamano1-1/+0
Giving "--invert-grep" and "--all-match" without "--grep" to the "git log" command resulted in an attempt to access grep pattern expression structure that has not been allocated, which has been corrected. * ab/grep-simplify-extended-expression: grep.c: remove "extended" in favor of "pattern_expression", fix segfault
2022-10-11grep.c: remove "extended" in favor of "pattern_expression", fix segfaultÆvar Arnfjörð Bjarmason1-1/+0
Since 79d3696cfb4 (git-grep: boolean expression on pattern matching., 2006-06-30) the "pattern_expression" member has been used for complex queries (AND/OR...), with "pattern_list" being used for the simple OR queries. Since then we've used both "pattern_expression" and its associated boolean "extended" member to see if we have a complex expression. Since f41fb662f57 (revisions API: have release_revisions() release "grep_filter", 2022-04-13) we've had a subtle bug relating to that: If we supplied options that were only used for "complex queries", but didn't supply the query itself we'd set "opt->extended", but would have a NULL "pattern_expression". As a result these would segfault as we tried to call "free_grep_patterns()" from "release_revisions()": git -P log -1 --invert-grep git -P log -1 --all-match The root cause of this is that we were conflating the state management we needed in "compile_grep_patterns()" itself with whether or not we had an "opt->pattern_expression" later on. In this cases as we're going through "compile_grep_patterns()" we have no "opt->pattern_list" but have "opt->no_body_match" or "opt->all_match". So we'd set "opt->extended = 1", but not "return" on "opt->extended" as that's an "else if" in the same "if" statement. That behavior is intentional and required, as the common case is that we have an "opt->pattern_list" that we're about to parse into the "opt->pattern_expression". But we don't need to keep track of this "extended" flag beyond the state management in compile_grep_patterns() itself. It needs it, but once we're out of that function we can rely on "opt->pattern_expression" being non-NULL instead for using these extended patterns. As 79d3696cfb4 itself shows we've assumed that there's a one-to-one mapping between the two since the very beginning. I.e. "match_line()" would check "opt->extended" to see if it should call "match_expr()", and the first thing we do in that function is assume that we have a "opt->pattern_expression". We'd then call "match_expr_eval()", which would have died if that "opt->pattern_expression" was NULL. The "die" was added in c922b01f54c (grep: fix segfault when "git grep '('" is given, 2009-04-27), and can now be removed as it's now clearly unreachable. We still do the right thing in the case that prompted that fix: git grep '(' fatal: unmatched parenthesis Arguably neither the "--invert-grep" option added in [1] nor the earlier "--all-match" option added in [2] were intended to be used stand-alone, and another approach[3] would be to error out in those cases. But since we've been treating them as a NOOP when given without --grep for a long time let's keep doing that. We could also return in "free_pattern_expr()" if the argument is non-NULL, as an alternative fix for this segfault does [4]. That would be more elegant in making the "free_*()" function behave like "free()", but it would also remove a sanity check: The "free_pattern_expr()" function calls itself recursively, and only the top-level is allowed to be NULL, let's not conflate those two conditions. 1. 22dfa8a23de (log: teach --invert-grep option, 2015-01-12) 2. 0ab7befa31d (grep --all-match, 2006-09-27) 3. https://lore.kernel.org/git/patch-1.1-f4b90799fce-20221010T165711Z-avarab@gmail.com/ 4. http://lore.kernel.org/git/7e094882c2a71894416089f894557a9eae07e8f8.1665423686.git.me@ttaylorr.com Reported-by: orygaw <orygaw@protonmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22grep: add --max-count command line optionCarlos López1-0/+2
This patch adds a command line option analogous to that of GNU grep(1)'s -m / --max-count, which users might already be used to. This makes it possible to limit the amount of matches shown in the output while keeping the functionality of other options such as -C (show code context) or -p (show containing function), which would be difficult to do with a shell pipeline (e.g. head(1)). Signed-off-by: Carlos López 00xc@protonmail.com Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-15grep: simplify config parsing and option parsingÆvar Arnfjörð Bjarmason1-3/+0
Simplify the parsing of "grep.patternType" and "grep.extendedRegexp". This changes no behavior, but gets rid of complex parsing logic that isn't needed anymore. When "grep.patternType" was introduced in 84befcd0a4a (grep: add a grep.patternType configuration setting, 2012-08-03) we promised that: 1. You can set "grep.patternType", and "[setting it to] 'default' will return to the default matching behavior". In that context "the default" meant whatever the configuration system specified before that change, i.e. via grep.extendedRegexp. 2. We'd support the existing "grep.extendedRegexp" option, but ignore it when the new "grep.patternType" option is set. We said we'd only ignore the older "grep.extendedRegexp" option "when the `grep.patternType` option is set to a value other than 'default'". In a preceding commit we changed grep_config() to be called after grep_init(), which means that much of the complexity here can go away. As before both "grep.patternType" and "grep.extendedRegexp" are last-one-wins variable, with "grep.extendedRegexp" yielding to "grep.patternType", except when "grep.patternType=default". Note that as the previously added tests indicate this cannot be done on-the-fly as we see the config variables, without introducing more state keeping. I.e. if we see: -c grep.extendedRegexp=false -c grep.patternType=default -c extendedRegexp=true We need to select ERE, since grep.patternType=default unselects that variable, which normally has higher precedence, but we also need to select BRE in cases of: -c grep.extendedRegexp=true \ -c grep.extendedRegexp=false Which would not be the case for this, which select ERE: -c grep.patternType=extended \ -c grep.extendedRegexp=false Therefore we cannot do this on-the-fly in grep_config without also introducing tracking variables for not only the pattern type, but what the source of that pattern type was. So we need to decide on the pattern after our config was fully parsed. Let's do that by deferring the decision on the pattern type until it's time to compile it in compile_regexp(). By that time we've not only parsed the config, but also handled the command-line options. Those will set "opt.pattern_type_option" (*not* "opt.extended_regexp_option"!). At that point all we need to do is see if "grep.patternType" was UNSPECIFIED in the end (including an explicit "=default"), if so we'll use the "grep.extendedRegexp" configuration, if any. See my 07a3d411739 (grep: remove regflags from the public grep_opt API, 2017-06-29) for addition of the two comments being removed here, i.e. the complexity noted in that commit is now going away. 1. https://lore.kernel.org/git/patch-v8-09.10-c211bb0c69d-20220118T155211Z-avarab@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-15grep.h: make "grep_opt.pattern_type_option" use its enumÆvar Arnfjörð Bjarmason1-1/+1
Change the "pattern_type_option" member of "struct grep_opt" to use the enum type we use for it. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-15grep API: call grep_config() after grep_init()Ævar Arnfjörð Bjarmason1-0/+21
The grep_init() function used the odd pattern of initializing the passed-in "struct grep_opt" with a statically defined "grep_defaults" struct, which would be modified in-place when we invoked grep_config(). So we effectively (b) initialized config, (a) then defaults, (c) followed by user options. Usually those are ordered as "a", "b" and "c" instead. As the comments being removed here show the previous behavior needed to be carefully explained as we'd potentially share the populated configuration among different instances of grep_init(). In practice we didn't do that, but now that it can't be a concern anymore let's remove those comments. This does not change the behavior of any of the configuration variables or options. That would have been the case if we didn't move around the grep_config() call in "builtin/log.c". But now that we call "grep_config" after "git_log_config" and "git_format_config" we'll need to pass in the already initialized "struct grep_opt *". See 6ba9bb76e02 (grep: copy struct in one fell swoop, 2020-11-29) and 7687a0541e0 (grep: move the configuration parsing logic to grep.[ch], 2012-10-09) for the commits that added the comments. The memcpy() pattern here will be optimized away and follows the convention of other *_init() functions. See 5726a6b4012 (*.c *_init(): define in terms of corresponding *_INIT macro, 2021-07-01). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-15built-ins: trust the "prefix" from run_builtin()Ævar Arnfjörð Bjarmason1-3/+1
Change code in "builtin/grep.c" and "builtin/ls-tree.c" to trust the "prefix" passed from "run_builtin()". The "prefix" we get from setup.c is either going to be NULL or a string of length >0, never "". So we can drop the "prefix && *prefix" checks added for "builtin/grep.c" in 0d042fecf2f (git-grep: show pathnames relative to the current directory, 2006-08-11), and for "builtin/ls-tree.c" in a69dd585fca (ls-tree: chomp leading directories when run from a subdirectory, 2005-12-23). As seen in code in revision.c that was added in cd676a51367 (diff --relative: output paths as relative to the current subdirectory, 2008-02-12) we already have existing code that does away with this assertion. This makes it easier to reason about a subsequent change to the "prefix_length" code in grep.c in a subsequent commit, and since we're going to the trouble of doing that let's leave behind an assert() to promise this to any future callers. For "builtin/grep.c" it would be painful to pass the "prefix" down the callchain of: cmd_grep -> grep_tree -> grep_submodule -> grep_cache -> grep_oid -> grep_source_name So for the code that needs it in grep_source_name() let's add a "grep_prefix" variable similar to the existing "ls_tree_prefix". While at it let's move the code in cmd_ls_tree() around so that we assign to the "ls_tree_prefix" right after declaring the variables, and stop assigning to "prefix". We only subsequently used that variable later in the function after clobbering it. Let's just use our own "grep_prefix" instead. Let's also add an assert() in git.c, so that we'll make this promise about the "prefix" to any current and future callers, as well as to any readers of the code. Code history: * The strlen() in "grep.c" hasn't been used since 493b7a08d80 (grep: accept relative paths outside current working directory, 2009-09-05). When that code was added in 0d042fecf2f (git-grep: show pathnames relative to the current directory, 2006-08-11) we used the length. But since 493b7a08d80 we haven't used it for anything except a boolean check that we could have done on the "prefix" member itself. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-15grep.h: remove unused "regex_t regexp" from grep_optÆvar Arnfjörð Bjarmason1-1/+0
This "regex_t" in grep_opt has not been used since f9b9faf6f8a (builtin-grep: allow more than one patterns., 2006-05-02), we still use a "regex_t" for compiling regexes, but that's in the "grep_pat" struct". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-17log: let --invert-grep only invert --grepRené Scharfe1-0/+2
The option --invert-grep is documented to filter out commits whose messages match the --grep filters. However, it also affects the header matches (--author, --committer), which is not intended. Move the handling of that option to grep.c, as only the code there can distinguish between matches in the header from those in the message body. If --invert-grep is given then enable extended expressions (not the regex type, we just need git grep's --not to work), negate the body patterns and check if any of them match by piggy-backing on the collect_hits mechanism of grep_source_1(). Collecting the matches in struct grep_opt is a bit iffy, but with "last_shown" we have a precedent for writing state information to that struct. Reported-by: Dotan Cohen <dotancohen@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-01Merge branch 'hm/paint-hits-in-log-grep'Junio C Hamano1-0/+9
"git log --grep=string --author=name" learns to highlight hits just like "git grep string" does. * hm/paint-hits-in-log-grep: grep/pcre2: fix an edge case concerning ascii patterns and UTF-8 data pretty: colorize pattern matches in commit messages grep: refactor next_match() and match_one_pattern() for external use
2021-10-13Merge branch 'mt/grep-submodule-textconv'Junio C Hamano1-3/+3
"git grep --recurse-submodules" takes trees and blobs from the submodule repository, but the textconv settings when processing a blob from the submodule is not taken from the submodule repository. A test is added to demonstrate the issue, without fixing it. * mt/grep-submodule-textconv: grep: demonstrate bug with textconv attributes and submodules
2021-10-06Merge branch 'ab/retire-decl-of-missing-unused-funcs'Junio C Hamano1-1/+0
Remove external declaration of functions that no longer exist. * ab/retire-decl-of-missing-unused-funcs: config.h: remove unused git_config_get_untracked_cache() declaration log-tree.h: remove unused function declarations grep.h: remove unused grep_threads_ok() declaration builtin.h: remove cmd_tar_tree() declaration
2021-10-01grep.h: remove unused grep_threads_ok() declarationÆvar Arnfjörð Bjarmason1-1/+0
This function was removed in 0579f91dd74 (grep: enable threading with -p and -W using lazy attribute lookup, 2011-12-12), but not its corresponding *.h declaration. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-29grep: refactor next_match() and match_one_pattern() for external useHamza Mahfooz1-0/+9
These changes are made in preparation of, the colorization support for the "git log" subcommands that, rely on regex functionality (i.e. "--author", "--committer" and "--grep"). These changes are necessary primarily because match_one_pattern() expects header lines to be prefixed, however, in pretty, the prefixes are stripped from the lines because the name-email pairs need to go through additional parsing, before they can be printed and because next_match() doesn't handle the case of "ctx == GREP_CONTEXT_HEAD" at all. So, teach next_match() how to handle the new case and move match_one_pattern()'s core logic to headerless_match_one_pattern() while preserving match_one_pattern()'s uses that depend on the additional processing. Signed-off-by: Hamza Mahfooz <someguy@effective-light.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-29grep: demonstrate bug with textconv attributes and submodulesMatheus Tavares1-3/+3
In some circumstances, "git grep --textconv --recurse-submodules" ignores the textconv attributes from the submodules and erroneously applies the attributes defined in the superproject on the submodules' files. The textconv cache is also saved on the superproject, even for submodule objects. A fix for these problems will probably require at least three changes: - Some textconv and attributes functions (as well as their callees) will have to be adjusted to work with arbitrary repositories. Note that "fill_textconv()", for example, already receives a "struct repository" but it writes the textconv cache using "write_loose_object()", which implicitly works on "the_repository". - grep.c functions will have to call textconv/userdiff routines passing the "repo" field from "struct grep_source" instead of the one from "struct grep_opt". The latter always points to "the_repository" on "git grep" executions (see its initialization in builtin/grep.c), but the former points to the correct repository that each source (an object, file, or buffer) comes from. - "userdiff_find_by_path()" might need to use a different attributes stack for each repository it works on or reset its internal static stack when the repository is changed throughout the calls. For now, let's add some tests to demonstrate these problems, and also update a NEEDSWORK comment in grep.h that mentions this bug to reference the added tests. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-22grep: store grep_source buffer as constJeff King1-2/+2
Our grep_buffer() function takes a non-const buffer, which is confusing: we don't take ownership of nor write to the buffer. This mostly comes from the fact that the underlying grep_source struct in which we store the buffer uses non-const pointer. The memory pointed to by the struct is sometimes owned by us (for FILE or OID sources), and sometimes not (for BUF sources). Let's store it as const, which lets us err on the side of caution (i.e., the compiler will warn us if any of our code writes to or tries to free it). As a result, we must annotate the one place where we do free it by casting away the constness. But that's a small price to pay for the extra safety and clarity elsewhere (and indeed, it already had a comment explaining why GREP_SOURCE_BUF _didn't_ free it). And then we can mark grep_buffer() as taking a const buffer. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08grep: add repository to OID grep sourcesJonathan Tan1-1/+16
Record the repository whenever an OID grep source is created, and teach the worker threads to explicitly provide the repository when accessing objects. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08grep: typesafe versions of grep_source_initJonathan Tan1-3/+4
grep_source_init() can create "struct grep_source" objects and, depending on the value of the type passed, some void-pointer parameters have different meanings. Because one of these types (GREP_SOURCE_OID) will require an additional parameter in a subsequent patch, take the opportunity to increase clarity and type safety by replacing this function with individual functions for each type. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-17grep/pcre2: move back to thread-only PCREv2 structuresÆvar Arnfjörð Bjarmason1-1/+2
Change the setup of the "pcre2_general_context" to happen per-thread in compile_pcre2_pattern() instead of in grep_init(). This change brings it in line with how the rest of the pcre2_* members in the grep_pat structure are set up. As noted in the preceding commit the approach 513f2b0bbd4 (grep: make PCRE2 aware of custom allocator, 2019-10-16) took to allocate the pcre2_general_context seems to have been initially based on a misunderstanding of how PCREv2 memory allocation works. The approach of creating a global context in grep_init() is just added complexity for almost zero gain. On my system it's 24 bytes saved per-thread. For comparison PCREv2 will then go on to allocate at least a kilobyte for its own thread-local state. As noted in 6d423dd542f (grep: don't redundantly compile throwaway patterns under threading, 2017-05-25) the grep code is intentionally not trying to micro-optimize allocations by e.g. sharing some PCREv2 structures globally, while making others thread-local. So let's remove this special case and make all of them thread-local again for simplicity. With this change we could move the pcre2_{malloc,free} functions around to live closer to their current use. I'm not doing that here to keep this change small, that cleanup will be done in a follow-up commit. See also the discussion in 94da9193a6 (grep: add support for PCRE v2, 2017-06-01) about thread safety, and Johannes's comments[1] to the effect that we should be doing what this patch is doing. 1. https://lore.kernel.org/git/nycvar.QRO.7.76.6.1908052120302.46@tvgsbejvaqbjf.bet/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-17grep/pcre2: use pcre2_maketables_free() functionÆvar Arnfjörð Bjarmason1-0/+3
Make use of the pcre2_maketables_free() function to free the memory allocated by pcre2_maketables(). At first sight it's strange that 10da030ab75 (grep: avoid leak of chartables in PCRE2, 2019-10-16) which added the free() call here doesn't make use of the pcre2_free() the author introduced in the preceding commit in 513f2b0bbd4 (grep: make PCRE2 aware of custom allocator, 2019-10-16). The reason is that at the time the function didn't exist. It was first introduced in PCREv2 version 10.34, released on 2019-11-21. Let's make use of it behind a macro. I don't think this matters for anything to do with custom allocators, but it makes our use of PCREv2 more discoverable. At some distant point in the future we'll be able to drop the version guard, as nobody will be running a version older than 10.34. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-17grep/pcre2: use compile-time PCREv2 version testÆvar Arnfjörð Bjarmason1-0/+3
Replace a use of pcre2_config(PCRE2_CONFIG_VERSION, ...) which I added in 95ca1f987ed (grep/pcre2: better support invalid UTF-8 haystacks, 2021-01-24) with the same test done at compile-time. It might be cuter to do this at runtime since we don't have to do the "major >= 11 || (major >= 10 && ...)" test. But in the next commit we'll add another version comparison that absolutely needs to be done at compile-time, so we're better of being consistent across the board. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-10Merge branch 'ab/grep-pcre-invalid-utf8'Junio C Hamano1-0/+4
Update support for invalid UTF-8 in PCRE2. * ab/grep-pcre-invalid-utf8: grep/pcre2: better support invalid UTF-8 haystacks grep/pcre2 tests: don't rely on invalid UTF-8 data test
2021-02-10Merge branch 'ab/retire-pcre1'Junio C Hamano1-14/+0
The support for deprecated PCRE1 library has been dropped. * ab/retire-pcre1: Remove support for v1 of the PCRE library config.mak.uname: remove redundant NO_LIBPCRE1_JIT flag
2021-01-26grep/log: remove hidden --debug and --grep-debug optionsÆvar Arnfjörð Bjarmason1-1/+0
Remove the hidden "grep --debug" and "log --grep-debug" options added in 17bf35a3c7b (grep: teach --debug option to dump the parse tree, 2012-09-13). At the time these options seem to have been intended to go along with a documentation discussion and to help the author of relevant tests to perform ad-hoc debugging on them[1]. Reasons to want this gone: 1. They were never documented, and the only (rather trivial) use of them in our own codebase for testing is something I removed back in e01b4dab01e (grep: change non-ASCII -i test to stop using --debug, 2017-05-20). 2. Googling around doesn't show any in-the-wild uses I could dig up, and on the Git ML the only mentions after the original discussion seem to have been when they came up in unrelated diff contexts, or that test commit of mine. 3. An exception to that is c581e4a7499 (grep: under --debug, show whether PCRE JIT is enabled, 2019-08-18) where we added the ability to dump out when PCREv2 has the JIT in effect. The combination of that and my earlier b65abcafc7a (grep: use PCRE v2 for optimized fixed-string search, 2019-07-01) means Git prints this out in its most common in-the-wild configuration: $ git log --grep-debug --grep=foo --grep=bar --grep=baz --all-match pcre2_jit_on=1 pcre2_jit_on=1 pcre2_jit_on=1 [all-match] (or pattern_body<body>foo (or pattern_body<body>bar pattern_body<body>baz ) ) $ git grep --debug \( -e foo --and -e bar \) --or -e baz pcre2_jit_on=1 pcre2_jit_on=1 pcre2_jit_on=1 (or (and patternfoo patternbar ) patternbaz ) I.e. for each pattern we're considering for the and/or/--all-match etc. debugging we'll now diligently spew out another identical line saying whether the PCREv2 JIT is on or not. I think that nobody's complained about that rather glaringly obviously bad output says something about how much this is used, i.e. it's not. The need for this debugging aid for the composed grep/log patterns seems to have passed, and the desire to dump the JIT config seems to have been another one-off around the time we had JIT-related issues on the PCREv2 codepath. That the original author of this debugging facility seemingly hasn't noticed the bad output since then[2] is probably some indicator. 1. https://lore.kernel.org/git/cover.1347615361.git.git@drmicha.warpmail.net/ 2. https://lore.kernel.org/git/xmqqk1b8x0ac.fsf@gitster-ct.c.googlers.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-24grep/pcre2: better support invalid UTF-8 haystacksÆvar Arnfjörð Bjarmason1-0/+4
Improve the support for invalid UTF-8 haystacks given a non-ASCII needle when using the PCREv2 backend. This is a more complete fix for a bug I started to fix in 870eea8166 (grep: do not enter PCRE2_UTF mode on fixed matching, 2019-07-26), now that PCREv2 has the PCRE2_MATCH_INVALID_UTF mode we can make use of it. This fixes the sort of case described in 8a5999838e (grep: stess test PCRE v2 on invalid UTF-8 data, 2019-07-26), i.e.: - The subject string is non-ASCII (e.g. "ævar") - We're under a is_utf8_locale(), e.g. "en_US.UTF-8", not "C" - We are using --ignore-case, or we're a non-fixed pattern If those conditions were satisfied and we matched found non-valid UTF-8 data PCREv2 might bark on it, in practice this only happened under the JIT backend (turned on by default on most platforms). Ultimately this fixes a "regression" in b65abcafc7 ("grep: use PCRE v2 for optimized fixed-string search", 2019-07-01), I'm putting that in scare-quotes because before then we wouldn't properly support these complex case-folding, locale etc. cases either, it just broke in different ways. There was a bug related to this the PCRE2_NO_START_OPTIMIZE flag fixed in PCREv2 10.36. It can be worked around by setting the PCRE2_NO_START_OPTIMIZE flag. Let's do that in those cases, and add tests for the bug. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-23Remove support for v1 of the PCRE libraryÆvar Arnfjörð Bjarmason1-14/+0
Remove support for using version 1 of the PCRE library. Its use has been discouraged by upstream for a long time, and it's in a bugfix-only state. Anyone who was relying on v1 in particular got a nudge to move to v2 in e6c531b808 (Makefile: make USE_LIBPCRE=YesPlease mean v2, not v1, 2018-03-11), which was first released as part of v2.18.0. With this the LIBPCRE2 test prerequisites is redundant to PCRE. But I'm keeping it for self-documentation purposes, and to avoid conflict with other in-flight PCRE patches. I'm also not changing all of our own "pcre2" names to "pcre", i.e. the inverse of 6d4b5747f0 (grep: change internal *pcre* variable & function names to be *pcre1*, 2017-05-25). I don't see the point, and it makes the history/blame harder to read. Maybe if there's ever a PCRE v3... Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-21grep: use designated initializers for `grep_defaults`Martin Ågren1-1/+0
In 15fabd1bbd ("builtin/grep.c: make configuration callback more reusable", 2012-10-09), we learned to fill a `static struct grep_opt grep_defaults` which we can use as a blueprint for other such structs. At the time, we didn't consider designated initializers to be widely useable, but these days, we do. (See, e.g., cbc0f81d96 ("strbuf: use designated initializers in STRBUF_INIT", 2017-07-10).) Use designated initializers to let the compiler set up the struct and so that we don't need to remember to call `init_grep_defaults()`. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-21grep: don't set up a "default" repo for grepMartin Ågren1-1/+1
`init_grep_defaults()` fills a `static struct grep_opt grep_defaults`. This struct is then used by `grep_init()` as a blueprint for other such structs. Notably, `grep_init()` takes a `struct repo *` and assigns it into the target struct. As a result, it is unnecessary for us to take a `struct repo *` in `init_grep_defaults()` as well. We assign it into the default struct and never look at it again. And in light of how we return early if we have already set up the default struct, it's not just unnecessary, but is also a bit confusing: If we are called twice and with different repos, is it a bug or a feature that we ignore the second repo? Drop the repo parameter for `init_grep_defaults()`. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-17grep: replace grep_read_mutex by internal obj read lockMatheus Tavares1-13/+0
git-grep uses 'grep_read_mutex' to protect its calls to object reading operations. But these have their own internal lock now, which ensures a better performance (allowing parallel access to more regions). So, let's remove the former and, instead, activate the latter with enable_obj_read_lock(). Sections that are currently protected by 'grep_read_mutex' but are not internally protected by the object reading lock should be surrounded by obj_read_lock() and obj_read_unlock(). These guarantee mutual exclusion with object reading operations, keeping the current behavior and avoiding race conditions. Namely, these places are: In grep.c: - fill_textconv() at fill_textconv_grep(). - userdiff_get_textconv() at grep_source_1(). In builtin/grep.c: - parse_object_or_die() and the submodule functions at grep_submodule(). - deref_tag() and gitmodules_config_oid() at grep_objects(). If these functions become thread-safe, in the future, we might remove the locking and probably get some speedup. Note that some of the submodule functions will already be thread-safe (or close to being thread-safe) with the internal object reading lock. However, as some of them will require additional modifications to be removed from the critical section, this will be done in its own patch. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-23Merge branch 'cb/pcre2-chartables-leakfix'Junio C Hamano1-0/+2
Leakfix. * cb/pcre2-chartables-leakfix: grep: avoid leak of chartables in PCRE2 grep: make PCRE2 aware of custom allocator grep: make PCRE1 aware of custom allocator
2019-10-18grep: avoid leak of chartables in PCRE2Carlo Marcelo Arenas Belón1-0/+1
94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) introduced a small memory leak visible with valgrind in t7813. Complete the creation of a PCRE2 specific variable that was missing from the original change and free the generated table just like it is done for PCRE1. Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-18grep: make PCRE2 aware of custom allocatorCarlo Marcelo Arenas Belón1-0/+1
94da9193a6 (grep: add support for PCRE v2, 2017-06-01) didn't include a way to override the system allocator, and so it is incompatible with custom allocators (e.g. nedmalloc). This problem became obvious when we tried to plug a memory leak by `free()`ing a data structure allocated by PCRE2, triggering a segfault in Windows (where we use nedmalloc by default). PCRE2 requires the use of a general context to override the allocator and therefore, there is a lot more code needed than in PCRE1, including a couple of wrapper functions. Extend the grep API with a "destructor" that could be called to cleanup any objects that were created and used globally. Update `builtin/grep.c` to use that new API, but any other future users should make sure to have matching `grep_init()`/`grep_destroy()` calls if they are using the pattern matching functionality. Move some of the logic that was before done per thread (in the workers) into an earlier phase to avoid degrading performance, but as the use of PCRE2 with custom allocators is better understood it is expected more of its functions will be instructed to use the custom allocator as well as was done in the original code[1] this work was based on. [1] https://public-inbox.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/ Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-11Merge branch 'cb/pcre1-cleanup'Junio C Hamano1-11/+0
PCRE fixes. * cb/pcre1-cleanup: grep: refactor and simplify PCRE1 support grep: make sure NO_LIBPCRE1_JIT disable JIT in PCRE1
2019-10-11Merge branch 'ab/pcre-jit-fixes'Junio C Hamano1-11/+2
A few simplification and bugfixes to PCRE interface. * ab/pcre-jit-fixes: grep: under --debug, show whether PCRE JIT is enabled grep: do not enter PCRE2_UTF mode on fixed matching grep: stess test PCRE v2 on invalid UTF-8 data grep: create a "is_fixed" member in "grep_pat" grep: consistently use "p->fixed" in compile_regexp() grep: stop using a custom JIT stack with PCRE v1 grep: stop "using" a custom JIT stack with PCRE v2 grep: remove overly paranoid BUG(...) code grep: use PCRE v2 for optimized fixed-string search grep: remove the kwset optimization grep: drop support for \0 in --fixed-strings <pattern> grep: make the behavior for NUL-byte in patterns sane grep tests: move binary pattern tests into their own file grep tests: move "grep binary" alongside the rest grep: inline the return value of a function call used only once t4210: skip more command-line encoding tests on MinGW grep: don't use PCRE2?_UTF8 with "log --encoding=<non-utf8>" log tests: test regex backends in "--encode=<enc>" tests
2019-09-09grep: skip UTF8 checks explicitlyCarlo Marcelo Arenas Belón1-0/+3
18547aacf5 ("grep/pcre: support utf-8", 2016-06-25) that was released with git 2.10 added the PCRE_UTF8 flag to PCRE1 matching including a call to has_non_ascii() to try to avoid breakage if there was non-utf8 encoded content in the haystack. Usually PCRE is compiled with JIT support (even if is not the default), and therefore the codepath used includes calling pcre_jit_exec, which skips UTF-8 validation by design (which might result in crashes or hangs) but when JIT support wasn't compiled we use pcre_exec instead with the posibility that grep might be aborted if invalid UTF-8 is found in the haystack. PCRE1 provides a flag since Mar 5, 2007 that could be used to skip the checks explicitly so use that to make both codepaths equivalent (the flag is ignored by pcre1_jit_exec) this fix is only implemented for PCRE1 because PCRE2 is likely to have a better solution (without the risks) instead in the future Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-26grep: refactor and simplify PCRE1 supportCarlo Marcelo Arenas Belón1-9/+0
The code used both a macro and a variable to keep track if JIT support was desired and relied on the fact that a non JIT enabled library will ignore a request for JIT compilation (as defined by the second parameter of the call to pcre_study) Cleanup the multiple levels of macros used and call pcre_study with the right parameter after JIT support has been confirmed and unless it was requested to be disabled with NO_LIBPCRE1_JIT Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-26grep: make sure NO_LIBPCRE1_JIT disable JIT in PCRE1Carlo Marcelo Arenas Belón1-3/+1
e87de7cab4 ("grep: un-break building with PCRE < 8.32", 2017-05-25) added a restriction for JIT support that is no longer needed after pcre_jit_exec() calls were removed. Reorganize the definitions in grep.h so that JIT support could be detected early and NO_LIBPCRE1_JIT could be used reliably to enforce JIT doesn't get used. Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-26grep: create a "is_fixed" member in "grep_pat"Ævar Arnfjörð Bjarmason1-0/+1
This change paves the way for later using this value the regex compile functions themselves. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-26grep: stop using a custom JIT stack with PCRE v1Ævar Arnfjörð Bjarmason1-5/+0
Simplify the PCRE v1 code for the same reasons as for the PCRE v2 code in the last commit. Unlike with v2 we actually used the custom stack in v1, but let's use PCRE's built-in 32 KB one instead, since experience with v2 shows that's enough. Most distros are already using v2 as a default, and the underlying sljit code is the same. Unfortunately we can't just pass a NULL to pcre_jit_exec() as with pcre2_jit_match(). Unlike the v2 function it doesn't support that. Instead we need to use the fatter pcre_exec() if we'd like the same behavior. This will make things slightly slower than on the fast-path function, but it's OK since we care less about v1 performance these days since we have and recommend v2. Running a similar performance test as what I ran in fbaceaac47 ("grep: add support for the PCRE v1 JIT API", 2017-05-25) via: GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE1=Y CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst' ./run HEAD~ HEAD p7820-grep-engines.sh Gives us this, just the /perl/ results: Test HEAD~ HEAD --------------------------------------------------------------------------------------- 7820.3: perl grep 'how.to' 0.19(0.67+0.52) 0.19(0.65+0.52) +0.0% 7820.7: perl grep '^how to' 0.19(0.78+0.44) 0.19(0.72+0.49) +0.0% 7820.11: perl grep '[how] to' 0.39(2.13+0.43) 0.40(2.10+0.46) +2.6% 7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 0.44(2.55+0.37) 0.45(2.47+0.41) +2.3% 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.23(1.06+0.42) 0.22(1.03+0.43) -4.3% It will also implicitly re-enable UTF-8 validation for PCRE v1. As noted in [1] we now have cases as a result where PCRE v1 is more eager to error out. Subsequent patches will fix that for v2, and I think it's fair to tell v1 users "just upgrade" and not worry about that edge case for v1. 1. https://public-inbox.org/git/CAPUEsphZJ_Uv9o1-yDpjNLA_q-f7gWXz9g1gCY2pYAYN8ri40g@mail.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-26grep: stop "using" a custom JIT stack with PCRE v2Ævar Arnfjörð Bjarmason1-4/+0
As reported in [1] the code I added in 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) to use a custom JIT stack has never worked. It was incorrectly copy/pasted from code I added in fbaceaac47 ("grep: add support for the PCRE v1 JIT API", 2017-05-25), which did work. Thus our intention of starting with 1 byte of stack at a maximum of 1 MB didn't happen, we'd always use the 32 KB stack provided by PCRE v2's jit_machine_stack_exec()[2]. The reason I allocated a custom stack at all was this advice in pcrejit(3) (same in pcre2jit(3)): "By default, it uses 32KiB on the machine stack. However, some large or complicated patterns need more than this" Since we've haven't had any reports of users running into PCRE2_ERROR_JIT_STACKLIMIT in the wild I think we can safely assume that we can just use the library defaults instead and drop this code. This won't change with the wider use of PCRE v2 in ed0479ce3d ("Merge branch 'ab/no-kwset' into next", 2019-07-15), a fixed string search is not a "large or complicated pattern". For good measure I ran the performance test noted in 94da9193a6, although the command is simpler now due to my 0f50c8e32c ("Makefile: remove the NO_R_TO_GCC_LINKER flag", 2019-05-17): GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE2=Y CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst' ./run HEAD~ HEAD p7820-grep-engines.sh Just the /perl/ results are: Test HEAD~ HEAD --------------------------------------------------------------------------------------- 7820.3: perl grep 'how.to' 0.17(0.27+0.65) 0.17(0.24+0.68) +0.0% 7820.7: perl grep '^how to' 0.16(0.23+0.66) 0.16(0.23+0.67) +0.0% 7820.11: perl grep '[how] to' 0.18(0.35+0.62) 0.18(0.33+0.65) +0.0% 7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 0.17(0.45+0.54) 0.17(0.49+0.50) +0.0% 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.16(0.33+0.58) 0.16(0.29+0.62) +0.0% So, as expected there's no change, and running with valgrind reveals that we have fewer allocations now. As noted in [3] there are known regexes that will fail with the lower stack limit, the way GNU grep fixed it is interesting, although I believe the implementation is overly verbose, they could make PCRE v2 handle that gradual re-allocation, that's what min/max memory is for. So we might end up bringing this back, I'm more inclined to just kick such cases upstairs to PCRE maintainers as a bug, perhaps they'll add some overall "just allocate more then" flag to make this easier. In any case there's no functional change here, we didn't have a custom stack, so let's apply this first, we can always revert it later. 1. https://public-inbox.org/git/20190721194052.15440-1-carenas@gmail.com/ 2. I didn't really intend to start with 1 byte, looking at the PCRE v2 code again what happened is that I cargo-culted some of PCRE v2's own test code which was meant to test re-allocations. It's more sane to start with say 32 KB with a max of 1 MB, as pcre2grep.c does. 3. https://public-inbox.org/git/CAPUEspjj+fG8QDmf=bZXktfpLgkgiu34HTjKLhm-cmEE04FE-A@mail.gmail.com/ Reported-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-01grep: remove the kwset optimizationÆvar Arnfjörð Bjarmason1-2/+0
A later change will replace this optimization with optimistic use of PCRE v2. I'm completely removing it as an intermediate step, as opposed to replacing it with PCRE v2, to demonstrate that no grep semantics depend on this (or any other) optimization for the fixed backend anymore. For now this is mostly (but not entirely) a performance regression, as shown by this hacky one-liner: for opt in '' ' -i' do GIT_PERF_7821_GREP_OPTS=$opt GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 CFLAGS=-O3 USE_LIBPCRE=YesPlease' ./run origin/master HEAD -- p7821-grep-engines-fixed.sh done && for opt in '' ' -i' do GIT_PERF_4221_LOG_OPTS=$opt GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 CFLAGS=-O3 USE_LIBPCRE=YesPlease' ./run origin/master HEAD -- p4221-log-grep-engines-fixed.sh done Which produces: plain grep: Test origin/master HEAD ------------------------------------------------------------------------- 7821.1: fixed grep int 0.55(1.60+0.63) 0.82(3.11+0.51) +49.1% 7821.2: basic grep int 0.62(1.68+0.49) 0.85(3.02+0.52) +37.1% 7821.3: extended grep int 0.61(1.63+0.53) 0.91(3.09+0.44) +49.2% 7821.4: perl grep int 0.55(1.60+0.57) 0.41(0.93+0.57) -25.5% 7821.6: fixed grep uncommon 0.20(0.50+0.44) 0.35(1.27+0.42) +75.0% 7821.7: basic grep uncommon 0.20(0.49+0.45) 0.35(1.29+0.41) +75.0% 7821.8: extended grep uncommon 0.20(0.45+0.48) 0.35(1.25+0.44) +75.0% 7821.9: perl grep uncommon 0.20(0.53+0.41) 0.16(0.24+0.49) -20.0% 7821.11: fixed grep æ 0.35(1.27+0.40) 0.25(0.82+0.39) -28.6% 7821.12: basic grep æ 0.35(1.28+0.38) 0.25(0.75+0.44) -28.6% 7821.13: extended grep æ 0.36(1.21+0.46) 0.25(0.86+0.35) -30.6% 7821.14: perl grep æ 0.35(1.33+0.34) 0.16(0.26+0.47) -54.3% grep with -i: Test origin/master HEAD ----------------------------------------------------------------------------- 7821.1: fixed grep -i int 0.61(1.84+0.64) 1.11(4.12+0.64) +82.0% 7821.2: basic grep -i int 0.72(1.86+0.57) 1.15(4.48+0.49) +59.7% 7821.3: extended grep -i int 0.94(1.83+0.60) 1.53(4.12+0.58) +62.8% 7821.4: perl grep -i int 0.66(1.82+0.59) 0.55(1.08+0.58) -16.7% 7821.6: fixed grep -i uncommon 0.21(0.51+0.44) 0.44(1.74+0.34) +109.5% 7821.7: basic grep -i uncommon 0.21(0.55+0.41) 0.44(1.72+0.40) +109.5% 7821.8: extended grep -i uncommon 0.21(0.57+0.39) 0.42(1.64+0.45) +100.0% 7821.9: perl grep -i uncommon 0.21(0.48+0.48) 0.17(0.30+0.45) -19.0% 7821.11: fixed grep -i æ 0.25(0.73+0.45) 0.25(0.75+0.45) +0.0% 7821.12: basic grep -i æ 0.25(0.71+0.49) 0.26(0.77+0.44) +4.0% 7821.13: extended grep -i æ 0.25(0.75+0.44) 0.25(0.74+0.46) +0.0% 7821.14: perl grep -i æ 0.17(0.26+0.48) 0.16(0.20+0.52) -5.9% plain log: Test origin/master HEAD --------------------------------------------------------------------------------- 4221.1: fixed log --grep='int' 7.31(7.06+0.21) 8.11(7.85+0.20) +10.9% 4221.2: basic log --grep='int' 7.30(6.94+0.27) 8.16(7.89+0.19) +11.8% 4221.3: extended log --grep='int' 7.34(7.05+0.21) 8.08(7.76+0.25) +10.1% 4221.4: perl log --grep='int' 7.27(6.94+0.24) 7.05(6.76+0.25) -3.0% 4221.6: fixed log --grep='uncommon' 6.97(6.62+0.32) 7.86(7.51+0.30) +12.8% 4221.7: basic log --grep='uncommon' 7.05(6.69+0.29) 7.89(7.60+0.28) +11.9% 4221.8: extended log --grep='uncommon' 6.89(6.56+0.32) 7.99(7.66+0.24) +16.0% 4221.9: perl log --grep='uncommon' 7.02(6.66+0.33) 6.97(6.54+0.36) -0.7% 4221.11: fixed log --grep='æ' 7.37(7.03+0.33) 7.67(7.30+0.31) +4.1% 4221.12: basic log --grep='æ' 7.41(7.00+0.31) 7.60(7.28+0.26) +2.6% 4221.13: extended log --grep='æ' 7.35(6.96+0.38) 7.73(7.31+0.34) +5.2% 4221.14: perl log --grep='æ' 7.43(7.10+0.32) 6.95(6.61+0.27) -6.5% log with -i: Test origin/master HEAD ------------------------------------------------------------------------------------ 4221.1: fixed log -i --grep='int' 7.40(7.05+0.23) 8.66(8.38+0.20) +17.0% 4221.2: basic log -i --grep='int' 7.39(7.09+0.23) 8.67(8.39+0.20) +17.3% 4221.3: extended log -i --grep='int' 7.29(6.99+0.26) 8.69(8.31+0.26) +19.2% 4221.4: perl log -i --grep='int' 7.42(7.16+0.21) 7.14(6.80+0.24) -3.8% 4221.6: fixed log -i --grep='uncommon' 6.94(6.58+0.35) 8.43(8.04+0.30) +21.5% 4221.7: basic log -i --grep='uncommon' 6.95(6.62+0.31) 8.34(7.93+0.32) +20.0% 4221.8: extended log -i --grep='uncommon' 7.06(6.75+0.25) 8.32(7.98+0.31) +17.8% 4221.9: perl log -i --grep='uncommon' 6.96(6.69+0.26) 7.04(6.64+0.32) +1.1% 4221.11: fixed log -i --grep='æ' 7.92(7.55+0.33) 7.86(7.44+0.34) -0.8% 4221.12: basic log -i --grep='æ' 7.88(7.49+0.32) 7.84(7.46+0.34) -0.5% 4221.13: extended log -i --grep='æ' 7.91(7.51+0.32) 7.87(7.48+0.32) -0.5% 4221.14: perl log -i --grep='æ' 7.01(6.59+0.35) 6.99(6.64+0.28) -0.3% Some of those, as noted in [1] are because PCRE is faster at finding fixed strings. This looks bad for some engines, but in the next change we'll optimistically use PCRE v2 for all of these, so it'll look better. 1. https://public-inbox.org/git/87v9x793qi.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28grep: don't use PCRE2?_UTF8 with "log --encoding=<non-utf8>"Ævar Arnfjörð Bjarmason1-0/+1
Fix a bug introduced in 18547aacf5 ("grep/pcre: support utf-8", 2016-06-25) that was missed due to a blindspot in our tests, as discussed in the previous commit. I then blindly copied the same bug in 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) when adding the PCRE v2 code. We should not tell PCRE that we're processing UTF-8 just because we're dealing with non-ASCII. In the case of e.g. "log --encoding=<...>" under is_utf8_locale() the haystack might be in ISO-8859-1, and the needle might be in a non-UTF-8 encoding. Maybe we should be more strict here and die earlier? Should we also be converting the needle to the encoding in question, and failing if it's not a string that's valid in that encoding? Maybe. But for now matching this as non-UTF8 at least has some hope of producing sensible results, since we know that our default heuristic of assuming the text to be matched is in the user locale encoding isn't true when we've explicitly encoded it to be in a different encoding. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-05*.[ch]: remove extern from function declarations using spatchDenton Liu1-11/+11
There has been a push to remove extern from function declarations. Remove some instances of "extern" for function declarations which are caught by Coccinelle. Note that Coccinelle has some difficulty with processing functions with `__attribute__` or varargs so some `extern` declarations are left behind to be dealt with in a future patch. This was the Coccinelle patch used: @@ type T; identifier f; @@ - extern T f(...); and it was run with: $ git ls-files \*.{c,h} | grep -v ^compat/ | xargs spatch --sp-file contrib/coccinelle/noextern.cocci --in-place Files under `compat/` are intentionally excluded as some are directly copied from external sources and we should avoid churning them as much as possible. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-05grep: remove #ifdef NO_PTHREADSNguyễn Thái Ngọc Duy1-6/+0
This is a faithful conversion without attempting to improve anything. That comes later. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21userdiff.c: remove implicit dependency on the_indexNguyễn Thái Ngọc Duy1-1/+2
[jc: squashed in missing forward decl in userdiff.h found by Ramsay] Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21grep.c: remove implicit dependency on the_indexNguyễn Thái Ngọc Duy1-2/+5
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-02Merge branch 'tb/grep-only-matching'Junio C Hamano1-0/+1
"git grep" learned the "--only-matching" option. * tb/grep-only-matching: grep.c: teach 'git grep --only-matching' grep.c: extract show_line_header()
2018-07-18Merge branch 'tb/grep-column'Junio C Hamano1-0/+2
"git grep" learned the "--column" option that gives not just the line number but the column number of the hit. * tb/grep-column: contrib/git-jump/git-jump: jump to exact location grep.c: add configuration variables to show matched option builtin/grep.c: add '--column' option to 'git-grep(1)' grep.c: display column number of first match grep.[ch]: extend grep_opt to allow showing matched column grep.c: expose {,inverted} match column in match_line() Documentation/config.txt: camel-case lineNumber for consistency
2018-07-09grep.c: teach 'git grep --only-matching'Taylor Blau1-0/+1
Teach 'git grep --only-matching', a new option to only print the matching part(s) of a line. For instance, a line containing the following (taken from README.md:27): (`man gitcvs-migration` or `git help cvs-migration` if git is Is printed as follows: $ git grep --line-number --column --only-matching -e git -- \ README.md | grep ":27" README.md:27:7:git README.md:27:16:git README.md:27:38:git The patch works mostly as one would expect, with the exception of a few considerations that are worth mentioning here. Like GNU grep, this patch ignores --only-matching when --invert (-v) is given. There is a sensible answer here, but parity with the behavior of other tools is preferred. Because a line might contain more than one match, there are special considerations pertaining to when to print line headers, newlines, and how to increment the match column offset. The line header and newlines are handled as a special case within the main loop to avoid polluting the surrounding code with conditionals that have large blocks. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-22grep.[ch]: extend grep_opt to allow showing matched columnTaylor Blau1-0/+2
To support showing the matched column when calling 'git-grep(1)', teach 'grep_opt' the normal set of options to configure the default behavior and colorization of this feature. Now that we have opt->columnnum, use it to disable short-circuiting over ORs and ANDs so that col and icol are always filled with the earliest matches on each line. In addition, don't return the first match from match_line(), for the same reason. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-29grep: keep all colors in an arrayNguyễn Thái Ngọc Duy1-8/+13
This is more inline with how we handle color slots in other code. It also allows us to get the list of configurable color slots later. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-15Merge branch 'ab/pcre-v2'Junio C Hamano1-2/+3
Building with NO_LIBPCRE1_JIT did not disable it, which has been fixed. * ab/pcre-v2: grep: fix NO_LIBPCRE1_JIT to fully disable JIT
2017-11-13grep: fix NO_LIBPCRE1_JIT to fully disable JITCharles Bailey1-2/+3
If you have a pcre1 library which is compiled with JIT enabled then PCRE_STUDY_JIT_COMPILE will be defined whether or not the NO_LIBPCRE1_JIT configuration is set. This means that we enable JIT functionality when calling pcre_study even if NO_LIBPCRE1_JIT has been explicitly set and we just use plain pcre_exec later. Fix this by using own macro (GIT_PCRE_STUDY_JIT_COMPILE) which we set to PCRE_STUDY_JIT_COMPILE only if NO_LIBPCRE1_JIT is not set and define to 0 otherwise, as before. Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-02grep: recurse in-process using 'struct repository'Brandon Williams1-1/+0
Convert grep to use 'struct repository' which enables recursing into submodules to be handled in-process. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30grep: remove regflags from the public grep_opt APIÆvar Arnfjörð Bjarmason1-1/+0
Refactor calls to the grep machinery to always pass opt.ignore_case & opt.extended_regexp_option instead of setting the equivalent regflags bits. The bug fixed when making -i work with -P in commit 9e3cbc59d5 ("log: make --regexp-ignore-case work with --perl-regexp", 2017-05-20) was really just plastering over the code smell which this change fixes. The reason for adding the extensive commentary here is that I discovered some subtle complexity in implementing this that really should be called out explicitly to future readers. Before this change we'd rely on the difference between `extended_regexp_option` and `regflags` to serve as a membrane between our preliminary parsing of grep.extendedRegexp and grep.patternType, and what we decided to do internally. Now that those two are the same thing, it's necessary to unset `extended_regexp_option` just before we commit in cases where both of those config variables are set. See 84befcd0a4 ("grep: add a grep.patternType configuration setting", 2012-08-03) for the code and documentation related to that. The explanation of why the if/else branches in grep_commit_pattern_type() are ordered the way they are exists in that commit message, but I think it's worth calling this subtlety out explicitly with a comment for future readers. Even though grep_commit_pattern_type() is the only caller of grep_set_pattern_type_option() it's simpler to reset the extended_regexp_option flag in the latter, since 2/3 branches in the former would otherwise need to reset it, this way we can do it in one place. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-19Merge branch 'bw/object-id'Junio C Hamano1-1/+1
Conversion from uchar[20] to struct object_id continues. * bw/object-id: (33 commits) diff: rename diff_fill_sha1_info to diff_fill_oid_info diffcore-rename: use is_empty_blob_oid tree-diff: convert path_appendnew to object_id tree-diff: convert diff_tree_paths to struct object_id tree-diff: convert try_to_follow_renames to struct object_id builtin/diff-tree: cleanup references to sha1 diff-tree: convert diff_tree_sha1 to struct object_id notes-merge: convert write_note_to_worktree to struct object_id notes-merge: convert verify_notes_filepair to struct object_id notes-merge: convert find_notes_merge_pair_ps to struct object_id notes-merge: convert merge_from_diffs to struct object_id notes-merge: convert notes_merge* to struct object_id tree-diff: convert diff_root_tree_sha1 to struct object_id combine-diff: convert find_paths_* to struct object_id combine-diff: convert diff_tree_combined to struct object_id diff: convert diff_flush_patch_id to struct object_id patch-ids: convert to struct object_id diff: finish conversion for prepare_temp_file to struct object_id diff: convert reuse_worktree_file to struct object_id diff: convert fill_filespec to struct object_id ...
2017-06-02grep: convert to struct object_idBrandon Williams1-1/+1
Convert the remaining parts of grep to use struct object_id. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-02grep: add support for PCRE v2Ævar Arnfjörð Bjarmason1-0/+17
Add support for v2 of the PCRE API. This is a new major version of PCRE that came out in early 2015[1]. The regular expression syntax is the same, but while the API is similar, pretty much every function is either renamed or takes different arguments. Thus using it via entirely new functions makes sense, as opposed to trying to e.g. have one compile_pcre_pattern() that would call either PCRE v1 or v2 functions. Git can now be compiled with either USE_LIBPCRE1=YesPlease or USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a synonym for the former. Providing both is a compile-time error. With earlier patches to enable JIT for PCRE v1 the performance of the release versions of both libraries is almost exactly the same, with PCRE v2 being around 1% slower. However after I reported this to the pcre-dev mailing list[2] I got a lot of help with the API use from Zoltán Herczeg, he subsequently optimized some of the JIT functionality in v2 of the library. Running the p7820-grep-engines.sh performance test against the latest Subversion trunk of both, with both them and git compiled as -O3, and the test run against linux.git, gives the following results. Just the /perl/ tests shown: $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~5 HEAD~ HEAD p7820-grep-engines.sh [...] Test HEAD~5 HEAD~ HEAD ----------------------------------------------------------------------------------------------------------------- 7820.3: perl grep 'how.to' 0.31(1.10+0.48) 0.21(0.35+0.56) -32.3% 0.21(0.34+0.55) -32.3% 7820.7: perl grep '^how to' 0.56(2.70+0.40) 0.24(0.64+0.52) -57.1% 0.20(0.28+0.60) -64.3% 7820.11: perl grep '[how] to' 0.56(2.66+0.38) 0.29(0.95+0.45) -48.2% 0.23(0.45+0.54) -58.9% 7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 1.02(5.77+0.42) 0.31(1.02+0.54) -69.6% 0.23(0.50+0.54) -77.5% 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.38(1.57+0.42) 0.27(0.85+0.46) -28.9% 0.21(0.33+0.57) -44.7% See commit ("perf: add a comparison test of grep regex engines", 2017-04-19) for details on the machine the above test run was executed on. Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine mentioning p7820-grep-engines.sh for more details on the test setup. For ease of readability, a different run just of HEAD~ (PCRE v1 with JIT v.s. PCRE v2), again with just the /perl/ tests shown: [...] Test HEAD~ HEAD ---------------------------------------------------------------------------------------- 7820.3: perl grep 'how.to' 0.21(0.42+0.52) 0.21(0.31+0.58) +0.0% 7820.7: perl grep '^how to' 0.25(0.65+0.50) 0.20(0.31+0.57) -20.0% 7820.11: perl grep '[how] to' 0.30(0.90+0.50) 0.23(0.46+0.53) -23.3% 7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 0.30(1.19+0.38) 0.23(0.51+0.51) -23.3% 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.27(0.84+0.48) 0.21(0.34+0.57) -22.2% I.e. the two are either neck-to-neck, but PCRE v2 usually pulls ahead, when it does it's around 20% faster. A brief note on thread safety: As noted in pcre2api(3) & pcre2jit(3) the compiled pattern can be shared between threads, but not some of the JIT context, however the grep threading support does all pattern & JIT compilation in separate threads, so this code doesn't need to concern itself with thread safety. See commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) for the initial addition of PCRE v1. This change follows some of the same patterns it did (and which were discussed on list at the time), e.g. mocking up types with typedef instead of ifdef-ing them out when USE_LIBPCRE2 isn't defined. This adds some trivial memory use to the program, but makes the code look nicer. 1. https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html 2. https://lists.exim.org/lurker/thread/20170419.172322.833ee099.en.html Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-02grep: un-break building with PCRE >= 8.32 without --enable-jitÆvar Arnfjörð Bjarmason1-0/+2
Amend my change earlier in this series ("grep: add support for the PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1 versions later than 8.31 compiled without --enable-jit. As explained in that change and a later compatibility change in this series ("grep: un-break building with PCRE < 8.32", 2017-05-10) the pcre_jit_exec() function is a faster path to execute the JIT. Unfortunately there's no compatibility stub for that function compiled into the library if pcre_config(PCRE_CONFIG_JIT, &ret) would return 0, and no macro that can be used to check for it, so the only portable option to support builds without --enable-jit is via a new NO_LIBPCRE1_JIT=UnfortunatelyYes Makefile option[1]. Another option would be to make the JIT opt-in via USE_LIBPCRE1_JIT=YesPlease, after all it's not a default option of PCRE v1. I think it makes more sense to make it opt-out since even though it's not a default option, most packagers of PCRE seem to turn it on by default, with the notable exception of the MinGW package. Make the MinGW platform work by default by changing the build defaults to turn on NO_LIBPCRE1_JIT=UnfortunatelyYes. It is the only platform that turns on USE_LIBPCRE=YesPlease by default, see commit df5218b4c3 ("config.mak.uname: support MSys2", 2016-01-13) for that change. 1. "How do I support pcre1 JIT on all versions?" (https://lists.exim.org/lurker/thread/20170601.103148.10253788.en.html) 2. https://github.com/Alexpux/MINGW-packages/blob/master/mingw-w64-pcre/PKGBUILD (referenced from "Re: PCRE v2 compile error, was Re: What's cooking in git.git (May 2017, #01; Mon, 1)"; <alpine.DEB.2.20.1705021756530.3480@virtualbox>) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-26grep: un-break building with PCRE < 8.20Ævar Arnfjörð Bjarmason1-0/+3
Amend my change earlier in this series ("grep: add support for the PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1 versions earlier than 8.20. The 8.20 release was the first release to have JIT & pcre_jit_stack in the headers, so a mock type needs to be provided for it on those releases. Now git should compile with all PCRE versions that it supported before my JIT change. I've tested it as far back as version 7.5 released on 2008-01-10, once I got down to version 7.0 it wouldn't build anymore with GCC 7.1.1, and I couldn't be bothered to anything older than 7.5 as I'm confident that if the build breaks on those older versions it's not because of my JIT change. See the "un-break" change in this series ("grep: un-break building with PCRE < 8.32", 2017-05-10) for why this isn't squashed into the main PCRE JIT commit. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-26grep: un-break building with PCRE < 8.32Ævar Arnfjörð Bjarmason1-0/+5
Amend my change earlier in this series ("grep: add support for the PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1 versions earlier than 8.32. The JIT support was added in version 8.20 released on 2011-10-21, but it wasn't until 8.32 released on 2012-11-30 that the fast code path to use the JIT via pcre_jit_exec() was added[1] (see also [2]). This means that versions 8.20 through 8.31 could still use the JIT, but supporting it on those versions would add to the already verbose macro soup around JIT support it, and I don't expect that the use-case of compiling a brand new git against a 5 year old PCRE is particularly common, and if someone does that they can just get the existing pre-JIT slow codepath. So just take the easy way out and disable the JIT on any version older than 8.32. The reason this change isn't part of the initial change PCRE JIT support is to have a cleaner history showing which parts of the implementation are only used for ancient PCRE versions. This also makes it easier to revert this change if we ever decide to stop supporting those old versions. 1. http://www.pcre.org/original/changelog.txt ("28. Introducing a native interface for JIT. Through this interface, the compiled[...]") 2. https://bugs.exim.org/show_bug.cgi?id=2121 Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-26grep: add support for the PCRE v1 JIT APIÆvar Arnfjörð Bjarmason1-0/+6
Change the grep PCRE v1 code to use JIT when available. When PCRE support was initially added in commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) PCRE had no JIT support, it was integrated into 8.20 released on 2011-10-21. Enabling JIT support usually improves performance by more than 40%. The pattern compilation times are relatively slower, but those relative numbers are tiny, and are easily made back in all but the most trivial cases of grep. Detailed benchmarks & overview of compilation times is at: http://sljit.sourceforge.net/pcre.html With this change the difference in a t/perf/p7820-grep-engines.sh run is, with just the /perl/ tests shown: $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~ HEAD p7820-grep-engines.sh Test HEAD~ HEAD --------------------------------------------------------------------------------------- 7820.3: perl grep 'how.to' 0.35(1.11+0.43) 0.23(0.42+0.46) -34.3% 7820.7: perl grep '^how to' 0.64(2.71+0.36) 0.27(0.66+0.44) -57.8% 7820.11: perl grep '[how] to' 0.63(2.51+0.42) 0.33(0.98+0.39) -47.6% 7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 1.17(5.61+0.35) 0.34(1.08+0.46) -70.9% 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.43(1.52+0.44) 0.30(0.88+0.42) -30.2% The conditional support for JIT is implemented as suggested in the pcrejit(3) man page. E.g. defining PCRE_STUDY_JIT_COMPILE to 0 if it's not present. The implementation is relatively verbose because even if PCRE_CONFIG_JIT is defined only a call to pcre_config() can determine if the JIT is available, and if so the faster pcre_jit_exec() function should be called instead of pcre_exec(), and a different (but not complimentary!) function needs to be called to free pcre1_extra_info. There's no graceful fallback if pcre_jit_stack_alloc() fails under PCRE_CONFIG_JIT, instead the program will simply abort. I don't think this is worth handling gracefully, it'll only fail in cases where malloc() doesn't work, in which case we're screwed anyway. That there's no assignment of `p->pcre1_jit_on = 0` when PCRE_CONFIG_JIT isn't defined isn't a bug. The create_grep_pat() function allocates the grep_pat allocates it with calloc(), so it's guaranteed to be 0 when PCRE_CONFIG_JIT isn't defined. I you're bisecting and find this change, check that your PCRE isn't older than 8.32. This change intentionally broke really old versions of PCRE, but that's fixed in follow-up commits. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-26grep: change internal *pcre* variable & function names to be *pcre1*Ævar Arnfjörð Bjarmason1-4/+4
Change the internal PCRE variable & function names to have a "1" suffix. This is for preparation for libpcre2 support, where having non-versioned names would be confusing. An earlier change in this series ("grep: change the internal PCRE macro names to be PCRE1", 2017-04-07) elaborates on the motivations behind this change. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-26grep: change the internal PCRE macro names to be PCRE1Ævar Arnfjörð Bjarmason1-1/+1
Change the internal USE_LIBPCRE define, & build options flag to use a naming convention ending in PCRE1, without changing the long-standing USE_LIBPCRE Makefile flag which enables this code. This is for preparation for libpcre2 support where having things like USE_LIBPCRE and USE_LIBPCRE2 in any more places than we absolutely need to for backwards compatibility with old Makefile arguments would be confusing. In some ways it would be better to change everything that now uses USE_LIBPCRE to use USE_LIBPCRE1, and to make specifying USE_LIBPCRE (or --with-pcre) an error. This would impose a one-time burden on packagers of git to s/USE_LIBPCRE/USE_LIBPCRE1/ in their build scripts. However I'd like to leave the door open to making USE_LIBPCRE=YesPlease eventually mean USE_LIBPCRE2=YesPlease, i.e. once PCRE v2 is ubiquitous enough that it makes sense to make it the default. This code and the USE_LIBPCRE Makefile argument was added in commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09). At the time there was no indication that the PCRE project would release an entirely new & incompatible API around 3 years later. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-22grep: add submodules as a grep source typeBrandon Williams1-0/+1
Add `GREP_SOURCE_SUBMODULE` as a grep_source type and cases for this new type in the various switch statements in grep.c. When initializing a grep_source with type `GREP_SOURCE_SUBMODULE` the identifier can either be NULL (to indicate that the working tree will be used) or a SHA1 (the REV of the submodule to be grep'd). If the identifier is a SHA1 then we want to fall through to the `GREP_SOURCE_SHA1` case to handle the copying of the SHA1. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-04Merge branch 'jc/grep-commandline-vs-configuration'Junio C Hamano1-1/+0
"git -c grep.patternType=extended log --basic-regexp" misbehaved because the internal API to access the grep machinery was not designed well. * jc/grep-commandline-vs-configuration: grep: further simplify setting the pattern type
2016-07-25grep: further simplify setting the pattern typeJunio C Hamano1-1/+0
When c5c31d33 (grep: move pattern-type bits support to top-level grep.[ch], 2012-10-03) introduced grep_commit_pattern_type() helper function, the intention was to allow the users of grep API to having to fiddle only with .pattern_type_option (which can be set to "fixed", "basic", "extended", and "pcre"), and then immediately before compiling the pattern strings for use, call grep_commit_pattern_type() to have it prepare various bits in the grep_opt structure (like .fixed, .regflags, etc.). However, grep_set_pattern_type_option() helper function the grep API internally uses were left as an external function by mistake. This function shouldn't have been made callable by the users of the API. Later when the grep API was used in revision traversal machinery, the caller then mistakenly started calling the function around 34a4ae55 (log --grep: use the same helper to set -E/-F options as "git grep", 2012-10-03), instead of setting the .pattern_type_option field and letting the grep_commit_pattern_type() to take care of the details. This caused an unnecessary bug that made a configured grep.patternType take precedence over the command line options (e.g. --basic-regexp, --fixed-strings) in "git log" family of commands. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01grep/pcre: prepare locale-dependent tables for icase matchingNguyễn Thái Ngọc Duy1-0/+1
The default tables are usually built with C locale and only suitable for LANG=C or similar. This should make case insensitive search work correctly for all single-byte charsets. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-28grep: add color.grep.matchcontext and color.grep.matchselectedRené Scharfe1-1/+2
The config option color.grep.match can be used to specify the highlighting color for matching strings. Add the options matchContext and matchSelected to allow different colors to be specified for matching strings in the context vs. in selected lines. This is similar to the ms and mc specifiers in GNU grep's environment variable GREP_COLORS. Tests are from Zoltan Klinger's earlier attempt to solve the same issue in a different way. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-10grep: allow to use textconv filtersJeff King1-0/+1
Recently and not so recently, we made sure that log/grep type operations use textconv filters when a userfacing diff would do the same: ef90ab6 (pickaxe: use textconv for -S counting, 2012-10-28) b1c2f57 (diff_grep: use textconv buffers for add/deleted files, 2012-10-28) 0508fe5 (combine-diff: respect textconv attributes, 2011-05-23) "git grep" currently does not use textconv filters at all, that is neither for displaying the match and context nor for the actual grepping, even when requested by --textconv. Introduce an option "--textconv" which makes git grep use any configured textconv filters for grepping and output purposes. It is off by default. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-25fix clang -Wtautological-compare with unsigned enumAntoine Pelisse1-1/+2
Create a GREP_HEADER_FIELD_MIN so we can check that the field value is sane and silence the clang warning. Clang warning happens because the enum is unsigned (this is implementation-defined, and there is no negative fields) and the check is then tautological. Signed-off-by: Antoine Pelisse <apelisse@gmail.com> Signed-off-by: John Keeping <john@keeping.me.uk> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-29Merge branch 'nd/grep-true-path'Jeff King1-1/+3
"git grep -e pattern <tree>" asked the attribute system to read "<tree>:.gitattributes" file in the working tree, which was nonsense. * nd/grep-true-path: grep: stop looking at random places for .gitattributes
2012-10-12grep: stop looking at random places for .gitattributesNguyễn Thái Ngọc Duy1-1/+3
grep searches for .gitattributes using "name" field in struct grep_source but that field is not real on-disk path name. For example, "grep pattern rev" fills the field with "rev:path", and Git looks for .gitattributes in the (non-existent but exploitable) path "rev:path" instead of "path". This patch passes real paths down to grep_source_load_driver() when: - grep on work tree - grep on the index - grep a commit (or a tag if it points to a commit) so that these cases look up .gitattributes at proper paths. .gitattributes lookup is disabled in all other cases. Initial-work-by: Jeff King <peff@peff.net> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-09grep: move pattern-type bits support to top-level grep.[ch]Junio C Hamano1-0/+2
Switching between -E/-G/-P/-F correctly needs a lot more than just flipping opt->regflags bit these days, and we have a nice helper function buried in builtin/grep.c for the sole use of "git grep". Extract it so that "log --grep" family can also use it. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-09grep: move the configuration parsing logic to grep.[ch]Junio C Hamano1-0/+4
The configuration handling is a library-ish part of this program, that is not specific to "git grep" command. It should be reusable by "log" and others. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-29log --grep-reflog: reject the option without -gJunio C Hamano1-0/+1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-29revision: add --grep-reflog to filter commits by reflog messagesNguyễn Thái Ngọc Duy1-0/+1
Similar to --author/--committer which filters commits by author and committer header fields. --grep-reflog adds a fake "reflog" header to commit and a grep filter to search on that line. All rules to --author/--committer apply except no timestamp stripping. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-29grep: prepare for new header field filterNguyễn Thái Ngọc Duy1-2/+4
grep supports only author and committer headers, which have the same special treatment that later headers may or may not have. Check for field type and only strip_timestamp() when the field is either author or committer. GREP_HEADER_FIELD_MAX is put in the grep_header_field enum to be calculated automatically, correctly, as long as it's at the end of the enum. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-18Merge branch 'jc/maint-log-grep-all-match'Junio C Hamano1-2/+2
Fix a long-standing bug in "git log --grep" when multiple "--grep" are used together with "--all-match" and "--author" or "--committer". * jc/maint-log-grep-all-match: t7810-grep: test --all-match with multiple --grep and --author options t7810-grep: test interaction of multiple --grep and --author options t7810-grep: test multiple --author with --all-match t7810-grep: test multiple --grep with and without --all-match t7810-grep: bring log --grep tests in common form grep.c: mark private file-scope symbols as static log: document use of multiple commit limiting options log --grep/--author: honor --all-match honored for multiple --grep patterns grep: show --debug output only once grep: teach --debug option to dump the parse tree
2012-09-15grep.c: mark private file-scope symbols as staticJunio C Hamano1-2/+1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-14grep: teach --debug option to dump the parse treeJunio C Hamano1-0/+1
Our "grep" allows complex boolean expressions to be formed to match each individual line with operators like --and, '(', ')' and --not. Introduce the "--debug" option to show the parse tree to help people who want to debug and enhance it. Also "log" learns "--grep-debug" option to do the same. The command line parser to the log family is a lot more limited than the general "git grep" parser, but it has special handling for header matching (e.g. "--author"), and a parse tree is valuable when working on it. Note that "--all-match" is *not* any individual node in the parse tree. It is an instruction to the evaluator to check all the nodes in the top-level backbone have matched and reject a document as non-matching otherwise. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-03grep: add a grep.patternType configuration settingJ Smith1-0/+10
The grep.extendedRegexp configuration setting enables the -E flag on grep by default but there are no equivalents for the -G, -F and -P flags. Rather than adding an additional setting for grep.fooRegexp for current and future pattern matching options, add a grep.patternType setting that can accept appropriate values for modifying the default grep pattern matching behavior. The current values are "basic", "extended", "fixed", "perl" and "default" for setting -G, -E, -F, -P and the default behavior respectively. When grep.patternType is set to a value other than "default", the grep.extendedRegexp setting is ignored. The value of "default" restores the current default behavior, including the grep.extendedRegexp behavior. Signed-off-by: J Smith <dark.panda@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-06-01Merge branch 'rs/maint-grep-F' into maintJunio C Hamano1-1/+1
"git grep -e '$pattern'", unlike the case where the patterns are read from a file, did not treat individual lines in the given pattern argument as separate regular expressions as it should. By René Scharfe * rs/maint-grep-F: grep: stop leaking line strings with -f grep: support newline separated pattern list grep: factor out do_append_grep_pat() grep: factor out create_grep_pat()
2012-05-25Merge branch 'rs/maint-grep-F'Junio C Hamano1-1/+1
"git grep -e '$pattern'", unlike the case where the patterns are read from a file, did not treat individual lines in the given pattern argument as separate regular expressions as it should.
2012-05-20grep: support newline separated pattern listRené Scharfe1-1/+1
Currently, patterns that contain newline characters don't match anything when given to git grep. Regular grep(1) interprets patterns as lists of newline separated search strings instead. Implement this functionality by creating and inserting extra grep_pat structures for patterns consisting of multiple lines when appending to the pattern lists. For simplicity, all pattern strings are duplicated. The original pattern is truncated in place to make it contain only the first line. Requested-by: Torne (Richard Coles) <torne@google.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02grep: respect diff attributes for binary-nessJeff King1-0/+1
There is currently no way for users to tell git-grep that a particular path is or is not a binary file; instead, grep always relies on its auto-detection (or the user specifying "-a" to treat all binary-looking files like text). This patch teaches git-grep to use the same attribute lookup that is used by git-diff. We could add a new "grep" flag, but that is unnecessarily complex and unlikely to be useful. Despite the name, the "-diff" attribute (or "diff=foo" and the associated diff.foo.binary config option) are really about describing the contents of the path. It's simply historical that diff was the only thing that cared about these attributes in the past. And if this simple approach turns out to be insufficient, we still have a backwards-compatible path forward: we can add a separate "grep" attribute, and fall back to respecting "diff" if it is unset. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02grep: cache userdiff_driver in grep_sourceJeff King1-0/+4
Right now, grep only uses the userdiff_driver for one thing: looking up funcname patterns for "-p" and "-W". As new uses for userdiff drivers are added to the grep code, we want to minimize attribute lookups, which can be expensive. It might seem at first that this would also optimize multiple lookups when the funcname pattern for a file is needed multiple times. However, the compiled funcname pattern is already cached in struct grep_opt's "priv" member, so multiple lookups are already suppressed. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02grep: drop grep_buffer's "name" parameterJeff King1-1/+1
Before the grep_source interface existed, grep_buffer was used by two types of callers: 1. Ones which pulled a file into a buffer, and then wanted to supply the file's name for the output (i.e., git grep). 2. Ones which really just wanted to grep a buffer (i.e., git log --grep). Callers in set (1) should now be using grep_source. Callers in set (2) always pass NULL for the "name" parameter of grep_buffer. We can therefore get rid of this now-useless parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02grep: refactor the concept of "grep source" into an objectJeff King1-0/+22
The main interface to the low-level grep code is grep_buffer, which takes a pointer to a buffer and a size. This is convenient and flexible (we use it to grep commit bodies, files on disk, and blobs by sha1), but it makes it hard to pass extra information about what we are grepping (either for correctness, like overriding binary auto-detection, or for optimizations, like lazily loading blob contents). Instead, let's encapsulate the idea of a "grep source", including the buffer, its size, and where the data is coming from. This is similar to the diff_filespec structure used by the diff code (unsurprising, since future patches will implement some of the same optimizations found there). The diffstat is slightly scarier than the actual patch content. Most of the modified lines are simply replacing access to raw variables with their counterparts that are now in a "struct grep_source". Most of the added lines were taken from builtin/grep.c, which partially abstracted the idea of grep sources (for file vs sha1 sources). Instead of dropping the now-redundant code, this patch leaves builtin/grep.c using the traditional grep_buffer interface (which now wraps the grep_source interface). That makes it easy to test that there is no change of behavior (yet). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02grep: move sha1-reading mutex into low-level codeJeff King1-0/+17
The multi-threaded git-grep code needs to serialize access to the thread-unsafe read_sha1_file call. It does this with a mutex that is local to builtin/grep.c. Let's instead push this down into grep.c, where it can be used by both builtin/grep.c and grep.c. This will let us safely teach the low-level grep.c code tricks that involve reading from the object db. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02grep: make locking flag globalJeff King1-1/+1
The low-level grep code traditionally didn't care about threading, as it doesn't do any threading itself and didn't call out to other non-thread-safe code. That changed with 0579f91 (grep: enable threading with -p and -W using lazy attribute lookup, 2011-12-12), which pushed the lookup of funcname attributes (which is not thread-safe) into the low-level grep code. As a result, the low-level code learned about a new global "grep_attr_mutex" to serialize access to the attribute code. A multi-threaded caller (e.g., builtin/grep.c) is expected to initialize the mutex and set "use_threads" in the grep_opt structure. The low-level code only uses the lock if use_threads is set. However, putting the use_threads flag into the grep_opt struct is not the most logical place. Whether threading is in use is not something that matters for each call to grep_buffer, but is instead global to the whole program (i.e., if any thread is doing multi-threaded grep, every other thread, even if it thinks it is doing its own single-threaded grep, would need to use the locking). In practice, this distinction isn't a problem for us, because the only user of multi-threaded grep is "git-grep", which does nothing except call grep. This patch turns the opt->use_threads flag into a global flag. More important than the nit-picking semantic argument above is that this means that the locking functions don't need to actually have access to a grep_opt to know whether to lock. Which in turn can make adding new locks simpler, as we don't need to pass around a grep_opt. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-16grep: enable threading with -p and -W using lazy attribute lookupThomas Rast1-0/+10
Lazily load the userdiff attributes in match_funcname(). Use a separate mutex around this loading to protect the (not thread-safe) attributes machinery. This lets us re-enable threading with -p and -W while reducing the overhead caused by looking up attributes. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-20Use kwset in grepFredrik Kuivinen1-0/+2
Benchmarks for the hot cache case: before: $ perf stat --repeat=5 git grep qwerty > /dev/null Performance counter stats for 'git grep qwerty' (5 runs): 3,478,085 cache-misses # 2.322 M/sec ( +- 2.690% ) 11,356,177 cache-references # 7.582 M/sec ( +- 2.598% ) 3,872,184 branch-misses # 0.363 % ( +- 0.258% ) 1,067,367,848 branches # 712.673 M/sec ( +- 2.622% ) 3,828,370,782 instructions # 0.947 IPC ( +- 0.033% ) 4,043,832,831 cycles # 2700.037 M/sec ( +- 0.167% ) 8,518 page-faults # 0.006 M/sec ( +- 3.648% ) 847 CPU-migrations # 0.001 M/sec ( +- 3.262% ) 6,546 context-switches # 0.004 M/sec ( +- 2.292% ) 1497.695495 task-clock-msecs # 3.303 CPUs ( +- 2.550% ) 0.453394396 seconds time elapsed ( +- 0.912% ) after: $ perf stat --repeat=5 git grep qwerty > /dev/null Performance counter stats for 'git grep qwerty' (5 runs): 2,989,918 cache-misses # 3.166 M/sec ( +- 5.013% ) 10,986,041 cache-references # 11.633 M/sec ( +- 4.899% ) (scaled from 95.06%) 3,511,993 branch-misses # 1.422 % ( +- 0.785% ) 246,893,561 branches # 261.433 M/sec ( +- 3.967% ) 1,392,727,757 instructions # 0.564 IPC ( +- 0.040% ) 2,468,142,397 cycles # 2613.494 M/sec ( +- 0.110% ) 7,747 page-faults # 0.008 M/sec ( +- 3.995% ) 897 CPU-migrations # 0.001 M/sec ( +- 2.383% ) 6,535 context-switches # 0.007 M/sec ( +- 1.993% ) 944.384228 task-clock-msecs # 3.177 CPUs ( +- 0.268% ) 0.297257643 seconds time elapsed ( +- 0.450% ) So we gain about 35% by using the kwset code. As a side effect of using kwset two grep tests are fixed by this patch. The first is fixed because kwset can deal with case-insensitive search containing NULs, something strcasestr cannot do. The second one is fixed because we consider patterns containing NULs as fixed strings (regcomp cannot accept patterns with NULs). Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-01grep: add option to show whole function as contextRené Scharfe1-0/+1
Add a new option, -W, to show the whole surrounding function of a match. It uses the same regular expressions as -p and diff to find the beginning of sections. Currently it will not display comments in front of a function, but those that are following one. Despite this shortcoming it is already useful, e.g. to simply see a more complete applicable context or to extract whole functions. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-05grep: add --headingRené Scharfe1-0/+1
With --heading, the filename is printed once before matches from that file instead of at the start of each line, giving more screen space to the actual search results. This option is taken from ack (http://betterthangrep.com/). And now git grep can dress up like it: $ git config alias.ack "grep --break --heading --line-number" $ git ack -e --heading Documentation/git-grep.txt 154:--heading:: t/t7810-grep.sh 785:test_expect_success 'grep --heading' ' 786: git grep --heading -e char -e lo_w hello.c hello_world >actual && 808: git grep --break --heading -n --color \ Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-05grep: add --breakRené Scharfe1-0/+1
With --break, an empty line is printed between matches from different files, increasing readability. This option is taken from ack (http://betterthangrep.com/). Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09git-grep: Learn PCREMichał Kiedrowicz1-0/+9
This patch teaches git-grep the --perl-regexp/-P options (naming borrowed from GNU grep) in order to allow specifying PCRE regexes on the command line. PCRE has a number of features which make them more handy to use than POSIX regexes, like consistent escaping rules, extended character classes, ungreedy matching etc. git isn't build with PCRE support automatically. USE_LIBPCRE environment variable must be enabled (like `make USE_LIBPCRE=YesPlease`). Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-13log --author: take union of multiple "author" requestsJunio C Hamano1-0/+2
In the olden days, log --author=me --committer=him --grep=this --grep=that used to be turned into: (OR (HEADER-AUTHOR me) (HEADER-COMMITTER him) (PATTERN this) (PATTERN that)) showing my patches that do not have any "this" nor "that", which was totally useless. 80235ba ("log --author=me --grep=it" should find intersection, not union, 2010-01-17) improved it greatly to turn the same into: (ALL-MATCH (HEADER-AUTHOR me) (HEADER-COMMITTER him) (OR (PATTERN this) (PATTERN that))) That is, "show only patches by me and committed by him, that have either this or that", which is a lot more natural thing to ask. We however need to be a bit more clever when the user asks more than one "author" (or "committer"); because a commit has only one author (and one committer), they ought to be interpreted as asking for union to be useful. The current implementation simply added another author/committer pattern at the same top-level for ALL-MATCH to insist on matching all, finding nothing. Turn log --author=me --author=her \ --committer=him --committer=you \ --grep=this --grep=that into (ALL-MATCH (OR (HEADER-AUTHOR me) (HEADER-AUTHOR her)) (OR (HEADER-COMMITTER him) (HEADER-COMMITTER you)) (OR (PATTERN this) (PATTERN that))) instead. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-06-21Merge branch 'gv/portable'Junio C Hamano1-4/+4
* gv/portable: test-lib: use DIFF definition from GIT-BUILD-OPTIONS build: propagate $DIFF to scripts Makefile: Tru64 portability fix Makefile: HP-UX 10.20 portability fixes Makefile: HPUX11 portability fixes Makefile: SunOS 5.6 portability fix inline declaration does not work on AIX Allow disabling "inline" Some platforms lack socklen_t type Make NO_{INET_NTOP,INET_PTON} configured independently Makefile: some platforms do not have hstrerror anywhere git-compat-util.h: some platforms with mmap() lack MAP_FAILED definition test_cmp: do not use "diff -u" on platforms that lack one fixup: do not unconditionally disable "diff -u" tests: use "test_cmp", not "diff", when verifying the result Do not use "diff" found on PATH while building and installing enums: omit trailing comma for portability Makefile: -lpthread may still be necessary when libc has only pthread stubs Rewrite dynamic structure initializations to runtime assignment Makefile: pass CPPFLAGS through to fllow customization Conflicts: Makefile wt-status.h
2010-05-31enums: omit trailing comma for portabilityGary V. Vaughan1-4/+4
Without this patch at least IBM VisualAge C 5.0 (I have 5.0.2) on AIX 5.1 fails to compile git. enum style is inconsistent already, with some enums declared on one line, some over 3 lines with the enum values all on the middle line, sometimes with 1 enum value per line... and independently of that the trailing comma is sometimes present and other times absent, often mixing with/without trailing comma styles in a single file, and sometimes in consecutive enum declarations. Clearly, omitting the comma is the more portable style, and this patch changes all enum declarations to use the portable omitted dangling comma style consistently. Signed-off-by: Gary V. Vaughan <gary@thewrittenword.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24grep: support NUL chars in search strings for -FRené Scharfe1-0/+2
Search patterns in a file specified with -f can contain NUL characters. The current code ignores all characters on a line after a NUL. Pass the actual length of the line all the way from the pattern file to fixmatch() and use it for case-sensitive fixed string matching. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-20Merge branch 'ml/color-grep'Junio C Hamano1-0/+6
* ml/color-grep: grep: Colorize selected, context, and function lines grep: Colorize filename, line number, and separator Add GIT_COLOR_BOLD_* and GIT_COLOR_BG_*
2010-03-08grep: Colorize selected, context, and function linesMark Lodato1-0/+3
Colorize non-matching text of selected lines, context lines, and function name lines. The default for all three is no color, but they can be configured using color.grep.<slot>. The first two are similar to the corresponding options in GNU grep, except that GNU grep applies the color to the entire line, not just non-matching text. Signed-off-by: Mark Lodato <lodatom@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-08grep: Colorize filename, line number, and separatorMark Lodato1-0/+3
Colorize the filename, line number, and separator in git grep output, as GNU grep does. The colors are customizable through color.grep.<slot>. The default is to only color the separator (in cyan), since this gives the biggest legibility increase without overwhelming the user with colors. GNU grep also defaults cyan for the separator, but defaults to magenta for the filename and to green for the line number, as well. There is one difference from GNU grep: When a binary file matches without -a, GNU grep does not color the <file> in "Binary file <file> matches", but we do. Like GNU grep, if --null is given, the null separators are not colored. For config.txt, use a a sub-list to describe the slots, rather than a single paragraph with parentheses, since this is much more readable. Remove the cast to int for `rm_eo - rm_so` since it is not necessary. Signed-off-by: Mark Lodato <lodatom@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-02Merge branch 'jc/grep-author-all-match-implicit'Junio C Hamano1-0/+2
* jc/grep-author-all-match-implicit: "log --author=me --grep=it" should find intersection, not union
2010-01-26Threaded grepFredrik Kuivinen1-0/+6
Make git grep use threads when it is available. The results below are best of five runs in the Linux repository (on a box with two cores). With the patch: git grep qwerty 1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5774minor)pagefaults 0swaps Without: git grep qwerty 1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3716minor)pagefaults 0swaps And with a pattern with quite a few matches: With the patch: $ /usr/bin/time git grep void 5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5587minor)pagefaults 0swaps Without: $ /usr/bin/time git grep void 5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3693minor)pagefaults 0swaps In either case we gain about 40% by the threading. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-25"log --author=me --grep=it" should find intersection, not unionJunio C Hamano1-0/+2
Historically, any grep filter in "git log" family of commands were taken as restricting to commits with any of the words in the commit log message. However, the user almost always want to find commits "done by this person on that topic". With "--all-match" option, a series of grep patterns can be turned into a requirement that all of them must produce a match, but that makes it impossible to ask for "done by me, on either this or that" with: log --author=me --committer=him --grep=this --grep=that because it will require both "this" and "that" to appear. Change the "header" parser of grep library to treat the headers specially, and parse it as: (all-match-OR (HEADER-AUTHOR me) (HEADER-COMMITTER him) (OR (PATTERN this) (PATTERN that) ) ) Even though the "log" command line parser doesn't give direct access to the extended grep syntax to group terms with parentheses, this change will cover the majority of the case the users would want. This incidentally revealed that one test in t7002 was bogus. It ran: log --author=Thor --grep=Thu --format='%s' and expected (wrongly) "Thu" to match "Thursday" in the author/committer date, but that would never match, as the timestamp in raw commit buffer does not have the name of the day-of-the-week. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-13grep: rip out support for external grepJunio C Hamano1-1/+0
We still allow people to pass --[no-]ext-grep on the command line, but the option is ignored. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-16grep: Allow case insensitive search of fixed-stringsBrian Collins1-0/+2
"git grep" currently an error when you combine the -F and -i flags. This isn't in line with how GNU grep handles it. This patch allows the simultaneous use of those flags. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Brian Collins <bricollins@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-09-13Merge branch 'maint'Junio C Hamano1-0/+1
* maint: GIT 1.6.4.3 svn: properly escape arguments for authors-prog http.c: remove verification of remote packs grep: accept relative paths outside current working directory grep: fix exit status if external_grep() punts Conflicts: GIT-VERSION-GEN RelNotes
2009-09-13Merge branch 'cb/maint-1.6.3-grep-relative-up' into maintJunio C Hamano1-0/+1
* cb/maint-1.6.3-grep-relative-up: grep: accept relative paths outside current working directory grep: fix exit status if external_grep() punts Conflicts: t/t7002-grep.sh
2009-09-07grep: accept relative paths outside current working directoryClemens Buchacher1-0/+1
"git grep" would barf at relative paths pointing outside the current working directory (or subdirectories thereof). Use quote_path_relative(), which can handle such cases just fine. [jc: added tests.] Signed-off-by: Clemens Buchacher <drizzd@aon.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-22grep: Add --max-depth option.Michał Kiedrowicz1-0/+1
It is useful to grep directories non-recursively, e.g. when one wants to look for all files in the toplevel directory, but not in any subdirectory, or in Documentation/, but not in Documentation/technical/. This patch adds support for --max-depth <depth> option to git-grep. If it is given, git-grep descends at most <depth> levels of directories below paths specified on the command line. Note that if path specified on command line contains wildcards, this option makes no sense, e.g. $ git grep -l --max-depth 0 GNU -- 'contrib/*' (note the quotes) will search all files in contrib/, even in subdirectories, because '*' matches all files. Documentation updates, bash-completion and simple test cases are also provided. Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-01grep -p: support user defined regular expressionsRené Scharfe1-0/+1
Respect the userdiff attributes and config settings when looking for lines with function definitions in git grep -p. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-01grep: add option -p/--show-functionRené Scharfe1-0/+1
The new option -p instructs git grep to print the previous function definition as a context line, similar to diff -p. Such context lines are marked with an equal sign instead of a dash. This option complements the existing context options -A, -B, -C. Function definitions are detected using the same heuristic that diff uses. User defined regular expressions are not supported, yet. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-01grep: print context hunk marks between filesRené Scharfe1-0/+1
Print a hunk mark before matches from a new file are shown, in addition to the current behaviour of printing them if lines have been skipped. The result is easier to read, as (presumably unrelated) matches from different files are separated by a hunk mark. GNU grep does the same. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-01grep: move context hunk mark handling into show_line()René Scharfe1-0/+1
Move last_shown into struct grep_opt, to make it available in show_line(), and then make the function handle the printing of hunk marks for context lines in a central place. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-09grep: use parseoptRené Scharfe1-14/+14
Convert git-grep to parseopt. The bitfields in struct grep_opt are converted to full ints, increasing its size. This shouldn't be a problem as there is only a single instance in memory. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-07grep: add support for coloring with external grepsRené Scharfe1-0/+1
Add the config variable color.grep.external, which can be used to switch on coloring of external greps. To enable auto coloring with GNU grep, one needs to set color.grep.external to --color=always to defeat the pager started by git grep. The value of the config variable will be passed to the external grep only if it would colorize internal grep's output, so automatic terminal detected works. The default is to not pass any option, because the external grep command could be a program without color support. Also set the environment variables GREP_COLOR and GREP_COLORS to pass the configured color for matches to the external grep. This works with GNU grep; other variables could be added as needed. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-07grep: color patterns in outputRené Scharfe1-0/+3
Coloring matches makes them easier to spot in the output. Add two options and two parameters: color.grep (to turn coloring on or off), color.grep.match (to set the color of matches), --color and --no-color (to turn coloring on or off, respectively). The output of external greps is not changed. This patch is based on earlier ones by Nguyễn Thái Ngọc Duy and Thiago Alves. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-07grep: remove grep_opt argument from match_expr_eval()René Scharfe1-0/+1
The only use of the struct grep_opt argument of match_expr_eval() is to pass the option word_regexp to match_one_pattern(). By adding a pattern flag for it we can reduce the number of function arguments of these two functions, as a cleanup and preparation for adding more in the next patch. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-09grep: don't call regexec() for fixed stringsRené Scharfe1-0/+1
Add the new flag "fixed" to struct grep_pat and set it if the pattern is doesn't contain any regex control characters in addition to if the flag -F/--fixed-strings was specified. This gives a nice speed up on msysgit, where regexec() seems to be extra slow. Before (best of five runs): $ time git grep grep v1.6.1 >/dev/null real 0m0.552s user 0m0.000s sys 0m0.000s $ time git grep -F grep v1.6.1 >/dev/null real 0m0.170s user 0m0.000s sys 0m0.015s With the patch: $ time git grep grep v1.6.1 >/dev/null real 0m0.173s user 0m0.000s sys 0m0.000s The difference is much smaller on Linux, but still measurable. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-10-01git grep: Add "-z/--null" option as in GNU's grep.Raphael Zimmerer1-0/+1
Here's a trivial patch that adds "-z" and "--null" options to "git grep". It was discussed on the mailing-list that git's "-z" convention should be used instead of GNU grep's "-Z". So things like 'git grep -l -z "$FOO" | xargs -0 sed -i "s/$FOO/$BOO/"' do work now. Signed-off-by: Raphael Zimmerer <killekulla@rdrz.de> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-09-04log --author/--committer: really match only with name partJunio C Hamano1-0/+7
When we tried to find commits done by AUTHOR, the first implementation tried to pattern match a line with "^author .*AUTHOR", which later was enhanced to strip leading caret and look for "^author AUTHOR" when the search pattern was anchored at the left end (i.e. --author="^AUTHOR"). This had a few problems: * When looking for fixed strings (e.g. "git log -F --author=x --grep=y"), the regexp internally used "^author .*x" would never match anything; * To match at the end (e.g. "git log --author='google.com>$'"), the generated regexp has to also match the trailing timestamp part the commit header lines have. Also, in order to determine if the '$' at the end means "match at the end of the line" or just a literal dollar sign (probably backslash-quoted), we would need to parse the regexp ourselves. An earlier alternative tried to make sure that a line matches "^author " (to limit by field name) and the user supplied pattern at the same time. While it solved the -F problem by introducing a special override for matching the "^author ", it did not solve the trailing timestamp nor tail match problem. It also would have matched every commit if --author=author was asked for, not because the author's email part had this string, but because every commit header line that talks about the author begins with that field name, regardleses of who wrote it. Instead of piling more hacks on top of hacks, this rethinks the grep machinery that is used to look for strings in the commit header, and makes sure that (1) field name matches literally at the beginning of the line, followed by a SP, and (2) the user supplied pattern is matched against the remainder of the line, excluding the trailing timestamp data. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2006-09-27grep --all-matchJunio C Hamano1-0/+2
This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-27grep: free expressions and patterns when done.Junio C Hamano1-0/+1
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-20Update grep internal for grepping only in head/bodyJunio C Hamano1-0/+7
This further updates the built-in grep engine so that we can say something like "this pattern should match only in head". This can be used to simplify grepping in the log messages. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-20builtin-grep: make pieces of it available as library.Junio C Hamano1-0/+71
This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net>