aboutsummaryrefslogtreecommitdiffstats
path: root/grep.c
AgeCommit message (Collapse)AuthorFilesLines
2012-02-02grep: load file data after checking binary-nessJeff King1-3/+3
Usually we load each file to grep into memory, check whether it's binary, and then either grep it (the default) or not (if "-I" was given). In the "-I" case, we can skip loading the file entirely if it is marked as binary via gitattributes. On my giant 3-gigabyte media repository, doing "git grep -I foo" went from: real 0m0.712s user 0m0.044s sys 0m4.780s to: real 0m0.026s user 0m0.016s sys 0m0.020s Obviously this is an extreme example. The repo is almost entirely binary files, and you can see that we spent all of our time asking the kernel to read() the data. However, with a cold disk cache, even avoiding a few binary files can have an impact. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02grep: respect diff attributes for binary-nessJeff King1-2/+14
There is currently no way for users to tell git-grep that a particular path is or is not a binary file; instead, grep always relies on its auto-detection (or the user specifying "-a" to treat all binary-looking files like text). This patch teaches git-grep to use the same attribute lookup that is used by git-diff. We could add a new "grep" flag, but that is unnecessarily complex and unlikely to be useful. Despite the name, the "-diff" attribute (or "diff=foo" and the associated diff.foo.binary config option) are really about describing the contents of the path. It's simply historical that diff was the only thing that cared about these attributes in the past. And if this simple approach turns out to be insufficient, we still have a backwards-compatible path forward: we can add a separate "grep" attribute, and fall back to respecting "diff" if it is unset. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02grep: cache userdiff_driver in grep_sourceJeff King1-6/+16
Right now, grep only uses the userdiff_driver for one thing: looking up funcname patterns for "-p" and "-W". As new uses for userdiff drivers are added to the grep code, we want to minimize attribute lookups, which can be expensive. It might seem at first that this would also optimize multiple lookups when the funcname pattern for a file is needed multiple times. However, the compiled funcname pattern is already cached in struct grep_opt's "priv" member, so multiple lookups are already suppressed. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02grep: drop grep_buffer's "name" parameterJeff King1-2/+2
Before the grep_source interface existed, grep_buffer was used by two types of callers: 1. Ones which pulled a file into a buffer, and then wanted to supply the file's name for the output (i.e., git grep). 2. Ones which really just wanted to grep a buffer (i.e., git log --grep). Callers in set (1) should now be using grep_source. Callers in set (2) always pass NULL for the "name" parameter of grep_buffer. We can therefore get rid of this now-useless parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02grep: refactor the concept of "grep source" into an objectJeff King1-34/+164
The main interface to the low-level grep code is grep_buffer, which takes a pointer to a buffer and a size. This is convenient and flexible (we use it to grep commit bodies, files on disk, and blobs by sha1), but it makes it hard to pass extra information about what we are grepping (either for correctness, like overriding binary auto-detection, or for optimizations, like lazily loading blob contents). Instead, let's encapsulate the idea of a "grep source", including the buffer, its size, and where the data is coming from. This is similar to the diff_filespec structure used by the diff code (unsurprising, since future patches will implement some of the same optimizations found there). The diffstat is slightly scarier than the actual patch content. Most of the modified lines are simply replacing access to raw variables with their counterparts that are now in a "struct grep_source". Most of the added lines were taken from builtin/grep.c, which partially abstracted the idea of grep sources (for file vs sha1 sources). Instead of dropping the now-redundant code, this patch leaves builtin/grep.c using the traditional grep_buffer interface (which now wraps the grep_source interface). That makes it easy to test that there is no change of behavior (yet). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02grep: move sha1-reading mutex into low-level codeJeff King1-0/+6
The multi-threaded git-grep code needs to serialize access to the thread-unsafe read_sha1_file call. It does this with a mutex that is local to builtin/grep.c. Let's instead push this down into grep.c, where it can be used by both builtin/grep.c and grep.c. This will let us safely teach the low-level grep.c code tricks that involve reading from the object db. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02grep: make locking flag globalJeff King1-8/+10
The low-level grep code traditionally didn't care about threading, as it doesn't do any threading itself and didn't call out to other non-thread-safe code. That changed with 0579f91 (grep: enable threading with -p and -W using lazy attribute lookup, 2011-12-12), which pushed the lookup of funcname attributes (which is not thread-safe) into the low-level grep code. As a result, the low-level code learned about a new global "grep_attr_mutex" to serialize access to the attribute code. A multi-threaded caller (e.g., builtin/grep.c) is expected to initialize the mutex and set "use_threads" in the grep_opt structure. The low-level code only uses the lock if use_threads is set. However, putting the use_threads flag into the grep_opt struct is not the most logical place. Whether threading is in use is not something that matters for each call to grep_buffer, but is instead global to the whole program (i.e., if any thread is doing multi-threaded grep, every other thread, even if it thinks it is doing its own single-threaded grep, would need to use the locking). In practice, this distinction isn't a problem for us, because the only user of multi-threaded grep is "git-grep", which does nothing except call grep. This patch turns the opt->use_threads flag into a global flag. More important than the nit-picking semantic argument above is that this means that the locking functions don't need to actually have access to a grep_opt to know whether to lock. Which in turn can make adding new locks simpler, as we don't need to pass around a grep_opt. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-16grep: enable threading with -p and -W using lazy attribute lookupThomas Rast1-29/+44
Lazily load the userdiff attributes in match_funcname(). Use a separate mutex around this loading to protect the (not thread-safe) attributes machinery. This lets us re-enable threading with -p and -W while reducing the overhead caused by looking up attributes. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-12grep: load funcname patterns for -WThomas Rast1-3/+4
git-grep avoids loading the funcname patterns unless they are needed. ba8ea74 (grep: add option to show whole function as context, 2011-08-01) forgot to extend this test also to the new funcbody feature. Do so. The catch is that we also have to disable threading when using userdiff, as explained in grep_threads_ok(). So we must be careful to introduce the same test there. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-02Merge branch 'fk/use-kwset-pickaxe-grep-f'Junio C Hamano1-21/+45
* fk/use-kwset-pickaxe-grep-f: obstack: Fix portability issues Use kwset in grep Use kwset in pickaxe Adapt the kwset code to Git Add string search routines from GNU grep Add obstack.[ch] from EGLIBC 2.10
2011-08-28Merge branch 'jk/color-and-pager'Junio C Hamano1-1/+1
* jk/color-and-pager: want_color: automatically fallback to color.ui diff: don't load color config in plumbing config: refactor get_colorbool function color: delay auto-color decision until point of use git_config_colorbool: refactor stdout_is_tty handling diff: refactor COLOR_DIFF from a flag into an int setup_pager: set GIT_PAGER_IN_USE t7006: use test_config helpers test-lib: add helper functions for config t7006: modernize calls to unset Conflicts: builtin/commit.c parse-options.c
2011-08-20Use kwset in grepFredrik Kuivinen1-21/+45
Benchmarks for the hot cache case: before: $ perf stat --repeat=5 git grep qwerty > /dev/null Performance counter stats for 'git grep qwerty' (5 runs): 3,478,085 cache-misses # 2.322 M/sec ( +- 2.690% ) 11,356,177 cache-references # 7.582 M/sec ( +- 2.598% ) 3,872,184 branch-misses # 0.363 % ( +- 0.258% ) 1,067,367,848 branches # 712.673 M/sec ( +- 2.622% ) 3,828,370,782 instructions # 0.947 IPC ( +- 0.033% ) 4,043,832,831 cycles # 2700.037 M/sec ( +- 0.167% ) 8,518 page-faults # 0.006 M/sec ( +- 3.648% ) 847 CPU-migrations # 0.001 M/sec ( +- 3.262% ) 6,546 context-switches # 0.004 M/sec ( +- 2.292% ) 1497.695495 task-clock-msecs # 3.303 CPUs ( +- 2.550% ) 0.453394396 seconds time elapsed ( +- 0.912% ) after: $ perf stat --repeat=5 git grep qwerty > /dev/null Performance counter stats for 'git grep qwerty' (5 runs): 2,989,918 cache-misses # 3.166 M/sec ( +- 5.013% ) 10,986,041 cache-references # 11.633 M/sec ( +- 4.899% ) (scaled from 95.06%) 3,511,993 branch-misses # 1.422 % ( +- 0.785% ) 246,893,561 branches # 261.433 M/sec ( +- 3.967% ) 1,392,727,757 instructions # 0.564 IPC ( +- 0.040% ) 2,468,142,397 cycles # 2613.494 M/sec ( +- 0.110% ) 7,747 page-faults # 0.008 M/sec ( +- 3.995% ) 897 CPU-migrations # 0.001 M/sec ( +- 2.383% ) 6,535 context-switches # 0.007 M/sec ( +- 1.993% ) 944.384228 task-clock-msecs # 3.177 CPUs ( +- 0.268% ) 0.297257643 seconds time elapsed ( +- 0.450% ) So we gain about 35% by using the kwset code. As a side effect of using kwset two grep tests are fixed by this patch. The first is fixed because kwset can deal with case-insensitive search containing NULs, something strcasestr cannot do. The second one is fixed because we consider patterns containing NULs as fixed strings (regcomp cannot accept patterns with NULs). Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-19color: delay auto-color decision until point of useJeff King1-1/+1
When we read a color value either from a config file or from the command line, we use git_config_colorbool to convert it from the tristate always/never/auto into a single yes/no boolean value. This has some timing implications with respect to starting a pager. If we start (or decide not to start) the pager before checking the colorbool, everything is fine. Either isatty(1) will give us the right information, or we will properly check for pager_in_use(). However, if we decide to start a pager after we have checked the colorbool, things are not so simple. If stdout is a tty, then we will have already decided to use color. However, the user may also have configured color.pager not to use color with the pager. In this case, we need to actually turn off color. Unfortunately, the pager code has no idea which color variables were turned on (and there are many of them throughout the code, and they may even have been manipulated after the colorbool selection by something like "--color" on the command line). This bug can be seen any time a pager is started after config and command line options are checked. This has affected "git diff" since 89d07f7 (diff: don't run pager if user asked for a diff style exit code, 2007-08-12). It has also affect the log family since 1fda91b (Fix 'git log' early pager startup error case, 2010-08-24). This patch splits the notion of parsing a colorbool and actually checking the configuration. The "use_color" variables now have an additional possible value, GIT_COLOR_AUTO. Users of the variable should use the new "want_color()" wrapper, which will lazily determine and cache the auto-color decision. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-01grep: add option to show whole function as contextRené Scharfe1-10/+22
Add a new option, -W, to show the whole surrounding function of a match. It uses the same regular expressions as -p and diff to find the beginning of sections. Currently it will not display comments in front of a function, but those that are following one. Despite this shortcoming it is already useful, e.g. to simply see a more complete applicable context or to extract whole functions. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-05grep: add --headingRené Scharfe1-1/+5
With --heading, the filename is printed once before matches from that file instead of at the start of each line, giving more screen space to the actual search results. This option is taken from ack (http://betterthangrep.com/). And now git grep can dress up like it: $ git config alias.ack "grep --break --heading --line-number" $ git ack -e --heading Documentation/git-grep.txt 154:--heading:: t/t7810-grep.sh 785:test_expect_success 'grep --heading' ' 786: git grep --heading -e char -e lo_w hello.c hello_world >actual && 808: git grep --break --heading -n --color \ Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-05grep: add --breakRené Scharfe1-2/+5
With --break, an empty line is printed between matches from different files, increasing readability. This option is taken from ack (http://betterthangrep.com/). Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-05grep: fix coloring of hunk marks between filesRené Scharfe1-3/+12
Commit 431d6e7b (grep: enable threading for context line printing) split the printing of the "--\n" mark between results from different files out into two places: show_line() in grep.c for the non-threaded case and work_done() in builtin/grep.c for the threaded case. Commit 55f638bd (grep: Colorize filename, line number, and separator) updated the former, but not the latter, so the separators between files are not colored if threads are used. This patch merges the two. In the threaded case, hunk marks are now printed by show_line() for every file, including the first one, and the very first mark is simply skipped in work_done(). This ensures that the output is properly colored and works just as well. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09git-grep: Learn PCREMichał Kiedrowicz1-1/+74
This patch teaches git-grep the --perl-regexp/-P options (naming borrowed from GNU grep) in order to allow specifying PCRE regexes on the command line. PCRE has a number of features which make them more handy to use than POSIX regexes, like consistent escaping rules, extended character classes, ungreedy matching etc. git isn't build with PCRE support automatically. USE_LIBPCRE environment variable must be enabled (like `make USE_LIBPCRE=YesPlease`). Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09grep: Extract compile_regexp_failed() from compile_regexp()Michał Kiedrowicz1-9/+16
This simplifies compile_regexp() a little and allows re-using error handling code. Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09grep: Fix a typo in a commentMichał Kiedrowicz1-1/+1
Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-05grep: Put calls to fixmatch() and regmatch() into patmatch()Michał Kiedrowicz1-8/+15
Both match_one_pattern() and look_ahead() use fixmatch() and regmatch() in the same way. They really want to match a pattern againt a string, but now they need to know if the pattern is fixed or regexp. This change cleans this up by introducing patmatch() (from "pattern match") and also simplifies inserting other ways of matching a string. Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-13log --author: take union of multiple "author" requestsJunio C Hamano1-12/+53
In the olden days, log --author=me --committer=him --grep=this --grep=that used to be turned into: (OR (HEADER-AUTHOR me) (HEADER-COMMITTER him) (PATTERN this) (PATTERN that)) showing my patches that do not have any "this" nor "that", which was totally useless. 80235ba ("log --author=me --grep=it" should find intersection, not union, 2010-01-17) improved it greatly to turn the same into: (ALL-MATCH (HEADER-AUTHOR me) (HEADER-COMMITTER him) (OR (PATTERN this) (PATTERN that))) That is, "show only patches by me and committed by him, that have either this or that", which is a lot more natural thing to ask. We however need to be a bit more clever when the user asks more than one "author" (or "committer"); because a commit has only one author (and one committer), they ought to be interpreted as asking for union to be useful. The current implementation simply added another author/committer pattern at the same top-level for ALL-MATCH to insist on matching all, finding nothing. Turn log --author=me --author=her \ --committer=him --committer=you \ --grep=this --grep=that into (ALL-MATCH (OR (HEADER-AUTHOR me) (HEADER-AUTHOR her)) (OR (HEADER-COMMITTER him) (HEADER-COMMITTER you)) (OR (PATTERN this) (PATTERN that))) instead. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-12grep: move logic to compile header pattern into a separate helperJunio C Hamano1-23/+22
The callers should be queuing only GREP_PATTERN_HEAD elements to the header_list queue; simplify the switch and guard it with an assert. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24grep: support NUL chars in search strings for -FRené Scharfe1-13/+20
Search patterns in a file specified with -f can contain NUL characters. The current code ignores all characters on a line after a NUL. Pass the actual length of the line all the way from the pattern file to fixmatch() and use it for case-sensitive fixed string matching. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24grep: use REG_STARTEND for all matching if availableRené Scharfe1-10/+14
Refactor REG_STARTEND handling inlook_ahead() into a new helper, regmatch(), and use it for line matching, too. This allows regex matching beyond NUL characters if regexec() supports the flag. NUL characters themselves are not matched in any way, though. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24grep: continue case insensitive fixed string search after NUL charsRené Scharfe1-3/+9
Functions for C strings, like strcasestr(), can't see beyond NUL characters. Check if there is such an obstacle on the line and try again behind it. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24grep: use memmem() for fixed string searchRené Scharfe1-7/+9
Allow searching beyond NUL characters by using memmem() instead of strstr(). Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24grep: --name-only over binaryRené Scharfe1-4/+4
As with the option -c/--count, git grep with the option -l/--name-only should work the same with binary files as with text files because there is no danger of messing up the terminal with control characters from the contents of matching files. GNU grep does the same. Move the check for ->name_only before the one for binary_match_only, thus making the latter irrelevant for git grep -l. Reported-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24grep: --count over binaryRené Scharfe1-5/+4
The intent of showing the message "Binary file xyz matches" for binary files is to avoid annoying users by potentially messing up their terminals by printing control characters. In --count mode, this precaution isn't necessary. Display counts of matches if -c/--count was specified, even if -a was not given. GNU grep does the same. Moving the check for ->count before the code for handling binary file also avoids printing context lines if --count and -[ABC] were used together, so we can remove the part of the comment that mentions this behaviour. Again, GNU grep does the same. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-24grep: grep: refactor handling of binary mode optionsRené Scharfe1-9/+11
Turn the switch inside-out and add labels for each possible value of ->binary. This makes the code easier to read and avoids calling buffer_is_binary() if the option -a was given. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-03Merge branch 'rs/threaded-grep-context'Junio C Hamano1-12/+6
* rs/threaded-grep-context: grep: enable threading for context line printing Conflicts: grep.c
2010-03-20Merge branch 'ml/color-grep'Junio C Hamano1-26/+49
* ml/color-grep: grep: Colorize selected, context, and function lines grep: Colorize filename, line number, and separator Add GIT_COLOR_BOLD_* and GIT_COLOR_BG_*
2010-03-15grep: enable threading for context line printingRené Scharfe1-12/+5
If context lines are to be printed, grep separates them with hunk marks ("--\n"). These marks are printed between matches from different files, too. They are not printed before the first file, though. Threading was disabled when context line printing was enabled because avoiding to print the mark before the first line was an unsolved synchronisation problem. This patch separates the code for printing hunk marks for the threaded and the unthreaded case, allowing threading to be turned on together with the common -ABC options. ->show_hunk_mark, which controls printing of hunk marks between files in show_line(), is now set in grep_buffer_1(), but only if some results have already been printed and threading is disabled. The threaded case is handled in work_done(). Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-08grep: Colorize selected, context, and function linesMark Lodato1-2/+9
Colorize non-matching text of selected lines, context lines, and function name lines. The default for all three is no color, but they can be configured using color.grep.<slot>. The first two are similar to the corresponding options in GNU grep, except that GNU grep applies the color to the entire line, not just non-matching text. Signed-off-by: Mark Lodato <lodatom@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-08grep: Colorize filename, line number, and separatorMark Lodato1-24/+40
Colorize the filename, line number, and separator in git grep output, as GNU grep does. The colors are customizable through color.grep.<slot>. The default is to only color the separator (in cyan), since this gives the biggest legibility increase without overwhelming the user with colors. GNU grep also defaults cyan for the separator, but defaults to magenta for the filename and to green for the line number, as well. There is one difference from GNU grep: When a binary file matches without -a, GNU grep does not color the <file> in "Binary file <file> matches", but we do. Like GNU grep, if --null is given, the null separators are not colored. For config.txt, use a a sub-list to describe the slots, rather than a single paragraph with parentheses, since this is much more readable. Remove the cast to int for `rm_eo - rm_so` since it is not necessary. Signed-off-by: Mark Lodato <lodatom@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-02Merge branch 'jc/grep-author-all-match-implicit'Junio C Hamano1-6/+40
* jc/grep-author-all-match-implicit: "log --author=me --grep=it" should find intersection, not union
2010-02-03grep: simplify assignment of ->fixedRené Scharfe1-4/+1
After 885d211e, the value of the ->fixed pattern option only depends on the grep option of the same name. Regex flags don't matter any more, because fixed mode and regex mode are strictly separated. Thus we can simply copy the value from struct grep_opt to struct grep_pat, as we do already for ->word_regexp and ->ignore_case. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-28Merge branch 'fk/threaded-grep'Junio C Hamano1-18/+88
* fk/threaded-grep: Threaded grep grep: expose "status-only" feature via -q
2010-01-26grep: use REG_STARTEND (if available) to speed up regexecBenjamin Kramer1-1/+8
BSD and glibc have an extension to regexec which takes a buffer + length pair instead of a NUL-terminated string. Since we already have the length computed this can save us a strlen call inside regexec. Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-26Threaded grepFredrik Kuivinen1-18/+88
Make git grep use threads when it is available. The results below are best of five runs in the Linux repository (on a box with two cores). With the patch: git grep qwerty 1.58user 0.55system 0:01.16elapsed 183%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5774minor)pagefaults 0swaps Without: git grep qwerty 1.59user 0.43system 0:02.02elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3716minor)pagefaults 0swaps And with a pattern with quite a few matches: With the patch: $ /usr/bin/time git grep void 5.61user 0.56system 0:03.44elapsed 179%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+5587minor)pagefaults 0swaps Without: $ /usr/bin/time git grep void 5.36user 0.51system 0:05.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+800outputs (0major+3693minor)pagefaults 0swaps In either case we gain about 40% by the threading. Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-25"log --author=me --grep=it" should find intersection, not unionJunio C Hamano1-6/+40
Historically, any grep filter in "git log" family of commands were taken as restricting to commits with any of the words in the commit log message. However, the user almost always want to find commits "done by this person on that topic". With "--all-match" option, a series of grep patterns can be turned into a requirement that all of them must produce a match, but that makes it impossible to ask for "done by me, on either this or that" with: log --author=me --committer=him --grep=this --grep=that because it will require both "this" and "that" to appear. Change the "header" parser of grep library to treat the headers specially, and parse it as: (all-match-OR (HEADER-AUTHOR me) (HEADER-COMMITTER him) (OR (PATTERN this) (PATTERN that) ) ) Even though the "log" command line parser doesn't give direct access to the extended grep syntax to group terms with parentheses, this change will cover the majority of the case the users would want. This incidentally revealed that one test in t7002 was bogus. It ran: log --author=Thor --grep=Thu --format='%s' and expected (wrongly) "Thu" to match "Thursday" in the author/committer date, but that would never match, as the timestamp in raw commit buffer does not have the name of the day-of-the-week. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-13grep: rip out pessimization to use fixmatch()Junio C Hamano1-8/+1
Even when running without the -F (--fixed-strings) option, we checked the pattern and used fixmatch() codepath when it does not contain any regex magic. Finding fixed strings with strstr() surely must be faster than running the regular expression crud. Not so. It turns out that on some libc implementations, using the regcomp()/regexec() pair is a lot faster than running strstr() and strcasestr() the fixmatch() codepath uses. Drop the optimization and use the fixmatch() codepath only when the user explicitly asked for it with the -F option. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-12Merge branch 'jc/maint-1.6.4-grep-lookahead' into jc/maint-grep-lookaheadJunio C Hamano1-0/+75
* jc/maint-1.6.4-grep-lookahead: grep: optimize built-in grep by skipping lines that do not hit This needs to be an evil merge as fixmatch() changed signature since 5183bf6 (grep: Allow case insensitive search of fixed-strings, 2009-11-06). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-12grep: optimize built-in grep by skipping lines that do not hitJunio C Hamano1-0/+75
The internal "grep" engine we use checks for hits line-by-line, instead of letting the underlying regexec()/fixmatch() routines scan for the first match from the rest of the buffer. This was a major source of overhead compared to the external grep. Introduce a "look-ahead" mechanism to find the next line that would potentially match by using regexec()/fixmatch() in the remainder of the text to skip unmatching lines, and use it when the query criteria is simple enough (i.e. punt for an advanced grep boolean expression like "lines that have both X and Y but not Z" for now) and we are not running under "-v" (aka "--invert-match") option. Note that "-L" (aka "--files-without-match") is not a reason to disable this optimization. Under the option, we are interested if the file has any hit at all, and that is what we determine reliably with or without the optimization. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-16grep: Allow case insensitive search of fixed-stringsBrian Collins1-3/+10
"git grep" currently an error when you combine the -F and -i flags. This isn't in line with how GNU grep handles it. This patch allows the simultaneous use of those flags. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Brian Collins <bricollins@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-02grep: simplify -p outputRené Scharfe1-8/+4
It was found a bit too loud to show == separators between the function headers. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-01grep -p: support user defined regular expressionsRené Scharfe1-3/+26
Respect the userdiff attributes and config settings when looking for lines with function definitions in git grep -p. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-01grep: add option -p/--show-functionRené Scharfe1-8/+53
The new option -p instructs git grep to print the previous function definition as a context line, similar to diff -p. Such context lines are marked with an equal sign instead of a dash. This option complements the existing context options -A, -B, -C. Function definitions are detected using the same heuristic that diff uses. User defined regular expressions are not supported, yet. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-01grep: handle pre context lines on demandRené Scharfe1-29/+32
Factor out pre context line handling into the new function show_pre_context() and change the algorithm to rewind by looking for newline characters and roll forward again, instead of maintaining an array of line beginnings and ends. This is slower for hits, but the cost for non-matching lines becomes zero. Normally, there are far more non-matching lines, so the time spent in total decreases. Before this patch (current Linux kernel repo, best of five runs): $ time git grep --no-ext-grep -B1 memset >/dev/null real 0m2.134s user 0m1.932s sys 0m0.196s $ time git grep --no-ext-grep -B1000 memset >/dev/null real 0m12.059s user 0m11.837s sys 0m0.224s The same with this patch: $ time git grep --no-ext-grep -B1 memset >/dev/null real 0m2.117s user 0m1.892s sys 0m0.228s $ time git grep --no-ext-grep -B1000 memset >/dev/null real 0m2.986s user 0m2.696s sys 0m0.288s Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-01grep: print context hunk marks between filesRené Scharfe1-1/+6
Print a hunk mark before matches from a new file are shown, in addition to the current behaviour of printing them if lines have been skipped. The result is easier to read, as (presumably unrelated) matches from different files are separated by a hunk mark. GNU grep does the same. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-01grep: move context hunk mark handling into show_line()René Scharfe1-15/+11
Move last_shown into struct grep_opt, to make it available in show_line(), and then make the function handle the printing of hunk marks for context lines in a central place. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-03grep: fix empty word-regexp matchesRené Scharfe1-1/+5
The command "git grep -w ''" dies as soon as it encounters an empty line, reporting (wrongly) that "regexp returned nonsense". The first hunk of this patch relaxes the sanity check that is responsible for that, allowing matches to start at the end. The second hunk complements it by making sure that empty matches are rejected if -w was specified, as they are not really words. GNU grep does the same: $ echo foo | grep -c '' 1 $ echo foo | grep -c -w '' 0 Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-01grep: fix colouring of matches with zero lengthRené Scharfe1-0/+2
If a zero-length match is encountered, break out of loop and show the rest of the line uncoloured. Otherwise we'd be looping forever, trying to make progress by advancing the pointer by zero characters. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-23grep: fix word-regexp at the beginning of linesRené Scharfe1-0/+1
After bol is forwarded, it doesn't represent the beginning of the line any more. This means that the beginning-of-line marker (^) mustn't match, i.e. the regex flag REG_NOTBOL needs to be set. This bug was introduced by fb62eb7fab97cea880ea7fe4f341a4dfad14ab48 ("grep -w: forward to next possible position after rejected match"). Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-20grep: fix word-regexp colouringRené Scharfe1-0/+5
As noticed by Dmitry Gryazin: When a pattern is found but it doesn't start and end at word boundaries, bol is forwarded to after the match and the pattern is searched again. When a pattern is finally found between word boundaries, the match offsets are off by the number of characters that have been skipped. This patch corrects the offsets to be relative to the value of bol as passed to match_one_pattern() by its caller. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-28Merge branch 'maint'Junio C Hamano1-2/+6
* maint: grep: fix segfault when "git grep '('" is given Documentation: fix a grammatical error in api-builtin.txt builtin-merge: fix a typo in an error message
2009-04-28Merge branch 'maint-1.6.1' into maintJunio C Hamano1-2/+6
* maint-1.6.1: grep: fix segfault when "git grep '('" is given Documentation: fix a grammatical error in api-builtin.txt builtin-merge: fix a typo in an error message
2009-04-28Merge branch 'maint-1.6.0' into maint-1.6.1Junio C Hamano1-2/+6
* maint-1.6.0: grep: fix segfault when "git grep '('" is given Documentation: fix a grammatical error in api-builtin.txt builtin-merge: fix a typo in an error message
2009-04-27grep: fix segfault when "git grep '('" is givenLinus Torvalds1-2/+6
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-18git log: avoid segfault with --all-matchMichele Ballabio1-1/+2
Avoid a segfault when the command git log --all-match was issued, by ignoring the option. Signed-off-by: Michele Ballabio <barra_cuda@katamail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-08grep: cast printf %.*s "precision" argument explicitly to intJunio C Hamano1-2/+2
On some systems, regoff_t that is the type of rm_so/rm_eo members are wider than int; %.*s precision specifier expects an int, so use an explicit cast. A breakage reported on Darwin by Brian Gernhardt should be fixed with this patch. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-07grep: color patterns in outputRené Scharfe1-12/+78
Coloring matches makes them easier to spot in the output. Add two options and two parameters: color.grep (to turn coloring on or off), color.grep.match (to set the color of matches), --color and --no-color (to turn coloring on or off, respectively). The output of external greps is not changed. This patch is based on earlier ones by Nguyễn Thái Ngọc Duy and Thiago Alves. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-07grep: add pmatch and eflags arguments to match_one_pattern()René Scharfe1-11/+10
Push pmatch and eflags to the callers of match_one_pattern(), which allows them to specify regex execution flags and to get the location of a match. Since we only use the first element of the matches array and aren't interested in submatches, no provision is made for callers to provide a larger array. eflags are ignored for fixed patterns, but that's OK, since they only have a meaning in connection with regular expressions containing ^ or $. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-07grep: remove grep_opt argument from match_expr_eval()René Scharfe1-17/+17
The only use of the struct grep_opt argument of match_expr_eval() is to pass the option word_regexp to match_one_pattern(). By adding a pattern flag for it we can reduce the number of function arguments of these two functions, as a cleanup and preparation for adding more in the next patch. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-07grep: micro-optimize hit collection for AND nodesRené Scharfe1-7/+3
In addition to returning if an expression matches a line, match_expr_eval() updates the expression's hit flag if the parameter collect_hits is set. It never sets collect_hits for children of AND nodes, though, so their hit flag will never be updated. Because of that we can return early if the first child didn't match, no matter if collect_hits is set or not. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-17Add is_regex_special()René Scharfe1-8/+1
Add is_regex_special(), a character class macro for chars that have a special meaning in regular expressions. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-17Change NUL char handling of isspecial()René Scharfe1-2/+3
Replace isspecial() by the new macro is_glob_special(), which is more, well, specialized. The former included the NUL char in its character class, while the letter only included characters that are special to file name globbing. The new name contains underscores because they enhance readability considerably now that it's made up of three words. Renaming the function is necessary to document its changed scope. The call sites of isspecial() are updated to check explicitly for NUL. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-09grep: don't call regexec() for fixed stringsRené Scharfe1-4/+25
Add the new flag "fixed" to struct grep_pat and set it if the pattern is doesn't contain any regex control characters in addition to if the flag -F/--fixed-strings was specified. This gives a nice speed up on msysgit, where regexec() seems to be extra slow. Before (best of five runs): $ time git grep grep v1.6.1 >/dev/null real 0m0.552s user 0m0.000s sys 0m0.000s $ time git grep -F grep v1.6.1 >/dev/null real 0m0.170s user 0m0.000s sys 0m0.015s With the patch: $ time git grep grep v1.6.1 >/dev/null real 0m0.173s user 0m0.000s sys 0m0.000s The difference is much smaller on Linux, but still measurable. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-09grep -w: forward to next possible position after rejected matchRené Scharfe1-4/+7
grep -w accepts matches between non-word characters, only. If a match from regexec() doesn't meet this criteria, grep continues its search after the first character of that match. We can be a bit smarter here and skip all positions that follow a word character first, as they can't match our criteria. This way we can consume characters quite cheaply and don't need to special-case the handling of the beginning of a line. Here's a contrived example command on msysgit (best of five runs): $ time git grep -w ...... v1.6.1 >/dev/null real 0m1.611s user 0m0.000s sys 0m0.015s With the patch it's quite a bit faster: $ time git grep -w ...... v1.6.1 >/dev/null real 0m1.179s user 0m0.000s sys 0m0.015s More common search patterns will gain a lot less, but it's a nice clean up anyway. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-05remove trailing LF in die() messagesAlexander Potashev1-1/+1
LF at the end of format strings given to die() is redundant because die already adds one on its own. Signed-off-by: Alexander Potashev <aspotashev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-11Merge branch 'maint'Junio C Hamano1-3/+3
* maint: Fix non-literal format in printf-style calls git-submodule: Avoid printing a spurious message. git ls-remote: make usage string match manpage Makefile: help people who run 'make check' by mistake
2008-11-11Fix non-literal format in printf-style callsDaniel Lowe1-3/+3
These were found using gcc 4.3.2-1ubuntu11 with the warning: warning: format not a string literal and no format arguments Incorporated suggestions from Brandon Casey <casey@nrlssc.navy.mil>. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-10-01git grep: Add "-z/--null" option as in GNU's grep.Raphael Zimmerer1-3/+11
Here's a trivial patch that adds "-z" and "--null" options to "git grep". It was discussed on the mailing-list that git's "-z" convention should be used instead of GNU grep's "-Z". So things like 'git grep -l -z "$FOO" | xargs -0 sed -i "s/$FOO/$BOO/"' do work now. Signed-off-by: Raphael Zimmerer <killekulla@rdrz.de> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-09-04log --author/--committer: really match only with name partJunio C Hamano1-0/+52
When we tried to find commits done by AUTHOR, the first implementation tried to pattern match a line with "^author .*AUTHOR", which later was enhanced to strip leading caret and look for "^author AUTHOR" when the search pattern was anchored at the left end (i.e. --author="^AUTHOR"). This had a few problems: * When looking for fixed strings (e.g. "git log -F --author=x --grep=y"), the regexp internally used "^author .*x" would never match anything; * To match at the end (e.g. "git log --author='google.com>$'"), the generated regexp has to also match the trailing timestamp part the commit header lines have. Also, in order to determine if the '$' at the end means "match at the end of the line" or just a literal dollar sign (probably backslash-quoted), we would need to parse the regexp ourselves. An earlier alternative tried to make sure that a line matches "^author " (to limit by field name) and the user supplied pattern at the same time. While it solved the -F problem by introducing a special override for matching the "^author ", it did not solve the trailing timestamp nor tail match problem. It also would have matched every commit if --author=author was asked for, not because the author's email part had this string, but because every commit header line that talks about the author begins with that field name, regardleses of who wrote it. Instead of piling more hacks on top of hacks, this rethinks the grep machinery that is used to look for strings in the commit header, and makes sure that (1) field name matches literally at the beginning of the line, followed by a SP, and (2) the user supplied pattern is matched against the remainder of the line, excluding the trailing timestamp data. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-04Move buffer_is_binary() to xdiff-interface.hJohannes Schindelin1-11/+1
We already have two instances where we want to determine if a buffer contains binary data as opposed to text. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2006-12-20simplify inclusion of system header files.Junio C Hamano1-1/+0
This is a mechanical clean-up of the way *.c files include system header files. (1) sources under compat/, platform sha-1 implementations, and xdelta code are exempt from the following rules; (2) the first #include must be "git-compat-util.h" or one of our own header file that includes it first (e.g. config.h, builtin.h, pkt-line.h); (3) system headers that are included in "git-compat-util.h" need not be included in individual C source files. (4) "git-compat-util.h" does not have to include subsystem specific header files (e.g. expat.h). Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-27grep --all-matchJunio C Hamano1-18/+96
This lets you say: git grep --all-match -e A -e B -e C to find lines that match A or B or C but limit the matches from the files that have all of A, B and C. This is different from git grep -e A --and -e B --and -e C in that the latter looks for a single line that has all of these at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-27grep: fix --fixed-strings combined with expression.Junio C Hamano1-5/+2
"git grep --fixed-strings -e GIT --and -e VERSION .gitignore" misbehaved because we did not notice this needs to grab lines that have the given two fixed strings at the same time. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-27grep: free expressions and patterns when done.Junio C Hamano1-0/+42
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-20Update grep internal for grepping only in head/bodyJunio C Hamano1-16/+35
This further updates the built-in grep engine so that we can say something like "this pattern should match only in head". This can be used to simplify grepping in the log messages. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-20builtin-grep: make pieces of it available as library.Junio C Hamano1-0/+440
This makes three functions and associated option structures from builtin-grep available from other parts of the system. * options to drive built-in grep engine is stored in struct grep_opt; * pattern strings and extended grep expressions are added to struct grep_opt with append_grep_pattern(); * when finished calling append_grep_pattern(), call compile_grep_patterns() to prepare for execution; * call grep_buffer() to find matches in the in-core buffer. This also adds an internal option "status_only" to grep_opt, which suppresses any output from grep_buffer(). Callers of the function as library can use it to check if there is a match without producing any output. Signed-off-by: Junio C Hamano <junkio@cox.net>