| Age | Commit message (Collapse) | Author | Files | Lines |
|
Add userdiff patterns to support R programming language.
Also, add three userdiff tests for R programming language
files. These files define simple function and nested function,
with and without indentation.
Signed-off-by: Rodrigo Carvalho <rodrigorsdc@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The previous function regex required explicit matching of function
bodies using `{`, `(`, `((`, or `[[`, which caused several issues:
- It failed to capture valid functions where `{` was on the next line
due to line continuation (`\`).
- It did not recognize functions with single command body, such as
`x () echo hello`.
Replacing the function body matching logic with `.*$`, ensures
that everything on the function definition line is captured.
Additionally, the word regex is refined to better recognize shell
syntax, including additional parameter expansion operators and
command-line options.
Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a new builtin driver for generic INI files (e. g. the gitconfig
files), where:
- the funcname regular expression matches section names, i. e. any
string between brackets at the beginning of the line, with or without
indentation;
- word_regex matches any word with one or more non-whitespace
characters without checking if it is a valid variable name or value.
Also add tests for the new userdiff driver. These files define sections
and subsections, with and without indentation.
Helped-by: Patrick Steinhardt <ps@pks.im>
Helped-by: D. Ben Knoble <ben.knoble@gmail.com>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In "environment.h" we have quite a lot of functions and variables that
either explicitly or implicitly depend on `the_repository`.
The implicit set of stateful declarations includes for example variables
which get populated when parsing a repository's Git configuration. This
set of variables is broken by design, as their state often depends on
the last repository config that has been parsed. So they may or may not
represent the state of `the_repository`.
Fixing that is quite a big undertaking, and later patches in this series
will demonstrate a solution for a first small set of those variables. So
for now, let's guard these with `USE_THE_REPOSITORY_VARIABLE` so that
callers are aware of the implicit dependency.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The userdiff structures may be initialized either statically on the
stack or dynamically via configuration keys. In the latter case we end
up leaking memory because we didn't have any infrastructure to discern
those strings which have been allocated statically and those which have
been allocated dynamically.
Refactor the code such that we have two pointers for each of these
strings: one that holds the value as accessed by other subsystems, and
one that points to the same string in case it has been allocated. Like
this, we can safely free the second pointer and thus plug those memory
leaks.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git diff --exit-code --ext-diff" learned to take the exit status
of the external diff driver into account when deciding the exit
status of the overall "git diff" invocation when configured to do
so.
* rs/diff-exit-code-with-external-diff:
diff: let external diffs report that changes are uninteresting
userdiff: add and use struct external_diff
t4020: test exit code with external diffs
|
|
The options --exit-code and --quiet instruct git diff to indicate
whether it found any significant changes by exiting with code 1 if it
did and 0 if there were none. Currently this doesn't work if external
diff programs are involved, as we have no way to learn what they found.
Add that ability in the form of the new configuration options
diff.trustExitCode and diff.<driver>.trustExitCode and the environment
variable GIT_EXTERNAL_DIFF_TRUST_EXIT_CODE. They pair with the config
options diff.external and diff.<driver>.command and the environment
variable GIT_EXTERNAL_DIFF, respectively.
The new options are off by default, keeping the old behavior. Enabling
them indicates that the external diff returns exit code 1 if it finds
significant changes and 0 if it doesn't, like diff(1).
The name of the new options is taken from the git difftool and mergetool
options of similar purpose. (There they enable passing on the exit code
of a diff tool and to infer whether a merge done by a merge tool is
successful.)
The new feature sets the diff flag diff_from_contents in
diff_setup_done() if we need the exit code and are allowed to call
external diffs. This disables the optimization that avoids calling the
program with --quiet. Add it back by skipping the call if the external
diff is not able to report empty diffs. We can only do that check after
evaluating the file-specific attributes in run_external_diff().
If we do run the external diff with --quiet, send its output to
/dev/null.
I considered checking the output of the external diff to check whether
its empty. It was added as 11be65cfa4 (diff: fix --exit-code with
external diff, 2024-05-05) and quickly reverted, as it does not work
with external diffs that do not write to stdout. There's no reason why
a graphical diff tool would even need to write anything there at all.
I also considered using a non-zero exit code for empty diffs, which
could be done without adding new configuration options. We'd need to
disable the optimization that allows git diff --quiet to skip calling
external diffs, though -- that might be quite surprising if graphical
diff programs are involved. And assigning the opposite meaning of the
exit codes compared to diff(1) and git diff --exit-code to the external
diff can cause unnecessary confusion.
Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Wrap the string specifying the external diff command in a new struct to
simplify adding attributes, which the next patch will do.
Make sure external_diff() still returns NULL if neither the environment
variable GIT_EXTERNAL_DIFF nor the configuration option diff.external is
set, to continue allowing its use in a boolean context.
Use a designated initializer for the default builtin userdiff driver to
adjust to the type change of the second struct member. Spelling out
only the non-zero members improves readability as a nice side-effect.
No functional change intended.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are multiple cases where we intentionally leak config strings:
- `struct gpg_format` is used to track programs that can be used for
signing commits, either via gpg(1), gpgsm(1) or ssh-keygen(1). The
user can override the commands via several config variables. As the
array is populated once, only, and the struct memers are never
written to or free'd.
- `struct ll_merge_driver` is used to track merge drivers. Same as
with the GPG format, these drivers are populated once and then
reused. Its data is never written to or free'd, either.
- `struct userdiff_funcname` and `struct userdiff_driver` can be
configured via `diff.<driver>.*` to add additional drivers. Again,
these have a global lifetime and are never written to or free'd.
All of these are intentionally kept alive and are never written to.
Furthermore, all of these are being assigned both string constants in
some places, and allocated strings in other places. This will cause
warnings once we enable `-Wwrite-strings`, so let's mark the respective
fields as `const char *` and cast away the constness when assigning
those values.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
- Support multi-line methods by not requiring closing parenthesis.
- Support multiple generics (comma was missing before).
- Add missing `foreach`, `lock` and `fixed` keywords to skip over.
- Remove `instanceof` keyword, which isn't C#.
- Also detect non-method keywords not positioned at the start of a line.
- Added tests; none existed before.
The overall strategy is to focus more on what isn't expected for
method/property definitions, instead of what is, but is fully optional.
Signed-off-by: Steven Jeuris <steven.jeuris@gmail.com>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The code incorrectly attempted to use textconv cache when asked,
even when we are not running in a repository, which has been
corrected.
* jk/textconv-cache-outside-repo-fix:
userdiff: skip textconv caching when not in a repository
|
|
The textconv caching system uses git-notes to store its cache entries.
But if you're using "diff --no-index" outside of a repository, then
obviously that isn't going to work.
Since caching is just an optimization, it's OK for us to skip it.
However, the current behavior is much worse: we call notes_cache_init()
which tries to look up the ref, and the low-level ref code hits a BUG(),
killing the program. Instead, we should notice before setting up the
cache that it there's no repository, and just silently skip it.
Reported-by: Paweł Dominiak <dominiak.pawel@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add and apply a semantic patch for calling xstrncmpz() to compare a
NUL-terminated string with a buffer of a known length instead of using
strncmp() and checking the terminating NUL explicitly. This simplifies
callers by reducing code duplication.
I had to adjust remote.c manually because Coccinelle inexplicably
changed the indent of the else branches.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for
dynamic array allocation. Moving these macros to git-compat-util.h with
the other alloc macros focuses alloc.[ch] to allocation for Git objects
and additionally allows us to remove inclusions to alloc.h from files
that solely used the above macros.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Earlier, 47cfc9bd (attr: add flag `--source` to work with tree-ish,
2023-01-14) taught "git check-attr" the "--source=<tree>" option to
allow it to read attribute files from a tree-ish, but did so only
for the command. Just like "check-attr" users wanted a way to use
attributes from a tree-ish and not from the working tree files,
users of other commands (like "git diff") would benefit from the
same.
Undo most of the UI change the commit made, while keeping the
internal logic to read attributes from a given tree-ish. Expose the
internal logic via a new "--attr-source=<tree>" command line option
given to "git", so that it can be used with any git command that
runs as part of the main git process.
Additionally, add an environment variable GIT_ATTR_SOURCE that is set
when --attr-source is passed in, so that subprocesses use the same value
for the attributes source tree.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The userdiff regexp patterns for various filetypes that are built
into the system have been updated to avoid triggering regexp errors
from UTF-8 aware regex engines.
* rs/userdiff-multibyte-regex:
userdiff: support regexec(3) with multi-byte support
|
|
Since 1819ad327b (grep: fix multibyte regex handling under macOS,
2022-08-26) we use the system library for all regular expression
matching on macOS, not just for git grep. It supports multi-byte
strings and rejects invalid multi-byte characters.
This broke all built-in userdiff word regexes in UTF-8 locales because
they all include such invalid bytes in expressions that are intended to
match multi-byte characters without explicit support for that from the
regex engine.
"|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+" is added to all built-in word
regexes to match a single non-space or multi-byte character. The \xNN
characters are invalid if interpreted as UTF-8 because they have their
high bit set, which indicates they are part of a multi-byte character,
but they are surrounded by single-byte characters.
Replace that expression with "|[^[:space:]]" if the regex engine
supports multi-byte matching, as there is no need to have an explicit
range for multi-byte characters then. Check for that capability at
runtime, because it depends on the locale and thus on environment
variables. Construct the full replacement expression at build time
and just switch it in if necessary to avoid string manipulation and
allocations at runtime.
Additionally the word regex for tex contains the expression
"[a-zA-Z0-9\x80-\xff]+" with a similarly invalid range. The best
replacement with only valid characters that I can come up with is
"([a-zA-Z0-9]|[^\x01-\x7f])+". Unlike the original it matches NUL
characters, though. Assuming that tex files usually don't contain NUL
this should be acceptable.
Reported-by: D. Ben Knoble <ben.knoble@gmail.com>
Reported-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up to clarify the rule that "git-compat-util.h" must be
the first to be included.
* en/header-cleanup:
diff.h: remove unnecessary include of object.h
Remove unnecessary includes of builtin.h
treewide: replace cache.h with more direct headers, where possible
replace-object.h: move read_replace_refs declaration from cache.h to here
object-store.h: move struct object_info from cache.h
dir.h: refactor to no longer need to include cache.h
object.h: stop depending on cache.h; make cache.h depend on object.h
ident.h: move ident-related declarations out of cache.h
pretty.h: move has_non_ascii() declaration from commit.h
cache.h: remove dependence on hex.h; make other files include it explicitly
hex.h: move some hex-related declarations from cache.h
hash.h: move some oid-related declarations from cache.h
alloc.h: move ALLOC_GROW() functions from cache.h
treewide: remove unnecessary cache.h includes in source files
treewide: remove unnecessary cache.h includes
treewide: remove unnecessary git-compat-util.h includes in headers
treewide: ensure one of the appropriate headers is sourced first
|
|
The "diff" drivers specified by the "diff" attribute attached to
paths can now specify which algorithm (e.g. histogram) to use.
* jc/diff-algo-attribute:
diff: teach diff to read algorithm from diff driver
diff: consolidate diff algorithm option parsing
|
|
This allows us to replace includes of cache.h with includes of the much
smaller alloc.h in many places. It does mean that we also need to add
includes of alloc.h in a number of C files.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It can be useful to specify diff algorithms per file type. For example,
one may want to use the minimal diff algorithm for .json files, another
for .c files, etc.
The diff machinery already checks attributes for a diff driver. Teach
the diff driver parser a new type "algorithm" to look for in the
config, which will be used if a driver has been specified through the
attributes.
Enforce precedence of the diff algorithm by favoring the command line
option, then looking at the driver attributes & config combination, then
finally the diff.algorithm config.
To enforce precedence order, use a new `ignore_driver_algorithm` member
during options parsing to indicate the diff algorithm was set via command
line args.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A new kind of class was added in Java 17 -- sealed classes.[1] This
feature includes several new keywords that may appear in a declaration
of a class. New modifiers before name of the class: "sealed" and
"non-sealed", and a clause after name of the class marked by keyword
"permits".
The current set of regular expressions in userdiff.c already allows the
modifier "sealed" and the "permits" clause, but not the modifier
"non-sealed", which is the first hyphenated keyword in Java.[2] Allow
hyphen in the words that precede the name of type to match the
"non-sealed" modifier.
In new input file "java-sealed" for the test t4018-diff-funcname.sh, use
a Java code comment for the marker "RIGHT". This workaround is needed,
because the name of the sealed class appears on the line of code that
has the "ChangeMe" marker.
[1] Detailed description in "JEP 409: Sealed Classes"
https://openjdk.org/jeps/409
[2] "JEP draft: Keyword Management for the Java Language"
https://openjdk.org/jeps/8223002
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A new kind of class was added in Java 16 -- records.[1] The syntax of
records is similar to regular classes with one important distinction:
the name of the record class is followed by a mandatory list of
components. The list is enclosed in parentheses, it may be empty, and
it may immediately follow the name of the class or type parameters, if
any, with or without separating whitespace. For example:
public record Example(int i, String s) {
}
public record WithTypeParameters<A, B>(A a, B b, String s) {
}
record SpaceBeforeComponents (String comp1, int comp2) {
}
Support records in the builtin userdiff pattern for Java. Add "record"
to the alternatives of keywords for kinds of class.
Allowing matching various possibilities for the type parameters and/or
list of the components of a record has already been covered by the
preceding patch.
[1] detailed description is available in "JEP 395: Records"
https://openjdk.org/jeps/395
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A class or interface in Java can have type parameters following the name
in the declared type, surrounded by angle brackets (paired less than and
greater than signs).[2] The type parameters -- `A` and `B` in the
examples -- may follow the class name immediately:
public class ParameterizedClass<A, B> {
}
or may be separated by whitespace:
public class SpaceBeforeTypeParameters <A, B> {
}
A part of the builtin userdiff pattern for Java matches declarations of
classes, enums, and interfaces. The regular expression requires at
least one whitespace character after the name of the declared type.
This disallows matching for opening angle bracket of type parameters
immediately after the name of the type. Mandatory whitespace after the
name of the type also disallows using the pattern in repositories with a
fairly common code style that puts braces for the body of a class on
separate lines:
class WithLineBreakBeforeOpeningBrace
{
}
Support matching Java code in more diverse code styles and declarations
of classes and interfaces with type parameters immediately following the
name of the type in the builtin userdiff pattern for Java. Do so by
just matching anything until the end of the line after the keywords for
the kind of type being declared.
[1] Since Java 5 released in 2004.
[2] Detailed description is available in the Java Language
Specification, sections "Type Variables" and "Parameterized Types":
https://docs.oracle.com/javase/specs/jls/se17/html/jls-4.html#jls-4.4
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The contents of the .gitattributes files may evolve over time, but "git
check-attr" always checks attributes against them in the working tree
and/or in the index. It may be beneficial to optionally allow the users
to check attributes taken from a commit other than HEAD against paths.
Add a new flag `--source` which will allow users to check the
attributes against a commit (actually any tree-ish would do). When the
user uses this flag, we go through the stack of .gitattributes files but
instead of checking the current working tree and/or in the index, we
check the blobs from the provided tree-ish object. This allows the
command to also be used in bare repositories.
Since we use a tree-ish object, the user can pass "--source
HEAD:subdirectory" and all the attributes will be looked up as if
subdirectory was the root directory of the repository.
We cannot simply use the `<rev>:<path>` syntax without the `--source`
flag, similar to how it is used in `git show` because any non-flag
parameter before `--` is treated as an attribute and any parameter after
`--` is treated as a pathname.
The change involves creating a new function `read_attr_from_blob`, which
given the path reads the blob for the path against the provided source and
parses the attributes line by line. This function is plugged into
`read_attr()` function wherein we go through the stack of attributes
files.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
Co-authored-by: toon@iotcl.com
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since f12fa9ee6c (userdiff: add and use for_each_userdiff_driver(),
2021-04-08), lookup of userdiffs is done with a generic
for_each_userdiff_driver(). But the name lookup doesn't use the "type"
field, of course.
We can't get rid of that field from the generic interface because it is
used by t/helper/test-userdiff.c. So mark it as unused in this instance
to silence -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A new built-in userdiff driver for kotlin.
* jd/userdiff-kotlin:
userdiff: add builtin diff driver for kotlin language.
|
|
The xfuncname pattern finds func/class declarations
in diffs to display as a hunk header. The word_regex
pattern finds individual tokens in Kotlin code to generate
appropriate diffs.
This patch adds xfuncname regex and word_regex for Kotlin
language.
Signed-off-by: Jaydeep P Das <jaydeepjd.8914@gmail.com>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change the "struct userdiff_driver" assignmentns to use designated
initializers, but let's keep the PATTERNS() and IPATTERN() convenience
macros to avoid churn, but have them defined in terms of designated
initializers.
For the "driver_true" and "driver_false" let's have the compiler
implicitly initialize most of the fields, but let's leave a redundant
".binary = 0" for "driver_true" to make it obvious that it's the
opposite of the the ".binary = 1" for "driver_false".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The implementation of digit-separating single-quotes introduced a
note-worthy regression: the change of a character literal with a
digit would splice the digit and the closing single-quote. For
example, the change from 'a' to '2' is now tokenized as
'[-a'-]{+2'+} instead of '[-a-]{+2+}'.
The options to fix the regression are:
- Tighten the regular expression such that the single-quote can only
occur between digits (that would match the official syntax).
- Remove support for digit separators.
I chose to remove support, because
- I have not seen a lot of code make use of digit separators.
- If code does use digit separators, then the numbers are typically
long. If a change in one of the segments occurs, it is actually
better visible if only that segment is highlighted as the word
that changed instead of the whole long number.
This choice does introduce another minor regression, though, which
is highlighted in the test case: when a change occurs in the second
or later segment of a hexadecimal number where the segment begins
with a digit, but also has letters, the segment is mistaken as
consisting of a number and an identifier. I can live with that.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since C++20, the language has a generalized comparison operator <=>.
Teach the cpp driver not to separate it into <= and > tokens.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since C++17, the single-quote can be used as digit separator:
3.141'592'654
1'000'000
0xdead'beaf
Make it known to the word regex of the cpp driver, so that numbers are
not split into separate tokens at the single-quotes.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Generally, word regex can be written such that they match tokens
liberally and need not model the actual syntax because it can be assumed
that the regex will only be applied to syntactically correct text.
The regex for cpp (C/C++) is too liberal, though. It regards these
sequences as single tokens:
1+2
1.5-e+2+f
and the following amalgams as one token:
.l as in str.length
.f as in str.find
.e as in str.erase
Tighten the regex in the following way:
- Accept + and - only in one position in the exponent. + and - are no
longer regarded as the sign of a number and are treated by the
catcher-all that is not visible in the driver's regex.
- Accept a leading decimal point only when it is followed by a digit.
For readability, factor hex- and binary numbers into an own term.
As a drive-by, this fixes that floating point numbers such as 12E5
(with upper-case E) were split into two tokens.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Update the userdiff pattern for PHP.
* uk/userdiff-php-enum:
userdiff: support enum keyword in PHP hunk header
|
|
"enum" keyword will be introduced in PHP 8.1.
https://wiki.php.net/rfc/enumerations
Signed-off-by: USAMI Kenta <tadsan@zonu.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The userdiff pattern for "java" language has been updated.
* th/userdiff-more-java:
userdiff: improve java hunk header regex
|
|
Remind developers that the userdiff patterns should be kept simple
and permissive, assuming that the contents they apply are always
syntactically correct.
* jc/userdiff-pattern-hint:
userdiff: comment on the builtin patterns
|
|
Currently, the git diff hunk headers show the wrong method signature if the
method has a qualified return type, an array return type, or a generic return
type because the regex doesn't allow dots (.), [], or < and > in the return
type. Also, type parameter declarations couldn't be matched.
Add several t4018 tests asserting the right hunk headers for different cases:
- enum constant change
- change in generic method with bounded type parameters
- change in generic method with wildcard
- field change in a nested class
Signed-off-by: Tassilo Horn <tsdh@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remind developers that they do not need to go overboard to implement
patterns to prepare for invalid constructs. They only have to be
sufficiently permissive, assuming that the payload is syntactically
correct, and that may allow them to be simpler.
Text stolen mostly from, and further improved by, Johannes Sixt.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Records are added in C# 9
Code example :
public record Person(string FirstName, string LastName);
For more information, see:
* https://docs.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-9
Signed-off-by: Julian Verdurmen <julian.verdurmen@outlook.com>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A bit of code clean-up and a lot of test clean-up around userdiff
area.
* ab/userdiff-tests:
blame tests: simplify userdiff driver test
blame tests: don't rely on t/t4018/ directory
userdiff: remove support for "broken" tests
userdiff tests: list builtin drivers via test-tool
userdiff tests: explicitly test "default" pattern
userdiff: add and use for_each_userdiff_driver()
userdiff style: normalize pascal regex declaration
userdiff style: declare patterns with consistent style
userdiff style: re-order drivers in alphabetical order
|
|
Add a diff driver for Scheme-like languages which recognizes top level
and local `define` forms, whether it is a function definition, binding,
syntax definition or a user-defined `define-xyzzy` form.
Also supports R6RS `library` forms, `module` forms along with class and
struct declarations used in Racket (PLT Scheme).
Alternate "def" syntax such as those in Gerbil Scheme are also
supported, like defstruct, defsyntax and so on.
The rationale for picking `define` forms for the hunk headers is because
it is usually the only significant form for defining the structure of
the program, and it is a common pattern for schemers to have local
function definitions to hide their visibility, so it is not only the top
level `define`'s that are of interest. Schemers also extend the language
with macros to provide their own define forms (for example, something
like a `define-test-suite`) which is also captured in the hunk header.
Since it is common practice to extend syntax with variants of a form
like `module+`, `class*` etc, those have been supported as well.
The word regex is a best-effort attempt to conform to R7RS[1] valid
identifiers, symbols and numbers.
[1] https://small.r7rs.org/attachment/r7rs.pdf (section 2.1)
Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Refactor the userdiff_find_by_namelen() function so that a new
for_each_userdiff_driver() API function does most of the work.
This will be useful for the same reason we've got other for_each_*()
API functions as part of various APIs, and will be used in a follow-up
commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Declare the pascal pattern consistently with how we declare the
others, not having "\n" on one line by itself, but as part of the
pattern, and when there are alterations have the "|" at the start, not
end of the line.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change those patterns which were declared with a regex on the same
line as the "PATTERNS()" line to put that regex on the next line, and
add missing "/* -- */" separator comments between the pattern and
word_regex.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Address some old code smell and move around the built-in userdiff
drivers so they're both in alphabetical order, and now in the same
order they appear in the gitattributes(5) documentation.
The two started drifting in be58e70dba (diff: unify external diff and
funcname parsing code, 2008-10-05), and then even further in
80c49c3de2 (color-words: make regex configurable via attributes,
2009-01-17) when the "cpp" pattern was added.
There are no functional changes here, and as --color-moved will show
only moved existing lines.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The userdiff pattern learned to identify the function definition in
POSIX shells and bash.
* ve/userdiff-bash:
userdiff: support Bash
|
|
Userdiff for CSS update.
* sd/userdiff-css-update:
userdiff: expand detected chunk headers for css
|
|
Userdiff for Rust update.
* kb/userdiff-rust-macro-rules:
userdiff: recognize 'macro_rules!' as starting a Rust function block
|
|
Support POSIX, bashism and mixed function declarations, all four
compound command types, trailing comments and mixed whitespace.
Even though Bash allows locale-dependent characters in function names
<https://unix.stackexchange.com/a/245336/3645>, only detect function
names with characters allowed by POSIX.1-2017
<https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_235>
for simplicity. This should cover the vast majority of use cases, and
produces system-agnostic results.
Since a word pattern has to be specified, but there is no easy way to
know the default word pattern, use the default `IFS` characters for a
starter. A later patch can improve this.
Signed-off-by: Victor Engmark <victor@engmark.name>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The regex used for the CSS builtin diff driver in git is only
able to show chunk headers for lines that start with a number,
a letter or an underscore.
However, the regex fails to detect classes (starts with a .), ids
(starts with a #), :root and attribute-value based selectors (for
example [class*="col-"]), as well as @based block-level statements
like @page,@keyframes and @media since all of them, start with a
special character.
Allow the selectors and block level statements to begin with these
special characters.
Signed-off-by: Sohom Datta <sohom.datta@learner.manipal.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Konrad Borowski <konrad@borowski.pw>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
PHP permits functions to be defined like
final public function foo() { }
abstract protected function bar() { }
but our hunk header pattern does not recognize these decorations.
Add "final" and "abstract" to the list of function modifiers.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Javier Spagnoletti <phansys@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The third part of the Fortran xfuncname regex wants to match the
beginning of a subroutine or function, so it allows for all characters
except `'`, `"` or whitespace before the keyword 'function' or
'subroutine'. This is meant to match the 'recursive', 'elemental' or
'pure' keywords, as well as function return types, and to prevent
matches inside strings.
However, the negated set does not contain the `!` comment character,
so a line with an end-of-line comment containing the keyword 'function' or
'subroutine' followed by another word is mistakenly chosen as a hunk header.
Improve the regex by adding `!` to the negated set.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The Fortran userdiff patterns, introduced in 909a5494f8 (userdiff.c: add
builtin fortran regex patterns, 2010-09-10), predate the test
infrastructure for xfuncname patterns, introduced in bfa7d01413 (t4018:
an infrastructure to test hunk headers, 2014-03-21).
Add tests for the Fortran xfuncname patterns. The test
't/t4018/fortran-comment-keyword' documents a shortcoming of the regex
that is fixed in a subsequent commit.
While at it, add descriptive comments for the different parts of the
regex.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The userdiff patterns for Markdown documents have been added.
* ah/userdiff-markdown:
userdiff: support Markdown
|
|
It's typical to find Markdown documentation alongside source code, and
having better context for documentation changes is useful; see also
commit 69f9c87d4 (userdiff: add support for Fountain documents,
2015-07-21).
The pattern is based on the CommonMark specification 0.29, section 4.2
<https://spec.commonmark.org/> but doesn't match empty headings, as
seeing them in a hunk header is unlikely to be useful.
Only ATX headings are supported, as detecting setext headings would
require printing the line before a pattern matches, or matching a
multiline pattern. The word-diff pattern is the same as the pattern for
HTML, because many Markdown parsers accept inline HTML.
Signed-off-by: Ash Holland <ash@sorrel.sh>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We return the length to a subset of a string using an "int *"
out-parameter. This is fine most of the time, as we'd expect config keys
to be relatively short, but it could behave oddly if we had a gigantic
config key. A more appropriate type is size_t.
Let's switch over, which lets our callers use size_t as appropriate
(they are bound by our type because they must pass the out-parameter as
a pointer). This is mostly just a cleanup to make it clear this code
handles long strings correctly. In practice, our config parser already
chokes on long key names (because of a similar int/size_t mixup!).
When doing an int/size_t conversion, we have to be careful that nobody
was trying to assign a negative value to the variable. I manually
confirmed that for each case here. They tend to just feed the result to
xmemdupz() or similar; in a few cases I adjusted the parameter types for
helper functions to make sure the size_t is preserved.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Hotfix.
* ln/userdiff-elixir:
userdiff: remove empty subexpression from elixir regex
|
|
The regex failed to compile on FreeBSD.
Also add /* -- */ mark to separate the two regex entries given to
the PATTERNS() macro, to make it consistent with patterns for other
content types.
Signed-off-by: Ed Maste <emaste@FreeBSD.org>
Reviewed-by: Jeff King <peff@peff.net>
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The userdiff machinery has been taught that "async def" is another
way to begin a "function" in Python.
* jh/userdiff-python-async:
userdiff: support Python async functions
|
|
Python's async functions (declared with "async def" rather than "def")
were not being displayed in hunk headers. This commit teaches git about
the async function syntax, and adds tests for the Python userdiff regex.
Signed-off-by: Josh Holland <anowlcalledjosh@gmail.com>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Adds support for xfuncref in Elixir[1] language which is Ruby-like
language that runs on Erlang[3] Virtual Machine (BEAM).
[1]: https://elixir-lang.org
[2]: https://www.erlang.org
Signed-off-by: Łukasz Niemier <lukasz@niemier.pl>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
While reviewing some dts diffs recently I noticed that the hunk header
logic was failing to find the containing node. This is because the regex
doesn't consider properties that may span multiple lines, i.e.
property = <something>,
<something_else>;
and it got hung up on comments inside nodes that look like the root node
because they start with '/*'. Add tests for these cases and update the
regex to find them. Maybe detecting the root node is too complicated but
forcing it to be a backslash with any amount of whitespace up to an open
bracket seemed OK. I tried to detect that a comment is in-between the
two parts but I wasn't happy so I just dropped it.
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The Linux kernel receives many patches to the devicetree files each
release. The hunk header for those patches typically show nothing,
making it difficult to figure out what node is being modified without
applying the patch or opening the file and seeking to the context. Let's
add a builtin 'dts' pattern to git so that users can get better diff
output on dts files when they use the diff=dts driver.
The regex has been constructed based on the spec at devicetree.org[1]
and with some help from Johannes Sixt.
[1] https://github.com/devicetree-org/devicetree-specification/releases/latest
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The pattern "git diff/grep" use to extract funcname and words
boundary for Rust has been added.
* ml/userdiff-rust:
userdiff: two simplifications of patterns for rust
userdiff: add built-in pattern for rust
|
|
- Do not enforce (but assume) syntactic correctness of language
constructs that go into hunk headers: we only want to ensure that
the keywords actually are words and not just the initial part of
some identifier.
- In the word regex, match numbers only when they begin with a digit,
but then be liberal in what follows, assuming that the text that is
matched is syntactially correct.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Boxuan Li <liboxuan@connect.hku.hk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Octave pattern is almost the same as matlab, except
that '%%%' and '##' can also be used to begin code sections,
in addition to '%%' that is understood by both. Octave
pattern is merged into Matlab pattern. Test cases for
the hunk header patterns of matlab and octave under
t/t4018 are added.
Signed-off-by: Boxuan Li <liboxuan@connect.hku.hk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This adds xfuncname and word_regex patterns for Rust, a quite
popular programming language. It also includes test cases for the
xfuncname regex (t4018) and updated documentation.
The word_regex pattern finds identifiers, integers, floats and
operators, according to the Rust Reference Book.
Cc: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up.
* nd/style-opening-brace:
style: the opening '{' of a function is in a separate line
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Various codepaths in the core-ish part learn to work on an
arbitrary in-core index structure, not necessarily the default
instance "the_index".
* nd/the-index: (23 commits)
revision.c: reduce implicit dependency the_repository
revision.c: remove implicit dependency on the_index
ws.c: remove implicit dependency on the_index
tree-diff.c: remove implicit dependency on the_index
submodule.c: remove implicit dependency on the_index
line-range.c: remove implicit dependency on the_index
userdiff.c: remove implicit dependency on the_index
rerere.c: remove implicit dependency on the_index
sha1-file.c: remove implicit dependency on the_index
patch-ids.c: remove implicit dependency on the_index
merge.c: remove implicit dependency on the_index
merge-blobs.c: remove implicit dependency on the_index
ll-merge.c: remove implicit dependency on the_index
diff-lib.c: remove implicit dependency on the_index
read-cache.c: remove implicit dependency on the_index
diff.c: remove implicit dependency on the_index
grep.c: remove implicit dependency on the_index
diff.c: remove the_index dependency in textconv() functions
blame.c: rename "repo" argument to "r"
combine-diff.c: remove implicit dependency on the_index
...
|
|
[jc: squashed in missing forward decl in userdiff.h found by Ramsay]
Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
git_check_attr() returns always 0.
Remove all the error handling code of the callers, which is never executed.
Change git_check_attr() to be a void function.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Make the attr API take an index_state instead of assuming the_index in
attr code. All call sites are converted blindly to keep the patch
simple and retain current behavior. Individual call sites may receive
further updates to use the right index instead of the_index.
There is one ugly temporary workaround added in attr.c that needs some
more explanation.
Commit c24f3abace (apply: file commited with CRLF should roundtrip
diff and apply - 2017-08-19) forces one convert_to_git() call to NOT
read the index at all. But what do you know, we read it anyway by
falling back to the_index. When "istate" from convert_to_git is now
propagated down to read_attr_from_array() we will hit segfault
somewhere inside read_blob_data_from_index.
The right way of dealing with this is to kill "use_index" variable and
only follow "istate" but at this stage we are not ready for that:
while most git_attr_set_direction() calls just passes the_index to be
assigned to use_index, unpack-trees passes a different one which is
used by entry.c code, which has no way to know what index to use if we
delete use_index. So this has to be done later.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Recent version of PHP supports interface, trait, abstract class and
final class. This patch fixes the PHP hunk header regexp to support
all of these keywords.
Signed-off-by: Kana Natsuno <dev@whileimautomaton.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Update funcname pattern used for C# to recognize "async" keyword.
* tl/userdiff-csharp-async:
userdiff.c: add C# async keyword in diff pattern
|
|
Currently C# async methods are not shown in diff hunk headers. I just
added the async keyword to the csharp method pattern so that they are
properly detected.
Signed-off-by: Thomas Levesque <thomas.levesque@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This adds xfuncname and word_regex patterns for golang, a quite
popular programming language. It also includes test cases for the
xfuncname regex (t4018) and updated documentation.
The xfuncname regex finds functions, structs and interfaces. Although
the Go language prohibits the opening brace from being on its own
line, the regex does not makes it mandatory, to be able to match
`func` statements like this:
func foo(bar int,
baz int) {
}
This is covered by the test case t4018/golang-long-func.
The word_regex pattern finds identifiers, integers, floats, complex
numbers and operators, according to the go specification.
Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The built-in pattern to detect the "function header" for HTML did
not match <H1>..<H6> elements without any attributes, which has
been fixed.
* ik/userdiff-html-h-element-fix:
userdiff: fix HTML hunk header regexp
|
|
Current HTML header regexp doesn't match headers without attributes.
So it fails to match <h1>...</h1>, while <h1 class="smth">...</h1> matches.
Make attributes optional to fix this. The regexp is still far from
perfect, but now it at least handles the common case.
Signed-off-by: Ilya Kantor <iliakan@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Stop including config.h by default in cache.h. Instead only include
config.h in those files which require use of the config system.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The remaining callers are all simple "I have N attributes I am
interested in. I'll ask about them with various paths one by one".
After this step, no caller to git_check_attrs() remains. After
removing it, we can extend "struct attr_check" struct with data
that can be used in optimizing the query for the specific N
attributes it contains.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The traditional API to check attributes is to prepare an N-element
array of "struct git_attr_check" and pass N and the array to the
function "git_check_attr()" as arguments.
In preparation to revamp the API to pass a single structure, in
which these N elements are held, rename the type used for these
individual array elements to "struct attr_check_item" and rename
the function to "git_check_attrs()".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
CSS is widely used, motivating it being included as a built-in pattern.
It must be noted that the word_regex for CSS (i.e. the regex defining
what is a word in the language) does not consider '.' and '#' characters
(in CSS selectors) to be part of the word. This behavior is documented
by the test t/t4018/css-rule.
The logic behind this behavior is the following: identifiers in CSS
selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#'
character are not part of the identifier, but an indicator of the nature
of the identifier in HTML/XML (class or id). Diffing ".class1" and
".class2" must show that the class name is changed, but we still are
selecting a class.
Logic behind the "pattern" regex is:
1. reject lines ending with a colon/semicolon (properties)
2. if a line begins with a name in column 1, pick the whole line
Credits to Johannes Sixt (j6t@kdbg.org) for the pattern regex and most
of the tests.
Signed-off-by: William Duclot <william.duclot@ensimag.grenoble-inp.fr>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add support for Fountain, a plain text screenplay format. Git
facilitates not just programming specifically, but creative writing
in general, so it makes sense to also support other plain text
documents besides source code.
In the structure of a screenplay specifically, scenes are roughly
analogous to functions, in the sense that it makes your job easier
if you can see which ones were changed in a given range of patches.
More information about the Fountain format can be found on its
official website, at http://fountain.io .
Signed-off-by: Zoë Blade <zoe@bytenoise.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A few files include the same header file directly more than once.
As all these headers protect themselves against repeated inclusion
by the "#ifndef FOO_H / #define FOO_H / ... / #endif" idiom, leave
only the first inclusion and remove the later inclusion as a no-op
clean-up.
Signed-off-by: Дилян Палаузов <git-dpa@aegee.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The hunk header pattern 'cpp' is intended for C and C++ source code, but
it is actually not particularly useful for the latter, and even misses
some use-cases for the former.
The parts of the pattern have the following flaws:
- The first part matches an identifier followed immediately by a colon
and arbitrary text and is intended to reject goto labels and C++
access specifiers (public, private, protected). But this pattern also
rejects C++ constructs, which look like this:
MyClass::MyClass()
MyClass::~MyClass()
MyClass::Item MyClass::Find(...
- The second part matches an identifier followed by a list of qualified
names (i.e. identifiers separated by the C++ scope operator '::')
separated by space or '*' followed by an opening parenthesis (with
space between the tokens). It matches function declarations like
struct item* get_head(...
int Outer::Inner::Func(...
Since the pattern requires at least two identifiers, GNU-style
function definitions are ignored:
void
func(...
Moreover, since the pattern does not allow punctuation other than '*',
the following C++ constructs are not recognized:
. template definitions:
template<class T> int func(T arg)
. functions returning references:
const string& get_message()
. functions returning templated types:
vector<int> foo()
. operator definitions:
Value operator+(Value l, Value r)
- The third part of the pattern finally matches compound definitions.
But it forgets about unions and namespaces, and also skips single-line
definitions
struct random_iterator_tag {};
because no semicolon can occur on the line.
Change the first pattern to require a colon at the end of the line
(except for trailing space and comments), so that it does not reject
constructor or destructor definitions.
Notice that all interesting anchor points begin with an identifier or
keyword. But since there is a large variety of syntactical constructs
after the first "word", the simplest is to require only this word and
accept everything else. Therefore, this boils down to a line that begins
with a letter or underscore (optionally preceded by the C++ scope
operator '::' to accept functions returning a type anchored at the
global namespace). Replace the second and third part by a single pattern
that picks such a line.
This has the following desirable consequence:
- All constructs mentioned above are recognized.
and the following likely desirable consequences:
- Definitions of global variables and typedefs are recognized:
int num_entries = 0;
extern const char* help_text;
typedef basic_string<wchar_t> wstring;
- Commonly used marco-ized boilerplate code is recognized:
BEGIN_MESSAGE_MAP(CCanvas,CWnd)
Q_DECLARE_METATYPE(MyStruct)
PATTERNS("tex",...)
(The last one is from this very patch.)
but also the following possibly undesirable consequence:
- When a label is not on a line by itself (except for a comment) it is
no longer rejected, but can appear as a hunk header if it occurs at
the beginning of a line:
next:;
IMO, the benefits of the change outweigh the (possible) regressions by a
large margin.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Do not split constants such as 123U, 456ll, 789UL at the first U or
second L.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The character sequences ->* and .* are valid C++ operators. Keep them
together in --word-diff mode.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
- Allow extra space in "is new" and "is separate"
- Fix bug in word regex for numbers
Signed-off-by: Adrian Johnson <ajohnson@redneon.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When we parse userdiff config, we generally assume that
diff.name.key
will affect the "key" value of the "name" driver. However,
without confirming that the key is a valid userdiff key, we
may accidentally conflict with the ancient "diff.color.*"
namespace. The current code is careful not to even create a
driver struct if we do not see a key that is known by the
diff-driver code.
However, this carefulness is unnecessary; the default driver
with no keys set behaves exactly the same as having no
driver at all. We can simply set up the driver struct as
soon as we see we have a config key that looks like a
driver. This makes the code a bit more readable.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
These callers can drop some inline pointer arithmetic and
magic offset constants, making them more readable and less
error-prone (those constants had to match the lengths of
strings, but there is no automatic verification of that
fact).
The "ep" pointer (presumably for "end pointer"), which
points to the final key segment of the config variable, is
given the more standard name "key" to describe its function
rather than its derivation.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add Ada xfuncname and wordRegex patterns to the list of builtin
patterns.
Signed-off-by: Adrian Johnson <ajohnson@redneon.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When the userdiff_config function was introduced in be58e70
(diff: unify external diff and funcname parsing code,
2008-10-05), it used a return value convention unlike any
other config callback. Like other callbacks, it used "-1" to
signal error. But it returned "1" to indicate that it found
something, and "0" otherwise; other callbacks simply
returned "0" to indicate that no error occurred.
This distinction was necessary at the time, because the
userdiff namespace overlapped slightly with the color
configuration namespace. So "diff.color.foo" could mean "the
'foo' slot of diff coloring" or "the 'foo' component of the
"color" userdiff driver". Because the color-parsing code
would die on an unknown color slot, we needed the userdiff
code to indicate that it had matched the variable, letting
us bypass the color-parsing code entirely.
Later, in 8b8e862 (ignore unknown color configuration,
2009-12-12), the color-parsing code learned to silently
ignore unknown slots. This means we no longer need to
protect userdiff-matched variables from reaching the
color-parsing code.
We can therefore change the userdiff_config calling
convention to a more normal one. This drops some code from
each caller, which is nice. But more importantly, it reduces
the cognitive load for readers who may wonder why
userdiff_config is unlike every other config callback.
There's no need to add a new test confirming that this
works; t4020 already contains a test that sets
diff.color.external.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* tr/userdiff-c-returns-pointer:
userdiff: allow * between cpp funcname words
|
|
The cpp pattern, used for C and C++, would not match the start of a
declaration such as
static char *prepare_index(int argc,
because it did not allow for * anywhere between the various words that
constitute the modifiers, type and function name. Fix it.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
MATLAB is often used in industry and academia for scientific
computations motivating it being included as a built-in pattern.
Signed-off-by: Gustaf Hendeby <hendeby@isy.liu.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Suggested by: Junio Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jk/combine-diff-binary-etc:
combine-diff: respect textconv attributes
refactor get_textconv to not require diff_filespec
combine-diff: handle binary files as binary
combine-diff: calculate mode_differs earlier
combine-diff: split header printing into its own function
|
|
This function actually does two things:
1. Load the userdiff driver for the filespec.
2. Decide whether the driver has a textconv component, and
initialize the textconv cache if applicable.
Only part (1) requires the filespec object, and some callers
may not have a filespec at all. So let's split them it into
two functions, and put part (2) with the userdiff code,
which is a better fit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A naive method of treating BEGIN/END blocks with a brace on the second
line as diff/grep funcname context involves also matching unrelated
lines that consist of all-caps letters:
sub foo {
print <<'EOF'
text goes here
...
EOF
... rest of foo ...
}
That's not so great, because it means that "git diff" and "git grep
--show-function" would write "=EOF" or "@@ EOF" as context instead of
a more useful reminder like "@@ sub foo {".
To avoid this, tighten the pattern to only match the special block
names that perl accepts (namely BEGIN, END, INIT, CHECK, UNITCHECK,
AUTOLOAD, and DESTROY). The list is taken from perl's toke.c.
Suggested-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Accept
sub foo
{
}
as an alternative to a more common style that introduces perl
functions with a brace on the first line (and likewise for BEGIN/END
blocks). The new regex is a little hairy to avoid matching
# forward declaration
sub foo;
while continuing to match "sub foo($;@) {" and
sub foo { # This routine is interesting;
# in fact, the lines below explain how...
While at it, pay attention to Perl 5.14's "package foo {" syntax as an
alternative to the traditional "package foo;".
Requested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The builtin perl userdiff driver is not greedy enough about catching
POD header lines. Capture the whole line, so instead of just
declaring that we are in some "@@ =head1" section, diff/grep output
can explain that the enclosing section is about "@@ =head1 OPTIONS".
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The userdiff funcname mechanism has no concept of nested scopes ---
instead, "git diff" and "git grep --show-function" simply label the
diff header with the most recent matching line. Unfortunately that
means text following a subroutine in a POD section:
=head1 DESCRIPTION
You might use this facility like so:
sub example {
foo;
}
Now, having said that, let's say more about the facility.
Blah blah blah ... etc etc.
gets the subroutine name instead of the POD header in its diff/grep
funcname header, making it harder to get oriented when reading a
diff without enough context.
The fix is simple: anchor the funcname syntax to the left margin so
nested subroutines and packages like this won't get picked up. (The
builtin C++ funcname pattern already does the same thing.) This means
the userdiff driver will misparse the idiom
{
my $static;
sub foo {
... use $static ...
}
}
but I think that's worth it; we can revisit this later if the userdiff
mechanism learns to keep track of the beginning and end of nested
scopes.
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* tr/diff-words-test:
t4034 (diff --word-diff): add a minimum Perl drier test vector
t4034 (diff --word-diff): style suggestions
userdiff: simplify word-diff safeguard
t4034: bulk verify builtin word regex sanity
|
|
* as/userdiff-pascal:
userdiff: match Pascal class methods
|
|
git's diff-words support has a detail that can be a little dangerous:
any text not matched by a given language's tokenization pattern is
treated as whitespace and changes in such text would go unnoticed.
Therefore each of the built-in regexes allows a special token type
consisting of a single non-whitespace character [^[:space:]].
To make sure UTF-8 sequences remain human readable, the builtin
regexes also have a special token type for runs of bytes with the high
bit set. In English, non-ASCII characters are usually isolated so
this is analogous to the [^[:space:]] pattern, except it matches a
single _multibyte_ character despite use of the C locale.
Unfortunately it is easy to make typos or forget entirely to include
these catch-all token types when adding support for new languages (see
v1.7.3.5~16, userdiff: fix typo in ruby and python word regexes,
2010-12-18). Avoid this by including them automatically within the
PATTERNS and IPATTERN macros.
While at it, change the UTF-8 sequence token type to match exactly one
non-ASCII multi-byte character, rather than an arbitrary run of them.
Suggested-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Class declarations were already covered by the second pattern, but class
methods have the 'class' keyword in front too. Account for it.
Signed-off-by: Alexey Shumkin <zapped@mail.ru>
Acked-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The default function name discovery already works quite well for Perl
code... with the exception of here-documents (or rather their ending).
sub foo {
print <<END
here-document
END
return 1;
}
The default funcname pattern treats the unindented END line as a
function declaration and puts it in the @@ line of diff and "grep
--show-function" output.
With a little knowledge of perl syntax, we can do better. You can
try it out by adding "*.perl diff=perl" to the gitattributes file.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Both had an unclosed ] that ruined the safeguard against not matching
a non-space char.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This adds fortran xfuncname and wordRegex patterns to the list of builtin
patterns. The intention is for the patterns to be appropriate for all
versions of fortran including 77, 90, 95. The patterns can be enabled by
adding the diff=fortran attribute to the .gitattributes file for the
desired file glob.
This also adds a new macro named IPATTERN which is just like the PATTERNS
macro except it sets the REG_ICASE flag so that case will be ignored.
The test code in t4018 and the docs were updated as appropriate.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add userdiff patterns for C#. This code is an improved version of
code by Adam Petaccia from 21 June 2009 mail to the list.
Signed-off-by: Petr Onderka <gsvick@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* bs/userdiff-php:
diff: Support visibility modifiers in the PHP hunk header regexp
|
|
Starting with PHP5, class methods can have a visibility modifier, which
caused the methods not to be matched by the existing regexp, so extend
the regexp to match those modifiers. And while we're at it, allow the
"static" modifier as well.
Since the "static" modifier can appear either before or after the
visibility modifier, let's just allow any number of modifiers to appear
in any order, as that simplifies the regexp and shouldn't cause any
false positives.
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Running a textconv filter can take a long time. It's
particularly bad for a large file which needs to be spooled
to disk, but even for small files, the fork+exec overhead
can add up for something like "git log -p".
This patch uses the notes-cache mechanism to keep a fast
cache of textconv output. Caches are stored in
refs/notes/textconv/$x, where $x is the userdiff driver
defined in gitattributes.
Caching is enabled only if diff.$x.cachetextconv is true.
In my test repo, on a commit with 45 jpg and avi files
changed and a textconv to show their exif tags:
[before]
$ time git show >/dev/null
real 0m13.724s
user 0m12.057s
sys 0m1.624s
[after, first run]
$ git config diff.mfo.cachetextconv true
$ time git show >/dev/null
real 0m14.252s
user 0m12.197s
sys 0m1.800s
[after, subsequent runs]
$ time git show >/dev/null
real 0m0.352s
user 0m0.148s
sys 0m0.200s
So for a slight (3.8%) cost on the first run, we achieve an
almost 40x speed up on subsequent runs.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The function took (name, namelen) as its arguments, but all the public
callers wanted to pass a full string.
Demote the counted-string interface to an internal API status, and allow
public callers to just pass the string to the function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the old regex
^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\([^;]*)$
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
you can backtrack arbitrarily from [A-Za-z_0-9]* into [A-Za-z_], thus
causing an exponential number of backtracks. Ironically it also causes
the regex not to work as intended; for example "catch" can match the
underlined part of the regex, the first repetition matching "c" and
the second matching "atch".
The replacement regex avoids this problem, because it makes sure that
at least a space/tab is eaten on each repetition. In other words,
a suffix of a repetition can never be a prefix of the next repetition.
Signed-off-by: Paolo Bonzini <bonzini@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Use "wordRegex" for configuration variable names. Use "word_regex" for C
language tokens.
Signed-off-by: Boyd Stephen Smith Jr. <bss@iguanasuicide.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Make the --color-words splitting regular expression configurable via
the diff driver's 'wordregex' attribute. The user can then set the
driver on a file in .gitattributes. If a regex is given on the
command line, it overrides the driver's setting.
We also provide built-in regexes for the languages that already had
funcname patterns, and add an appropriate diff driver entry for C/++.
(The patterns are designed to run UTF-8 sequences into a single chunk
to make sure they remain readable.)
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Diffs that have been produced with textconv almost certainly
cannot be applied, so we want to be careful not to generate
them in things like format-patch.
This introduces a new diff options, ALLOW_TEXTCONV, which
controls this behavior. It is off by default, but is
explicitly turned on for the "log" family of commands, as
well as the "diff" porcelain (but not diff-* plumbing).
Because both text conversion and external diffing are
controlled by these diff options, we can get rid of the
"plumbing versus porcelain" distinction when reading the
config. This was an attempt to control the same thing, but
suffered from being too coarse-grained.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When diffing binary files, it is sometimes nice to see the
differences of a canonical text form rather than either a
binary patch or simply "binary files differ."
Until now, the only option for doing this was to define an
external diff command to perform the diff. This was a lot of
work, since the external command needed to take care of
doing the diff itself (including mode changes), and lost the
benefit of git's colorization and other options.
This patch adds a text conversion option, which converts a
file to its canonical format before performing the diff.
This is less flexible than an arbitrary external diff, but
is much less work to set up. For example:
$ echo '*.jpg diff=exif' >>.gitattributes
$ git config diff.exif.textconv exiftool
$ git config diff.exif.binary false
allows one to see jpg diffs represented by the text output
of exiftool.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
The "diff" gitattribute is somewhat overloaded right now. It
can say one of three things:
1. this file is definitely binary, or definitely not
(i.e., diff or !diff)
2. this file should use an external diff engine (i.e.,
diff=foo, diff.foo.command = custom-script)
3. this file should use particular funcname patterns
(i.e., diff=foo, diff.foo.(x?)funcname = some-regex)
Most of the time, there is no conflict between these uses,
since using one implies that the other is irrelevant (e.g.,
an external diff engine will decide for itself whether the
file is binary).
However, there is at least one conflicting situation: there
is no way to say "use the regular rules to determine whether
this file is binary, but if we do diff it textually, use
this funcname pattern." That is, currently setting diff=foo
indicates that the file is definitely text.
This patch introduces a "binary" config option for a diff
driver, so that one can explicitly set diff.foo.binary. We
default this value to "don't know". That is, setting a diff
attribute to "foo" and using "diff.foo.funcname" will have
no effect on the binaryness of a file. To get the current
behavior, one can set diff.foo.binary to true.
This patch also has one additional advantage: it cleans up
the interface to the userdiff code a bit. Before, calling
code had to know more about whether attributes were false,
true, or unset to determine binaryness. Now that binaryness
is a property of a driver, we can represent these situations
just by passing back a driver struct.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
Both sets of code assume that one specifies a diff profile
as a gitattribute via the "diff=foo" attribute. They then
pull information about that profile from the config as
diff.foo.*.
The code for each is currently completely separate from the
other, which has several disadvantages:
- there is duplication as we maintain code to create and
search the separate lists of external drivers and
funcname patterns
- it is difficult to add new profile options, since it is
unclear where they should go
- the code is difficult to follow, as we rely on the
"check if this file is binary" code to find the funcname
pattern as a side effect. This is the first step in
refactoring the binary-checking code.
This patch factors out these diff profiles into "userdiff"
drivers. A file with "diff=foo" uses the "foo" driver, which
is specified by a single struct.
Note that one major difference between the two pieces of
code is that the funcname patterns are always loaded,
whereas external drivers are loaded only for the "git diff"
porcelain; the new code takes care to retain that situation.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|