| Age | Commit message (Collapse) | Author | Files | Lines |
|
A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help
transition the codebase to rely less on the availability of the
singleton the_repository instance.
* ps/use-the-repository:
hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE`
t/helper: remove dependency on `the_repository` in "proc-receive"
t/helper: fix segfault in "oid-array" command without repository
t/helper: use correct object hash in partial-clone helper
compat/fsmonitor: fix socket path in networked SHA256 repos
replace-object: use hash algorithm from passed-in repository
protocol-caps: use hash algorithm from passed-in repository
oidset: pass hash algorithm when parsing file
http-fetch: don't crash when parsing packfile without a repo
hash-ll: merge with "hash.h"
refs: avoid include cycle with "repository.h"
global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
hash: require hash algorithm in `empty_tree_oid_hex()`
hash: require hash algorithm in `is_empty_{blob,tree}_oid()`
hash: make `is_null_oid()` independent of `the_repository`
hash: convert `oidcmp()` and `oideq()` to compare whole hash
global: ensure that object IDs are always padded
hash: require hash algorithm in `oidread()` and `oidclr()`
hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`
hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
|
|
Memory sanitizer (msan) is detecting a use of an uninitialized variable
(`size`) in `read_attr_from_index`:
==2268==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x5651f3416504 in read_attr_from_index git/attr.c:868:11
#1 0x5651f3415530 in read_attr git/attr.c
#2 0x5651f3413d74 in bootstrap_attr_stack git/attr.c:968:6
#3 0x5651f3413d74 in prepare_attr_stack git/attr.c:1004:2
#4 0x5651f3413d74 in collect_some_attrs git/attr.c:1199:2
#5 0x5651f3413144 in git_check_attr git/attr.c:1345:2
#6 0x5651f34728da in convert_attrs git/convert.c:1320:2
#7 0x5651f3473425 in would_convert_to_git_filter_fd git/convert.c:1373:2
#8 0x5651f357a35e in index_fd git/object-file.c:2630:34
#9 0x5651f357aa15 in index_path git/object-file.c:2657:7
#10 0x5651f35db9d9 in add_to_index git/read-cache.c:766:7
#11 0x5651f35dc170 in add_file_to_index git/read-cache.c:799:9
#12 0x5651f321f9b2 in add_files git/builtin/add.c:346:7
#13 0x5651f321f9b2 in cmd_add git/builtin/add.c:565:18
#14 0x5651f321d327 in run_builtin git/git.c:474:11
#15 0x5651f321bc9e in handle_builtin git/git.c:729:3
#16 0x5651f321a792 in run_argv git/git.c:793:4
#17 0x5651f321a792 in cmd_main git/git.c:928:19
#18 0x5651f33dde1f in main git/common-main.c:62:11
The issue exists because `size` is an output parameter from
`read_blob_data_from_index`, but it's only modified if
`read_blob_data_from_index` returns non-NULL. The read of `size` when
calling `read_attr_from_buf` unconditionally may read from an
uninitialized value. `read_attr_from_buf` checks that `buf` is non-NULL
before reading from `size`, but by then it's already too late: the
uninitialized read will have happened already. Furthermore, there's no
guarantee that the compiler won't reorder things so that it checks
`size` before checking `!buf`.
Make the call to `read_attr_from_buf` conditional on `buf` being
non-NULL, ensuring that `size` is not read if it's never set.
Signed-off-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.
It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.
Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit
For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).
Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Leakfixes.
* ps/leakfixes:
builtin/mv: fix leaks for submodule gitfile paths
builtin/mv: refactor to use `struct strvec`
builtin/mv duplicate string list memory
builtin/mv: refactor `add_slash()` to always return allocated strings
strvec: add functions to replace and remove strings
submodule: fix leaking memory for submodule entries
commit-reach: fix memory leak in `ahead_behind()`
builtin/credential: clear credential before exit
config: plug various memory leaks
config: clarify memory ownership in `git_config_string()`
builtin/log: stop using globals for format config
builtin/log: stop using globals for log config
convert: refactor code to clarify ownership of check_roundtrip_encoding
diff: refactor code to clarify memory ownership of prefixes
config: clarify memory ownership in `git_config_pathname()`
http: refactor code to clarify memory ownership
checkout: clarify memory ownership in `unique_tracking_name()`
strbuf: fix leak when `appendwholeline()` fails with EOF
transport-helper: fix leaking helper name
|
|
Further clean-up the refs subsystem to stop relying on
the_repository, and instead use the repository associated to the
ref_store object.
* ps/refs-without-the-repository-updates:
refs/packed: remove references to `the_hash_algo`
refs/files: remove references to `the_hash_algo`
refs/files: use correct repository
refs: remove `dwim_log()`
refs: drop `git_default_branch_name()`
refs: pass repo when peeling objects
refs: move object peeling into "object.c"
refs: pass ref store when detecting dangling symrefs
refs: convert iteration over replace refs to accept ref store
refs: retrieve worktree ref stores via associated repository
refs: refactor `resolve_gitlink_ref()` to accept a repository
refs: pass repo when retrieving submodule ref store
refs: track ref stores via strmap
refs: implement releasing ref storages
refs: rename `init_db` callback to avoid confusion
refs: adjust names for `init` and `init_db` callbacks
|
|
Before discovering the repository details, We used to assume SHA-1
as the "default" hash function, which has been corrected. Hopefully
this will smoke out codepaths that rely on such an unwarranted
assumptions.
* ps/undecided-is-not-necessarily-sha1:
repository: stop setting SHA1 as the default object hash
oss-fuzz/commit-graph: set up hash algorithm
builtin/shortlog: don't set up revisions without repo
builtin/diff: explicitly set hash algo when there is no repo
builtin/bundle: abort "verify" early when there is no repository
builtin/blame: don't access potentially unitialized `the_hash_algo`
builtin/rev-parse: allow shortening to more than 40 hex characters
remote-curl: fix parsing of detached SHA256 heads
attr: fix BUG() when parsing attrs outside of repo
attr: don't recompute default attribute source
parse-options-cb: only abbreviate hashes when hash algo is known
path: move `validate_headref()` to its only user
path: harden validation of HEAD with non-standard hashes
|
|
The out parameter of `git_config_string()` is a `const char **` even
though we transfer ownership of memory to the caller. This is quite
misleading and has led to many memory leaks all over the place. Adapt
the parameter to instead be `char **`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In `resolve_gitlink_ref()` we implicitly rely on `the_repository` to
look up the submodule ref store. Now that we can look up submodule ref
stores for arbitrary repositories we can improve this function to
instead accept a repository as parameter for which we want to resolve
the gitlink.
Do so and adjust callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Git 2.43 started using the tree of HEAD as the source of attributes
in a bare repository, which has severe performance implications.
For now, revert the change, without ripping out a more explicit
support for the attr.tree configuration variable.
* jc/no-default-attr-tree-in-bare:
stop using HEAD for attributes in bare repository by default
|
|
If either the `--attr-source` option or the `GIT_ATTR_SOURCE` envvar are
set, then `compute_default_attr_source()` will try to look up the value
as a treeish. It is possible to hit that function while outside of a Git
repository though, for example when using `git grep --no-index`. In that
case, Git will hit a bug because we try to look up the main ref store
outside of a repository.
Handle the case gracefully and detect when we try to look up an attr
source without a repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `default_attr_source()` function lazily computes the attr source
supposedly once, only. This is done via a static variable `attr_source`
that contains the resolved object ID of the attr source's tree. If the
variable is the null object ID then we try to look up the attr source,
otherwise we skip over it.
This approach is flawed though: the variable will never be set to
anything else but the null object ID in case there is no attr source.
Consequently, we re-compute the information on every call. And in the
worst case, when we silently ignore bad trees, this will cause us to try
and look up the treeish every single time.
Improve this by introducing a separate variable `has_attr_source` to
track whether we already computed the attr source and, if so, whether we
have an attr source or not.
This also allows us to convert the `ignore_bad_attr_tree` to not be
static anymore as the code will only be executed once anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
ps/undecided-is-not-necessarily-sha1
* jc/no-default-attr-tree-in-bare:
stop using HEAD for attributes in bare repository by default
|
|
Commit 3c50032ff52 (attr: ignore overly large gitattributes files,
2022-12-01) added a defense-in-depth check to ensure that .gitattributes
blobs read from the index do not exceed ATTR_MAX_FILE_SIZE (100 MB).
But there were two cases added shortly after 3c50032ff52 was written
which do not apply similar protections:
- 47cfc9bd7d0 (attr: add flag `--source` to work with tree-ish,
2023-01-14)
- 4723ae1007f (attr.c: read attributes in a sparse directory,
2023-08-11) added a similar
Ensure that we refuse to process a .gitattributes blob exceeding
ATTR_MAX_FILE_SIZE when reading from either an arbitrary tree object or
a sparse directory. This is done by pushing the ATTR_MAX_FILE_SIZE check
down into the low-level `read_attr_from_buf()`.
In doing so, plug a leak in `read_attr_from_index()` where we would
accidentally leak the large buffer upon detecting it is too large to
process.
(Since `read_attr_from_buf()` handles a NULL buffer input, we can remove
a NULL check before calling it in `read_attr_from_index()` as well).
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
With 23865355 (attr: read attributes from HEAD when bare repo,
2023-10-13), we started to use the HEAD tree as the default
attribute source in a bare repository. One argument for such a
behaviour is that it would make things like "git archive" run in
bare and non-bare repositories for the same commit consistent.
This changes was merged to Git 2.43 but without an explicit mention
in its release notes.
It turns out that this change destroys performance of shallowly
cloning from a bare repository. As the "server" installations are
expected to be mostly bare, and "git pack-objects", which is the
core of driving the other side of "git clone" and "git fetch" wants
to see if a path is set not to delta with blobs from other paths via
the attribute system, the change forces the server side to traverse
the tree of the HEAD commit needlessly to find if each and every
paths the objects it sends out has the attribute that controls the
deltification. Given that (1) most projects do not configure such
an attribute, and (2) it is dubious for the server side to honor
such an end-user supplied attribute anyway, this was a poor choice
of the default.
To mitigate the current situation, let's revert the change that uses
the tree of HEAD in a bare repository by default as the attribute
source. This will help most people who have been happy with the
behaviour of Git 2.42 and before.
Two things to note:
* If you are stuck with versions of Git 2.43 or newer, that is
older than the release this fix appears in, you can explicitly
set the attr.tree configuration variable to point at an empty
tree object, i.e.
$ git config attr.tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
* If you like the behaviour we are reverting, you can explicitly
set the attr.tree configuration variable to HEAD, i.e.
$ git config attr.tree HEAD
The right fix for this is to optimize the code paths that allow
accesses to attributes in tree objects, but that is a much more
involved change and is left as a longer-term project, outside the
scope of this "first step" fix.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Gives all paths builtin objectmode values based on the paths' modes
(one of 100644, 100755, 120000, 040000, 160000). Users may use
this feature to filter by file types. For example a pathspec such as
':(attr:builtin_objectmode=160000)' could filter for submodules without
needing to have `builtin_objectmode=160000` to be set in .gitattributes
for every submodule path.
These values are also reflected in `git check-attr` results.
If the git_attr_direction is set to GIT_ATTR_INDEX or GIT_ATTR_CHECKIN
and a path is not found in the index, the value will be unspecified.
This patch also reserves the builtin_* attribute namespace for objectmode
and any future builtin attributes. Any user defined attributes using this
reserved namespace will result in a warning. This is a breaking change for
any existing builtin_* attributes.
Pathspecs with some builtin_* attribute name (excluding builtin_objectmode)
will behave like any attribute where there are no user specified values.
Signed-off-by: Joanna Wang <jojwang@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The attribute subsystem learned to honor `attr.tree` configuration
that specifies which tree to read the .gitattributes files from.
* jc/attr-tree-config:
attr: add attr.tree for setting the treeish to read attributes from
attr: read attributes from HEAD when bare repo
|
|
44451a2 (attr: teach "--attr-source=<tree>" global option to "git",
2023-05-06) provided the ability to pass in a treeish as the attr
source. In the context of serving Git repositories as bare repos like we
do at GitLab however, it would be easier to point --attr-source to HEAD
for all commands by setting it once.
Add a new config attr.tree that allows this.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The motivation for 44451a2e5e (attr: teach "--attr-source=<tree>" global
option to "git" , 2023-05-06), was to make it possible to use
gitattributes with bare repositories.
To make it easier to read gitattributes in bare repositories however,
let's just make HEAD:.gitattributes the default. This is in line with
how mailmap works, 8c473cecfd (mailmap: default mailmap.blob in bare
repositories, 2012-12-13).
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The files config.{h,c} contain functions that have to do with parsing,
but not config.
In order to further reduce all-in-one headers, separate out functions in
config.c that do not operate on config into its own file, parse.h,
and update the include directives in the .c files that need only such
functions accordingly.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Teach "git check-attr" work better with sparse-index.
* sl/sparse-check-attr:
check-attr: integrate with sparse-index
attr.c: read attributes in a sparse directory
t1092: add tests for 'git check-attr'
|
|
Before this patch, git check-attr was unable to read the attributes from
a .gitattributes file within a sparse directory. The original comment
was operating under the assumption that users are only interested in
files or directories inside the cones. Therefore, in the original code,
in the case of a cone-mode sparse-checkout, we didn't load the
.gitattributes file.
However, this behavior can lead to missing attributes for files inside
sparse directories, causing inconsistencies in file handling.
To resolve this, revise 'git check-attr' to allow attribute reading for
files in sparse directories from the corresponding .gitattributes files:
1.Utilize path_in_cone_mode_sparse_checkout() and index_name_pos_sparse
to check if a path falls within a sparse directory.
2.If path is inside a sparse directory, employ the value of
index_name_pos_sparse() to find the sparse directory containing path and
path relative to sparse directory. Proceed to read attributes from the
tree OID of the sparse directory using read_attr_from_blob().
3.If path is not inside a sparse directory,ensure that attributes are
fetched from the index blob with read_blob_data_from_index().
Change the test 'check-attr with pathspec outside sparse definition' to
'test_expect_success' to reflect that the attributes inside a sparse
directory can now be read. Ensure that the sparse index case works
correctly for git check-attr to illustrate the successful handling of
attributes within sparse directories.
Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Further shuffling of declarations across header files to streamline
file dependencies.
* cw/compat-util-header-cleanup:
git-compat-util: move alloc macros to git-compat-util.h
treewide: remove unnecessary includes for wrapper.h
kwset: move translation table from ctype
sane-ctype.h: create header for sane-ctype macros
git-compat-util: move wrapper.c funcs to its header
git-compat-util: move strbuf.c funcs to its header
|
|
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for
dynamic array allocation. Moving these macros to git-compat-util.h with
the other alloc macros focuses alloc.[ch] to allocation for Git objects
and additionally allows us to remove inclusions to alloc.h from files
that solely used the above macros.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add more "git var" for toolsmiths to learn various locations Git is
configured with either via the configuration or hardcoded defaults.
* bc/more-git-var:
var: add config file locations
var: add attributes files locations
attr: expose and rename accessor functions
var: adjust memory allocation for strings
var: format variable structure with C99 initializers
var: add support for listing the shell
t: add a function to check executable bit
var: mark unused parameters in git_var callbacks
|
|
Right now, the functions which determine the current system and global
gitattributes files are not exposed. We'd like to use them in a future
commit, but they're not ideally named. Rename them to something more
suitable as a public interface, expose them, and document them.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The vast majority of files including object-store.h did not need dir.h
nor khash.h. Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.
After this patch:
$ git grep -h include..object-store | sort | uniq -c
2 #include "object-store.h"
129 #include "object-store-ll.h"
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This also made it clear that several .c files that depended upon path.h
were missing a #include for it; add the missing includes while at it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since this header showed up in some places besides just #include
statements, update/clean-up/remove those other places as well.
Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got
away with violating the rule that all files must start with an include
of git-compat-util.h (or a short-list of alternate headers that happen
to include it first). This change exposed the violation and caused it
to stop building correctly; fix it by having it include
git-compat-util.h first, as per policy.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
For the functions defined in read-cache.c, move their declarations from
cache.h to a new header, read-cache-ll.h. Also move some related inline
functions from cache.h to read-cache.h. The purpose of the
read-cache-ll.h/read-cache.h split is that about 70% of the sites don't
need the inline functions and the extra headers they include.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git --attr-source=<tree> cmd $args" is a new way to have any
command to read attributes not from the working tree but from the
given tree object.
* jc/attr-source-tree:
attr: teach "--attr-source=<tree>" global option to "git"
|
|
Earlier, 47cfc9bd (attr: add flag `--source` to work with tree-ish,
2023-01-14) taught "git check-attr" the "--source=<tree>" option to
allow it to read attribute files from a tree-ish, but did so only
for the command. Just like "check-attr" users wanted a way to use
attributes from a tree-ish and not from the working tree files,
users of other commands (like "git diff") would benefit from the
same.
Undo most of the UI change the commit made, while keeping the
internal logic to read attributes from a given tree-ish. Expose the
internal logic via a new "--attr-source=<tree>" command line option
given to "git", so that it can be used with any git command that
runs as part of the main git process.
Additionally, add an environment variable GIT_ATTR_SOURCE that is set
when --attr-source is passed in, so that subprocesses use the same value
for the attributes source tree.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Dozens of files made use of gettext functions, without explicitly
including gettext.h. This made it more difficult to find which files
could remove a dependence on cache.h. Make C files explicitly include
gettext.h if they are using it.
However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This allows us to replace includes of cache.h with includes of the much
smaller alloc.h in many places. It does mean that we also need to add
includes of alloc.h in a number of C files.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git check-attr" learned to take an optional tree-ish to read the
.gitattributes file from.
* kn/attr-from-tree:
attr: add flag `--source` to work with tree-ish
t0003: move setup for `--all` into new block
|
|
Code cleaning.
* rs/dup-array:
use DUP_ARRAY
add DUP_ARRAY
do full type check in BARF_UNLESS_COPYABLE
factor out BARF_UNLESS_COPYABLE
mingw: make argv2 in try_shell_exec() non-const
|
|
On platforms where `size_t` does not have the same width as `unsigned
long`, passing a pointer to the former when a pointer to the latter is
expected can lead to problems.
Windows and 32-bit Linux are among the affected platforms.
In this instance, we want to store the size of the blob that was read in
that variable. However, `read_blob_data_from_index()` passes that
pointer to `read_object_file()` which expects an `unsigned long *`.
Which means that on affected platforms, the variable is not fully
populated and part of its value is left uninitialized. (On Big-Endian
platforms, this problem would be even worse.)
The consequence is that depending on the uninitialized memory's
contents, we may erroneously reject perfectly fine attributes.
Let's address this by passing a pointer to a variable of the expected
data type.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The contents of the .gitattributes files may evolve over time, but "git
check-attr" always checks attributes against them in the working tree
and/or in the index. It may be beneficial to optionally allow the users
to check attributes taken from a commit other than HEAD against paths.
Add a new flag `--source` which will allow users to check the
attributes against a commit (actually any tree-ish would do). When the
user uses this flag, we go through the stack of .gitattributes files but
instead of checking the current working tree and/or in the index, we
check the blobs from the provided tree-ish object. This allows the
command to also be used in bare repositories.
Since we use a tree-ish object, the user can pass "--source
HEAD:subdirectory" and all the attributes will be looked up as if
subdirectory was the root directory of the repository.
We cannot simply use the `<rev>:<path>` syntax without the `--source`
flag, similar to how it is used in `git show` because any non-flag
parameter before `--` is treated as an attribute and any parameter after
`--` is treated as a pathname.
The change involves creating a new function `read_attr_from_blob`, which
given the path reads the blob for the path against the provided source and
parses the attributes line by line. This function is plugged into
`read_attr()` function wherein we go through the stack of attributes
files.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
Co-authored-by: toon@iotcl.com
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a semantic patch for replace ALLOC_ARRAY+COPY_ARRAY with DUP_ARRAY
to reduce code duplication and apply its results.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Similar as with the preceding commit, start ignoring gitattributes files
that are overly large to protect us against out-of-bounds reads and
writes caused by integer overflows. Unfortunately, we cannot just define
"overly large" in terms of any preexisting limits in the codebase.
Instead, we choose a very conservative limit of 100MB. This is plenty of
room for specifying gitattributes, and incidentally it is also the limit
for blob sizes for GitHub. While we don't want GitHub to dictate limits
here, it is still sensible to use this fact for an informed decision
given that it is hosting a huge set of repositories. Furthermore, over
at GitLab we scanned a subset of repositories for their root-level
attribute files. We found that 80% of them have a gitattributes file
smaller than 100kB, 99.99% have one smaller than 1MB, and only a single
repository had one that was almost 3MB in size. So enforcing a limit of
100MB seems to give us ample of headroom.
With this limit in place we can be reasonably sure that there is no easy
way to exploit the gitattributes file via integer overflows anymore.
Furthermore, it protects us against resource exhaustion caused by
allocating the in-memory data structures required to represent the
parsed attributes.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are two different code paths to read gitattributes: once via a
file, and once via the index. These two paths used to behave differently
because when reading attributes from a file, we used fgets(3P) with a
buffer size of 2kB. Consequentially, we silently truncate line lengths
when lines are longer than that and will then parse the remainder of the
line as a new pattern. It goes without saying that this is entirely
unexpected, but it's even worse that the behaviour depends on how the
gitattributes are parsed.
While this is simply wrong, the silent truncation saves us with the
recently discovered vulnerabilities that can cause out-of-bound writes
or reads with unreasonably long lines due to integer overflows. As the
common path is to read gitattributes via the worktree file instead of
via the index, we can assume that any gitattributes file that had lines
longer than that is already broken anyway. So instead of lifting the
limit here, we can double down on it to fix the vulnerabilities.
Introduce an explicit line length limit of 2kB that is shared across all
paths that read attributes and ignore any line that hits this limit
while printing a warning.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When reading attributes from a file we use fgets(3P) with a buffer size
of 2048 bytes. This means that as soon as a line exceeds the buffer size
we split it up into multiple parts and parse each of them as a separate
pattern line. This is of course not what the user intended, and even
worse the behaviour is inconsistent with how we read attributes from the
index.
Fix this bug by converting the code to use `strbuf_getline()` instead.
This will indeed read in the whole line, which may theoretically lead to
an out-of-memory situation when the gitattributes file is huge. We're
about to reject any gitattributes files larger than 100MB in the next
commit though, which makes this less of a concern.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When parsing an attributes line, we need to allocate an array that holds
all attributes specified for the given file pattern. The calculation to
determine the number of bytes that need to be allocated was prone to an
overflow though when there was an unreasonable amount of attributes.
Harden the allocation by instead using the `st_` helper functions that
cause us to die when we hit an integer overflow.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Attributes have a field that tracks the position in the `all_attrs`
array they're stored inside. This field gets set via `hashmap_get_size`
when adding the attribute to the global map of attributes. But while the
field is of type `int`, the value returned by `hashmap_get_size` is an
`unsigned int`. It can thus happen that the value overflows, where we
would now dereference teh `all_attrs` array at an out-of-bounds value.
We do have a sanity check for this overflow via an assert that verifies
the index matches the new hashmap's size. But asserts are not a proper
mechanism to detect against any such overflows as they may not in fact
be compiled into production code.
Fix this by using an `unsigned int` to track the index and convert the
assert to a call `die()`.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `struct attr_stack` tracks the stack of all patterns together with
their attributes. When parsing a gitattributes file that has more than
2^31 such patterns though we may trigger multiple out-of-bounds reads on
64 bit platforms. This is because while the `num_matches` variable is an
unsigned integer, we always use a signed integer to iterate over them.
I have not been able to reproduce this issue due to memory constraints
on my systems. But despite the out-of-bounds reads, the worst thing that
can seemingly happen is to call free(3P) with a garbage pointer when
calling `attr_stack_free()`.
Fix this bug by using unsigned integers to iterate over the array. While
this makes the iteration somewhat awkward when iterating in reverse, it
is at least better than knowingly running into an out-of-bounds read.
While at it, convert the call to `ALLOC_GROW` to use `ALLOC_GROW_BY`
instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It is possible to trigger an integer overflow when parsing attribute
names when there are more than 2^31 of them for a single pattern. This
can either lead to us dying due to trying to request too many bytes:
blob=$(perl -e 'print "f" . " a=" x 2147483649' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,$blob,.gitattributes
git attr-check --all file
=================================================================
==1022==ERROR: AddressSanitizer: requested allocation size 0xfffffff800000032 (0xfffffff800001038 after adjustments for alignment, red zones etc.) exceeds maximum supported size of 0x10000000000 (thread T0)
#0 0x7fd3efabf411 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77
#1 0x5563a0a1e3d3 in xcalloc wrapper.c:150
#2 0x5563a058d005 in parse_attr_line attr.c:384
#3 0x5563a058e661 in handle_attr_line attr.c:660
#4 0x5563a058eddb in read_attr_from_index attr.c:769
#5 0x5563a058ef12 in read_attr attr.c:797
#6 0x5563a058f24c in bootstrap_attr_stack attr.c:867
#7 0x5563a058f4a3 in prepare_attr_stack attr.c:902
#8 0x5563a05905da in collect_some_attrs attr.c:1097
#9 0x5563a059093d in git_all_attrs attr.c:1128
#10 0x5563a02f636e in check_attr builtin/check-attr.c:67
#11 0x5563a02f6c12 in cmd_check_attr builtin/check-attr.c:183
#12 0x5563a02aa993 in run_builtin git.c:466
#13 0x5563a02ab397 in handle_builtin git.c:721
#14 0x5563a02abb2b in run_argv git.c:788
#15 0x5563a02ac991 in cmd_main git.c:926
#16 0x5563a05432bd in main common-main.c:57
#17 0x7fd3ef82228f (/usr/lib/libc.so.6+0x2328f)
==1022==HINT: if you don't care about these errors you may set allocator_may_return_null=1
SUMMARY: AddressSanitizer: allocation-size-too-big /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77 in __interceptor_calloc
==1022==ABORTING
Or, much worse, it can lead to an out-of-bounds write because we
underallocate and then memcpy(3P) into an array:
perl -e '
print "A " . "\rh="x2000000000;
print "\rh="x2000000000;
print "\rh="x294967294 . "\n"
' >.gitattributes
git add .gitattributes
git commit -am "evil attributes"
$ git clone --quiet /path/to/repo
=================================================================
==15062==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000002550 at pc 0x5555559884d5 bp 0x7fffffffbc60 sp 0x7fffffffbc58
WRITE of size 8 at 0x602000002550 thread T0
#0 0x5555559884d4 in parse_attr_line attr.c:393
#1 0x5555559884d4 in handle_attr_line attr.c:660
#2 0x555555988902 in read_attr_from_index attr.c:784
#3 0x555555988902 in read_attr_from_index attr.c:747
#4 0x555555988a1d in read_attr attr.c:800
#5 0x555555989b0c in bootstrap_attr_stack attr.c:882
#6 0x555555989b0c in prepare_attr_stack attr.c:917
#7 0x555555989b0c in collect_some_attrs attr.c:1112
#8 0x55555598b141 in git_check_attr attr.c:1126
#9 0x555555a13004 in convert_attrs convert.c:1311
#10 0x555555a95e04 in checkout_entry_ca entry.c:553
#11 0x555555d58bf6 in checkout_entry entry.h:42
#12 0x555555d58bf6 in check_updates unpack-trees.c:480
#13 0x555555d5eb55 in unpack_trees unpack-trees.c:2040
#14 0x555555785ab7 in checkout builtin/clone.c:724
#15 0x555555785ab7 in cmd_clone builtin/clone.c:1384
#16 0x55555572443c in run_builtin git.c:466
#17 0x55555572443c in handle_builtin git.c:721
#18 0x555555727872 in run_argv git.c:788
#19 0x555555727872 in cmd_main git.c:926
#20 0x555555721fa0 in main common-main.c:57
#21 0x7ffff73f1d09 in __libc_start_main ../csu/libc-start.c:308
#22 0x555555723f39 in _start (git+0x1cff39)
0x602000002552 is located 0 bytes to the right of 2-byte region [0x602000002550,0x602000002552) allocated by thread T0 here:
#0 0x7ffff768c037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x555555d7fff7 in xcalloc wrapper.c:150
#2 0x55555598815f in parse_attr_line attr.c:384
#3 0x55555598815f in handle_attr_line attr.c:660
#4 0x555555988902 in read_attr_from_index attr.c:784
#5 0x555555988902 in read_attr_from_index attr.c:747
#6 0x555555988a1d in read_attr attr.c:800
#7 0x555555989b0c in bootstrap_attr_stack attr.c:882
#8 0x555555989b0c in prepare_attr_stack attr.c:917
#9 0x555555989b0c in collect_some_attrs attr.c:1112
#10 0x55555598b141 in git_check_attr attr.c:1126
#11 0x555555a13004 in convert_attrs convert.c:1311
#12 0x555555a95e04 in checkout_entry_ca entry.c:553
#13 0x555555d58bf6 in checkout_entry entry.h:42
#14 0x555555d58bf6 in check_updates unpack-trees.c:480
#15 0x555555d5eb55 in unpack_trees unpack-trees.c:2040
#16 0x555555785ab7 in checkout builtin/clone.c:724
#17 0x555555785ab7 in cmd_clone builtin/clone.c:1384
#18 0x55555572443c in run_builtin git.c:466
#19 0x55555572443c in handle_builtin git.c:721
#20 0x555555727872 in run_argv git.c:788
#21 0x555555727872 in cmd_main git.c:926
#22 0x555555721fa0 in main common-main.c:57
#23 0x7ffff73f1d09 in __libc_start_main ../csu/libc-start.c:308
SUMMARY: AddressSanitizer: heap-buffer-overflow attr.c:393 in parse_attr_line
Shadow bytes around the buggy address:
0x0c047fff8450: fa fa 00 02 fa fa 00 07 fa fa fd fd fa fa 00 00
0x0c047fff8460: fa fa 02 fa fa fa fd fd fa fa 00 06 fa fa 05 fa
0x0c047fff8470: fa fa fd fd fa fa 00 02 fa fa 06 fa fa fa 05 fa
0x0c047fff8480: fa fa 07 fa fa fa fd fd fa fa 00 01 fa fa 00 02
0x0c047fff8490: fa fa 00 03 fa fa 00 fa fa fa 00 01 fa fa 00 03
=>0x0c047fff84a0: fa fa 00 01 fa fa 00 02 fa fa[02]fa fa fa fa fa
0x0c047fff84b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff84c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff84d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff84e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff84f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==15062==ABORTING
Fix this bug by using `size_t` instead to count the number of attributes
so that this value cannot reasonably overflow without running out of
memory before already.
Reported-by: Markus Vervier <markus.vervier@x41-dsec.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It is possible to trigger an integer overflow when parsing attribute
names that are longer than 2^31 bytes because we assign the result of
strlen(3P) to an `int` instead of to a `size_t`. This can lead to an
abort in vsnprintf(3P) with the following reproducer:
blob=$(perl -e 'print "A " . "B"x2147483648 . "\n"' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,$blob,.gitattributes
git check-attr --all path
BUG: strbuf.c:400: your vsnprintf is broken (returned -1)
But furthermore, assuming that the attribute name is even longer than
that, it can cause us to silently truncate the attribute and thus lead
to wrong results.
Fix this integer overflow by using a `size_t` instead. This fixes the
silent truncation of attribute names, but it only partially fixes the
BUG we hit: even though the initial BUG is fixed, we can still hit a BUG
when parsing invalid attribute lines via `report_invalid_attr()`.
This is due to an underlying design issue in vsnprintf(3P) which only
knows to return an `int`, and thus it may always overflow with large
inputs. This issue is benign though: the worst that can happen is that
the error message is misreported to be either truncated or too long, but
due to the buffer being NUL terminated we wouldn't ever do an
out-of-bounds read here.
Reported-by: Markus Vervier <markus.vervier@x41-dsec.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There is an out-of-bounds read possible when parsing gitattributes that
have an attribute that is 2^31+1 bytes long. This is caused due to an
integer overflow when we assign the result of strlen(3P) to an `int`,
where we use the wrapped-around value in a subsequent call to
memcpy(3P). The following code reproduces the issue:
blob=$(perl -e 'print "a" x 2147483649 . " attr"' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,$blob,.gitattributes
git check-attr --all file
AddressSanitizer:DEADLYSIGNAL
=================================================================
==8451==ERROR: AddressSanitizer: SEGV on unknown address 0x7f93efa00800 (pc 0x7f94f1f8f082 bp 0x7ffddb59b3a0 sp 0x7ffddb59ab28 T0)
==8451==The signal is caused by a READ memory access.
#0 0x7f94f1f8f082 (/usr/lib/libc.so.6+0x176082)
#1 0x7f94f2047d9c in __interceptor_strspn /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:752
#2 0x560e190f7f26 in parse_attr_line attr.c:375
#3 0x560e190f9663 in handle_attr_line attr.c:660
#4 0x560e190f9ddd in read_attr_from_index attr.c:769
#5 0x560e190f9f14 in read_attr attr.c:797
#6 0x560e190fa24e in bootstrap_attr_stack attr.c:867
#7 0x560e190fa4a5 in prepare_attr_stack attr.c:902
#8 0x560e190fb5dc in collect_some_attrs attr.c:1097
#9 0x560e190fb93f in git_all_attrs attr.c:1128
#10 0x560e18e6136e in check_attr builtin/check-attr.c:67
#11 0x560e18e61c12 in cmd_check_attr builtin/check-attr.c:183
#12 0x560e18e15993 in run_builtin git.c:466
#13 0x560e18e16397 in handle_builtin git.c:721
#14 0x560e18e16b2b in run_argv git.c:788
#15 0x560e18e17991 in cmd_main git.c:926
#16 0x560e190ae2bd in main common-main.c:57
#17 0x7f94f1e3c28f (/usr/lib/libc.so.6+0x2328f)
#18 0x7f94f1e3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
#19 0x560e18e110e4 in _start ../sysdeps/x86_64/start.S:115
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/usr/lib/libc.so.6+0x176082)
==8451==ABORTING
Fix this bug by converting the variable to a `size_t` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The function `git_attr_internal()` is called to upsert attributes into
the global map. And while all callers pass a `size_t`, the function
itself accepts an `int` as the attribute name's length. This can lead to
an integer overflow in case the attribute name is longer than `INT_MAX`.
Now this overflow seems harmless as the first thing we do is to call
`attr_name_valid()`, and that function only succeeds in case all chars
in the range of `namelen` match a certain small set of chars. We thus
can't do an out-of-bounds read as NUL is not part of that set and all
strings passed to this function are NUL-terminated. And furthermore, we
wouldn't ever read past the current attribute name anyway due to the
same reason. And if validation fails we will return early.
On the other hand it feels fragile to rely on this behaviour, even more
so given that we pass `namelen` to `FLEX_ALLOC_MEM()`. So let's instead
just do the correct thing here and accept a `size_t` as line length.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since its inception in d0bfd026a8 (Add basic infrastructure to assign
attributes to paths, 2007-04-12), the attribute code carries a little
bit of debug code that is conditionally compiled only when DEBUG_ATTR is
set. But since you have to know about it and make a special build of Git
to use it, it's not clear that it's helping anyone (and there are very
few mentions of it on the list over the years).
Meanwhile, it causes slight headaches. Since it's not built as part of a
regular compile, it's subject to bitrot. E.g., this was dealt with in
712efb1a42 (attr: make it build with DEBUG_ATTR again, 2013-01-15), and
it currently fails to build with DEVELOPER=1 since e810e06357 (attr:
tighten const correctness with git_attr and match_attr, 2017-01-27).
And it causes confusion with -Wunused-parameter; the "what" parameter of
fill_one() is unused in a normal build, but needed in a debug build.
Let's just get rid of this code (and the now-useless parameter).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Undoes 'jk/unused-annotation' topic and redoes it to work around
Coccinelle rules misfiring false positives in unrelated codepaths.
* ab/unused-annotation:
git-compat-util.h: use "deprecated" for UNUSED variables
git-compat-util.h: use "UNUSED", not "UNUSED(var)"
|
|
Annotate function parameters that are not used (but cannot be
removed for structural reasons), to prepare us to later compile
with -Wunused warning turned on.
* jk/unused-annotation:
is_path_owned_by_current_uid(): mark "report" parameter as unused
run-command: mark unused async callback parameters
mark unused read_tree_recursive() callback parameters
hashmap: mark unused callback parameters
config: mark unused callback parameters
streaming: mark unused virtual method parameters
transport: mark bundle transport_options as unused
refs: mark unused virtual method parameters
refs: mark unused reflog callback parameters
refs: mark unused each_ref_fn parameters
git-compat-util: add UNUSED macro
|
|
As reported in [1] the "UNUSED(var)" macro introduced in
2174b8c75de (Merge branch 'jk/unused-annotation' into next,
2022-08-24) breaks coccinelle's parsing of our sources in files where
it occurs.
Let's instead partially go with the approach suggested in [2] of
making this not take an argument. As noted in [1] "coccinelle" will
ignore such tokens in argument lists that it doesn't know about, and
it's less of a surprise to syntax highlighters.
This undoes the "help us notice when a parameter marked as unused is
actually use" part of 9b240347543 (git-compat-util: add UNUSED macro,
2022-08-19), a subsequent commit will further tweak the macro to
implement a replacement for that functionality.
1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This field has not been used since the function was introduced in
b559263216 (exclude: split pathname matching code into a separate
function, 2012-10-15), though there was a brief period where it was
erroneously used and then reverted in ed4958477b (dir: fix pattern
matching on dirs, 2021-09-24) and 5ceb663e92 (dir: fix
directory-matching bug, 2021-11-02).
It's possible we'd eventually add a flag that makes it useful here, but
there are only a handful of callers. It would be easy to add back if
necessary, and in the meantime this makes the function interface less
misleading.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Hashmap comparison functions must conform to a particular callback
interface, but many don't use all of their parameters. Especially the
void cmp_data pointer, but some do not use keydata either (because they
can easily form a full struct to pass when doing lookups). Let's mark
these to make -Wunused-parameter happy.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up.
* ab/refs-various-fixes:
refs debug: add a wrapper for "read_symbolic_ref"
packed-backend: remove stub BUG(...) functions
misc *.c: use designated initializers for struct assignments
refs: use designated initializers for "struct ref_iterator_vtable"
refs: use designated initializers for "struct ref_storage_be"
|
|
Change a few miscellaneous non-designated initializer assignments to
use designated initializers.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
dir.h is included more than once
Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Fix typos in documentation, code comments, and RelNotes which repeat
various words. In trivial cases, just delete the duplicated word and
rewrap text, if needed. Reword the affected sentence in
Documentation/RelNotes/1.8.4.txt for it to make sense.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Builds on top of the sparse-index infrastructure to mark operations
that are not ready to mark with the sparse index, causing them to
fall back on fully-populated index that they always have worked with.
* ds/sparse-index-protections: (47 commits)
name-hash: use expand_to_path()
sparse-index: expand_to_path()
name-hash: don't add directories to name_hash
revision: ensure full index
resolve-undo: ensure full index
read-cache: ensure full index
pathspec: ensure full index
merge-recursive: ensure full index
entry: ensure full index
dir: ensure full index
update-index: ensure full index
stash: ensure full index
rm: ensure full index
merge-index: ensure full index
ls-files: ensure full index
grep: ensure full index
fsck: ensure full index
difftool: ensure full index
commit: ensure full index
checkout: ensure full index
...
|
|
Several methods specify that they take a 'struct index_state' pointer
with the 'const' qualifier because they intend to only query the data,
not change it. However, we will be introducing a step very low in the
method stack that might modify a sparse-index to become a full index in
the case that our queries venture inside a sparse-directory entry.
This change only removes the 'const' qualifiers that are necessary for
the following change which will actually modify the implementation of
index_name_stage_pos().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It does not make sense to make ".gitattributes", ".gitignore" and
".mailmap" symlinks, as they are supposed to be usable from the
object store (think: bare repositories where HEAD:.mailmap etc. are
used). When these files are symbolic links, we used to read the
contents of the files pointed by them by mistake, which has been
corrected.
* jk/open-dotgitx-with-nofollow:
mailmap: do not respect symlinks for in-tree .mailmap
exclude: do not respect symlinks for in-tree .gitignore
attr: do not respect symlinks for in-tree .gitattributes
exclude: add flags parameter to add_patterns()
attr: convert "macro_ok" into a flags field
add open_nofollow() helper
|
|
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead. It shortens the code and infers the
element size automatically.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The attributes system may sometimes read in-tree files from the
filesystem, and sometimes from the index. In the latter case, we do not
resolve symbolic links (and are not likely to ever start doing so).
Let's open filesystem links with O_NOFOLLOW so that the two cases behave
consistently.
As a bonus, this means that git will not follow such symlinks to read
and parse out-of-tree paths. In some cases this could have security
implications, as a malicious repository can cause Git to open and read
arbitrary files. It could already feed arbitrary content to the parser,
but in certain setups it might be able to exfiltrate data from those
paths (e.g., if an automated service operating on the malicious repo
reveals its stderr to an attacker).
Note that O_NOFOLLOW only prevents following links for the path itself,
not intermediate directories in the path. At first glance, it seems
like
ln -s /some/path in-repo
might still look at "in-repo/.gitattributes", following the symlink to
"/some/path/.gitattributes". However, if "in-repo" is a symbolic link,
then we know that it has no git paths below it, and will never look at
its .gitattributes file.
We will continue to support out-of-tree symbolic links (e.g., in
$GIT_DIR/info/attributes); this just affects in-tree links. When a
symbolic link is encountered, the contents are ignored and a warning is
printed. POSIX specifies ELOOP in this case, so the user would generally
see something like:
warning: unable to access '.gitattributes': Too many levels of symbolic links
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The attribute code can have a rather deep callstack, through
which we have to pass the "macro_ok" flag. In anticipation
of adding other flags, let's convert this to a generic
bit-field.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Now that hashamp has lazy initialization and a HASHMAP_INIT macro,
hashmaps allocated on the stack can be initialized without a call to
hashmap_init() and in some cases makes the code a bit shorter. Convert
some callsites over to take advantage of this.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move the documentation from Documentation/technical/api-gitattributes.txt
to attr.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.
Also documentation/technical/api-gitattributes.txt is removed because the
information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header file.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up of the hashmap API, both users and implementation.
* ew/hashmap:
hashmap_entry: remove first member requirement from docs
hashmap: remove type arg from hashmap_{get,put,remove}_entry
OFFSETOF_VAR macro to simplify hashmap iterators
hashmap: introduce hashmap_free_entries
hashmap: hashmap_{put,remove} return hashmap_entry *
hashmap: use *_entry APIs for iteration
hashmap_cmp_fn takes hashmap_entry params
hashmap_get{,_from_hash} return "struct hashmap_entry *"
hashmap: use *_entry APIs to wrap container_of
hashmap_get_next returns "struct hashmap_entry *"
introduce container_of macro
hashmap_put takes "struct hashmap_entry *"
hashmap_remove takes "const struct hashmap_entry *"
hashmap_get takes "const struct hashmap_entry *"
hashmap_add takes "struct hashmap_entry *"
hashmap_get_next takes "const struct hashmap_entry *"
hashmap_entry_init takes "struct hashmap_entry *"
packfile: use hashmap_entry in delta_base_cache_entry
coccicheck: detect hashmap_entry.hash assignment
diff: use hashmap_entry_init on moved_entry.ent
|
|
Comments stating that "struct hashmap_entry" must be the first
member in a struct are no longer valid.
Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since these macros already take a `keyvar' pointer of a known type,
we can rely on OFFSETOF_VAR to get the correct offset without
relying on non-portable `__typeof__' and `offsetof'.
Argument order is also rearranged, so `keyvar' and `member' are
sequential as they are used as: `keyvar->member'
Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
While we cannot rely on a `__typeof__' operator being portable
to use with `offsetof'; we can calculate the pointer offset
using an existing pointer and the address of a member using
pointer arithmetic for compilers without `__typeof__'.
This allows us to simplify usage of hashmap iterator macros
by not having to specify a type when a pointer of that type
is already given.
In the future, list iterator macros (e.g. list_for_each_entry)
may also be implemented using OFFSETOF_VAR to save hackers the
trouble of using container_of/list_entry macros and without
relying on non-portable `__typeof__'.
v3: use `__typeof__' to avoid clang warnings
Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Inspired by list_for_each_entry in the Linux kernel.
Once again, these are somewhat compromised usability-wise
by compilers lacking __typeof__ support.
Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Another step in eliminating the requirement of hashmap_entry
being the first member of a struct.
Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Update callers to use hashmap_get_entry, hashmap_get_entry_from_hash
or container_of as appropriate.
This is another step towards eliminating the requirement of
hashmap_entry being the first field in a struct.
Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This is less error-prone than "const void *" as the compiler
now detects invalid types being passed.
Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This is less error-prone than "void *" as the compiler now
detects invalid types being passed.
Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
C compilers do type checking to make life easier for us. So
rely on that and update all hashmap_entry_init callers to take
"struct hashmap_entry *" to avoid future bugs while improving
safety and readability.
Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list' makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!
It would be clearer to rename this entire library as a "pattern
matching" library, and the callers apply exclusion/inclusion
logic accordingly based on their needs.
This commit renames several methods defined in dir.h to make
more sense with the renamed 'struct exclude_list' to 'struct
pattern_list' and 'struct exclude' to 'struct path_pattern':
* last_exclude_matching() -> last_matching_pattern()
* parse_exclude() -> parse_path_pattern()
In addition, the word 'exclude' was replaced with 'pattern'
in the methods below:
* add_exclude_list()
* add_excludes_from_file_to_list()
* add_excludes_from_file()
* add_excludes_from_blob_to_list()
* add_exclude()
* clear_exclude_list()
A few methods with the word "exclude" remain. These will
be handled seperately. In particular, the method
"is_excluded()" is concretely about the .gitignore file
relative to a specific directory. This is the important
boundary between library and consumer: is_excluded() cares
about .gitignore, but is_excluded() calls
last_matching_pattern() to make that decision.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list' makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!
It would be clearer to rename this entire library as a "pattern
matching" library, and the callers apply exclusion/inclusion
logic accordingly based on their needs.
This commit replaces 'EXCL_FLAG_' to 'PATTERN_FLAG_' in the
names of the flags used on 'struct path_pattern'.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In-code comment typofix.
* rd/attr.c-comment-typofix:
attr.c: ".gitattribute" -> ".gitattributes" (comments)
|
|
Correct misspelled ".gitattribute" in comments only, so no functional
change.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The assumption to work on the single "in-core index" instance has
been reduced from the library-ish part of the codebase.
* nd/the-index-final:
cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch
read-cache.c: remove the_* from index_has_changes()
merge-recursive.c: remove implicit dependency on the_repository
merge-recursive.c: remove implicit dependency on the_index
sha1-name.c: remove implicit dependency on the_index
read-cache.c: replace update_index_if_able with repo_&
read-cache.c: kill read_index()
checkout: avoid the_index when possible
repository.c: replace hold_locked_index() with repo_hold_locked_index()
notes-utils.c: remove the_repository references
grep: use grep_opt->repo instead of explict repo argument
|
|
Asking "git check-attr" about a macro (e.g. "binary") on a specific
path did not work correctly, even though "git check-attr -a" listed
such a macro correctly. This has been corrected.
* jk/attr-macro-fix:
attr: do not mark queried macros as unset
|
|
By default, index compat macros are off from now on, because they
could hide the_index dependency.
Only those in builtin can use it.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since 60a12722ac (attr: remove maybe-real, maybe-macro from git_attr,
2017-01-27), we will always mark an attribute macro (e.g., "binary")
that is specifically queried for as "unspecified", even though listing
_all_ attributes would display it at set. E.g.:
$ echo "* binary" >.gitattributes
$ git check-attr -a file
file: binary: set
file: diff: unset
file: merge: unset
file: text: unset
$ git check-attr binary file
file: binary: unspecified
The problem stems from an incorrect conversion of the optimization from
06a604e670 (attr: avoid heavy work when we know the specified attr is
not defined, 2014-12-28). There we tried in collect_some_attrs() to
avoid even looking at the attr_stack when the user has asked for "foo"
and we know that "foo" did not ever appear in any .gitattributes file.
It used a flag "maybe_real" in each attribute struct, where "real" meant
that the attribute appeared in an actual file (we have to make this
distinction because we also create an attribute struct for any names
that are being queried). But as explained in that commit message, the
meaning of "real" was tangled with some special cases around macros.
When 60a12722ac later refactored the macro code, it dropped maybe_real
entirely. This missed the fact that "maybe_real" could be unset for two
reasons: because of a macro, or because it was never found during
parsing. This had two results:
- the optimization in collect_some_attrs() ceased doing anything
meaningful, since it no longer kept track of "was it found during
parsing"
- worse, it actually kicked in when the caller _did_ ask about a macro
by name, causing us to mark it as unspecified
It should be possible to salvage this optimization, but let's start with
just removing the remnants. It hasn't been doing anything (except
creating bugs) since 60a12722ac, and nobody seems to have noticed the
performance regression. It's more important to fix the correctness
problem clearly first.
I've added two tests here. The second one actually shows off the bug.
The test of "check-attr -a" is not strictly necessary, but we currently
do not test attribute macros much, and the builtin "binary" not at all.
So this increases our general test coverage, as well as making sure we
didn't mess up this related case.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
More _("i18n") markings.
* nd/i18n:
fsck: mark strings for translation
fsck: reduce word legos to help i18n
parse-options.c: mark more strings for translation
parse-options.c: turn some die() to BUG()
parse-options: replace opterror() with optname()
repack: mark more strings for translation
remote.c: mark messages for translation
remote.c: turn some error() or die() to BUG()
reflog: mark strings for translation
read-cache.c: add missing colon separators
read-cache.c: mark more strings for translation
read-cache.c: turn die("internal error") to BUG()
attr.c: mark more string for translation
archive.c: mark more strings for translation
alias.c: mark split_cmdline_strerror() strings for translation
git.c: mark more strings for translation
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
git_check_attr() returns always 0.
Remove all the error handling code of the callers, which is never executed.
Change git_check_attr() to be a void function.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since attr checking API now take the index, there's no need to set an
index in advance with this call. Most call sites are straightforward
because they either pass the_index or NULL (which defaults back to
the_index previously). There's only one suspicious call site in
unpack-trees.c where it sets a different index.
This code in unpack-trees is about to check out entries from the
new/temporary index after merging is done in it. The attributes will
be used by entry.c code to do crlf conversion if needed. entry.c now
respects struct checkout's istate field, and this field is correctly
set in unpack-trees.c, there should be no regression from this change.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Make the attr API take an index_state instead of assuming the_index in
attr code. All call sites are converted blindly to keep the patch
simple and retain current behavior. Individual call sites may receive
further updates to use the right index instead of the_index.
There is one ugly temporary workaround added in attr.c that needs some
more explanation.
Commit c24f3abace (apply: file commited with CRLF should roundtrip
diff and apply - 2017-08-19) forces one convert_to_git() call to NOT
read the index at all. But what do you know, we read it anyway by
falling back to the_index. When "istate" from convert_to_git is now
propagated down to read_attr_from_array() we will hit segfault
somewhere inside read_blob_data_from_index.
The right way of dealing with this is to kill "use_index" variable and
only follow "istate" but at this stage we are not ready for that:
while most git_attr_set_direction() calls just passes the_index to be
assigned to use_index, unpack-trees passes a different one which is
used by entry.c code, which has no way to know what index to use if we
delete use_index. So this has to be done later.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In d8193743e08 (usage.c: add BUG() function, 2017-05-12), a new macro
was introduced to use for reporting bugs instead of die(). It was then
subsequently used to convert one single caller in 588a538ae55
(setup_git_env: convert die("BUG") to BUG(), 2017-05-12).
The cover letter of the patch series containing this patch
(cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not
terribly clear why only one call site was converted, or what the plan
is for other, similar calls to die() to report bugs.
Let's just convert all remaining ones in one fell swoop.
This trick was performed by this invocation:
sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This is more consistent with the project style. The majority of Git's
source files use dashes in preference to underscores in their file names.
Signed-off-by: Stefan Beller <sbeller@google.com>
|
|
This is to address concerns raised by ThreadSanitizer on the mailing list
about threaded unprotected R/W access to map.size with my previous "disallow
rehash" change (0607e10009ee4e37cb49b4cec8d28a9dda1656a4).
See:
https://public-inbox.org/git/adb37b70139fd1e2bac18bfd22c8b96683ae18eb.1502780344.git.martin.agren@gmail.com/
Add API to hashmap to disable item counting and thus automatic rehashing.
Also include API to later re-enable them.
When item counting is disabled, the map.size field is invalid. So to
prevent accidents, the field has been renamed and an accessor function
hashmap_get_size() has been added. All direct references to this
field have been been updated. And the name of the field changed
to map.private_size to communicate this.
Here is the relevant output from ThreadSanitizer showing the problem:
WARNING: ThreadSanitizer: data race (pid=10554)
Read of size 4 at 0x00000082d488 by thread T2 (mutexes: write M16):
#0 hashmap_add hashmap.c:209
#1 hash_dir_entry_with_parent_and_prefix name-hash.c:302
#2 handle_range_dir name-hash.c:347
#3 handle_range_1 name-hash.c:415
#4 lazy_dir_thread_proc name-hash.c:471
#5 <null> <null>
Previous write of size 4 at 0x00000082d488 by thread T1 (mutexes: write M31):
#0 hashmap_add hashmap.c:209
#1 hash_dir_entry_with_parent_and_prefix name-hash.c:302
#2 handle_range_dir name-hash.c:347
#3 handle_range_1 name-hash.c:415
#4 handle_range_dir name-hash.c:380
#5 handle_range_1 name-hash.c:415
#6 lazy_dir_thread_proc name-hash.c:471
#7 <null> <null>
Martin gives instructions for running TSan on test t3008 in this post:
https://public-inbox.org/git/CAN0heSoJDL9pWELD6ciLTmWf-a=oyxe4EXXOmCKvsG5MSuzxsA@mail.gmail.com/
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
MAke the code more readable and less error prone by avoiding the cast
of the compare function pointer in hashmap_init, but instead have the
correctly named void pointers to casted to the specific data structure.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When using the hashmap a common need is to have access to caller provided
data in the compare function. A couple of times we abuse the keydata field
to pass in the data needed. This happens for example in patch-ids.c.
This patch changes the function signature of the compare function
to have one more void pointer available. The pointer given for each
invocation of the compare function must be defined in the init function
of the hashmap and is just passed through.
Documentation of this new feature is deferred to a later patch.
This is a rather mechanical conversion, just adding the new pass-through
parameter. However while at it improve the naming of the fields of all
compare functions used by hashmaps by ensuring unused parameters are
prefixed with 'unused_' and naming the parameters what they are (instead
of 'unused' make it 'unused_keydata').
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A common pattern to free a piece of memory and assign NULL to the
pointer that used to point at it has been replaced with a new
FREE_AND_NULL() macro.
* ab/free-and-null:
*.[ch] refactoring: make use of the FREE_AND_NULL() macro
coccinelle: make use of the "expression" FREE_AND_NULL() rule
coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
coccinelle: make use of the "type" FREE_AND_NULL() rule
coccinelle: add a rule to make "type" code use FREE_AND_NULL()
git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
|
|
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
|
|
Apply the result of the just-added coccinelle rule. This manually
excludes a few occurrences, mostly things that resulted in many
FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent
change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Stop including config.h by default in cache.h. Instead only include
config.h in those files which require use of the config system.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When fopen() returns NULL, it could be because the given path does not
exist, but it could also be some other errors and the caller has to
check. Add a wrapper so we don't have to repeat the same error check
everywhere.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The pathspec mechanism is extended via the new
":(attr:eol=input)pattern/to/match" syntax to filter paths so that it
requires paths to not just match the given pattern but also have the
specified attrs attached for them to be chosen.
Based on a patch by Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move the 'git_attr_set_direction()' up to be closer to the variables
that it modifies as well as a small formatting by renaming the variable
'new' to 'new_direction' so that it is more descriptive.
Update the comment about how 'direction' is used to read the state of
the world. It should be noted that callers of
'git_attr_set_direction()' should ensure that other threads are not
making calls into the attribute system until after the call to
'git_attr_set_direction()' completes. This function essentially acts as
reset button for the attribute system and should be handled with care.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Push the bare repository check into the 'read_attr()' function. This
avoids needing to have extra logic which creates an empty stack frame
when inside a bare repo as a similar bit of logic already exists in the
'read_attr()' function.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The last big hurdle towards a thread-safe API for the attribute system
is the reliance on a global attribute stack that is modified during each
call into the attribute system.
This patch removes this global stack and instead a stack is stored
locally in each attr_check instance. This opens up the opportunity for
future optimizations to customize the attribute stack for the attributes
that a particular attr_check struct is interested in.
One caveat with pushing the attribute stack into the attr_check
structure is that the attribute system now needs to keep track of all
active attr_check instances. Due to the direction mechanism the stack
needs to be dropped when the direction is switched. In order to ensure
correctness when the direction is changed the attribute system needs to
iterate through all active attr_check instances and drop each of their
stacks.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Whether or not a git attribute is real or a macro isn't a property of
the attribute but rather it depends on the attribute stack (which
.gitattribute files were read).
This patch removes the 'maybe_real' and 'maybe_macro' fields in a
git_attr and instead adds the 'macro' field to a attr_check_item. The
'macro' indicates (if non-NULL) that a particular attribute is a macro
for the given attribute stack. It's populated, through a quick scan of
the attribute stack, with the match_attr that corresponds to the macro's
definition. This way the attribute stack only needs to be scanned a
single time prior to attribute collection instead of each time a macro
needs to be expanded.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Currently there is a reliance on 'check_all_attr' which is a global
array of 'attr_check_item' items which is used to store the value of
each attribute during the collection process.
This patch eliminates this global and instead creates an array per
'attr_check' instance which is then used in the attribute collection
process. This brings the attribute system one step closer to being
thread-safe.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The current implementation of the attribute dictionary uses a custom
hashtable. This modernizes the dictionary by converting it to the builtin
'hashmap' structure.
Also, in order to enable a threaded API in the future add an
accompanying mutex which must be acquired prior to accessing the
dictionary of interned attributes.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert 'invalid_attr_name()' to 'attr_name_valid()' and use positive
logic for the return value. In addition create a helper function that
prints out an error message when an invalid attribute name is used.
We could later update the message to exactly spell out what the
rules for a good attribute name are, etc.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The old callchain used to take an array of attr_check_item items.
Instead pass the 'attr_check' container object to 'collect_some_attrs()'
and access the fields in the data structure directly.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since nobody uses the old API, make it file-scope static, and update
the documentation to describe the new API.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This updates the other two ways the attribute check is done via an
array of "struct attr_check_item" elements. These two niches
appear only in "git check-attr".
* The caller does not know offhand what attributes it wants to ask
about and cannot use attr_check_initl() to prepare the
attr_check structure.
* The caller may not know what attributes it wants to ask at all,
and instead wants to learn everything that the given path has.
Such a caller can call attr_check_alloc() to allocate an empty
attr_check, and then call attr_check_append() to add attribute names
one by one.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A common pattern to check N attributes for many paths is to
(1) prepare an array A of N attr_check_item items;
(2) call git_attr() to intern the N attribute names and fill A;
(3) repeatedly call git_check_attrs() for path with N and A;
A look-up for these N attributes for a single path P scans the
entire attr_stack, starting from the .git/info/attributes file and
then .gitattributes file in the directory the path P is in, going
upwards to find .gitattributes file found in parent directories.
An earlier commit 06a604e6 (attr: avoid heavy work when we know the
specified attr is not defined, 2014-12-28) tried to optimize out
this scanning for one trivial special case: when the attribute being
sought is known not to exist, we do not have to scan for it. While
this may be a cheap and effective heuristic, it would not work well
when N is (much) more than 1.
What we would want is a more customized way to skip irrelevant
entries in the attribute stack, and the definition of irrelevance
is tied to the set of attributes passed to git_check_attrs() call,
i.e. the set of attributes being sought. The data necessary for
this optimization needs to live alongside the set of attributes, but
a simple array of git_attr_check_elem simply does not have any place
for that.
Introduce "struct attr_check" that contains N, the number of
attributes being sought, and A, the array that holds N
attr_check_item items, and a function git_check_attr() that
takes a path P and this structure as its parameters. This structure
can later be extended to hold extra data necessary for optimization.
Also, to make it easier to write the first two steps in common
cases, introduce git_attr_check_initl() helper function, which takes
a NULL-terminated list of attribute names and initialize this
structure.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The traditional API to check attributes is to prepare an N-element
array of "struct git_attr_check" and pass N and the array to the
function "git_check_attr()" as arguments.
In preparation to revamp the API to pass a single structure, in
which these N elements are held, rename the type used for these
individual array elements to "struct attr_check_item" and rename
the function to "git_check_attrs()".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are too many repetitious "I have this new attr_stack element;
push it at the top of the stack" sequence. The new helper function
push_stack() gives us a way to express what is going on at these
places, and as a side effect, halves the number of times we mention
the attr_stack global variable.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Full pattern must be quoted. So 'pat"t"ern attr' will give exactly
'pat"t"ern', not 'pattern'. Also clarify that leading whitespaces are
not part of the pattern and document comment syntax.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If any error is noticed after the match_attr structure is allocated,
we shouldn't just return NULL from this function.
Add a fail_return label that frees the allocated structure and
returns NULL, and consistently jump there when we want to return
NULL after cleaning up.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It holds an interned string, and git_attr_name() is a way to peek
into it. Make sure the involved pointer types are pointer-to-const.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The double-loop wants to do an early return immediately when one
matching macro is found. Eliminate the extra variable 'a' used for
that purpose and rewrite the "assign the found item to 'a' to make
it non-NULL and force the loop(s) to terminate" with a direct return
from there.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When 82dce998 (attr: more matching optimizations from .gitignore,
2012-10-15) changed a pointer to a string "*pattern" into an
embedded "struct pattern" in struct match_attr, it forgot to update
the comment that describes the structure.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The low-level attribute and gitignore code will try to look
in $GIT_DIR/info for any repo-level configuration files,
even if we have not actually determined that we are in a
repository (e.g., running "git grep --no-index"). In such a
case they end up looking for ".git/info/attributes", etc.
This is generally harmless, as such a file is unlikely to
exist outside of a repository, but it's still conceptually
the wrong thing to do.
Let's detect this situation explicitly and skip reading the
file (i.e., the same behavior we'd get if we were in a
repository and the file did not exist).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up.
* ss/exc-flag-is-a-collection-of-bits:
dir: store EXC_FLAG_* values in unsigned integers
|
|
The values defined by the macro EXC_FLAG_* (1, 4, 8, 16) are stored
in fields of the structs "pattern" and "exclude", some functions
arguments and a local variable. None of these uses its most
significant bit in any special way and there is no good reason to
use a signed integer for them.
And while we're at it, document "flags" of "exclude" to explicitly
state the values it's supposed to take on.
Signed-off-by: Saurav Sachidanand <sauravsachidanand@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Using FLEX_ARRAY macros reduces the amount of manual
computation size we have to do. It also ensures we don't
overflow size_t, and it makes sure we write the same number
of bytes that we allocated.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Each of these cases can be converted to use ALLOC_ARRAY or
REALLOC_ARRAY, which has two advantages:
1. It automatically checks the array-size multiplication
for overflow.
2. It always uses sizeof(*array) for the element-size,
so that it can never go out of sync with the declared
type of the array.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
One of the most common uses of git_path() is to pass a
constant, like git_path("MERGE_MSG"). This has two
drawbacks:
1. The return value is a static buffer, and the lifetime
is dependent on other calls to git_path, etc.
2. There's no compile-time checking of the pathname. This
is OK for a one-off (after all, we have to spell it
correctly at least once), but many of these constant
strings appear throughout the code.
This patch introduces a series of functions to "memoize"
these strings, which are essentially globals for the
lifetime of the program. We compute the value once, take
ownership of the buffer, and return the cached value for
subsequent calls. cache.h provides a helper macro for
defining these functions as one-liners, and defines a few
common ones for global use.
Using a macro is a little bit gross, but it does nicely
document the purpose of the functions. If we need to touch
them all later (e.g., because we learned how to change the
git_dir variable at runtime, and need to invalidate all of
the stored values), it will be much easier to have the
complete list.
Note that the shared-global functions have separate, manual
declarations. We could do something clever with the macros
(e.g., expand it to a declaration in some places, and a
declaration _and_ a definition in path.c). But there aren't
that many, and it's probably better to stay away from
too-magical macros.
Likewise, if we abandon the C preprocessor in favor of
generating these with a script, we could get much fancier.
E.g., normalizing "FOO/BAR-BAZ" into "git_path_foo_bar_baz".
But the small amount of saved typing is probably not worth
the resulting confusion to readers who want to grep for the
function's definition.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up for xdg configuration path support.
* pt/xdg-config-path:
path.c: remove home_config_paths()
git-config: replace use of home_config_paths()
git-commit: replace use of home_config_paths()
credential-store.c: replace home_config_paths() with xdg_config_home()
dir.c: replace home_config_paths() with xdg_config_home()
attr.c: replace home_config_paths() with xdg_config_home()
path.c: implement xdg_config_home()
t0302: "unreadable" test needs POSIXPERM
t0302: test credential-store support for XDG_CONFIG_HOME
git-credential-store: support XDG_CONFIG_HOME
git-credential-store: support multiple credential files
|
|
Teach the codepaths that read .gitignore and .gitattributes files
that these files encoded in UTF-8 may have UTF-8 BOM marker at the
beginning; this makes it in line with what we do for configuration
files already.
* cn/bom-in-gitignore:
attr: skip UTF8 BOM at the beginning of the input file
config: use utf8_bom[] from utf.[ch] in git_parse_source()
utf8-bom: introduce skip_utf8_bom() helper
add_excludes_from_file: clarify the bom skipping logic
dir: allow a BOM at the beginning of exclude files
|
|
Code clean-up for xdg configuration path support.
* pt/xdg-config-path:
path.c: remove home_config_paths()
git-config: replace use of home_config_paths()
git-commit: replace use of home_config_paths()
credential-store.c: replace home_config_paths() with xdg_config_home()
dir.c: replace home_config_paths() with xdg_config_home()
attr.c: replace home_config_paths() with xdg_config_home()
path.c: implement xdg_config_home()
|
|
Since only the xdg attributes file path is required, simplify the code
by using xdg_config_home() instead of home_config_paths().
Signed-off-by: Paul Tan <pyokagan@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Teach the codepaths that read .gitignore and .gitattributes files
that these files encoded in UTF-8 may have UTF-8 BOM marker at the
beginning; this makes it in line with what we do for configuration
files already.
* cn/bom-in-gitignore:
attr: skip UTF8 BOM at the beginning of the input file
config: use utf8_bom[] from utf.[ch] in git_parse_source()
utf8-bom: introduce skip_utf8_bom() helper
add_excludes_from_file: clarify the bom skipping logic
dir: allow a BOM at the beginning of exclude files
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If we have never seen attr 'X' in any .gitattributes file we have
examined so far, we can be sure that 'X' is not defined. So no need to
go over all the attr stack to look for attr 'X'. This is the purpose
behind this new field maybe_real.
This optimization breaks down if macros are involved because we can't
know for sure what macro would expand to 'X' at attr parsing time. But
if we go the pessimistic way and assume all macros are expanded, we hit
the builtin "binary" macro. At least the "diff" attr defined in this
macro will disable this optimization for git-grep. So we wait until
any attr lines _may_ reference to a macro before we turn this off.
In git.git, this reduces the number of fill_one() call for "git grep
abcdefghi" from ~5348 to 2955. The optimization stops when it reads
t/.gitattributes, which uses 'binary' macro. We could probably reduce
it further by limiting the 'binary' reference to t/ and subdirs only
in this case.
"git grep" is actually a good example to justify this patch. The
command checks "diff" attribute on every file. People usually don't
define this attribute. But they pay the attr lookup penalty anyway
without this patch, proportional to the number of attr lines they have
in repo.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Keep track of all recognized macros in the new "maybe_macro" field.
If this field is true, it _may_ be a macro (depending on what's in the
current attr stack). But if the field is false, it's definitely not a
macro, no need to go through the whole attr stack in macroexpand_one()
to search for one.
Without this, "git grep abcdefghi" on git.git hits the inner loop in
macroexpand_one() 2481 times. With this, it's 66 times.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.
The change can be recreated by mechanically applying this:
$ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
grep -v strbuf\\.c |
xargs perl -pi -e '
s|!prefixcmp\(|starts_with\(|g;
s|prefixcmp\(|!starts_with\(|g;
s|!suffixcmp\(|ends_with\(|g;
s|suffixcmp\(|!ends_with\(|g;
'
on the result of preparatory changes in this series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Reduce duplicated code between convert.c and attr.c.
* lf/read-blob-data-from-index:
convert.c: remove duplicate code
read_blob_data_from_index(): optionally return the size of blob data
attr.c: extract read_index_data() as read_blob_data_from_index()
|
|
This allows for optionally getting the size of the returned data and
will be used in a follow-up patch.
Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Extract the read_index_data() function from attr.c and move it to
read-cache.c; rename it to read_blob_data_from_index() and update
the function signature of it to align better with index/cache API
functions.
This allows for reusing the function in convert.c later.
Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A pattern "dir" (without trailing slash) in the attributes file
stopped matching a directory "dir" by mistake with an earlier change
that wanted to allow pattern "dir/" to also match.
* jc/directory-attrs-regression-fix:
t: check that a pattern without trailing slash matches a directory
dir.c::match_pathname(): pay attention to the length of string parameters
dir.c::match_pathname(): adjust patternlen when shifting pattern
dir.c::match_basename(): pay attention to the length of string parameters
attr.c::path_matches(): special case paths that end with a slash
attr.c::path_matches(): the basename is part of the pathname
|
|
Fix 1.8.1.x regression that stopped matching "dir" (without
trailing slash) to a directory "dir".
* jc/directory-attrs-regression-fix:
t: check that a pattern without trailing slash matches a directory
dir.c::match_pathname(): pay attention to the length of string parameters
dir.c::match_pathname(): adjust patternlen when shifting pattern
dir.c::match_basename(): pay attention to the length of string parameters
attr.c::path_matches(): special case paths that end with a slash
attr.c::path_matches(): the basename is part of the pathname
|
|
The function is given a string that ends with a slash to signal that
the path is a directory to make sure that a pattern that ends with a
slash (i.e. MUSTBEDIR) can tell directories and non-directories
apart. However, the pattern itself (pat->pattern and
pat->patternlen) that came from such a MUSTBEDIR pattern is
represented as a string that ends with a slash, but patternlen does
not count that trailing slash. A MUSTBEDIR pattern "element/" is
represented as a counted string <"element/", 7> and this must match
match pathname "element/".
Because match_basename() and match_pathname() want to see pathname
"element" to match against the pattern <"element/", 7>, reduce the
length of the path to exclude the trailing slash when calling
these functions.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The function takes two strings (pathname and basename) as if they
are independent strings, but in reality, the latter is always
pointing into a substring in the former.
Clarify this relationship by expressing the latter as an offset into
the former.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
|
|
Before 82dce99 (attr: more matching optimizations from .gitignore,
2012-10-15), .gitattributes did not have any special treatment of a
leading '!'. The docs, however, always said
The rules how the pattern matches paths are the same as in
`.gitignore` files; see linkgit:gitignore[5].
By those rules, leading '!' means pattern negation. So 82dce99
correctly determined that this kind of line makes no sense and should
be disallowed.
However, users who actually had a rule for files starting with a '!'
are in a bad position: before 82dce99 '!' matched that literal
character, so it is conceivable that users have .gitattributes with
such lines in them. After 82dce99 the unescaped version was
disallowed in such a way that git outright refuses to run(!) most
commands in the presence of such a .gitattributes. It therefore
becomes very hard to fix, let alone work with, such repositories.
Let's at least allow the users to fix their repos: change the fatal
error into a warning.
Reported-by: mathstuf@gmail.com
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The attribute mechanism didn't allow limiting attributes to be
applied to only a single directory itself with "path/" like the
exclude mechanism does. The initial implementation of this that was
merged to 'maint' and 1.8.1.1 had severe performance degradations.
* nd/fix-directory-attrs-off-by-one:
attr: avoid calling find_basename() twice per path
attr: fix off-by-one directory component length calculation
|
|
* nd/attr-debug-fix:
attr: make it build with DEBUG_ATTR again
|
|
Fix performance regression introduced by an earlier change to let
attributes apply to directories.
Needs to be merged to maint, as 94bc671a was merged there already.
* nd/fix-directory-attrs-off-by-one:
attr: avoid calling find_basename() twice per path
attr: fix off-by-one directory component length calculation
|
|
Fix debugging support that was broken in earlier change.
* nd/attr-debug-fix:
attr: make it build with DEBUG_ATTR again
|
|
find_basename() is only used inside collect_all_attrs(), called once
in prepare_attr_stack, then again after prepare_attr_stack()
returns. Both calls return exact same value. Reorder the code to do
the same task once. Also avoid strlen() because we knows the length
after finding basename.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Commit 82dce99 (attr: more matching optimizations from .gitignore -
2012-10-15) changed match_attr structure but it did not update
DEBUG_ATTR-specific code. This fixes it.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
94bc671 (Add directory pattern matching to attributes - 2012-12-08)
uses find_basename() to calculate the length of directory part in
prepare_attr_stack. This function expects the directory without the
trailing slash (as "origin" field in match_attr struct is without the
trailing slash). find_basename() includes the trailing slash and
confuses push/pop algorithm.
Consider path = "abc/def" and the push down code:
while (1) {
len = strlen(attr_stack->origin);
if (dirlen <= len)
break;
cp = memchr(path + len + 1, '/', dirlen - len - 1);
if (!cp)
cp = path + dirlen;
dirlen is 4, not 3, without this patch. So when attr_stack->origin is
"abc", it'll miss the exit condition because 4 <= 3 is wrong. It'll
then try to push "abc/" down the attr stack (because "cp" would be
NULL). So we have both "abc" and "abc/" in the stack.
Next time when "abc/ghi" is checked, "abc/" is popped out because of
the off-by-one dirlen, only to be pushed back in again by the above
code. This repeats for all files in the same directory. Which means
at least one failed open syscall per file, or more if .gitattributes
exists.
This is the perf result with 10 runs on git.git:
Test 94bc671^ 94bc671 HEAD
----------------------------------------------------------------------------------------------------------
7810.1: grep worktree, cheap regex 0.02(0.01+0.04) 0.05(0.03+0.05) +150.0% 0.02(0.01+0.04) +0.0%
7810.2: grep worktree, expensive regex 0.25(0.94+0.01) 0.26(0.94+0.02) +4.0% 0.25(0.93+0.02) +0.0%
7810.3: grep --cached, cheap regex 0.11(0.10+0.00) 0.12(0.10+0.02) +9.1% 0.10(0.10+0.00) -9.1%
7810.4: grep --cached, expensive regex 0.61(0.60+0.01) 0.62(0.61+0.01) +1.6% 0.61(0.60+0.00) +0.0%
Reported-by: Ross Lagerwall <rosslagerwall@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Refactor and generally clean up the directory traversal API
implementation.
* as/dir-c-cleanup:
dir.c: rename free_excludes() to clear_exclude_list()
dir.c: refactor is_path_excluded()
dir.c: refactor is_excluded()
dir.c: refactor is_excluded_from_list()
dir.c: rename excluded() to is_excluded()
dir.c: rename excluded_from_list() to is_excluded_from_list()
dir.c: rename path_excluded() to is_path_excluded()
dir.c: rename cryptic 'which' variable to more consistent name
Improve documentation and comments regarding directory traversal API
api-directory-listing.txt: update to match code
|
|
Continue adopting clearer names for exclude functions. This is_*
naming pattern for functions returning booleans was discussed here:
http://thread.gmane.org/gmane.comp.version-control.git/204661/focus=204924
Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The manpage of gitattributes says: "The rules how the pattern
matches paths are the same as in .gitignore files" and the gitignore
pattern matching has a pattern ending with / for directory matching.
This rule is specifically relevant for the 'export-ignore' rule used
for git archive.
Signed-off-by: Jean-Noel Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Start laying the foundation to build the "wildmatch" after we can
agree on its desired semantics.
* nd/attr-match-optim-more:
attr: more matching optimizations from .gitignore
gitignore: make pattern parsing code a separate function
exclude: split pathname matching code into a separate function
exclude: fix a bug in prefix compare optimization
exclude: split basename matching code into a separate function
exclude: stricten a length check in EXC_FLAG_ENDSWITH case
|
|
Trivial and obvious optimization for finding attributes that match
a given path.
* nd/attr-match-optim:
attr: avoid searching for basename on every match
attr: avoid strlen() on every match
|
|
.gitattributes and .gitignore share the same pattern syntax but has
separate matching implementation. Over the years, ignore's
implementation accumulates more optimizations while attr's stays the
same.
This patch reuses the core matching functions that are also used by
excluded_from_list. excluded_from_list and path_matches can't be
merged due to differences in exclude and attr, for example:
* "!pattern" syntax is forbidden in .gitattributes. As an attribute
can be unset (i.e. set to a special value "false") or made back to
unspecified (i.e. not even set to "false"), "!pattern attr" is unclear
which one it means.
* we support attaching attributes to directories, but git-core
internally does not currently make use of attributes on
directories.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The attribute system may be asked for a path that itself or its
leading directories no longer exists in the working tree. Failure
to open per-directory .gitattributes with error status other than
ENOENT and ENOTDIR are diagnosed.
* jk/config-warn-on-inaccessible-paths:
attr: failure to open a .gitattributes file is OK with ENOTDIR
|
|
"git merge -Xtheirs" did not help content-level merge of binary
files; it should just take their version. Also "*.jpg binary" in
the attributes did not imply they should use the binary ll-merge
driver.
* jc/ll-merge-binary-ours:
ll-merge: warn about inability to merge binary files only when we can't
attr: "binary" attribute should choose built-in "binary" merge driver
merge: teach -Xours/-Xtheirs to binary ll-merge driver
|
|
Often we consult an in-tree .gitattributes file that exists per
directory. Majority of directories do not usually have such a file,
and it is perfectly fine if we cannot open it because there is no
such file, but we do want to know when there is an I/O or permission
error. Earlier, we made the codepath warn when we fail to open it
for reasons other than ENOENT for that reason.
We however sometimes have to attempt to open the .gitattributes file
from a directory that does not exist in the commit that is currently
checked out. "git pack-objects" wants to know if a path is marked
with "-delta" attributes, and "git archive" wants to know about
export-ignore and export-subst attributes. Both commands may and do
need to ask the attributes system about paths in an arbitrary
commit. "git diff", after removing an entire directory, may want to
know textconv on paths that used to be in that directory.
Make sure we also ignore a failure to open per-directory attributes
file due to ENOTDIR.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The built-in "binary" attribute macro expands to "-diff -text", so
that textual diff is not produced, and the contents will not go
through any CR/LF conversion ever. During a merge, it should also
choose the "binary" low-level merge driver, but it didn't.
Make it expand to "-diff -merge -text".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The previous series introduced warnings to multiple places, but it
could become tiring to see the warning on the same path over and
over again during a single run of Git. Making just one function
responsible for issuing this warning, we could later choose to keep
track of which paths we issued a warning (it would involve a hash
table of paths after running them through real_path() or something)
in order to reduce noise.
Right now we do not know if the noise reduction is necessary, but it
still would be a good code reduction/sharing anyway.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Just like config and gitignore files, we silently ignore
missing or inaccessible attribute files. An existent but
inaccessible file is probably a configuration error, so
let's warn the user.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If we don't have a core.attributesfile configured, we fall
back to checking XDG config, which is usually
$HOME/.config/git/attributes.
However, if $HOME is unset, then home_config_paths will return
NULL, and we end up calling fopen(NULL).
Depending on your system, this may or may not cause the
accompanying test to fail (e.g., on Linux and glibc, the
address will go straight to open, which will return EFAULT).
However, valgrind will reliably notice the error.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This gives the default value for the core.attributesfile variable
following the exact same logic of the previous change for the
core.excludesfile setting.
Signed-off-by: Huynh Khoi Nguyen Nguyen <Huynh-Khoi-Nguyen.Nguyen@ensimag.imag.fr>
Signed-off-by: Valentin Duperray <Valentin.Duperray@ensimag.imag.fr>
Signed-off-by: Franck Jonas <Franck.Jonas@ensimag.imag.fr>
Signed-off-by: Lucien Kong <Lucien.Kong@ensimag.imag.fr>
Signed-off-by: Thomas Nguy <Thomas.Nguy@ensimag.imag.fr>
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* maint-1.7.6:
attr: fix leak in free_attr_elem
t2203: fix wrong commit command
|
|
This function frees the individual "struct match_attr"s we
have allocated, but forgot to free the array holding their
pointers, leading to a minor memory leak (but it can add up
after checking attributes for paths in many directories).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Thas would de-dent the body of a function that has grown rather large over
time, making it a bit easier to read.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In prepare_attr_stack, we pop the old elements of the stack
(which were left from a previous lookup and may or may not
be useful to us). Our loop to do so checks that we never
reach the top of the stack. However, the code immediately
afterwards will segfault if we did actually reach the top of
the stack.
Fortunately, this is not an actual bug, since we will never
pop all of the stack elements (we will always keep the root
gitattributes, as well as the builtin ones). So the extra
check in the loop condition simply clutters the code and
makes the intent less clear. Let's get rid of it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When we prepare the attribute stack for a lookup on a path,
we start with the cached stack from the previous lookup
(because it is common to do several lookups in the same
directory hierarchy). So the first thing we must do in
preparing the stack is to pop any entries that point to
directories we are no longer interested in.
For example, if our stack contains gitattributes for:
foo/bar/baz
foo/bar
foo
but we want to do a lookup in "foo/bar/bleep", then we want
to pop the top element, but retain the others.
To do this we walk down the stack from the top, popping
elements that do not match our lookup directory. However,
the test do this simply checked strncmp, meaning we would
mistake "foo/bar/baz" as a leading directory of
"foo/bar/baz_plus". We must also check that the character
after our match is '/', meaning we matched the whole path
component.
There are two special cases to consider:
1. The top of our attr stack has the empty path. So we
must not check for '/', but rather special-case the
empty path, which always matches.
2. Typically when matching paths in this way, you would
also need to check for a full string match (i.e., the
character after is '\0'). We don't need to do so in
this case, though, because our path string is actually
just the directory component of the path to a file
(i.e., we know that it terminates with "/", because the
filename comes after that).
Helped-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When core.ignorecase is true, the file globs configured in the
.gitattributes file should be matched case-insensitively against the paths
in the working directory. Let's do so.
Plus, add some tests.
The last set of tests is performed only on a case-insensitive filesystem.
Those tests make sure that git handles the case where the .gitignore file
resides in a subdirectory and the user supplies a path that does not match
the case in the filesystem. In that case^H^H^H^Hsituation, part of the
path supplied by the user is effectively interpreted case-insensitively,
and part of it is dependent on the setting of core.ignorecase. git will
currently only match the portion of the path below the directory holding
the .gitignore file according to the setting of core.ignorecase.
This is also partly future-proofing. Currently, git builds the attr stack
based on the path supplied by the user, so we don't have to do anything
special (like use strcmp_icase) to handle the parts of that path that don't
match the filesystem with respect to case. If git instead built the attr
stack by scanning the repository, then the paths in the origin field would
not necessarily match the paths supplied by the user. If someone makes a
change like that in the future, these tests will notice.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This code calls git_config from a helper function to parse the config entry
it is interested in. Calling git_config in this way may cause a problem if
the helper function can be called after a previous call to git_config by
another function since the second call to git_config may reset some
variable to the value in the config file which was previously overridden.
The above is not a problem in this case since the function passed to
git_config only parses one config entry and the variable it sets is not
assigned outside of the parsing function. But a programmer who desires
all of the standard config options to be parsed may be tempted to modify
git_attr_config() so that it falls back to git_default_config() and then it
_would_ be vulnerable to the above described behavior.
So, move the call to git_config up into the top-level cmd_* function and
move the responsibility for parsing core.attributesfile into the main
config file parser.
Which is only the logical thing to do ;-)
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "x"-prefixed versions of strdup, malloc, etc. will check whether the
allocation was successful and terminate the process otherwise.
A few uses of malloc were left alone since they already implemented a
graceful path of failure or were in a quasi external library like xdiff.
Additionally, the call to malloc in compat/win32/syslog.c was not modified
since the syslog() implemented there is a die handler and a call to the
x-wrappers within a die handler could result in recursion should memory
allocation fail. This will have to be addressed separately.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This code sequence performs a strcpy into the buf member of a strbuf
struct. The strcpy may move the position of the terminating nul of the
string and effectively change the length of string so that it does not
match the len member of the strbuf struct.
Currently, this sequence works since the strbuf was given a hint when it
was initialized to allocate enough space to accomodate the string that will
be strcpy'ed, but this is an implementation detail of strbufs, not a
guarantee.
So, lets rework this sequence so that the strbuf is only manipulated by
strbuf functions, and direct modification of its "buf" member is not
necessary.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* mh/attr:
Unroll the loop over passes
Change while loop into for loop
Determine the start of the states outside of the pass loop
Change parse_attr() to take a pointer to struct attr_state
Increment num_attr in parse_attr_line(), not parse_attr()
Document struct match_attr
Add a file comment
|
|
The passes no longer share much code, and the unrolled code is easier
to understand.
Use a new index variable instead of num_attr for the second loop, as
we are no longer counting attributes but rather indexing through them.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|