aboutsummaryrefslogtreecommitdiffstats
path: root/refs/files-backend.c
AgeCommit message (Collapse)AuthorFilesLines
11 daysMerge branch 'kn/maintenance-is-needed'Junio C Hamano1-0/+11
"git maintenance" command learned "is-needed" subcommand to tell if it is necessary to perform various maintenance tasks. * kn/maintenance-is-needed: maintenance: add 'is-needed' subcommand maintenance: add checking logic in `pack_refs_condition()` refs: add a `optimize_required` field to `struct ref_storage_be` reftable/stack: add function to check if optimization is required reftable/stack: return stack segments directly
13 daysMerge branch 'kn/refs-optim-cleanup'Junio C Hamano1-16/+6
Code clean-up. * kn/refs-optim-cleanup: t/pack-refs-tests: move the 'test_done' to callees refs: rename 'pack_refs_opts' to 'refs_optimize_opts' refs: move to using the '.optimize' functions
13 daysMerge branch 'ps/ref-peeled-tags'Junio C Hamano1-48/+23
Some ref backend storage can hold not just the object name of an annotated tag, but the object name of the object the tag points at. The code to handle this information has been streamlined. * ps/ref-peeled-tags: t7004: do not chdir around in the main process ref-filter: fix stale parsed objects ref-filter: parse objects on demand ref-filter: detect broken tags when dereferencing them refs: don't store peeled object IDs for invalid tags object: add flag to `peel_object()` to verify object type refs: drop infrastructure to peel via iterators refs: drop `current_ref_iter` hack builtin/show-ref: convert to use `reference_get_peeled_oid()` ref-filter: propagate peeled object ID upload-pack: convert to use `reference_get_peeled_oid()` refs: expose peeled object ID via the iterator refs: refactor reference status flags refs: fully reset `struct ref_iterator::ref` on iteration refs: introduce `.ref` field for the base iterator refs: introduce wrapper struct for `each_ref_fn`
2025-11-10refs: add a `optimize_required` field to `struct ref_storage_be`Karthik Nayak1-0/+11
To allow users of the refs namespace to check if the reference backend requires optimization, add a new field `optimize_required` field to `struct ref_storage_be`. This field is of type `optimize_required_fn` which is also introduced in this commit. Modify the debug, files, packed and reftable backend to implement this field. A following commit will expose this via 'git pack-refs' and 'git refs optimize'. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06Merge branch 'pk/reflog-migrate-message-fix'Junio C Hamano1-1/+1
Message fix. * pk/reflog-migrate-message-fix: refs: add missing space in messages
2025-11-05refs: add missing space in messagesPeter Krefting1-1/+1
Signed-off-by: Peter Krefting <peter@softwolves.pp.se> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04Merge branch 'kn/refs-optim-cleanup' into kn/maintenance-is-neededJunio C Hamano1-16/+6
* kn/refs-optim-cleanup: t/pack-refs-tests: move the 'test_done' to callees refs: rename 'pack_refs_opts' to 'refs_optimize_opts' refs: move to using the '.optimize' functions
2025-11-04Merge branch 'ps/ref-peeled-tags' into kn/maintenance-is-neededJunio C Hamano1-48/+23
* ps/ref-peeled-tags: (23 commits) t7004: do not chdir around in the main process ref-filter: fix stale parsed objects ref-filter: parse objects on demand ref-filter: detect broken tags when dereferencing them refs: don't store peeled object IDs for invalid tags object: add flag to `peel_object()` to verify object type refs: drop infrastructure to peel via iterators refs: drop `current_ref_iter` hack builtin/show-ref: convert to use `reference_get_peeled_oid()` ref-filter: propagate peeled object ID upload-pack: convert to use `reference_get_peeled_oid()` refs: expose peeled object ID via the iterator refs: refactor reference status flags refs: fully reset `struct ref_iterator::ref` on iteration refs: introduce `.ref` field for the base iterator refs: introduce wrapper struct for `each_ref_fn` builtin/repo: add progress meter for structure stats builtin/repo: add keyvalue and nul format for structure stats builtin/repo: add object counts in structure output builtin/repo: introduce structure subcommand ...
2025-11-04refs: rename 'pack_refs_opts' to 'refs_optimize_opts'Karthik Nayak1-5/+5
The previous commit removed all references to 'pack_refs()' within the refs subsystem. Continue this cleanup by also renaming 'pack_refs_opts' to 'refs_optimize_opts' and the respective flags accordingly. Keeping the naming consistent will make the code easier to maintain. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: move to using the '.optimize' functionsKarthik Nayak1-12/+2
The `struct ref_store` variable exposes two ways to optimize a reftable backend: 1. pack_refs 2. optimize The former was specific to the 'files' + 'packed' refs backend. The latter is more generic and covers all backends. While the naming is different, both of these functions perform the same functionality. Consolidate this code to only maintain the 'optimize' functions. Do this by modifying the backends so that they exclusively implement the `optimize` callback, only. All users of the refs subsystem already use the 'optimize' function so there is no changes needed on the callee side. Finally, cleanup all references to the 'pack_refs' field of the structure and code around it. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: drop infrastructure to peel via iteratorsPatrick Steinhardt1-17/+0
Now that the peeled object ID gets propagated via the `struct reference` there is no need anymore to call into the reference iterator itself to dereference an object. Remove this infrastructure. Most of the changes are straight-forward deletions of code. There is one exception though in `refs/packed-backend.c::write_with_updates()`. Here we stop peeling the iterator and instead just pass the peeled object ID of that iterator directly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: introduce `.ref` field for the base iteratorPatrick Steinhardt1-26/+21
The base iterator has a couple of fields that tracks the name, target, object ID and flags for the current reference. Due to this design we have to create a new `struct reference` whenever we want to hand over that reference to the callback function, which is tedious and not very efficient. Convert the structure to instead contain a `struct reference` as member. This member is expected to be populated by the implementations of the iterator and is handed over to the callback directly. While at it, simplify `should_pack_ref()` to take a `struct reference` directly instead of passing its respective fields. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: introduce wrapper struct for `each_ref_fn`Patrick Steinhardt1-5/+2
The `each_ref_fn` callback function type is used across our code base for several different functions that iterate through reference. There's a bunch of callbacks implementing this type, which makes any changes to the callback signature extremely noisy. An example of the required churn is e8207717f1 (refs: add referent to each_ref_fn, 2024-08-09): adding a single argument required us to change 48 files. It was already proposed back then [1] that we might want to introduce a wrapper structure to alleviate the pain going forward. While this of course requires the same kind of global refactoring as just introducing a new parameter, it at least allows us to more change the callback type afterwards by just extending the wrapper structure. One counterargument to this refactoring is that it makes the structure more opaque. While it is obvious which callsites need to be fixed up when we change the function type, it's not obvious anymore once we use a structure. That being said, we only have a handful of sites that actually need to populate this wrapper structure: our ref backends, "refs/iterator.c" as well as very few sites that invoke the iterator callback functions directly. Introduce this wrapper structure so that we can adapt the iterator interfaces more readily. [1]: <ZmarVcF5JjsZx0dl@tanuki> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30Merge branch 'ps/symlink-symref-deprecation'Junio C Hamano1-2/+17
"Symlink symref" has been added to the list of things that will disappear at Git 3.0 boundary. * ps/symlink-symref-deprecation: refs/files: deprecate writing symrefs as symbolic links
2025-10-26Merge branch 'js/unreachable-workaround-for-no-symlink-head' into maint-2.51Junio C Hamano1-1/+7
Code clean-up. * js/unreachable-workaround-for-no-symlink-head: refs: forbid clang to complain about unreachable code
2025-10-20Merge branch 'js/unreachable-workaround-for-no-symlink-head'Junio C Hamano1-1/+7
Code clean-up. * js/unreachable-workaround-for-no-symlink-head: refs: forbid clang to complain about unreachable code
2025-10-15Merge branch 'kn/refs-files-case-insensitive' into maint-2.51Junio C Hamano1-12/+66
Deal more gracefully with directory / file conflicts when the files backend is used for ref storage, by failing only the ones that are involved in the conflict while allowing others. * kn/refs-files-case-insensitive: refs/files: handle D/F conflicts during locking refs/files: handle F/D conflicts in case-insensitive FS refs/files: use correct error type when lock exists refs/files: catch conflicts on case-insensitive file-systems
2025-10-15Merge branch 'ps/reflog-migrate-fixes' into maint-2.51Junio C Hamano1-7/+58
"git refs migrate" to migrate the reflog entries from a refs backend to another had a handful of bugs squashed. * ps/reflog-migrate-fixes: refs: fix invalid old object IDs when migrating reflogs refs: stop unsetting REF_HAVE_OLD for log-only updates refs/files: detect race when generating reflog entry for HEAD refs: fix identity for migrated reflogs ident: fix type of string length parameter builtin/reflog: implement subcommand to write new entries refs: export `ref_transaction_update_reflog()` builtin/reflog: improve grouping of subcommands Documentation/git-reflog: convert to use synopsis type
2025-10-15refs/files: deprecate writing symrefs as symbolic linksPatrick Steinhardt1-2/+17
The "files" backend has the ability to store symbolic refs as symbolic links, which can be configured via "core.preferSymlinkRefs". This feature stems back from the early days: the initial implementation of symbolic refs used symlinks exclusively. The symref format was only introduced in 9b143c6e15 (Teach update-ref about a symbolic ref stored in a textfile., 2005-09-25) and made the default in 9f0bb90d16 (core.prefersymlinkrefs: use symlinks for .git/HEAD, 2006-05-02). This is all about 20 years ago, and there are no known reasons nowadays why one would want to use symlinks instead of symrefs. Mark the feature for deprecation in Git 3.0. Note that this only deprecates _writing_ symrefs as symbolic links. Reading such symrefs is still supported for now. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-13Merge branch 'kn/reftable-consistency-checks'Junio C Hamano1-3/+0
The reftable backend learned to sanity check its on-disk data more carefully. * kn/reftable-consistency-checks: refs/reftable: add fsck check for checking the table name reftable: add code to facilitate consistency checks fsck: order 'fsck_msg_type' alphabetically Documentation/fsck-msgids: remove duplicate msg id reftable: check for trailing newline in 'tables.list' refs: move consistency check msg to generic layer refs: remove unused headers
2025-10-09refs: forbid clang to complain about unreachable codeJohannes Schindelin1-1/+7
When `NO_SYMLINK_HEAD` is defined, `create_ref_symlink()` is hard-coded as `(-1)`, and as a consequence the condition `!create_ref_symlink()` always evaluates to false, rendering any code guarded by that condition unreachable. Therefore, clang is _technically_ correct when it complains about unreachable code. It does completely miss the fact that this is okay because on _other_ platforms, where `NO_SYMLINK_HEAD` is not defined, the code isn't unreachable at all. Let's use the same trick as in 82e79c63642c (git-compat-util: add NOT_CONSTANT macro and use it in atfork_prepare(), 2025-03-17) to appease clang while at the same time keeping the `-Wunreachable` flag to potentially find _actually_ unreachable code. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-07refs: move consistency check msg to generic layerKarthik Nayak1-2/+0
The files-backend prints a message before the consistency checks run. Move this to the generic layer so both the files and reftable backend can benefit from this message. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-07refs: remove unused headersKarthik Nayak1-1/+0
In the 'refs/' namespace, some of the included header files are not needed, let's remove them. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-02Merge branch 'ms/refs-optimize'Junio C Hamano1-0/+10
"git refs optimize" is added for not very well explained reason despite it does the same thing as "git pack-refs"... * ms/refs-optimize: t: add test for git refs optimize subcommand t0601: refactor tests to be shareable builtin/refs: add optimize subcommand doc: pack-refs: factor out common options builtin/pack-refs: factor out core logic into a shared library builtin/pack-refs: convert to use the generic refs_optimize() API reftable-backend: implement 'optimize' action files-backend: implement 'optimize' action refs: add a generic 'optimize' API
2025-09-29Merge branch 'kn/refs-files-case-insensitive'Junio C Hamano1-12/+66
Deal more gracefully with directory / file conflicts when the files backend is used for ref storage, by failing only the ones that are involved in the conflict while allowing others. * kn/refs-files-case-insensitive: refs/files: handle D/F conflicts during locking refs/files: handle F/D conflicts in case-insensitive FS refs/files: use correct error type when lock exists refs/files: catch conflicts on case-insensitive file-systems
2025-09-19files-backend: implement 'optimize' actionMeet Soni1-0/+10
With the generic `refs_optimize()` API now in place, provide the first implementation for the 'files' reference backend. This makes the new API functional for existing repositories and serves as the foundation for migrating user-facing commands to the new architecture. The implementation simply calls the existing `files_pack_refs()` function, as 'packing' is the method used to optimize the files-based reference store. Wire up the new `files_optimize()` function to the `optimize` slot in the files backend's virtual table. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Meet Soni <meetsoni3017@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-17refs/files: handle D/F conflicts during lockingKarthik Nayak1-5/+6
The previous commit added the necessary validation and checks for F/D conflicts in the files backend when working on case insensitive systems. There is still a possibility for D/F conflicts. This is a different from the F/D since for F/D conflicts, there would not be a conflict during the lock creation phase: refs/heads/foo.lock refs/heads/foo/bar.lock However there would be a conflict when the locks are committed, since we cannot have 'refs/heads/foo/bar' and 'refs/heads/foo'. These kinds of conflicts are checked and resolved in `refs_verify_refnames_available()`, so the previous commit ensured that for case-insensitive filesystems, we would lowercase the inputs to that function. For D/F conflicts, there is a conflict during the lock creation phase itself: refs/heads/foo/bar.lock refs/heads/foo.lock As in `lock_raw_ref()` after creating the lock, we also check for D/F conflicts. This can occur in case-insensitive filesystems when trying to fetch case-conflicted references like: refs/heads/Foo/new refs/heads/foo D/F conflicts can also occur in case-sensitive filesystems, when the repository already contains a directory with a lock file 'refs/heads/foo/bar.lock' and trying to fetch 'refs/heads/foo'. This doesn't concern directories containing garbage files as those are handled on a higher level. To fix this, simply categorize the error as a name conflict. Also remove this reference from the list of valid refnames for availability checks. By categorizing the error and removing it from the list of valid references, batched updates now knows to reject such reference updates and apply the other reference updates. Fix a small typo in `ref_transaction_maybe_set_rejected()` while here. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-17refs/files: handle F/D conflicts in case-insensitive FSKarthik Nayak1-2/+17
When using the files-backend on case-insensitive filesystems, there is possibility of hitting F/D conflicts when creating references within a single transaction, such as: - 'refs/heads/foo' - 'refs/heads/Foo/bar' Ideally such conflicts are caught in `refs_verify_refnames_available()` which is responsible for checking F/D conflicts within a given transaction. This utility function is shared across the reference backends. As such, it doesn't consider the issues of using a case-insensitive file system, which only affects the files-backend. While one solution would be to make the function aware of such issues, this feels like leaking implementation details of file-backend specific issues into the utility function. So opt for the more simpler option, of lowercasing all references sent to this function when on a case-insensitive filesystem and operating on the files-backend. To do this, simply use a `struct strbuf` to convert the refname to lowercase and append it to the list of refnames to be checked. Since we use a `struct strbuf` and the memory is cleared right after, make sure that the string list duplicates all provided string. Without this change, the user would simply be left with a repository with '.lock' files which were created in the 'prepare' phase of the transaction, as the 'commit' phase would simply abort and not do the necessary cleanup. Reported-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-17refs/files: use correct error type when lock existsKarthik Nayak1-3/+18
When fetching references into a repository, if a lock for a particular reference exists, then `lock_raw_ref()` throws: - REF_TRANSACTION_ERROR_CASE_CONFLICT: when there is a conflict because the transaction contains conflicting references while being on a case-insensitive filesystem. - REF_TRANSACTION_ERROR_GENERIC: for all other errors. The latter causes the entire set of batched updates to fail, even in case sensitive filessystems. Instead, return a 'REF_TRANSACTION_ERROR_CREATE_EXISTS' error. This allows batched updates to reject the individual update which conflicts with the existing file, while updating the rest of the references. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-17refs/files: catch conflicts on case-insensitive file-systemsKarthik Nayak1-5/+28
During the 'prepare' phase of a reference transaction in the files backend, we create the lock files for references to be created. When using batched updates on case-insensitive filesystems, the entire batched updates would be aborted if there are conflicting names such as: refs/heads/Foo refs/heads/foo This affects all commands which were migrated to use batched updates in Git 2.51, including 'git-fetch(1)' and 'git-receive-pack(1)'. Before that, reference updates would be applied serially with one transaction used per update. When users fetched multiple references on case-insensitive systems, subsequent references would simply overwrite any earlier references. So when fetching: refs/heads/foo: 5f34ec0bfeac225b1c854340257a65b106f70ea6 refs/heads/Foo: ec3053b0977e83d9b67fc32c4527a117953994f3 refs/heads/sample: 2eefd1150e06d8fca1ddfa684dec016f36bf4e56 The user would simply end up with: refs/heads/foo: ec3053b0977e83d9b67fc32c4527a117953994f3 refs/heads/sample: 2eefd1150e06d8fca1ddfa684dec016f36bf4e56 This is buggy behavior since the user is never informed about the overrides performed and missing references. Nevertheless, the user is left with a working repository with a subset of the references. Since Git 2.51, in such situations fetches would simply fail without updating any references. Which is also buggy behavior and worse off since the user is left without any references. The error is triggered in `lock_raw_ref()` where the files backend attempts to create a lock file. When a lock file already exists the function returns a 'REF_TRANSACTION_ERROR_GENERIC'. When this happens, the entire batched updates, not individual operation, is aborted as if it were in a transaction. Change this to return 'REF_TRANSACTION_ERROR_CASE_CONFLICT' instead to aid the batched update mechanism to simply reject such errors. The change only affects batched updates since batched updates will reject individual updates with non-generic errors. So specifically this would only affect: 1. git fetch 2. git receive-pack 3. git update-ref --batch-updates This bubbles the error type up to `files_transaction_prepare()` which tries to lock each reference update. So if the locking fails, we check if the rejection type can be ignored, which is done by calling `ref_transaction_maybe_set_rejected()`. As the error type is now 'REF_TRANSACTION_ERROR_CASE_CONFLICT', the specific reference update would simply be rejected, while other updates in the transaction would continue to be applied. This allows partial application of references in case-insensitive filesystems when fetching colliding references. While the earlier implementation allowed the last reference to be applied overriding the initial references, this change would allow the first reference to be applied while rejecting consequent collisions. This should be an okay compromise since with the files backend, there is no scenario possible where we would retain all colliding references. Let's also be more proactive and notify users on case-insensitive filesystems about such problems by providing a brief about the issue while also recommending using the reftable backend, which doesn't have the same issue. Reported-by: Joe Drew <joe.drew@indexexchange.com> Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-29Merge branch 'jk/no-clobber-dangling-symref-with-fetch'Junio C Hamano1-4/+30
"git fetch" can clobber a symref that is dangling when the remote-tracking HEAD is set to auto update, which has been corrected. * jk/no-clobber-dangling-symref-with-fetch: refs: do not clobber dangling symrefs t5510: prefer "git -C" to subshell for followRemoteHEAD tests t5510: stop changing top-level working directory t5510: make confusing config cleanup more explicit
2025-08-21Merge branch 'ps/remote-rename-fix'Junio C Hamano1-6/+9
"git remote rename origin upstream" failed to move origin/HEAD to upstream/HEAD when origin/HEAD is unborn and performed other renames extremely inefficiently, which has been corrected. * ps/remote-rename-fix: builtin/remote: only iterate through refs that are to be renamed builtin/remote: rework how remote refs get renamed builtin/remote: determine whether refs need renaming early on builtin/remote: fix sign comparison warnings refs: simplify logic when migrating reflog entries refs: pass refname when invoking reflog entry callback
2025-08-21Merge branch 'ps/reflog-migrate-fixes'Junio C Hamano1-7/+58
"git refs migrate" to migrate the reflog entries from a refs backend to another had a handful of bugs squashed. * ps/reflog-migrate-fixes: refs: fix invalid old object IDs when migrating reflogs refs: stop unsetting REF_HAVE_OLD for log-only updates refs/files: detect race when generating reflog entry for HEAD refs: fix identity for migrated reflogs ident: fix type of string length parameter builtin/reflog: implement subcommand to write new entries refs: export `ref_transaction_update_reflog()` builtin/reflog: improve grouping of subcommands Documentation/git-reflog: convert to use synopsis type
2025-08-19refs: do not clobber dangling symrefsJeff King1-4/+30
When given an expected "before" state, the ref-writing code will avoid overwriting any ref that does not match that expected state. We use the null oid as a sentinel value for "nothing should exist", and likewise that is the sentinel value we get when trying to read a ref that does not exist. But there's one corner case where this is ambiguous: dangling symrefs. Trying to read them will yield the null oid, but there is potentially something of value there: the dangling symref itself. For a normal recursive write, this is OK. Imagine we have a symref "FOO_HEAD" that points to a ref "refs/heads/bar" that does not exist, and we try to write to it with a create operation like: oid=$(git rev-parse HEAD) ;# or whatever git symbolic-ref FOO_HEAD refs/heads/bar echo "create FOO_HEAD $oid" | git update-ref --stdin The attempt to resolve FOO_HEAD will actually resolve "bar", yielding the null oid. That matches our expectation, and the write proceeds. This is correct, because we are not writing FOO_HEAD at all, but writing its destination "bar", which in fact does not exist. But what if the operation asked not to dereference symrefs? Like this: echo "create FOO_HEAD $oid" | git update-ref --no-deref --stdin Resolving FOO_HEAD would still result in a null oid, and the write will proceed. But it will overwrite FOO_HEAD itself, removing the fact that it ever pointed to "bar". This case is a little esoteric; we are clobbering a symref with a no-deref write of a regular ref value. But the same problem occurs when writing symrefs. For example: echo "symref-create FOO_HEAD refs/heads/other" | git update-ref --no-deref --stdin The "create" operation asked us to create FOO_HEAD only if it did not exist. But we silently overwrite the existing value. You can trigger this without using update-ref via the fetch followRemoteHEAD code. In "create" mode, it should not overwrite an existing value. But if you manually create a symref pointing to a value that does not yet exist (either via symbolic-ref or with "remote add -m"), create mode will happily overwrite it. Instead, we should detect this case and refuse to write. The correct specification to overwrite FOO_HEAD in this case is to provide an expected target ref value, like: echo "symref-update FOO_HEAD refs/heads/other ref refs/heads/bar" | git update-ref --no-deref --stdin Note that the non-symref "update" directive does not allow you to do this (you can only specify an oid). This is a weakness in the update-ref interface, and you'd have to overwrite unconditionally, like: echo "update FOO_HEAD $oid" | git update-ref --no-deref --stdin Likewise other symref operations like symref-delete do not accept the "ref" keyword. You should be able to do: echo "symref-delete FOO_HEAD ref refs/heads/bar" but cannot (and can only delete unconditionally). This patch doesn't address those gaps. We may want to do so in a future patch for completeness, but it's not clear if anybody actually wants to perform those operations. The symref update case (specifically, via followRemoteHEAD) is what I ran into in the wild. The code for the fix is relatively straight-forward given the discussion above. But note that we have to implement it independently for the files and reftable backends. The "old oid" checks happen as part of the locking process, which is implemented separately for each system. We may want to factor this out somehow, but it's beyond the scope of this patch. (Another curiosity is that the messages in the reftable code are marked for translation, but the ones in the files backend are not. I followed local convention in each case, but we may want to harmonize this at some point). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-06refs: pass refname when invoking reflog entry callbackPatrick Steinhardt1-6/+9
With `refs_for_each_reflog_ent()` callers can iterate through all the reflog entries for a given reference. The callback that is being invoked for each such entry does not receive the name of the reference that we are currently iterating through. This isn't really a limiting factor, as callers can simply pass the name via the callback data. But this layout sometimes does make for a bit of an awkward calling pattern. One example: when iterating through all reflogs, and for each reflog we iterate through all refnames, we have to do some extra book keeping to track which reference name we are currently yielding reflog entries for. Change the signature of the callback function so that the reference name of the reflog gets passed through to it. Adapt callers accordingly and start using the new parameter in trivial cases. The next commit will refactor the reference migration logic to make use of this parameter so that we can simplify its logic a bit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-06Merge branch 'ps/reflog-migrate-fixes' into ps/remote-rename-fixJunio C Hamano1-7/+58
* ps/reflog-migrate-fixes: refs: fix invalid old object IDs when migrating reflogs refs: stop unsetting REF_HAVE_OLD for log-only updates refs/files: detect race when generating reflog entry for HEAD refs: fix identity for migrated reflogs ident: fix type of string length parameter builtin/reflog: implement subcommand to write new entries refs: export `ref_transaction_update_reflog()` builtin/reflog: improve grouping of subcommands Documentation/git-reflog: convert to use synopsis type
2025-08-06refs: fix invalid old object IDs when migrating reflogsPatrick Steinhardt1-1/+15
When migrating reflog entries between different storage formats we end up with invalid old object IDs for the migrated entries: instead of writing the old object ID of the to-be-migrated entry, we end up with the all-zeroes object ID. The root cause of this issue is that we don't know to use the old object ID provided by the caller. Instead, we manually resolve the old object ID by resolving the current value of its matching reference. But as that reference does not yet exist in the target ref storage we always end up resolving it to all-zeroes. This issue got unnoticed as there is no user-facing command that would even show the old object ID. While `git log -g` knows to show the new object ID, we don't have any formatting directive to show the old object ID. Fix the bug by introducing a new flag `REF_LOG_USE_PROVIDED_OIDS`. If set, backends are instructed to use the old and new object IDs provided by the caller, without doing any manual resolving. Set this flag in `ref_transaction_update_reflog()`. Amend our tests in t1460-refs-migrate to use our test tool to read reflog entries. This test tool prints out both old and new object ID of each reflog entry, which fixes the test gap. Furthermore it also prints the full identity used to write the reflog, which provides test coverage for the previous commit in this patch series that fixed the identity for migrated reflogs. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-06refs: stop unsetting REF_HAVE_OLD for log-only updatesPatrick Steinhardt1-4/+5
The `REF_HAVE_OLD` flag indicates whether a given ref update has its old object ID set. If so, the value of that field is used to verify whether the current state of the reference matches this expected state. It is thus an important part of mitigating races with a concurrent process that updates the same set of references. When writing reflogs though we explicitly unset that flag. This is a sensible thing to do: the old state of reflog entry updates may not necessarily match the current on-disk state of its accompanying ref, but it's only intended to signal what old object ID we want to write into the new reflog entry. For example when migrating refs we end up writing many reflog entries for a single reference, and most likely those reflog entries will have many different old object IDs. But unsetting this flag also removes a useful signal, namely that the caller _did_ provide an old object ID for a given reflog entry. This signal will become useful in a subsequent commit, where we add a new flag that tells the transaction to use the provided old and new object IDs to write a reflog entry. The `REF_HAVE_OLD` flag is then used as a signal to verify that the caller really did provide an old object ID. Stop unsetting the flag so that we can use it as this described signal in a subsequent commit. Skip checking the old object ID for log-only updates so that we don't expect it to match the current on-disk state. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-06refs/files: detect race when generating reflog entry for HEADPatrick Steinhardt1-2/+38
When updating a reference that is being pointed to HEAD we don't only write a reflog message for that particular reference, but also generate one for HEAD. This logic is handled by `split_head_update()`, where we: 1. Verify that the condition actually triggered. This is done by reading HEAD at the start of the transaction so that we can then check whether a given reference update refers to its target. 2. Queue a new log-only update for HEAD in case it did. But the logic is unfortunately not free of races, as we do not lock the HEAD reference after we have read its target. This can lead to the following two scenarios: - HEAD gets concurrently updated to point to one of the references we have already processed. This causes us not writing a reflog message even though we should have done so. - HEAD gets concurrently updated to no longer point to a reference anymore that we have already processed. This causes us to write a reflog message even though we should _not_ have done so. Improve the situation by introducing a new `REF_LOG_VIA_SPLIT` flag that is specific to the "files" backend. If set, we will double check that the HEAD reference still points to the reference that we are creating the reflog entry for after we have locked HEAD. Furthermore, instead of manually resolving the old object ID of that entry, we now use the same old state as for the parent update. If we detect such a racy update we abort the transaction. This is a bit heavy-handed: the user didn't even ask us to write a reflog update for "HEAD", so it might be surprising if we abort the transaction. That being said: - Normal users wouldn't typically hit this case as we only hit the relevant code when committing to a branch that is being pointed to by "HEAD" directly. Commands like git-commit(1) typically commit to "HEAD" itself though. - Scripted users that use git-update-ref(1) and related plumbing commands are unlikely to hit this case either, as they would have to update the pointed-to-branch at the same as "HEAD" is being updated, which is an exceedingly rare event. The alternative would be to instead drop the log-only update completely, but that would require more logic that is hard to verify without adding infrastructure specific for such a test. So we rather do the pragmatic thing and don't worry too much about an edge case that is very unlikely to happen. Unfortunately, this change only helps with the second race. We cannot reliably plug the first race without locking the HEAD reference at the start of the transaction. Locking HEAD unconditionally would effectively serialize all writes though, and that doesn't seem like an option. Also, double checking its value at the end of the transaction is not an option either, as its target may have flip-flopped during the transaction. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-03Merge branch 'kn/for-each-ref-skip'Junio C Hamano1-3/+4
"git for-each-ref" learns "--start-after" option to help applications that want to page its output. * kn/for-each-ref-skip: ref-cache: set prefix_state when seeking for-each-ref: introduce a '--start-after' option ref-filter: remove unnecessary else clause refs: selectively set prefix in the seek functions ref-cache: remove unused function 'find_ref_entry()' refs: expose `ref_iterator` via 'refs.h'
2025-07-16Merge branch 'ps/refs-files-remove-empty-parent'Junio C Hamano1-0/+2
When a ref creation at refs/heads/foo/bar fails, the files backend now removes refs/heads/foo/ if the directory is otherwise not used. * ps/refs-files-remove-empty-parent: refs/files: remove empty parent dirs when ref creation fails
2025-07-15refs: selectively set prefix in the seek functionsKarthik Nayak1-3/+4
The ref iterator exposes a `ref_iterator_seek()` function. The name suggests that this would seek the iterator to a specific reference in some ways similar to how `fseek()` works for the filesystem. However, the function actually sets the prefix for refs iteration. So further iteration would only yield references which match the particular prefix. This is a bit confusing. Let's add a 'flags' field to the function, which when set with the 'REF_ITERATOR_SEEK_SET_PREFIX' flag, will set the prefix for the iteration in-line with the existing behavior. Otherwise, the reference backends will simply seek to the specified reference and clears any previously set prefix. This allows users to start iteration from a specific reference. In the packed and reftable backend, since references are available in a sorted list, the changes are simply setting the prefix if needed. The changes on the files-backend are a little more involved, since the files backend uses the 'ref-cache' mechanism. We move out the existing logic within `cache_ref_iterator_seek()` to `cache_ref_iterator_set_prefix()` which is called when the 'REF_ITERATOR_SEEK_SET_PREFIX' flag is set. We then parse the provided seek string and set the required levels and their indexes to ensure that seeking is possible. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-08Merge branch 'kn/fetch-push-bulk-ref-update'Junio C Hamano1-0/+7
"git push" and "git fetch" are taught to update refs in batches to gain performance. * kn/fetch-push-bulk-ref-update: receive-pack: handle reference deletions separately refs/files: skip updates with errors in batched updates receive-pack: use batched reference updates send-pack: fix memory leak around duplicate refs fetch: use batched reference updates refs: add function to translate errors to strings
2025-07-08refs/files: remove empty parent dirs when ref creation failsPatrick Steinhardt1-0/+2
When creating a new reference in the "files" backend we first create the directory hierarchy for that reference, then create the lockfile for that reference, and finally rename the lockfile into place. When the transaction gets aborted we prune the lockfile, but we don't clean up the directory hierarchy that we may have created for the lockfile. In some egde cases this can lead to lots of empty directories being cluttered in the ".git/refs" directory that really serve no purpose at all. We know to prune such empty directories when packing refs, but that only patches over the issue. Improve this by removing empty parents when cleaning up still-locked references in `files_transaction_cleanup()`. This function is also called when preparing or committing the transaction, so this change also helps when not explicitly aborting the transaction. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-25refs/files: skip updates with errors in batched updatesKarthik Nayak1-0/+7
The commit 23fc8e4f61 (refs: implement batch reference update support, 2025-04-08) introduced support for batched reference updates. This allows users to batch updates together, while allowing some of the updates to fail. Under the hood, batched updates use the reference transaction mechanism. Each update which fails is marked as such. Any failed updates must be skipped over in the rest of the code, as they wouldn't apply any more. In two of the loops within 'files_transaction_finish()' of the files backend, the failed updates aren't skipped over. This can cause a SEGFAULT otherwise. Add the missing skips and a test to validate the same. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03Merge branch 'sj/ref-contents-check-fix'Junio C Hamano1-0/+3
"git verify-refs" (and hence "git fsck --reference") started erroring out in a repository in which secondary worktrees were prepared with Git 2.43 or lower. * sj/ref-contents-check-fix: fsck: ignore missing "refs" directory for linked worktrees
2025-06-02fsck: ignore missing "refs" directory for linked worktreesshejialuo1-0/+3
"git refs verify" doesn't work if there are worktrees created on Git v2.43.0 or older versions. These versions don't automatically create the "refs" directory, causing the error: error: cannot open directory .git/worktrees/<worktree name>/refs: No such file or directory Since 8f4c00de95 (builtin/worktree: create refdb via ref backend, 2024-01-08), we automatically create the "refs" directory for new worktrees. And in 7c78d819e6 (ref: support multiple worktrees check for refs, 2024-11-20), we assume that all linked worktrees have this directory and would wrongly report an error to the user, thus introducing compatibility issue. Check for ENOENT errno before reporting directory access errors for linked worktrees to maintain backward compatibility. Reported-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-24Merge branch 'ps/object-file-cleanup'Junio C Hamano1-2/+2
Code clean-up. * ps/object-file-cleanup: object-store: merge "object-store-ll.h" and "object-store.h" object-store: remove global array of cached objects object: split out functions relating to object store subsystem object-file: drop `index_blob_stream()` object-file: split up concerns of `HASH_*` flags object-file: split out functions relating to object store subsystem object-file: move `xmmap()` into "wrapper.c" object-file: move `git_open_cloexec()` to "compat/open.c" object-file: move `safe_create_leading_directories()` into "path.c" object-file: move `mkdir_in_gitdir()` into "path.c"
2025-04-16Merge branch 'kn/non-transactional-batch-updates'Junio C Hamano1-176/+138
Updating multiple references have only been possible in all-or-none fashion with transactions, but it can be more efficient to batch multiple updates even when some of them are allowed to fail in a best-effort manner. A new "best effort batches of updates" mode has been introduced. * kn/non-transactional-batch-updates: update-ref: add --batch-updates flag for stdin mode refs: support rejection in batch updates during F/D checks refs: implement batch reference update support refs: introduce enum-based transaction error types refs/reftable: extract code from the transaction preparation refs/files: remove duplicate duplicates check refs: move duplicate refname update check to generic layer refs/files: remove redundant check in split_symref_update()
2025-04-15Merge branch 'ps/object-wo-the-repository'Junio C Hamano1-1/+1
The object layer has been updated to take an explicit repository instance as a parameter in more code paths. * ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`
2025-04-15object-file: move `safe_create_leading_directories()` into "path.c"Patrick Steinhardt1-2/+2
The `safe_create_leading_directories()` function and its relatives are located in "object-file.c", which is not a good fit as they provide generic functionality not related to objects at all. Move them into "path.c", which already hosts `safe_create_dir()` and its relative `safe_create_dir_in_gitdir()`. "path.c" is free of `the_repository`, but the moved functions depend on `the_repository` to read the "core.sharedRepository" config. Adapt the function signature to accept a repository as argument to fix the issue and adjust callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08Merge branch 'ps/object-wo-the-repository' into ps/object-file-cleanupJunio C Hamano1-1/+1
* ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`
2025-04-08refs: support rejection in batch updates during F/D checksKarthik Nayak1-9/+18
The `refs_verify_refnames_available()` is used to batch check refnames for F/D conflicts. While this is the more performant alternative than its individual version, it does not provide rejection capabilities on a single update level. For batched updates, this would mean a rejection of the entire transaction whenever one reference has a F/D conflict. Modify the function to call `ref_transaction_maybe_set_rejected()` to check if a single update can be rejected. Since this function is only internally used within 'refs/' and we want to pass in a `struct ref_transaction *` as a variable. We also move and mark `refs_verify_refnames_available()` to 'refs-internal.h' to be an internal function. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08refs: implement batch reference update supportKarthik Nayak1-1/+11
Git supports making reference updates with or without transactions. Updates with transactions are generally better optimized. But transactions are all or nothing. This means, if a user wants to batch updates to take advantage of the optimizations without the hard requirement that all updates must succeed, there is no way currently to do so. Particularly with the reftable backend where batching multiple reference updates is more efficient than performing them sequentially. Introduce batched update support with a new flag, 'REF_TRANSACTION_ALLOW_FAILURE'. Batched updates while different from transactions, use the transaction infrastructure under the hood. When enabled, this flag allows individual reference updates that would typically cause the entire transaction to fail due to non-system-related errors to be marked as rejected while permitting other updates to proceed. System errors referred by 'REF_TRANSACTION_ERROR_GENERIC' continue to result in the entire transaction failing. This approach enhances flexibility while preserving transactional integrity where necessary. The implementation introduces several key components: - Add 'rejection_err' field to struct `ref_update` to track failed updates with failure reason. - Add a new struct `ref_transaction_rejections` and a field within `ref_transaction` to this struct to allow quick iteration over rejected updates. - Modify reference backends (files, packed, reftable) to handle partial transactions by using `ref_transaction_set_rejected()` instead of failing the entire transaction when `REF_TRANSACTION_ALLOW_FAILURE` is set. - Add `ref_transaction_for_each_rejected_update()` to let callers examine which updates were rejected and why. This foundational change enables batched update support throughout the reference subsystem. A following commit will expose this capability to users by adding a `--batch-updates` flag to 'git-update-ref(1)', providing both a user-facing feature and a testable implementation. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08refs: introduce enum-based transaction error typesKarthik Nayak1-100/+102
Replace preprocessor-defined transaction errors with a strongly-typed enum `ref_transaction_error`. This change: - Improves type safety and function signature clarity. - Makes error handling more explicit and discoverable. - Maintains existing error cases, while adding new error cases for common scenarios. This refactoring paves the way for more comprehensive error handling which we will utilize in the upcoming commits to add batch reference update support. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08refs/files: remove duplicate duplicates checkKarthik Nayak1-6/+0
Within the files reference backend's transaction's 'finish' phase, a verification step is currently performed wherein the refnames list is sorted and examined for multiple updates targeting the same refname. It has been observed that this verification is redundant, as an identical check is already executed during the transaction's 'prepare' stage. Since the refnames list remains unmodified following the 'prepare' stage, this secondary verification can be safely eliminated. The duplicate check has been removed accordingly, and the `ref_update_reject_duplicates()` function has been marked as static, as its usage is now confined to 'refs.c'. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08refs: move duplicate refname update check to generic layerKarthik Nayak1-53/+14
Move the tracking of refnames in `affected_refnames` from individual backends into the generic layer in 'refs.c'. This centralizes the duplicate refname detection that was previously handled separately by each backend. Make some changes to accommodate this move: - Add a `string_list` field `refnames` to `ref_transaction` to contain all the references in a transaction. This field is updated whenever a new update is added via `ref_transaction_add_update`, so manual additions in reference backends are dropped. - Modify the backends to use this field internally as needed. The backends need to check if an update for refname already exists when splitting symrefs or adding an update for 'HEAD'. - In the reftable backend, within `reftable_be_transaction_prepare()`, move the `string_list_has_string()` check above `ref_transaction_add_update()`. Since `ref_transaction_add_update()` automatically adds the refname to `transaction->refnames`, performing the check after will always return true, so we perform the check before adding the update. This helps reduce duplication of functionality between the backends and makes it easier to make changes in a more centralized manner. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08refs/files: remove redundant check in split_symref_update()Karthik Nayak1-17/+3
In `split_symref_update()`, there were two checks for duplicate refnames: - At the start, `string_list_has_string()` ensures the refname is not already in `affected_refnames`, preventing duplicates from being added. - After adding the refname, another check verifies whether the newly inserted item has a `util` value. The second check is unnecessary because the first one guarantees that `string_list_insert()` will never encounter a preexisting entry. The `item->util` field is assigned to validate that a rename doesn't already exist in the list. The validation is done after the first check. As this check is removed, clean up the validation and the assignment of this field in `split_head_update()` and `files_transaction_prepare()`. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-29Merge branch 'ps/refname-avail-check-optim'Junio C Hamano1-44/+73
The code paths to check whether a refname X is available (by seeing if another ref X/Y exists, etc.) have been optimized. * ps/refname-avail-check-optim: refs: reuse iterators when determining refname availability refs/iterator: implement seeking for files iterators refs/iterator: implement seeking for packed-ref iterators refs/iterator: implement seeking for ref-cache iterators refs/iterator: implement seeking for reftable iterators refs/iterator: implement seeking for merged iterators refs/iterator: provide infrastructure to re-seek iterators refs/iterator: separate lifecycle from iteration refs: stop re-verifying common prefixes for availability refs/files: batch refname availability checks for initial transactions refs/files: batch refname availability checks for normal transactions refs/reftable: batch refname availability checks refs: introduce function to batch refname availability checks builtin/update-ref: skip ambiguity checks when parsing object IDs object-name: allow skipping ambiguity checks in `get_oid()` family object-name: introduce `repo_get_oid_with_flags()`
2025-03-12refs/iterator: implement seeking for files iteratorsPatrick Steinhardt1-0/+16
Implement seeking for "files" iterators. As we simply use a ref-cache iterator under the hood the implementation is straight-forward. Note that we do not implement seeking on reflog iterators, same as with the "reftable" backend. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-12refs/iterator: separate lifecycle from iterationPatrick Steinhardt1-26/+10
The ref and reflog iterators have their lifecycle attached to iteration: once the iterator reaches its end, it is automatically released and the caller doesn't have to care about that anymore. When the iterator should be released before it has been exhausted, callers must explicitly abort the iterator via `ref_iterator_abort()`. This lifecycle is somewhat unusual in the Git codebase and creates two problems: - Callsites need to be very careful about when exactly they call `ref_iterator_abort()`, as calling the function is only valid when the iterator itself still is. This leads to somewhat awkward calling patterns in some situations. - It is impossible to reuse iterators and re-seek them to a different prefix. This feature isn't supported by any iterator implementation except for the reftable iterators anyway, but if it was implemented it would allow us to optimize cases where we need to search for specific references repeatedly by reusing internal state. Detangle the lifecycle from iteration so that we don't deallocate the iterator anymore once it is exhausted. Instead, callers are now expected to always call a newly introduce `ref_iterator_free()` function that deallocates the iterator and its internal state. Note that the `dir_iterator` is somewhat special because it does not implement the `ref_iterator` interface, but is only used to implement other iterators. Consequently, we have to provide `dir_iterator_free()` instead of `dir_iterator_release()` as the allocated structure itself is managed by the `dir_iterator` interfaces, as well, and not freed by `ref_iterator_free()` like in all the other cases. While at it, drop the return value of `ref_iterator_abort()`, which wasn't really required by any of the iterator implementations anyway. Furthermore, stop calling `base_ref_iterator_free()` in any of the backends, but instead call it in `ref_iterator_free()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-12refs/files: batch refname availability checks for initial transactionsPatrick Steinhardt1-7/+16
The "files" backend explicitly carves out special logic for its initial transaction so that it can avoid writing out every single reference as a loose reference. While the assumption is that there shouldn't be any preexisting references, we still have to verify that none of the newly written references will conflict with any other new reference in the same transaction. Refactor the initial transaction to use batched refname availability checks. This does not yet have an effect on performance as we still call `refs_verify_refname_available()` in a loop. But this will change in subsequent commits and then impact performance when cloning a repository with many references or when migrating references to the "files" format. This will improve performance when cloning a repository with many references or when migrating references from any format to the "files" format once the availability checks have learned to optimize checks for many references in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-12refs/files: batch refname availability checks for normal transactionsPatrick Steinhardt1-11/+31
Same as the "reftable" backend that we have adapted in the preceding commit to use batched refname availability checks we can also do so for the "files" backend. Things are a bit more intricate here though, as we call `refs_verify_refname_available()` in a set of different contexts: 1. `lock_raw_ref()` when it hits either EEXISTS or EISDIR when creating a new reference, mostly to create a nice, user-readable error message. This is nothing we have to care about too much, as we only hit this code path at most once when we hit a conflict. 2. `lock_raw_ref()` when it _could_ create the lockfile to check whether it is conflicting with any packed refs. In the general case, this code path will be hit once for every (successful) reference update. 3. `lock_ref_oid_basic()`, but it is only executed when copying or renaming references or when expiring reflogs. It will thus not be called in contexts where we have many references queued up. 4. `refs_refname_ref_available()`, but again only when copying or renaming references. It is thus not interesting due to the same reason as the previous case. 5. `files_transaction_finish_initial()`, which is only executed when creating a new repository or migrating references. So out of these, only (2) and (5) are viable candidates to use the batched checks. Adapt `lock_raw_ref()` accordingly by queueing up reference names that need to be checked for availability and then checking them after we have processed all updates. This check is done before we (optionally) lock the `packed-refs` file, which is somewhat flawed because it means that the `packed-refs` could still change after the availability check and thus create an undetected conflict. But unconditionally locking the file would change semantics that users are likely to rely on, so we keep the current locking sequence intact, even if it's suboptmial. The refactoring of `files_transaction_finish_initial()` will be done in the next commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10hash: stop depending on `the_repository` in `null_oid()`Patrick Steinhardt1-1/+1
The `null_oid()` function returns the object ID that only consists of zeroes. Naturally, this ID also depends on the hash algorithm used, as the number of zeroes is different between SHA1 and SHA256. Consequently, the function returns the hash-algorithm-specific null object ID. This is currently done by depending on `the_hash_algo`, which implicitly makes us depend on `the_repository`. Refactor the function to instead pass in the hash algorithm for which we want to retrieve the null object ID. Adapt callsites accordingly by passing in `the_repository`, thus bubbling up the dependency on that global variable by one layer. There are a couple of trivial exceptions for subsystems that already got rid of `the_repository`. These subsystems instead use the repository that is available via the calling context: - "builtin/grep.c" - "grep.c" - "refs/debug.c" There are also two non-trivial exceptions: - "diff-no-index.c": Here we know that we may not have a repository initialized at all, so we cannot rely on `the_repository`. Instead, we adapt `diff_no_index()` to get a `struct git_hash_algo` as parameter. The only caller is located in "builtin/diff.c", where we know to call `repo_set_hash_algo()` in case we're running outside of a Git repository. Consequently, it is fine to continue passing `the_repository->hash_algo` even in this case. - "builtin/ls-files.c": There is an in-flight patch series that drops `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic conflict because we use `null_oid()` in `show_submodule()`. The value is passed to `repo_submodule_init()`, which may use the object ID to resolve a tree-ish in the superproject from which we want to read the submodule config. As such, the object ID should refer to an object in the superproject, and consequently we need to use its hash algorithm. This means that we could in theory just not bother about this edge case at all and just use `the_repository` in "diff-no-index.c". But doing so would feel misdesigned. Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in "hash.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-28path: adjust last remaining users of `the_repository`Patrick Steinhardt1-5/+5
With the preceding refactorings we now only have a couple of implicit users of `the_repository` left in the "path" subsystem, all of which depend on global state via `calc_shared_perm()`. Make the dependency on `the_repository` explicit by passing the repo as a parameter instead and adjust callers accordingly. Note that this change bubbles up into a couple of subsystems that were previously declared as free from `the_repository`. Instead of marking all of them as `the_repository`-dependent again, we instead use the repository that is available in the calling context. There are three exceptions though with "copy.c", "pack-write.c" and "tempfile.c". Adjusting these would require us to adapt callsites all over the place, so this is left for a future iteration. Mark "path.c" as free from `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-29Merge branch 'kn/reflog-symref-fix'Junio C Hamano1-3/+0
reflog entries for symbolic ref updates were broken, which has been corrected. * kn/reflog-symref-fix: refs: fix creation of reflog entries for symrefs
2025-01-23refs: fix creation of reflog entries for symrefsKarthik Nayak1-3/+0
The commit 297c09eabb (refs: allow multiple reflog entries for the same refname, 2024-12-16) added logic to exit early in `lock_ref_for_update()` after obtaining the required lock. This was added as a performance optimization on a false assumption that no further processing was required for reflog-only updates. However the assumption was wrong. For a symref's reflog entry, the update needs to be populated with the old_oid value, but the early exit skipped this necessary step. This caused a bug in Git 2.48 in the files backend where target references of symrefs being updated would create a corrupted reflog entry for the symref since the old_oid is not populated. Everything the early exit skipped in the code path is necessary for both regular and symbolic ref, so eliminate the mistaken optimization, and also add a test to ensure that such an issue doesn't arise in the future. Reported-by: Nika Layzell <nika@thelayzells.com> Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-23Merge branch 'kn/reflog-migration'Junio C Hamano1-45/+86
"git refs migrate" learned to also migrate the reflog data across backends. * kn/reflog-migration: refs: mark invalid refname message for translation refs: add support for migrating reflogs refs: allow multiple reflog entries for the same refname refs: introduce the `ref_transaction_update_reflog` function refs: add `committer_info` to `ref_transaction_add_update()` refs: extract out refname verification in transactions refs/files: add count field to ref_lock refs: add `index` field to `struct ref_udpate` refs: include committer info in `ref_update` struct
2024-12-23Merge branch 'ps/build-sign-compare'Junio C Hamano1-0/+1
Start working to make the codebase buildable with -Wsign-compare. * ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-19Merge branch 'bf/set-head-symref'Junio C Hamano1-12/+19
When "git fetch $remote" notices that refs/remotes/$remote/HEAD is missing and discovers what branch the other side points with its HEAD, refs/remotes/$remote/HEAD is updated to point to it. * bf/set-head-symref: fetch set_head: handle mirrored bare repositories fetch: set remote/HEAD if it does not exist refs: add create_only option to refs_update_symref_extended refs: add TRANSACTION_CREATE_EXISTS error remote set-head: better output for --auto remote set-head: refactor for readability refs: atomically record overwritten ref in update_symref refs: standardize output of refs_read_symbolic_ref t/t5505-remote: test failure of set-head t/t5505-remote: set default branch to main
2024-12-16refs: allow multiple reflog entries for the same refnameKarthik Nayak1-4/+11
The reference transaction only allows a single update for a given reference to avoid conflicts. This, however, isn't an issue for reflogs. There are no conflicts to be resolved in reflogs and when migrating reflogs between backends we'd have multiple reflog entries for the same refname. So allow multiple reflog updates within a single transaction. Also the reflog creation logic isn't exposed to the end user. While this might change in the future, currently, this reduces the scope of issues to think about. In the reftable backend, the writer sorts all updates based on the update_index before writing to the block. When there are multiple reflogs for a given refname, it is essential that the order of the reflogs is maintained. So add the `index` value to the `update_index`. The `index` field is only set when multiple reflog entries for a given refname are added and as such in most scenarios the old behavior remains. This is required to add reflog migration support to `git refs migrate`. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-16refs: introduce the `ref_transaction_update_reflog` functionKarthik Nayak1-8/+16
Introduce a new function `ref_transaction_update_reflog`, for clients to add a reflog update to a transaction. While the existing function `ref_transaction_update` also allows clients to add a reflog entry, this function does a few things more, It: - Enforces that only a reflog entry is added and does not update the ref itself. - Allows the users to also provide the committer information. This means clients can add reflog entries with custom committer information. The `transaction_refname_valid()` function also modifies the error message selectively based on the type of the update. This change also affects reflog updates which go through `ref_transaction_update()`. A follow up commit will utilize this function to add reflog support to `git refs migrate`. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-16refs: add `committer_info` to `ref_transaction_add_update()`Karthik Nayak1-6/+8
The `ref_transaction_add_update()` creates the `ref_update` struct. To facilitate addition of reflogs in the next commit, the function needs to accommodate setting the `committer_info` field in the struct. So modify the function to also take `committer_info` as an argument and set it accordingly. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-16refs/files: add count field to ref_lockKarthik Nayak1-19/+39
When refs are updated in the files-backend, a lock is obtained for the corresponding file path. This is the case even for reflogs, i.e. a lock is obtained on the reference path instead of the reflog path. This works, since generally, reflogs are updated alongside the ref. The upcoming patches will add support for reflog updates in ref transaction. This means, in a particular transaction we want to have ref updates and reflog updates. For a given ref in a given transaction there can be at most one update. But we can theoretically have multiple reflog updates for a given ref in a given transaction. A great example of this would be when migrating reflogs from one backend to another. There we would batch all the reflog updates for a given reference in a single transaction. The current flow does not support this, because currently refs & reflogs are treated as a single entity and capture the lock together. To separate this, add a count field to ref_lock. With this, multiple updates can hold onto a single ref_lock and the lock will only be released when all of them release the lock. This patch only adds the `count` field to `ref_lock` and adds the logic to increment and decrement the lock. In a follow up commit, we'll separate the reflog update logic from ref updates and utilize this functionality. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-16refs: include committer info in `ref_update` structKarthik Nayak1-10/+14
The reference backends obtain the committer information from `git_committer_info(0)` when adding a reflog. The upcoming patches introduce support for migrating reflogs between the reference backends. This requires an interface to creating reflogs, including custom committer information. Add a new field `committer_info` to the `ref_update` struct, which is then used by the reference backends. If there is no `committer_info` provided, the reference backends default to using `git_committer_info(0)`. The field itself cannot be set to `git_committer_info(0)` since the values are dynamic and must be obtained right when the reflog is being committed. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06global: mark code units that generate warnings with `-Wsign-compare`Patrick Steinhardt1-0/+1
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04Merge branch 'sj/ref-contents-check'Junio C Hamano1-16/+178
"git fsck" learned to issue warnings on "curiously formatted" ref contents that have always been taken valid but something Git wouldn't have written itself (e.g., missing terminating end-of-line after the full object name). * sj/ref-contents-check: ref: add symlink ref content check for files backend ref: check whether the target of the symref is a ref ref: add basic symref content check for files backend ref: add more strict checks for regular refs ref: port git-fsck(1) regular refs check for files backend ref: support multiple worktrees check for refs ref: initialize ref name outside of check functions ref: check the full refname instead of basename ref: initialize "fsck_ref_report" with zero
2024-11-25refs: add TRANSACTION_CREATE_EXISTS errorBence Ferdinandy1-8/+16
Currently there is only one special error for transaction, for when there is a naming conflict, all other errors are dumped under a generic error. Add a new special error case for when the caller requests the reference to be updated only when it does not yet exist and the reference actually does exist. Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-25refs: standardize output of refs_read_symbolic_refBence Ferdinandy1-4/+3
When the symbolic reference we want to read with refs_read_symbolic_ref is actually not a symbolic reference, the files and the reftable backends return different values (1 and -1 respectively). Standardize the returned values so that 0 is success, -1 is a generic error and -2 is that the reference was actually non-symbolic. Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: add symlink ref content check for files backendshejialuo1-4/+34
Besides the textual symref, we also allow symbolic links as the symref. So, we should also provide the consistency check as what we have done for textual symref. And also we consider deprecating writing the symbolic links. We first need to access whether symbolic links still be used. So, add a new fsck message "symlinkRef(INFO)" to tell the user be aware of this information. We have already introduced "files_fsck_symref_target". We should reuse this function to handle the symrefs which use legacy symbolic links. We should not check the trailing garbage for symbolic refs. Add a new parameter "symbolic_link" to disable some checks which should only be executed for textual symrefs. And we need to also generate the "referent" parameter for reusing "files_fsck_symref_target" by the following steps: 1. Use "strbuf_add_real_path" to resolve the symlink and get the absolute path "ref_content" which the symlink ref points to. 2. Generate the absolute path "abs_gitdir" of "gitdir" and combine "ref_content" and "abs_gitdir" to extract the relative path "relative_referent_path". 3. If "ref_content" is outside of "gitdir", we just set "referent" with "ref_content". Instead, we set "referent" with "relative_referent_path". Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: check whether the target of the symref is a refshejialuo1-2/+12
Ideally, we want to the users use "git symbolic-ref" to create symrefs instead of writing raw contents into the filesystem. However, "git symbolic-ref" is strict with the refname but not strict with the referent. For example, we can make the "referent" located at the "$(gitdir)/logs/aaa" and manually write the content into this where we can still successfully parse this symref by using "git rev-parse". $ git init repo && cd repo && git commit --allow-empty -mx $ git symbolic-ref refs/heads/test logs/aaa $ echo $(git rev-parse HEAD) > .git/logs/aaa $ git rev-parse test We may need to add some restrictions for "referent" parameter when using "git symbolic-ref" to create symrefs because ideally all the nonpseudo-refs should be located under the "refs" directory and we may tighten this in the future. In order to tell the user we may tighten the above situation, create a new fsck message "symrefTargetIsNotARef" to notify the user that this may become an error in the future. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: add basic symref content check for files backendshejialuo1-0/+40
We have code that checks regular ref contents, but we do not yet check the contents of symbolic refs. By using "parse_loose_ref_content" for symbolic refs, we will get the information of the "referent". We do not need to check the "referent" by opening the file. This is because if "referent" exists in the file system, we will eventually check its correctness by inspecting every file in the "refs" directory. If the "referent" does not exist in the filesystem, this is OK as it is seen as the dangling symref. So we just need to check the "referent" string content. A regular ref could be accepted as a textual symref if it begins with "ref:", followed by zero or more whitespaces, followed by the full refname, followed only by whitespace characters. However, we always write a single SP after "ref:" and a single LF after the refname. It may seem that we should report a fsck error message when the "referent" does not apply above rules and we should not be so aggressive because third-party reimplementations of Git may have taken advantage of the looser syntax. Put it more specific, we accept the following contents: 1. "ref: refs/heads/master " 2. "ref: refs/heads/master \n \n" 3. "ref: refs/heads/master\n\n" When introducing the regular ref content checks, we created two fsck infos "refMissingNewline" and "trailingRefContent" which exactly represents above situations. So we will reuse these two fsck messages to write checks to info the user about these situations. But we do not allow any other trailing garbage. The followings are bad symref contents which will be reported as fsck error by "git-fsck(1)". 1. "ref: refs/heads/master garbage\n" 2. "ref: refs/heads/master \n\n\n garbage " And we introduce a new "badReferentName(ERROR)" fsck message to report above errors by using "is_root_ref" and "check_refname_format" to check the "referent". Since both "is_root_ref" and "check_refname_format" don't work with whitespaces, we use the trimmed version of "referent" with these functions. In order to add checks, we will do the following things: 1. Record the untrimmed length "orig_len" and untrimmed last byte "orig_last_byte". 2. Use "strbuf_rtrim" to trim the whitespaces or newlines to make sure "is_root_ref" and "check_refname_format" won't be failed by them. 3. Use "orig_len" and "orig_last_byte" to check whether the "referent" misses '\n' at the end or it has trailing whitespaces or newlines. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: add more strict checks for regular refsshejialuo1-3/+23
We have already used "parse_loose_ref_contents" function to check whether the ref content is valid in files backend. However, by using "parse_loose_ref_contents", we allow the ref's content to end with garbage or without a newline. Even though we never create such loose refs ourselves, we have accepted such loose refs. So, it is entirely possible that some third-party tools may rely on such loose refs being valid. We should not report an error fsck message at current. We should notify the users about such "curiously formatted" loose refs so that adequate care is taken before we decide to tighten the rules in the future. And it's not suitable either to report a warn fsck message to the user. We don't yet want the "--strict" flag that controls this bit to end up generating errors for such weirdly-formatted reference contents, as we first want to assess whether this retroactive tightening will cause issues for any tools out there. It may cause compatibility issues which may break the repository. So, we add the following two fsck infos to represent the situation where the ref content ends without newline or has trailing garbages: 1. refMissingNewline(INFO): A loose ref that does not end with newline(LF). 2. trailingRefContent(INFO): A loose ref has trailing content. It might appear that we can't provide the user with any warnings by using FSCK_INFO. However, in "fsck.c::fsck_vreport", we will convert FSCK_INFO to FSCK_WARN and we can still warn the user about these situations when using "git refs verify" without introducing compatibility issues. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: port git-fsck(1) regular refs check for files backendshejialuo1-0/+47
"git-fsck(1)" implicitly checks the ref content by passing the callback "fsck_handle_ref" to the "refs.c::refs_for_each_rawref". Then, it will check whether the ref content (eventually "oid") is valid. If not, it will report the following error to the user. error: refs/heads/main: invalid sha1 pointer 0000... And it will also report above errors when there are dangling symrefs in the repository wrongly. This does not align with the behavior of the "git symbolic-ref" command which allows users to create dangling symrefs. As we have already introduced the "git refs verify" command, we'd better check the ref content explicitly in the "git refs verify" command thus later we could remove these checks in "git-fsck(1)" and launch a subprocess to call "git refs verify" in "git-fsck(1)" to make the "git-fsck(1)" more clean. Following what "git-fsck(1)" does, add a similar check to "git refs verify". Then add a new fsck error message "badRefContent(ERROR)" to represent that a ref has an invalid content. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: support multiple worktrees check for refsshejialuo1-5/+12
We have already set up the infrastructure to check the consistency for refs, but we do not support multiple worktrees. However, "git-fsck(1)" will check the refs of worktrees. As we decide to get feature parity with "git-fsck(1)", we need to set up support for multiple worktrees. Because each worktree has its own specific refs, instead of just showing the users "refs/worktree/foo", we need to display the full name such as "worktrees/<id>/refs/worktree/foo". So we should know the id of the worktree to get the full name. Add a new parameter "struct worktree *" for "refs-internal.h::fsck_fn". Then change the related functions to follow this new interface. The "packed-refs" only exists in the main worktree, so we should only check "packed-refs" in the main worktree. Use "is_main_worktree" method to skip checking "packed-refs" in "packed_fsck" function. Then, enhance the "files-backend.c::files_fsck_refs_dir" function to add "worktree/<id>/" prefix when we are not in the main worktree. Last, add a new test to check the refname when there are multiple worktrees to exercise the code. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: initialize ref name outside of check functionsshejialuo1-8/+13
We passes "refs_check_dir" to the "files_fsck_refs_name" function which allows it to create the checked ref name later. However, when we introduce a new check function, we have to allocate redundant memory and re-calculate the ref name. It's bad for us to allocate redundant memory and duplicate logic. Instead, we should allocate and calculate it only once and pass the ref name to the check functions. In order not to do repeat calculation, rename "refs_check_dir" to "refname". And in "files_fsck_refs_dir", create a new strbuf "refname", thus whenever we handle a new ref, calculate the name and call the check functions one by one. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: check the full refname instead of basenameshejialuo1-2/+5
In "files-backend.c::files_fsck_refs_name", we validate the refname format by using "check_refname_format" to check the basename of the iterator with "REFNAME_ALLOW_ONELEVEL" flag. However, this is a bad implementation. Although we doesn't allow a single "@" in ".git" directory, we do allow "refs/heads/@". So, we will report an error wrongly when there is a "refs/heads/@" ref by using one level refname "@". Because we just check one level refname, we either cannot check the other parts of the full refname. And we will ignore the following errors: "refs/heads/ new-feature/test" "refs/heads/~new-feature/test" In order to fix the above problem, enhance "files_fsck_refs_name" to use the full name for "check_refname_format". Then, replace the tests which are related to "@" and add tests to exercise the above situations using for loop to avoid repetition. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: initialize "fsck_ref_report" with zeroshejialuo1-1/+1
In "fsck.c::fsck_refs_error_function", we need to tell whether "oid" and "referent" is NULL. So, we need to always initialize these parameters to NULL instead of letting them point to anywhere when creating a new "fsck_ref_report" structure. The original code explicitly initializes the "path" member in the "struct fsck_ref_report" to NULL (which implicitly 0-initializes other members in the struct). It is more customary to use "{ 0 }" to express that we are 0-initializing everything. In order to align with the codebase, initialize "fsck_ref_report" with zero. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21refs: skip collision checks in initial transactionsPatrick Steinhardt1-7/+6
Reference transactions use `refs_verify_refname_available()` to check for colliding references. This check consists of two parts: - Checks for whether multiple ref updates in the same transaction conflict with each other. - Checks for whether existing refs conflict with any refs part of the transaction. While we generally cannot avoid the first check, the second check is superfluous in cases where the transaction is an initial one in an otherwise empty ref store. The check results in multiple ref reads as well as the creation of a ref iterator for every ref we're checking, which adds up quite fast when performing the check for many refs. Introduce a new flag that allows us to skip this check and wire it up in such that the backends pass it when running an initial transaction. This leads to significant speedups when migrating ref storage backends. From "files" to "reftable": Benchmark 1: migrate files:reftable (refcount = 100000, revision = HEAD~) Time (mean ± σ): 472.4 ms ± 6.7 ms [User: 175.9 ms, System: 285.2 ms] Range (min … max): 463.5 ms … 483.2 ms 10 runs Benchmark 2: migrate files:reftable (refcount = 100000, revision = HEAD) Time (mean ± σ): 86.1 ms ± 1.9 ms [User: 67.9 ms, System: 16.0 ms] Range (min … max): 82.9 ms … 90.9 ms 29 runs Summary migrate files:reftable (refcount = 100000, revision = HEAD) ran 5.48 ± 0.15 times faster than migrate files:reftable (refcount = 100000, revision = HEAD~) And from "reftable" to "files": Benchmark 1: migrate reftable:files (refcount = 100000, revision = HEAD~) Time (mean ± σ): 452.7 ms ± 3.4 ms [User: 209.9 ms, System: 235.4 ms] Range (min … max): 445.9 ms … 457.5 ms 10 runs Benchmark 2: migrate reftable:files (refcount = 100000, revision = HEAD) Time (mean ± σ): 95.2 ms ± 2.2 ms [User: 73.6 ms, System: 20.6 ms] Range (min … max): 91.7 ms … 100.8 ms 28 runs Summary migrate reftable:files (refcount = 100000, revision = HEAD) ran 4.76 ± 0.11 times faster than migrate reftable:files (refcount = 100000, revision = HEAD~) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21refs/files: support symbolic and root refs in initial transactionPatrick Steinhardt1-10/+34
The "files" backend has implemented special logic when committing the first transactions in an otherwise empty ref store: instead of writing all refs as separate loose files, it instead knows to write them all into a "packed-refs" file directly. This is significantly more efficient than having to write each of the refs as separate "loose" ref. The only user of this optimization is git-clone(1), which only uses this mechanism to write regular refs. Consequently, the implementation does not know how to handle both symbolic and root refs. While fine in the context of git-clone(1), this keeps us from using the mechanism in more cases. Adapt the logic to also support symbolic and root refs by using a second transaction that we use for all of the refs that need to be written as loose refs. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21refs: introduce "initial" transaction flagPatrick Steinhardt1-8/+8
There are two different ways to commit a transaction: - `ref_transaction_commit()` can be used to commit a regular transaction and is what almost every caller wants. - `initial_ref_transaction_commit()` can be used when it is known that the ref store that the transaction is committed for is empty and when there are no concurrent processes. This is used when cloning a new repository. Implementing this via two separate functions has a couple of downsides. First, every reference backend needs to implement a separate callback even in the case where they don't special-case the initial transaction. Second, backends are basically forced to reimplement the whole logic for how to commit the transaction like the "files" backend does, even though backends may wish to only tweak certain behaviour of a "normal" commit. Third, it is awkward that callers must never prepare the transaction as this is somewhat different than how a transaction typically works. Refactor the code such that we instead mark initial transactions via a separate flag when starting the transaction. This addresses all of the mentioned painpoints, where the most important part is that it will allow backends to have way more leeway in how exactly they want to handle the initial transaction. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21refs/files: move logic to commit initial transactionPatrick Steinhardt1-101/+101
Move the logic to commit initial transactions such that we can start to call it in `files_transaction_finish()` in a subsequent commit without requiring a separate function declaration. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21refs: allow passing flags when setting up a transactionPatrick Steinhardt1-4/+7
Allow passing flags when setting up a transaction such that the behaviour of the transaction itself can be altered. This functionality will be used in a subsequent patch. Adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-23Merge branch 'ps/environ-wo-the-repository'Junio C Hamano1-12/+20
Code clean-up. * ps/environ-wo-the-repository: (21 commits) environment: stop storing "core.notesRef" globally environment: stop storing "core.warnAmbiguousRefs" globally environment: stop storing "core.preferSymlinkRefs" globally environment: stop storing "core.logAllRefUpdates" globally refs: stop modifying global `log_all_ref_updates` variable branch: stop modifying `log_all_ref_updates` variable repo-settings: track defaults close to `struct repo_settings` repo-settings: split out declarations into a standalone header environment: guard state depending on a repository environment: reorder header to split out `the_repository`-free section environment: move `set_git_dir()` and related into setup layer environment: make `get_git_namespace()` self-contained environment: move object database functions into object layer config: make dependency on repo in `read_early_config()` explicit config: document `read_early_config()` and `read_very_early_config()` environment: make `get_git_work_tree()` accept a repository environment: make `get_graft_file()` accept a repository environment: make `get_index_file()` accept a repository environment: make `get_object_directory()` accept a repository environment: make `get_git_common_dir()` accept a repository ...
2024-09-12Merge branch 'ps/pack-refs-auto-heuristics'Junio C Hamano1-0/+65
"git pack-refs --auto" for the files backend was too aggressive, which has been a bit tamed. * ps/pack-refs-auto-heuristics: refs/files: use heuristic to decide whether to repack with `--auto` t0601: merge tests for auto-packing of refs wrapper: introduce `log2u()`
2024-09-12environment: stop storing "core.preferSymlinkRefs" globallyPatrick Steinhardt1-1/+4
Same as the preceding commit, storing the "core.preferSymlinkRefs" value globally is misdesigned as this setting may be set per repository. There is only a single user of this value anyway, namely the "files" backend. So let's just remove the global variable and read the value of this setting when initializing the backend. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-12environment: stop storing "core.logAllRefUpdates" globallyPatrick Steinhardt1-1/+3
The value of "core.logAllRefUpdates" is being stored in the global variable `log_all_ref_updates`. This design is somewhat aged nowadays, where it is entirely possible to access multiple repositories in the same process which all have different values for this setting. So using a single global variable to track it is plain wrong. Remove the global variable. Instead, we now provide a new function part of the repo-settings subsystem that parses the value for a specific repository. While that may require us to read the value multiple times, we work around this by reading it once when the ref backends are set up and caching the value there. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-12refs: stop modifying global `log_all_ref_updates` variablePatrick Steinhardt1-11/+12
In refs-related code we modify the global `log_all_ref_updates` variable, which is done because `should_autocreate_reflog()` does not accept passing an `enum log_refs_config` but instead accesses the global variable. Adapt its interface such that the value is provided by the caller, which allows us to compute the proper value locally without having to modify global state. This change requires us to move the enum to "repo-settings.h", or otherwise we get compilation errors due to include cycles. We're about to fully move this setting into the repo-settings subsystem anyway, so this is fine. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-12environment: guard state depending on a repositoryPatrick Steinhardt1-0/+2
In "environment.h" we have quite a lot of functions and variables that either explicitly or implicitly depend on `the_repository`. The implicit set of stateful declarations includes for example variables which get populated when parsing a repository's Git configuration. This set of variables is broken by design, as their state often depends on the last repository config that has been parsed. So they may or may not represent the state of `the_repository`. Fixing that is quite a big undertaking, and later patches in this series will demonstrate a solution for a first small set of those variables. So for now, let's guard these with `USE_THE_REPOSITORY_VARIABLE` so that callers are aware of the implicit dependency. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-04refs/files: use heuristic to decide whether to repack with `--auto`Patrick Steinhardt1-0/+65
The `--auto` flag for git-pack-refs(1) allows the ref backend to decide whether or not a repack is in order. This switch has been introduced mostly with the "reftable" backend in mind, which already knows to auto-compact its tables during normal operations. When the flag is set, then it will use the same auto-compaction mechanism and thus end up doing nothing in most cases. The "files" backend does not have any such heuristic yet and instead packs any loose references unconditionally. So we rewrite the complete "packed-refs" file even if there's only a single loose reference to be packed. Even worse, starting with 9f6714ab3e (builtin/gc: pack refs when using `git maintenance run --auto`, 2024-03-25), `git pack-refs --auto` is unconditionally executed via our auto maintenance, so we end up repacking references every single time auto maintenance kicks in. And while that commit already mentioned that the "files" backend unconditionally packs refs now, the author obviously didn't quite think about the consequences thereof. So while the idea was sound, we really should have added a heuristic to the "files" backend before implementing it. Introduce a heuristic that decides whether or not it is worth to pack loose references. The important factors to decide here are the number of loose references in comparison to the overall size of the "packed-refs" file. The bigger the "packed-refs" file, the longer it takes to rewrite it and thus we scale up the limit of allowed loose references before we repack. As is the nature of heuristics, this mechansim isn't obviously "correct", but should rather be seen as a tradeoff between how much resources we spend packing refs and how inefficient the ref store becomes. For all I can say, we have successfully been using the exact same heuristic in Gitaly for several years by now. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-30refs/files-backend: work around -Wunused-parameterJunio C Hamano1-2/+5
This is needed to build things with -Werror=unused-parameter on a platform without symbolic link support. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-17refs: drop some unused parameters from create_symref_lock()Jeff King1-5/+3
This function was factored out in 57d0b1e2ea (files-backend: extract out `create_symref_lock()`, 2024-05-07), but we never look at the ref_store or refname parameters. We just need the path, which is already contained in the lockfile struct. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-16Merge branch 'sj/ref-fsck'Junio C Hamano1-1/+114
"git fsck" infrastructure has been taught to also check the sanity of the ref database, in addition to the object database. * sj/ref-fsck: fsck: add ref name check for files backend files-backend: add unified interface for refs scanning builtin/refs: add verify subcommand refs: set up ref consistency check infrastructure fsck: add refs report function fsck: add a unified interface for reporting fsck messages fsck: make "fsck_error" callback generic fsck: rename objects-related fsck error functions fsck: rename "skiplist" to "skip_oids"
2024-08-15Merge branch 'jc/refs-symref-referent'Junio C Hamano1-4/+13
The refs API has been taught to give symref target information to the users of ref iterators, allowing for-each-ref and friends to avoid an extra ref_resolve_* API call per a symbolic ref. * jc/refs-symref-referent: ref-filter: populate symref from iterator refs: add referent to each_ref_fn refs: keep track of unresolved reference value in iterators
2024-08-14Merge branch 'ss/packed-ref-store-leakfix'Junio C Hamano1-0/+1
Leakfix. * ss/packed-ref-store-leakfix: refs/files: prevent memory leak by freeing packed_ref_store
2024-08-09refs: add referent to each_ref_fnJohn Cai1-1/+1
Add a parameter to each_ref_fn so that callers to the ref APIs that use this function as a callback can have acess to the unresolved value of a symbolic ref. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-09refs: keep track of unresolved reference value in iteratorsJohn Cai1-3/+12
Since ref iterators do not hold onto the direct value of a reference without resolving it, the only way to get ahold of a direct value of a symbolic ref is to make a separate call to refs_read_symbolic_ref. To make accessing the direct value of a symbolic ref more efficient, let's save the direct value of the ref in the iterators for both the files backend and the reftable backend. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08fsck: add ref name check for files backendshejialuo1-0/+31
The git-fsck(1) only implicitly checks the reference, it does not fully check refs with bad format name such as standalone "@". However, a file ending with ".lock" should not be marked as having a bad ref name. It is expected that concurrent writers may have such lock files. We currently ignore this situation. But for bare ".lock" file, we will report it as error. In order to provide such checks, add a new fsck message id "badRefName" with default ERROR type. Use existing "check_refname_format" to explicit check the ref name. And add a new unit test to verify the functionality. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08files-backend: add unified interface for refs scanningshejialuo1-1/+72
For refs and reflogs, we need to scan its corresponding directories to check every regular file or symbolic link which shares the same pattern. Introduce a unified interface for scanning directories for files-backend. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08refs: set up ref consistency check infrastructureshejialuo1-1/+12
The "struct ref_store" is the base class which contains the "be" pointer which provides backend-specific functions whose interfaces are defined in the "ref_storage_be". We could reuse this polymorphism to define only one interface. For every backend, we need to provide its own function pointer. The interfaces defined in the `ref_storage_be` are carefully structured in semantic. It's organized as the five parts: 1. The name and the initialization interfaces. 2. The ref transaction interfaces. 3. The ref internal interfaces (pack, rename and copy). 4. The ref filesystem interfaces. 5. The reflog related interfaces. To keep consistent with the git-fsck(1), add a new interface named "fsck_refs_fn" to the end of "ref_storage_be". This semantic cannot be grouped into any above five categories. Explicitly add blank line to make it different from others. Last, implement placeholder functions for each ref backends. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-05refs/files: prevent memory leak by freeing packed_ref_storeSven Strickroth1-0/+1
This complements 64a6dd8ffc (refs: implement removal of ref storages, 2024-06-06). Signed-off-by: Sven Strickroth <email@cs-ware.de> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-30refs/files: stop using `the_repository`Patrick Steinhardt1-5/+3
Convert the files ref backend to stop using `the_repository` in favor of the repo that gets passed in via `struct ref_store`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-30refs/files: stop using `the_repository` in `parse_loose_ref_contents()`Patrick Steinhardt1-9/+12
We implicitly rely on `the_repository` in `parse_loose_ref_contents()` by calling `parse_oid_hex()`. Convert the function to instead use `parse_oid_hex_algop()` and have callers pass in the hash algorithm to use. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-02Merge branch 'ps/use-the-repository'Junio C Hamano1-3/+5
A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help transition the codebase to rely less on the availability of the singleton the_repository instance. * ps/use-the-repository: hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE` t/helper: remove dependency on `the_repository` in "proc-receive" t/helper: fix segfault in "oid-array" command without repository t/helper: use correct object hash in partial-clone helper compat/fsmonitor: fix socket path in networked SHA256 repos replace-object: use hash algorithm from passed-in repository protocol-caps: use hash algorithm from passed-in repository oidset: pass hash algorithm when parsing file http-fetch: don't crash when parsing packfile without a repo hash-ll: merge with "hash.h" refs: avoid include cycle with "repository.h" global: introduce `USE_THE_REPOSITORY_VARIABLE` macro hash: require hash algorithm in `empty_tree_oid_hex()` hash: require hash algorithm in `is_empty_{blob,tree}_oid()` hash: make `is_null_oid()` independent of `the_repository` hash: convert `oidcmp()` and `oideq()` to compare whole hash global: ensure that object IDs are always padded hash: require hash algorithm in `oidread()` and `oidclr()` hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()` hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
2024-06-20Merge branch 'kn/update-ref-symref'Junio C Hamano1-8/+9
"git update-ref --stdin" learned to handle transactional updates of symbolic-refs. * kn/update-ref-symref: update-ref: add support for 'symref-update' command reftable: pick either 'oid' or 'target' for new updates update-ref: add support for 'symref-create' command update-ref: add support for 'symref-delete' command update-ref: add support for 'symref-verify' command refs: specify error for regular refs with `old_target` refs: create and use `ref_update_expects_existing_old_ref()`
2024-06-17Merge branch 'ps/ref-storage-migration'Junio C Hamano1-14/+110
A new command has been added to migrate a repository that uses the files backend for its ref storage to use the reftable backend, with limitations. * ps/ref-storage-migration: builtin/refs: new command to migrate ref storage formats refs: implement logic to migrate between ref storage formats refs: implement removal of ref storages worktree: don't store main worktree twice reftable: inline `merged_table_release()` refs/files: fix NULL pointer deref when releasing ref store refs/files: extract function to iterate through root refs refs/files: refactor `add_pseudoref_and_head_entries()` refs: allow to skip creation of reflog entries refs: pass storage format to `ref_store_init()` explicitly refs: convert ref storage format to an enum setup: unset ref storage when reinitializing repository version
2024-06-14global: introduce `USE_THE_REPOSITORY_VARIABLE` macroPatrick Steinhardt1-0/+2
Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14hash: require hash algorithm in `oidread()` and `oidclr()`Patrick Steinhardt1-3/+3
Both `oidread()` and `oidclr()` use `the_repository` to derive the hash function that shall be used. Require callers to pass in the hash algorithm to get rid of this implicit dependency. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-13Merge branch 'ps/ref-storage-migration' into ps/use-the-repositoryJunio C Hamano1-14/+110
* ps/ref-storage-migration: builtin/refs: new command to migrate ref storage formats refs: implement logic to migrate between ref storage formats refs: implement removal of ref storages worktree: don't store main worktree twice reftable: inline `merged_table_release()` refs/files: fix NULL pointer deref when releasing ref store refs/files: extract function to iterate through root refs refs/files: refactor `add_pseudoref_and_head_entries()` refs: allow to skip creation of reflog entries refs: pass storage format to `ref_store_init()` explicitly refs: convert ref storage format to an enum setup: unset ref storage when reinitializing repository version
2024-06-07refs: specify error for regular refs with `old_target`Karthik Nayak1-6/+8
When a reference update tries to update a symref, but the ref in question is actually a regular ref, we raise an error. However the error raised in this situation is: verifying symref target: '<ref>': reference is missing but expected <old-target> which is very generic and doesn't indicate the mismatch of types. Let's make this error more specific: cannot lock ref '<ref>': expected symref with target '<old-target>': but is a regular ref so that users have a clearer understanding. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07refs: create and use `ref_update_expects_existing_old_ref()`Karthik Nayak1-2/+1
The files and reftable backend, need to check if a ref must exist, so that the required validation can be done. A ref must exist only when the `old_oid` value of the update has been explicitly set and it is not the `null_oid` value. Since we also support symrefs now, we need to ensure that even when `old_target` is set a ref must exist. While this was missed when we added symref support in transactions, there are no active users of this path. As we introduce the 'symref-verify' command in the upcoming commits, it is important to fix this. So let's export this to a function called `ref_update_expects_existing_old_ref()` and expose it internally via 'refs-internal.h'. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06refs: implement removal of ref storagesPatrick Steinhardt1-0/+62
We're about to introduce logic to migrate ref storages. One part of the migration will be to delete the files that are part of the old ref storage format. We don't yet have a way to delete such data generically across ref backends though. Implement a new `delete` callback and expose it via a new `ref_storage_delete()` function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06refs/files: extract function to iterate through root refsPatrick Steinhardt1-9/+42
Extract a new function that can be used to iterate through all root refs known to the "files" backend. This will be used in the next commit, where we start to teach ref backends to remove themselves. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06refs/files: refactor `add_pseudoref_and_head_entries()`Patrick Steinhardt1-9/+6
The `add_pseudoref_and_head_entries()` function accepts both the ref store as well as a directory name as input. This is unnecessary though as the ref store already uniquely identifies the root directory of the ref store anyway. Furthermore, the function is misnamed now that we have clarified the meaning of pseudorefs as it doesn't add pseudorefs, but root refs. Rename it accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06refs: allow to skip creation of reflog entriesPatrick Steinhardt1-0/+4
The ref backends do not have any way to disable the creation of reflog entries. This will be required for upcoming ref format migration logic so that we do not create any entries that didn't exist in the original ref database. Provide a new `REF_SKIP_CREATE_REFLOG` flag that allows the caller to disable reflog entry creation. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-30Merge branch 'ps/refs-without-the-repository-updates'Junio C Hamano1-23/+36
Further clean-up the refs subsystem to stop relying on the_repository, and instead use the repository associated to the ref_store object. * ps/refs-without-the-repository-updates: refs/packed: remove references to `the_hash_algo` refs/files: remove references to `the_hash_algo` refs/files: use correct repository refs: remove `dwim_log()` refs: drop `git_default_branch_name()` refs: pass repo when peeling objects refs: move object peeling into "object.c" refs: pass ref store when detecting dangling symrefs refs: convert iteration over replace refs to accept ref store refs: retrieve worktree ref stores via associated repository refs: refactor `resolve_gitlink_ref()` to accept a repository refs: pass repo when retrieving submodule ref store refs: track ref stores via strmap refs: implement releasing ref storages refs: rename `init_db` callback to avoid confusion refs: adjust names for `init` and `init_db` callbacks
2024-05-28Merge branch 'ps/pseudo-ref-terminology'Junio C Hamano1-2/+1
Terminology to call various ref-like things are getting straightened out. * ps/pseudo-ref-terminology: refs: refuse to write pseudorefs ref-filter: properly distinuish pseudo and root refs refs: pseudorefs are no refs refs: classify HEAD as a root ref refs: do not check ref existence in `is_root_ref()` refs: rename `is_special_ref()` to `is_pseudo_ref()` refs: rename `is_pseudoref()` to `is_root_ref()` Documentation/glossary: define root refs as refs Documentation/glossary: clarify limitations of pseudorefs Documentation/glossary: redefine pseudorefs as special refs
2024-05-23Merge branch 'it/refs-name-conflict'Junio C Hamano1-1/+3
Expose "name conflict" error when a ref creation fails due to D/F conflict in the ref namespace, to improve an error message given by "git fetch". * it/refs-name-conflict: refs: return conflict error when checking packed refs
2024-05-23Merge branch 'ps/pseudo-ref-terminology' into ps/ref-storage-migrationJunio C Hamano1-2/+1
* ps/pseudo-ref-terminology: refs: refuse to write pseudorefs ref-filter: properly distinuish pseudo and root refs refs: pseudorefs are no refs refs: classify HEAD as a root ref refs: do not check ref existence in `is_root_ref()` refs: rename `is_special_ref()` to `is_pseudo_ref()` refs: rename `is_pseudoref()` to `is_root_ref()` Documentation/glossary: define root refs as refs Documentation/glossary: clarify limitations of pseudorefs Documentation/glossary: redefine pseudorefs as special refs
2024-05-23Merge branch 'ps/refs-without-the-repository-updates' into ↵Junio C Hamano1-23/+36
ps/ref-storage-migration * ps/refs-without-the-repository-updates: refs/packed: remove references to `the_hash_algo` refs/files: remove references to `the_hash_algo` refs/files: use correct repository refs: remove `dwim_log()` refs: drop `git_default_branch_name()` refs: pass repo when peeling objects refs: move object peeling into "object.c" refs: pass ref store when detecting dangling symrefs refs: convert iteration over replace refs to accept ref store refs: retrieve worktree ref stores via associated repository refs: refactor `resolve_gitlink_ref()` to accept a repository refs: pass repo when retrieving submodule ref store refs: track ref stores via strmap refs: implement releasing ref storages refs: rename `init_db` callback to avoid confusion refs: adjust names for `init` and `init_db` callbacks
2024-05-17refs/files: remove references to `the_hash_algo`Patrick Steinhardt1-2/+2
Remove references to `the_hash_algo` in favor of the hash algo specified by the repository associated with the files ref store. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17refs/files: use correct repositoryPatrick Steinhardt1-10/+13
There are several places in the "files" backend where we use `the_repository` instead of the repository associated with the ref store itself. Adapt those to use the correct repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17refs: implement releasing ref storagesPatrick Steinhardt1-0/+10
Ref storages are typically only initialized once for `the_repository` and then never released. Until now we got away with that without causing memory leaks because `the_repository` stays reachable, and because the ref backend is reachable via `the_repository` its memory basically never leaks. This is about to change though because of the upcoming migration logic, which will create a secondary ref storage. In that case, we will either have to release the old or new ref storage to avoid leaks. Implement a new `release` callback and expose it via a new `ref_storage_release()` function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17refs: rename `init_db` callback to avoid confusionPatrick Steinhardt1-6/+6
Reference backends have two callbacks `init` and `init_db`. The similarity of these two callbacks has repeatedly confused me whenever I was looking at them, where I always had to look up which of them does what. Rename the `init_db` callback to `create_on_disk`, which should hopefully be clearer. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17refs: adjust names for `init` and `init_db` callbacksPatrick Steinhardt1-5/+5
The names of the functions that implement the `init` and `init_db` callbacks in the "files" and "packed" backends do not match the names of the callbacks, which is inconsistent. Rename them so that they match, which makes it easier to discover their respective implementations. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-15refs: classify HEAD as a root refPatrick Steinhardt1-2/+1
Root refs are those refs that live in the root of the ref hierarchy. Our old and venerable "HEAD" reference falls into this category, but we don't yet classify it as such in `is_root_ref()`. Adapt the function to also treat "HEAD" as a root ref. This change is safe to do for all current callers: - `ref_kind_from_refname()` already handles "HEAD" explicitly before calling `is_root_ref()`. - The "files" and "reftable" backends explicitly call both `is_root_ref()` and `is_headref()` together. This also aligns behaviour or `is_root_ref()` and `is_headref()` such that we stop checking for ref existence. This changes semantics for our backends: - In the reftable backend we already know that the ref must exist because `is_headref()` is called as part of the ref iterator. The existence check is thus redundant, and the change is safe to do. - In the files backend we use it when populating root refs, where we would skip adding the "HEAD" file if it was not possible to resolve it. The new behaviour is to instead mark "HEAD" as broken, which will cause us to emit warnings in various places. As there are no callers of `is_headref()` left afer the refactoring, we can absorb it completely into `is_root_ref()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-15refs: do not check ref existence in `is_root_ref()`Patrick Steinhardt1-2/+2
Before this patch series, root refs except for "HEAD" and our special refs were classified as pseudorefs. Furthermore, our terminology clarified that pseudorefs must not be symbolic refs. This restriction is enforced in `is_root_ref()`, which explicitly checks that a supposed root ref resolves to an object ID without recursing. This has been extremely confusing right from the start because (in old terminology) a ref name may sometimes be a pseudoref and sometimes not depending on whether it is a symbolic or regular ref. This behaviour does not seem reasonable at all and I very much doubt that it results in anything sane. Last but not least, the current behaviour can actually lead to a segfault when calling `is_root_ref()` with a reference that either does not exist or that is a symbolic ref because we never initialized `oid`, but then read it via `is_null_oid()`. We have now changed terminology to clarify that pseudorefs are really only "MERGE_HEAD" and "FETCH_HEAD", whereas all the other refs that live in the root of the ref hierarchy are just plain refs. Thus, we do not need to check whether the ref is symbolic or not. In fact, we can now avoid looking up the ref completely as the name is sufficient for us to figure out whether something would be a root ref or not. This change of course changes semantics for our callers. As there are only three of them we can assess each of them individually: - "ref-filter.c:ref_kind_from_refname()" uses it to classify refs. It's clear that the intent is to classify based on the ref name, only. - "refs/reftable_backend.c:reftable_ref_iterator_advance()" uses it to filter root refs. Again, using existence checks is pointless here as the iterator has just surfaced the ref, so we know it does exist. - "refs/files_backend.c:add_pseudoref_and_head_entries()" uses it to determine whether it should add a ref to the root directory of its iterator. This had the effect that we skipped over any files that are either a symbolic ref, or which are not a ref at all. The new behaviour is to include symbolic refs know, which aligns us with the adapted terminology. Furthermore, files which look like root refs but aren't are now mark those as "broken". As broken refs are not surfaced by our tooling, this should not lead to a change in user-visible behaviour, but may cause us to emit warnings. This feels like the right thing to do as we would otherwise just silently ignore corrupted root refs completely. So in all cases the existence check was either superfluous, not in line with the adapted terminology or masked potential issues. This commit thus changes the behaviour as proposed and drops the existence check altogether. Add a test that verifies that this does not change user-visible behaviour. Namely, we still don't want to show broken refs to the user by default in git-for-each-ref(1). What this does allow though is for internal callers to surface dangling root refs when they pass in the `DO_FOR_EACH_INCLUDE_BROKEN` flag. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-15refs: rename `is_pseudoref()` to `is_root_ref()`Patrick Steinhardt1-1/+1
Rename `is_pseudoref()` to `is_root_ref()` to adapt to the newly defined terminology in our gitglossary(7). Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07refs: remove `create_symref` and associated dead codeKarthik Nayak1-67/+0
In the previous commits, we converted `refs_create_symref()` to utilize transactions to perform symref updates. Earlier `refs_create_symref()` used `create_symref()` to do the same. We can now remove `create_symref()` and any code associated with it which is no longer used. We remove `create_symref()` code from all the reference backends and also remove it entirely from the `ref_storage_be` struct. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07refs: add support for transactional symref updatesKarthik Nayak1-21/+92
The reference backends currently support transactional reference updates. While this is exposed to users via 'git-update-ref' and its '--stdin' mode, it is also used internally within various commands. However, we do not support transactional updates of symrefs. This commit adds support for symrefs in both the 'files' and the 'reftable' backend. Here, we add and use `ref_update_has_null_new_value()`, a helper function which is used to check if there is a new_value in a reference update. The new value could either be a symref target `new_target` or a OID `new_oid`. We also add another common function `ref_update_check_old_target` which will be used to check if the update's old_target corresponds to a reference's current target. Now transactional updates (verify, create, delete, update) can be used for: - regular refs - symbolic refs - conversion of regular to symbolic refs and vice versa This also allows us to expose this to users via new commands in 'git-update-ref' in the future. Note that a dangling symref update does not record a new reflog entry, which is unchanged before and after this commit. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07refs: move `original_update_refname` to 'refs.c'Karthik Nayak1-16/+5
The files backend and the reftable backend implement `original_update_refname` to obtain the original refname of the update. Move it out to 'refs.c' and only expose it internally to the refs library. This will be used in an upcoming commit to also introduce another common functionality for the two backends. We also rename the function to `ref_update_original_update_refname` to keep it consistent with the upcoming other 'ref_update_*' functions that'll be introduced. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07files-backend: extract out `create_symref_lock()`Karthik Nayak1-14/+37
The function `create_symref_locked()` creates a symref by creating a '<symref>.lock' file and then committing the symref lock, which creates the final symref. Extract the early half of `create_symref_locked()` into a new helper function `create_symref_lock()`. Because the name of the new function is too similar to the original, rename the original to `create_and_commit_symref()` to avoid confusion. The new function `create_symref_locked()` can be used to create the symref lock in a separate step from that of committing it. This allows to add transactional support for symrefs, where the lock would be created in the preparation step and the lock would be committed in the finish step. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07refs: accept symref values in `ref_transaction_update()`Karthik Nayak1-6/+6
The function `ref_transaction_update()` obtains ref information and flags to create a `ref_update` and add them to the transaction at hand. To extend symref support in transactions, we need to also accept the old and new ref targets and process it. This commit adds the required parameters to the function and modifies all call sites. The two parameters added are `new_target` and `old_target`. The `new_target` is used to denote what the reference should point to when the transaction is applied. Some functions allow this parameter to be NULL, meaning that the reference is not changed. The `old_target` denotes the value the reference must have before the update. Some functions allow this parameter to be NULL, meaning that the old value of the reference is not checked. We also update the internal function `ref_transaction_add_update()` similarly to take the two new parameters. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-06refs: return conflict error when checking packed refsIvan Tse1-1/+3
The TRANSACTION_NAME_CONFLICT error code refers to a failure to create a ref due to a name conflict with another ref. An example of this is a directory/file conflict such as ref names A/B and A. "git fetch" uses this error code to more accurately describe the error by recommending to the user that they try running "git remote prune" to remove any old refs that are deleted by the remote which would clear up any directory/file conflicts. This helpful error message is not displayed when the conflicted ref is stored in packed refs. This change fixes this by ensuring error return code consistency in `lock_raw_ref`. Signed-off-by: Ivan Tse <ivan.tse1@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-05Merge branch 'kn/for-all-refs'Junio C Hamano1-34/+93
"git for-each-ref" learned "--include-root-refs" option to show even the stuff outside the 'refs/' hierarchy. * kn/for-all-refs: for-each-ref: add new option to include root refs ref-filter: rename 'FILTER_REFS_ALL' to 'FILTER_REFS_REGULAR' refs: introduce `refs_for_each_include_root_refs()` refs: extract out `loose_fill_ref_dir_regular_file()` refs: introduce `is_pseudoref()` and `is_headref()`
2024-02-23refs: introduce `refs_for_each_include_root_refs()`Karthik Nayak1-5/+60
Introduce a new ref iteration flag `DO_FOR_EACH_INCLUDE_ROOT_REFS`, which will be used to iterate over regular refs plus pseudorefs and HEAD. Refs which fall outside the `refs/` and aren't either pseudorefs or HEAD are more of a grey area. This is because we don't block the users from creating such refs but they are not officially supported. Introduce `refs_for_each_include_root_refs()` which calls `do_for_each_ref()` with this newly introduced flag. In `refs/files-backend.c`, introduce a new function `add_pseudoref_and_head_entries()` to add pseudorefs and HEAD to the `ref_dir`. We then finally call `add_pseudoref_and_head_entries()` whenever the `DO_FOR_EACH_INCLUDE_ROOT_REFS` flag is set. Any new ref backend will also have to implement similar changes on its end. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-23refs: extract out `loose_fill_ref_dir_regular_file()`Karthik Nayak1-29/+33
Extract out the code for adding a single file to the loose ref dir as `loose_fill_ref_dir_regular_file()` from `loose_fill_ref_dir()` in `refs/files-backend.c`. This allows us to use this function independently in the following commits where we add code to also add pseudorefs to the ref dir. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-21refs: stop resolving ref corresponding to reflogsPatrick Steinhardt1-10/+2
The reflog iterator tries to resolve the corresponding ref for every reflog that it is about to yield. Historically, this was done due to multiple reasons: - It ensures that the refname is safe because we end up calling `check_refname_format()`. Also, non-conformant refnames are skipped altogether. - The iterator used to yield the resolved object ID as well as its flags to the callback. This info was never used though, and the corresponding parameters were dropped in the preceding commit. - When a ref is corrupt then the reflog is not emitted at all. We're about to introduce a new `git reflog list` subcommand that will print all reflogs that the refdb knows about. Skipping over reflogs whose refs are corrupted would be quite counterproductive in this case as the user would have no way to learn about reflogs which may still exist in their repository to help and rescue such a corrupted ref. Thus, the only remaining reason for why we'd want to resolve the ref is to verify its refname. Refactor the code to call `check_refname_format()` directly instead of trying to resolve the ref. This is significantly more efficient given that we don't have to hit the object database anymore to list reflogs. And second, it ensures that we end up showing reflogs of broken refs, which will help to make the reflog more useful. Note that this really only impacts the case where the corresponding ref is corrupt. Reflogs for nonexistent refs would have been returned to the caller beforehand already as we did not pass `RESOLVE_REF_READING` to the function, and thus `refs_resolve_ref_unsafe()` would have returned successfully in that case. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-21refs: drop unused params from the reflog iterator callbackPatrick Steinhardt1-7/+1
The ref and reflog iterators share much of the same underlying code to iterate over the corresponding entries. This results in some weird code because the reflog iterator also exposes an object ID as well as a flag to the callback function. Neither of these fields do refer to the reflog though -- they refer to the corresponding ref with the same name. This is quite misleading. In practice at least the object ID cannot really be implemented in any other way as a reflog does not have a specific object ID in the first place. This is further stressed by the fact that none of the callbacks except for our test helper make use of these fields. Split up the infrastucture so that ref and reflog iterators use separate callback signatures. This allows us to drop the nonsensical fields from the reflog iterator. Note that internally, the backends still use the same shared infra to iterate over both types. As the backends should never end up being called directly anyway, this is not much of a problem and thus kept as-is for simplicity's sake. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-21refs: always treat iterators as orderedPatrick Steinhardt1-4/+3
In the preceding commit we have converted the reflog iterator of the "files" backend to be ordered, which was the only remaining ref iterator that wasn't ordered. Refactor the ref iterator infrastructure so that we always assume iterators to be ordered, thus simplifying the code. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-21refs/files: sort merged worktree and common reflogsPatrick Steinhardt1-28/+2
When iterating through reflogs in a worktree we create a merged iterator that merges reflogs from both refdbs. The resulting refs are ordered so that instead we first return all worktree reflogs before we return all common refs. This is the only remaining case where a ref iterator returns entries in a non-lexicographic order. The result would look something like the following (listed with a command we introduce in a subsequent commit): ``` $ git reflog list HEAD refs/worktree/per-worktree refs/heads/main refs/heads/wt ``` So we first print the per-worktree reflogs in lexicographic order, then the common reflogs in lexicographic order. This is confusing and not consistent with how we print per-worktree refs, which are exclusively sorted lexicographically. Sort reflogs lexicographically in the same way as we sort normal refs. As this is already implemented properly by the "reftable" backend via a separate selection function, we simply pull out that logic and reuse it for the "files" backend. As logs are properly sorted now, mark the merged reflog iterator as sorted. Tests will be added in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-21refs/files: sort reflogs returned by the reflog iteratorPatrick Steinhardt1-2/+2
We use a directory iterator to return reflogs via the reflog iterator. This iterator returns entries in the same order as readdir(3P) would and will thus yield reflogs with no discernible order. Set the new `DIR_ITERATOR_SORTED` flag that was introduced in the preceding commit so that the order is deterministic. While the effect of this can only been observed in a test tool, a subsequent commit will start to expose this functionality to users via a new `git reflog list` subcommand. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-26Merge branch 'ps/worktree-refdb-initialization'Junio C Hamano1-6/+31
Instead of manually creating refs/ hierarchy on disk upon a creation of a secondary worktree, which is only usable via the files backend, use the refs API to populate it. * ps/worktree-refdb-initialization: builtin/worktree: create refdb via ref backend worktree: expose interface to look up worktree by name builtin/worktree: move setup of commondir file earlier refs/files: skip creation of "refs/{heads,tags}" for worktrees setup: move creation of "refs/" into the files backend refs: prepare `refs_init_db()` for initializing worktree refs
2024-01-16Merge branch 'ps/refstorage-extension'Junio C Hamano1-1/+0
Introduce a new extension "refstorage" so that we can mark a repository that uses a non-default ref backend, like reftable. * ps/refstorage-extension: t9500: write "extensions.refstorage" into config builtin/clone: introduce `--ref-format=` value flag builtin/init: introduce `--ref-format=` value flag builtin/rev-parse: introduce `--show-ref-format` flag t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar setup: introduce GIT_DEFAULT_REF_FORMAT envvar setup: introduce "extensions.refStorage" extension setup: set repository's formats on init setup: start tracking ref storage format refs: refactor logic to look up storage backends worktree: skip reading HEAD when repairing worktrees t: introduce DEFAULT_REPO_FORMAT prereq
2024-01-08Merge branch 'en/header-cleanup'Junio C Hamano1-2/+0
Remove unused header "#include". * en/header-cleanup: treewide: remove unnecessary includes in source files treewide: add direct includes currently only pulled in transitively trace2/tr2_tls.h: remove unnecessary include submodule-config.h: remove unnecessary include pkt-line.h: remove unnecessary include line-log.h: remove unnecessary include http.h: remove unnecessary include fsmonitor--daemon.h: remove unnecessary includes blame.h: remove unnecessary includes archive.h: remove unnecessary include treewide: remove unnecessary includes in source files treewide: remove unnecessary includes from header files
2024-01-08refs/files: skip creation of "refs/{heads,tags}" for worktreesPatrick Steinhardt1-8/+14
The files ref backend will create both "refs/heads" and "refs/tags" in the Git directory. While this logic makes sense for normal repositories, it does not for worktrees because those refs are "common" refs that would always be contained in the main repository's ref database. Introduce a new flag telling the backend that it is expected to create a per-worktree ref database and skip creation of these dirs in the files backend when the flag is set. No other backends (currently) need worktree-specific logic, so this is the only required change to start creating per-worktree ref databases via `refs_init_db()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-08setup: move creation of "refs/" into the files backendPatrick Steinhardt1-0/+17
When creating the ref database we unconditionally create the "refs/" directory in "setup.c". This is a mandatory prerequisite for all Git repositories regardless of the ref backend in use, because Git will be unable to detect the directory as a repository if "refs/" doesn't exist. We are about to add another new caller that will want to create a ref database when creating worktrees. We would require the same logic to create the "refs/" directory even though the caller really should not care about such low-level details. Ideally, the ref database should be fully initialized after calling `refs_init_db()`. Move the code to create the directory into the files backend itself to make it so. This means that future ref backends will also need to have equivalent logic around to ensure that the directory exists, but it seems a lot more sensible to have it this way round than to require callers to create the directory themselves. An alternative to this would be to create "refs/" in `refs_init_db()` directly. This feels conceptually unclean though as the creation of the refdb is now cluttered across different callsites. Furthermore, both the "files" and the upcoming "reftable" backend write backend-specific data into the "refs/" directory anyway, so splitting up this logic would only make it harder to reason about. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-08refs: prepare `refs_init_db()` for initializing worktree refsPatrick Steinhardt1-1/+3
The purpose of `refs_init_db()` is to initialize the on-disk files of a new ref database. The function is quite inflexible right now though, as callers can neither specify the `struct ref_store` nor can they pass any flags. Refactor the interface to accept both of these. This will be required so that we can start initializing per-worktree ref databases via the ref backend instead of open-coding the initialization in "worktree.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-02refs: refactor logic to look up storage backendsPatrick Steinhardt1-1/+0
In order to look up ref storage backends, we're currently using a linked list of backends, where each backend is expected to set up its `next` pointer to the next ref storage backend. This is kind of a weird setup as backends need to be aware of other backends without much of a reason. Refactor the code so that the array of backends is centrally defined in "refs.c", where each backend is now identified by an integer constant. Expose functions to translate from those integer constants to the name and vice versa, which will be required by subsequent patches. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren1-2/+0
Each of these were checked with gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-17refs: remove `delete_refs` callback from backendsPatrick Steinhardt1-7/+0
Now that `refs_delete_refs` is implemented in a generic way via the ref transaction interfaces there are no callers left that invoke the `delete_refs` callback anymore. Remove it from all of our backends. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-17refs: deduplicate code to delete referencesPatrick Steinhardt1-45/+1
Both the files and the packed-refs reference backends now use the same generic transactions-based code to delete references. Let's pull these implementations up into `refs_delete_refs()` to deduplicate the code. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-17refs/files: use transactions to delete referencesPatrick Steinhardt1-32/+34
In the `files_delete_refs()` callback function of the files backend we implement deletion of references. This is done in two steps: 1. We lock the packed-refs file and delete all references from it in a single transaction. 2. We delete all loose references via separate calls to `refs_delete_ref()`. These steps essentially duplicate the logic around locking and deletion order that we already have in the transactional interfaces, where we do know to lock and evict references from the packed-refs file. Despite the fact that we duplicate the logic, it's also less efficient than if we used a single generic transaction: - The transactional interface knows to skip locking of the packed refs in case they don't contain any of the refs which are about to be deleted. - We end up creating N+1 separate reference transactions, one for the packed-refs file and N for the individual loose references. Refactor the code to instead delete references via a single transaction. As we don't assert the expected old object ID this is equivalent to the previous behaviour, and we already do the same in the packed-refs backend. Despite the fact that the result is simpler to reason about, this change also results in improved performance. The following benchmarks have been executed in linux.git: ``` $ hyperfine -n '{rev}, packed={packed} refcount={refcount}' \ -L packed true,false -L refcount 1,1000 -L rev master,pks-ref-store-generic-delete-refs \ --setup 'git -C /home/pks/Development/git switch --detach {rev} && make -C /home/pks/Development/git -j17' \ --prepare 'printf "create refs/heads/new-branch-%d HEAD\n" $(seq {refcount}) | git -C /home/pks/Reproduction/linux.git update-ref --stdin && if test {packed} = true; then git pack-refs --all; fi' \ --warmup=10 \ '/home/pks/Development/git/bin-wrappers/git -C /home/pks/Reproduction/linux.git branch -d new-branch-{1..{refcount}}' Benchmark 1: master packed=true refcount=1 Time (mean ± σ): 7.8 ms ± 1.6 ms [User: 3.4 ms, System: 4.4 ms] Range (min … max): 5.5 ms … 11.0 ms 120 runs Benchmark 2: master packed=false refcount=1 Time (mean ± σ): 7.0 ms ± 1.1 ms [User: 3.2 ms, System: 3.8 ms] Range (min … max): 5.7 ms … 9.8 ms 180 runs Benchmark 3: master packed=true refcount=1000 Time (mean ± σ): 283.8 ms ± 5.2 ms [User: 45.7 ms, System: 231.5 ms] Range (min … max): 276.7 ms … 291.6 ms 10 runs Benchmark 4: master packed=false refcount=1000 Time (mean ± σ): 284.4 ms ± 5.3 ms [User: 44.2 ms, System: 233.6 ms] Range (min … max): 277.1 ms … 293.3 ms 10 runs Benchmark 5: generic-delete-refs packed=true refcount=1 Time (mean ± σ): 6.2 ms ± 1.8 ms [User: 2.3 ms, System: 3.9 ms] Range (min … max): 4.1 ms … 12.2 ms 142 runs Benchmark 6: generic-delete-refs packed=false refcount=1 Time (mean ± σ): 7.1 ms ± 1.4 ms [User: 2.8 ms, System: 4.3 ms] Range (min … max): 4.2 ms … 10.8 ms 157 runs Benchmark 7: generic-delete-refs packed=true refcount=1000 Time (mean ± σ): 198.9 ms ± 1.7 ms [User: 29.5 ms, System: 165.7 ms] Range (min … max): 196.1 ms … 201.4 ms 10 runs Benchmark 8: generic-delete-refs packed=false refcount=1000 Time (mean ± σ): 199.7 ms ± 7.8 ms [User: 32.2 ms, System: 163.2 ms] Range (min … max): 193.8 ms … 220.7 ms 10 runs Summary generic-delete-refs packed=true refcount=1 ran 1.14 ± 0.37 times faster than master packed=false refcount=1 1.15 ± 0.40 times faster than generic-delete-refs packed=false refcount=1 1.26 ± 0.44 times faster than master packed=true refcount=1 32.24 ± 9.17 times faster than generic-delete-refs packed=true refcount=1000 32.36 ± 9.29 times faster than generic-delete-refs packed=false refcount=1000 46.00 ± 13.10 times faster than master packed=true refcount=1000 46.10 ± 13.13 times faster than master packed=false refcount=1000 ``` Especially in the case where we have many references we can see a clear performance speedup of nearly 30%. This is in contrast to the stated objecive in a27dcf89b68 (refs: make delete_refs() virtual, 2016-09-04), where the virtual `delete_refs()` function was introduced with the intent to speed things up rather than making things slower. So it seems like we have outlived the need for a virtual function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09files-backend.c: avoid stat in 'loose_fill_ref_dir'Victoria Dye1-9/+5
Modify the 'readdir' loop in 'loose_fill_ref_dir' to, rather than 'stat' a file to determine whether it is a directory or not, use 'get_dtype'. Currently, the loop uses 'stat' to determine whether each dirent is a directory itself or not in order to construct the appropriate ref cache entry. If 'stat' fails (returning a negative value), the dirent is silently skipped; otherwise, 'S_ISDIR(st.st_mode)' is used to check whether the entry is a directory. On platforms that include an entry's d_type in in the 'dirent' struct, this extra 'stat' check is redundant. We can use the 'get_dtype' method to extract this information on platforms that support it (i.e. where NO_D_TYPE_IN_DIRENT is unset), and derive it with 'stat' on platforms that don't. Because 'stat' is an expensive call, this confers a modest-but-noticeable performance improvement when iterating over large numbers of refs (approximately 20% speedup in 'git for-each-ref' in a 30k ref repo). Unlike other existing usage of 'get_dtype', the 'follow_symlinks' arg is set to 1 to replicate the existing handling of symlink dirents. This unfortunately requires calling 'stat' on the associated entry regardless of platform, but symlinks in the loose ref store are highly unlikely since they'd need to be created manually by a user. Note that this patch also changes the condition for skipping creation of a ref entry from "when 'stat' fails" to "when the d_type is anything other than DT_REG or DT_DIR". If a dirent's d_type is DT_UNKNOWN (either because the platform doesn't support d_type in dirents or some other reason) or DT_LNK, 'get_dtype' will try to derive the underlying type with 'stat'. If the 'stat' fails, the d_type will remain 'DT_UNKNOWN' and dirent will be skipped. However, it will also be skipped if it is any other valid d_type (e.g. DT_FIFO for named pipes, DT_LNK for a nested symlink). Git does not handle these properly anyway, so we can safely constrain accepted types to directories and regular files. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-21Merge branch 'tb/refs-exclusion-and-packed-refs'Junio C Hamano1-2/+3
Enumerating refs in the packed-refs file, while excluding refs that match certain patterns, has been optimized. * tb/refs-exclusion-and-packed-refs: ls-refs.c: avoid enumerating hidden refs where possible upload-pack.c: avoid enumerating hidden refs where possible builtin/receive-pack.c: avoid enumerating hidden references refs.h: implement `hidden_refs_to_excludes()` refs.h: let `for_each_namespaced_ref()` take excluded patterns revision.h: store hidden refs in a `strvec` refs/packed-backend.c: add trace2 counters for jump list refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) refs/packed-backend.c: refactor `find_reference_location()` refs: plumb `exclude_patterns` argument throughout builtin/for-each-ref.c: add `--exclude` option ref-filter.c: parameterize match functions over patterns ref-filter: add `ref_filter_clear()` ref-filter: clear reachable list pointers after freeing ref-filter.h: provide `REF_FILTER_INIT` refs.c: rename `ref_filter`
2023-07-10refs: plumb `exclude_patterns` argument throughoutTaylor Blau1-2/+3
The subsequent patch will want to access an optional `excluded_patterns` array within `refs/packed-backend.c` that will cull out certain references matching any of the given patterns on a best-effort basis. To do so, the refs subsystem needs to be updated to pass this value across a number of different locations. Prepare for a future patch by introducing this plumbing now, passing NULLs at top-level APIs in order to make that patch less noisy and more easily readable. Signed-off-by: Taylor Blau <me@ttaylorr.co> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21repository: remove unnecessary include of path.hElijah Newren1-0/+1
This also made it clear that several .c files that depended upon path.h were missing a #include for it; add the missing includes while at it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21cache.h: remove this no-longer-used headerElijah Newren1-1/+1
Since this header showed up in some places besides just #include statements, update/clean-up/remove those other places as well. Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got away with violating the rule that all files must start with an include of git-compat-util.h (or a short-list of alternate headers that happen to include it first). This change exposed the violation and caused it to stop building correctly; fix it by having it include git-compat-util.h first, as per policy. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-13Merge branch 'jc/pack-ref-exclude-include'Junio C Hamano1-10/+17
"git pack-refs" learns "--include" and "--exclude" to tweak the ref hierarchy to be packed using pattern matching. * jc/pack-ref-exclude-include: pack-refs: teach pack-refs --include option pack-refs: teach --exclude option to exclude refs from being packed docs: clarify git-pack-refs --all will pack all refs
2023-05-12pack-refs: teach pack-refs --include optionJohn Cai1-8/+10
Allow users to be more selective over which refs to pack by adding an --include option to git-pack-refs. The existing options allow some measure of selectivity. By default git-pack-refs packs all tags. --all can be used to include all refs, and the previous commit added the ability to exclude certain refs with --exclude. While these knobs give the user some selection over which refs to pack, it could be useful to give more control. For instance, a repository may have a set of branches that are rarely updated and would benefit from being packed. --include would allow the user to easily include a set of branches to be packed while leaving everything else unpacked. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-12pack-refs: teach --exclude option to exclude refs from being packedJohn Cai1-6/+10
At GitLab, we have a system that creates ephemeral internal refs that don't live long before getting deleted. Having an option to exclude certain refs from a packed-refs file allows these internal references to be deleted much more efficiently. Add an --exclude option to the pack-refs builtin, and use the ref exclusions API to exclude certain refs from being packed into the final packed-refs file Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24hash-ll.h: split out of hash.h to remove dependency on repository.hElijah Newren1-0/+1
hash.h depends upon and includes repository.h, due to the definition and use of the_hash_algo (defined as the_repository->hash_algo). However, most headers trying to include hash.h are only interested in the layout of the structs like object_id. Move the parts of hash.h that do not depend upon repository.h into a new file hash-ll.h (the "low level" parts of hash.h), and adjust other files to use this new header where the convenience inline functions aren't needed. This allows hash.h and object.h to be fairly small, minimal headers. It also exposes a lot of hidden dependencies on both path.h (which was brought in by repository.h) and repository.h (which was previously implicitly brought in by object.h), so also adjust other files to be more explicit about what they depend upon. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24copy.h: move declarations for copy.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11object-file.h: move declarations for object-file.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21write-or-die.h: move declarations for write-or-die.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21setup.h: move declarations for setup.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21environment.h: move declarations for environment.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21wrapper.h: move declarations for wrapper.c functions from cache.hElijah Newren1-1/+2
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren1-0/+1
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23ident.h: move ident-related declarations out of cache.hElijah Newren1-0/+1
These functions were all defined in a separate ident.c already, so create ident.h and move the declarations into that file. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-19refs: unify parse_worktree_ref() and ref_type()Han-Wen Nienhuys1-44/+36
The logic to handle worktree refs (worktrees/NAME/REF and main-worktree/REF) existed in two places: * ref_type() in refs.c * parse_worktree_ref() in worktree.c Collapse this logic together in one function parse_worktree_ref(): this avoids having to cross-check the result of parse_worktree_ref() and ref_type(). Introduce enum ref_worktree_type, which is slightly different from enum ref_type. The latter is a misleading name (one would think that 'ref_type' would have the symref option). Instead, enum ref_worktree_type only makes explicit how a refname relates to a worktree. From this point of view, HEAD and refs/bisect/abc are the same: they specify the current worktree implicitly. The files-backend must avoid packing refs/bisect/* and friends into packed-refs, so expose is_per_worktree_ref() separately. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01git-compat-util.h: use "UNUSED", not "UNUSED(var)"Ævar Arnfjörð Bjarmason1-7/+7
As reported in [1] the "UNUSED(var)" macro introduced in 2174b8c75de (Merge branch 'jk/unused-annotation' into next, 2022-08-24) breaks coccinelle's parsing of our sources in files where it occurs. Let's instead partially go with the approach suggested in [2] of making this not take an argument. As noted in [1] "coccinelle" will ignore such tokens in argument lists that it doesn't know about, and it's less of a surprise to syntax highlighters. This undoes the "help us notice when a parameter marked as unused is actually use" part of 9b240347543 (git-compat-util: add UNUSED macro, 2022-08-19), a subsequent commit will further tweak the macro to implement a replacement for that functionality. 1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19refs: mark unused virtual method parametersJeff King1-5/+5
The refs code uses various polymorphic types (e.g., loose vs packed ref_stores, abstracted iterators). Not every virtual function or callback needs all of its parameters. Let's mark the unused ones to quiet -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19refs: mark unused each_ref_fn parametersJeff King1-1/+3
Functions used with for_each_ref(), etc, need to conform to the each_ref_fn interface. But most of them don't need every parameter; let's annotate the unused ones to quiet -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13Revert "Merge branch 'ps/avoid-unnecessary-hook-invocation-with-packed-refs'"Junio C Hamano1-19/+7
This reverts commit 991b4d47f0accd3955d05927d5ce434e03ffbdb6, reversing changes made to bcd020f88e1e22f38422ac3f73ab06b34ec4bef1.
2022-03-29Merge branch 'ab/refs-various-fixes'Junio C Hamano1-32/+32
Code clean-up. * ab/refs-various-fixes: refs debug: add a wrapper for "read_symbolic_ref" packed-backend: remove stub BUG(...) functions misc *.c: use designated initializers for struct assignments refs: use designated initializers for "struct ref_iterator_vtable" refs: use designated initializers for "struct ref_storage_be"
2022-03-25Merge branch 'ps/fsync-refs'Junio C Hamano1-0/+1
Updates to refs traditionally weren't fsync'ed, but we can configure using core.fsync variable to do so. * ps/fsync-refs: core.fsync: new option to harden references
2022-03-17refs: use designated initializers for "struct ref_iterator_vtable"Ævar Arnfjörð Bjarmason1-6/+6
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-17refs: use designated initializers for "struct ref_storage_be"Ævar Arnfjörð Bjarmason1-26/+26
Change the definition of the three refs backends we currently carry to use designated initializers. The "= NULL" assignments being retained here are redundant, and could be removed, but let's keep them for clarity. All of these backends define almost all fields, so we're not saving much in terms of line count by omitting these, but e.g. for "refs_be_debug" it's immediately apparent that we're omitting "init" when comparing its assignment to the others. This is a follow-up to similar work merged in bd4232fac33 (Merge branch 'ab/struct-init', 2021-07-16), a4b9fb6a5cf (Merge branch 'ab/designated-initializers-more', 2021-10-18) and a30321b9eae (Merge branch 'ab/designated-initializers' into ab/designated-initializers-more, 2021-09-27). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-15core.fsync: new option to harden referencesPatrick Steinhardt1-0/+1
When writing both loose and packed references to disk we first create a lockfile, write the updated values into that lockfile, and on commit we rename the file into place. According to filesystem developers, this behaviour is broken because applications should always sync data to disk before doing the final rename to ensure data consistency [1][2][3]. If applications fail to do this correctly, a hard crash of the machine can easily result in corrupted on-disk data. This kind of corruption can in fact be easily observed with Git when the machine hard-resets shortly after writing references to disk. On machines with ext4, this will likely lead to the "empty files" problem: the file has been renamed, but its data has not been synced to disk. The result is that the reference is corrupt, and in the worst case this can lead to data loss. Implement a new option to harden references so that users and admins can avoid this scenario by syncing locked loose and packed references to disk before we rename them into place. [1]: https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/ [2]: https://btrfs.wiki.kernel.org/index.php/FAQ (What are the crash guarantees of overwrite-by-rename) [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/ext4.rst (see auto_da_alloc) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01refs/files-backend: optimize reading of symbolic refsPatrick Steinhardt1-6/+28
When reading references via `files_read_raw_ref()` we always consult both the loose reference, and if that wasn't found, we also consult the packed-refs file. While this makes sense to read a generic reference, it is wasteful in the case where we only care about symbolic references: the packed-refs backend does not support them, and thus it cannot ever return one for us. Special-case reading of symbolic references for the files backend such that we always skip asking the packed-refs backend. We use `refs_read_symbolic_ref()` extensively to determine whether we need to skip updating local symbolic references during a fetch, which is why the change results in a significant speedup when doing fetches in repositories with huge numbers of references. The following benchmark executes a mirror-fetch in a repository with about 2 million references via `git fetch --prune --no-write-fetch-head +refs/*:refs/*`: Benchmark 1: HEAD~ Time (mean ± σ): 68.372 s ± 2.344 s [User: 65.629 s, System: 8.786 s] Range (min … max): 65.745 s … 70.246 s 3 runs Benchmark 2: HEAD Time (mean ± σ): 60.259 s ± 0.343 s [User: 61.019 s, System: 7.245 s] Range (min … max): 60.003 s … 60.649 s 3 runs Summary 'HEAD' ran 1.13 ± 0.04 times faster than 'HEAD~' Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01refs: add ability for backends to special-case reading of symbolic refsPatrick Steinhardt1-0/+1
Reading of symbolic and non-symbolic references is currently treated the same in reference backends: we always call `refs_read_raw_ref()` and then decide based on the returned flags what type it is. This has one downside though: symbolic references may be treated different from normal references in a backend from normal references. The packed-refs backend for example doesn't even know about symbolic references, and as a result it is pointless to even ask it for one. There are cases where we really only care about whether a reference is symbolic or not, but don't care about whether it exists at all or may be a non-symbolic reference. But it is not possible to optimize for this case right now, and as a consequence we will always first check for a loose reference to exist, and if it doesn't, we'll query the packed-refs backend for a known-to-not-be-symbolic reference. This is inefficient and requires us to search all packed references even though we know to not care for the result at all. Introduce a new function `refs_read_symbolic_ref()` which allows us to fix this case. This function will only ever return symbolic references and can thus optimize for the scenario layed out above. By default, if the backend doesn't provide an implementation for it, we just use the old code path and fall back to `read_raw_ref()`. But in case the backend provides its own, more efficient implementation, we will use that one instead. Note that this function is explicitly designed to not distinguish between missing references and non-symbolic references. If it did, we'd be forced to always search the packed-refs backend to see whether the symbolic reference the user asked for really doesn't exist, or if it exists as a non-symbolic reference. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-18Merge branch 'ps/avoid-unnecessary-hook-invocation-with-packed-refs'Junio C Hamano1-7/+19
Because a deletion of ref would need to remove it from both the loose ref store and the packed ref store, a delete-ref operation that logically removes one ref may end up invoking ref-transaction hook twice, which has been corrected. * ps/avoid-unnecessary-hook-invocation-with-packed-refs: refs: skip hooks when deleting uncovered packed refs refs: do not execute reference-transaction hook on packing refs refs: demonstrate excessive execution of the reference-transaction hook refs: allow skipping the reference-transaction hook refs: allow passing flags when beginning transactions refs: extract packed_refs_delete_refs() to allow control of transaction
2022-01-26refs API: remove "failure_errno" from refs_resolve_ref_unsafe()Ævar Arnfjörð Bjarmason1-22/+9
Remove the now-unused "failure_errno" parameter from the refs_resolve_ref_unsafe() signature. In my recent 96f6623ada0 (Merge branch 'ab/refs-errno-cleanup', 2021-11-29) series we made all of its callers explicitly request the errno via an output parameter. As that series shows all but one caller ended up passing in a boilerplate "ignore_errno", since they only cared about whether the return value was NULL or not, i.e. if the ref could be resolved. There was one small issue with that series fixed with a follow-up in 31e39123695 (Merge branch 'ab/refs-errno-cleanup', 2022-01-14) a small bug in that series was fixed. After those two there was one caller left in sequencer.c that used the "failure_errno', but as of the preceding commit it uses a boilerplate "ignore_errno" instead. This leaves the public refs API without any use of "failure_errno" at all. We could still do with a bit of cleanup and generalization between refs.c and refs/files-backend.c before the "reftable" integration lands, but that's all internal to the reference code itself. So let's remove this output parameter. Not only isn't it used now, but it's unlikely that we'll want it again in the future. We'd like to slowly move the refs API to a more file-backend independent way of communicating error codes, having it use a "failure_errno" was only the first step in that direction. If this or any other function needs to communicate what specifically is wrong with the requested "refname" it'll be better to have the function set some output enum of well-defined error states than piggy-backend on "errno". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-17refs: skip hooks when deleting uncovered packed refsPatrick Steinhardt1-3/+6
When deleting refs from the loose-files refs backend, then we need to be careful to also delete the same ref from the packed refs backend, if it exists. If we don't, then deleting the loose ref would "uncover" the packed ref. We thus always have to queue up deletions of refs for both the loose and the packed refs backend. This is done in two separate transactions, where the end result is that the reference-transaction hook is executed twice for the deleted refs. This behaviour is quite misleading: it's exposing implementation details of how the files backend works to the user, in contrast to the logical updates that we'd really want to expose via the hook. Worse yet, whether the hook gets executed once or twice depends on how well-packed the repository is: if the ref only exists as a loose ref, then we execute it once, otherwise if it is also packed then we execute it twice. Fix this behaviour and don't execute the reference-transaction hook at all when refs in the packed-refs backend if it's driven by the files backend. This works as expected even in case the refs to be deleted only exist in the packed-refs backend because the loose-backend always queues refs in its own transaction even if they don't exist such that they can be locked for concurrent creation. And it also does the right thing in case neither of the backends has the ref because that would cause the transaction to fail completely. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-17refs: do not execute reference-transaction hook on packing refsPatrick Steinhardt1-2/+4
The reference-transaction hook is supposed to track logical changes to references, but it currently also gets executed when packing refs in a repository. This is unexpected and ultimately not all that useful: packing refs is not supposed to result in any user-visible change to the refs' state, and it ultimately is an implementation detail of how refs stores work. Fix this excessive execution of the hook when packing refs. Reported-by: Waleed Khan <me@waleedkhan.name> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-17refs: allow passing flags when beginning transactionsPatrick Steinhardt1-5/+5
We do not currently have any flags when creating reference transactions, but we'll add one to disable execution of the reference transaction hook in some cases. Allow passing flags to `ref_store_transaction_begin()` to prepare for this change. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-17refs: extract packed_refs_delete_refs() to allow control of transactionPatrick Steinhardt1-3/+10
When deleting loose refs, then we also have to delete the refs in the packed backend. This is done by calling `refs_delete_refs()`, which then uses the packed-backend's logic to delete refs. This doesn't allow us to exercise any control over the reference transaction which is being created in the packed backend, which is required in a subsequent commit. Extract a new function `packed_refs_delete_refs()`, which hosts most of the logic to delete refs except for creating the transaction itself. Like this, we can easily create the transaction in the files backend and thus exert more control over it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-14Merge branch 'ab/refs-errno-cleanup'Junio C Hamano1-2/+1
A brown-paper-bag fix on top of a topic that was merged during this cycle. * ab/refs-errno-cleanup: refs API: use "failure_errno", not "errno"