| Age | Commit message (Collapse) | Author | Files | Lines |
|
This shrinks the top-level directory a bit, and makes it much more
pleasant to use auto-completion on the thing. Instead of
[torvalds@nehalem git]$ em buil<tab>
Display all 180 possibilities? (y or n)
[torvalds@nehalem git]$ em builtin-sh
builtin-shortlog.c builtin-show-branch.c builtin-show-ref.c
builtin-shortlog.o builtin-show-branch.o builtin-show-ref.o
[torvalds@nehalem git]$ em builtin-shor<tab>
builtin-shortlog.c builtin-shortlog.o
[torvalds@nehalem git]$ em builtin-shortlog.c
you get
[torvalds@nehalem git]$ em buil<tab> [type]
builtin/ builtin.h
[torvalds@nehalem git]$ em builtin [auto-completes to]
[torvalds@nehalem git]$ em builtin/sh<tab> [type]
shortlog.c shortlog.o show-branch.c show-branch.o show-ref.c show-ref.o
[torvalds@nehalem git]$ em builtin/sho [auto-completes to]
[torvalds@nehalem git]$ em builtin/shor<tab> [type]
shortlog.c shortlog.o
[torvalds@nehalem git]$ em builtin/shortlog.c
which doesn't seem all that different, but not having that annoying
break in "Display all 180 possibilities?" is quite a relief.
NOTE! If you do this in a clean tree (no object files etc), or using an
editor that has auto-completion rules that ignores '*.o' files, you
won't see that annoying 'Display all 180 possibilities?' message - it
will just show the choices instead. I think bash has some cut-off
around 100 choices or something.
So the reason I see this is that I'm using an odd editory, and thus
don't have the rules to cut down on auto-completion. But you can
simulate that by using 'ls' instead, or something similar.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This reverts most of commit a2430dde8ceaaaabf05937438249397b883ca77a.
That commit made the situation better for repositories with relatively
small number of objects. However with many objects and a small pack size
limit, the time required to complete the repack tends towards O(n^2),
or even much worse with long delta chains.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The value passed to --max-pack-size used to count in MiB which was
inconsistent with the corresponding configuration variable as well as
other command arguments which are defined to count in bytes with an
optional unit suffix. This brings --max-pack-size in line with the
rest of Git.
Also, in order not to cause havoc with people used to the previous
megabyte scale, and because this is a sane thing to do anyway, a
minimum size of 1 MiB is enforced to avoid an explosion of pack files.
Adjust and extend test suite accordingly.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Current handling of pack_size_limit is quite suboptimal. Let's consider
a list of objects to pack which contain alternatively big and small
objects (which pretty matches reality when big blobs are interlaced
with tree objects). Currently, the code simply close the pack and opens
a new one when the next object in line breaks the size limit.
The current code may degenerate into:
- small tree object => store into pack #1
- big blob object busting the pack size limit => store into pack #2
- small blob but pack #2 is over the limit already => pack #3
- big blob busting the size limit => pack #4
- small tree but pack #4 is over the limit => pack #5
- big blob => pack #6
- small tree => pack #7
- ... and so on.
The reality is that the content of packs 1, 3, 5 and 7 could well be
stored more efficiently (and delta compressed) together in pack #1 if
the big blobs were not forcing an immediate transition to a new pack.
Incidentally this can be fixed pretty easily by simply skipping over
those objects that are too big to fit in the current pack while trying
the whole list of unwritten objects, and then that list considered from
the beginning again when a new pack is opened. This creates much fewer
smallish pack files and help making more predictable test cases for the
test suite.
This change made one of the self sanity checks useless so it is removed
as well. That check was rather redundant already anyway.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When the first piece of threaded code was introduced in commit 8ecce684, it
came with its own THREADED_DELTA_SEARCH Makefile option. Since this time,
more threaded code has come into the codebase and a NO_PTHREADS option has
also been added. Get rid of the original option as the newer, more generic
option covers everything we need.
Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This required some fairly trivial packfile function 'const' cleanup,
since the builtin commands get a const char *argv[] array.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jc/conflict-marker-size:
rerere: honor conflict-marker-size attribute
rerere: prepare for customizable conflict marker length
conflict-marker-size: new attribute
rerere: use ll_merge() instead of using xdl_merge()
merge-tree: use ll_merge() not xdl_merge()
xdl_merge(): allow passing down marker_size in xmparam_t
xdl_merge(): introduce xmparam_t for merge specific parameters
git_attr(): fix function signature
Conflicts:
builtin-merge-file.c
ll-merge.c
xdiff/xdiff.h
xdiff/xmerge.c
|
|
The function took (name, namelen) as its arguments, but all the public
callers wanted to pass a full string.
Demote the counted-string interface to an internal API status, and allow
public callers to just pass the string to the function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This patch implements native to Windows subset of pthreads API used by Git.
It allows to remove Pthreads for Win32 dependency for MSVC, msysgit and
Cygwin.
[J6t: If the MinGW build was built as part of the msysgit build
environment, then threading was already enabled because the
pthreads-win32 package is available in msysgit. With this patch, we can now
enable threaded code unconditionally.]
Signed-off-by: Andrzej K. Haczewski <ahaczewski@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* maint:
pack-objects: split implications of --all-progress from progress activation
instaweb: restart server if already running
prune-packed: only show progress when stderr is a tty
Conflicts:
builtin-pack-objects.c
|
|
Currently the --all-progress flag is used to use force progress display
during the writing object phase even if output goes to stdout which is
primarily the case during a push operation. This has the unfortunate
side effect of forcing progress display even if stderr is not a
terminal.
Let's introduce the --all-progress-implied argument which has the same
intent except for actually forcing the activation of any progress
display. With this, progress display will be automatically inhibited
whenever stderr is not a terminal, or full progress display will be
included otherwise. This should let people use 'git push' within a cron
job without filling their logs with useless percentage displays.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Tested-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Let's keep thread stuff close together if possible. And in this case,
this even reduces the #ifdef noise, and allows for skipping the
autodetection altogether if delta search is not needed (like with a pure
clone).
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
These spaces immediately before the end of lines are unnecessary.
While at it, instead of using a single string literal with backslashes
at end of each line, split the lines into individual string literals
and tell the compiler to concatenate them.
Signed-off-by: Thiago Farina <tfransosi@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* np/maint-1.6.3-deepen:
pack-objects: free preferred base memory after usage
make shallow repository deepening more network efficient
|
|
When adding objects for preferred delta base, the content from tree
objects leading to given paths is kept in a cache. This has the
potential to grow significantly, especially with large directories as
the whole tree object content is loaded in memory, even if in practice
the number of those objects is limited to the 256 cache entries plus the
$window root tree objects. Still, that can't hurt freeing that up after
object enumeration is done, and before more memory is needed for delta
search.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This is one of only two places that we use C99 variable length array on
the stack, which some older compilers apparently are not happy with.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The majority of code in core git appears to use a single
space after if/for/while. This is an attempt to bring more
code to this standard. These are entirely cosmetic changes.
Signed-off-by: Brian Gianforcaro <b.gianfo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* cc/replace:
t6050: check pushing something based on a replaced commit
Documentation: add documentation for "git replace"
Add git-replace to .gitignore
builtin-replace: use "usage_msg_opt" to give better error messages
parse-options: add new function "usage_msg_opt"
builtin-replace: teach "git replace" to actually replace
Add new "git replace" command
environment: add global variable to disable replacement
mktag: call "check_sha1_signature" with the replacement sha1
replace_object: add a test case
object: call "check_sha1_signature" with the replacement sha1
sha1_file: add a "read_sha1_file_repl" function
replace_object: add mechanism to replace objects found in "refs/replace/"
refs: add a "for_each_replace_ref" function
|
|
I have 4GB of RAM on my system which should, in theory, be quite enough
to repack a 600 MB repository. However the unbounded delta cache size
always pushes it into swap, at which point everything virtually comes to
a halt. So unbounded caches are never a good idea.
A default of 256MB should be a good compromize between memory usage and
speed where medium sized repositories are still likely to fit in the
cache with a reasonable memory usage, and larger repositories are going
to take quite some time to repack already anyway.
While at it, clarify the associated config variable documentation
entries a bit.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* js/maint-graft-unhide-true-parents:
git repack: keep commits hidden by a graft
Add a test showing that 'git repack' throws away grafted-away parents
Conflicts:
git-repack.sh
|
|
When you have grafts that pretend that a given commit has different
parents than the ones recorded in the commit object, it is dangerous
to let 'git repack' remove those hidden parents, as you can easily
remove the graft and end up with a broken repository.
So let's play it safe and keep those parent objects and everything
that is reachable by them, in addition to the grafted parents.
As this behavior can only be triggered by git pack-objects, and as that
command handles duplicate parents gracefully, we do not bother to cull
duplicated parents that may result by using both true and grafted
parents.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* tr/die_errno:
Use die_errno() instead of die() when checking syscalls
Convert existing die(..., strerror(errno)) to die_errno()
die_errno(): double % in strerror() output just in case
Introduce die_errno() that appends strerror(errno) to die()
|
|
Change calls to die(..., strerror(errno)) to use the new die_errno().
In the process, also make slight style adjustments: at least state
_something_ about the function that failed (instead of just printing
the pathname), and put paths in single quotes.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Shifting 'unsigned char' or 'unsigned short' left can result in sign
extension errors, since the C integer promotion rules means that the
unsigned char/short will get implicitly promoted to a signed 'int' due to
the shift (or due to other operations).
This normally doesn't matter, but if you shift things up sufficiently, it
will now set the sign bit in 'int', and a subsequent cast to a bigger type
(eg 'long' or 'unsigned long') will now sign-extend the value despite the
original expression being unsigned.
One example of this would be something like
unsigned long size;
unsigned char c;
size += c << 24;
where despite all the variables being unsigned, 'c << 24' ends up being a
signed entity, and will get sign-extended when then doing the addition in
an 'unsigned long' type.
Since git uses 'unsigned char' pointers extensively, we actually have this
bug in a couple of places.
I may have missed some, but this is the result of looking at
git grep '[^0-9 ][ ]*<<[ ][a-z]' -- '*.c' '*.h'
git grep '<<[ ]*24'
which catches at least the common byte cases (shifting variables by a
variable amount, and shifting by 24 bits).
I also grepped for just 'unsigned char' variables in general, and
converted the ones that most obviously ended up getting implicitly cast
immediately anyway (eg hash_name(), encode_85()).
In addition to just avoiding 'unsigned char', this patch also tries to use
a common idiom for the delta header size thing. We had three different
variations on it: "& 0x7fUL" in one place (getting the sign extension
right), and "& ~0x80" and "& 0x7f" in two other places (not getting it
right). Apart from making them all just avoid using "unsigned char" at
all, I also unified them to then use a simple "& 0x7f".
I considered making a sparse extension which warns about doing implicit
casts from unsigned types to signed types, but it gets rather complex very
quickly, so this is just a hack.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This new "read_replace_refs" global variable is set to 1 by
default, so that replace refs are used by default. But
reachability traversal and packing commands ("cmd_fsck",
"cmd_prune", "cmd_pack_objects", "upload_pack",
"cmd_unpack_objects") set it to 0, as they must work with the
original DAG.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Mike Ralphson <mike@abacus.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* lt/pack-object-memuse:
show_object(): push path_name() call further down
process_{tree,blob}: show objects without buffering
Conflicts:
builtin-pack-objects.c
builtin-rev-list.c
list-objects.c
list-objects.h
upload-pack.c
|
|
In particular, pushing the "path_name()" call _into_ the show() function
would seem to allow
- more clarity into who "owns" the name (ie now when we free the name in
the show_object callback, it's because we generated it ourselves by
calling path_name())
- not calling path_name() at all, either because we don't care about the
name in the first place, or because we are actually happy walking the
linked list of "struct name_path *" and the last component.
Now, I didn't do that latter optimization, because it would require some
more coding, but especially looking at "builtin-pack-objects.c", we really
don't even want the whole pathname, we really would be better off with the
list of path components.
Why? We use that name for two things:
- add_preferred_base_object(), which actually _wants_ to traverse the
path, and now does it by looking for '/' characters!
- for 'name_hash()', which only cares about the last 16 characters of a
name, so again, generating the full name seems to be just unnecessary
work.
Anyway, so I didn't look any closer at those things, but it did convince
me that the "show_object()" calling convention was crazy, and we're
actually better off doing _less_ in list-objects.c, and giving people
access to the internal data structures so that they can decide whether
they want to generate a path-name or not.
This patch does that, and then for people who did use the name (even if
they might do something more clever in the future), it just does the
straightforward "name = path_name(path, component); .. free(name);" thing.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Here's a less trivial thing, and slightly more dubious one.
I was looking at that "struct object_array objects", and wondering why we
do that. I have honestly totally forgotten. Why not just call the "show()"
function as we encounter the objects? Rather than add the objects to the
object_array, and then at the very end going through the array and doing a
'show' on all, just do things more incrementally.
Now, there are possible downsides to this:
- the "buffer using object_array" _can_ in theory result in at least
better I-cache usage (two tight loops rather than one more spread out
one). I don't think this is a real issue, but in theory..
- this _does_ change the order of the objects printed. Instead of doing a
"process_tree(revs, commit->tree, &objects, NULL, "");" in the loop
over the commits (which puts all the root trees _first_ in the object
list, this patch just adds them to the list of pending objects, and
then we'll traverse them in that order (and thus show each root tree
object together with the objects we discover under it)
I _think_ the new ordering actually makes more sense, but the object
ordering is actually a subtle thing when it comes to packing
efficiency, so any change in order is going to have implications for
packing. Good or bad, I dunno.
- There may be some reason why we did it that odd way with the object
array, that I have simply forgotten.
Anyway, now that we don't buffer up the objects before showing them
that may actually result in lower memory usage during that whole
traverse_commit_list() phase.
This is seriously not very deeply tested. It makes sense to me, it seems
to pass all the tests, it looks ok, but...
Does anybody remember why we did that "object_array" thing? It used to be
an "object_list" a long long time ago, but got changed into the array due
to better memory usage patterns (those linked lists of obejcts are
horrible from a memory allocation standpoint). But I wonder why we didn't
do this back then. Maybe there's a reason for it.
Or maybe there _used_ to be a reason, and no longer is.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* cc/bisect-filter: (21 commits)
rev-list: add "int bisect_show_flags" in "struct rev_list_info"
rev-list: remove last static vars used in "show_commit"
list-objects: add "void *data" parameter to show functions
bisect--helper: string output variables together with "&&"
rev-list: pass "int flags" as last argument of "show_bisect_vars"
t6030: test bisecting with paths
bisect: use "bisect--helper" and remove "filter_skipped" function
bisect: implement "read_bisect_paths" to read paths in "$GIT_DIR/BISECT_NAMES"
bisect--helper: implement "git bisect--helper"
bisect: use the new generic "sha1_pos" function to lookup sha1
rev-list: call new "filter_skip" function
patch-ids: use the new generic "sha1_pos" function to lookup sha1
sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1
rev-list: pass "revs" to "show_bisect_vars"
rev-list: make "show_bisect_vars" non static
rev-list: move code to show bisect vars into its own function
rev-list: move bisect related code into its own file
rev-list: make "bisect_list" variable local to "cmd_rev_list"
refs: add "for_each_ref_in" function to refactor "for_each_*_ref" functions
quote: add "sq_dequote_to_argv" to put unwrapped args in an argv array
...
|
|
* maint:
GIT 1.6.2.3
State the effect of filter-branch on graft explicitly
process_{tree,blob}: Remove useless xstrdup calls
Conflicts:
GIT-VERSION-GEN
|
|
* maint-1.6.1:
State the effect of filter-branch on graft explicitly
process_{tree,blob}: Remove useless xstrdup calls
|
|
* maint-1.6.0:
State the effect of filter-branch on graft explicitly
process_{tree,blob}: Remove useless xstrdup calls
|
|
On Wed, 8 Apr 2009, Björn Steinbrink wrote:
>
> The name of the processed object was duplicated for passing it to
> add_object(), but that already calls path_name, which allocates a new
> string anyway. So the memory allocated by the xstrdup calls just went
> nowhere, leaking memory.
Ack, ack.
There's another easy 5% or so for the built-in object walker: once we've
created the hash from the name, the name isn't interesting any more, and
so something trivial like this can help a bit.
Does it matter? Probably not on its own. But a few more memory saving
tricks and it might all make a difference.
Linus
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the case of a small repository, pack-objects is smart enough to not
start more threads than necessary. However, the output to the user always
reports the value of the pack.threads configuration and not the real
number of threads to be used.
Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jc/maint-1.6.0-keep-pack:
pack-objects: don't loosen objects available in alternate or kept packs
t7700: demonstrate repack flaw which may loosen objects unnecessarily
Remove --kept-pack-only option and associated infrastructure
pack-objects: only repack or loosen objects residing in "local" packs
git-repack.sh: don't use --kept-pack-only option to pack-objects
t7700-repack: add two new tests demonstrating repacking flaws
is_kept_pack(): final clean-up
Simplify is_kept_pack()
Consolidate ignore_packed logic more
has_sha1_kept_pack(): take "struct rev_info"
has_sha1_pack(): refactor "pretend these packs do not exist" interface
git-repack: resist stray environment variable
Conflicts:
t/t7700-repack.sh
|
|
The goal of this patch is to get rid of the "static struct rev_info
revs" static variable in "builtin-rev-list.c".
To do that, we need to pass the revs to the "show_commit" function
in "builtin-rev-list.c" and this in turn means that the
"traverse_commit_list" function in "list-objects.c" must be passed
functions pointers to functions with 2 parameters instead of one.
So we have to change all the callers and all the functions passed
to "traverse_commit_list".
Anyway this makes the code more clean and more generic, so it
should be a good thing in the long run.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jc/maint-1.6.0-keep-pack:
pack-objects: don't loosen objects available in alternate or kept packs
t7700: demonstrate repack flaw which may loosen objects unnecessarily
Remove --kept-pack-only option and associated infrastructure
pack-objects: only repack or loosen objects residing in "local" packs
git-repack.sh: don't use --kept-pack-only option to pack-objects
t7700-repack: add two new tests demonstrating repacking flaws
Conflicts:
t/t7700-repack.sh
|
|
* maint:
Increase the size of the die/warning buffer to avoid truncation
close_sha1_file(): make it easier to diagnose errors
avoid possible overflow in delta size filtering computation
|
|
* maint-1.6.1:
close_sha1_file(): make it easier to diagnose errors
avoid possible overflow in delta size filtering computation
|
|
* maint-1.6.0:
close_sha1_file(): make it easier to diagnose errors
avoid possible overflow in delta size filtering computation
|
|
On a 32-bit system, the maximum possible size for an object is less than
4GB, while 64-bit systems may cope with larger objects. Due to this
limitation, variables holding object sizes are using an unsigned long
type (32 bits on 32-bit systems, or 64 bits on 64-bit systems).
When large objects are encountered, and/or people play with large delta
depth values, it is possible for the maximum allowed delta size
computation to overflow, especially on a 32-bit system. When this
occurs, surviving result bits may represent a value much smaller than
what it is supposed to be, or even zero. This prevents some objects
from being deltified although they do get deltified when a smaller depth
limit is used. Fix this by always performing a 64-bit multiplication.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jc/maint-1.6.0-pack-directory:
Fix odb_mkstemp() on AIX
Make sure objects/pack exists before creating a new pack
Conflicts:
wrapper.c
|
|
If pack-objects is called with the --unpack-unreachable option then it
will unpack (i.e. loosen) all unreferenced objects from local not-kept
packs, including those that also exist in packs residing in an alternate
object database or a locally kept pack. The only user of this option is
git-repack.
In this case, repack will follow the call to pack-objects with a call to
prune-packed, which will delete these newly loosened objects, making the
act of loosening a waste of time. The unnecessary loosening can be
avoided by checking whether an object exists in a non-local pack or a
locally kept pack before loosening it.
This fixes the 'local packed unreachable obs that exist in alternate ODB
are not loosened' test in t7700.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This option to pack-objects/rev-list was created to improve the -A and -a
options of repack. It was found to be lacking in that it did not provide
the ability to differentiate between local and non-local kept packs, and
found to be unnecessary since objects residing in local kept packs can be
filtered out by the --honor-pack-keep option.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
These two features were invented for use by repack when repack will delete
the local packs that have been made redundant. The packs accessible
through alternates are not deleted by repack, so the objects contained in
them are still accessible after the local packs are deleted. They do not
need to be repacked into the new pack or loosened. For the case of
loosening they would immediately be deleted by the subsequent prune-packed
that is called by repack anyway.
This fixes the test
'packed unreachable obs in alternate ODB are not loosened' in t7700.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jc/maint-1.6.0-keep-pack:
is_kept_pack(): final clean-up
Simplify is_kept_pack()
Consolidate ignore_packed logic more
has_sha1_kept_pack(): take "struct rev_info"
has_sha1_pack(): refactor "pretend these packs do not exist" interface
git-repack: resist stray environment variable
|
|
Now is_kept_pack() is just a member lookup into a structure, we can write
it as such.
Also rewrite the sole caller of has_sha1_kept_pack() to switch on the
criteria the callee uses (namely, revs->kept_pack_only) between calling
has_sha1_kept_pack() and has_sha1_pack(), so that these two callees do not
have to take a pointer to struct rev_info as an argument.
This removes the header file dependency issue temporarily introduced by
the earlier commit, so we revert changes associated to that as well.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This removes --unpacked=<packfile> parameter from the revision parser, and
rewrites its use in git-repack to pass a single --kept-pack-only option
instead.
The new --kept-pack-only option means just that. When this option is
given, is_kept_pack() that used to say "not on the --unpacked=<packfile>
list" now says "the packfile has corresponding .keep file".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This refactors three loops that check if a given packfile is on the
ignore_packed list into a function is_kept_pack(). The function returns
false for a pack on the list, and true for a pack not on the list, because
this list is solely used by "git repack" to pass list of packfiles that do
not have corresponding .keep files, i.e. a packfile not on the list is
"kept".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jc/maint-1.6.0-pack-directory:
Make sure objects/pack exists before creating a new pack
|
|
In a repository created with git older than f49fb35 (git-init-db: create
"pack" subdirectory under objects, 2005-06-27), objects/pack/ directory is
not created upon initialization. It was Ok because subdirectories are
created as needed inside directories init-db creates, and back then,
packfiles were recent invention.
After the said commit, new codepaths started relying on the presense of
objects/pack/ directory in the repository. This was exacerbated with
8b4eb6b (Do not perform cross-directory renames when creating packs,
2008-09-22) that moved the location temporary pack files are created from
objects/ directory to objects/pack/ directory, because moving temporary to
the final location was done carefully with lazy leading directory creation.
Many packfile related operations in such an old repository can fail
mysteriously because of this.
This commit introduces two helper functions to make things work better.
- odb_mkstemp() is a specialized version of mkstemp() to refactor the
code and teach it to create leading directories as needed;
- odb_pack_keep() refactors the code to create a ".keep" file while
create leading directories as needed.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* lt/maint-wrap-zlib:
Wrap inflate and other zlib routines for better error reporting
Conflicts:
http-push.c
http-walker.c
sha1_file.c
|
|
* lt/maint-wrap-zlib:
Wrap inflate and other zlib routines for better error reporting
Conflicts:
http-push.c
http-walker.c
sha1_file.c
|
|
R. Tyler Ballance reported a mysterious transient repository corruption;
after much digging, it turns out that we were not catching and reporting
memory allocation errors from some calls we make to zlib.
This one _just_ wraps things; it doesn't do the "retry on low memory
error" part, at least not yet. It is an independent issue from the
reporting. Some of the errors are expected and passed back to the caller,
but we die when zlib reports it failed to allocate memory for now.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If there are few objects to deltify, they might be split amongst threads
so that there is simply no other objects left to delta against within
the same thread. Let's use the same 2*window treshold as used for the
final load balancing to allow extra threads to be created.
This fixes the benign t5300 test failure.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Tested-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
... and display the actual number of threads used when locally
repacking. A remote server still won't tell you how many threads it
uses during a fetch though.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* maint:
fsck: reduce stack footprint
make sure packs to be replaced are closed beforehand
|
|
Especially on Windows where an opened file cannot be replaced, make
sure pack-objects always close packs it is about to replace. Even on
non Windows systems, this could save potential bad results if ever
objects were to be read from the new pack file using offset from the old
index.
This should fix t5303 on Windows.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Tested-by: Johannes Sixt <j6t@kdbg.org> (MinGW)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* bc/maint-keep-pack:
repack: only unpack-unreachable if we are deleting redundant packs
t7700: test that 'repack -a' packs alternate packed objects
pack-objects: extend --local to mean ignore non-local loose objects too
sha1_file.c: split has_loose_object() into local and non-local counterparts
t7700: demonstrate mishandling of loose objects in an alternate ODB
builtin-gc.c: use new pack_keep bitfield to detect .keep file existence
repack: do not fall back to incremental repacking with [-a|-A]
repack: don't repack local objects in packs with .keep file
pack-objects: new option --honor-pack-keep
packed_git: convert pack_local flag into a bitfield and add pack_keep
t7700: demonstrate mishandling of objects in packs with a .keep file
|
|
* np/pack-safer:
t5303: fix printf format string for portability
t5303: work around printf breakage in dash
pack-objects: don't leak pack window reference when splitting packs
extend test coverage for latest pack corruption resilience improvements
pack-objects: allow "fixing" a corrupted pack without a full repack
make find_pack_revindex() aware of the nasty world
make check_object() resilient to pack corruptions
make packed_object_info() resilient to pack corruptions
make unpack_object_header() non fatal
better validation on delta base object offsets
close another possibility for propagating pack corruption
|
|
* bc/maint-keep-pack:
t7700: test that 'repack -a' packs alternate packed objects
pack-objects: extend --local to mean ignore non-local loose objects too
sha1_file.c: split has_loose_object() into local and non-local counterparts
t7700: demonstrate mishandling of loose objects in an alternate ODB
builtin-gc.c: use new pack_keep bitfield to detect .keep file existence
repack: do not fall back to incremental repacking with [-a|-A]
repack: don't repack local objects in packs with .keep file
pack-objects: new option --honor-pack-keep
packed_git: convert pack_local flag into a bitfield and add pack_keep
t7700: demonstrate mishandling of objects in packs with a .keep file
|
|
* maint:
Start 1.6.0.5 cycle
Fix pack.packSizeLimit and --max-pack-size handling
checkout: Fix "initial checkout" detection
Remove the period after the git-check-attr summary
Conflicts:
RelNotes
|
|
If the limit was sufficiently low, having a single object written
could bust the limit (by design), but caused the remaining allowed
size to go negative for subsequent objects, which for an unsigned
variable is a rather huge limit.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
With this patch, --local means pack only local objects that are not already
packed.
Additionally, this fixes t7700 testing whether loose objects in an alternate
object database are repacked.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This adds a new option to pack-objects which will cause it to ignore an
object which appears in a local pack which has a .keep file, even if it
was specified for packing.
This option will be used by the porcelain repack.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When the pack data to be reused is found to be bad, let's fall back to
full object access through the generic path which has its own strategies
to find alternate object sources in that case. This allows for "fixing"
a corrupted pack simply by copying either another pack containing the
object(s) found to be bad, or the loose object itself, into the object
store and launch a repack without the need for -f.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It currently calls die() whenever given offset is not found thinking
that such thing should never happen. But this offset may come from a
corrupted pack whych _could_ happen and not be found. Callers should
deal with this possibility gracefully instead.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The check_object() function tries to get away with the least amount of
pack access possible when it already has partial information on given
object rather than calling the more costly packed_object_info().
When things don't look right, it should just give up and fall back to
packed_object_info() directly instead of die()'ing.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It is possible to have pack corruption in the object header. Currently
unpack_object_header() simply die() on them instead of letting the caller
deal with that gracefully.
So let's have unpack_object_header() return an error instead, and find
a better name for unpack_object_header_gently() in that context. All
callers of unpack_object_header() are ready for it.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In one case, it was possible to have a bad offset equal to 0 effectively
pointing a delta onto itself and crashing git after too many recursions.
In the other cases, a negative offset could result due to off_t being
signed. Catch those.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Abstract
--------
With index v2 we have a per object CRC to allow quick and safe reuse of
pack data when repacking. This, however, doesn't currently prevent a
stealth corruption from being propagated into a new pack when _not_
reusing pack data as demonstrated by the modification to t5302 included
here.
The Context
-----------
The Git database is all checksummed with SHA1 hashes. Any kind of
corruption can be confirmed by verifying this per object hash against
corresponding data. However this can be costly to perform systematically
and therefore this check is often not performed at run time when
accessing the object database.
First, the loose object format is entirely compressed with zlib which
already provide a CRC verification of its own when inflating data. Any
disk corruption would be caught already in this case.
Then, packed objects are also compressed with zlib but only for their
actual payload. The object headers and delta base references are not
deflated for obvious performance reasons, however this leave them
vulnerable to potentially undetected disk corruptions. Object types
are often validated against the expected type when they're requested,
and deflated size must always match the size recorded in the object header,
so those cases are pretty much covered as well.
Where corruptions could go unnoticed is in the delta base reference.
Of course, in the OBJ_REF_DELTA case, the odds for a SHA1 reference to
get corrupted so it actually matches the SHA1 of another object with the
same size (the delta header stores the expected size of the base object
to apply against) are virtually zero. In the OBJ_OFS_DELTA case, the
reference is a pack offset which would have to match the start boundary
of a different base object but still with the same size, and although this
is relatively much more "probable" than in the OBJ_REF_DELTA case, the
probability is also about zero in absolute terms. Still, the possibility
exists as demonstrated in t5302 and is certainly greater than a SHA1
collision, especially in the OBJ_OFS_DELTA case which is now the default
when repacking.
Again, repacking by reusing existing pack data is OK since the per object
CRC provided by index v2 guards against any such corruptions. What t5302
failed to test is a full repack in such case.
The Solution
------------
As unlikely as this kind of stealth corruption can be in practice, it
certainly isn't acceptable to propagate it into a freshly created pack.
But, because this is so unlikely, we don't want to pay the run time cost
associated with extra validation checks all the time either. Furthermore,
consequences of such corruption in anything but repacking should be rather
visible, and even if it could be quite unpleasant, it still has far less
severe consequences than actively creating bad packs.
So the best compromize is to check packed object CRC when unpacking
objects, and only during the compression/writing phase of a repack, and
only when not streaming the result. The cost of this is minimal (less
than 1% CPU time), and visible only with a full repack.
Someone with a stats background could provide an objective evaluation of
this, but I suspect that it's bad RAM that has more potential for data
corruptions at this point, even in those cases where this extra check
is not performed. Still, it is best to prevent a known hole for
corruption when recreating object data into a new pack.
What about the streamed pack case? Well, any client receiving a pack
must always consider that pack as untrusty and perform full validation
anyway, hence no such stealth corruption could be propagated to remote
repositoryes already. It is therefore worthless doing local validation
in that case.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* maint:
Start 1.6.0.4 cycle
add instructions on how to send patches to the mailing list with Gmail
Documentation/gitattributes: Add subsection header for each attribute
git send-email: avoid leaking directory file descriptors.
send-pack: do not send out single-level refs such as refs/stash
fix overlapping memcpy in normalize_absolute_path
pack-objects: avoid reading uninitalized data
correct cache_entry allocation
Conflicts:
RelNotes
|
|
In the main loop of find_deltas, we do:
struct object_entry *entry = *list++;
...
if (!*list_size)
...
break
Because we look at and increment *list _before_ the check of
list_size, in the very last iteration of the loop we will
look at uninitialized data, and increment the pointer beyond
one past the end of the allocated space. Since we don't
actually do anything with the data until after the check,
this is not a problem in practice.
But since it technically violates the C standard, and
because it provokes a spurious valgrind warning, let's just
move the initialization of entry to a safe place.
This fixes valgrind errors in t5300, t5301, t5302, t303, and
t9400.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Many call sites immediately initialize allocated memory with zero after
calling xmalloc. A single call to xcalloc can replace this two-call
sequence.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
* maint:
builtin-prune.c: prune temporary packs in <object_dir>/pack directory
Do not perform cross-directory renames when creating packs
|
|
A comment on top of create_tmpfile() describes caveats ('can have
problems on various systems (FAT, NFS, Coda)') that should apply
in this situation as well. This in the end did not end up solving
any of my personal problems, but it might be a useful cleanup patch
nevertheless.
Signed-off-by: Petr Baudis <pasky@suse.cz>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* np/pack:
t5300: improve SHA1 collision test
pack-objects: don't include missing preferred base objects
sha1write: don't copy full sized buffers
Conflicts:
t/t5300-pack-object.sh
|
|
User notifications are presented as 'git cmd', and code comments
are presented as '"cmd"' or 'git's cmd', rather than 'git-cmd'.
Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* np/maint-safer-pack:
fixup_pack_header_footer(): use nicely aligned buffer sizes
index-pack: use fixup_pack_header_footer()'s validation mode
pack-objects: use fixup_pack_header_footer()'s validation mode
improve reliability of fixup_pack_header_footer()
pack-objects: improve returned information from write_one()
|
|
This improves commit 6d6f9cddbe a bit by simply not including missing
bases in the list of objects to process at all.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* np/maint-safer-pack:
fixup_pack_header_footer(): use nicely aligned buffer sizes
index-pack: use fixup_pack_header_footer()'s validation mode
pack-objects: use fixup_pack_header_footer()'s validation mode
improve reliability of fixup_pack_header_footer()
pack-objects: improve returned information from write_one()
|
|
* sp/missing-thin-base:
pack-objects: Allow missing base objects when creating thin packs
|
|
When limiting the pack size, a new header has to be written to the
pack and a new SHA1 computed. Make sure that the SHA1 of what is being
read back matches the SHA1 of what was written.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Currently, this function has the potential to read corrupted pack data
from disk and give it a valid SHA1 checksum. Let's add the ability to
validate SHA1 checksum of existing data along the way, including before
and after any arbitrary point in the pack.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This function returns 0 when the current object couldn't be written
due to the pack size limit, otherwise the current offset in the pack.
There is a problem with this approach however, since current object
could be a delta and its delta base might just have been written in
the same write_one() call, but those successfully written objects are
not accounted in the offset variable tracked by the caller. Currently
this is not an issue but a subsequent patch will need this.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The index-pack command, when processing a thin pack, fixed up the pack
after-the-fact. It forgets to fsync the result, because it only did that
in one path rather in all cases of fixup.
This moves the fsync_or_die() to the fix-up routine itself, rather than
doing it in one of the callers, so that all cases are covered.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If we are building a thin pack and one of the base objects we would
consider for deltification is missing its OK, the other side already
has that base object. We may be able to get a delta from another
object, or we can simply send the new object whole (no delta).
This change allows a shallow clone to store only the objects which
are unique to it, as well as the boundary commit and its trees, but
avoids storing the boundary blobs. This special form of a shallow
clone is able to represent just the difference between two trees.
Pack objects change suggested by Nicolas Pitre.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When printing valuds of type uint32_t, we should use PRIu32, and should
not assume that it is unsigned int. On 32-bit platforms, it could be
defined as unsigned long. The same caution applies to ntohl().
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
To do so, check_pack_crc() moved from builtin-pack-objects.c to
pack-check.c where it is more logical to share.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This makes life much easier for next patch, as well as being more efficient
when the revindex is actually not used.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since the pack-files are now always created stably on disk, there is no
need to sync() before pruning lose objects or old stale pack-files.
[jc: with Nico's clean-up]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This means that we can depend on packs always being stable on disk,
simplifying a lot of the object serialization worries. And unlike loose
objects, serializing pack creation IO isn't going to be a performance
killer.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* js/config-cb:
Provide git_config with a callback-data parameter
Conflicts:
builtin-add.c
builtin-cat-file.c
|
|
* bc/repack:
Documentation/git-repack.txt: document new -A behaviour
let pack-objects do the writing of unreachable objects as loose objects
add a force_object_loose() function
builtin-gc.c: deprecate --prune, it now really has no effect
git-gc: always use -A when manually repacking
repack: modify behavior of -A option to leave unreferenced objects unpacked
Conflicts:
builtin-pack-objects.c
|
|
git_config() only had a function parameter, but no callback data
parameter. This assumes that all callback functions only modify
global variables.
With this patch, every callback gets a void * parameter, and it is hoped
that this will help the libification effort.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Commit ccc1297226b184c40459e9d373cc9eebfb7bd898 changed the behavior
of 'git repack -A' so unreachable objects are stored as loose objects.
However it did so in a naive and inn efficient way by making packs
about to be deleted inaccessible and feeding their content through
'git unpack-objects'. While this works, there are major flaws with
this approach:
- It is unacceptably sloooooooooooooow.
In the Linux kernel repository with no actual unreachable objects,
doing 'git repack -A -d' before:
real 2m33.220s
user 2m21.675s
sys 0m3.510s
And with this change:
real 0m36.849s
user 0m24.365s
sys 0m1.950s
For reference, here's the timing for 'git repack -a -d':
real 0m35.816s
user 0m22.571s
sys 0m2.011s
This is explained by the fact that 'git unpack-objects' was used to
unpack _every_ objects even if (almost) 100% of them were thrown away.
- There is a black out period.
Between the removal of the .idx file for the redundant pack and the
completion of its unpacking, the unreachable objects become completely
unaccessible. This is not a big issue as we're talking about unreachable
objects, but some consistency is always good.
- There is no way to easily set a sensible mtime for the newly created
unreachable loose objects.
So, while having a command called "pack-objects" to perform object
unpacking looks really odd, this is probably the best compromize to be
able to solve the above issues in an efficient way.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The 'depth' variable doesn't reflect the actual maximum depth used
when other objects already depend on the current one.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When the delta data is cached in memory until it is written to a pack
file on disk, it is best to compress it right away in find_deltas() for
the following reasons:
- we have to compress that data anyway;
- this allows for caching more deltas with the same cache size limit;
- compression is potentially threaded.
This last point is especially relevant for SMP run time. For example,
repacking the Linux repo on a quad core processor using 4 threads with
all default settings produce the following results before this change:
real 2m27.929s
user 4m36.492s
sys 0m3.091s
And with this change applied:
real 2m13.787s
user 4m37.486s
sys 0m3.159s
So the actual execution time stayed more or less the same but the
wall clock time is shorter.
This is however not a good thing to do when generating a pack for
network transmission. In that case, the network is most likely to
throttle the data throughput, so it is best to make find_deltas()
faster in order to start writing data ASAP since we can afford
spending more time between writes to compress the data
at that point.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A later patch will make use of that code too.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
... for improved readability. No functional changes.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Parsing !no_reuse_delta everywhere makes my brain spend extra
cycles wondering each time.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Better encapsulate delta creation for writing.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Runtime pack access is done in the pack file mtime order since recent
packs are more likely to contain frequently used objects than old packs.
However the --max-pack-size option can produce multiple packs with mtime
in the reversed order as newer objects are always written first.
Let's modify mtime of later pack files (when any) so they appear older
than preceding ones when a repack creates multiple packs.
Signed-off-by: Nicolas Pitre <nico@cam.org>
|
|
The new option "--include-tag" allows the caller to request that
any annotated tag be included into the packfile if the object the tag
references was also included as part of the packfile.
This option can be useful on the server side of a native git transport,
where the server knows what commits it is including into a packfile to
update the client. If new annotated tags have been introduced then we
can also include them in the packfile, saving the client from needing
to request them through a second connection.
This change only introduces the backend option and provides a test.
Protocol extensions to make this useful in fetch-pack/upload-pack
are still necessary to activate the logic during transport.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* np/verify-pack:
add storage size output to 'git verify-pack -v'
fix unimplemented packed_object_info_detail() features
make verify_one_pack() a bit less wrong wrt packed_git structure
factorize revindex code out of builtin-pack-objects.c
Conflicts:
Makefile
|
|
* mk/maint-parse-careful:
receive-pack: use strict mode for unpacking objects
index-pack: introduce checking mode
unpack-objects: prevent writing of inconsistent objects
unpack-object: cache for non written objects
add common fsck error printing function
builtin-fsck: move common object checking code to fsck.c
builtin-fsck: reports missing parent commits
Remove unused object-ref code
builtin-fsck: move away from object-refs to fsck_walk
add generic, type aware object chain walker
Conflicts:
Makefile
builtin-fsck.c
|
|
No functional change. This is needed to fix verify-pack in a later patch.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jm/free:
Avoid unnecessary "if-before-free" tests.
Conflicts:
builtin-branch.c
|
|
packing"
This reverts commit 6c723f5e6bc579e06a904874f1ceeb8ff2b5a17c.
The additional message may be interesting for git developers,
but not useful for the end users, and clutters the output.
|
|
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Packing objects can be done in parallell nowadays, but it's
only done if the config option pack.threads is set to a value
above 1. Because of that, the code-path used is often not the
most optimal one.
This patch adds a routine to detect the number of online CPU's
at runtime (online_cpus()). When pack.threads (or --threads=) is
given a value of 0, the number of threads is set to the number of
online CPU's. This feature is also documented.
As per Nicolas Pitre's recommendations, the default is still to
run pack-objects single-threaded unless explicitly activated,
either by configuration or by command line parameter.
The routine online_cpus() is a rework of "numcpus.c", written by
one Philip Willoughby <pgw99@doc.ic.ac.uk>. numcpus.c is in the
public domain and can presently be downloaded from
http://csgsoft.doc.ic.ac.uk/numcpus/
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This change removes all obvious useless if-before-free tests.
E.g., it replaces code like this:
if (some_expression)
free (some_expression);
with the now-equivalent:
free (some_expression);
It is equivalent not just because POSIX has required free(NULL)
to work for a long time, but simply because it has worked for
so long that no reasonable porting target fails the test.
Here's some evidence from nearly 1.5 years ago:
http://www.winehq.org/pipermail/wine-patches/2006-October/031544.html
FYI, the change below was prepared by running the following:
git ls-files -z | xargs -0 \
perl -0x3b -pi -e \
's/\bif\s*\(\s*(\S+?)(?:\s*!=\s*NULL)?\s*\)\s+(free\s*\(\s*\1\s*\))/$2/s'
Note however, that it doesn't handle brace-enclosed blocks like
"if (x) { free (x); }". But that's ok, since there were none like
that in git sources.
Beware: if you do use the above snippet, note that it can
produce syntactically invalid C code. That happens when the
affected "if"-statement has a matching "else".
E.g., it would transform this
if (x)
free (x);
else
foo ();
into this:
free (x);
else
foo ();
There were none of those here, either.
If you're interested in automating detection of the useless
tests, you might like the useless-if-before-free script in gnulib:
[it *does* detect brace-enclosed free statements, and has a --name=S
option to make it detect free-like functions with different names]
http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=build-aux/useless-if-before-free
Addendum:
Remove one more (in imap-send.c), spotted by Jean-Luc Herren <jlh@gmx.ch>.
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A failure in prepare_revision_walk can be caused by
a not parseable object.
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* maint:
config: add test cases for empty value and no value config variables.
cvsimport: have default merge regex also match beginning of commit message
git clone -s documentation: force a new paragraph for the NOTE
status: suggest "git rm --cached" to unstage for initial commit
Protect get_author_ident_from_commit() from filenames in work tree
upload-pack: Initialize the exec-path.
bisect: use verbatim commit subject in the bisect log
git-cvsimport.txt: fix '-M' description.
Revert "pack-objects: only throw away data during memory pressure"
|
|
This reverts commit 9c2174350cc0ae0f6bad126e15fe1f9f044117ab.
Nico analyzed and found out that this does not really help, and
I agree with it.
By the time this gets into action and data is actively thrown
away, performance simply goes down the drain due to the data
constantly being reloaded over and over and over and over and
over and over again, to the point of virtually making no
relative progress at all. The previous behavior of enforcing
the memory limit by dynamically shrinking the window size at
least had the effect of allowing some kind of progress, even if
the end result wouldn't be optimal.
And that's the whole point behind this memory limiting feature:
allowing some progress to be made when resources are too limited
to let the repack go unbounded.
|
|
* maint: (35 commits)
config.c: guard config parser from value=NULL
builtin-log.c: guard config parser from value=NULL
imap-send.c: guard config parser from value=NULL
wt-status.c: guard config parser from value=NULL
setup.c: guard config parser from value=NULL
remote.c: guard config parser from value=NULL
merge-recursive.c: guard config parser from value=NULL
http.c: guard config parser from value=NULL
help.c: guard config parser from value=NULL
git.c: guard config parser from value=NULL
diff.c: guard config parser from value=NULL
convert.c: guard config parser from value=NULL
connect.c: guard config parser from value=NULL
builtin-tag.c: guard config parser from value=NULL
builtin-show-branch.c: guard config parser from value=NULL
builtin-reflog.c: guard config parser from value=NULL
builtin-log.c: guard config parser from value=NULL
builtin-config.c: guard config parser from value=NULL
builtin-commit.c: guard config parser from value=NULL
builtin-branch.c: guard config parser from value=NULL
...
|
|
If pack-objects hit the memory limit, it deletes objects from the delta
window.
This patch make it only delete the data, which is recomputed, if needed again.
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git pack-objects" has the option --max-pack-size to limit the file
size of the packs to a certain amount of bytes. On platforms where
the pack file size is limited by filesystem constraints, it is easy
to forget this option, and this option does not exist for "git gc"
to begin with.
So introduce a config variable to set the default maximum, but make
this overrideable by the command line.
Suggested by Tor Arvid Lund.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When partitioning the work amongst threads, dividing the number of
objects by the number of threads may return 0 when there are less
objects than threads; this will cause the subsequent code to segfault
when accessing list[sub_size-1]. Allow some threads to have
zero objects to work on instead of barfing, while letting others
to have more.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We somehow called deflateEnd() on a stream that we have called
deflateEnd() on already.
In fact, the second deflateEnd() has always been returning
Z_STREAM_ERROR. We just never checked the error return from
that particular deflateEnd().
The first one returns 0 for success. We might want to tighten
the check even more to check that.
Noticed by Marco.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A NUL byte at beginning of file, or just after a newline
would provoke an invalid buf[-1] access in a few places.
* builtin-grep.c (cmd_grep): Don't access buf[-1].
* builtin-pack-objects.c (get_object_list): Likewise.
* builtin-rev-list.c (read_revisions_from_stdin): Likewise.
* bundle.c (read_bundle_header): Likewise.
* server-info.c (read_pack_info_file): Likewise.
* transport.c (insert_packed_refs): Likewise.
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A mutex and a condition variable is allocated for each thread and torn
down when the thread terminates. However, for certain workloads it can
happen that some threads are actually not started at all. In this case
we would leak the mutex and condition variable. Now we allocate them only
for those threads that are actually started.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the threaded pack-objects code the main thread and the worker threads
must mutually signal that they have assigned a new pack of work or have
completed their work, respectively. Previously, the code used mutexes that
were locked in one thread and unlocked from a different thread, which is
bogus (and happens to work on Linux).
Here we rectify the implementation by using condition variables: There is
one condition variable on which the main thread waits until a thread
requests new work; and each worker thread has its own condition variable
on which it waits until it is assigned new work or signaled to terminate.
As a cleanup, the worker threads are spawned only after the initial work
packages have been assigned.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The code that splits the object list amongst work threads tries to do so
on "path" boundaries not to prevent good delta matches. However, in
some cases, a few paths may largely dominate the hash distribution and
it is not possible to have good load balancing without ignoring those
boundaries.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The current method consists of a master thread serving chunks of objects
to work threads when they're done with their previous chunk. The issue
is to determine the best chunk size: making it too large creates poor
load balancing, while making it too small has a negative effect on pack
size because of the increased number of chunk boundaries and poor delta
window utilization.
This patch implements a completely different approach by initially
splitting the work in large chunks uniformly amongst all threads, and
whenever a thread is done then it steals half of the remaining work from
another thread with the largest amount of unprocessed objects.
This has the advantage of greatly reducing the number of chunk boundaries
with an almost perfect load balancing.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It is currently sorted and then walked backward. Not only this doesn't
feel natural for my poor brain, but it would make the next patch less
obvious as well.
So reverse the sort order, and reverse the list walking direction,
which effectively produce the exact same end result as before.
Also bring the relevant comment nearer the actual code and adjust it
accordingly, with minor additional clarifications.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The wrong value was substracted from delta_cache_size when replacing
a cached delta, as trg_entry->delta_size was used after the old size
had been replaced by the new size.
Noticed by Linus.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The function mark_tree_uninteresting() assumed that the tree entries
are blob when they are not trees. This is not so. Since we do
not traverse into submodules (yet), the gitlinks should be ignored.
In general, we should try to start moving away from using the
"S_ISLNK()" like things for internal git state. It was a mistake to
just assume the numbers all were same across all systems in the first
place. This implementation converts to the "object_type", and then
uses a case statement.
Noticed by Ilari on IRC.
Test script taken from an earlier version by Dscho.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* np/pack:
pack-objects: get rid of an ugly cast
make the pack index version configurable
Conflicts:
builtin-pack-objects.c
|
|
... when calling write_idx_file().
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It is a good idea to use pack index version 2 all the time since it has
proper protection against propagation of certain pack corruptions when
repacking which is not possible with index version 1, as demonstrated
in test t5302.
Hence this config option.
The default is still pack index version 1.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This one triggers only when git-pack-objects is called with
--all-progress and --stdout which is the combination used by
git-push.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since it is now OK to pass a null pointer to display_progress() and
stop_progress() resulting in a no-op, then we can simplify the code
and remove a bunch of lines by not making those calls conditional all
the time.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This allows for better management of progress "object" existence,
as well as making the progress display implementation more independent
from its callers.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Recently I was referred to the Grammar Police as the git-pack-objects
progress message 'Deltifying %u objects' is considered to be not
proper English to at least some small but vocal segment of the
English speaking population. Techncially we are applying delta
compression to these objects at this stage, so the new term is
slightly more acceptable to the Grammar Police but is also just
as correct.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
Two functions, namely write_idx_file() and open_pack_file(), currently
return a const pointer. However that pointer is either a copy of the
first argument, or set to a malloc'd buffer when that first argument
is null. In the later case it is wrong to qualify that pointer as const
since ownership of the buffer is transferred to the caller to dispose of,
and obviously the free() function is not meant to be passed const
pointers.
Making the return pointer not const causes a warning when the first
argument is returned since that argument is also marked const.
The correct thing to do is therefore to remove the const qualifiers,
avoiding the need for ugly casts only to silence some warnings.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
To keep things well layered, sha1close() now returns the file descriptor
when it doesn't close the file.
An ugly cast was added to the return of write_idx_file() to avoid a
warning. A proper fix will come separately.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
... so don't even try in that case, and save another useless line of
progress display.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
Each progress can be on a single line instead of two.
[sp: Changed "Checking files out" to "Checking out files" at
Johannes Sixt's suggestion as it better explains the
action that is taking place]
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
* jc/autogc:
git-gc --auto: run "repack -A -d -l" as necessary.
git-gc --auto: restructure the way "repack" command line is built.
git-gc --auto: protect ourselves from accumulated cruft
git-gc --auto: add documentation.
git-gc --auto: move threshold check to need_to_gc() function.
repack -A -d: use --keep-unreachable when repacking
pack-objects --keep-unreachable
Export matches_pack_name() and fix its return value
Invoke "git gc --auto" from commit, merge, am and rebase.
Implement git gc --auto
|
|
This new option is meant to be used in conjunction with the
options "git repack -a -d" usually invokes the underlying
pack-objects with. When this option is given, objects unreachable
from the refs in packs named with --unpacked= option are added
to the resulting pack, in addition to the reachable objects that
are not in packs marked with *.keep files.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
These empty statement marcos can solicit bogus "statement with no effect"
warnings; squelch them.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Found by Jeff King.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This adds a --threads=<n> parameter to 'git pack-objects' with
documentation.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Try to keep object with the same name hash together.
Suggested by Martin Koegler.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
With this, each thread get repeatedly assigned the next available chunk of
objects to process until the whole list is done. The idea is to have
reasonably small chunks so that all CPUs remain busy with a minimum
number of threads for as long as there is data to process.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
this is still rough, hence it is disabled by default. You need to compile
with "make THREADED_DELTA_SEARCH=1 ..." at the moment.
Threading is done on different portions of the object list to be
deltified. This is currently done by spliting the list into n parts and
then a thread is spawned for each of them. A better method would consist
of spliting the list into more smaller parts and have the n threads
pick the next part available.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This is to help threadification of the delta search code, with a bonus
consistency check.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This is to help threadification of delta searching.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Not all objects are subject to deltification, so avoid carrying those
along, and provide the real count to progress display.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This is based on Martin Koegler's idea to keep the object that
was successfully used as the base of the delta when it is about
to fall off the edge of the window. Instead of doing so only
for the objects at the edge of the window, this makes the window
a lru eviction mechanism. If an entry is used as a base, it is
moved to the last of the queue to be evicted.
This is a quick-and-dirty implementation, as it keeps the original
implementation of the data structure used for the window. This
originally was done as an array, not as an array of pointers,
because it was meant to be used as a cyclic FIFO buffer and a
plain array avoids an extra pointer indirection, while its FIFOness
eant that we are not "moving" the entries like this patch does.
The runtime from three versions were comparable. It seems to
make the resulting chain even shorter, which can only be good.
(stock "master") 15782196 bytes
chain length = 1: 2972 objects
chain length = 2: 2651 objects
chain length = 3: 2369 objects
chain length = 4: 2121 objects
chain length = 5: 1877 objects
...
chain length = 46: 490 objects
chain length = 47: 515 objects
chain length = 48: 527 objects
chain length = 49: 570 objects
chain length = 50: 408 objects
(with your patch) 15745736 bytes (0.23% smaller)
chain length = 1: 3137 objects
chain length = 2: 2688 objects
chain length = 3: 2322 objects
chain length = 4: 2146 objects
chain length = 5: 1824 objects
...
chain length = 46: 503 objects
chain length = 47: 509 objects
chain length = 48: 536 objects
chain length = 49: 588 objects
chain length = 50: 357 objects
(with this patch) 15612086 bytes (1.08% smaller)
chain length = 1: 4831 objects
chain length = 2: 3811 objects
chain length = 3: 2964 objects
chain length = 4: 2352 objects
chain length = 5: 1944 objects
...
chain length = 46: 327 objects
chain length = 47: 353 objects
chain length = 48: 304 objects
chain length = 49: 298 objects
chain length = 50: 135 objects
[jc: this is with code simplification follow-up from Nico]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The code favoring shallower deltas when size is equal was triggered
only when previous delta was also cached. There should be no relation
between cached deltas and same sized deltas.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When a thin pack wants to send a tree object at "sub/dir", and
the commit that is common between the sender and the receiver
that is used as the base object has a subproject at that path,
we should not try to use the data at "sub/dir" of the base tree
as a tree object. It is not a tree to begin with, and more
importantly, the commit object there does not have to even
exist.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Not only are they unused, but the order in the function declaration
and the actual usage don't match.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
xmkstemp() performs error checking and prints a standard error message when
an error occur.
Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Commit 5a235b5e was missing this little detail. Otherwise your pack
will explode.
Problem noted by Brian Downing.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The delta depth doesn't have to be stored in the global object array
structure since it is only used during the deltification pass.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This adds an option (--window-memory=N) and configuration variable
(pack.windowMemory = N) to limit the memory size of the pack-objects
delta search window. This works by removing the oldest unpacked objects
whenever the total size goes above the limit. It will always leave
at least one object, though, so as not to completely eliminate the
possibility of computing deltas.
This is an extra limit on top of the normal window size (--window=N);
the window will not dynamically grow above the fixed number of entries
specified to fill the memory limit.
With this, repacking a repository with a mix of large and small objects
is possible even with a very large window.
Cleaner and correct circular buffer handling courtesy of Nicolas Pitre.
Signed-off-by: Brian Downing <bdowning@lavos.net>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a new try_delta heuristic. Don't bother trying to make a delta if
the target object size is much smaller (currently 1/32) than the source,
as it's very likely not going to get a match. Even if it does, you will
have to read at least 32x the size of the new file to reassemble it,
which isn't such a good deal. This leads to a considerable performance
improvement when deltifying a mix of small and large files with a very
large window, because you don't have to wait for the large files to
percolate out of the window before things start going fast again.
Signed-off-by: Brian Downing <bdowning@lavos.net>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We already apply a bias on the initial delta attempt with max_size being
a function of the base object depth. This has the effect of favoring
shallower deltas even if deeper deltas could be smaller, and therefore
creating a wider delta tree (see commits 4e8da195 and c3b06a69).
This principle should also be applied to all delta attempts for the same
object and not only the first attempt. With this the criteria for the
best delta is not only its size but also its depth, so that a shallower
delta might be selected even if it is larger than a deeper one. Even if
some deltas get larger, they allow for wider delta trees making the
depth limit less quickly reached and therefore better deltas can be
subsequently found, keeping the resulting pack size even smaller.
Runtime access to the pack should also benefit from shallower deltas.
Testing on different repositories showed slighter faster repacks,
smaller resulting packs, and a much nicer curve for delta depth
distribution with no more peak at the maximum depth level.
Improvements are even more significant with smaller depth limits.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change "try_delta" so that if it finds a delta that has the same size
but shallower depth than the existing delta, it will prefer the
shallower one. This makes certain delta trees vastly less deep.
Signed-off-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This uses "git-apply --whitespace=strip" to fix whitespace errors that have
crept in to our source files over time. There are a few files that need
to have trailing whitespaces (most notably, test vectors). The results
still passes the test, and build result in Documentation/ area is unchanged.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This patch unifies the write_index_file functions in
builtin-pack-objects.c and index-pack.c. As the name
"index" is overloaded in git, move in the direction of
using "idx" and "pack idx" when refering to the pack index.
There should be no change in functionality.
Signed-off-by: Geert Bosch <bosch@gnat.com>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
Two issues here:
1) git-repack -a --max-pack-size=10 on the GIT repo dies pretty quick.
There is a lot of confusion about deltas that were suposed to be
reused from another pack but that get stored undeltified due to pack
limit and object size doesn't match entry->size anymore. This test
is not really worth the complexity for determining when it is valid
so get rid of it.
2) If pack limit is reached, the object buffer is freed, including when
it comes from a cached delta data. In practice the object will be
stored in a subsequent pack undeltified, but let's make sure no
pointer to freed data subsists by clearing entry->delta_data.
I also reorganized that code a bit to make it more readable.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
Creating deltas between big blobs is a CPU and memory intensive task.
In the writing phase, all (not reused) deltas are redone.
This patch adds support for caching deltas from the deltifing phase, so
that that the writing phase is faster.
The caching is limited to small deltas to avoid increasing memory usage very much.
The implemented limit is (memory needed to create the delta)/1024.
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
If builtin-pack-objects runs out of memory while finding
the best deltas, it bails out with an error.
If the delta index creation fails (because there is not enough memory),
we can downgrade the error message to a warning and continue with the
next object.
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
* 'dh/repack' (early part):
Ensure git-repack -a -d --max-pack-size=N deletes correct packs
pack-objects: clarification & option checks for --max-pack-size
git-repack --max-pack-size: add option parsing to enable feature
git-repack --max-pack-size: split packs as asked by write_{object,one}()
git-repack --max-pack-size: write_{object,one}() respect pack limit
git-repack --max-pack-size: new file statics and code restructuring
Alter sha1close() 3rd argument to request flush only
|
|
Explain the special code for detecting a corner-case error,
and complain about --stdout & --max-pack-size being used together.
Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
As we do not consider objects marked as "no-delta" early, there
is no point to check if the other objects already in the delta
window are marked as such -- "no-delta" objects will not enter
the window to begin with.
Pointed out by Nico.
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
This teaches pack-objects to use .gitattributes mechanism so
that the user can specify certain blobs are not worth spending
CPU cycles to attempt deltification.
The name of the attrbute is "delta", and when it is set to
false, like this:
== .gitattributes ==
*.jpg -delta
they are always stored in the plain-compressed base object
representation.
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
Instead of giving a hash for grouping, pass fullname to add_object_entry().
I want to add "do not try deltifying this object" bit to object_entry based on
the settings in .gitattributes, and hashing the name down too early would
interfere with that plan.
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
Add --max-pack-size parsing and usage messages.
Upgrade git-repack.sh to handle multiple packfile names,
and build packfiles in GIT_OBJECT_DIRECTORY not GIT_DIR.
Update documentation.
Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
Rewrite write_pack_file() to break to a new packfile
whenever write_object/write_one request it, and
correct the header's object count in the previous packfile.
Change write_index_file() to write an index
for just the objects in the most recent packfile.
Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
With --max-pack-size, generate the appropriate write limit
for each object and check against it before each group of writes.
Update delta usability rules to handle base being in a previously-
written pack. Inline sha1write_compress() so we know the
exact size of the written data when it needs to be compressed.
Detect and return write "failure".
Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
Add "pack_size_limit", the limit specified by --max-pack-size,
"written_list", the list of objects written to the current pack,
and "nr_written", the number of objects in written_list.
Put "base_name" at file scope again and add forward declarations.
Move write_index_file() call from cnd_pack_objects() to
write_pack_file() since only the latter will know how
many times to call write_index_file().
Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
* dh/pack:
Custom compression levels for objects and packs
|
|
Add config variables pack.compression and core.loosecompression ,
and switch --compression=level to pack-objects.
Loose objects will be compressed using core.loosecompression if set,
else core.compression if set, else Z_BEST_SPEED.
Packed objects will be compressed using --compression=level if seen,
else pack.compression if set, else core.compression if set,
else Z_DEFAULT_COMPRESSION. This is the "pack compression level".
Loose objects added to a pack undeltified will be recompressed
to the pack compression level if it is unequal to the current
loose compression level by the preceding rules, or if the loose
object was written while core.legacyheaders = true. Newly
deltified loose objects are always compressed to the current
pack compression level.
Previously packed objects added to a pack are recompressed
to the current pack compression level exactly when their
deltification status changes, since the previous pack data
cannot be reused.
In either case, the --no-reuse-object switch from the first
patch below will always force recompression to the current pack
compression level, instead of assuming the pack compression level
hasn't changed and pack data can be reused when possible.
This applies on top of the following patches from Nicolas Pitre:
[PATCH] allow for undeltified objects not to be reused
[PATCH] make "repack -f" imply "pack-objects --no-reuse-object"
Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
Now that we encourage and actively preserve objects in a packed form
more agressively than we did at the time the new loose object format and
core.legacyheaders were introduced, that extra loose object format
doesn't appear to be worth it anymore.
Because the packing of loose objects has to go through the delta match
loop anyway, and since most of them should end up being deltified in
most cases, there is really little advantage to have this parallel loose
object format as the CPU savings it might provide is rather lost in the
noise in the end.
This patch gets rid of core.legacyheaders, preserve the legacy format as
the only writable loose object format and deprecate the other one to
keep things simpler.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
Currently non deltified object data is always reused when possible.
This means that any change to core.compression has no effect on those
objects as they don't get recompressed when repacking them.
Let's add a --no-reuse-object flag to git-repack in order to force
recompression of all objects when desired.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
If the progress bar ends up in a box, better provide a title for it too.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
Instead of having this code duplicated in multiple places, let's have
a common interface for progress display. If someday someone wishes to
display a cheezy progress bar instead then only one file will have to
be changed.
Note: I left merge-recursive.c out since it has a strange notion of
progress as it apparently increase the expected total number as it goes.
Someone with more intimate knowledge of what that is supposed to mean
might look at converting it to the common progress interface.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
The earlier quickfix forced world-readable permission bits. This
updates it to honor umask and core.sharedrepository settings.
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
mkstemp() often creates the file in 0600 which means the
resulting packfile is not readable by anybody other than the
repository owner. Force 0644 for now, even though this is not
strictly correct.
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
The sorted-by-sha ans sorted-by-type arrays are no more.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
With large amount of objects, check_object() is really trashing the pack
sliding map and the filesystem cache. It has a completely random access
pattern especially with old objects where delta replay jumps back and
forth all over the pack.
This patch improves things by:
1) sorting objects by their offset in pack before calling check_object()
so the pack access pattern is linear;
2) recording the object type at add_object_entry() time since it is
already known in most cases;
3) recording the pack offset even for preferred_base objects;
4) avoid calling sha1_object_info() if all possible.
This limits pack accesses to the bare minimum and makes them perfectly
linear.
In the process check_object() was made more clear (to me at least).
Note: I thought about walking the sorted_by_offset list backward in
get_object_details() so if a pack happens to be larger than the available
file cache, then the cache would have been populated with useful data from
the beginning of the pack already when find_deltas() is called. Strangely,
testing (on Linux) showed absolutely no performance difference.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
It currently aliases delta_size on the principle that reused deltas won't
go through the whole delta matching loop hence delta_size was unused.
This is not true if given delta doesn't find its base in the pack though.
But we need that information even for whole object data reuse.
Well in short the current state looks awful and is prone to bugs. It just
works fine now because try_delta() tests trg_entry->delta before using
trg_entry->delta_size, but that is a bit subtle and I was wondering for a
while why things just worked fine... even if I'm guilty of having
introduced this abomination myself in the first place.
Let's do the sensible thing instead with no ambiguity, which is to have
a separate variable for in_pack_header_size. This might even help future
optimizations.
While at it, let's reorder some struct object_entry members so they all
align well with their own width, regardless of the architecture or the
size of off_t. Some memory saving is to be expected with this alone.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
Because we don't have to know the SHA1 h(hence the name) of the pack
up front anymore, let's get rid of yet another global sorted object list
and sort them only in write_index_file(), then compute the object list
SHA1 on the fly.
This has the advantage of saving another chunk of memory, and the sorted
list SHA1 won't be computed needlessly on servers during a fetch.
Of course the cunning plan is also to make write_index_file() much like
the function with the same name in index-pack.c for an eventual easy
sharing.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
This capability is practically never useful, and therefore never tested,
because it is fairly unlikely that the requested pack will be already
available. Furthermore it is of little gain over the ability to reuse
existing pack data.
In fact the ability to change delta type on the fly when reusing delta
data is a nice thing that has almost no cost and allows greater backward
compatibility with a client's capabilities than if the client is blindly
sent a whole pack without any discrimination.
And this "feature" is simply in the way of other cleanups.
Let's get rid of it.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|