aboutsummaryrefslogtreecommitdiffstats
path: root/vcs-svn/svndump.c
AgeCommit message (Collapse)AuthorFilesLines
2020-08-13drop vcs-svn experimentJeff King1-540/+0
The code in vcs-svn was started in 2010 as an attempt to build a remote-helper for interacting with svn repositories (as opposed to git-svn). However, we never got as far as shipping a mature remote helper, and the last substantive commit was e99d012a6bc in 2012. We do have a git-remote-testsvn, and it is even installed as part of "make install". But given the name, it seems unlikely to be used by anybody (you'd have to explicitly "git clone testsvn::$url", and there have been zero mentions of that on the mailing list since 2013, and even that includes the phrase "you might need to hack a bit to get it working properly"[1]). We also ship contrib/svn-fe, which builds on the vcs-svn work. However, it does not seem to build out of the box for me, as the link step misses some required libraries for using libgit.a. Curiously, the original build breakage bisects for me to eff80a9fd9 (Allow custom "comment char", 2013-01-16), which seems unrelated. There was an attempt to fix it in da011cb0e7 (contrib/svn-fe: fix Makefile, 2014-08-28), but on my system that only switches the error message. So it seems like the result is not really usable by anybody in practice. It would be wonderful if somebody wanted to pick up the topic again, and potentially it's worth carrying around for that reason. But the flip side is that people doing tree-wide operations have to deal with this code. And you can see the list with (replace "HEAD" with this commit as appropriate): { echo "--" git diff-tree --diff-filter=D -r --name-only HEAD^ HEAD } | git log --no-merges --oneline e99d012a6bc.. --stdin which shows 58 times somebody had to deal with the code, generally due to a compile or test failure, or a tree-wide style fix or API change. Let's drop it and let anybody who wants to pick it up do so by resurrecting it from the git history. As a bonus, this also reduces the size of a stripped installation of Git from 21MB to 19MB. [1] https://lore.kernel.org/git/CALkWK0mPHzKfzFKKpZkfAus3YVC9NFYDbFnt+5JQYVKipk3bQQ@mail.gmail.com/ Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-07vcs-svn: release strbuf after use in end_revision()Rene Scharfe1-0/+1
Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-26Merge branch 'jn/vcs-svn-cleanup'Junio C Hamano1-17/+16
Code clean-up. * jn/vcs-svn-cleanup: vcs-svn: move remaining repo_tree functions to fast_export.h vcs-svn: remove repo_delete wrapper function vcs-svn: remove custom mode constants vcs-svn: remove more unused prototypes and declarations
2017-08-26Merge branch 'bc/vcs-svn-cleanup'Junio C Hamano1-4/+4
Code clean-up. * bc/vcs-svn-cleanup: vcs-svn: rename repo functions to "svn_repo" vcs-svn: remove unused prototypes
2017-08-23vcs-svn: move remaining repo_tree functions to fast_export.hJonathan Nieder1-3/+2
These used to be for manipulating the in-memory repo_tree structure, but nowadays they are convenience wrappers to handle a few git-vs-svn mismatches: 1. Git does not track empty directories but Subversion does. When looking up a path in git that Subversion thinks exists and finding nothing, we can safely assume that the path represents a directory. This is needed when a later Subversion revision modifies that directory. 2. Subversion allows deleting a file by copying. In Git fast-import we have to handle that more explicitly as a deletion. These are details of the tool's interaction with git fast-import. Move them to fast_export.c, where other such details are handled. This way the function names do not start with a repo_ prefix that would clash with the repository object introduced in v2.14.0-rc0~38^2~16 (repository: introduce the repository object, 2017-06-22) or an svn_ prefix that would clash with libsvn (in case someone wants to link this code with libsvn some day). Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23vcs-svn: remove repo_delete wrapper functionJonathan Nieder1-2/+2
Since v1.7.10-rc0~118^2~4^2~4^2~3 (vcs-svn: pass paths through to fast-import, 2010-12-13) this is an alias for fast_export_delete. Remove the unnecessary layer of indirection. No functional change intended. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23vcs-svn: remove custom mode constantsJonathan Nieder1-12/+12
In the rest of Git, these modes are spelled as S_IFDIR, S_IFREG | 0644, S_IFREG | 0755, and S_IFLNK. Use the same constants in svn-fe for simplicity and consistency. No functional change intended. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-20vcs-svn: rename repo functions to "svn_repo"brian m. carlson1-4/+4
There were several functions in the Subversion code that started with "repo_". This namespace is also used by the Git struct repository code. Rename these functions to start with "svn_repo" to avoid any future conflicts. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-27timestamp_t: a new data type for timestampsJohannes Schindelin1-1/+1
Git's source code assumes that unsigned long is at least as precise as time_t. Which is incorrect, and causes a lot of problems, in particular where unsigned long is only 32-bit (notably on Windows, even in 64-bit versions). So let's just use a more appropriate data type instead. In preparation for this, we introduce the new `timestamp_t` data type. By necessity, this is a very, very large patch, as it has to replace all timestamps' data type in one go. As we will use a data type that is not necessarily identical to `time_t`, we need to be very careful to use `time_t` whenever we interact with the system functions, and `timestamp_t` everywhere else. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-09vcs-svn: use error_errno()Nguyễn Thái Ngọc Duy1-2/+2
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-07remote-svn: add incremental importFlorian Achleitner1-5/+5
Search for a note attached to the ref to update and read it's 'Revision-number:'-line. Start import from the next svn revision. If there is no next revision in the svn repo, svnrdump terminates with a message on stderr an non-zero return value. This looks a little weird, but there is no other way to know whether there is a new revision in the svn repo. On the start of an incremental import, the parent of the first commit in the fast-import stream is set to the branch name to update. All following commits specify their parent by a mark number. Previous mark files are currently not reused. Signed-off-by: Florian Achleitner <florian.achleitner.2.6.31@gmail.com> Acked-by: David Michael Barr <b@rr-dav.id.au> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-07Create a note for every imported commit containing svn metadataFlorian Achleitner1-2/+19
To provide metadata from svn dumps for further processing, e.g. branch detection, attach a note to each imported commit that stores additional information. The notes are currently hard-coded in refs/notes/svn/revs. Currently the following lines from the svn dump are directly accumulated in the note. This can be refined as needed. - "Revision-number" - "Node-path" - "Node-kind" - "Node-action" - "Node-copyfrom-path" - "Node-copyfrom-rev" Signed-off-by: Florian Achleitner <florian.achleitner.2.6.31@gmail.com> Acked-by: David Michael Barr <b@rr-dav.id.au> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-07remote-svn, vcs-svn: Enable fetching to private refsFlorian Achleitner1-6/+6
The reference to update by the fast-import stream is hard-coded. When fetching from a remote the remote-helper shall update refs in a private namespace, i.e. a private subdir of refs/. This namespace is defined by the 'refspec' capability, that the remote-helper advertises as a reply to the 'capabilities' command. Extend svndump and fast-export to allow passing the target ref. Update svn-fe to be compatible. Signed-off-by: Florian Achleitner <florian.achleitner.2.6.31@gmail.com> Acked-by: David Michael Barr <b@rr-dav.id.au> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-07Add svndump_init_fd to allow reading dumps from arbitrary FDsFlorian Achleitner1-4/+18
The existing function only allows reading from a filename or from stdin. Allow passing of a FD and an additional FD for the back report pipe. This allows us to retrieve the name of the pipe in the caller. Signed-off-by: Florian Achleitner <florian.achleitner.2.6.31@gmail.com> Acked-by: David Michael Barr <b@rr-dav.id.au> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-13Merge branch 'jn/vcs-svn'Junio C Hamano1-18/+19
vcs-svn updates to clean-up compilation, lift 32-bit limitations, etc. * jn/vcs-svn: vcs-svn: allow 64-bit Prop-Content-Length vcs-svn: suppress a signed/unsigned comparison warning vcs-svn: suppress a signed/unsigned comparison warning vcs-svn: suppress signed/unsigned comparison warnings vcs-svn: use strstr instead of memmem vcs-svn: use constcmp instead of prefixcmp vcs-svn: simplify cleanup in apply_one_window vcs-svn: avoid self-assignment in dummy initialization of pre_off vcs-svn: drop no-op reset methods vcs-svn: suppress -Wtype-limits warning vcs-svn: allow import of > 4GiB files vcs-svn: rename check_overflow and its arguments for clarity
2012-07-05vcs-svn: allow 64-bit Prop-Content-LengthJonathan Nieder1-15/+18
Currently the vcs-svn/ library only pays attention to the presence of the Prop-Content-Length field and doesn't care about its value, but some day we might care about the value. Parse it as an off_t instead of arbitrarily limiting to 32 bits for intuitiveness. So now you can import from a dump with more than 2 GiB of properties for a node. In practice that isn't likely to happen often, and this is mostly meant as a cleanup. Based-on-patch-by: David Barr <davidbarr@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-07-05vcs-svn: use constcmp instead of prefixcmpDavid Barr1-1/+1
Since the length of t is already known, we can simplify a little by using memcmp() instead of strncmp() to carry out a prefix comparison. All nearby code already does this. Noticed in the standalone svn-dump-fast-export project which has not needed to implement prefixcmp() yet. Signed-off-by: David Barr <davidbarr@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-07-05vcs-svn: drop no-op reset methodsDavid Barr1-2/+0
Since v1.7.5~42^2~6 (vcs-svn: remove buffer_read_string) buffer_reset() does nothing thus fast_export_reset() also. Signed-off-by: David Barr <davidbarr@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-04-30remove superfluous newlines in error messagesPete Wyckoff1-2/+2
The error handling routines add a newline. Remove the duplicate ones in error messages. Signed-off-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02vcs-svn: allow import of > 4GiB filesJonathan Nieder1-6/+15
There is no reason in principle that an svn-format dump would not be able to represent a file whose length does not fit in a 32-bit integer. Use off_t consistently to represent file lengths (in place of using uint32_t in some contexts) so we can handle that. Most svn-fe code is already ready to do that without this patch and passes values of type off_t around. The type mismatch from stragglers was noticed with gcc -Wtype-limits. While at it, tighten the parsing of the Text-content-length field to make sure it is a number and does not overflow, and tighten other overflow checks as that value is passed around and manipulated. Inspired-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02vcs-svn: allow import of > 4GiB filesJonathan Nieder1-6/+15
There is no reason in principle that an svn-format dump would not be able to represent a file whose length does not fit in a 32-bit integer. Use off_t consistently (instead of uint32_t) to represent file lengths so we can handle that. Most of our code is already ready to do that without this patch and already passes values of type off_t around. The type mismatch due to stragglers was noticed with gcc -Wtype-limits. Inspired-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-01-27Merge branch 'svn-fe' of git://repo.or.cz/git/jrn into jn/svn-feJunio C Hamano1-36/+81
This simplifies svn-fe a great deal and fulfills a longstanding wish: support for dumps with deltas in them, and incremental imports. The cost is that commandline usage of the svn-fe tool becomes a little more complicated since it no longer keeps state itself but instead reads blobs back from fast-import in order to copy them between revisions and apply deltas to them. Also removes a couple of custom data structures and replaces them with strbufs like other parts of Git. * 'svn-fe' of git://repo.or.cz/git/jrn: (32 commits) vcs-svn: reset first_commit_done in fast_export_init vcs-svn: do not initialize report_buffer twice vcs-svn: avoid hangs from corrupt deltas vcs-svn: guard against overflow when computing preimage length vcs-svn: cap number of bytes read from sliding view test-svn-fe: split off "test-svn-fe -d" into a separate function vcs-svn: implement text-delta handling vcs-svn: let deltas use data from preimage vcs-svn: let deltas use data from postimage vcs-svn: verify that deltas consume all inline data vcs-svn: implement copyfrom_data delta instruction vcs-svn: read instructions from deltas vcs-svn: read inline data from deltas vcs-svn: read the preimage when applying deltas vcs-svn: parse svndiff0 window header vcs-svn: skeleton of an svn delta parser vcs-svn: make buffer_read_binary API more convenient vcs-svn: learn to maintain a sliding view of a file Makefile: list one vcs-svn/xdiff object or header per line vcs-svn: avoid using ls command twice ... Conflicts: Makefile contrib/svn-fe/svn-fe.txt
2011-05-26vcs-svn: implement text-delta handlingDavid Barr1-4/+9
Handle input in Subversion's dumpfile format, version 3. This is the format produced by "svnrdump dump" and "svnadmin dump --deltas", and the main difference between v3 dumpfiles and the dumpfiles already handled is that these can include nodes whose properties and text are expressed relative to some other node. To handle such nodes, we find which node the text and properties are based on, handle its property changes, use the cat-blob command to request the basis blob from the fast-import backend, use the svndiff0_apply() helper to apply the text delta on the fly, writing output to a temporary file, and then measure that postimage file's length and write its content to the fast-import stream. The temporary postimage file is shared between delta-using nodes to avoid some file system overhead. The svn-fe interface needs to be more complicated to accomodate the backward flow of information from the fast-import backend to svn-fe. The backflow fd is not needed when parsing streams without deltas, though, so existing scripts using svn-fe on v2 dumps should continue to work. NEEDSWORK: generalize interface so caller sets the backflow fd, close temporary file before exiting Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-05-26Merge branch 'db/svn-fe-code-purge' into svn-feJonathan Nieder1-14/+18
* db/svn-fe-code-purge: vcs-svn: drop obj_pool vcs-svn: drop treap vcs-svn: drop string_pool vcs-svn: pass paths through to fast-import Conflicts: vcs-svn/fast_export.c vcs-svn/fast_export.h vcs-svn/repo_tree.c vcs-svn/repo_tree.h vcs-svn/string_pool.c vcs-svn/svndump.c vcs-svn/trp.txt
2011-05-26Merge branch 'db/vcs-svn-incremental' into svn-feJonathan Nieder1-25/+61
This teaches svn-fe to incrementally import into an existing repository (at last!) at the expense of less convenient UI. Think of it as growing pains. This opens the door to many excellent things, and it would be a bad idea to discourage people from building on it for much longer. * db/vcs-svn-incremental: vcs-svn: avoid using ls command twice vcs-svn: use mark from previous import for parent commit vcs-svn: handle filenames with dq correctly vcs-svn: quote paths correctly for ls command vcs-svn: eliminate repo_tree structure vcs-svn: add a comment before each commit vcs-svn: save marks for imported commits vcs-svn: use higher mark numbers for blobs vcs-svn: set up channel to read fast-import cat-blob response Conflicts: t/t9010-svn-fe.sh vcs-svn/fast_export.c vcs-svn/fast_export.h vcs-svn/repo_tree.c vcs-svn/svndump.c
2011-04-22sparse: Fix some "symbol not declared" warningsRamsay Jones1-0/+1
In particular, sparse issues the "symbol 'a_symbol' was not declared. Should it be static?" warnings for the following symbols: attr.c:468:12: 'git_etc_gitattributes' attr.c:476:5: 'git_attr_system' vcs-svn/svndump.c:282:6: 'svndump_read' vcs-svn/svndump.c:417:5: 'svndump_init' vcs-svn/svndump.c:432:6: 'svndump_deinit' vcs-svn/svndump.c:445:6: 'svndump_reset' The symbols in attr.c only require file scope, so we add the static modifier to their declaration. The symbols in vcs-svn/svndump.c are external symbols, and they already have extern declarations in the "svndump.h" header file, so we simply include the header in svndump.c. Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-29vcs-svn: a void function shouldn't try to return somethingMichael Witten1-1/+2
As v1.7.4-rc0~184 (2010-10-04) and C99 §6.8.6.4.1 remind us, standard C does not permit returning an expression of type void, even for a tail call. Noticed with gcc -pedantic: vcs-svn/svndump.c: In function 'handle_node': vcs-svn/svndump.c:213:3: warning: ISO C forbids 'return' with expression, in function returning void [-pedantic] [jn: with simplified log message] Signed-off-by: Michael Witten <mfwitten@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26vcs-svn: avoid using ls command twiceDavid Barr1-2/+1
Currently there are two functions to retrieve the mode and content at a path: const char *repo_read_path(const uint32_t *path); uint32_t repo_read_mode(const uint32_t *path) Replace them with a single function with two return values. This means we can use one round-trip to get the same information from fast-import that previously took two. Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26vcs-svn: handle log message with embedded NULJonathan Nieder1-1/+1
Pass the log message by strbuf instead of as a C-style string and use fwrite instead of printf to write it to fast-import so embedded '\0' bytes can be preserved. Currently "git log" doesn't show the embedded NULs but "git cat-file commit" can. While at it, stop including system headers from repo_tree.h. git source files need to include git-compat-util.h (or cache.h or builtin.h) sooner to ensure the appropriate feature test macros are defined. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26vcs-svn: avoid unnecessary copying of log message and authorJonathan Nieder1-10/+10
Use strbuf_swap when storing the svn:log and svn:author properties, so pointers to rather than the contents of buffers get copied. The main effect should be to make the code a little easier to read. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26vcs-svn: make reading of properties binary-safeJonathan Nieder1-14/+10
svn-fe errors out on revision 59151 of the ASF repository: fatal: invalid dump: unexpected end of file The proximate cause is a property with an embedded NUL character. Previously such anomalies were ignored but commit c9d1c8ba (2010-12-28) introduced a check strlen(val) == len to avoid reading uninitialized data when a property list ends early and unfortunately this test does not distinguish between "foo" followed by EOF and the string "foo\0bar\0baz". Fix it by using buffer_read_binary to read to a strbuf and checking the actual length read. Most consumers of properties still use C-style strings, so in practice an author or log message with embedded NULs will be truncated, but a least this way svn-fe won't error out (fixing the regression). Reported-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22Merge branch 'db/length-as-hash' into svn-feJonathan Nieder1-69/+105
* db/length-as-hash: vcs-svn: use strchr to find RFC822 delimiter vcs-svn: implement perfect hash for top-level keys vcs-svn: implement perfect hash for node-prop keys Conflicts: vcs-svn/svndump.c
2011-03-22vcs-svn: pass paths through to fast-importDavid Barr1-15/+19
Now that there is no internal representation of the repo, it is not necessary to tokenise paths. Use strbuf instead and bypass string_pool. This means svn-fe can handle arbitrarily long paths (as long as a strbuf can fit them), with arbitrarily many path components. While at it, since we now treat paths in their entirety, only quote when necessary. Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22Merge branch 'db/strbufs-for-metadata' into db/svn-fe-code-purgeJonathan Nieder1-37/+34
* db/strbufs-for-metadata: vcs-svn: use strbuf for author, UUID, and URL vcs-svn: use strbuf for revision log Conflicts: vcs-svn/fast_export.c vcs-svn/fast_export.h vcs-svn/repo_tree.c vcs-svn/svndump.c
2011-03-22Merge branch 'db/length-as-hash' (early part) into db/svn-fe-code-purgeJonathan Nieder1-73/+129
* 'db/length-as-hash' (early part): vcs-svn: implement perfect hash for top-level keys vcs-svn: implement perfect hash for node-prop keys vcs-svn: improve reporting of input errors vcs-svn: make buffer_copy_bytes return length read vcs-svn: make buffer_skip_bytes return length read vcs-svn: improve support for reading large files Conflicts: vcs-svn/fast_export.c vcs-svn/svndump.c
2011-03-22vcs-svn: use strchr to find RFC822 delimiterDavid Barr1-2/+5
This is a small optimisation (4% reduction in user time) but is the largest artifact within the parsing portion of svndump.c Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22vcs-svn: implement perfect hash for top-level keysDavid Barr1-50/+59
Instead of interning property names and comparing their string_pool keys, look them up in a table by string length, which should be about as fast. Another small step towards removing dependence on string_pool altogether. Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22vcs-svn: implement perfect hash for node-prop keysDavid Barr1-19/+43
Instead of interning property names and comparing their string_pool keys, look them up in a table by string length, which should be about as fast. This is a small step towards removing dependence on string_pool. Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22vcs-svn: use strbuf for author, UUID, and URLDavid Barr1-17/+28
Use strbufs and strings instead of interned strings for values of rev, dump, and node fields that happen to be strings. After this change, the only remaining string_pool use is for paths in the repo_tree API and internals. Functional change: treat an empty author, UUID, or URL as none at all. So for example, in repos where the first revision has an empty svn:author property, the first rev will be treated as by "nobody" rather than by a person with empty name and email address created by prepending an @ sign to the repository UUID. Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22vcs-svn: use strbuf for revision logDavid Barr1-20/+8
obj_pool is overkill for this application: all that is needed is a buffer that can resize from rev to rev to accomodate differently-sized strings. In the spirit of commit deadcef4 (2010-11-06), use a strbuf instead. This is a small step towards removing dependence on obj_pool.h. Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22vcs-svn: improve reporting of input errorsJonathan Nieder1-3/+26
Catch input errors and exit early enough to print a reasonable diagnosis based on errno. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07vcs-svn: eliminate repo_tree structureJonathan Nieder1-17/+36
Rely on fast-import for information about previous revs. This requires always setting up backward flow of information, even for v2 dumps. On the plus side, it simplifies the code by quite a bit and opens the door to further simplifications. [db: adjusted to support final version of the cat-blob patch] [jn: avoiding hard-coding git's name for the empty tree for portability to other backends] Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07vcs-svn: add a comment before each commitJonathan Nieder1-7/+22
Current svn-fe produces output like this: blob mark :7382321 data 5 hello blob mark :7382322 data 5 Hello commit mark :3 [...] M 100644 :7382321 hello.c M 100644 :7382322 hello2.c This means svn-fe has to keep track of the paths modified in each commit and the corresponding marks, instead of dealing with each file as it arrives in input and then forgetting about it. A better strategy would be to use inline blobs: commit mark :3 [...] M 100644 inline hello.c data 5 hello [...] As a first step towards that, teach svn-fe to notice when the collection of blobs for each commit starts and write a comment ("# commit 3.") there. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07vcs-svn: set up channel to read fast-import cat-blob responseDavid Barr1-0/+5
Set up some plumbing: teach the svndump lib to pass a file descriptor number to the fast_export lib, representing where cat-blob/ls responses can be read from, and add a get_response_line helper function to the fast_export lib to read a line from that file. Unfortunately this means that svn-fe needs file descriptor 3 to be redirected from somewhere (preferrably the cat-blob stream of a fast-import backend); otherwise it will fail: $ svndump <path> | svn-fe fatal: cannot read from file descriptor 3: Bad file descriptor For the moment, "svn-fe 3</dev/null" works as a workaround but it will not work for very long. A fast-import backend that can retrieve old commits is needed in order to be able to fulfill svn "Node-copyfrom-rev" requests that refer to revs from a previous run. [jn: with new change description] Based-on-patch-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07vcs-svn: simplify repo_modify_path and repo_copyJonathan Nieder1-3/+1
Restrict the repo_tree API to functions that are actually needed. - decouple reading the mode and content of dirents from other operations. - remove repo_modify_path. It is only used to read the mode from dirents. - remove the ability to use repo_read_mode on a missing path. The existing code only errors out in that case, anyway. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07vcs-svn: handle_node: use repo_read_pathJonathan Nieder1-10/+23
svn-fe processes each commit in two stages: first decide on the correct content for all paths and export the relevant blobs, then export a commit with the result. But we can keep less state and simplify svn-fe a great deal by exporting the commit in one step: use 'inline' blobs for each path and remember nothing. This way, the repo_tree structure could be eliminated, and we would get support for incremental imports 'for free'. Reorganize handle_node along these lines. This is just a code cleanup; the changes in repo_tree and handle_revision will come later. [db: backported to apply without text delta support] Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26Merge commit 'jn/svn-fe' of git://github.com/gitster/git into svn-feJonathan Nieder1-67/+142
* git://github.com/gitster/git: vcs-svn: Allow change nodes for root of tree (/) vcs-svn: Implement Prop-delta handling vcs-svn: Sharpen parsing of property lines vcs-svn: Split off function for handling of individual properties vcs-svn: Make source easier to read on small screens vcs-svn: More dump format sanity checks vcs-svn: Reject path nodes without Node-action vcs-svn: Delay read of per-path properties vcs-svn: Combine repo_replace and repo_modify functions vcs-svn: Replace = Delete + Add vcs-svn: handle_node: Handle deletion case early vcs-svn: Use mark to indicate nodes with included text vcs-svn: Unclutter handle_node by introducing have_props var vcs-svn: Eliminate node_ctx.mark global vcs-svn: Eliminate node_ctx.srcRev global vcs-svn: Check for errors from open() vcs-svn: Allow simple v3 dumps (no deltas yet) Conflicts: t/t9010-svn-fe.sh vcs-svn/svndump.c
2011-02-26vcs-svn: teach line_buffer to handle multiple input filesJonathan Nieder1-13/+16
Collect the line_buffer state in a newly public line_buffer struct. Callers can use multiple line_buffers to manage input from multiple files at a time. svn-fe's delta applier will use this to stream a delta from svnrdump and the preimage it applies to from fast-import at the same time. The tests don't take advantage of the new features, but I think that's okay. It is easier to find lingering examples of nonreentrant code by searching for "static" in line_buffer.c. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-01-18svndump.c: Fix a printf format compiler warningRamsay Jones1-1/+1
In particular, on systems that define uint32_t as an unsigned long, gcc complains as follows: CC vcs-svn/svndump.o vcs-svn/svndump.c: In function `svndump_read': vcs-svn/svndump.c:215: warning: int format, uint32_t arg (arg 2) In order to suppress the warning we use the C99 format specifier macro PRIu32 from <inttypes.h>. Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-07vcs-svn: Allow change nodes for root of tree (/)Jonathan Nieder1-1/+4
It is not uncommon for a svn repository to include change records for properties at the top level of the tracked tree: Node-path: Node-kind: dir Node-action: change Prop-delta: true Prop-content-length: 43 Content-length: 43 K 10 svn:ignore V 11 build-area PROPS-END Unfortunately a recent svn-fe change (vcs-svn: More dump format sanity checks, 2010-11-19) causes such nodes to be rejected with the error message fatal: invalid dump: path to be modified is missing The repo_tree module does not keep a dirent for the root of the tree. Add a block to the dump parser to take care of this case. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24vcs-svn: Implement Prop-delta handlingDavid Barr1-10/+44
The rules for what file is used as delta source for each file are not documented in dump-load-format.txt. Luckily, the Apache Software Foundation repository has rich enough examples to figure out most of the rules: Node-action: replace implies the empty property set and empty text as preimage for deltas. Otherwise, if a copyfrom source is given, that node is the preimage for deltas. Lastly, if none of the above applies and the node path exists in the current revision, then that version forms the basis. [jn: refactored, with tests] Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24vcs-svn: Sharpen parsing of property linesJonathan Nieder1-11/+19
Prepare to add a new type of property line (the 'D' line) to handle property deltas. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24vcs-svn: Split off function for handling of individual propertiesJonathan Nieder1-14/+19
The handle_property function is the part of read_props that would be interesting for most people: semantics of properties rather than the algorithm for parsing them. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24vcs-svn: Make source easier to read on small screensJonathan Nieder1-8/+0
Remove some newlines from handle_node() that are not needed for clarity. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24vcs-svn: More dump format sanity checksJonathan Nieder1-4/+14
Node-action: change is not appropriate when switching between file and directory or adding a new file. Current svn-fe silently accepts such nodes and the resulting tree has missing files in the "changed when meant to add" case. Node-action: add requires some content (text or directory); there is no such thing as an "intent to add" node in svn dumps. Current svn-fe accepts such contentless adds but produces an invalid fast-import stream that refers to nonexistent mark :0 in response. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24vcs-svn: Reject path nodes without Node-actionJonathan Nieder1-2/+5
It would be better to flag such errors and let the import proceed anyway, but for now it is simpler not to worry about recovery from such weird cases. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24vcs-svn: Delay read of per-path propertiesJonathan Nieder1-22/+18
The mode for each file in an svn-format dump is kept in the properties section. The properties section is read as soon as possible to allow the correct mode to be filled in when registering the file with the repo_tree lib. To support nodes with a missing properties section, svn-fe determines the mode in three stages: - The kind (directory or file) of the node is read from the dump and used to make an initial estimate (040000 or 100644). - Properties are read in and allowed to override this for symlinks and executables. - If there is no properties section, the mode from the previous content of the path is left alone, overriding the above considerations. This is a bit of a mess, and worse, it would get even more complicated once we start to support property deltas. If we could only register the file with a provisional value for mode and then change it later when properties say so, the procedure would be much simpler. ... oh, right, we can. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24vcs-svn: Combine repo_replace and repo_modify functionsJonathan Nieder1-4/+4
There are two functions to change the staged content for a path in the svn importer's active commit: repo_replace, which changes the text and returns the mode, and repo_modify, which changes the text and mode and returns nothing. Worse, there are more subtle differences: - A mark of 0 passed to repo_modify means "use the existing content". repo_replace uses it as mark :0 and produces a corrupt stream. - When passed a path that is not part of the active commit, repo_replace returns without doing anything. repo_modify transparently adds a new directory entry. Get rid of both and introduce a new function with the best features of both: repo_modify_path modifies the mode, content, or both for a path, depending on which arguments are zero. If no such dirent already exists, it does nothing and reports the error by returning 0. Otherwise, the return value is the resulting mode. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24vcs-svn: Replace = Delete + AddJonathan Nieder1-6/+7
Simplify by reducing the "Node-action: replace" case to "Node-action: add". This way, the main part of handle_node() only has to deal with "add" and "change" nodes. Functional change: replacing a symlink or executable without setting properties will reset the mode. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24vcs-svn: handle_node: Handle deletion case earlyJonathan Nieder1-3/+8
Take care of "Node-action: delete" as soon as possible, so we can stop worrying about that case in the rest of the function. Functional change: catch deletion nodes with features that would not apply to them (text, properties, or origin data) and error out for those cases. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24vcs-svn: Use mark to indicate nodes with included textJonathan Nieder1-8/+8
Allocate a mark if needed as soon as possible so later code can use "if (mark)" to check if this node has text attached rather than explicitly checking for Text-content-length. While at it, reject directory nodes with text attached; the presence of such a node would indicate a bug in the dump generator or svn-fe's understanding. In the long term, it would be nice to be able to continue parsing and save the error for later, but for now it is simpler to error out right away. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24vcs-svn: Unclutter handle_node by introducing have_props varJonathan Nieder1-4/+5
It is possible for a path node in an SVN-format dump file to leave out the properties section. svn-fe handles this by carrying over the properties (in particular, file type) from the old version of that node. To support this, handle_node tests several times whether a Prop-content-length field is present. Ancient Subversion actually leaves out the Prop-content-length field even for nodes with properties, so that's not quite the right check. Besides, this detail of mechanism is distracting when the question at hand is instead what content the new node should have. So introduce a local have_props variable. The semantics are the same as before; the adaptations to support ancient streams that leave out the prop-content-length can wait until someone needs them. Signed-off-by: Jonathan Nieder <jrnieer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24vcs-svn: Eliminate node_ctx.mark globalJonathan Nieder1-12/+11
The mark variable is only used in handle_node(). Its life is very short and simple: first, a new mark number is allocated if this node has text attached, then that mark is recorded in the in-core tree being built up, and lastly the mark is communicated to fast-import in the stream along with the associated text. A new reader may worry about interaction with other code, especially since mark is not initialized to zero in handle_node() itself. Disperse such worries by making it local. No functional change intended. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24vcs-svn: Eliminate node_ctx.srcRev globalJonathan Nieder1-7/+8
The srcRev variable is only used in handle_node(); its purpose is to hold the old mode for a path, to only be used if properties are not being changed. Narrow its scope to make its meaningful lifetime more obvious. No functional change intended. Add some tests as a sanity-check for the simplest case (no renames). Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24vcs-svn: Check for errors from open()Jonathan Nieder1-2/+4
test-svn-fe segfaults when passed a bogus path. Simplify debugging by exiting with a meaningful error message instead. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24vcs-svn: Allow simple v3 dumps (no deltas yet)David Barr1-3/+18
Since the dumpfile version 1 days, the Subversion dump format gained some new fields: - a unique identifier for the repository (version 2 format) - whether the text and properties for a node should be interpreted as deltas - checksums for a delta's preimage - SHA-1 sums as alternatives to the existing MD5 checksums for copy source and the payload (delta). For now what is relevant to us is the Text-delta and Prop-delta fields, since not noticing these causes a dump file to be misinterpreted (see the previous commit). [jn: with tests] Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24vcs-svn: Error out for v3 dumpsJonathan Nieder1-3/+10
By ignoring the Text-Delta and Prop-Delta node fields, current svn-fe happily mistakes deltas for full text and instead of cleanly erroring out, it produces a valid but semantically bogus fast-import stream when fed a dump file in the modern "svnadmin dump --deltas" format. Dump file parsers are supposed to ignore header fields they don't understand (to allow for backward-compatible extensions), but they are also supposed to check the SVN-fs-dump-format-version header to prevent misinterpretation of non backward-compatible extensions. Do so. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-12vcs-svn: Fix some printf format compiler warningsRamsay Jones1-1/+1
In particular, on systems that define uint32_t as an unsigned long, gcc complains as follows: CC vcs-svn/fast_export.o vcs-svn/fast_export.c: In function `fast_export_modify': vcs-svn/fast_export.c:28: warning: unsigned int format, uint32_t arg (arg 2) vcs-svn/fast_export.c:28: warning: int format, uint32_t arg (arg 3) vcs-svn/fast_export.c: In function `fast_export_commit': vcs-svn/fast_export.c:42: warning: int format, uint32_t arg (arg 5) vcs-svn/fast_export.c:62: warning: int format, uint32_t arg (arg 2) vcs-svn/fast_export.c: In function `fast_export_blob': vcs-svn/fast_export.c:72: warning: int format, uint32_t arg (arg 2) vcs-svn/fast_export.c:72: warning: int format, uint32_t arg (arg 3) CC vcs-svn/svndump.o vcs-svn/svndump.c: In function `svndump_read': vcs-svn/svndump.c:260: warning: int format, uint32_t arg (arg 3) In order to suppress the warnings we use the C99 format specifier macros PRIo32 and PRIu32 from <inttypes.h>. Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14SVN dump parserDavid Barr1-0/+302
svndump parses data that is in SVN dumpfile format produced by `svnadmin dump` with the help of line_buffer and uses repo_tree and fast_export to emit a git fast-import stream. Based roughly on com.hydrografix.svndump 0.92 from the SvnToCCase project at <http://svn2cc.sarovar.org/>, by Stefan Hegny and others. [rr: allow input from files other than stdin] [jn: with test, more error reporting] Signed-off-by: David Barr <david.barr@cordelta.com> Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>