
Say I use Windows 7 with code page 950 (Big5, Traditional Chinese), and I want to use svn to manipulate files whose names contain characters outside that code page, such as 简体中文文件.txt (Simplified Chinese, covered by GB2312).

If I use chcp 950, when I run:

svn add .\简体中文文件.txt

I get an error:

svn: warning: W155010: 'D:\path\to\work-dir\?体中文文件.txt'
not found
svn: E200009: Could not add all targets because some targets don't exist
svn: E200009: Illegal target for the requested operation

If I use chcp 65001 (UTF-8), I get an even worse error:

svn: warning: W155010: 'D:\path\to\work-dir\?体svn: E200009: C
ould not add all targets because some targets don't exist
svn: E200009: Illegal target for the requested operation

I'd like to try chcp 1200 (UTF-16LE), but it says:

Invalid code page

It seems that TortoiseSVN can manipulate those files correctly. However I need to write scripts calling svn to run several automated jobs. Is there any solution available?

  • Perhaps subversion's --encoding option will be helpful? Commented Oct 7, 2014 at 3:47
  • Is there a detailed demo or documentation about this? I always get a Subcommand 'add' doesn't accept option '--encoding ARG' when I attempt to call svn add --encoding utf8 .\简体中文文件.txt or svn --encoding utf8 add .\简体中文文件.txt... Commented Oct 7, 2014 at 4:01
  • OK, so I guess that option isn't relevant. There's a chance that the file names are in effect being interpreted as UTF-8 anyway; are you sure you are passing the command-line arguments as UTF-8 strings? I don't think you can do that from the console directly, you'll need to use a batch file. The bug tracker says that Unicode filenames should work. Commented Oct 7, 2014 at 4:19
  • Actually I think I see why that wouldn't work; either the batch processor or CreateProcessA would treat the UTF-8 string as being in the current code page and convert it to UTF-16, then the C runtime would convert it to ANSI, and the UTF-8 won't survive that. There's an outside chance it would work if you widen UTF-8 to 16 bits without converting it and call CreateProcessW - but since it turns out that the fix for the file access hasn't actually made it to the release version yet, that won't help you right now. Commented Oct 7, 2014 at 19:24
  • Are you using the TortoiseSVN command-line interface or a different distribution? Commented Oct 7, 2014 at 19:37

2 Answers

Programs like svn that use the MS implementation of the C standard library's file I/O functions cannot read command input or file names containing characters outside the current code page. You would have to chcp to a suitable code page for each file separately (e.g. 936 for Simplified Chinese).
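As a rough illustration of the code page limitation, the sketch below checks which code pages can represent the file name from the question. It uses Python's built-in `cp950` and `cp936` codecs, which are assumed here to be close approximations of the Windows code pages of the same numbers; the helper name `representable` is hypothetical.

```python
# File name taken from the question.
NAME = "简体中文文件.txt"


def representable(name: str, codepage: str) -> bool:
    """Return True if every character of `name` can be encoded in `codepage`."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


# Check the code pages discussed in the question and in this answer.
for cp in ("cp950", "cp936", "utf-8"):
    print(cp, representable(NAME, cp))

# Show what a lossy conversion to code page 950 does to the name:
# characters the code page lacks are replaced with '?'.
print(NAME.encode("cp950", errors="replace").decode("cp950"))
```

A script that drives svn could run a check like this first to decide which code page, if any, a given file name fits into.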

In theory code page 65001 could cover every character, but unfortunately the MS C runtime has serious bugs that usually break applications when this code page is in use. Microsoft's ongoing failure to fix this long-standing problem leaves UTF-8 a second-class citizen under Windows.

In the future it looks like https://issues.apache.org/jira/browse/SVN-1537?issueNumber=1537 should fix the problem by using direct Win32 APIs instead of C stdlib to do console writes, though I can't see where the related code change is to confirm whether console input and file access are similarly addressed.


First solution: switch Windows itself to UTF-8; see the question "What does "Beta: Use Unicode UTF-8 for worldwide language support" actually do?". It made svn diff produce correct output on my machine (chcp 65001 alone was apparently not enough).

Second solution: use svn within WSL.
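For the first solution, it may help to see why chcp 65001 and the system-wide setting differ: chcp changes only the console code pages, while the beta setting changes the process-wide ANSI code page. The sketch below reads all three through the Win32 functions `GetACP`, `GetConsoleCP` and `GetConsoleOutputCP` via `ctypes`. The function name `windows_code_pages` is hypothetical, and whether svn's behaviour depends on the ANSI code page in exactly this way is an assumption based on what I observed.

```python
import ctypes
import sys


def windows_code_pages():
    """Return (ANSI, console input, console output) code pages, or None off Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    return (
        kernel32.GetACP(),              # process-wide ANSI code page
        kernel32.GetConsoleCP(),        # console input code page (changed by chcp)
        kernel32.GetConsoleOutputCP(),  # console output code page (changed by chcp)
    )


print(windows_code_pages())
```

With the beta setting enabled, the first value should be 65001; after only running chcp 65001, the first value should be unchanged and only the last two should switch.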
