How to manipulate unicode-named files with subversion in Windows?

Question

I use Windows 7 with code page 950 (Big5, Traditional Chinese), and I want to use svn to manipulate files whose names contain Unicode characters outside that code page, such as 简体中文文件.txt (GB2312, Simplified Chinese).

If I use chcp 950, when I run:

svn add .\简体中文文件.txt

I get an error:

svn: warning: W155010: 'D:\path\to\work-dir\?体中文文件.txt'
not found
svn: E200009: Could not add all targets because some targets don't exist
svn: E200009: Illegal target for the requested operation

If I use chcp 65001 (UTF-8), I get an even worse error:

svn: warning: W155010: 'D:\path\to\work-dir\?体svn: E200009: C
ould not add all targets because some targets don't exist
svn: E200009: Illegal target for the requested operation

I'd like to try chcp 1200 (UTF-16LE), but it says:

Invalid code page

It seems that TortoiseSVN can manipulate those files correctly. However I need to write scripts calling svn to run several automated jobs. Is there any solution available?

Is there a detailed demo or documentation about this? I always get a Subcommand 'add' doesn't accept option '--encoding ARG' when I attempt to call svn add --encoding utf8 .\简体中文文件.txt or svn --encoding utf8 add .\简体中文文件.txt...
– Danny Lin, Commented Oct 7, 2014 at 4:01
OK, so I guess that option isn't relevant. There's a chance that the file names are in effect being interpreted as UTF-8 anyway; are you sure you are passing the command-line arguments as UTF-8 strings? I don't think you can do that from the console directly, you'll need to use a batch file. The bug tracker says that Unicode filenames should work.
– Harry Johnston, Commented Oct 7, 2014 at 4:19
Actually I think I see why that wouldn't work; either the batch processor or CreateProcessA would treat the UTF-8 string as being in the current code page and convert it to UTF-16, then the C runtime would convert it to ANSI, and the UTF-8 won't survive that. There's an outside chance it would work if you widen UTF-8 to 16 bits without converting it and call CreateProcessW - but since it turns out that the fix for the file access hasn't actually made it to the release version yet, that won't help you right now.
– Harry Johnston, Commented Oct 7, 2014 at 19:24
Are you using the TortoiseSVN command-line interface or a different distribution?
– Harry Johnston, Commented Oct 7, 2014 at 19:37
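
The round trip described in the comments above (UTF-8 bytes misread as the current code page, then converted back to ANSI) can be imitated outside Windows. The sketch below is only an approximation: it uses Python's `cp950` codec as a stand-in for the Windows conversion, `errors="replace"` as a stand-in for whatever substitution Windows performs, and the helper name `through_code_page` is made up for illustration.

```python
name = "简体中文文件.txt"
utf8_bytes = name.encode("utf-8")


def through_code_page(raw, codec="cp950"):
    """Misread raw bytes as the given code page, then encode them back."""
    # Step 1: the bytes are treated as code-page text and widened to Unicode.
    widened = raw.decode(codec, errors="replace")
    # Step 2: the Unicode text is narrowed back to the code page ("ANSI").
    return widened.encode(codec, errors="replace")


survived = through_code_page(utf8_bytes)
# Compare what comes out with what went in.
print(survived == utf8_bytes)
print(survived.decode("utf-8", errors="replace"))
```

Plain ASCII names pass through this round trip unchanged, which is consistent with the problem only showing up for names outside the current code page.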

Gabriel Devillers · Accepted Answer · 2023-06-07 13:37:38Z

2

Programs like svn that use the MS implementation of the C standard library's file IO functions cannot read command input or file names containing characters outside the current code page. You would have to chcp to a suitable code page for each file separately (e.g. 936 for Simplified Chinese).
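
As a rough way to see which code page a given file name would need, the check below tries to encode the name in a few candidate code pages. This is a sketch under assumptions: Python's built-in `cp950` and `cp936` codecs stand in for the Windows code pages (they may differ from the Windows tables in edge cases), and the helper name `code_pages_for` is hypothetical.

```python
def code_pages_for(name, candidates=("cp950", "cp936")):
    """Return the candidate code pages that can represent every character of name."""
    usable = []
    for codec in candidates:
        try:
            name.encode(codec)
        except UnicodeEncodeError:
            # At least one character is missing from this code page.
            continue
        usable.append(codec)
    return usable


name = "简体中文文件.txt"
print(code_pages_for(name))
# Characters missing from the code page are replaced with '?',
# much like the mangled path in the svn warning from the question.
print(name.encode("cp950", errors="replace"))
```

A script could run such a check per file and pick the code page accordingly, but a single directory mixing names from several code pages would still need one chcp per file.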

In theory code page 65001 could cover every character, but unfortunately the MS C runtime has serious bugs that usually break applications when this code page is in use. Microsoft's ongoing failure to fix this long-standing problem leaves UTF-8 a second-class citizen under Windows.

In the future, it looks like https://issues.apache.org/jira/browse/SVN-1537?issueNumber=1537 should fix the problem by using direct Win32 APIs instead of the C stdlib for console writes, though I can't find the related code change to confirm whether console input and file access are similarly addressed.

edited Jun 7, 2023 at 13:37

Gabriel Devillers

answered Oct 7, 2014 at 10:33

bobince

Gabriel Devillers · Accepted Answer · 2023-06-07 14:53:30Z

0

First solution: switch Windows to UTF-8 system-wide; see the question What does "Beta: Use Unicode UTF-8 for worldwide language support" actually do? That setting made svn diff produce correct output on my machine (chcp 65001 alone was apparently not enough).

Second solution: use svn within WSL.

answered Jun 7, 2023 at 14:53

Gabriel Devillers
