
I've run smack into a problem with subprocess.call() when running a batch file that has Unicode characters in its path name. It fails in Python 2.6 and 2.7 but works perfectly in 3.2. Was it really just a bug that lasted all the way until Py3k??

```python
# -*- coding: utf-8 -*-
import subprocess

o = u"C:\\temp\\test.bat"        # control case: ASCII-only path
q = u"C:\\temp\\こんにちは.bat"   # the same kind of batch file under a Japanese name

# open() accepts both unicode paths, so both files exist and are readable
ho = open(o, 'r')
hq = open(q, 'r')
ho.close()
hq.close()

subprocess.call(o)               # batch runs
subprocess.call(q)               # fails in 2.6/2.7; nothing from here on down runs
subprocess.call(q, shell=True)
subprocess.call(q.encode('utf8'), shell=True)
subprocess.call(q.encode('mbcs'), shell=True)  # suggested elsewhere for older Windows
```
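For comparison, here is a minimal Python 3 sketch of the same check. It assumes a throwaway Python script stands in for the batch file (so it runs on any OS, not only Windows) and uses a temporary directory instead of C:\temp; the helper name `run_script_with_unicode_name` is hypothetical. As far as I know, Python 3 keeps the path as `str` right up to the OS call (the wide-character `CreateProcessW` API on Windows, the filesystem encoding on POSIX), which is consistent with the 3.2 run succeeding.

```python
# Python 3 sketch: pass a non-ASCII script path to subprocess.
# Assumption: a Python script stands in for the original .bat file so the
# check also runs outside Windows; the file name is the one from the question.
import os
import subprocess
import sys
import tempfile


def run_script_with_unicode_name():
    """Create a script whose file name contains Japanese characters and run it."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "こんにちは.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write('print("ok")\n')
        # The str path is handed to subprocess as-is; Python 3 converts it
        # for the OS itself instead of forcing an ASCII byte string.
        result = subprocess.run(
            [sys.executable, path],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,  # raise CalledProcessError if the child fails
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_script_with_unicode_name())  # prints "ok"
```

If this prints `ok`, the Unicode path made it through `subprocess` intact.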
  • BTW there are a number of near-duplicates, but I believe this is slightly different from all of the ones I've looked at. Commented Mar 8, 2012 at 12:11
  • possible duplicate of Unicode filename to python subprocess.call() Commented Mar 8, 2012 at 12:21
  • How is this question any different? The subprocess module has trouble with Unicode strings in version 2.x. Since 3.0, all strings are Unicode and the problem went away. Commented Mar 8, 2012 at 12:23
  • OK, you're right; it seems to be quite a famous bug. Maybe I just couldn't bring myself to believe it! Commented Mar 8, 2012 at 12:34

1 Answer

Filenames are passed to and returned from APIs as (Unicode) strings. This can present platform-specific problems because on some platforms filenames are arbitrary byte strings. (On the other hand, on Windows filenames are natively stored as Unicode.) As a work-around, most APIs (e.g. open() and many functions in the os module) that take filenames accept bytes objects as well as strings, and a few APIs have a way to ask for a bytes return value. Thus, os.listdir() returns a list of bytes instances if the argument is a bytes instance, and os.getcwdb() returns the current working directory as a bytes instance. Note that when os.listdir() returns a list of strings, filenames that cannot be decoded properly are omitted rather than raising UnicodeError.

From the "What's New in Python 3.0" page.
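The rules in that quote are easy to see in a modern Python 3 session. The sketch below lists the same temporary directory once with a `str` argument and once with a `bytes` argument, and checks that `os.fsdecode()` maps the bytes names back to the str names. `NAME`, `listdir_both_ways` and `list_demo_directory` are hypothetical names chosen for illustration.

```python
# Sketch of the Python 3 str/bytes filename rules quoted above.
import os
import tempfile

NAME = "こんにちは.bat"  # hypothetical: file name reused from the question


def listdir_both_ways(directory):
    """List `directory` twice: once with a str argument, once with bytes."""
    as_str = os.listdir(directory)                 # str argument -> list of str
    as_bytes = os.listdir(os.fsencode(directory))  # bytes argument -> list of bytes
    return sorted(as_str), sorted(as_bytes)


def list_demo_directory():
    """Create an empty file named NAME in a temp dir and list that dir both ways."""
    with tempfile.TemporaryDirectory() as tmpdir:
        open(os.path.join(tmpdir, NAME), "w").close()
        return listdir_both_ways(tmpdir)


if __name__ == "__main__":
    str_names, bytes_names = list_demo_directory()
    print(str_names == [NAME])                                 # True
    print(all(isinstance(n, bytes) for n in bytes_names))      # True
    # os.fsdecode() maps the bytes names back to the str names
    print([os.fsdecode(n) for n in bytes_names] == str_names)  # True
    print(isinstance(os.getcwdb(), bytes))                     # True
```

One caveat: the quote's last sentence describes 3.0 itself. Since Python 3.1 (PEP 383), names that cannot be decoded are returned using the `surrogateescape` error handler instead of being omitted.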

1 Comment

Thanks. I should have looked in the 3.0 release notes, but I couldn't find any reference to this in the 2.7 docs.
