This is my first post, so please be kind. I have searched around a lot, but most of what I found applies to Python 2.

I have a Python 3 script that builds a zip file from a file list. It fails with UnicodeEncodeError only when the script is run from crontab, but it works flawlessly when run from an interactive console. I guess there must be something in the environment, but I just can't figure out what.

This is the code excerpt:

def zipFileList(self, rootfolder, filelist, zip_file, logger):
    count = 0

    logger.info("Generazione file zip {0}: da {1} files".format(zip_file, len(filelist)))
    zip = zipfile.ZipFile(zip_file, "w", compression=zipfile.ZIP_DEFLATED)

    for curfile in filelist:
        zip.write(os.path.join(rootfolder, curfile), curfile, zipfile.ZIP_DEFLATED)
        count = count + 1

    zip.close()
    logger.info("Scrittura terminata: {0} files".format(count))

And this is the log output for this code fragment:

2012-07-31 09:10:03,033: root - ERROR - Traceback (most recent call last):
  File "/usr/local/lib/python3.2/zipfile.py", line 365, in _encodeFilenameFlags
    return self.filename.encode('ascii'), self.flag_bits
UnicodeEncodeError: 'ascii' codec can't encode characters in position 56-57: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "XBE.py", line 45, in main
    pam.executeList(logger)
  File "/home/vte/vtebackup/vte41/scripts/ptActivityManager.py", line 62, in executeList
    self.executeActivity(act, logger)
  File "/home/vte/vtebackup/vte41/scripts/ptActivityManager.py", line 71, in executeActivity
    self.exAct_FileBackup(act, logger)
  File "/home/vte/vtebackup/vte41/scripts/ptActivityManager.py", line 112, in exAct_FileBackup
    ptfs.zipFileList(srcfolder, filelist, arcfilename, logger)
  File "/home/vte/vtebackup/vte41/scripts/ptFileManager.py", line 143, in zipFileList
    zip.write(os.path.join(rootfolder, curfile), curfile, zipfile.ZIP_DEFLATED)
  File "/usr/local/lib/python3.2/zipfile.py", line 1115, in write
    self.fp.write(zinfo.FileHeader())
  File "/usr/local/lib/python3.2/zipfile.py", line 355, in FileHeader
    filename, flag_bits = self._encodeFilenameFlags()
  File "/usr/local/lib/python3.2/zipfile.py", line 367, in _encodeFilenameFlags
    return self.filename.encode('utf-8'), self.flag_bits | 0x800
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc3' in position 56: surrogates not allowed

This is the crontab line:

10 9 * * * /home/vte/vtebackup/vte41/scripts/runbackup.sh >/dev/null 2>&1

And this is the content of runbackup.sh:

#! /bin/bash -l

cd /home/vte/vtebackup/vte41/scripts

/usr/local/bin/python3.2 XBE.py

The exception always occurs on the same file, but its name doesn't seem to include any non-ASCII characters:

/var/vhosts/vte41/http_docs/vtecrm41/storage/2012/July/week4/169933_Puccini_Gabriele.tif

The OS is Ubuntu Linux 10.04 LTS and the Python version is 3.2 (installed side by side with other Python versions as an altinstall). All Python source files have this shebang as their very first line:

#!/usr/bin/env python3.2

Can you help me find out what's wrong and how to fix this problem?

  • For some unknown reason, the filename contains a Unicode surrogate when zipfile tries to encode it for the archive header. Maybe an OS problem? Can you log curfile in your script? Commented Jul 31, 2012 at 8:03
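
One way to follow the suggestion in this comment is to log each name through ascii(), so that escaped bytes show up as \udcXX sequences instead of breaking the log handler. This is only a sketch: describe_filename is a hypothetical helper, not part of the original script, and it assumes the names were decoded with Python 3's surrogateescape handler.

```python
def describe_filename(name):
    """Return a log-safe description of a filename (hypothetical helper)."""
    # surrogateescape maps undecodable bytes 0x80-0xFF to U+DC80-U+DCFF,
    # so any character in that range is an escaped raw byte.
    escaped = [ch for ch in name if "\udc80" <= ch <= "\udcff"]
    return "{0} (escaped bytes: {1})".format(ascii(name), len(escaped))


if __name__ == "__main__":
    # Hypothetical names: one plain, one carrying two escaped bytes.
    print(describe_filename("plain.tif"))
    print(describe_filename("caf\udcc3\udca9.tif"))
```

In zipFileList this could be called as logger.info(describe_filename(curfile)) just before zip.write.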

3 Answers

A team member found the resolution in a Python bug thread.

The issue was fixed by prepending a LANG directive to the script command:

* * * * * LANG=it_IT.UTF-8 /home/vte/vtebackup/vte41/scripts/runbackup.sh >/dev/null 2>&1

I hope this is useful for others, because I spent a while scratching my head over this :)

1 Comment

On Mac OS X, running import sys; print(sys.getdefaultencoding()) from the cron job shows utf-8, but I still got the error the original poster mentioned, and this solution solved my problem.
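
A likely explanation for this observation: in Python 3, sys.getdefaultencoding() always reports utf-8 regardless of the locale, so it cannot reveal this problem. The values that do depend on the environment are the filesystem encoding and the locale's preferred encoding. Below is a small diagnostic sketch (encoding_report is a hypothetical name) that could be run both from cron and from the console to compare the two environments. Note that recent Python versions may report utf-8 for the filesystem encoding even under a C locale.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python is using (hypothetical helper)."""
    return {
        # Always 'utf-8' on Python 3; independent of the locale.
        "default": sys.getdefaultencoding(),
        # Used to decode and encode filenames; depends on the environment.
        "filesystem": sys.getfilesystemencoding(),
        # Derived from LC_ALL / LC_CTYPE / LANG.
        "preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in sorted(encoding_report().items()):
        print("{0}: {1}".format(key, value))
```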

Check your locale. On the interactive console, run the command locale. Here is what I get:

LANG=
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"

Python determines how to interpret filenames based on the locale, which comes from the LC_ALL, LC_CTYPE or LANG environment variable (checked in that order), and I strongly suspect that these are set to a different encoding, or not set at all, in your cron environment.

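If that's the case, your filenames will have been decoded to Unicode using a different encoding. Under an ASCII locale, Python 3 maps every byte it cannot decode to a lone surrogate such as '\udcc3' (the surrogateescape error handler), and a name containing such surrogates cannot be encoded to UTF-8 or ASCII, which is what the traceback shows.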
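
The effect can be reproduced without cron by decoding the same bytes both ways. This is a sketch under assumptions: the filename bytes are made up for illustration, and the explicit decode calls simulate what Python 3.2 does for filenames under an ASCII locale (surrogateescape) and under a UTF-8 locale.

```python
# Hypothetical filename bytes as stored on disk (UTF-8 for "café.tif").
raw_name = b"caf\xc3\xa9.tif"

# Simulates an ASCII locale: each undecodable byte becomes a lone surrogate.
as_seen_in_cron = raw_name.decode("ascii", errors="surrogateescape")

# Simulates a UTF-8 locale: the same bytes decode to real characters.
as_seen_in_console = raw_name.decode("utf-8")


def can_encode_utf8(name):
    """Return True if name can be written as UTF-8, as zipfile requires."""
    try:
        name.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(ascii(as_seen_in_cron))               # 'caf\udcc3\udca9.tif'
    print(can_encode_utf8(as_seen_in_cron))     # False
    print(can_encode_utf8(as_seen_in_console))  # True
```

The first escaped byte here is 0xC3, which becomes '\udcc3', the same character reported in the traceback.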

Simply set the LC_CTYPE variable in your cron definition, either on a line of its own preceding the time entry, or as part of the command to execute:

LC_CTYPE="en_US.UTF-8"
* * * * * yourscriptcommand.py

As always with Python Unicode issues, the answer lies in the Unicode HOWTO, in the section on filenames.

For Chinese:

export LANG="zh_CN.utf-8"
export LC_CTYPE="zh_CN.utf-8"
export PYTHONIOENCODING="utf-8"

/export/zhangys/python3.5.2/bin/python3 diff_reporter.py > /home/admin/diff_script/cron_job.log 2>&1
