I am trying to get a unicode version of calendar.month_abbr[6]. If I don't specify an encoding for the locale, I don't know how to convert the string to unicode. The example code below shows my problem:

>>> import locale
>>> import calendar
>>> locale.setlocale(locale.LC_ALL, ("ru_RU"))
'ru_RU'
>>> print repr(calendar.month_abbr[6])
'\xb8\xee\xdd'
>>> print repr(calendar.month_abbr[6].decode("utf8"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb8 in position 0: unexpected code byte
>>> locale.setlocale(locale.LC_ALL, ("ru_RU", "utf8"))
'ru_RU.UTF8'
>>> print repr(calendar.month_abbr[6])
'\xd0\x98\xd1\x8e\xd0\xbd'
>>> print repr(calendar.month_abbr[6].decode("utf8"))
u'\u0418\u044e\u043d'

Any ideas how to solve this? The solution doesn't have to look like this. Any solution that gives me the abbreviated month name in unicode is fine.

2 Answers

Change the last line in your code:

>>> print calendar.month_abbr[6].decode("utf8")
Июн

repr() prints the escaped form, which hides the fact that you already have the unicode string you need.

You can also use getlocale() to get the encoding of the current locale:

>>> locale.setlocale(locale.LC_ALL, 'en_US')
'en_US'
>>> locale.getlocale()
('en_US', 'ISO8859-1')
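
Building on that, here is a minimal sketch of a helper that decodes the month name with whatever encoding getlocale() reports. The name to_unicode and the fallback to locale.getpreferredencoding() are my own assumptions, not part of the locale or calendar modules, and it only works if the reported encoding really matches the bytes the C library returns. The bytes in the question, '\xb8\xee\xdd', do decode to Июн as ISO8859-5.

```python
import calendar
import locale


def to_unicode(raw, encoding=None):
    """Decode a locale-encoded byte string; return text unchanged.

    to_unicode is a hypothetical helper name used for illustration only.
    """
    if not isinstance(raw, bytes):
        # Already text (a unicode object, or any str on Python 3).
        return raw
    if encoding is None:
        # Encoding of the current LC_TIME locale; getlocale() may report
        # None, in which case fall back to the preferred encoding
        # (an assumption, it is not guaranteed to match the bytes).
        encoding = (locale.getlocale(locale.LC_TIME)[1]
                    or locale.getpreferredencoding())
    return raw.decode(encoding)


# Decode the abbreviated month name for whatever locale is currently set.
print(repr(to_unicode(calendar.month_abbr[6])))
```

Passing the encoding explicitly, as in to_unicode('\xb8\xee\xdd', 'iso8859-5'), bypasses the lookup when you already know which encoding the locale uses.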

Other modules that might be useful for you:

  • PyICU - a better way to do internationalization. locale produces either the initial or the inflected form of the month name, depending on the locale database in your OS (so you can't rely on it for languages like Russian!), and returns it in some encoding. PyICU has different format specifiers for the initial and inflected forms (so you can select the appropriate one for your case) and uses unicode.
  • pytils - a set of tools for working with the Russian language, including dates. It has hard-coded month names as a workaround for locale limitations.

1 Comment

If the unicode conversion succeeded I should still be able to do a repr on it. So that shouldn't be the problem. Thanks for the links. I will check them out.
0

What you need is:

…
myencoding = locale.getpreferredencoding()
print repr(calendar.month_abbr[6].decode(myencoding))
…

2 Comments

On my machine locale.getpreferredencoding() returns utf8. So I still have the same problem.
It doesn't seem like locale.getpreferredencoding() returns the encoding that month_abbr names are encoded in.
