I chose pickle (plus base64 over TCP sockets) to exchange data between my Python 3 code and legacy Python 2 code, but I am having trouble with datetime objects:

The PY3 object unpickles well on PY2, but the reverse raises a TypeError when calling the datetime constructor, then a UnicodeEncodeError in the load_reduce function.

A short test program & the log, including dis output of both PY2 and PY3 pickles, are available in this gist

  • I am using pickle.dumps(reply, protocol=2) in PY2
    then pickle._loads(pickled, fix_imports=True, encoding='latin1') in PY3
    (tried None and utf-8 without success)

  • Native cPickle loads fails to decode it too; I am using pure Python's _loads only for debugging.
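
A rough sketch of the Python 3 side of the setup described above, leaving the socket part out. The helper names encode_message and decode_message are hypothetical and do not come from the original program; only the pickle and base64 calls are taken from the description.

```python
import base64
import pickle
from datetime import datetime


def encode_message(obj):
    """Pickle with protocol 2 (the highest one Python 2 can read), then base64."""
    return base64.b64encode(pickle.dumps(obj, protocol=2))


def decode_message(payload, encoding="latin1"):
    """Reverse of encode_message; `encoding` only affects Python 2 str objects."""
    raw = base64.b64decode(payload)
    return pickle.loads(raw, fix_imports=True, encoding=encoding)


# Round trip inside one interpreter. This direction is not the failing one:
# the trouble only shows up when the payload was produced by Python 2.
stamp = datetime(2014, 7, 17, 15, 6, 17, 330384)
assert decode_message(encode_message(stamp)) == stamp
```

The base64 layer only makes the payload safe to send as text; it does not change what the unpickler sees.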

Is this a datetime bug? Maybe the datetime.__getstate__/__setstate__ implementations are not compatible?

Any remark on the code is welcome...

Additional details

PY-3.4.0 pickle:

 0: \x80 PROTO      2
 2: c    GLOBAL     'datetime datetime'
21: q    BINPUT     0
23: c    GLOBAL     '_codecs encode'
39: q    BINPUT     1
41: X    BINUNICODE u'\x07\xde\x07\x11\x0f\x06\x11\x05\n\x90'
58: q    BINPUT     2
60: X    BINUNICODE u'latin1'
71: q    BINPUT     3
73: \x86 TUPLE2
74: q    BINPUT     4
76: R    REDUCE
77: q    BINPUT     5
79: \x85 TUPLE1
80: q    BINPUT     6
82: R    REDUCE
83: q    BINPUT     7
85: .    STOP

PY-2.7.6 pickle:

 0: \x80 PROTO      2
 2: c    GLOBAL     'datetime datetime'
21: q    BINPUT     0
23: U    SHORT_BINSTRING '\x07\xde\x07\x11\x0f\x06\x11\x05\n\x90'
35: q    BINPUT     1
37: \x85 TUPLE1
38: q    BINPUT     2
40: R    REDUCE
41: q    BINPUT     3
43: ]    EMPTY_LIST
44: q    BINPUT     4
46: N    NONE
47: \x87 TUPLE3
48: q    BINPUT     5
50: .    STOP

PY-3.4.0 pickle.load_reduce:

def load_reduce(self):
    stack = self.stack
    args = stack.pop()
    func = stack[-1]
    try:
        value = func(*args)
    except:
        print(sys.exc_info())
        print(func, args)
        raise
    stack[-1] = value
dispatch[REDUCE[0]] = load_reduce

PY-3.4.0 datetime pickle support:

# Pickle support.

def _getstate(self):
    yhi, ylo = divmod(self._year, 256)
    us2, us3 = divmod(self._microsecond, 256)
    us1, us2 = divmod(us2, 256)
    basestate = bytes([yhi, ylo, self._month, self._day,
                       self._hour, self._minute, self._second,
                       us1, us2, us3])
    if self._tzinfo is None:
        return (basestate,)
    else:
        return (basestate, self._tzinfo)

def __setstate(self, string, tzinfo):
    (yhi, ylo, self._month, self._day, self._hour,
     self._minute, self._second, us1, us2, us3) = string
    self._year = yhi * 256 + ylo
    self._microsecond = (((us1 << 8) | us2) << 8) | us3
    if tzinfo is None or isinstance(tzinfo, _tzinfo_class):
        self._tzinfo = tzinfo
    else:
        raise TypeError("bad tzinfo state arg %r" % tzinfo)

def __reduce__(self):
    return (self.__class__, self._getstate())
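
To make the state layout concrete, here is a small sketch that repeats the arithmetic of __setstate on the 10 state bytes seen in the pickles (assuming they are \x07\xde\x07\x11\x0f\x06\x11\x05\n\x90). The function name decode_datetime_state is hypothetical and not part of the datetime module.

```python
def decode_datetime_state(state):
    """Turn the 10 raw state bytes into (year, month, day, h, m, s, microsecond)."""
    if len(state) != 10:
        raise ValueError("expected 10 state bytes, got %d" % len(state))
    # Iterating over a Python 3 bytes object yields ints, as __setstate expects.
    yhi, ylo, month, day, hour, minute, second, us1, us2, us3 = state
    year = yhi * 256 + ylo                          # 0x07 * 256 + 0xde = 2014
    microsecond = (((us1 << 8) | us2) << 8) | us3   # three bytes, big-endian
    return year, month, day, hour, minute, second, microsecond


state = b"\x07\xde\x07\x11\x0f\x06\x11\x05\n\x90"
fields = decode_datetime_state(state)
print(fields)  # (2014, 7, 17, 15, 6, 17, 330384)
```

The unpacking relies on the state being a bytes object. If the unpickler hands over a str instead, the same unpacking yields one-character strings rather than ints, which fits the kind of TypeError described in the question.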
  • I don't know how to solve your problem, but pickle is not meant, in general, for data exchange between different versions or for long-term storage. Is there an unambiguous format (text?) you could use? Commented Jul 17, 2014 at 14:44
  • If you simply want to keep the information of datetime, I suggest you pickle the formatted text of the datetime, unpickle it somewhere else and parse the string into datetime again. Commented Jul 17, 2014 at 15:17
  • @JimmyK The initial goal was having a flexible solution (i.e. not having to hard-code conversion each time the object changes) without resorting to external packages like Pyro... Commented Jul 17, 2014 at 15:24
  • @mdurant Given the fix_imports option of Unpickler (and the backwards-compatible protocols), you'd think they planned for a cross-version solution ;-) Commented Jul 17, 2014 at 15:25
  • @eddygeek - how did you produce "PY-3.4.0 pickle" dis? Commented Oct 12, 2015 at 19:23

1 Answer

The workaround is to pass encoding="bytes" when unpickling, like this:

pickled_bytes = bytes(pickled_str, encoding='latin1')  # only needed if your input is a str (not my case)
data = pickle.loads(pickled_bytes, encoding='bytes')

(Thanks to Tim Peters for the suggestion)

The question of why this is required is still open at http://bugs.python.org/issue22005.
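
A self-contained way to try the workaround on Python 3 without a Python 2 interpreter at hand. The byte string is hand-assembled from the opcodes of the Python 2 pickle shown in the question, reduced to the datetime alone and assuming the state bytes \x07\xde\x07\x11\x0f\x06\x11\x05\n\x90; treat it as a sample, not as the exact payload from the original program.

```python
import pickle
from datetime import datetime

# PROTO 2, GLOBAL datetime.datetime, SHORT_BINSTRING (10 state bytes),
# TUPLE1, REDUCE, STOP, with the BINPUT memo opcodes Python 2 emits.
PY2_DATETIME_PICKLE = (
    b"\x80\x02cdatetime\ndatetime\nq\x00"
    b"U\n\x07\xde\x07\x11\x0f\x06\x11\x05\n\x90"
    b"q\x01\x85q\x02Rq\x03."
)

# encoding="bytes" leaves the Python 2 str untouched, so the datetime
# constructor receives the raw 10-byte state it expects.
restored = pickle.loads(PY2_DATETIME_PICKLE, encoding="bytes")
print(restored)  # 2014-07-17 15:06:17.330384

# With the default encoding ("ASCII"), decoding the state already fails on \xde.
try:
    pickle.loads(PY2_DATETIME_PICKLE)
except UnicodeDecodeError as exc:
    print("default encoding fails:", exc)
```

The price of encoding="bytes" is that every Python 2 str in the payload, including dict keys, arrives as bytes on the Python 3 side.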

2 Comments

Nice... as long as you're not stuck on Cygwin's Python 3.2, which doesn't seem to know about encoding='bytes'. Argh.
It should be noted that this issue is "fixed" in the most recent versions of Python 3. However, you must still remember to use encoding='latin-1' when unpickling on Python 3. I find this unintuitive and not all that much better than encoding='bytes', because to me, it is not obvious that datetime would involve strings at all. And if your Python 2 pickle contains both datetimes as well as encoded strings in encodings other than Latin-1, then you're back to having to use bytes anyway.
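
For the mixed case mentioned in the comment above (datetimes plus strings in some other encoding), one possible approach is to load with encoding='bytes' and decode the strings afterwards. This is only a sketch: decode_py2_strings is a hypothetical helper, the sample pickle is hand-assembled for illustration, and the helper assumes every bytes value is UTF-8 text.

```python
import pickle
from datetime import datetime


def decode_py2_strings(obj, encoding="utf-8"):
    """Recursively turn bytes (former Python 2 str) into str; leave the rest alone."""
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return {decode_py2_strings(key, encoding): decode_py2_strings(value, encoding)
                for key, value in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return type(obj)(decode_py2_strings(item, encoding) for item in obj)
    return obj


# Hand-assembled protocol 2 pickle of the tuple (datetime, 'caf\xc3\xa9'),
# i.e. a Python 2 str holding UTF-8 encoded text next to a datetime.
SAMPLE = (
    b"\x80\x02cdatetime\ndatetime\n"
    b"U\n\x07\xde\x07\x11\x0f\x06\x11\x05\n\x90\x85R"
    b"U\x05caf\xc3\xa9"
    b"\x86."
)

raw = pickle.loads(SAMPLE, encoding="bytes")   # (datetime, b'caf\xc3\xa9')
clean = decode_py2_strings(raw)                # (datetime, 'café')
print(clean)
```

Values that are real binary data would be wrongly decoded by this helper, so in practice it needs to know which fields hold text.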
