29

I'm trying to integrate a project (Project A), built by a colleague, into another Python project. This colleague did not use relative imports in his code but instead wrote

from packageA.moduleA import ClassA
from packageA.moduleA import ClassB

and then pickled the classes with cPickle. For neatness I'd like to hide the package that he built (Project A) inside my project. This, however, changes the path of the classes defined in packageA. No problem, I thought, I'll just redefine the imports using

from ..packageA.moduleA import ClassA
from ..packageA.moduleA import ClassB

but now unpickling the classes fails with the following message

    with open(fname) as infile: self.clzA = cPickle.load(infile)
ImportError: No module named packageA.moduleA

So why doesn't cPickle see the module definitions? Do I need to add the root of packageA to the system path? Is that the correct way to solve the problem?

The cPickled file looks something like

ccopy_reg
_reconstructor
p1
(cpackageA.moduleA
ClassA
p2
c__builtin__
object
p3
NtRp4
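
The dump above already hints at why the load fails: a pickle does not contain the class itself, only its dotted module path and name, which are imported again at load time. The following is a minimal sketch of how to list those references, assuming Python 3 (where cPickle is simply pickle) and using an in-memory stand-in for packageA.moduleA; the stand-in module is an illustration, not the colleague's real code.

```python
import pickle
import pickletools
import sys
import types


class ClassA(object):
    pass


# Hypothetical stand-in for the colleague's module, built in memory so the
# sketch runs without any package on disk.
ClassA.__module__ = "packageA.moduleA"
sys.modules["packageA"] = types.ModuleType("packageA")
module_a = types.ModuleType("packageA.moduleA")
module_a.ClassA = ClassA
sys.modules["packageA.moduleA"] = module_a

# Protocol 0 is the text protocol shown in the question.
data = pickle.dumps(ClassA(), protocol=0)

# Every GLOBAL opcode carries a "module name" pair that the unpickler
# must be able to import when the file is loaded.
global_refs = [arg for op, arg, pos in pickletools.genops(data)
               if op.name == "GLOBAL"]
print(global_refs)
```

Any module path listed there has to be importable under exactly that name when the pickle is loaded, which is why moving packageA breaks existing files.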

The old project hierarchy is of the sort

packageA/
    __init__.py
    moduleA.py
    moduleB.py
packageB/
    __init__.py
    moduleC.py
    moduleD.py

I'd like to put all of that into a WrapperPackage

MyPackage/
    __init__.py
    myModuleX.py
    myModuleY.py
WrapperPackage/
    __init__.py
    packageA/
        __init__.py
        moduleA.py
        moduleB.py
    packageB/
        __init__.py
        moduleC.py
        moduleD.py
1
  • I came across this problem while writing a plug-in for KRunner. The script engine used by Plasma used a path hook to create a fake package where my code was. Unfortunately I couldn't find any way of solving this. The only thing I could do was to manually remove their path hook, clear the sys caches and reimport everything. But if you have some pickled data, then you must unpickle it with the same class name (which means you must keep the from packageA.moduleA import ClassA). Note that once unpickled you can re-pickle the objects using the correct name. Commented Nov 15, 2012 at 13:37

4 Answers

33

You'll need to create an alias for the pickle import to work; add the following to the __init__.py file of the WrapperPackage package:

from .packageA import * # Ensures that all the modules have been loaded in their new locations *first*.
from . import packageA  # imports WrapperPackage/packageA
import sys
sys.modules['packageA'] = packageA  # creates a packageA entry in sys.modules

It may be that you'll need to create additional entries though:

from .packageA import moduleA  # make sure the submodule is loaded under its new name
sys.modules['packageA.moduleA'] = moduleA
# etc.

Now cPickle will find packageA.moduleA and packageA.moduleB again at their old locations.

You may want to rewrite the pickle file afterwards; the new module location is used at that point. The additional aliases created above should ensure that the modules in question carry the new location name for cPickle to pick up when it writes the classes again.
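
To see the aliasing at work without touching a real project, here is a minimal sketch, assuming Python 3 and in-memory stand-in modules; make_module and the simulated package layout are hypothetical helpers for illustration only.

```python
import pickle
import sys
import types


def make_module(name, **attrs):
    """Create an in-memory module and register it in sys.modules."""
    mod = types.ModuleType(name)
    mod.__dict__.update(attrs)
    sys.modules[name] = mod
    return mod


class ClassA(object):
    def __init__(self, value):
        self.value = value


# 1. Old layout: the class lives in packageA.moduleA when it is pickled.
ClassA.__module__ = "packageA.moduleA"
make_module("packageA")
make_module("packageA.moduleA", ClassA=ClassA)
payload = pickle.dumps(ClassA(42), protocol=2)

# 2. The project is reorganised and the old names disappear.
del sys.modules["packageA"], sys.modules["packageA.moduleA"]
ClassA.__module__ = "WrapperPackage.packageA.moduleA"
make_module("WrapperPackage")
new_pkg = make_module("WrapperPackage.packageA")
new_mod = make_module("WrapperPackage.packageA.moduleA", ClassA=ClassA)

try:
    pickle.loads(payload)
    failed_without_alias = False
except ImportError:  # the old module path can no longer be imported
    failed_without_alias = True

# 3. Alias the old names to the modules at their new location.
sys.modules["packageA"] = new_pkg
sys.modules["packageA.moduleA"] = new_mod
obj = pickle.loads(payload)

# 4. Re-pickling writes whatever ClassA.__module__ says at that moment.
rewritten = pickle.dumps(obj, protocol=2)
```

Which path ends up in a rewritten file depends on the class's __module__ attribute, which is set from the name under which its module was first imported. That is worth checking if a re-pickled file still shows the old path.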


8 Comments

Do I need to do this in WrapperPackage.__init__.py?
@MattiLyra: You can do this anywhere, but the WrapperPackage/__init__.py file is probably the best place.
@MartinPieters According to PEP 328, import .something is invalid; it has to be from .something import module. Doesn't import .something throw a SyntaxError? python.org/dev/peps/pep-0328/#guido-s-decision
@MattiLyra: You are completely right. I avoid relative imports altogether, so I fumbled the syntax. Corrected.
@MartinPieters awesome that seems to work, although I still can't get the newly pickled file to show the changed path, it is still packageA.moduleA but I can load it without any problems.
6

In addition to @MartinPieters' answer, another way of doing this is to set the find_global attribute of a cPickle.Unpickler instance, or to subclass pickle.Unpickler and override its find_class method.

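def map_path(mod_name, kls_name):
    # Redirect old module names to their new home inside WrapperPackage.
    if mod_name == 'packageA' or mod_name.startswith('packageA.'):
        mod_name = 'WrapperPackage.%s' % mod_name
    # A non-empty fromlist makes __import__ return the leaf module
    # rather than the top-level package, so dotted names work too.
    mod = __import__(mod_name, fromlist=[kls_name])
    return getattr(mod, kls_name)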

import cPickle as pickle

with open('dump.pickle', 'rb') as fh:  # pickle data should be read in binary mode
    unpickler = pickle.Unpickler(fh)
    unpickler.find_global = map_path
    obj = unpickler.load()  # classes are now resolved through map_path

with open('dump-new.pickle', 'wb') as fh:
    pickle.dump(obj, fh)  # ClassA will now have a new path in 'dump-new'

A more detailed explanation of the process for both pickle and cPickle can be found here.
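
For readers on Python 3, where cPickle no longer exists, the same idea can be sketched by overriding find_class in a pickle.Unpickler subclass. RenamingUnpickler, register, and the in-memory modules below are assumptions made for illustration, not part of the original answer.

```python
import io
import pickle
import sys
import types


class RenamingUnpickler(pickle.Unpickler):
    """Redirect old 'packageA...' references into WrapperPackage."""

    OLD_ROOT = "packageA"
    NEW_PREFIX = "WrapperPackage."

    def find_class(self, module, name):
        if module == self.OLD_ROOT or module.startswith(self.OLD_ROOT + "."):
            module = self.NEW_PREFIX + module
        return super().find_class(module, name)


# --- demo with in-memory stand-in modules (hypothetical layout) ---
class ClassA(object):
    pass


def register(name, **attrs):
    mod = types.ModuleType(name)
    mod.__dict__.update(attrs)
    sys.modules[name] = mod


# Pickle under the old module path, then remove the old names.
ClassA.__module__ = "packageA.moduleA"
register("packageA")
register("packageA.moduleA", ClassA=ClassA)
old_bytes = pickle.dumps(ClassA(), protocol=2)
del sys.modules["packageA"], sys.modules["packageA.moduleA"]

# The class now lives only under its new path.
ClassA.__module__ = "WrapperPackage.packageA.moduleA"
register("WrapperPackage")
register("WrapperPackage.packageA")
register("WrapperPackage.packageA.moduleA", ClassA=ClassA)

obj = RenamingUnpickler(io.BytesIO(old_bytes)).load()
new_bytes = pickle.dumps(obj, protocol=2)  # written with the new path
```

Unlike the sys.modules alias, this keeps the redirection local to the one unpickler that needs it.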

2 Comments

The website linked is offline, but archive.org has it here, and it is well worth reading.
That link is archive.org beta, non-beta here
2

One possible solution is to edit the pickle file directly (if you have access to it). I ran into this same problem of a changed module path. I had saved the files with pickle.HIGHEST_PROTOCOL, so in theory they should be binary, but the module path was sitting at the top of the pickle file in plain text. So I just did a find-and-replace of all instances of the old module path with the new one and voila, they loaded correctly.

I'm sure this solution is not for everyone, especially if you have a very complex pickled object, but it is a quick and dirty data fix that worked for me!
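
A minimal sketch of that find-and-replace, assuming Python 3 and a pickle written with protocol 0 to 3, where class references are stored as newline-terminated text after a "c" opcode. From protocol 4 on, the names are length-prefixed, so changing their length this way would corrupt the file. rewrite_module_path and the in-memory modules are hypothetical names used for illustration.

```python
import pickle
import sys
import types

OLD = b"packageA.moduleA"
NEW = b"WrapperPackage.packageA.moduleA"


def rewrite_module_path(data, old=OLD, new=NEW):
    """Replace class references of the form b'c<module>' + newline."""
    return data.replace(b"c" + old + b"\n", b"c" + new + b"\n")


# --- demo with in-memory stand-in modules (hypothetical layout) ---
class ClassA(object):
    def __init__(self, value):
        self.value = value


def register(name, **attrs):
    mod = types.ModuleType(name)
    mod.__dict__.update(attrs)
    sys.modules[name] = mod


# Pickle under the old module path, then remove the old names.
ClassA.__module__ = "packageA.moduleA"
register("packageA")
register("packageA.moduleA", ClassA=ClassA)
old_bytes = pickle.dumps(ClassA("kept"), protocol=2)
del sys.modules["packageA"], sys.modules["packageA.moduleA"]

# Only the new path is importable from here on.
ClassA.__module__ = "WrapperPackage.packageA.moduleA"
register("WrapperPackage")
register("WrapperPackage.packageA")
register("WrapperPackage.packageA.moduleA", ClassA=ClassA)

new_bytes = rewrite_module_path(old_bytes)
obj = pickle.loads(new_bytes)
```

Matching on the opcode and the trailing newline, rather than on the bare module name, lowers the chance of rewriting ordinary string data that happens to contain the same text.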


0

This is my basic pattern for flexible unpickling via an unambiguous and fast transition map, since there are usually just a few known classes relevant for pickling besides the primitive data types. It also protects unpickling against erroneous or maliciously constructed data, which can execute arbitrary Python code (!) on a simple pickle.load() (with or without error-prone sys.modules fiddling).

Python 2 & 3:

from __future__ import print_function
import sys  # needed for sys.version in test()
try:
    import cPickle as pickle, copy_reg as copyreg
except ImportError:
    import pickle, copyreg

class OldZ:
    a = 1
class Z(object):
    a = 2
class Dangerous:
    pass   

_unpickle_map_safe = {
    # all possible and allowed (!) classes & upgrade paths    
    (__name__, 'Z')         : Z,    
    (__name__, 'OldZ')      : Z,
    ('old.package', 'OldZ') : Z,
    ('__main__', 'Z')       : Z,
    ('__main__', 'OldZ')    : Z,
    # basically required
    ('copy_reg', '_reconstructor') : copyreg._reconstructor,    
    ('__builtin__', 'object')      : object,  # base class passed to _reconstructor
    }

def unpickle_find_class(modname, clsname):
    print("DEBUG unpickling: %(modname)s . %(clsname)s" % locals())
    try: 
        return _unpickle_map_safe[(modname, clsname)]
    except KeyError:
        raise pickle.UnpicklingError(
            "%(modname)s . %(clsname)s not allowed" % locals())
if pickle.__name__ == 'cPickle':  # PY2
    def SafeUnpickler(f):
        u = pickle.Unpickler(f)
        u.find_global = unpickle_find_class
        return u
else:  # PY3 & Python2-pickle.py
    class SafeUnpickler(pickle.Unpickler):  
        find_class = staticmethod(unpickle_find_class)

def test(fn='./z.pkl'):
    z = OldZ()
    z.b = 'teststring' + sys.version
    pickle.dump(z, open(fn, 'wb'), 2)
    pickle.dump(Dangerous(), open(fn + 'D', 'wb'), 2)
    # load again
    o = SafeUnpickler(open(fn, 'rb')).load()
    print(pickle, "loaded:", o, o.a, o.b)
    assert o.__class__ is Z
    try: 
        raise SafeUnpickler(open(fn + 'D', 'rb')).load() and AssertionError
    except pickle.UnpicklingError: 
        print('OK: Dangerous not allowed')

if __name__ == '__main__':
    test()
