I have a Python 3.6 script that trains an SKLearn model and then saves the model using the following code:

import pickle

with open('filepath', 'wb') as f:
    pickle.dump(trained_model, f, protocol=2)

When I load the pickle in Python 3.6, everything works fine:

>>> with open('filepath', 'rb') as f:
...     model = pickle.load(f)
...
>>> model

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
        max_depth=None, max_features='auto', max_leaf_nodes=None,
        min_impurity_decrease=0.0, min_impurity_split=None,
        min_samples_leaf=1, min_samples_split=2,
        min_weight_fraction_leaf=0.0, n_estimators=80, n_jobs=1,
        oob_score=False, random_state=None, verbose=0,
        warm_start=False)

When I run the same pickle.load command in Python 2.7, I get the following error:

>>> with open('filepath', 'rb') as f:
...     model = pickle.load(f)
...

ValueError: non-string names in Numpy dtype unpickling

Looking at documentation and similar cases, setting protocol to 2 should make the pickle file compatible. What is causing this issue and how can I work around it?

  • Is that the full traceback? Commented Oct 30, 2017 at 19:35
  • unfortunately, it is. Commented Oct 30, 2017 at 19:48
  • I cannot say anything else at this point 'cuz you didn't provide any minimal reproducible example for me to diagnose. Commented Oct 31, 2017 at 2:28
  • see update, this might help you a lot. Commented Nov 3, 2017 at 7:23

1 Answer

You can use pickle._load() instead of pickle.load() to force the pure-Python implementation and get a more useful traceback.
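A minimal sketch of that idea, assuming Python 3, where `pickle._load` is a private, undocumented name in CPython's `pickle.py` and may change between releases. The in-memory payload is a hypothetical stand-in for the model file, and the truncation is only there to provoke an error so the traceback can be inspected:

```python
import io
import pickle
import traceback

# Stand-in for the model file: a small protocol-2 pickle built in memory.
payload = pickle.dumps({"n_estimators": 80}, protocol=2)

# pickle._load is a private name in CPython 3's pickle.py (assumed to stay
# available); it runs the pure-Python unpickler instead of the C accelerator.
restored = pickle._load(io.BytesIO(payload))
print(restored)

# Drop the final STOP opcode to provoke a failure, then capture the traceback.
try:
    pickle._load(io.BytesIO(payload[:-1]))
except EOFError:
    trace_text = traceback.format_exc()
    print(trace_text)
```

Note that in Python 2.7 the plain `pickle` module is already the pure-Python implementation (the C version is `cPickle`), so there the regular `pickle.load` should give the Python-level frames.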

If the faulty part is in NumPy's code, though, you are still left with using a C debugger or tracing the source code by hand.
Alternatively, compare the part that is fed to NumPy's unpickling routine against NumPy's pickle format spec and try to work out what is wrong with it.

  • pickletools.dis() does this for you! It prints a disassembly of pickle data, complete with offsets. Though you might still need the spec to find out the nature of the violation.
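As a rough illustration of the disassembly, assuming Python 3 and using a small hypothetical payload in place of the real model file, something like this prints each opcode with its byte offset:

```python
import io
import pickle
import pickletools

# Hypothetical stand-in for the model file contents.
payload = pickle.dumps({"n_estimators": 80}, protocol=2)

# pickletools.dis writes one line per opcode (offset, opcode name, argument)
# to the given text stream, followed by a summary of the highest protocol used.
listing = io.StringIO()
pickletools.dis(payload, out=listing)
print(listing.getvalue())
```

For the real file, the bytes read from `open('filepath', 'rb')` would take the place of `payload`; the offsets in the listing can then be matched against the point where unpickling fails.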

That said, the scikit-learn 0.19.1 documentation (section 3.4, "Model persistence") does warn that loading model data in another version and/or on another architecture is not supported, and suggests saving the source material instead.

1 Comment

the links to _load() and 'force using a pure python..' seem to be pointing to the wrong lines. Permalinks should be used instead
