
My machine has 16 GB of RAM, and the training program uses up to 2.6 GB of memory. But when I try to save the classifier (trained with sklearn.svm.SVC on a large dataset) as a pickle file, it consumes more memory than my machine can provide. I'm eager to learn of any alternative approaches for saving a classifier.

I've tried:

  • pickle and cPickle
  • Dumping with the file opened in w and wb modes
  • Setting fast = True on the pickler

None of them works; they always raise a MemoryError. Occasionally the file is saved, but loading it raises ValueError: insecure string pickle.
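For reference, a minimal sketch of a binary-mode pickle dump with an explicit protocol (modern Python 3 pickle shown; a small stand-in object takes the place of the trained SVC). Opening the file in text mode, or forgetting to close it before reading, is a classic source of the "insecure string pickle" error:

```python
import pickle

model = {'weights': list(range(1000))}  # stand-in for the trained classifier

# Open in binary mode ('wb') and use a binary protocol; a text-mode file
# or a truncated/unclosed file can later fail to load.
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f, protocol=2)

# The 'with' block above closes the file before we read it back.
with open('model.pkl', 'rb') as f:
    restored = pickle.load(f)
```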

Thank you in advance!

Update

Thank you all. I hadn't tried joblib; it works after setting protocol=2.
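A minimal sketch of the joblib approach, assuming a recent scikit-learn where joblib is a standalone package (at the time of the question it was available as sklearn.externals.joblib). A small synthetic dataset stands in for the large one from the question:

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Small dataset standing in for the large one.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = SVC(gamma='scale').fit(X, y)

# joblib handles the large NumPy arrays inside the model more efficiently
# than building one giant pickle string in memory; protocol=2 selects a
# binary pickle protocol.
joblib.dump(clf, 'svc.joblib', protocol=2)

restored = joblib.load('svc.joblib')
```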

  • That's weird. It has never happened to me, even though I've written dump files of similar sizes. It's interesting that the file gets stored only "occasionally". Since you have 16 GB of RAM, can you try cPickle.dumps() and see what length of string you actually get? Also, you get the "insecure string pickle" error when you try to read from a file that has not been closed yet. Commented May 11, 2014 at 16:06
  • @hrs This time I successfully saved the file; it's around 2 GB. I don't know why the program uses more than 50% of the memory to save it. Maybe sometimes the memory is occupied by other programs, so there isn't enough left. I'll try once more and see what happens. And regarding the insecure string pickle error, I'm using with to open the file, and that program has actually exited before reading. Commented May 11, 2014 at 19:36
  • Hmm, okay. cPickle.dump copies the object into a string and then writes the file. That might be why it takes so much memory. Commented May 12, 2014 at 6:24
  • Try pickling with joblib.dump, it should be smarter about large NumPy arrays than standard pickle. Commented May 12, 2014 at 17:24
  • @larsmans I hit a nasty MemoryError while using joblib.dump(model, file, compress=3). The scikit-learn model is a random forest with ~thousands of trees. I will try with compress=2. Commented Nov 24, 2015 at 11:20

1 Answer


I would suggest using the out-of-core classifiers from scikit-learn. These are batch-learning algorithms that store the model output as a compressed sparse matrix and are very time-efficient.

To start with, the following link really helped me.

http://scikit-learn.org/stable/auto_examples/applications/plot_out_of_core_classification.html
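As an illustration of the out-of-core pattern from that example, here is a minimal sketch using SGDClassifier.partial_fit on a stream of synthetic mini-batches (the batch loop and data are illustrative assumptions, not code from the linked page):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier()          # linear SVM by default (hinge loss)
classes = np.array([0, 1])     # all labels must be declared up front
rng = np.random.RandomState(0)

# Feed the data in mini-batches so the full dataset never sits in memory;
# each batch's label depends only on the sign of the first feature.
for _ in range(20):
    X_batch = rng.randn(200, 20)
    y_batch = (X_batch[:, 0] > 0).astype(int)
    clf.partial_fit(X_batch, y_batch, classes=classes)
```

In a real pipeline, each batch would come from disk or a database cursor instead of a random generator.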
