
My machine has 16 GB of RAM, and the training program uses up to 2.6 GB of memory. But when I try to save the classifier (trained with sklearn.svm.SVC on a large dataset) as a pickle file, it consumes more memory than my machine can provide. I'm eager to learn of any alternative approaches for saving a classifier.

I've tried:

  • pickle and cPickle
  • Dumping with the file opened in w and wb modes
  • Setting fast = True on the pickler

None of them works; they always raise a MemoryError. Occasionally the file is saved, but loading it raises ValueError: insecure string pickle.
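For reference, a minimal sketch of a binary-mode pickle dump with an explicit protocol (modern Python 3 pickle shown; a small stand-in object takes the place of the trained SVC). Opening the file in text mode, or forgetting to close it before reading, is a classic source of the "insecure string pickle" error:

```python
import pickle

model = {'weights': list(range(1000))}  # stand-in for the trained classifier

# Open in binary mode ('wb') and use a binary protocol; a text-mode file
# or a truncated/unclosed file can later fail to load.
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f, protocol=2)

# The 'with' block above closes the file before we read it back.
with open('model.pkl', 'rb') as f:
    restored = pickle.load(f)
```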

Thank you in advance!

Update

Thank you all. I hadn't tried joblib; it works after setting protocol=2.
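A minimal sketch of the joblib approach, assuming a recent scikit-learn where joblib is a standalone package (at the time of the question it was available as sklearn.externals.joblib). A small synthetic dataset stands in for the large one from the question:

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Small dataset standing in for the large one.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = SVC(gamma='scale').fit(X, y)

# joblib handles the large NumPy arrays inside the model more efficiently
# than building one giant pickle string in memory; protocol=2 selects a
# binary pickle protocol.
joblib.dump(clf, 'svc.joblib', protocol=2)

restored = joblib.load('svc.joblib')
```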

  • That's weird. It has never happened to me, even though I've written dump files of similar sizes. It's interesting that the file gets stored only "occasionally". Since you have 16 GB of RAM, can you try cPickle.dumps() and see what length of string you actually get? Also, you get the "insecure string pickle" error when you try to read from a file that has not been closed yet. Commented May 11, 2014 at 16:06
  • @hrs This time I successfully saved the file; it's around 2 GB. I don't know why the program uses more than 50% of the memory to save it. Maybe sometimes the memory is occupied by other programs, so there isn't enough left. I'll try once more and see what happens. And regarding the insecure string pickle error, I'm using with to open the file, and that program has actually exited before reading. Commented May 11, 2014 at 19:36
  • Hmm, okay. cPickle.dump copies the object into a string and then writes the file. That might be why it takes so much memory. Commented May 12, 2014 at 6:24
  • Try pickling with joblib.dump, it should be smarter about large NumPy arrays than standard pickle. Commented May 12, 2014 at 17:24
  • @larsmans I hit a nasty MemoryError while using joblib.dump(model, file, compress=3). The scikit-learn model is a random forest with ~thousands of trees. I will try with compress=2. Commented Nov 24, 2015 at 11:20

1 Answer


I would suggest using the out-of-core classifiers from scikit-learn. These are batch-learning algorithms that store the model output as a compressed sparse matrix and are very time-efficient.

To start with, the following link really helped me.

http://scikit-learn.org/stable/auto_examples/applications/plot_out_of_core_classification.html
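As an illustration of the out-of-core pattern from that example, here is a minimal sketch using SGDClassifier.partial_fit on a stream of synthetic mini-batches (the batch loop and data are illustrative assumptions, not code from the linked page):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier()          # linear SVM by default (hinge loss)
classes = np.array([0, 1])     # all labels must be declared up front
rng = np.random.RandomState(0)

# Feed the data in mini-batches so the full dataset never sits in memory;
# each batch's label depends only on the sign of the first feature.
for _ in range(20):
    X_batch = rng.randn(200, 20)
    y_batch = (X_batch[:, 0] > 0).astype(int)
    clf.partial_fit(X_batch, y_batch, classes=classes)
```

In a real pipeline, each batch would come from disk or a database cursor instead of a random generator.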
