I'm trying to fit a (223129, 108) dataset with scikit-learn's linear models (Ridge(), Lasso(), LinearRegression()) and I get the following error. I'm not sure what to do; the data doesn't seem large enough to run out of memory (I have 16 GB). Any ideas?

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-34-8ea705d45c5d> in <module>()
----> 1 cv_loop(T,yn, model=reg, per_test=0.2,cv_random=False,tresh=450)

<ipython-input-1-ea163943e461> in cv_loop(X, y, model, per_test, cv_random, tresh)
     48     preds_all=np.zeros((y_cv.shape))
     49     for i in range(y_n):
---> 50         model.fit(X_train, y_train[:,i])
     51 
     52         preds = model.predict(X_cv)

C:\Users\m&g\AppData\Local\Enthought\Canopy32\User\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\linear_model\coordinate_descent.pyc in fit(self, X, y, Xy, coef_init)
    608                           "estimator", stacklevel=2)
    609         X = atleast2d_or_csc(X, dtype=np.float64, order='F',
--> 610                              copy=self.copy_X and self.fit_intercept)
    611         # From now on X can be touched inplace
    612         y = np.asarray(y, dtype=np.float64)

C:\Users\m&g\AppData\Local\Enthought\Canopy32\User\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\utils\validation.pyc in atleast2d_or_csc(X, dtype, order, copy, force_all_finite)
    122     """
    123     return _atleast2d_or_sparse(X, dtype, order, copy, sparse.csc_matrix,
--> 124                                 "tocsc", force_all_finite)
    125 
    126 

C:\Users\m&g\AppData\Local\Enthought\Canopy32\User\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\utils\validation.pyc in _atleast2d_or_sparse(X, dtype, order, copy, sparse_class, convmethod, force_all_finite)
    109     else:
    110         X = array2d(X, dtype=dtype, order=order, copy=copy,
--> 111                     force_all_finite=force_all_finite)
    112         if force_all_finite:
    113             _assert_all_finite(X)

C:\Users\m&g\AppData\Local\Enthought\Canopy32\User\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\utils\validation.pyc in array2d(X, dtype, order, copy, force_all_finite)
     89         raise TypeError('A sparse matrix was passed, but dense data '
     90                         'is required. Use X.toarray() to convert to dense.')
---> 91     X_2d = np.asarray(np.atleast_2d(X), dtype=dtype, order=order)
     92     if force_all_finite:
     93         _assert_all_finite(X_2d)

C:\Users\m&g\AppData\Local\Enthought\Canopy32\App\appdata\canopy-1.0.3.1262.win-x86\lib\site-packages\numpy\core\numeric.pyc in asarray(a, dtype, order)
    318 
    319     """
--> 320     return array(a, dtype, copy=False, order=order)
    321 
    322 def asanyarray(a, dtype=None, order=None):

MemoryError: 
  • Do you have anything else loaded in the Python session? What if you close Python and restart it and try with the same data? Commented Nov 17, 2013 at 21:33
  • Done it. Multiple times. Same error every time. Commented Nov 17, 2013 at 21:35
  • Are you using 64-bit python? Commented Nov 18, 2013 at 15:01
  • That's very strange: np.ones((223129, 108)).astype(np.float64) gives me an array with about 183 megabytes. Commented Nov 18, 2013 at 15:13
  • I'm using 32-bit Python Commented Nov 18, 2013 at 17:06

2 Answers

Your 16 GB of RAM effectively shrinks to 4 GB in a 32-bit process: with 32-bit pointers you can only address 2^32 bytes, which is 4 GB (and on 32-bit Windows a user-mode process typically gets only 2 GB of that by default). I'd suggest switching to a 64-bit Python if you want to work with large datasets.
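A quick way to confirm which interpreter you're actually running is to check the pointer size (this is just a diagnostic sketch, not part of the fix):

```python
import struct
import sys

# A 32-bit interpreter has 4-byte pointers; a 64-bit one has 8-byte pointers.
bits = struct.calcsize("P") * 8
print("Running a %d-bit Python" % bits)

# sys.maxsize tells the same story: 2**31 - 1 on 32-bit, 2**63 - 1 on 64-bit.
print("sys.maxsize =", sys.maxsize)
```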

If you can't change bitness, you'll have to be frugal with your code. Look carefully for places that allocate or copy memory (it starts to feel like C, doesn't it?), and occasionally add an explicit del when you no longer need a variable but the interpreter can't know that. Note that the traceback shows fit() converting X to a Fortran-ordered float64 copy; passing X already in that layout (np.asfortranarray(X.astype(np.float64))) and setting copy_X=False on the estimator can spare you one full-size copy.

Or, since each sample is just a ~100-dimensional vector and you have a lot of them (~200K), you could probably take only, say, 10% of the data and still have a representative sample. But that depends on the nature of your data, and further investigation is needed.
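The subsampling idea above can be sketched like this (the array shapes here are hypothetical stand-ins for your real X and y, just smaller):

```python
import numpy as np

rng = np.random.RandomState(0)

# Stand-in data: same column count as the question, fewer rows for the demo.
X = rng.randn(2000, 108)
y = rng.randn(2000)

# Keep a random 10% of the rows (shuffle the indices, take the first 10%).
n_keep = int(0.1 * X.shape[0])
idx = rng.permutation(X.shape[0])[:n_keep]
X_small, y_small = X[idx], y[idx]

print(X_small.shape)  # (200, 108)
```

Drawing the subset at random (rather than taking the first 10% of rows) matters if the data has any ordering, e.g. by time or by class.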

Try SGDRegressor instead of the estimators you tried. It also fits a linear regression model, but it is designed for large datasets and uses much less memory.
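A minimal sketch of the idea, again with hypothetical stand-in data: SGDRegressor uses a squared loss by default, so with an l2 penalty it approximates Ridge (penalty='l1' would approximate Lasso), and its partial_fit method lets you stream the data in chunks so the full array never needs to be copied inside a single fit() call:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)

# Stand-in data with the question's column count.
X = rng.randn(5000, 108)
true_w = rng.randn(108)
y = X.dot(true_w)

model = SGDRegressor(penalty="l2", random_state=0)

# Feed the data in 1000-row chunks; each call updates the model in place,
# so peak memory is bounded by the chunk size, not the full dataset.
for start in range(0, X.shape[0], 1000):
    chunk = slice(start, start + 1000)
    model.partial_fit(X[chunk], y[chunk])

print(model.coef_.shape)  # (108,)
```

A single pass like this is only an approximation; looping over the chunks for several epochs (reshuffling between passes) gets you closer to the batch solution.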
