Scikit-Learn GridSearchCV: Avoid function to copy data for each process in parallel

Question

I run sklearn.grid_search.GridSearchCV in parallel on several CPUs/cores. Calling the fit method creates a separate copy of my data for each process, and the processes then crash because they run out of memory.

Is there a way to prevent the function from copying the data for each process? Can I use shared memory for all cores?

Maybe this answer gives you some hints: stackoverflow.com/a/24411581/288875
– Andre Holzner, Commented Oct 2, 2014 at 17:00

Alvaro Ulloa · Accepted Answer · 2017-07-11 18:32:49Z

By default, Python creates a new process for each parallel task, and each new process gets its own copy of the data. To avoid this, I would recommend sharing the data between processes through the multiprocessing module's shared environment instead of passing it to every worker. You can see an example at https://github.com/alvarouc/polyssifier/blob/master/polyssifier/polyssifier.py#L87
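
To illustrate the general idea, here is a minimal sketch using only the standard library's `multiprocessing.shared_memory` module (Python 3.8+). It is not necessarily what the linked code does, and it does not plug into GridSearchCV directly. The helper names `create_shared`, `partial_sum` and `parallel_sum` are made up for this example. The data is written once into a shared block, and each worker attaches to that block by name rather than receiving its own copy. With NumPy you could wrap the same buffer with `np.ndarray(shape, dtype=..., buffer=shm.buf)`.

```python
from multiprocessing import Pool, shared_memory

ITEM_SIZE = 8  # bytes per C double, matching the "d" format used below


def create_shared(values):
    """Write `values` once into a new shared-memory block and return the block."""
    size = len(values) * ITEM_SIZE
    shm = shared_memory.SharedMemory(create=True, size=size)
    view = shm.buf[:size].cast("d")  # view the raw bytes as doubles
    try:
        for i, v in enumerate(values):
            view[i] = float(v)
    finally:
        view.release()  # views must be released before the block is closed
    return shm


def partial_sum(task):
    """Worker: attach to the block by name and sum one slice of it."""
    name, n_items, start, stop = task
    shm = shared_memory.SharedMemory(name=name)  # attaches, does not copy
    view = shm.buf[:n_items * ITEM_SIZE].cast("d")
    try:
        return sum(view[start:stop])
    finally:
        view.release()
        shm.close()  # detach this process only; the block stays alive


def parallel_sum(values, n_workers=2):
    """Sum `values` in parallel while keeping a single shared copy of the data."""
    shm = create_shared(values)
    try:
        n = len(values)
        step = -(-n // n_workers)  # ceiling division
        # Only the block name and index ranges are pickled and sent to workers.
        tasks = [(shm.name, n, i, min(i + step, n)) for i in range(0, n, step)]
        with Pool(n_workers) as pool:
            return sum(pool.map(partial_sum, tasks))
    finally:
        shm.close()
        shm.unlink()  # free the block once every worker is done


if __name__ == "__main__":
    print(parallel_sum([float(i) for i in range(1000)]))
```

The point of the design is that the task tuples sent to the workers contain only a name and a few integers, so memory use does not grow with the number of processes.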


Thank you for your answer! And thank you for sharing with the community!
– Ohumeronen, Commented over a year ago
