I am used to the map and starmap Pool methods to distribute a FUNCTION over any kind of iterable. Here is how I typically extract stem words from the raw-content column of a pandas DataFrame:

import multiprocessing as mp

cpu_nb = mp.cpu_count()
pool = mp.Pool(cpu_nb)
totalvocab_stemmed = pool.map(tokenize_and_stem, site_df["raw_content"])
pool.close()

(There is a good article on function parallelization in Python that covers this pattern.)

So far so good. But is there a nice and easy way to parallelize the execution of sklearn METHODS? Here is an example of what I would like to distribute:

from sklearn.feature_extraction.text import TfidfVectorizer

tfidf_vectorizer = TfidfVectorizer(max_df=0.6, max_features=200000,
                                   min_df=0.2, stop_words=stop_words,
                                   use_idf=True, tokenizer=tokenize_and_stem,
                                   ngram_range=(1, 3))

tfidf_matrix = tfidf_vectorizer.fit_transform(site_df["raw_content"])

tfidf_matrix is not an element-by-element list, so splitting site_df["raw_content"] into as many chunks as I have CPU cores, running a good old-fashioned pool, and stacking everything back together afterwards is not an option. I saw some interesting options:

  • the IPython.parallel Client
  • use the parallel_backend function of sklearn.externals.joblib as a context manager (sketched below)

I might be dumb, but I wasn't very successful with either attempt. How would you do this?
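
For reference, here is roughly what the second attempt (the joblib context manager) looked like. This is only a sketch: it reuses tfidf_vectorizer, site_df and cpu_nb from above, and the "multiprocessing" backend name is just one possible choice. In recent sklearn versions parallel_backend is imported from plain joblib rather than sklearn.externals.joblib.

from joblib import parallel_backend  # was sklearn.externals.joblib in older versions

# route any joblib-based parallelism inside the block through the chosen backend
with parallel_backend("multiprocessing", n_jobs=cpu_nb):
    tfidf_matrix = tfidf_vectorizer.fit_transform(site_df["raw_content"])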

  • See stackoverflow.com/questions/28396957/… — you can just parallelize the transform step afterwards, but the fitting needs to happen in a single process, I think. (Commented Feb 17, 2019 at 16:48)
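
To illustrate the suggestion in that comment, a minimal sketch of fitting once in a single process and then parallelizing only the transform step could look like the following. It assumes the fitted vectorizer (including the module-level tokenize_and_stem tokenizer) is picklable so it can be shipped to the worker processes, and it reuses site_df and cpu_nb from the question.

import multiprocessing as mp
import numpy as np
import scipy.sparse as sp

# fit the vocabulary and idf weights once, in a single process
tfidf_vectorizer.fit(site_df["raw_content"])

# split the column into one chunk per core and transform the chunks in parallel
chunks = np.array_split(site_df["raw_content"], cpu_nb)
with mp.Pool(cpu_nb) as pool:
    parts = pool.map(tfidf_vectorizer.transform, chunks)

# stack the per-chunk sparse matrices back into a single matrix
tfidf_matrix = sp.vstack(parts)

Because every chunk is transformed with the same fitted vocabulary and the results are stacked in order, this should give the same matrix as a single-process fit_transform on the full column.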
