Here is my code:
import pandas as pd
from nltk.corpus import wordnet  # WordNet data must be fetched once via nltk.download('wordnet')

df = pd.DataFrame({'col_1': ['desk', 'apple', 'run']})
# Look up every WordNet synset for each word, row by row
df['synset'] = df.col_1.apply(lambda x: wordnet.synsets(x))
The above code runs fairly slowly on a 4-core PC with 16 GB of RAM. I was hoping to speed it up by running it on a Google Cloud instance with 24 cores and 120 GB of RAM, but it was still slow (maybe twice as fast as before), and the Google Console showed that only 4.1 cores were being utilized.
So I am curious: does pandas run the computation for each row in parallel? If it does, then I am guessing NLTK is the bottleneck here. Can anybody confirm or correct my guesses?
P.S. The above code is just a sample; the real dataframe has 100k rows.
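For context, this is the kind of explicit parallelization I was considering; it's a minimal sketch rather than tested code. It splits col_1 into chunks with np.array_split and maps them over a multiprocessing.Pool (the helper name lookup_synsets is my own, and it assumes the resulting Synset objects pickle cleanly between processes; if they don't, having the worker return s.name() strings instead would work):

import multiprocessing as mp

import numpy as np
import pandas as pd
from nltk.corpus import wordnet

def lookup_synsets(words):
    # Runs in a worker process; each worker loads WordNet lazily on first use.
    # If Synset objects fail to pickle on the way back, return
    # [s.name() for s in wordnet.synsets(w)] per word instead.
    return words.apply(wordnet.synsets)

if __name__ == '__main__':
    df = pd.DataFrame({'col_1': ['desk', 'apple', 'run']})
    n_workers = mp.cpu_count()
    # Split the column into one chunk per worker
    chunks = np.array_split(df.col_1, n_workers)
    with mp.Pool(n_workers) as pool:
        parts = pool.map(lookup_synsets, chunks)
    # Reassemble; the chunk indexes line up with the original frame
    df['synset'] = pd.concat(parts)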