So I have a pandas DataFrame with a column like the following:
Fruit
apple;banana
pear;apple;peach
blueberry;durian;apple;peach
banana;grape;orange
...
and so on. I want to end up with a single flat list of every fruit from every row, like this:
fruitList = ['apple','banana','pear','apple','peach','blueberry','durian','apple','peach','banana','grape','orange']
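One way to build this list without an explicit loop is to let pandas do the splitting and flattening. This is a sketch, assuming `;` is the only separator and that missing cells can simply be skipped. It uses `Series.str.split` followed by `Series.explode` (available since pandas 0.25):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Fruit": [
    "apple;banana",
    "pear;apple;peach",
    "blueberry;durian;apple;peach",
    "banana;grape;orange",
]})

# dropna skips missing cells; str.split turns each cell into a list of fruits;
# explode gives one entry per fruit; tolist converts the result to a plain list
fruitList = df["Fruit"].dropna().str.split(";").explode().tolist()
print(fruitList)
```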
How would I do this? I managed to do it for a single row like this:
fruitList = []
fruitList.extend(df['Fruit'].iloc[0].split(';'))
# fruitList = ['apple','banana']
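Generalizing the single-row version mostly means running the same `split`/`extend` step over every cell in the column rather than only `iloc[0]`. Here is a minimal sketch, assuming every cell is a non-empty string (missing values would need to be dropped first):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Fruit": [
    "apple;banana",
    "pear;apple;peach",
    "blueberry;durian;apple;peach",
    "banana;grape;orange",
]})

fruitList = []
# Same split/extend step as the single-row version, applied to every cell
for cell in df["Fruit"]:
    fruitList.extend(cell.split(";"))

# Equivalent one-liner using a nested list comprehension
fruitList_alt = [fruit for cell in df["Fruit"] for fruit in cell.split(";")]
print(fruitList)
```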
But of course, that only works for one row. How do I generalize it to the whole column? My plan is to count the fruit and take the 10 most common ones, and my end goal is to keep only the rows that contain at least one of those top 10 fruits. But to get there, how would I build fruitList in the first place?
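For the end goal, one possible approach is to explode the column so each fruit keeps its original row label, take the top `n` fruits from `value_counts()`, and then keep a row if any of its fruits is in that set. The function name `keep_rows_with_top_fruits` and its parameters are hypothetical. The code also assumes the DataFrame has a unique index (the default `RangeIndex` is fine):

```python
import pandas as pd


def keep_rows_with_top_fruits(df, n=10, col="Fruit", sep=";"):
    """Hypothetical helper: return rows of df whose `col` contains a top-n item."""
    # One entry per fruit; the index still points at the original row
    exploded = df[col].str.split(sep).explode()
    # value_counts sorts by count (descending); ties at the cutoff are not
    # resolved in any particular order you should rely on
    top = exploded.value_counts().head(n).index
    # Per original row: True if at least one of its fruits is in the top n
    mask = exploded.isin(top).groupby(level=0).any()
    return df[mask]


# Sample data from the question
df = pd.DataFrame({"Fruit": [
    "apple;banana",
    "pear;apple;peach",
    "blueberry;durian;apple;peach",
    "banana;grape;orange",
]})

print(keep_rows_with_top_fruits(df, n=10))
print(keep_rows_with_top_fruits(df, n=1))
```

The four sample rows contain only eight distinct fruits, so `n=10` keeps every row. With `n=1`, only the rows that contain `apple` (the most frequent fruit, appearing three times) are kept.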
I hope that makes sense, and I can add more data examples if needed.