Pandas - Split every element of column in dataframe and add to list

Question

So I have a dataframe that has a column like the following:

Fruit
apple;banana
pear;apple;peach
blueberry;durian;apple;peach
banana;grape;orange
.

and so on. I want to create an end list where I can get the following list:

fruitList = ['apple','banana','pear','apple','peach','blueberry','durian','apple','peach','banana','grape','orange']

How would I do this? I managed to do this for a single row like the following:

 fruitList.extend(df['Fruit'].iloc[0].split(';'))
 #fruitList = ['apple','banana']

But of course, that only works for one row... how do I generalize this? My plan is just to count the fruit and get the top 10 fruit counts. My end goal is just to keep those rows that include a top 10 fruit... but to get there, how would I come up with fruitList in the first place?

iloc[0] refers to the first row. Using a for loop, you can generalize this. Can you add more data?
– seralouk, Commented Nov 11, 2017 at 23:25
@sera I guess I could do this with a loop over every single dataframe row, but with a very large dataframe wouldn't this be slow? I was just wondering if there was a built-in way to do this in pandas, if that makes sense. And yes, I can add more data examples.
– ocean800, Commented Nov 11, 2017 at 23:27
@sera In Python we avoid loops as much as possible. Always look for a vectorized way of doing things. Search Stack Overflow for problems like yours, or post a question about it.
– srodriguex, Commented Nov 11, 2017 at 23:38
I see I was lazy and didn't read the entire question. Good work @sera.
– srodriguex, Commented Nov 11, 2017 at 23:50

srodriguex · Accepted Answer · 2017-11-11 23:29:42Z

2

df.Fruit.str.split(';').sum()

See full code in Microsoft Azure Notebook.
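
For a reproducible version of the one-liner above, here is a minimal sketch that rebuilds the question's sample column and flattens it. `Series.sum()` works here because it adds the per-row lists together with `+`, which concatenates them. The `explode` variant is an alternative added for comparison, not part of the original answer; it assumes pandas 0.25 or later and avoids repeatedly copying a growing list, which may matter on large dataframes.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Fruit": [
        "apple;banana",
        "pear;apple;peach",
        "blueberry;durian;apple;peach",
        "banana;grape;orange",
    ]
})

# Each row becomes a list of fruit names
split_rows = df["Fruit"].str.split(";")

# The answer's approach: summing lists concatenates them
fruit_list = split_rows.sum()

# Alternative (assumes pandas >= 0.25): one fruit per row, then back to a list
fruit_list_explode = split_rows.explode().tolist()

print(fruit_list)
```

Both variants keep duplicates and row order, so either result can be passed straight to a counter.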

answered Nov 11, 2017 at 23:29


1 Comment

ocean800 Over a year ago

Didn't realize I could use sum() on lists like this, thanks :)

seralouk · Accepted Answer · 2017-11-11 23:47:48Z

1

In addition to srodriguex's answer:

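from collections import Counter

# Flatten the split lists, then count each fruit
# (named all_fruits to avoid shadowing the built-in all())
all_fruits = df.Fruit.str.split(';').sum()
c = Counter(all_fruits)
c.most_common(3)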

Now if you want to get the rows:

df[df['Fruit'].str.contains("peach")]

and to get the indices:

list(df[df['Fruit'].str.contains("peach")].index)

Results

[('apple', 3), ('peach', 2), ('pear', 1)]


                         Fruit
1              pear;apple;peach
2  blueberry;durian;apple;peach


[1, 2]
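
To connect this back to the question's end goal (keeping only the rows that mention one of the most common fruits), one possible sketch follows. It is an extension, not part of the answer above: the cutoff `top_n` and the names `top_fruits`, `mask`, and `top_rows` are illustrative. It tests set membership on the split tokens rather than using `str.contains`, because `str.contains("apple")` matches substrings and would also match a value such as `pineapple`.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Fruit": [
        "apple;banana",
        "pear;apple;peach",
        "blueberry;durian;apple;peach",
        "banana;grape;orange",
    ]
})

# Each row becomes a list of fruit names
split_rows = df["Fruit"].str.split(";")

# Count every fruit across all rows without building intermediate lists
counts = Counter(chain.from_iterable(split_rows))

top_n = 1  # hypothetical cutoff for this tiny sample; the question wants 10
top_fruits = {fruit for fruit, _ in counts.most_common(top_n)}

# True for rows whose fruit list shares at least one item with top_fruits
mask = split_rows.apply(lambda fruits: not top_fruits.isdisjoint(fruits))
top_rows = df[mask]

print(top_rows)
```

On real data, setting `top_n = 10` should give the filter described in the question. Note that `most_common` breaks ties by first-seen order, so the fruit at the cutoff boundary may be somewhat arbitrary.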

edited Nov 11, 2017 at 23:47

answered Nov 11, 2017 at 23:39


2 Comments

seralouk Over a year ago

@ocean800 I just modified my answer. See how you can get the rows.

seralouk Over a year ago

@ocean800 Glad that I helped. See also my last modification: you can get the indices of the rows.
