Duplicate DataFrame Rows based on Column values within each cell

Question

I have a DataFrame like the one below, and I need to duplicate rows by splitting the Fruit column on the comma delimiter. It's easier to understand once you see the DataFrames below:

ID      Fruit
10000   Apple, Orange, Pear
10001   Apple, Banana

I want to get the DataFrame below:

ID      Fruit
10000   Apple 
10000   Orange
10000   Pear
10001   Apple 
10001   Banana

Does this answer your question? Pandas add new columns based on splitting another column – Georgina Skibinski, Commented Mar 20, 2020 at 12:34
I'm not sure it's the same. I'm not splitting the column into more columns, but into more rows? – user11357465, Commented Mar 20, 2020 at 12:40
Sorry, you're right, it's a case for explode. Check my answer below. – Georgina Skibinski, Commented Mar 20, 2020 at 13:02

Georgina Skibinski · Accepted Answer · 2020-03-20 13:01:57Z

1

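Try the following (DataFrame.explode requires pandas 0.25 or later):

# split each "Apple, Orange, Pear" string into a list of fruit names
df['Fruit'] = df['Fruit'].str.split(", ")
# give each list element its own row; ID and the original index are repeated
df = df.explode('Fruit')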

Outputs:

      ID   Fruit
0  10000   Apple
0  10000  Orange
0  10000    Pear
1  10001   Apple
1  10001  Banana
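A self-contained sketch of the same explode idea, assuming the sample data from the question. Two small changes are assumptions of this sketch rather than part of the answer: the split pattern `r'\s*,\s*'` also tolerates irregular spacing around the commas, and `ignore_index=True` (available on `explode` since pandas 1.1) renumbers the result 0..n-1 instead of repeating 0, 0, 0, 1, 1 as in the output above.

```python
import pandas as pd

df = pd.DataFrame({'ID': [10000, 10001],
                   'Fruit': ['Apple, Orange, Pear', 'Apple, Banana']})

# A multi-character pattern is treated as a regex, so "Apple ,Orange" also splits cleanly
df['Fruit'] = df['Fruit'].str.split(r'\s*,\s*')

# One output row per list element; ignore_index gives the result a fresh 0..n-1 index
out = df.explode('Fruit', ignore_index=True)
print(out)
```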

answered Mar 20, 2020 at 13:01

Georgina Skibinski

Jaroslav Bezděk · Accepted Answer · 2020-03-20 12:46:49Z

0

If df looks like this:

>>> df = pd.DataFrame({'ID': [10000, 10001], 'Fruit': ['Apple, Orange, Pear', 'Apple, Banana']})
>>> print(df)
      ID                Fruit
0  10000  Apple, Orange, Pear
1  10001        Apple, Banana

you can use the pandas.DataFrame.apply() method to build a new column in which each cell holds a list of dictionaries, one dictionary per new row. You can then concatenate these lists and build a new DataFrame from them:

>>> df['new'] = df.apply(lambda row: [{'ID': row.ID, 'Fruit': item} for item in row.Fruit.split(', ')], axis=1)
>>> df_new = pd.DataFrame(df.new.sum())
>>> print(df_new)
      ID   Fruit
0  10000   Apple
1  10000  Orange
2  10000    Pear
3  10001   Apple
4  10001  Banana
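A variant of the same apply-based approach, as a sketch: instead of storing the lists in a `new` column and joining them with `.sum()` (which concatenates the lists one pair at a time, so the cost grows quadratically with the number of rows), the per-row lists can be flattened in one pass with `itertools.chain.from_iterable`. The helper name `rows_for` is hypothetical, and this sketch leaves the original `df` unmodified.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({'ID': [10000, 10001],
                   'Fruit': ['Apple, Orange, Pear', 'Apple, Banana']})

def rows_for(row):
    """Hypothetical helper: one dict per comma-separated item in row.Fruit."""
    return [{'ID': row.ID, 'Fruit': item} for item in row.Fruit.split(', ')]

# apply(axis=1) returns a Series holding one list of dicts per input row
per_row = df.apply(rows_for, axis=1)

# Flatten all the lists in a single pass instead of repeated list concatenation
df_new = pd.DataFrame(list(chain.from_iterable(per_row)))
print(df_new)
```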

edited Mar 20, 2020 at 12:46

answered Mar 20, 2020 at 12:41

Jaroslav Bezděk

2 Comments

user11357465 Over a year ago

Thanks, that's really useful.

Georgina Skibinski Over a year ago

That's a bad approach. You should avoid using .apply(...) unless you really have to: stackoverflow.com/a/54432584/11610186
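For comparison, a sketch of one way to get the same result without `.apply(...)`: a plain Python comprehension over the two columns, which avoids building a Series object for every row. This is a swapped-in technique and not necessarily what the linked answer recommends; it assumes every Fruit cell is a string separated by ", ".

```python
import pandas as pd

df = pd.DataFrame({'ID': [10000, 10001],
                   'Fruit': ['Apple, Orange, Pear', 'Apple, Banana']})

# Pair each ID with each of its comma-separated fruits; no per-row Series is created
pairs = [(id_, fruit)
         for id_, fruits in zip(df['ID'], df['Fruit'])
         for fruit in fruits.split(', ')]

df_new = pd.DataFrame(pairs, columns=['ID', 'Fruit'])
print(df_new)
```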
