Merging Pandas DataFrame rows

Question

I have a Pandas DataFrame which looks like this:

Time Image_names
0    [a,b,c,d]
0    [a,c,d,e]
0    [c,d,e,f]
1    [e,f,g,h]
1    [f,g,h,i]

What I wish to obtain: All unique image names for a given Time

Time Image_names
0    [a,b,c,d,e,f]
1    [e,f,g,h,i]

I'm not sure if I have to use groupby or joins.

From @jpp, all you need is: df.groupby('Time')['Image_names'].apply(lambda x: set(chain.from_iterable(x)))
– YOLO, Commented Feb 26, 2018 at 17:44

BENY · Accepted Answer · 2018-02-26 17:31:43Z

1

You can use set:

s=df.groupby('Time',as_index=False).Image_names.sum()
s.Image_names=list(map(set,s.Image_names))
s
Out[2034]: 
   Time         Image_names
0     0  {b, c, d, a, f, e}
1     1     {g, h, f, i, e}

answered Feb 26, 2018 at 17:31


2 Comments

Jagannath Saragadam Over a year ago

This works great! But once I write this data to a CSV using df.to_csv("resultsDf.csv"), the Image_names column appears as set([b,c,d,a,f,e])

BENY Over a year ago

@JagannathSaragadam Add s.Image_names=s.Image_names.apply(list) and then call to_csv :-)
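
A minimal sketch of the conversion suggested in the comment above, assuming `s` already holds one set per Time value. The frame is built by hand here, `sorted` is used instead of `list` so the order is reproducible, and an in-memory buffer stands in for the CSV file; those three choices are assumptions for illustration, not part of the original answer.

```python
import io

import pandas as pd

# Hand-built stand-in for the grouped result `s` (one set per Time value).
s = pd.DataFrame({'Time': [0, 1],
                  'Image_names': [{'a', 'b', 'c', 'd', 'e', 'f'},
                                  {'e', 'f', 'g', 'h', 'i'}]})

# Sets have no defined order, so sort while converting to lists.
s['Image_names'] = s['Image_names'].apply(sorted)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
s.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

The column is still written as the string form of a Python list, so reading it back requires parsing that string.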

jpp · Accepted Answer · 2018-02-26 18:37:20Z

One way is to use itertools.chain:

from itertools import chain
import pandas as pd


df = pd.DataFrame({'Time': [0, 0, 0, 1, 1],
                   'Image_names': [['a', 'b', 'c', 'd'],
                                   ['a', 'c', 'd', 'e'],
                                   ['c', 'd', 'e', 'f'],
                                   ['e', 'f', 'g', 'h'],
                                   ['f', 'g', 'h', 'i']]})

df = df.groupby('Time')['Image_names'].apply(chain.from_iterable).map(set).reset_index()

#    Time         Image_names
# 0     0  {c, a, f, d, e, b}
# 1     1     {g, h, f, e, i}

Explanation

Applying chain.from_iterable chains the lists in each group into a single iterable per group.
Mapping set then consumes that iterable and creates a set for each group.
reset_index ensures the result is a DataFrame with column headers as required.
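
The first two steps can be sketched on plain Python lists, outside pandas. The variable names below are illustrative only; `group` stands for the lists that belong to a single Time value.

```python
from itertools import chain

# The three lists that share Time == 0 in the example data.
group = [['a', 'b', 'c', 'd'],
         ['a', 'c', 'd', 'e'],
         ['c', 'd', 'e', 'f']]

# chain.from_iterable yields every element of every inner list, lazily.
flattened = chain.from_iterable(group)

# set() consumes the iterator and drops the duplicates.
unique_names = set(flattened)

# Sets are unordered; sort when a stable order is needed.
ordered = sorted(unique_names)
print(ordered)  # ['a', 'b', 'c', 'd', 'e', 'f']
```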

Mojgan Mazouchi · Accepted Answer · 2018-02-26 18:26:26Z

0

You can use the following:

import pandas as pd
import numpy as np

a=pd.DataFrame([[0,['a','b','c','d']],[0,['a','c','d','e']],
                [0,['c','d','e','f']],[1,['e','f','g','h']],
                [1,['f','g','h','i']]],
                columns=['Time','Image_names'])
a.groupby('Time')['Image_names'].sum().apply(np.unique)

#Out[242]: 
#Time
#0    [a, b, c, d, e, f]
#1       [e, f, g, h, i]
#Name: Image_names, dtype: object
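
Note that np.unique returns a sorted NumPy array for each group. As an alternative that is not part of the answer above, the sketch below uses DataFrame.explode (available in pandas 0.25 and later) to get one row per image name and then collects the unique names per group in order of first appearance. The variable name `unique_per_time` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Time': [0, 0, 0, 1, 1],
                   'Image_names': [['a', 'b', 'c', 'd'],
                                   ['a', 'c', 'd', 'e'],
                                   ['c', 'd', 'e', 'f'],
                                   ['e', 'f', 'g', 'h'],
                                   ['f', 'g', 'h', 'i']]})

unique_per_time = (
    df.explode('Image_names')      # one row per (Time, image name) pair
      .groupby('Time')['Image_names']
      .unique()                    # unique names per group, first-seen order
      .apply(list)                 # NumPy arrays -> plain lists
      .reset_index()
)
print(unique_per_time)
```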

edited Feb 26, 2018 at 18:26

answered Feb 26, 2018 at 18:16
