Pandas dataframe groupby

Question

I am a beginner in pandas, so please bear with me. I know this is a very basic question.

I am working with pandas on the following DataFrame:

x      y             w  

1      2             5                 
1      2             7         
3      4             3        
5      4             8    
3      4             5    
5      9             9

I want the following output:

x   y   w   

1   2   5,7    
3   4   3,5
5   4   8    
5   9   9

Can anyone tell me how to do this using pandas groupby?

Is the w column of type string or is that an array in it?

– musically_ut, Commented May 5, 2016 at 11:06

Community · Accepted Answer · 2017-05-23 11:52:43Z

1

You can use groupby with apply and ','.join:

# If column w is not of type string, convert it first
print(type(df.at[0, 'w']))
<class 'numpy.int64'>

df['w'] = df['w'].astype(str)

print(df.groupby(['x', 'y'])['w'].apply(','.join).reset_index())
   x  y    w
0  1  2  5,7
1  3  4  3,5
2  5  4    8
3  5  9    9
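For reference, here is a self-contained sketch of the same approach, written for Python 3. The DataFrame is rebuilt from the sample in the question, and the variable name result is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'x': [1, 1, 3, 5, 3, 5],
    'y': [2, 2, 4, 4, 4, 9],
    'w': [5, 7, 3, 8, 5, 9],
})

# str.join only accepts strings, so cast the integer column first.
df['w'] = df['w'].astype(str)

# Group on (x, y), join each group's w values, then flatten the index.
result = df.groupby(['x', 'y'])['w'].apply(','.join).reset_index()
print(result)
```

By default groupby sorts the group keys, which is why the rows come out ordered by x and then y.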

If you have duplicates, use drop_duplicates:

print(df)
   x  y  w
0  1  2  5
1  1  2  5
2  1  2  5
3  1  2  7
4  3  4  3
5  5  4  8
6  3  4  5
7  5  9  9

df['w'] = df['w'].astype(str)
# Parentheses allow the chained call to span several lines
print(df.groupby(['x', 'y'])['w']
        .apply(lambda x: ','.join(x.drop_duplicates()))
        .reset_index())

   x  y    w
0  1  2  5,7
1  3  4  3,5
2  5  4    8
3  5  9    9

Or a modified version of EdChum's solution:

# Cast to string inside the lambda, so df['w'] itself is left unchanged
print(df.groupby(['x', 'y'])['w']
        .apply(lambda x: ','.join(x.astype(str).drop_duplicates()))
        .reset_index())

   x  y    w
0  1  2  5,7
1  3  4  3,5
2  5  4    8
3  5  9    9
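A possible alternative, which is not part of the original answer, is to drop the duplicate rows once before grouping instead of de-duplicating inside each group. The sketch below assumes Python 3 and rebuilds the sample with duplicates shown above; the names deduped and result are illustrative.

```python
import pandas as pd

# Sample data with repeated (x, y, w) rows, copied from the answer above.
df = pd.DataFrame({
    'x': [1, 1, 1, 1, 3, 5, 3, 5],
    'y': [2, 2, 2, 2, 4, 4, 4, 9],
    'w': [5, 5, 5, 7, 3, 8, 5, 9],
})

# Alternative (not from the answer): remove duplicate rows up front,
# then group and join as before.
deduped = df.drop_duplicates(subset=['x', 'y', 'w'])
result = (deduped.assign(w=deduped['w'].astype(str))
                 .groupby(['x', 'y'])['w']
                 .apply(','.join)
                 .reset_index())
print(result)
```

Using assign here keeps the original df['w'] column as integers, which may matter if the frame is reused later.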

edited May 23, 2017 at 11:52

CommunityBot


answered May 5, 2016 at 11:06

jezrael



2 Comments

user324 Over a year ago

Thanks! This works. But this repeats values. Like if I have the same values for w for the same (x,y), it repeats the value in the output. How can I handle that?

jezrael Over a year ago

Glad I could help. Have a nice day.

EdChum · Accepted Answer · 2016-05-05 12:06:24Z

1

You can group by columns 'x' and 'y' and apply a lambda to the 'w' column. If the values are not already strings, cast the dtype using astype:

In [220]:
df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.astype(str)))

Out[220]:
x  y
1  2    5,7
3  4    3,5
5  4      8
   9      9
Name: w, dtype: object

In [221]:
df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.astype(str))).reset_index()

Out[221]:
   x  y    w
0  1  2  5,7
1  3  4  3,5
2  5  4    8
3  5  9    9

EDIT

For your modified sample, use unique to drop the repeated values before joining:

In [237]:
df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.unique().astype(str))).reset_index()

Out[237]:
   x  y    w
0  1  2  5,7
1  3  4  3,5
2  5  4    8
3  5  9    9
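If the lambda gets hard to read, the same idea can be moved into a small named function. This is only a sketch for Python 3: join_unique and its sep parameter are hypothetical names introduced here, not pandas API.

```python
import pandas as pd

def join_unique(values, sep=','):
    """Join the distinct values of a Series as strings, in order of first appearance."""
    # Series.unique() keeps the order in which values first appear.
    return sep.join(str(v) for v in values.unique())

# Sample data with repeated values, matching the modified sample.
df = pd.DataFrame({
    'x': [1, 1, 1, 1, 3, 5, 3, 5],
    'y': [2, 2, 2, 2, 4, 4, 4, 9],
    'w': [5, 5, 5, 7, 3, 8, 5, 9],
})

result = df.groupby(['x', 'y'])['w'].apply(join_unique).reset_index()
print(result)
```

Because the function converts each value with str, the w column does not need to be cast beforehand.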

edited May 5, 2016 at 12:06

answered May 5, 2016 at 11:07

EdChum


2 Comments

user324 Over a year ago

Thanks! This works. But this repeats values. Like if I have the same values for w for the same (x,y), it repeats the value in the output. How can I handle that?

EdChum Over a year ago

The normal etiquette here is to post sample data that is representative of your problem. Iteratively adding new information after people have answered is counter-productive and annoying.
