
I am a beginner in Pandas, so please bear with me. I know this is a very basic question.

I am working with pandas on the following DataFrame:

x  y  w
1  2  5
1  2  7
3  4  3
5  4  8
3  4  5
5  9  9

And I want the following output:

x  y  w
1  2  5,7
3  4  3,5
5  4  8
5  9  9

Can anyone tell me how to do this using pandas groupby?

  • Is the w column of type string, or does it contain an array? Commented May 5, 2016 at 11:06

2 Answers


You can use groupby with apply and str.join:

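import pandas as pd

df = pd.DataFrame({'x': [1, 1, 3, 5, 3, 5],
                   'y': [2, 2, 4, 4, 4, 9],
                   'w': [5, 7, 3, 8, 5, 9]})

# str.join only accepts strings, so convert column w first if it is not str
print(type(df.at[0, 'w']))
<class 'numpy.int64'>

df['w'] = df['w'].astype(str)

print(df.groupby(['x', 'y'])['w'].apply(','.join).reset_index())
   x  y    w
0  1  2  5,7
1  3  4  3,5
2  5  4    8
3  5  9    9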
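The same reduction can also be written with agg instead of apply. The sketch below is an assumption, not part of the original answer: the helper name join_w is hypothetical, as_index=False keeps x and y as regular columns so reset_index() is not needed, and the cast happens inside the lambda so the original w column stays numeric.

```python
import pandas as pd


def join_w(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: join the w values of each (x, y) group with commas."""
    return (
        df.groupby(['x', 'y'], as_index=False)['w']
        # Cast each group's values to str, since str.join only accepts strings
        .agg(lambda s: ','.join(s.astype(str)))
    )


df = pd.DataFrame({'x': [1, 1, 3, 5, 3, 5],
                   'y': [2, 2, 4, 4, 4, 9],
                   'w': [5, 7, 3, 8, 5, 9]})
result = join_w(df)
print(result)
```

Because the cast is local to the lambda, df['w'] keeps its integer dtype and can still be used numerically afterwards.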

If the same w value appears more than once within an (x, y) group, use drop_duplicates:

print(df)
   x  y  w
0  1  2  5
1  1  2  5
2  1  2  5
3  1  2  7
4  3  4  3
5  5  4  8
6  3  4  5
7  5  9  9

df['w'] = df['w'].astype(str)
# Wrap the chain in parentheses so the .reset_index() continuation line is valid Python
print(df.groupby(['x', 'y'])['w']
        .apply(lambda x: ','.join(x.drop_duplicates()))
        .reset_index())

   x  y    w
0  1  2  5,7
1  3  4  3,5
2  5  4    8
3  5  9    9

Or, a modified version of EdChum's solution:

print(df.groupby(['x', 'y'])['w']
        .apply(lambda x: ','.join(x.astype(str).drop_duplicates()))
        .reset_index())

   x  y    w
0  1  2  5,7
1  3  4  3,5
2  5  4    8
3  5  9    9
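If you would rather not call a pandas method for the deduplication, a plain-Python variant (an alternative technique, not the one used above) is dict.fromkeys, which drops repeats while keeping first-seen order because dicts preserve insertion order in Python 3.7+. A set would also remove duplicates but would not guarantee the order. The sample frame below is rebuilt from the duplicated printout above.

```python
import pandas as pd

# Sample with repeated w values for (x, y) = (1, 2), as printed above
df = pd.DataFrame({'x': [1, 1, 1, 1, 3, 5, 3, 5],
                   'y': [2, 2, 2, 2, 4, 4, 4, 9],
                   'w': [5, 5, 5, 7, 3, 8, 5, 9]})

result = (
    df.groupby(['x', 'y'])['w']
    # dict.fromkeys keeps only the first occurrence of each value, in order
    .apply(lambda s: ','.join(dict.fromkeys(s.astype(str))))
    .reset_index()
)
print(result)
```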

2 Comments

Thanks! This works, but it repeats values: if the same (x, y) pair has the same w value more than once, that value is repeated in the output. How can I handle that?
Glad I could help. Have a nice day.

You can group by columns 'x' and 'y' and apply a lambda to the 'w' column. If w is not already a string column, cast it with astype inside the lambda:

In [220]:
df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.astype(str)))

Out[220]:
x  y
1  2    5,7
3  4    3,5
5  4      8
   9      9
Name: w, dtype: object

In [221]:
df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.astype(str))).reset_index()

Out[221]:
   x  y    w
0  1  2  5,7
1  3  4  3,5
2  5  4    8
3  5  9    9
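To see why the astype cast is needed: str.join accepts only str items, so joining the integer values directly raises TypeError. A minimal sketch, using a hypothetical two-value series standing in for one group of w:

```python
import pandas as pd

w = pd.Series([5, 7])  # hypothetical integer group, like w for (x, y) = (1, 2)

# str.join only accepts str items, so joining the raw integers raises TypeError
try:
    ','.join(w)
    raw_join_failed = False
except TypeError:
    raw_join_failed = True

# Casting with astype(str) first makes the join work
joined = ','.join(w.astype(str))
print(raw_join_failed, joined)  # True 5,7
```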

EDIT

On your modified sample (with repeated w values), use unique():

In [237]:
df.groupby(['x','y'])['w'].apply(lambda x: ','.join(x.unique().astype(str))).reset_index()

Out[237]:
   x  y    w
0  1  2  5,7
1  3  4  3,5
2  5  4    8
3  5  9    9
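Note that unique() returns values in order of appearance, not sorted. If sorted output is wanted, one option (an assumption beyond the original answer) is to sort while the values are still numeric and cast afterwards, because sorting the strings would put '10' before '9'. The sample values below are hypothetical:

```python
import pandas as pd

# Hypothetical group with repeats and a two-digit value
df = pd.DataFrame({'x': [1, 1, 1, 1],
                   'y': [2, 2, 2, 2],
                   'w': [10, 9, 10, 2]})

result = (
    df.groupby(['x', 'y'])['w']
    # Deduplicate, sort numerically, then cast each value to str and join
    .apply(lambda s: ','.join(map(str, sorted(s.unique()))))
    .reset_index()
)
print(result)
```

Sorting after astype(str) instead would give '10,2,9' for this group.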

2 Comments

Thanks! This works, but it repeats values: if the same (x, y) pair has the same w value more than once, that value is repeated in the output. How can I handle that?
The normal etiquette here is to post sample data that is representative of your problem. Iteratively adding new information after people have answered is counter-productive and annoying.
