pandas: groupby multiple columns, concatenating one column while adding another

Question

Suppose I have the following df:

      amount   name   role    desc
0        1.0    a      x       f
1        2.0    a      y       g
2        3.0    b      y       h
3        4.0    b      y       j
4        5.0    c      x       k
5        6.0    c      x       l
6        6.0    c      y       p

I want to group by the name and role columns, sum the amount, and concatenate the desc values with a comma (,) as the separator:

      amount   name   role    desc
0        1.0    a      x       f
1        2.0    a      y       g
2        7.0    b      y       h,j
4        11.0   c      x       k,l
6        6.0    c      y       p

What would be the correct way of approaching this?

Side question: if the df is read from a .csv that also has other, unrelated columns, how do I do this calculation and then write the result to a new .csv that keeps those other columns (the same schema as the input)?

Vaishali · Accepted Answer · 2018-09-28 17:06:02Z

This may not be an exact duplicate, but there are many related questions about groupby and agg.

df.groupby(['name', 'role'], as_index=False)\
.agg({'amount':'sum', 'desc':lambda x: ','.join(x)})


    name    role    amount  desc
0   a       x       1.0     f
1   a       y       2.0     g
2   b       y       7.0     h,j
3   c       x       11.0    k,l
4   c       y       6.0     p
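For anyone who wants to reproduce this, here is a minimal, self-contained sketch that rebuilds the frame from the question and performs the same aggregation using named aggregation (keyword `output=(column, func)` pairs, available since pandas 0.25) instead of a dict. Passing `','.join` directly is equivalent to the lambda; it assumes `desc` holds only strings, since a NaN inside a group would make `join` raise a TypeError.

```python
import pandas as pd

# Rebuild the sample frame from the question
df = pd.DataFrame({
    "amount": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0],
    "name": ["a", "a", "b", "b", "c", "c", "c"],
    "role": ["x", "y", "y", "y", "x", "x", "y"],
    "desc": ["f", "g", "h", "j", "k", "l", "p"],
})

# Named aggregation: each keyword is an output column = (input column, function)
result = (
    df.groupby(["name", "role"])
    .agg(
        amount=("amount", "sum"),
        desc=("desc", ",".join),  # bound str.join, same as lambda x: ','.join(x)
    )
    .reset_index()  # turn the group keys back into ordinary columns
)
print(result)
```

The group keys come back sorted, so the rows appear in the order (a, x), (a, y), (b, y), (c, x), (c, y), matching the table above.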

Edit: If there are other columns in the DataFrame, you can aggregate them with 'first' or 'last', or, if their values are identical within each group, include them in the grouping keys.

Option 1:

df.groupby(['name', 'role'], as_index=False).agg({'amount':'sum', 'desc':lambda x: ','.join(x), 'other1':'first', 'other2':'first'})

Option 2:

df.groupby(['name', 'role', 'other1', 'other2'], as_index=False).agg({'amount':'sum', 'desc':lambda x: ','.join(x)})
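Before choosing between the two options, it can help to check which extra columns really are constant within each group. The sketch below is only an illustration: the columns `region` and `owner` are hypothetical stand-ins for `other1`/`other2`. It uses `nunique()` to test each column, adds the constant ones to the grouping keys (Option 2) and takes `'first'` for the rest (Option 1).

```python
import pandas as pd

# Hypothetical frame: 'region' and 'owner' stand in for other1/other2
df = pd.DataFrame({
    "amount": [1.0, 2.0, 3.0, 4.0],
    "name": ["a", "a", "b", "b"],
    "role": ["x", "x", "y", "y"],
    "desc": ["f", "g", "h", "j"],
    "region": ["EU", "EU", "US", "US"],
    "owner": ["ann", "bob", "cat", "cat"],
})
keys = ["name", "role"]
others = ["region", "owner"]

# Number of distinct values of each extra column inside each group
# (nunique ignores NaN by default). Equal to 1 everywhere -> constant per group.
constant = df.groupby(keys)[others].nunique().eq(1).all()
print(constant)

# Option 2 for constant columns, Option 1 ('first') for the others
group_keys = keys + [c for c in others if constant[c]]
agg_spec = {"amount": "sum", "desc": ",".join}
agg_spec.update({c: "first" for c in others if not constant[c]})
result = df.groupby(group_keys, as_index=False).agg(agg_spec)
print(result)
```

One caveat with Option 2: by default `groupby` drops rows whose key columns contain NaN (the `dropna` parameter, added in pandas 1.1, defaults to `True`), so if `other1`/`other2` can be missing, pass `dropna=False` or use Option 1 instead.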

edited Sep 28, 2018 at 17:06

answered Sep 27, 2018 at 23:51

Vaishali


5 Comments

averageUsername123 Over a year ago

Thank you for your solution @Vaishali. Do you know how to include the rest of the columns in the df? They aren't modified or used in the groupby, but I need them when I write to .csv.

Vaishali Over a year ago

When you group the data, every column has to be aggregated somehow. E.g., name b with role y has two rows (index 2 and 3), so after aggregation, which values of the remaining columns do you want to keep in the resulting DataFrame?

averageUsername123 Over a year ago

Is it possible to just take the value from one of the rows? No aggregation, sum, or concatenation is needed; apart from amount and desc, the rows will differ little, if at all.

Vaishali Over a year ago

In that case, you can aggregate using 'first' or 'last'; they return the first or last non-null value of each column within each group.
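To make the distinction concrete: `'first'` returns the first non-null value of each column, which is not always the value from the first row of the group. A small sketch with a hypothetical `note` column, comparing `first()` with a lambda that literally takes the first row's value:

```python
import numpy as np
import pandas as pd

# One group (b, y) whose first row has a missing 'note' (hypothetical column)
df = pd.DataFrame({
    "name": ["b", "b"],
    "role": ["y", "y"],
    "note": [np.nan, "second row"],
})

g = df.groupby(["name", "role"])
first_vals = g["note"].first()                      # skips the NaN
literal_first = g["note"].agg(lambda s: s.iloc[0])  # value of the first row, NaN included
print(first_vals)
print(literal_first)
```

If the extra columns never contain missing values, both give the same result and `'first'` is the simpler choice.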

perNalin Over a year ago

I had the same question as @averageUsername123 and ended up using a dict to handle the remaining columns. Code in my answer below.

perNalin · Accepted Answer · 2020-07-04 14:11:33Z

Extending @Vaishali's answer: to handle the remaining columns without listing each one, you can build a dictionary that maps each column to an aggregation and pass it to agg().

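key_cols = ['key1', 'key2']   # the columns you group by
agg_spec = {}                 # avoid shadowing the built-in dict
for col in df.columns:
    if col in key_cols:
        continue              # grouping columns come back as-is with as_index=False
    elif col == 'column_you_wish_to_merge':
        agg_spec[col] = ' '.join   # string columns only; use ','.join for the question's output
    else:
        agg_spec[col] = 'first'    # or any other group aggregation operation

df.groupby(key_cols, as_index=False).agg(agg_spec)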
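Combining this dictionary approach with the side question in the post (read a .csv, aggregate, write a .csv with the same columns), a minimal sketch might look like the following. The columns `amount`, `name`, `role` and `desc` come from the question; the extra `region` column and the in-memory `io.StringIO` buffers standing in for real `input.csv`/`output.csv` files are assumptions for illustration.

```python
import io

import pandas as pd

# Stand-in for pd.read_csv("input.csv"); 'region' is a hypothetical extra column
csv_text = """amount,name,role,desc,region
1.0,a,x,f,EU
2.0,a,y,g,EU
3.0,b,y,h,US
4.0,b,y,j,US
"""
df = pd.read_csv(io.StringIO(csv_text))

keys = ["name", "role"]
agg_spec = {}
for col in df.columns:
    if col in keys:
        continue                    # grouping columns come back via as_index=False
    elif col == "amount":
        agg_spec[col] = "sum"
    elif col == "desc":
        agg_spec[col] = ",".join    # concatenate the descriptions
    else:
        agg_spec[col] = "first"     # any other, untouched column

out = df.groupby(keys, as_index=False).agg(agg_spec)
out = out[df.columns]  # restore the input's column order (same schema)

buf = io.StringIO()    # stand-in for out.to_csv("output.csv", index=False)
out.to_csv(buf, index=False)
print(buf.getvalue())
```

Selecting `out[df.columns]` is what keeps the output schema identical to the input; `to_csv` quotes concatenated values such as `h,j` because they contain the delimiter, so the file reads back correctly.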

answered Jul 4, 2020 at 14:11

perNalin
