Suppose I have the following df:

      amount   name   role    desc
0        1.0    a      x       f
1        2.0    a      y       g
2        3.0    b      y       h
3        4.0    b      y       j
4        5.0    c      x       k
5        6.0    c      x       l
6        6.0    c      y       p

I want to group by the name and role columns, sum the amount, and join the desc values with a comma:

      amount   name   role    desc
0        1.0    a      x       f
1        2.0    a      y       g
2        7.0    b      y       h,j
4        11.0   c      x       k,l
6        6.0    c      y       p

What would be the correct way of approaching this?

Side question: if the df were read from a .csv that had other, unrelated columns, how would I do this calculation and then write the result to a new .csv along with those columns (same schema as the one read)?

2 Answers

This may not be an exact duplicate, but there are many existing questions about groupby with agg.

df.groupby(['name', 'role'], as_index=False)\
.agg({'amount':'sum', 'desc':lambda x: ','.join(x)})


    name    role    amount  desc
0   a       x       1.0     f
1   a       y       2.0     g
2   b       y       7.0     h,j
3   c       x       11.0    k,l
4   c       y       6.0     p
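
For anyone who wants to run this end to end, here is a minimal, self-contained sketch that rebuilds the df from the question and applies the same groupby/agg call. The only change is passing `",".join` directly instead of wrapping it in a lambda, which should behave the same way.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "amount": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0],
    "name": ["a", "a", "b", "b", "c", "c", "c"],
    "role": ["x", "y", "y", "y", "x", "x", "y"],
    "desc": ["f", "g", "h", "j", "k", "l", "p"],
})

# Sum the amounts and comma-join the descriptions within each (name, role) group.
result = (
    df.groupby(["name", "role"], as_index=False)
      .agg({"amount": "sum", "desc": ",".join})
)
print(result)
```

Note that the result gets a fresh 0..n-1 index rather than the original row labels shown in the question's expected output.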

Edit: If there are other columns in the dataframe, you can aggregate them using 'first' or 'last', or, if their values are identical within each group, include them in the grouping.

Option 1:

df.groupby(['name', 'role'], as_index=False).agg({'amount':'sum', 'desc':lambda x: ','.join(x), 'other1':'first', 'other2':'first'})

Option 2:

df.groupby(['name', 'role', 'other1', 'other2'], as_index=False).agg({'amount':'sum', 'desc':lambda x: ','.join(x)})
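
The two options only give the same rows when the extra columns are constant within each (name, role) group. A small sketch of the difference, using a hypothetical `other1` column whose values differ between the two rows of group (b, y):

```python
import pandas as pd

df = pd.DataFrame({
    "amount": [3.0, 4.0],
    "name": ["b", "b"],
    "role": ["y", "y"],
    "desc": ["h", "j"],
    "other1": ["u", "v"],  # hypothetical extra column; values differ within the group
})

agg_spec = {"amount": "sum", "desc": ",".join}

# Option 1: one row per (name, role); other1 keeps the first value seen.
opt1 = df.groupby(["name", "role"], as_index=False).agg({**agg_spec, "other1": "first"})

# Option 2: other1 is part of the key, so differing values split the group.
opt2 = df.groupby(["name", "role", "other1"], as_index=False).agg(agg_spec)

print(opt1)
print(opt2)
```

Here Option 1 collapses the two rows into one, while Option 2 leaves them as two separate groups, so Option 2 is only a safe choice when the extra columns really are identical within each group.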

5 Comments

Thank you for your solution @Vaishali. Do you know how to include the rest of the columns that are in the df? They haven't been modified, nor are they to be used in the groupby, but I need them when I write to .csv.
When you group the data, you need to aggregate every column using some method. E.g., you have two rows corresponding to name b and role y, at index 2 and 3 respectively, so after aggregation, which values from the remaining columns would you like to keep in the resulting data frame?
Is it possible to just take the value from one of the rows? No aggregation/sum/concatenation is needed; there will be little or no difference between the rows other than in the amount and desc columns.
In that case, you can aggregate using 'first' or 'last'; they return the first or the last non-null value of each column within each group.
I had the same question as @averageUsername123 and ended up using a dict to handle the remaining columns. Code in my answer below.

Extending @Vaishali's answer: to handle the remaining columns without having to specify each one, you could build a dictionary and pass it as the argument to the agg (aggregate) method.

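keys = ['key1', 'key2']
agg_funcs = {}  # named agg_funcs to avoid shadowing the built-in dict
for col in df.columns:
    if col in keys:
        continue  # the grouping columns themselves are not aggregated
    if col == 'column_you_wish_to_merge':
        agg_funcs[col] = ' '.join
    else:
        agg_funcs[col] = 'first'  # or any other group aggregation operation

df.groupby(keys, as_index=False).agg(agg_funcs)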
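
For the side question about reading from and writing back to a .csv, the same dictionary idea can be combined with a column reorder so the output keeps the input's schema. This is only a sketch: the `id` and `region` columns are made up for illustration, and `io.StringIO` stands in for real file paths.

```python
import io
import pandas as pd

# Stand-in for a file on disk; "id" and "region" are illustrative extra columns.
csv_in = io.StringIO(
    "id,amount,name,role,desc,region\n"
    "10,3.0,b,y,h,east\n"
    "11,4.0,b,y,j,east\n"
    "12,5.0,c,x,k,west\n"
)
df = pd.read_csv(csv_in)

keys = ["name", "role"]
special = {"amount": "sum", "desc": ",".join}
# Every non-key column gets its special aggregation, or 'first' by default.
agg_funcs = {col: special.get(col, "first") for col in df.columns if col not in keys}

out = df.groupby(keys, as_index=False).agg(agg_funcs)
out = out[df.columns]  # restore the column order of the input file

csv_out = io.StringIO()  # pass a file path here instead to write to disk
out.to_csv(csv_out, index=False)
print(csv_out.getvalue())
```

Because the joined desc values contain commas, `to_csv` quotes them in the output, so the file can still be read back with `pd.read_csv`.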
