How can I "merge" rows by same value in a column in Pandas with aggregation functions?

Question

I would like to group the rows of a dataframe by one column and get back a dataframe in which I can choose, per column, which aggregation function makes sense. The default should be just the value of the first entry in the group.

(it would be nice if the solution also worked for a combination of two columns)

Example

#!/usr/bin/env python

"""Test data frame grouping."""

# 3rd party modules
import pandas as pd


df = pd.DataFrame([{'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
                   {'id': 1, 'price':   7, 'name': 'anna', 'amount': 2},
                   {'id': 2, 'price':  42, 'name': 'bob', 'amount': 30},
                   {'id': 3, 'price':   1, 'name': 'charlie', 'amount': 10},
                   {'id': 3, 'price':   2, 'name': 'david', 'amount': 100}])
print(df)

gives the dataframe:

   amount  id     name  price
0       1   1     anna    123
1       2   1     anna      7
2      30   2      bob     42
3      10   3  charlie      1
4     100   3    david      2

And I would like to get:

amount  id     name  price
     3   1     anna    130
    30   2      bob     42
   110   3  charlie      3

So:

Entries with the same value in the id column belong together. After that operation, there should still be an id column, but it should have only unique values.
All values in amount and price which have the same id get summed up.
For name, just the first one (by the current order of the dataframe) is taken.

Is this possible with Pandas?

What's wrong with df_new = df.groupby(df['id']).aggregate({'price': 'sum', 'name': 'first', 'amount': 'sum'})? Does that not work for your use case?
– cs95, Commented Oct 19, 2017 at 9:31
Hahaha, OK, I didn't try it. I just thought this is what such a function should look like. Nice that it accidentally works. I'll edit my question and make that an answer.
– Martin Thoma, Commented Oct 19, 2017 at 10:17

Martin Thoma · Accepted Answer · 2017-10-19 10:19:36Z

74

You are looking for

aggregation_functions = {'price': 'sum', 'amount': 'sum', 'name': 'first'}
df_new = df.groupby(df['id']).aggregate(aggregation_functions)

which gives

    price     name  amount
id                        
1     130     anna       3
2      42      bob      30
3       3  charlie     110
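The question also asks for "first" to be the default for every column that has no explicit function. A minimal sketch of one way to do that, assuming a small helper (merge_rows is a hypothetical name, not part of pandas) that fills in 'first' for every non-key column not listed in the overrides:

```python
import pandas as pd

df = pd.DataFrame([{'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
                   {'id': 1, 'price': 7, 'name': 'anna', 'amount': 2},
                   {'id': 2, 'price': 42, 'name': 'bob', 'amount': 30},
                   {'id': 3, 'price': 1, 'name': 'charlie', 'amount': 10},
                   {'id': 3, 'price': 2, 'name': 'david', 'amount': 100}])


def merge_rows(frame, key, overrides):
    """Hypothetical helper: group by `key`, defaulting every other column to 'first'."""
    keys = [key] if isinstance(key, str) else list(key)
    # Every non-key column gets 'first' unless `overrides` names another function.
    aggregation = {col: overrides.get(col, 'first')
                   for col in frame.columns if col not in keys}
    return frame.groupby(keys).aggregate(aggregation)


df_new = merge_rows(df, 'id', {'price': 'sum', 'amount': 'sum'})
print(df_new)
```

Only price and amount are named here; name falls back to 'first', so the result should match the table above.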



4 Comments

Daniel Goldfarb Over a year ago

Is there a published list of available aggregate functions that can be applied to a column? For example, how did you know that 'first' was a valid function? I've been googling for such a list. I have found a lot of articles and tutorials that mention many of the valid functions, but no complete listing.

Martin Thoma Over a year ago

I didn't know that first was in there. I just guessed it :-) To me, pandas is super intuitive

Fanglin Over a year ago

@DanielGoldfarb check out this cmdlinetips.com/2019/10/…

Daniel Goldfarb Over a year ago

The full list of available aggregation functions is documented here: pandas.pydata.org/docs/reference/groupby.html
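Besides the string names from that list, aggregate also accepts a callable per column. A small sketch, assuming the df from the question; the lambda that joins names is only an illustration and not something the answer itself uses:

```python
import pandas as pd

df = pd.DataFrame([{'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
                   {'id': 1, 'price': 7, 'name': 'anna', 'amount': 2},
                   {'id': 2, 'price': 42, 'name': 'bob', 'amount': 30},
                   {'id': 3, 'price': 1, 'name': 'charlie', 'amount': 10},
                   {'id': 3, 'price': 2, 'name': 'david', 'amount': 100}])

aggregation_functions = {
    'price': 'max',      # built-in aggregation referenced by name
    'amount': 'mean',    # another built-in name
    'name': lambda names: ', '.join(names),  # custom callable, applied per group
}
df_custom = df.groupby('id').aggregate(aggregation_functions)
print(df_custom)
```

The callable receives the values of one column for one group at a time, so anything that reduces a Series to a single value should work here.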

jezrael · Accepted Answer · 2017-10-19 10:30:38Z

25

To keep the same column order as the original dataframe, add reindex, because aggregating by a dict does not preserve the original column order:

d = {'price': 'sum', 'name': 'first', 'amount': 'sum'}
df_new = df.groupby('id', as_index=False).aggregate(d).reindex(columns=df.columns)
print(df_new)

   amount  id     name  price
0       3   1     anna    130
1      30   2      bob     42
2     110   3  charlie      3
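The question also mentions grouping by a combination of two columns. A short sketch of that case, assuming the same df and using id and name together as the key (so name no longer needs an aggregation function):

```python
import pandas as pd

df = pd.DataFrame([{'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
                   {'id': 1, 'price': 7, 'name': 'anna', 'amount': 2},
                   {'id': 2, 'price': 42, 'name': 'bob', 'amount': 30},
                   {'id': 3, 'price': 1, 'name': 'charlie', 'amount': 10},
                   {'id': 3, 'price': 2, 'name': 'david', 'amount': 100}])

# Passing a list of column names groups by each distinct (id, name) pair.
d = {'price': 'sum', 'amount': 'sum'}
df_two = df.groupby(['id', 'name'], as_index=False).aggregate(d)
print(df_two)
```

With this key, charlie and david are no longer merged, because they share an id but not a name; only the two anna rows collapse into one.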



2 Comments

Martin Thoma Over a year ago

I don't get what as_index=False does. Could you show me the difference? (+1 for reindex)

jezrael Over a year ago

It keeps id as a regular column instead of turning it into the index, which is what happens in your answer.
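To make the difference concrete, here is a small side-by-side sketch, assuming the df and the dict d from the answer above:

```python
import pandas as pd

df = pd.DataFrame([{'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
                   {'id': 1, 'price': 7, 'name': 'anna', 'amount': 2},
                   {'id': 2, 'price': 42, 'name': 'bob', 'amount': 30},
                   {'id': 3, 'price': 1, 'name': 'charlie', 'amount': 10},
                   {'id': 3, 'price': 2, 'name': 'david', 'amount': 100}])
d = {'price': 'sum', 'name': 'first', 'amount': 'sum'}

# Default (as_index=True): the group key becomes the index of the result.
with_index = df.groupby('id').aggregate(d)
# as_index=False: the group key stays a normal column, with a fresh 0..n-1 index.
without_index = df.groupby('id', as_index=False).aggregate(d)

print(with_index.index.name)            # name of the index in the default case
print('id' in with_index.columns)       # whether id is a column by default
print('id' in without_index.columns)    # whether id is a column with as_index=False
```

Calling reset_index() on the first result should give an equivalent frame, so as_index=False is mostly a convenience.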
