calculate sum of rows in pandas dataframe grouped by date

Question

I have a CSV file that I loaded into a pandas DataFrame.

I then select only the rows with duplicate dates in the DF:

df_dups = df[df.duplicated(['Date'])].copy()

I'm trying to get the sum of all the rows with the exact same date for 4 columns (all float values), like this:

df_sum = df_dups.groupby('Date')[["Received Quantity", "Sent Quantity", "Fee Amount", "Market Value"]].sum()

However, this does not give the desired result. When I examine the .groups attribute of the groupby object, I notice that the first row for each date is not included in the indices. So for two rows with the same date, the groups object contains only one index.

from pprint import pprint
pprint(df_dups.groupby('Date')[["Received Quantity", "Sent Quantity", "Fee Amount", "Market Value"]].groups)
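To make the symptom concrete, here is a small reproduction on a hypothetical DataFrame (the real CSV is not shown, so the dates and values below are invented for illustration). With the default keep='first', the first row of each repeated date never reaches df_dups, so each group ends up holding a single index:

```python
import pandas as pd

# Hypothetical sample data; column names follow the question, values are made up.
df = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-03"],
    "Received Quantity": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Sent Quantity": [0.5, 0.5, 0.5, 0.5, 0.5],
    "Fee Amount": [0.1, 0.2, 0.3, 0.4, 0.5],
    "Market Value": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Default keep='first': the first row for each date is NOT flagged as a duplicate.
df_dups = df[df.duplicated(["Date"])].copy()
print(df_dups.index.tolist())  # [1, 4]

# Each date's group therefore contains only one row label.
groups = df_dups.groupby("Date").groups
print({k: list(v) for k, v in groups.items()})
# {'2021-01-01': [1], '2021-01-03': [4]}
```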

I have no idea how to get the sum of all duplicates.

I've also tried:

df_sum = df_dups.groupby('Date')[["Received Quantity", "Sent Quantity", "Fee Amount", "Market Value"]].apply(lambda x: x.sum())

This gives the same result, which I guess makes sense, since the indices in the groupby object are incomplete. What am I missing here?
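A quick check on hypothetical data (values invented, column names from the question) suggests why swapping .sum() for apply(lambda x: x.sum()) changes nothing: both compute per-group column totals over whatever rows df_dups already contains, so the missing rows have to be dropped before the groupby:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-03"],
    "Received Quantity": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Sent Quantity": [0.5, 0.5, 0.5, 0.5, 0.5],
    "Fee Amount": [0.1, 0.2, 0.3, 0.4, 0.5],
    "Market Value": [10.0, 20.0, 30.0, 40.0, 50.0],
})
cols = ["Received Quantity", "Sent Quantity", "Fee Amount", "Market Value"]

# Same (incomplete) selection as in the question.
df_dups = df[df.duplicated(["Date"])].copy()

by_sum = df_dups.groupby("Date")[cols].sum()
by_apply = df_dups.groupby("Date")[cols].apply(lambda x: x.sum())

print(by_sum.equals(by_apply))              # True
print(by_sum["Received Quantity"].tolist())  # [2.0, 5.0]: only one row per date summed
```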

Please update your post to include a sample of your DataFrame using df.head(10).to_dict() and your expected output. – not_speshal, Commented Dec 13, 2021 at 17:01
The groupby sum looks correct. Why do you think it is producing wrong results? Have you forward-filled NaNs? – ListenSoftware Louise Ai Agent, Commented Dec 13, 2021 at 17:03
df[df.duplicated(['Date'])] by default excludes the first row for each date. You want df[df.duplicated('Date', keep=False)]. – Quang Hoang, Commented Dec 13, 2021 at 17:03

Grinjero · Accepted Answer · 2021-12-13 17:05:05Z

1

Check the documentation for the duplicated method. By default (keep='first'), every duplicate is marked True except for the first occurrence, which is why the first row for each date is not included in your sums.
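The three keep options of duplicated can be compared on a small hypothetical Date column (the dates are invented for illustration):

```python
import pandas as pd

# Hypothetical dates: two repeated dates and one unique date.
dates = pd.Series(["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-03"])

# keep='first' (default): every occurrence except the first is flagged.
print(dates.duplicated().tolist())            # [False, True, False, False, True]

# keep='last': every occurrence except the last is flagged.
print(dates.duplicated(keep="last").tolist())  # [True, False, False, True, False]

# keep=False: every occurrence of a repeated value is flagged.
print(dates.duplicated(keep=False).tolist())   # [True, True, False, True, True]
```

Only keep=False flags every member of a repeated date, which is what a per-date sum over all duplicates needs.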

You only need to pass keep=False to duplicated to get the behaviour you want:

df_dups = df[df.duplicated(['Date'], keep=False)].copy()

After that, the sum can be calculated correctly with the expression you wrote:

df_sum = df_dups.groupby('Date')[["Received Quantity", "Sent Quantity", "Fee Amount", "Market Value"]].apply(lambda x: x.sum())
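Putting it together, here is a minimal end-to-end sketch on a hypothetical sample frame (column names from the question, values invented). It uses .sum() directly, which gives the same per-group totals as the apply(lambda x: x.sum()) form. The second computation, which keeps only dates whose group has more than one row via groupby(...).filter, is an alternative not taken from the answer; it is included only to cross-check the result:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-03"],
    "Received Quantity": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Sent Quantity": [0.5, 0.5, 0.5, 0.5, 0.5],
    "Fee Amount": [0.1, 0.2, 0.3, 0.4, 0.5],
    "Market Value": [10.0, 20.0, 30.0, 40.0, 50.0],
})
cols = ["Received Quantity", "Sent Quantity", "Fee Amount", "Market Value"]

# keep=False flags every row whose Date occurs more than once.
df_dups = df[df.duplicated(["Date"], keep=False)].copy()
df_sum = df_dups.groupby("Date")[cols].sum()
print(df_sum["Received Quantity"].tolist())  # [3.0, 9.0]

# Alternative (not from the answer): drop dates that appear only once, then sum.
alt = df.groupby("Date").filter(lambda g: len(g) > 1).groupby("Date")[cols].sum()
print(alt.equals(df_sum))  # True
```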

answered Dec 13, 2021 at 17:05

Grinjero


hdries Over a year ago

This was indeed the problem. Thx!
