
I have a CSV file that I loaded into a pandas DataFrame.

I then select only the rows with duplicate dates in the DF:

df_dups = df[df.duplicated(['Date'])].copy()

I'm trying to get the sum of all the rows with the exact same date for 4 columns (all float values), like this:

df_sum = df_dups.groupby('Date')[["Received Quantity", "Sent Quantity", "Fee Amount", "Market Value"]].sum()

However, this does not give the desired result. When I examine the .groups attribute of the groupby object, I notice that the first row for each date is not included in the indices. So for two rows with the same date, the group contains only one index.

from pprint import pprint
pprint(df_dups.groupby('Date')[["Received Quantity", "Sent Quantity", "Fee Amount", "Market Value"]].groups)
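To make the symptom concrete, here is a minimal sketch with hypothetical toy data (the dates and values below are invented for illustration, not taken from the actual CSV). With the default keep="first", only the second occurrence of each repeated date survives the filter, so each group ends up holding a single index.

```python
import pandas as pd
from pprint import pprint

# Hypothetical toy data: 2021-01-01 and 2021-01-03 each appear twice
df = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-03"],
    "Received Quantity": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Sent Quantity": [0.5, 0.5, 0.5, 0.5, 0.5],
    "Fee Amount": [0.25, 0.5, 0.75, 1.0, 1.25],
    "Market Value": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Default keep="first": the first row of each date is NOT flagged as a duplicate
df_dups = df[df.duplicated(["Date"])].copy()

# Each group only contains the later row(s), e.g. index 1 for 2021-01-01
groups = df_dups.groupby("Date").groups
pprint(groups)
```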

I have no idea how to get the sum of all duplicates.

I've also tried:

df_sum = df_dups.groupby('Date')[["Received Quantity", "Sent Quantity", "Fee Amount", "Market Value"]].apply(lambda x: x.sum())

This gives the same result, which I guess makes sense, since the indices in the groupby object are incomplete. What am I missing here?

  • Please update your post to include a sample of your DataFrame using df.head(10).to_dict() and your expected output. Commented Dec 13, 2021 at 17:01
  • The groupby sum looks correct. Why do you think it is producing wrong results? Have you forward-filled NaNs? Commented Dec 13, 2021 at 17:03
    df[df.duplicated(['Date'])] by default excludes the first row for each date. You want df[df.duplicated('Date', keep=False)] Commented Dec 13, 2021 at 17:03

1 Answer


Check the documentation for the duplicated method. By default, duplicates are marked True except for the first occurrence, which is why the first row for each date is not included in your sums.
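The effect of the keep parameter can be seen on a small Series of dates. This is a minimal sketch with invented dates; it compares the default keep="first" with keep="last" and keep=False, which flags every member of a repeated date.

```python
import pandas as pd

# Invented dates: 2021-01-01 and 2021-01-03 each occur twice
dates = pd.Series(["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-03"])

first = dates.duplicated()            # default keep="first": later occurrences only
last = dates.duplicated(keep="last")  # earlier occurrences only
every = dates.duplicated(keep=False)  # all rows whose date repeats

print(pd.DataFrame({"date": dates, "first": first, "last": last, "False": every}))
```

Only keep=False selects both rows of each repeated date, which is what a per-date sum needs.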

You only need to pass keep=False to duplicated to get the behaviour you want.

df_dups = df[df.duplicated(['Date'], keep=False)].copy()

After that, the sum can be calculated correctly with the expression you wrote:

df_sum = df_dups.groupby('Date')[["Received Quantity", "Sent Quantity", "Fee Amount", "Market Value"]].apply(lambda x: x.sum())
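Putting it together, here is a self-contained sketch on hypothetical toy data (values invented for illustration). It selects the four columns with a list (double brackets), which recent pandas versions require instead of a bare comma-separated tuple, and uses .sum() directly, which gives the same per-column totals as apply(lambda x: x.sum()) here. The groupby(...).filter(...) line is an alternative not mentioned in the answer: it keeps only dates with more than one row, without calling duplicated at all.

```python
import pandas as pd

# Hypothetical toy data: 2021-01-01 and 2021-01-03 each appear twice
df = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-03"],
    "Received Quantity": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Sent Quantity": [0.5, 0.5, 0.5, 0.5, 0.5],
    "Fee Amount": [0.25, 0.5, 0.75, 1.0, 1.25],
    "Market Value": [10.0, 20.0, 30.0, 40.0, 50.0],
})

cols = ["Received Quantity", "Sent Quantity", "Fee Amount", "Market Value"]

# keep=False flags every row whose Date occurs more than once
df_dups = df[df.duplicated(["Date"], keep=False)].copy()

# Sum each selected column per date
df_sum = df_dups.groupby("Date")[cols].sum()
print(df_sum)

# Alternative (not from the answer): keep only groups with more than one row
alt = df.groupby("Date").filter(lambda g: len(g) > 1)
```

For 2021-01-01 this sums Received Quantity 1.0 + 2.0 = 3.0, i.e. both rows now contribute.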


1 Comment

This was indeed the problem. Thx!
