4

I was looking to add subtotals to a pandas dataframe - a question which I found to be asked here often. The answers making use of the deprecated pd.append aren't relevant anymore so I figured a more up-to-date version could be useful, not only for me, but for others as well.

The problem: Have a dataframe of the form (for example) -

df = pd.DataFrame(
    {"A": ["A1", "A1", "A1", "A2", "A2", "A2", "A3", "A3", "A3"],
    "B": ["B1", "B1", "B1", "B2", "B2", "B2", "B3", "B3", "B3"],
    "C": ["C1", "C1", "C2", "C2", "C2", "C3", "C3", "C3", "C3"],
    "D": [1,2,3,4,5,6,7,8,9]})

After a df.pivot_table(), add the subtotals for each hierarchical level (or just the highest level). Thanks for any tips, I didn't figure out a straightforward and general way to accomplish this.

EDIT: I should probably add that I'm interested in the case of index being a list of 2 and more variables. The case for n = 1 is simple enough using margins.

The idea, after a df.pivot_table(columns="C", index = ["A", "B"], values = "D", fill_value=0, aggfunc="sum") , is to get something like

C       C1  C2  C3 ....
A  B              
A1 B1    3   3   0
A1 B2    0   9   6
Totals   3   12  6
A2 B1    0   0  24
...

where the subtotals are over the A1 ... An levels and in the corresponding C1 ... Cn columns (for the general case), not the rows.

The sample output itself is

C      C1  C2  C3
A  B             
A1 B1   3   3   0
A2 B2   0   9   6
A3 B3   0   0  24
6
  • 2
    Can you post what result you expect for the sample input provided? Commented Sep 18 at 19:50
  • @JRiggles My bad, I meant pd.append. I don't know why I got it confused here Commented Sep 18 at 19:52
  • 1
    No worries! Just wanted to put that out there in case you felt like you couldn't use it! FWIW: I think concat is the de-facto replacement for append, so there's a chance you could rework one of those other answers to fit the need. Commented Sep 18 at 19:54
  • That would be the idea, yeah. I'm just struggling to properly place the subtotals, calculating them isn't too hard but I'd rather have it all in one table to present to others instead of two or more. Commented Sep 18 at 20:05
  • your sample output isn't reproducible from the input data. maybe replace "A2" with "A1" and "A3" with "A2" ? Commented Sep 18 at 20:11

2 Answers 2

6

Here is a short and readable solution:

subtotals = pt.groupby(level=0).sum().rename(index=lambda x: (x, 'Total'))
pt = pd.concat([pt, subtotals]).sort_index()

Explaination:

  1. Group by the "a" index and sum it.

  2. Rename the "b" index to total and store it in a helper table

  3. Concat the helper table and the pivot table

  4. Sort by the index

C         C1  C2  C3
A1 B1      3   3   0
   Total   3   3   0
A2 B2      0   9   6
   Total   0   9   6
A3 B3      0   0  24
   Total   0   0  24

Just as you wanted.

If you wanted to subtotal by 'B', then here is the modified code:

subtotals = pt.groupby(level=1).sum().rename(index=lambda x: ("Total", x))
pt = pd.concat([pt, subtotals]).sort_index(level=1)

and

C         C1  C2  C3
A1    B1   3   3   0
Total B1   3   3   0
A2    B2   0   9   6
Total B2   0   9   6
A3    B3   0   0  24
Total B3   0   0  24

MWE (Minimum working example)

import pandas as pd

df = pd.DataFrame({
    "A": ["A1","A1","A1","A2","A2","A2","A3","A3","A3"],
    "B": ["B1","B1","B1","B2","B2","B2","B3","B3","B3"],
    "C": ["C1","C1","C2","C2","C2","C3","C3","C3","C3"],
    "D": [1,2,3,4,5,6,7,8,9]
})

pt = df.pivot_table(index=["A","B"], columns="C", values="D", aggfunc="sum", fill_value=0)

subtotals = pt.groupby(level=0).sum().rename(index=lambda x: (x, "Total"))
pt = pd.concat([pt, subtotals]).sort_index()

print(pt)

And output again:

C         C1  C2  C3
A1 B1      3   3   0
   Total   3   3   0
A2 B2      0   9   6
   Total   0   9   6
A3 B3      0   0  24
   Total   0   0  24
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks a lot, this is exactly what I was looking for.
No problem! Glad to help
2

In pandas, for multi-index pivot tables you can add subtotals like this:

import pandas as pd

df = pd.DataFrame({
    "A": ["A1","A1","A1","A2","A2","A2","A3","A3","A3"],
    "B": ["B1","B1","B1","B2","B2","B2","B3","B3","B3"],
    "C": ["C1","C1","C2","C2","C2","C3","C3","C3","C3"],
    "D": [1,2,3,4,5,6,7,8,9]
})

pt = df.pivot_table(index=["A","B"], columns="C", values="D", aggfunc="sum", fill_value=0)

# Subtotals per A
totals = pt.groupby(level=0).sum()
totals.index = pd.MultiIndex.from_tuples([('Totals','')]*len(totals), names=pt.index.names)

pt = pd.concat([pt, totals]).sort_index()
print(pt)

This correctly adds subtotal rows for the top-level index without using deprecated append.

1 Comment

This is not the sample output the OP provided, the OP wants the total after every switch of the "A" index, not at the end

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.