
I have a pandas dataframe that looks like this:

import pandas as pd
import numpy as np
data = {
    "Type": ["A", "A", "B", "B", "B"],
    "Project": ["X123", "X123", "X21", "L31", "L31"],
    "Number": [100, 300, 100, 200, 500],
    "Status": ['Y', 'Y', 'N', 'Y', 'N']
}
df = pd.DataFrame.from_dict(data)

I want to group by Type and compute the row count and the sum of Number, both overall and separately for each Status, to get the following result:

Type  Total_Count  Total_Number  Count_Status=Y  Number_Status=Y  Count_Status=N  Number_Status=N
A               2           400               2              400               0                0
B               3           800               1              200               2              600

I have tried the following, but it is not exactly what I need. Please share any ideas you might have. Thanks!

df1 = pd.pivot_table(df, index='Type', values='Number', aggfunc='sum')
df2 = pd.pivot_table(df, index='Type', values='Project', aggfunc='count')
pd.concat([df1, df2], axis=1)

5 Answers

If you want to write a custom aggregation function:

def my_agg(x):
    # x is the sub-DataFrame for one Type; build the boolean masks once
    is_y = x['Status'] == 'Y'
    is_n = x['Status'] == 'N'
    names = {
        'Total_Count': len(x),
        'Total_Number': x['Number'].sum(),
        'Count_Status=Y': is_y.sum(),
        'Number_Status=Y': x.loc[is_y, 'Number'].sum(),
        'Count_Status=N': is_n.sum(),
        'Number_Status=N': x.loc[is_n, 'Number'].sum()}

    return pd.Series(names)

# Pass only the columns my_agg uses; recent pandas versions warn when
# apply() also receives the grouping column
df.groupby('Type')[['Number', 'Status']].apply(my_agg)

      Total_Count  Total_Number  Count_Status=Y  Number_Status=Y  Count_Status=N  Number_Status=N
Type
A               2           400               2              400               0                0
B               3           800               1              200               2              600
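groupby(...).apply calls a Python function once per group, which can be slow when there are many groups. A vectorized alternative, sketched below assuming pandas 0.25 or later (for named aggregation), precomputes helper columns and then aggregates them in a single agg call. The helper column names (is_y, is_n, num_y, num_n) are hypothetical; the ** unpacking is needed because output names such as Count_Status=Y are not valid Python keywords.

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["A", "A", "B", "B", "B"],
    "Project": ["X123", "X123", "X21", "L31", "L31"],
    "Number": [100, 300, 100, 200, 500],
    "Status": ['Y', 'Y', 'N', 'Y', 'N'],
})

# Hypothetical helper columns: 1/0 status flags, and Number zeroed out
# wherever the row does not have that status
helpers = df.assign(
    is_y=(df['Status'] == 'Y').astype(int),
    is_n=(df['Status'] == 'N').astype(int),
    num_y=df['Number'].where(df['Status'] == 'Y', 0),
    num_n=df['Number'].where(df['Status'] == 'N', 0),
)

# Named aggregation: output column name -> (input column, aggfunc)
result = helpers.groupby('Type').agg(
    Total_Count=('Number', 'count'),
    Total_Number=('Number', 'sum'),
    **{
        'Count_Status=Y': ('is_y', 'sum'),
        'Number_Status=Y': ('num_y', 'sum'),
        'Count_Status=N': ('is_n', 'sum'),
        'Number_Status=N': ('num_n', 'sum'),
    },
)
print(result)
```

Because every aggregation is a built-in 'sum' or 'count', pandas can compute them without calling back into Python for each group.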

Start with pivot_table:

pv = (df.pivot_table(index='Type', 
                     columns='Status', 
                     values='Number', 
                     aggfunc='sum')
        .add_prefix('Number_Status='))

print(pv)
Status  Number_Status=N  Number_Status=Y
Type                                    
A                   NaN            400.0
B                 600.0            200.0
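The NaN for A appears because there are no rows with Type A and Status N. If zeros are preferred, pivot_table's fill_value parameter fills those empty cells; a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["A", "A", "B", "B", "B"],
    "Project": ["X123", "X123", "X21", "L31", "L31"],
    "Number": [100, 300, 100, 200, 500],
    "Status": ['Y', 'Y', 'N', 'Y', 'N'],
})

# fill_value=0 replaces the NaN produced for the missing (A, N) combination
pv = (df.pivot_table(index='Type',
                     columns='Status',
                     values='Number',
                     aggfunc='sum',
                     fill_value=0)
        .add_prefix('Number_Status='))
print(pv)
```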

Next, groupby:

totals = df.groupby('Type').Number.agg([
    ('Total_Count', 'count'),  ('Total_Number', 'sum')])

print(totals)
      Total_Count  Total_Number
Type                           
A               2           400
B               3           800
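On pandas 0.25 or later, the same totals can also be written with named aggregation, where each keyword is an output column name and each value is an (input column, aggfunc) pair. A sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["A", "A", "B", "B", "B"],
    "Project": ["X123", "X123", "X21", "L31", "L31"],
    "Number": [100, 300, 100, 200, 500],
    "Status": ['Y', 'Y', 'N', 'Y', 'N'],
})

# Keyword = output column, tuple = (column to aggregate, aggregation)
totals = df.groupby('Type').agg(
    Total_Count=('Number', 'count'),
    Total_Number=('Number', 'sum'),
)
print(totals)
```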

Finally, count the statuses with one-hot encoding (OHE):

cnts = (df.set_index('Type').Status
          .str.get_dummies()
          .groupby(level=0).sum()   # .sum(level=0) was removed in pandas 2.0
          .add_prefix('Count_Status='))

print(cnts)

      Count_Status=N  Count_Status=Y
Type                                
A                  0               2
B                  2               1
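As an alternative to the one-hot encoding step, pd.crosstab counts each Type/Status pair directly and reports 0 for pairs that never occur. A minimal sketch of that swap:

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["A", "A", "B", "B", "B"],
    "Project": ["X123", "X123", "X21", "L31", "L31"],
    "Number": [100, 300, 100, 200, 500],
    "Status": ['Y', 'Y', 'N', 'Y', 'N'],
})

# Rows = Type, columns = Status, cells = number of rows with that pair
cnts = pd.crosstab(df['Type'], df['Status']).add_prefix('Count_Status=')
print(cnts)
```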

Putting it all together:

pd.concat([pv, totals, cnts], axis=1).sort_index(axis=1)

      Count_Status=N  Count_Status=Y  Number_Status=N  Number_Status=Y  Total_Count  Total_Number
Type
A                  0               2              NaN            400.0            2           400
B                  2               1            600.0            200.0            3           800
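A self-contained sketch of the whole pipeline, using .groupby(level=0).sum() (the replacement for .sum(level=0), which pandas 2.0 removed). The fillna/astype/reindex steps at the end are additions, not part of the answer: they turn A's NaN into 0 and put the columns in the order the question asked for.

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["A", "A", "B", "B", "B"],
    "Project": ["X123", "X123", "X21", "L31", "L31"],
    "Number": [100, 300, 100, 200, 500],
    "Status": ['Y', 'Y', 'N', 'Y', 'N'],
})

pv = (df.pivot_table(index='Type', columns='Status', values='Number', aggfunc='sum')
        .add_prefix('Number_Status='))
totals = df.groupby('Type').Number.agg([('Total_Count', 'count'), ('Total_Number', 'sum')])
cnts = (df.set_index('Type').Status
          .str.get_dummies()
          .groupby(level=0).sum()
          .add_prefix('Count_Status='))

# Column order requested in the question
order = ['Total_Count', 'Total_Number',
         'Count_Status=Y', 'Number_Status=Y',
         'Count_Status=N', 'Number_Status=N']

result = (pd.concat([pv, totals, cnts], axis=1)
            .fillna(0)                 # A has no 'N' rows, so its pivot_table sum is NaN
            .astype(int)               # every remaining value is a whole number
            .reindex(columns=order))
print(result)
```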


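You can use the margins argument of pd.pivot_table. With margins=True, pivot_table adds a Total column (the per-Type totals you want) and a Total row (totals over all Types). Drop the Total row at the end, since you only want the row-wise margins.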

import pandas as pd

df1 = df.pivot_table(index='Type', columns='Status', values='Number', 
                     aggfunc=['sum', 'count'], 
                     margins=True, 
                     margins_name='Total').fillna(0).drop('Total')
#          sum              count           
#Status      N      Y Total     N    Y Total
#Type                                       
#A         0.0  400.0   400   0.0  2.0     2
#B       600.0  200.0   800   2.0  1.0     3

If needed, rename the columns:

d = {'Y': 'Status=Y', 'N': 'Status=N', 'Total': 'Total'}
df1.columns = [f'{x}_{d.get(y)}' for x,y in df1.columns]

Output df1:

      sum_Status=N  sum_Status=Y  sum_Total  count_Status=N  count_Status=Y  count_Total
Type                                                                                    
A              0.0         400.0        400             0.0             2.0            2
B            600.0         200.0        800             2.0             1.0            3
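To see why the .drop('Total') step is needed, it helps to look at the table before it. A minimal sketch (raw and per_type are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["A", "A", "B", "B", "B"],
    "Project": ["X123", "X123", "X21", "L31", "L31"],
    "Number": [100, 300, 100, 200, 500],
    "Status": ['Y', 'Y', 'N', 'Y', 'N'],
})

raw = df.pivot_table(index='Type', columns='Status', values='Number',
                     aggfunc=['sum', 'count'],
                     margins=True, margins_name='Total')
# raw has a 'Total' column inside each aggfunc block (per-Type totals)
# and an extra 'Total' row (totals over all Types)
print(raw)

# Keep only the per-Type rows, as in the answer
per_type = raw.drop('Total').fillna(0)
print(per_type)
```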


Doing it with groupby and unstack:

s1 = df.groupby('Type').Number.agg(['count', 'sum'])
s2 = df.groupby(['Type', 'Status']).Number.agg(['count', 'sum']).unstack(fill_value=0).sort_index(level=1, axis=1)
s2.columns = s2.columns.map('_Status='.join)
s1 = s1.add_prefix('Total_')
s = pd.concat([s1, s2], axis=1)
s
      Total_count  Total_sum  count_Status=N  sum_Status=N  count_Status=Y  sum_Status=Y
Type
A               2        400               0             0               2           400
B               3        800               2           600               1           200
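The key step in s2 is unstack, which moves the Status level of the row index into the columns; the (A, N) pair, which never occurs, becomes 0 because of fill_value=0. A sketch showing the intermediate frames (per_status and wide are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["A", "A", "B", "B", "B"],
    "Project": ["X123", "X123", "X21", "L31", "L31"],
    "Number": [100, 300, 100, 200, 500],
    "Status": ['Y', 'Y', 'N', 'Y', 'N'],
})

# Long format: one row per (Type, Status) pair that actually occurs
per_status = df.groupby(['Type', 'Status'])['Number'].agg(['count', 'sum'])
print(per_status)

# Wide format: Status becomes the inner column level; missing pairs -> 0
wide = per_status.unstack(fill_value=0)
print(wide)
```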


You can use pandas.core.groupby.GroupBy.apply for this task: write a function that receives the sub-DataFrame for each group and returns a Series of metrics. Add the remaining metrics to the dict in the same way. For example:

def compute_metrics(x):
    result = {
        'Total_Number': x['Number'].sum(),
        # (x['Status'] == 'Y') is a boolean mask; .sum() counts the True values
        'Count_Status=Y': (x['Status'] == 'Y').sum(),
    }
    return pd.Series(result)

Then df.groupby('Type')[['Number', 'Status']].apply(compute_metrics) returns a DataFrame like this:

      Total_Number  Count_Status=Y
Type
A              400               2
B              800               1

Hope this will be helpful.

Cheers.
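When counting matches inside such a function, len() and .sum() on a boolean mask are easy to mix up: len() counts every element, True or False, while .sum() counts only the True values. A small sketch on group B (group_b, is_y, n_rows and n_y are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["A", "A", "B", "B", "B"],
    "Project": ["X123", "X123", "X21", "L31", "L31"],
    "Number": [100, 300, 100, 200, 500],
    "Status": ['Y', 'Y', 'N', 'Y', 'N'],
})

group_b = df[df['Type'] == 'B']
is_y = group_b['Status'] == 'Y'   # boolean Series: False, True, False

n_rows = len(is_y)   # 3: every row in the group
n_y = is_y.sum()     # 1: only the rows where Status == 'Y'
print(n_rows, n_y)
```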
