Pandas: group and count columns values per another column

Question

I have a dataframe in the following form:


+---------+---------+-------+-------+-----------------+
| country | payment | type  |  err  |      email      |
+---------+---------+-------+-------+-----------------+
| AU      | visa    | type1 | OK    | [email protected] |
| DE      | paypal  | type1 | OK    | [email protected] |
| AU      | visa    | type2 | ERROR | [email protected] |
| US      | visa    | type2 | OK    | [email protected] |
| FR      | visa    | type1 | OK    | [email protected] |
| FR      | visa    | type1 | ERROR | [email protected] |
+---------+---------+-------+-------+-----------------+

import pandas as pd

df = pd.DataFrame({'country': ['AU','DE','AU','US','FR','FR'],
                   'payment': ['visa','paypal','visa','visa','visa','visa'],
                   'type': ['type1','type1','type2','type2','type1','type1'],
                   'err': ['OK','OK','ERROR','OK','OK','ERROR'],
                   'email': ['[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]']})

My goal is to group by payment and country and create these new columns:
number_payments - row count for the group,
num_errors - number of ERROR values in the group,
num_type1 .. num_type3 - number of rows with each value of the type column (only 3 possible values),
num_errors_per_unique_email - average number of errors per unique email in the group,
num_type1_per_unique_email .. num_type3_per_unique_email - average number of rows of each type per unique email in the group.

Like this:


+---------+---------+-----------------+------------+-----------+-----------+-----------------------------+----------------------------+----------------------------+----------------------------+
| payment | country | number_payments | num_errors | num_type1 | num_type2 | num_errors_per_unique_email | num_type1_per_unique_email | num_type2_per_unique_email | num_type3_per_unique_email |
+---------+---------+-----------------+------------+-----------+-----------+-----------------------------+----------------------------+----------------------------+----------------------------+
| paypal  | DE      |               1 |          0 |         1 |         0 |                           0 |                          1 |                          0 |                          0 |
| visa    | AU      |               2 |          1 |         1 |         1 |                           1 |                          0 |                          1 |                          0 |
| visa    | FR      |               2 |          1 |         2 |         0 |                           1 |                          2 |                          0 |                          0 |
| visa    | US      |               1 |          0 |         0 |         1 |                           0 |                          0 |                          1 |                          0 |
+---------+---------+-----------------+------------+-----------+-----------+-----------------------------+----------------------------+----------------------------+----------------------------+

Thanks to @anky's solution (get dummies, create the group, join the size with the sum) I was able to complete the first part of the task and get this:

c = df['err'].eq("ERROR")
g = (df[['payment', 'country']]
     .assign(num_errors=c, **pd.get_dummies(df[['type']], prefix=['num']))
     .groupby(['payment', 'country']))
out = g.size().to_frame("number_payments").join(g.sum()).reset_index()

+---------+---------+-----------------+------------+-----------+-----------+
| payment | country | number_payments | num_errors | num_type1 | num_type2 |
+---------+---------+-----------------+------------+-----------+-----------+
| paypal  | DE      |               1 |          0 |         1 |         0 |
| visa    | AU      |               2 |          1 |         1 |         1 |
| visa    | FR      |               2 |          1 |         2 |         0 |
| visa    | US      |               1 |          0 |         0 |         1 |
+---------+---------+-----------------+------------+-----------+-----------+

But I'm stuck on how to properly add columns like 'num_errors_per_unique_email' and 'num_type1_per_unique_email'.

Appreciate any help.

Marco Antunes · Accepted Answer · 2021-09-01 13:26:50Z

Like this?

dfemail = df.groupby('email')[['err', 'type']].count()
dfemail

                    err type
email       
[email protected]     2   2
[email protected]     3   3
[email protected]     1   1


1 Comment

Alex_Y Over a year ago

Yes, but I need statistics on this after group by country and payment.
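
One possible way to bridge the per-email counts above with the grouping asked for in this comment is a two-step groupby: first total each metric per (payment, country, email), then average those totals within each (payment, country). This is only a sketch, not part of either answer. The email values are hypothetical placeholders (the real ones are redacted in the question), and the names flags, per_email and per_group are made up for illustration.

```python
import pandas as pd

# Sample data from the question; the email values are hypothetical placeholders
df = pd.DataFrame({
    'country': ['AU', 'DE', 'AU', 'US', 'FR', 'FR'],
    'payment': ['visa', 'paypal', 'visa', 'visa', 'visa', 'visa'],
    'type': ['type1', 'type1', 'type2', 'type2', 'type1', 'type1'],
    'err': ['OK', 'OK', 'ERROR', 'OK', 'OK', 'ERROR'],
    'email': ['a@example.com', 'b@example.com', 'a@example.com',
              'c@example.com', 'd@example.com', 'd@example.com'],
})

# One 0/1 indicator column per metric
flags = pd.get_dummies(df['type'], prefix='num').astype(int)
flags.insert(0, 'num_errors', df['err'].eq('ERROR').astype(int))

# Step 1: totals per (payment, country, email)
per_email = (pd.concat([df[['payment', 'country', 'email']], flags], axis=1)
               .groupby(['payment', 'country', 'email'])
               .sum())

# Step 2: average the per-email totals within each (payment, country) group
per_group = (per_email.groupby(level=['payment', 'country'])
                      .mean()
                      .add_suffix('_per_unique_email')
                      .reset_index())
print(per_group)
```

Averaging per-email totals gives the same number as dividing the group total by the number of unique emails, so either formulation should match the requested columns.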

Alex_Y · Accepted Answer · 2021-09-01 13:29:10Z

I've managed to do this, but probably not in the most efficient or proper way, so better answers are appreciated.

c = df['err'].eq("ERROR")
g = (df[['payment', 'country', 'email']]
     .assign(num_errors=c, **pd.get_dummies(df[['type']], prefix=['num']))
     .groupby(['payment', 'country']))

# numeric_only=True keeps the email strings out of the sums
# (pandas 2.x would otherwise concatenate them into an extra column)
out = (g.size().to_frame("number_payments")
        .join([g.sum(numeric_only=True),
               g['email'].nunique().to_frame("unique_emails")])
        .reset_index())
out['num_errors_per_unique_email'] = out['num_errors'] / out['unique_emails']
out['num_type1_per_unique_email'] = out['num_type1'] / out['unique_emails']
out['num_type2_per_unique_email'] = out['num_type2'] / out['unique_emails']
out


+---------+---------+-----------------+------------+-----------+-----------+---------------+-----------------------------+----------------------------+----------------------------+
| payment | country | number_payments | num_errors | num_type1 | num_type2 | unique_emails | num_errors_per_unique_email | num_type1_per_unique_email | num_type2_per_unique_email |
+---------+---------+-----------------+------------+-----------+-----------+---------------+-----------------------------+----------------------------+----------------------------+
| paypal  | DE      |               1 |          0 |         1 |         0 |             1 |                         0.0 |                        1.0 |                        0.0 |
| visa    | AU      |               2 |          1 |         1 |         1 |             1 |                         1.0 |                        1.0 |                        1.0 |
| visa    | FR      |               2 |          1 |         2 |         0 |             1 |                         1.0 |                        2.0 |                        0.0 |
| visa    | US      |               1 |          0 |         0 |         1 |             1 |                         0.0 |                        0.0 |                        1.0 |
+---------+---------+-----------------+------------+-----------+-----------+---------------+-----------------------------+----------------------------+----------------------------+
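
A possible variation on the approach above, offered as a sketch only: declaring the three allowed type values as a categorical makes num_type3 appear even though no row in the sample has type3, and a single div call replaces the per-column divisions. The email values are hypothetical placeholders (the real ones are redacted), and the names types, typed, flags and ratios are made up for illustration.

```python
import pandas as pd

# Sample data from the question; the email values are hypothetical placeholders
df = pd.DataFrame({
    'country': ['AU', 'DE', 'AU', 'US', 'FR', 'FR'],
    'payment': ['visa', 'paypal', 'visa', 'visa', 'visa', 'visa'],
    'type': ['type1', 'type1', 'type2', 'type2', 'type1', 'type1'],
    'err': ['OK', 'OK', 'ERROR', 'OK', 'OK', 'ERROR'],
    'email': ['a@example.com', 'b@example.com', 'a@example.com',
              'c@example.com', 'd@example.com', 'd@example.com'],
})

# Assumption: these are the only three possible values of the type column
types = ['type1', 'type2', 'type3']
typed = df['type'].astype(pd.CategoricalDtype(categories=types))

# One 0/1 indicator column per metric; unobserved categories still get a column
flags = pd.get_dummies(typed, prefix='num').astype(int)
flags.insert(0, 'num_errors', df['err'].eq('ERROR').astype(int))
count_cols = list(flags.columns)

g = (pd.concat([df[['payment', 'country', 'email']], flags], axis=1)
       .groupby(['payment', 'country']))

out = g[count_cols].sum()
out.insert(0, 'number_payments', g.size())
out['unique_emails'] = g['email'].nunique()

# Divide every count column by unique_emails row by row
ratios = (out[count_cols].div(out['unique_emails'], axis=0)
                         .add_suffix('_per_unique_email'))
out = out.join(ratios).reset_index()
print(out)
```

Selecting only the count columns before summing avoids summing the email strings, and adding a new metric only requires adding another indicator column to flags.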
