
I have a dataframe in the following form:


+---------+---------+-------+-------+-----------------+
| country | payment | type  |  err  |      email      |
+---------+---------+-------+-------+-----------------+
| AU      | visa    | type1 | OK    | [email protected] |
| DE      | paypal  | type1 | OK    | [email protected] |
| AU      | visa    | type2 | ERROR | [email protected] |
| US      | visa    | type2 | OK    | [email protected] |
| FR      | visa    | type1 | OK    | [email protected] |
| FR      | visa    | type1 | ERROR | [email protected] |
+---------+---------+-------+-------+-----------------+

import pandas as pd

df = pd.DataFrame({'country': ['AU', 'DE', 'AU', 'US', 'FR', 'FR'],
                   'payment': ['visa', 'paypal', 'visa', 'visa', 'visa', 'visa'],
                   'type': ['type1', 'type1', 'type2', 'type2', 'type1', 'type1'],
                   'err': ['OK', 'OK', 'ERROR', 'OK', 'OK', 'ERROR'],
                   'email': ['[email protected]', '[email protected]', '[email protected]',
                             '[email protected]', '[email protected]', '[email protected]']})

My goal is to group the dataframe by payment and country and create the following new columns:
number_payments - the row count of the group,
num_errors - the number of ERROR values in the group,
num_type1 .. num_type3 - the number of occurrences of each value of the type column (there are only 3 possible values),
num_errors_per_unique_email - the average number of errors per unique email in the group,
num_type1_per_unique_email .. num_type3_per_unique_email - the average number of each type per unique email in the group.

Like this:


+---------+---------+-----------------+------------+-----------+-----------+-----------------------------+----------------------------+----------------------------+----------------------------+
| payment | country | number_payments | num_errors | num_type1 | num_type2 | num_errors_per_unique_email | num_type1_per_unique_email | num_type2_per_unique_email | num_type3_per_unique_email |
+---------+---------+-----------------+------------+-----------+-----------+-----------------------------+----------------------------+----------------------------+----------------------------+
| paypal  | DE      |               1 |          0 |         1 |         0 |                           0 |                          1 |                          0 |                          0 |
| visa    | AU      |               2 |          1 |         1 |         1 |                           1 |                          0 |                          1 |                          0 |
| visa    | FR      |               2 |          0 |         1 |         1 |                           1 |                          2 |                          0 |                          0 |
| visa    | US      |               1 |          0 |         0 |         1 |                           0 |                          0 |                          1 |                          0 |
+---------+---------+-----------------+------------+-----------+-----------+-----------------------------+----------------------------+----------------------------+----------------------------+

Thanks to @anky's solution (get dummies, create the group, join the size with the sum), I'm able to get the first part of the task and receive this:

c = df['err'].eq("ERROR")   # boolean flag: True where err is "ERROR"
g = (df[['payment', 'country']]
       .assign(num_errors=c, **pd.get_dummies(df[['type']], prefix=['num']))
       .groupby(['payment', 'country']))
out = g.size().to_frame("number_payments").join(g.sum()).reset_index()
+---------+---------+-----------------+------------+-----------+-----------+
| payment | country | number_payments | num_errors | num_type1 | num_type2 |
+---------+---------+-----------------+------------+-----------+-----------+
| paypal  | DE      |               1 |          0 |         1 |         0 |
| visa    | AU      |               2 |          1 |         1 |         1 |
| visa    | FR      |               2 |          1 |         2 |         0 |
| visa    | US      |               1 |          0 |         0 |         1 |
+---------+---------+-----------------+------------+-----------+-----------+

But I'm stuck on how to properly add the columns like 'num_errors_per_unique_email' and 'num_type_per_unique_email'.

Appreciate any help.

2 Answers


Like this?

dfemail = df.groupby('email')[['err', 'type']].count()
dfemail

                 err  type
email
[email protected]    2     2
[email protected]    3     3
[email protected]    1     1
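
One thing to note: .count() here counts all non-null rows per email, not the ERROR rows specifically. If it is errors per email that are needed, a filtered count would do it; a minimal sketch building on the df from the question (the errors_per_email name is just illustrative):

errors_per_email = (df[df['err'].eq('ERROR')]   # keep only the ERROR rows
                      .groupby('email')
                      .size()                   # number of error rows per email
                      .rename('num_errors'))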

1 Comment

Yes, but I need these statistics after grouping by country and payment.

I've managed to do this, but not in a very efficient or proper way, so better answers are appreciated.

c = df['err'].eq("ERROR")
g = (df[['payment', 'country', 'email']]
       .assign(num_errors=c, **pd.get_dummies(df[['type']], prefix=['num']))
       .groupby(['payment', 'country']))

out = (g.size().to_frame("number_payments")
         .join([g.sum(numeric_only=True),   # numeric_only keeps the string email column out of the sum
                g['email'].nunique().to_frame("unique_emails")])
         .reset_index())
out['num_errors_per_unique_email'] = out['num_errors'] / out['unique_emails']
out['num_type1_per_unique_email'] = out['num_type1'] / out['unique_emails']
out['num_type2_per_unique_email'] = out['num_type2'] / out['unique_emails']
out

+---------+---------+-----------------+------------+-----------+-----------+---------------+-----------------------------+----------------------------+----------------------------+
| payment | country | number_payments | num_errors | num_type1 | num_type2 | unique_emails | num_errors_per_unique_email | num_type1_per_unique_email | num_type2_per_unique_email |
+---------+---------+-----------------+------------+-----------+-----------+---------------+-----------------------------+----------------------------+----------------------------+
| paypal  | DE      |               1 |          0 |         1 |         0 |             1 |                         0.0 |                        1.0 |                        0.0 |
| visa    | AU      |               2 |          1 |         1 |         1 |             1 |                         1.0 |                        1.0 |                        1.0 |
| visa    | FR      |               2 |          1 |         2 |         0 |             1 |                         1.0 |                        2.0 |                        0.0 |
| visa    | US      |               1 |          0 |         0 |         1 |             1 |                         0.0 |                        0.0 |                        1.0 |
+---------+---------+-----------------+------------+-----------+-----------+---------------+-----------------------------+----------------------------+----------------------------+
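
A somewhat more compact variant of the same idea, offered only as a sketch (the types and count_cols names are illustrative): declaring type as a categorical with all three possible values makes get_dummies emit a num_type3 column even though type3 never occurs in the sample data, and all the _per_unique_email columns can then be produced with a single vectorized division instead of one assignment per column.

import pandas as pd

# Declare the three possible type values up front so get_dummies always
# emits num_type1..num_type3, even for types absent from the data
# (the question states there are only 3 possible values).
types = pd.CategoricalDtype(['type1', 'type2', 'type3'])

g = (df[['payment', 'country', 'email']]
       .assign(num_errors=df['err'].eq('ERROR'),
               **pd.get_dummies(df[['type']].astype(types), prefix=['num']))
       .groupby(['payment', 'country']))

out = (g.size().to_frame('number_payments')
         .join([g.sum(numeric_only=True),                       # per-group counts
                g['email'].nunique().to_frame('unique_emails')])
         .reset_index())

# Divide every count column by the group's unique-email count in one call.
count_cols = ['num_errors', 'num_type1', 'num_type2', 'num_type3']
out = out.join(out[count_cols].div(out['unique_emails'], axis=0)
                              .add_suffix('_per_unique_email'))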
