Aggregation of DataFrame in Python Pandas?

Question

I have a DataFrame like the one below:

import numpy as np
import pandas as pd

df = pd.DataFrame({"ID" : ["1", "1", "1", "2", "2", "2", "1"],
                   "status" : ["ac", "not", "not", "ac", np.nan, "ac", "oth"]})

I need to build a DataFrame with the following columns:

NumberAcc - number of rows for each ID with status = "ac"
NumberNaN - number of rows for each ID with status = NaN (missing -> np.nan)
NumberOther - number of rows for each ID with a status other than "ac" or np.nan (that is, "not" or "oth")

Could you help me build a DataFrame like that?

anky · Accepted Answer · 2021-01-09 06:50:03Z

6

You can use a conditional mask to relabel anything that is not "ac" or np.nan as "Other", then apply groupby.value_counts, unstack the result, and format the column names with add_prefix:

u = df['status'].where(df['status'].eq("ac")|df['status'].isna(),"Other")

out = (u.groupby(df['ID']).value_counts(dropna=False).unstack(fill_value=0)
        .add_prefix("Number_").reset_index().rename_axis(None,axis=1))
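
To make the chain above easier to follow, here is a minimal sketch that runs the same three steps one at a time on the question's data. The intermediate names `counts` and `wide` are only illustrative and do not appear in the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": ["1", "1", "1", "2", "2", "2", "1"],
                   "status": ["ac", "not", "not", "ac", np.nan, "ac", "oth"]})

# Step 1: keep "ac" and NaN as they are, relabel every other status as "Other".
u = df["status"].where(df["status"].eq("ac") | df["status"].isna(), "Other")
print(u.tolist())  # ['ac', 'Other', 'Other', 'ac', nan, 'ac', 'Other']

# Step 2: count each label within each ID; dropna=False keeps the NaN bucket.
counts = u.groupby(df["ID"]).value_counts(dropna=False)

# Step 3: move the label level into columns; label/ID pairs that never
# occur are filled with 0 instead of NaN.
wide = counts.unstack(fill_value=0)
print(wide)
```

After step 3 there is one row per ID and one column per label, so add_prefix and reset_index only tidy up the names.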

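Or, with np.select:

# The string "nan" stands in for missing values: np.select needs choices of a
# single dtype, and recent NumPy versions refuse to mix str with float NaN.
a = pd.Series(np.select([df['status'].eq("ac"),df['status'].isna()],
              ['acc',"nan"],'other'))
out = (a.groupby(df['ID']).value_counts(dropna=True).unstack(fill_value=0)
        .add_prefix("Number_").reset_index())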

print(out)

  ID  Number_nan  Number_Other  Number_ac
0  1           0             3          1
1  2           1             0          2

A similar logic but with crosstab as suggested by @Shubham:

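u = df['status'].where(df['status'].eq("ac")|df['status'].isna(),"Other")
# rename_axis(None, axis=1) clears the columns' name; without axis=1 it would
# clear the index name "ID", and reset_index would produce a column called "index".
out = (pd.crosstab(df['ID'],u.fillna("NAN"),dropna=False)
   .add_prefix("Number_").rename_axis(None,axis=1).reset_index())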
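
If the exact column names from the question are wanted, the crosstab result can be renamed explicitly rather than prefixed. The following is a sketch under that assumption; the mapping dictionary and the name `table` are illustrative and not part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": ["1", "1", "1", "2", "2", "2", "1"],
                   "status": ["ac", "not", "not", "ac", np.nan, "ac", "oth"]})

# Relabel everything except "ac" and NaN, then give NaN a string label so
# that crosstab treats it as an ordinary category.
u = df["status"].where(df["status"].eq("ac") | df["status"].isna(), "Other")
table = pd.crosstab(df["ID"], u.fillna("NAN"))

# Map the three labels onto the column names requested in the question.
out = (table.rename(columns={"ac": "NumberAcc",
                             "NAN": "NumberNaN",
                             "Other": "NumberOther"})
            .rename_axis(None, axis=1)   # drop the "status" name on the columns
            .reset_index())              # turn the ID index into a column
print(out)
```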

edited Jan 9, 2021 at 6:50

answered Jan 9, 2021 at 6:21


7 Comments

dingaro Over a year ago

anky, great, but can I pass a list of values to .eq()? For instance, if I would like to match more than just "ac", for example .eq("ac", "bc") and so on?

anky Over a year ago

@jack55 yes, try isin instead of eq for multiple values: u = df['status'].where(df['status'].isin(["ac","bc"])|df['status'].isna(),"Other")
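
As a small illustration of the isin suggestion, the sketch below wraps the mask in a helper. The function `bucket_status`, its parameters, and the sample Series with a "bc" status are hypothetical and only serve the example.

```python
import numpy as np
import pandas as pd

def bucket_status(status, keep, other_label="Other"):
    """Keep the values listed in `keep` and NaN; relabel everything else."""
    return status.where(status.isin(keep) | status.isna(), other_label)

# Hypothetical data that also contains the extra "bc" status from the comment.
s = pd.Series(["ac", "bc", "not", np.nan, "oth"])
result = bucket_status(s, ["ac", "bc"])
print(result.tolist())  # ['ac', 'bc', 'Other', nan, 'Other']
```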

Shubham Sharma Over a year ago

@anky Maybe try crosstab, like pd.crosstab(df['ID'], df['status'].fillna('NaN')).

dingaro Over a year ago

Great! Thank you, I marked yours as the best answer! :)

MaxYarmolinsky Over a year ago

This is incredible, how did you figure this out, anky? I guess I'm not so familiar with the unstack function.


sammywemmy · Accepted Answer · 2021-01-09 07:56:10Z

2

You could create the columns via assign, before grouping on the 'ID' and summing:

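     # Select the three flag columns before summing so that the string
     # column "status" is not aggregated as well.
     (df.assign(NumberAcc=df.status.eq("ac"),
                NumberNaN=df.status.isna(),
                NumberOther=lambda d: ~(d.NumberAcc | d.NumberNaN))
        .groupby("ID")[["NumberAcc", "NumberNaN", "NumberOther"]]
        .sum())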

    NumberAcc   NumberNaN   NumberOther
ID          
1       1           0           3
2       2           1           0
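
The approach works because summing a boolean column counts its True values. A minimal sketch of that idea, building the flags in a separate frame (the name `flags` is illustrative), could look like this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": ["1", "1", "1", "2", "2", "2", "1"],
                   "status": ["ac", "not", "not", "ac", np.nan, "ac", "oth"]})

# One boolean flag per category; each row is True in exactly one column.
flags = pd.DataFrame({
    "NumberAcc": df["status"].eq("ac"),
    "NumberNaN": df["status"].isna(),
})
flags["NumberOther"] = ~(flags["NumberAcc"] | flags["NumberNaN"])

# Summing booleans per ID counts the True values in each group.
out = flags.groupby(df["ID"]).sum()
print(out)

# Sanity check: the three counts add up to the number of rows per ID.
assert out.sum(axis=1).tolist() == df.groupby("ID").size().tolist()
```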

answered Jan 9, 2021 at 7:56
