counting values and strings in different columns in a python dataframe separately?

Question

First post here (I've been googling all day and couldn't find anything), so please be gentle.

I am working with a DataFrame with multiple columns, some floats, some booleans.

    col_1       col_2       col_3      col_4       col_5      col_6
0   38.109375   37.515625   True       False       (64, 69)   F
1   27.265625   28.484375   True       False       (74, 79)   M
2   26.843750   27.015625   False      True        (64, 69)   F

I want to build a new df which:

is grouped by col_6 AND col_5 (check)
has the mean values of col_1 and col_2 (check)
counts the True values in col_3 and col_4 (doesn't work)

My approach so far:

new_df = df.groupby(['col_6', 'col_5']).agg({'col_5' : ['count'], 'col_1' : ['mean'], 'col_2' : ['mean']})


But I could not figure out how to also count the True values for each col_5 and col_6 group. I hope this makes sense and that someone can help.

Mehdi Golzadeh · Accepted Answer · 2020-10-11 22:59:35Z

0

You can count the True items with a lambda in the agg function:

new_df = (
    df
    # Cast the boolean columns to int so the counts come back as numbers
    .assign(
        col_3=lambda x: x['col_3'].astype(int),
        col_4=lambda x: x['col_4'].astype(int),
    )
    .groupby(['col_6', 'col_5'])
    .agg({'col_5': ['count'],
          'col_1': ['mean'],
          'col_2': ['mean'],
          # After the cast True is 1, so this counts the True rows per group
          'col_3': lambda x: len([1 for item in x if item == True]),
          'col_4': lambda x: len([1 for item in x if item == True])})
)

edited Oct 11, 2020 at 22:59

answered Oct 9, 2020 at 23:26


4 Comments

Muesticmo Over a year ago

I tried your approach, worked very well. Thank you!

Mehdi Golzadeh Over a year ago

You're welcome. Please mark it as the correct answer so that other people can find it ;)

Muesticmo Over a year ago

A problem occurred with the code: if there was no True or only one True in the respective columns, the result wasn't a number but True or False. Any explanation for that? Do I have to change the column type or something?

Mehdi Golzadeh Over a year ago

Yes, that was because the column type was boolean. I updated the solution to fix the problem.
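
The cast-first fix mentioned in these comments can be sketched on its own. This is a minimal example, not the exact code from the thread: the sample frame is rebuilt by hand from the three rows in the question, col_5 is assumed to hold strings, and the names bool_cols and counts are made up for illustration. Converting the boolean columns to int before grouping means a group with zero True values comes back as 0 rather than False.

```python
import pandas as pd

# Hypothetical sample data modelled on the question's three rows.
# Assumption: col_5 is stored as strings such as "(64, 69)".
df = pd.DataFrame({
    "col_1": [38.109375, 27.265625, 26.843750],
    "col_2": [37.515625, 28.484375, 27.015625],
    "col_3": [True, True, False],
    "col_4": [False, False, True],
    "col_5": ["(64, 69)", "(74, 79)", "(64, 69)"],
    "col_6": ["F", "M", "F"],
})

bool_cols = ["col_3", "col_4"]

# Cast the boolean columns to int first, then sum them per group.
counts = (
    df.astype({col: int for col in bool_cols})
      .groupby(["col_6", "col_5"])[bool_cols]
      .sum()
)

print(counts)
```

The result has one row per (col_6, col_5) group and integer columns, so it can be joined onto the means computed separately if that is more convenient.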

dsaxton · Accepted Answer · 2020-10-09 23:30:32Z

0

You can sum booleans as you would ints:

[ins] In [15]: df["y"]
Out[15]: 
0     True
1     True
2    False
Name: y, dtype: bool

[ins] In [16]: df["y"].sum()
Out[16]: 2

So you could use, for instance, "col_3": ["sum"] in your agg dictionary.
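
Applied to the question's layout, the sum approach might look like the sketch below. It rests on several assumptions: the frame is rebuilt by hand from the three rows shown in the question, col_5 is assumed to hold strings, and the output column names (n_rows, col_1_mean, and so on) are invented for this example. It uses pandas named aggregation rather than the nested dictionary from the question, and it counts rows via col_1 instead of the grouping column col_5.

```python
import pandas as pd

# Hypothetical sample data modelled on the question's three rows.
# Assumption: col_5 is stored as strings such as "(64, 69)".
df = pd.DataFrame({
    "col_1": [38.109375, 27.265625, 26.843750],
    "col_2": [37.515625, 28.484375, 27.015625],
    "col_3": [True, True, False],
    "col_4": [False, False, True],
    "col_5": ["(64, 69)", "(74, 79)", "(64, 69)"],
    "col_6": ["F", "M", "F"],
})

# Named aggregation: each keyword is an output column,
# each value is (input column, aggregation function).
new_df = df.groupby(["col_6", "col_5"]).agg(
    n_rows=("col_1", "count"),      # non-null rows per group
    col_1_mean=("col_1", "mean"),
    col_2_mean=("col_2", "mean"),
    col_3_true=("col_3", "sum"),    # True counts as 1, False as 0
    col_4_true=("col_4", "sum"),
)

print(new_df)
```

Named aggregation yields flat column names, which avoids the two-level column header that the list-valued dictionary produces.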

answered Oct 9, 2020 at 23:30


1 Comment

Muesticmo Over a year ago

thank you very much for your suggestion, I've already used the suggestion by MhDG7
