
This is my first post here (I've been googling all day and couldn't find anything), so please be gentle.

I am working with a DataFrame with multiple columns, some floats, some booleans.

    col_1       col_2       col_3      col_4       col_5      col_6
0   38.109375   37.515625   True       False       (64, 69)   F
1   27.265625   28.484375   True       False       (74, 79)   M
2   26.843750   27.015625   False      True        (64, 69)   F
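
For anyone who wants to reproduce this, the three sample rows above can be rebuilt roughly as follows. This is only a sketch: it assumes col_5 holds Python tuples and col_3/col_4 are plain bool columns, which is how the printed table reads.

```python
import pandas as pd

# Rows copied from the table above; the dtypes are an assumption
# based on how the values are printed.
df = pd.DataFrame(
    {
        "col_1": [38.109375, 27.265625, 26.843750],
        "col_2": [37.515625, 28.484375, 27.015625],
        "col_3": [True, True, False],
        "col_4": [False, False, True],
        "col_5": [(64, 69), (74, 79), (64, 69)],
        "col_6": ["F", "M", "F"],
    }
)

print(df.dtypes)
```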

I want to build a new df which:

  • is grouped by col_6 AND col_5 (works)
  • has the mean values of col_1 and col_2 (works)
  • counts the True values in col_3 and col_4 (doesn't work)

my approach so far:

new_df = df.groupby(['col_6', 'col_5']).agg({'col_5' : ['count'], 'col_1' : ['mean'], 'col_2' : ['mean']})

Image of table.

But I could not figure out how to also count the True values per col_5/col_6 group. I hope this makes sense and that someone can help.

2 Answers


You can count the True items with a lambda in the agg function:

new_df = (
    df
    # Cast the boolean columns to int first, so the results are numbers
    .assign(
        col_3=lambda x: x['col_3'].astype(int),
        col_4=lambda x: x['col_4'].astype(int),
    )
    .groupby(['col_6', 'col_5'])
    .agg({'col_5': ['count'],
          'col_1': ['mean'],
          'col_2': ['mean'],
          # After the cast True is 1, so this counts the True entries per group
          'col_3': lambda x: len([1 for item in x if item == True]),
          'col_4': lambda x: len([1 for item in x if item == True])}
    )
)

4 Comments

I tried your approach, worked very well. Thank you!
You're welcome. Please mark it as the accepted answer so that other people can find the correct answer ;)
A problem occurred with the code: if there was no True or only one True in the respective columns, the result wasn't a number but True or False. Any explanation for that? Do I have to change the column type or something?
Yes, that was because the column type was boolean. I updated the solution to fix the problem.

You can sum booleans as you would ints:

In [15]: df["y"]
Out[15]: 
0     True
1     True
2    False
Name: y, dtype: bool

In [16]: df["y"].sum()
Out[16]: 2

So you could use for instance "col_3": ["sum"] in your dictionary.
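
Put together with the question's groupby, that could look something like the sketch below. The sample frame is rebuilt from the three rows in the question, and the row count is taken from col_1 rather than from the grouping key col_5; that swap is my own choice, not part of the original code.

```python
import pandas as pd

# Sample rows taken from the question (dtypes assumed from the printout).
df = pd.DataFrame(
    {
        "col_1": [38.109375, 27.265625, 26.843750],
        "col_2": [37.515625, 28.484375, 27.015625],
        "col_3": [True, True, False],
        "col_4": [False, False, True],
        "col_5": [(64, 69), (74, 79), (64, 69)],
        "col_6": ["F", "M", "F"],
    }
)

new_df = df.groupby(["col_6", "col_5"]).agg(
    {
        "col_1": ["count", "mean"],  # non-null rows per group and their mean
        "col_2": ["mean"],
        "col_3": ["sum"],  # summing booleans counts the True values
        "col_4": ["sum"],
    }
)

print(new_df)
```

The result has two-level column labels such as ("col_3", "sum"), so a single column is selected with new_df[("col_3", "sum")].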

1 Comment

Thank you very much for your suggestion. I've already used the one from MhDG7.
