I am trying to limit the output returned by the describe output to a subset of only those records with a count great than or equal to any given number.
My dataframe is a subset of a larger one, and is defined as:
df = evaluations[['score','garden_id']]
When I run describe on this,
df.groupby('garden_id').describe()
I would like to filter the returned data to those records where count > a specified number.
My output looks like:
Out[39]:
score
garden_id
37254 count 6
mean 20
std 0
min 20
25% 20
50% 20
75% 20
max 20
37273 count 1
mean 17
std NaN
min 17
25% 17
50% 17
75% 17
max 17
37284 count 1
mean 19
std NaN
min 19
25% 19
50% 19
75% 19
max 19
37288 count 1
mean 11
std NaN
min 11
25% 11
50% 11
...
I was going to try modifying something like: modify-output-from-python-pandas-describe, but I only get this:
Out[40]:
Empty DataFrame
Columns: [score]
Index: []
when I run df.groupby('garden_id').describe().loc[['count']]
I did try newframe = df.groupby('garden_id').describe().count() > 3, but I get a mask showing which records are true and which false, so not really sure how to use this, either.
Is there a way I can filter out desired records directly using the describe method without having to deal with data masking, etc.?