1

I am trying to limit the output returned by the describe output to a subset of only those records with a count great than or equal to any given number.

My dataframe is a subset of a larger one, and is defined as:

 df = evaluations[['score','garden_id']]

When I run describe on this,

df.groupby('garden_id').describe()

I would like to filter the returned data to those records where count > a specified number.

My output looks like:

Out[39]: 
                 score
garden_id             
37254     count      6
          mean      20
          std        0
          min       20
          25%       20
          50%       20
          75%       20
          max       20
37273     count      1
          mean      17
          std      NaN
          min       17
          25%       17
          50%       17
          75%       17
          max       17
37284     count      1
          mean      19
          std      NaN
          min       19
          25%       19
          50%       19
          75%       19
          max       19
37288     count      1
          mean      11
          std      NaN
          min       11
          25%       11
          50%       11
               ...

I was going to try modifying something like: modify-output-from-python-pandas-describe, but I only get this:

Out[40]: 
Empty DataFrame
Columns: [score]
Index: []

when I run df.groupby('garden_id').describe().loc[['count']]

I did try newframe = df.groupby('garden_id').describe().count() > 3, but I get a mask showing which records are true and which false, so not really sure how to use this, either.

Is there a way I can filter out desired records directly using the describe method without having to deal with data masking, etc.?

5
  • .@horcle What is the output you are after? Commented Jun 13, 2016 at 4:03
  • @jezrael apparently understood what I wanted: I want to set a threshold for the counts in the describe output, and then find out which garden_ids match this criterion. Commented Jun 13, 2016 at 19:24
  • .@horcle. You are not limiting the output of .describe(). The output are the same. Commented Jun 13, 2016 at 19:31
  • And, I didn't want just the counts: I also wanted all the other summary statistics associated with each garden_id. Commented Jun 13, 2016 at 20:03
  • @Merlin, I edited the question to reflect what I really was looking to do. Commented Jun 15, 2016 at 0:54

2 Answers 2

3

I think you can use slicers for selecting and then boolean indexing for finding index idx1, where mask is True:

import pandas as pd

df = pd.DataFrame({'score':[1,2,3,3,1,2],
                   'garden_id':[1,1,1,1,2,2]})

print (df)
   garden_id  score
0          1      1
1          1      2
2          1      3
3          1      3
4          2      1
5          2      2

newframe = df.groupby('garden_id').describe()
print (newframe)
                    score
garden_id                
1         count  4.000000
          mean   2.250000
          std    0.957427
          min    1.000000
          25%    1.750000
          50%    2.500000
          75%    3.000000
          max    3.000000
2         count  2.000000
          mean   1.500000
          std    0.707107
          min    1.000000
          25%    1.250000
          50%    1.500000
          75%    1.750000
          max    2.000000
idx = pd.IndexSlice
mask = newframe.loc[idx[:,'count'],:] > 3
print (mask)
                 score
garden_id             
1         count   True
2         count  False

idx1 = mask[mask.values].index.get_level_values('garden_id')
print (idx1)
Int64Index([1], dtype='int64', name='garden_id')

print (newframe.loc[idx[idx1,:],:])
                    score
garden_id                
1         count  4.000000
          mean   2.250000
          std    0.957427
          min    1.000000
          25%    1.750000
          50%    2.500000
          75%    3.000000
          max    3.000000
Sign up to request clarification or add additional context in comments.

Comments

0

If you looking for counts of score greater than 3 Try this:

df2 =  df.groupby('garden_id').count()
df2[df2['score']> 3]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.