30

Is there a way to omit some of the output from the pandas describe? This command gives me exactly what I want with a table output (count and mean of executeTime's by a simpleDate)

df.groupby('simpleDate').executeTime.describe().unstack(1)

However that's all I want, count and mean. I want to drop std, min, max, etc... So far I've only read how to modify column size.

I'm guessing the answer is going to be to re-write the line, not using describe, but I haven't had any luck grouping by simpleDate and getting the count with a mean on executeTime.

I can do count by date:

df.groupby(['simpleDate']).size()

or executeTime by date:

df.groupby(['simpleDate']).mean()['executeTime'].reset_index()

But can't figure out the syntax to combine them.

My desired output:

            count  mean  
09-10-2013      8  20.523   
09-11-2013      4  21.112  
09-12-2013      3  18.531
...            ..  ...

6 Answers 6

48

.describe() attribute generates a Dataframe where count, std, max ... are values of the index, so according to the documentation you should use .loc to retrieve just the index values desired:

df.describe().loc[['count','max']]
Sign up to request clarification or add additional context in comments.

Comments

33

Describe returns a series, so you can just select out what you want

In [6]: s = Series(np.random.rand(10))

In [7]: s
Out[7]: 
0    0.302041
1    0.353838
2    0.421416
3    0.174497
4    0.600932
5    0.871461
6    0.116874
7    0.233738
8    0.859147
9    0.145515
dtype: float64

In [8]: s.describe()
Out[8]: 
count    10.000000
mean      0.407946
std       0.280562
min       0.116874
25%       0.189307
50%       0.327940
75%       0.556053
max       0.871461
dtype: float64

In [9]: s.describe()[['count','mean']]
Out[9]: 
count    10.000000
mean      0.407946
dtype: float64

2 Comments

thanks so much, I tried something like that but had the syntax off. works great
Describe returns Series or Dataframe depending upon what you apply it at... This method just works in case of Series
21

Looking at the answers, I don't see one that actually works on a DataFrame returned from describe() after using groupby().

The documentation on MultiIndex selection gives a hint at the answer. The .xs() function works for one but not multiple selections, but .loc works.

df.groupby(['simpleDate']).describe().loc[:,(slice(None),['count','max'])]

This keeps the nice MultiIndex returned by .describe() but with only the columns selected.

2 Comments

This is great but the loc syntax is wrong. It should be loc[…] (ie with square brackets).
This should be the accepted answer since it's the only one that works for DFs (with multiple columns that are "described"). The accepted answer only works for series
6

The solution @Jeff provided just works for series.

@Rafa is on the point: df.describe().info() reveals that the resulting dataframe has Index: 8 entries, count to max

df.describe().loc[['count','max']] does work, but df.groupby('simpleDate').describe().loc[['count','max']], which is what the OP asked, does not work.

I think a solution may be this:

df = pd.DataFrame({'Y': ['A', 'B', 'B', 'A', 'B'],
                    'Z': [10, 5, 6, 11, 12],
                                        })

grouping the df by Y:

df_grouped=df.groupby(by='Y')     


In [207]df_grouped.agg([np.mean, len])

Out[207]: 
        Z    
     mean len
Y            
A  10.500   2
B   7.667   3

Comments

1

Sticking with describe, you can unstack the indexes and then slice normally too

df.describe().unstack()[['count','max']]

Comments

1

Why do you want to use describe in first hand and generating more than you need to just discard it? Just generate agg instead and get directly what you want:

df.groupby('simpleDate').executeTime.agg(['count','max'])

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.