Modify output from Python Pandas describe

Question

Is there a way to omit some of the output from pandas describe? This command gives me exactly what I want as a table (the count and mean of executeTime by simpleDate):

df.groupby('simpleDate').executeTime.describe().unstack(1)

However, count and mean are all I want; I want to drop std, min, max, etc. So far I've only read how to modify column size.

I'm guessing the answer is to rewrite the line without describe, but I haven't had any luck grouping by simpleDate and getting the count along with the mean of executeTime.

I can do count by date:

df.groupby(['simpleDate']).size()

or executeTime by date:

df.groupby(['simpleDate']).mean()['executeTime'].reset_index()

But can't figure out the syntax to combine them.

My desired output:

            count    mean
09-10-2013      8  20.523
09-11-2013      4  21.112
09-12-2013      3  18.531
...            ..     ...
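The two partial results the question already has can also be joined directly: both are indexed by simpleDate, so pd.concat can place them side by side. A minimal sketch, with hypothetical sample data (only the column names simpleDate and executeTime come from the question):

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "simpleDate": ["09-10-2013", "09-10-2013", "09-11-2013", "09-11-2013", "09-11-2013"],
    "executeTime": [20.0, 21.0, 18.0, 22.0, 23.0],
})

grouped = df.groupby("simpleDate")

# size() counts rows per date; the second Series averages executeTime per date.
# pd.concat aligns them on the shared simpleDate index; keys become column names.
summary = pd.concat(
    [grouped.size(), grouped["executeTime"].mean()],
    axis=1,
    keys=["count", "mean"],
)
print(summary)
```

Note that size() counts every row in a group, including rows where executeTime is NaN; grouped["executeTime"].count() excludes missing values instead.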

Rafa · Accepted Answer · 2022-07-17 09:52:40Z

48

On a DataFrame, .describe() returns a DataFrame whose index holds the statistic names (count, mean, std, min, ..., max), so, as the indexing documentation describes, you can use .loc to select just the rows you want:

df.describe().loc[['count','max']]
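A self-contained sketch of the same idea on a small hypothetical DataFrame, selecting count and mean (the statistics the question asks for) instead of count and max:

```python
import pandas as pd

# Hypothetical frame with two numeric columns.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10.0, 20.0, 30.0, 40.0]})

stats = df.describe()                  # rows: count, mean, std, min, 25%, 50%, 75%, max
subset = stats.loc[["count", "mean"]]  # keep only the rows we want, all columns
print(subset)
```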

edited Jul 17, 2022 at 9:52

answered Sep 11, 2015 at 7:26

Rafa


Jeff · Accepted Answer · 2013-10-01 19:31:15Z

33

When called on a Series, describe returns a Series indexed by statistic name, so you can just select the entries you want:

In [6]: s = Series(np.random.rand(10))

In [7]: s
Out[7]: 
0    0.302041
1    0.353838
2    0.421416
3    0.174497
4    0.600932
5    0.871461
6    0.116874
7    0.233738
8    0.859147
9    0.145515
dtype: float64

In [8]: s.describe()
Out[8]: 
count    10.000000
mean      0.407946
std       0.280562
min       0.116874
25%       0.189307
50%       0.327940
75%       0.556053
max       0.871461
dtype: float64

In [9]: s.describe()[['count','mean']]
Out[9]: 
count    10.000000
mean      0.407946
dtype: float64
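A reproducible version of the session above, assuming fixed values in place of np.random.rand so the output is deterministic:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

desc = s.describe()                  # a Series indexed by count, mean, std, ...
picked = desc[["count", "mean"]]     # label-based selection of two entries
print(picked)                        # count 4.0, mean 2.5

# On a DataFrame, describe() returns a DataFrame, and [[...]] would select
# columns rather than statistic rows, so this form only suits a Series.
```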

answered Oct 1, 2013 at 19:31

Jeff


2 Comments

KHibma Over a year ago

thanks so much, I tried something like that but had the syntax off. works great

MANU Over a year ago

describe returns a Series or a DataFrame depending on what you apply it to... This method only works for a Series.

Josh Ziegler · Accepted Answer · 2020-12-16 20:09:38Z

21

Looking at the answers, I don't see one that actually works on a DataFrame returned from describe() after using groupby().

The documentation on MultiIndex selection hints at the answer. The .xs() method can select one statistic but not several, whereas .loc can:

df.groupby(['simpleDate']).describe().loc[:,(slice(None),['count','max'])]

This keeps the nice MultiIndex returned by .describe() but with only the columns selected.
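A self-contained sketch of this .loc selection on grouped describe() output, with hypothetical data (the retries column is invented so that two columns get described). It uses pd.IndexSlice, which expresses the same selection as the slice(None) tuple above, and also shows .xs() for the single-statistic case:

```python
import pandas as pd

# Hypothetical data; "retries" is invented so describe() covers two columns.
df = pd.DataFrame({
    "simpleDate": ["09-10-2013", "09-10-2013", "09-11-2013"],
    "executeTime": [20.0, 22.0, 18.0],
    "retries": [0, 2, 1],
})

# Columns are (original column, statistic) pairs, e.g. ("executeTime", "mean").
desc = df.groupby("simpleDate").describe()

# Every original column, but only the count and mean statistics.
idx = pd.IndexSlice
subset = desc.loc[:, idx[:, ["count", "mean"]]]

# .xs() picks a single statistic and drops that column level.
means_only = desc.xs("mean", axis=1, level=1)
print(subset)
print(means_only)
```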

edited Dec 16, 2020 at 20:09

answered Nov 18, 2020 at 21:26

Josh Ziegler


2 Comments

Seth Over a year ago

This is great, but the loc syntax is wrong. It should be loc[…] (i.e. with square brackets).

dopexxx Over a year ago

This should be the accepted answer since it's the only one that works for DFs (with multiple columns that are "described"). The accepted answer only works for series

st19297 · Accepted Answer · 2016-11-22 23:45:22Z

6

The solution @Jeff provided only works for a Series.

@Rafa is on point: df.describe().info() reveals that the resulting DataFrame has an index of 8 entries, count to max.

df.describe().loc[['count','max']] does work, but df.groupby('simpleDate').describe().loc[['count','max']], which is what the OP asked about, does not.

I think a solution may be this:

import pandas as pd

df = pd.DataFrame({'Y': ['A', 'B', 'B', 'A', 'B'],
                   'Z': [10, 5, 6, 11, 12]})

Grouping the df by Y:

df_grouped = df.groupby(by='Y')

In [207]: df_grouped.agg(['mean', len])

Out[207]: 
        Z    
     mean len
Y            
A  10.500   2
B   7.667   3
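A related sketch on the same toy frame: selecting Z before calling agg returns flat count and mean columns, in the order the question asks for, rather than a (Z, statistic) MultiIndex:

```python
import pandas as pd

df = pd.DataFrame({"Y": ["A", "B", "B", "A", "B"],
                   "Z": [10, 5, 6, 11, 12]})

# Selecting the single column Z first gives plain "count" and "mean" columns.
result = df.groupby("Y")["Z"].agg(["count", "mean"])
print(result)
```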

edited Nov 22, 2016 at 23:45

answered Nov 22, 2016 at 23:13

st19297


Geoff Counihan · Accepted Answer · 2017-10-12 03:49:14Z

1

Sticking with describe, you can unstack the indexes and then slice normally too

df.describe().unstack()[['count','max']]
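In recent pandas releases (an assumption about current behavior, not necessarily the version the question was asked against), describe() on a single grouped column already returns a DataFrame with one row per group and one column per statistic, so plain column selection works without unstacking. A minimal sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data using the question's column names.
df = pd.DataFrame({
    "simpleDate": ["09-10-2013", "09-10-2013", "09-11-2013"],
    "executeTime": [20.0, 22.0, 18.0],
})

# One row per date, one column per statistic (count, mean, std, ...).
desc = df.groupby("simpleDate")["executeTime"].describe()
out = desc[["count", "mean"]]
print(out)
```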

answered Oct 12, 2017 at 3:49

Geoff Counihan


Steffen · Accepted Answer · 2024-07-03 06:13:09Z

1

Why use describe in the first place, generating more than you need only to discard it? Use agg instead and get exactly what you want directly:

df.groupby('simpleDate').executeTime.agg(['count','max'])
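A variant of the agg approach using named aggregation (supported since pandas 0.25), which also lets you choose the output column names. The data below is hypothetical, and count and mean replace count and max to match the desired output in the question:

```python
import pandas as pd

# Hypothetical data using the question's column names.
df = pd.DataFrame({
    "simpleDate": ["09-10-2013", "09-10-2013", "09-11-2013"],
    "executeTime": [20.0, 22.0, 18.0],
})

# Each keyword names an output column: (source column, aggregation function).
summary = df.groupby("simpleDate").agg(
    count=("executeTime", "count"),
    mean=("executeTime", "mean"),
)
print(summary)
```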

answered Jul 3, 2024 at 6:13

Steffen
