
I am trying to get the output of .describe() into a different layout. Here is the CSV data (testProp.csv):

name,prop
A,1
A,2
B,  4
A,  3
B,  5
B,  2

When I run the following:

import pandas as pd

data = pd.read_csv('testProp.csv')

temp = data.groupby('name')['prop'].describe()
temp.to_csv('out.csv')

The output is:

name       
A     count    3.000000
      mean     2.000000
      std      1.000000
      min      1.000000
      25%      1.500000
      50%      2.000000
      75%      2.500000
      max      3.000000
B     count    3.000000
      mean     3.666667
      std      1.527525
      min      2.000000
      25%      3.000000
      50%      4.000000
      75%      4.500000
      max      5.000000
dtype: float64

However, I want the data in the format below. I have tried transpose(), and I would like to keep using describe() and reshape its result rather than switch to .agg([np.mean, np.max, ...]):

    count  mean         std          min  25%  50%  75%  max
A   3      2            1            1    1.5  2    2.5  3
B   3      3.666666667  1.527525232  2    3    4    4.5  5

3 Answers

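You can achieve that by running the code below:

import pandas as pd

data = pd.read_csv('testProp.csv')
# Transpose the summary so each statistic becomes a column
# (one row per numeric column of the whole frame)
data.describe().T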
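
For a per-group summary (one row per name), here is a minimal sketch. It assumes a recent pandas release, where describe() on a grouped column already returns a DataFrame with one column per statistic; the isinstance check is a fallback for older releases that return a long Series. The inline CSV text stands in for testProp.csv.

```python
import io
import pandas as pd

# Inline copy of the question's data (stands in for testProp.csv)
csv_text = """name,prop
A,1
A,2
B,4
A,3
B,5
B,2
"""
data = pd.read_csv(io.StringIO(csv_text))

summary = data.groupby('name')['prop'].describe()

# Assumption: older pandas returns a Series indexed by (name, statistic);
# unstack() moves the statistic level into columns in that case.
if isinstance(summary, pd.Series):
    summary = summary.unstack()

print(summary)
```

The resulting frame has one row per group and can be written out with summary.to_csv('out.csv').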

One way to do this is to call .reset_index() on the describe() result to turn its index levels into regular columns, and then use DataFrame.pivot to reshape it into the layout you want. Example:

In [24]: df = pd.read_csv(io.StringIO("""name,prop
   ....: A,1
   ....: A,2
   ....: B,  4
   ....: A,  3
   ....: B,  5
   ....: B,  2"""))

In [25]: temp = df.groupby('name')['prop'].describe().reset_index()

In [26]: newdf = temp.pivot(index='name',columns='level_1',values=0)

In [27]: newdf.columns.name = ''   #This is needed so that the name of the columns is not `'level_1'` .

In [28]: newdf
Out[28]:
      25%  50%  75%  count  max      mean  min       std
name
A     1.5    2  2.5      3    3  2.000000    1  1.000000
B     3.0    4  4.5      3    5  3.666667    2  1.527525

Then you can save newdf to CSV with newdf.to_csv().
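
The pivoted columns come out in alphabetical order rather than the order describe() uses. A small sketch of restoring that order before saving follows; the frame is rebuilt by hand from the output shown above, and stat_order, ordered, and the in-memory buffer are illustrative names, not part of the original answer.

```python
import io
import pandas as pd

# Hand-built copy of newdf, using the values from the output shown above
newdf = pd.DataFrame(
    {
        '25%': [1.5, 3.0], '50%': [2.0, 4.0], '75%': [2.5, 4.5],
        'count': [3.0, 3.0], 'max': [3.0, 5.0], 'mean': [2.0, 3.666667],
        'min': [1.0, 2.0], 'std': [1.0, 1.527525],
    },
    index=pd.Index(['A', 'B'], name='name'),
)

# Column order requested in the question
stat_order = ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
ordered = newdf[stat_order]

# Write to an in-memory buffer here; pass a path such as 'out.csv' to write a file
buffer = io.StringIO()
ordered.to_csv(buffer)
csv_out = buffer.getvalue()
print(csv_out)
```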


In pandas v0.22, you can use unstack. Building on @Kumar's answer above, you can use the pandas stack/unstack methods and experiment with their variations.

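from io import StringIO
import pandas as pd

# CSV rows are kept flush left so the group names carry no leading spaces
df = pd.read_csv(StringIO("""name,prop
A,1
A,2
B,  4
A,  3
B,  5
B,  2"""))

print(df.shape)
print(df)

temp = df.groupby(['name'])['prop'].describe()
print(temp)

# Reshape with stack() or unstack(); unstack accepts a level such as -1 or 0
print(temp.stack())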

See the pandas documentation for unstack for more details.
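
To make the level argument concrete, here is a short sketch of moving between the long and wide shapes. It assumes a pandas release in which the grouped describe() returns a wide DataFrame (with a fallback for a long Series); the names wide, long_form, back, and by_stat are illustrative.

```python
import io
import pandas as pd

df = pd.read_csv(io.StringIO("name,prop\nA,1\nA,2\nB,4\nA,3\nB,5\nB,2\n"))

wide = df.groupby('name')['prop'].describe()
# Assumption: older pandas returns a long Series here; make it wide first
if isinstance(wide, pd.Series):
    wide = wide.unstack()

# stack() folds the statistic columns into the index: a Series keyed by (name, statistic)
long_form = wide.stack()

# unstack(level=-1) moves the innermost level (the statistic) back into columns
back = long_form.unstack(level=-1)

# unstack(level=0) moves the group names into columns instead
by_stat = long_form.unstack(level=0)

print(long_form)
print(back)
print(by_stat)
```

Here level=-1 gives one row per group, which is the layout asked for in the question, while level=0 gives one row per statistic.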
