Here is my Python question:

I am asked to generate an output table that contains, for each variable (there are more than 10 variables in the data), the number of NaN values along with min, max, mean, std, 25%, 50%, and 75%. I used the describe function in pandas to create the table, which gave me everything I want except the number of NaN values in each variable. I am thinking about adding the NaN counts as a new row in the output generated by describe.

Can anyone help with this?

import numpy as np

output = input_data.describe(include=[np.number])  # summary table for the numeric columns

count_nan = input_data.isnull().sum(axis=0)  # number of NaN values in each column

How can I add the second result as a row in the first table?

1 Answer

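You could use .append to append a new row to a DataFrame (note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so this works only on older versions):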

In [21]: output.append(pd.Series(count_nan, name='nans'))
Out[21]: 
              0         1         2         3         4
count  4.000000  4.000000  4.000000  4.000000  4.000000
mean   0.583707  0.578610  0.566523  0.480307  0.540259
std    0.142930  0.358793  0.309701  0.097326  0.277490
min    0.450488  0.123328  0.151346  0.381263  0.226411
25%    0.519591  0.406628  0.478343  0.406436  0.429003
50%    0.549012  0.610845  0.607350  0.478787  0.516508
75%    0.613127  0.782827  0.695530  0.552658  0.627764
max    0.786316  0.969421  0.900046  0.582391  0.901610
nans   0.000000  0.000000  0.000000  0.000000  0.000000
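
On newer pandas versions, where DataFrame.append is no longer available, the same row can be added with pd.concat. The following is only a minimal sketch: the sample input_data, its column names, and the nan_row and result variables are made up for illustration and are not from the question. It also restricts the NaN counts to the columns that describe actually returned, on the assumption that non-numeric columns should not show up in the final table.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's input_data
input_data = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, 1.5, 2.5],
    "c": ["x", "y", None, "z"],  # non-numeric, so describe() below skips it
})

output = input_data.describe(include=[np.number])
count_nan = input_data.isnull().sum(axis=0)

# Keep only the columns present in the describe table, then turn the
# Series into a one-row DataFrame whose index label is 'nans'
nan_row = count_nan[output.columns].rename("nans").to_frame().T

# Stack the describe table and the NaN-count row
result = pd.concat([output, nan_row])
print(result)
```

Note that the count row from describe already reports the number of non-NaN values, so the nans row plus the count row should equal the number of rows in input_data for each column.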