How to replace every NaN in a column with different random values using pandas?

Question

I have been playing with pandas lately, and I am now trying to replace the NaN values in a DataFrame with different random values drawn from a normal distribution.

Assuming I have this CSV file without header

My expected result should be something like this

       0
0     343
1     483
2     101
3     randomnumber1
4     randomnumber2
5     randomnumber3

But instead I got the following:

       0
0     343
1     483
2     101
3     randomnumber1
4     randomnumber1
5     randomnumber1    # all NaN filled with same number

My code so far

import numpy as np
import pandas as pd

df = pd.read_csv("testfile.csv", header=None)
mu, sigma = df.mean(), df.std()
norm_dist = np.random.normal(mu, sigma, 1)
for i in norm_dist:
    print df.fillna(i)

I am thinking of counting the NaN rows in the DataFrame and replacing the 1 in np.random.normal(mu, sigma, 1) with that count, so that each NaN can get a different value.

But I want to ask: is there another, simpler way to do this?

Thank you for your help and suggestions.

Did either of the posted solutions work for you?
– Divakar, Commented Oct 7, 2017 at 5:54

Both solutions are working just fine.
– Fang, Commented Oct 7, 2017 at 13:22

Divakar · Accepted Answer · 2017-10-03 11:20:24Z

9

Here's one way, working with the underlying array data -

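import numpy as np
import pandas as pd

def fillNaN_with_unifrand(df):
    # Despite the name, the fill values are drawn from a normal distribution.
    a = df.to_numpy(dtype=float, copy=True)  # writable copy of the array data
    m = np.isnan(a)  # mask of NaNs
    # Scalar mean/std over the non-NaN values (ddof=1 matches df.std())
    mu, sigma = np.nanmean(a), np.nanstd(a, ddof=1)
    a[m] = np.random.normal(mu, sigma, size=m.sum())
    return pd.DataFrame(a, index=df.index, columns=df.columns)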

In essence, we generate all the random numbers in one go, passing the count of NaNs as the size param of np.random.normal, and then assign them in one go using the same NaN mask.

Sample run -

In [435]: df
Out[435]: 
       0
0  343.0
1  483.0
2  101.0
3    NaN
4    NaN
5    NaN

In [436]: fillNaN_with_unifrand(df)
Out[436]: 
            0
0  343.000000
1  483.000000
2  101.000000
3  138.586483
4  223.454469
5  204.464514
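
The mask-and-fill idea above can also be sketched for a single Series with NumPy's Generator API, which makes the fill repeatable when a seed is given. The function name fill_nan_normal and the seed parameter are illustrative assumptions, not part of the answer, and the sample values are taken from the question.

```python
import numpy as np
import pandas as pd

def fill_nan_normal(series, seed=None):
    """Return a copy of `series` with each NaN replaced by its own normal draw."""
    rng = np.random.default_rng(seed)        # seeded generator for repeatable fills
    mask = series.isna()                     # True where a value is missing
    mu, sigma = series.mean(), series.std()  # pandas skips NaN by default
    filled = series.astype(float)            # astype returns a new Series
    # One draw per missing value, assigned through the boolean mask
    filled[mask] = rng.normal(mu, sigma, size=int(mask.sum()))
    return filled

s = pd.Series([343, 483, 101, np.nan, np.nan, np.nan])
result = fill_nan_normal(s, seed=0)
print(result)
```

Because the function works on a copy, the input Series keeps its NaNs, and calling it again with the same seed gives the same fill values.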

edited Oct 3, 2017 at 11:20

answered Oct 3, 2017 at 11:11


2 Comments

Fang Over a year ago

I take it that you are showing me what I should have done if I wanted to use my approach of counting the NaN rows, right? I did not think of it this way at first. Thank you for showing it.

Divakar Over a year ago

@Fang Yes, m.sum() gives the count of NaNs, which can be passed to np.random.normal() as the size param. That produces exactly the number of random values needed in one go, which makes the solution vectorized.

Mir_Murtaza · Accepted Answer · 2018-03-05 17:50:20Z

4

It is simple to impute random values in place of missing values in a pandas DataFrame column.

import numpy as np

mean = df['column'].mean()
std = df['column'].std()

def fill_missing_from_Gaussian(column_val):
    # Draw a fresh scalar sample for each NaN; leave other values unchanged
    if np.isnan(column_val):
        return np.random.normal(mean, std)
    return column_val

Now just apply the above method to a column with missing values.

df['column'] = df['column'].apply(fill_missing_from_Gaussian)
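
If calling a Python function once per element is a concern, one possible alternative (a sketch, not part of the answer above) relies on Series.fillna accepting a Series that is aligned by index: draw one candidate per row and let fillna use only the ones at NaN positions. The column name 'column' follows the answer; the sample data and the seed are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seed chosen arbitrarily for repeatability

# Hypothetical sample data built from the question's three known values
df = pd.DataFrame({'column': [343, 483, 101, np.nan, np.nan, np.nan]})

mean = df['column'].mean()
std = df['column'].std()

# One candidate draw per row, sharing the DataFrame's index
candidates = pd.Series(rng.normal(mean, std, size=len(df)), index=df.index)

# fillna aligns on the index, so only NaN positions take a candidate value
df['column'] = df['column'].fillna(candidates)
print(df)
```

This draws more random numbers than are strictly needed (one per row rather than one per NaN), which is a trade-off for the shorter code.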

answered Mar 5, 2018 at 17:50


jezrael · Accepted Answer · 2017-10-03 11:18:06Z

1

I think you need:

mu, sigma = df[0].mean(), df[0].std()
# get mask of NaNs
a = df[0].isnull()
# draw as many random values as there are NaNs (each True counts as 1 in the sum)
norm_dist = np.random.normal(mu, sigma, a.sum())
print(norm_dist)
[ 184.90581318  364.89367364  181.46335348]
# assign values by mask
df.loc[a, 0] = norm_dist
print(df)

            0
0  343.000000
1  483.000000
2  101.000000
3  184.905813
4  364.893674
5  181.463353
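
The same mask-based assignment could be extended to a DataFrame with several numeric columns by looping over the columns, so that each column is filled from its own mean and standard deviation. This is only a sketch: the function name fill_nan_per_column, the seed parameter, and the second sample column "b" are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def fill_nan_per_column(df, seed=None):
    """Fill NaNs in each numeric column with draws based on that column's mean/std."""
    rng = np.random.default_rng(seed)
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in out.select_dtypes(include="number").columns:
        mask = out[col].isna()
        n_missing = int(mask.sum())
        if n_missing == 0:
            continue  # nothing to fill in this column
        mu, sigma = out[col].mean(), out[col].std()
        out.loc[mask, col] = rng.normal(mu, sigma, size=n_missing)
    return out

frame = pd.DataFrame({
    "a": [343.0, 483.0, 101.0, np.nan, np.nan, np.nan],
    "b": [1.0, np.nan, 3.0, 4.0, np.nan, 6.0],
})
filled = fill_nan_per_column(frame, seed=1)
print(filled)
```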

edited Oct 3, 2017 at 11:18

answered Oct 3, 2017 at 11:09
