Replace NaN with a random value every row

Question

I have a dataset with a column 'Self_Employed'. The column contains the values 'Yes', 'No' and NaN. I want to replace each NaN with a value computed by calc(). I've tried several methods I found here, but none of them worked for my case. Here is my code; the things I've tried are in the comments:

# Handling missing data - Self_Employed
from random import randint  # assumed; the original post does not show the import

SEyes = (df['Self_Employed']=='Yes').sum()
SEno = (df['Self_Employed']=='No').sum()

def calc():
    rand_SE = randint(0,(SEno+SEyes))
    if rand_SE > 81:
        return 'No'
    else:
        return 'Yes'


# df['Self_Employed'] = df['Self_Employed'].fillna(randint(0,100))
# df['Self_Employed'].isnull().apply(lambda v: calc())

# df[df['Self_Employed'].isnull()] = df[df['Self_Employed'].isnull()].apply(lambda v: calc())
# df[df['Self_Employed']]

# df_nan['Self_Employed'] = df_nan['Self_Employed'].isnull().apply(lambda v: calc())
# df_nan

# for i in range(df['Self_Employed'].isnull().sum()):
#     print(df.Self_Employed[i])


df[df['Self_Employed'].isnull()] = df[df['Self_Employed'].isnull()].apply(lambda v: calc())
df

The line where I tried it with df_nan seems to work, but it leaves me with a separate set containing only the formerly missing values, whereas I want to fill the missing values in the whole dataset. For the last line I'm getting an error; I linked a screenshot of it. Do you understand my problem, and if so, can you help?

[Screenshots in the original post: the set with only the rows where Self_Employed is NaN; the original dataset; the error]

Charles R · Accepted Answer · 2018-11-08 14:17:59Z

1

Make sure that SEno + SEyes is not 0, and use the .loc method to set the value of Self_Employed where it is missing:

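import numpy as np

# The +1 keeps the upper bound above 0, so np.random.randint does not fail
SEyes = (df['Self_Employed']=='Yes').sum() + 1
SEno = (df['Self_Employed']=='No').sum()

def calc():
    rand_SE = np.random.randint(0, SEno + SEyes)
    if rand_SE >= 81:
        return 'No'
    else:
        return 'Yes'

# Assign only to the Self_Employed cells that are missing; calc() runs once per row
df.loc[df['Self_Employed'].isna(), 'Self_Employed'] = df.loc[df['Self_Employed'].isna(), 'Self_Employed'].apply(lambda x: calc())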
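For a self-contained illustration of the same `.loc` pattern, here is a minimal sketch on a toy DataFrame. The toy data, the seed, and the helper name `random_label` are assumptions made for the example, and the threshold is derived from the observed 'Yes' count instead of the hard-coded 81, so this is a variation on the answer rather than a copy of it.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical), standing in for the real dataset.
df = pd.DataFrame({
    "Self_Employed": ["Yes", "No", np.nan, "No", np.nan, "No", "Yes", np.nan],
    "Income": [10, 20, 30, 40, 50, 60, 70, 80],
})

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

se_yes = (df["Self_Employed"] == "Yes").sum()
se_no = (df["Self_Employed"] == "No").sum()

def random_label():
    # Draw an integer in [0, se_yes + se_no); the first se_yes outcomes map to 'Yes'.
    draw = rng.integers(0, se_yes + se_no)
    return "Yes" if draw < se_yes else "No"

mask = df["Self_Employed"].isna()
# Assign only to the missing cells of this one column; random_label runs once per row.
df.loc[mask, "Self_Employed"] = df.loc[mask, "Self_Employed"].apply(lambda _: random_label())
```

Because both the row mask and the column name go into a single `.loc` call, the other columns of the affected rows are left untouched.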


2 Comments

Manolo Viso Romero Over a year ago

This worked! I thank you for your help. Why the +1 though?

Charles R Over a year ago

Just in case SEno + SEyes == 0, because np.random.randint(0, 0) raises an error.
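The failure mode mentioned here can be checked directly. The sketch below shows `np.random.randint(0, 0)` raising `ValueError`, together with one possible explicit guard as an alternative to adding 1 to a count. The helper name `safe_draw` is hypothetical and not part of the answer.

```python
import numpy as np

def safe_draw(total):
    # Hypothetical helper: refuse to draw when there are no observed labels,
    # instead of inflating one of the counts by 1.
    if total <= 0:
        raise ValueError("no 'Yes'/'No' values to base the draw on")
    return np.random.randint(0, total)  # integer in [0, total)

try:
    np.random.randint(0, 0)  # empty range: low is not below high
    empty_range_failed = False
except ValueError:
    empty_range_failed = True
```

An explicit check like this avoids shifting the 'Yes'/'No' ratio, at the cost of having to handle the empty case separately.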

Josh Friedlander · Accepted Answer · 2018-11-08 14:12:27Z

0

What about df['Self_Employed'] = df['Self_Employed'].fillna(calc())?


1 Comment

Manolo Viso Romero Over a year ago

This calls calc() only once and uses that single result for every row, instead of doing the calculation per row. I want the NaNs to be filled with a semi-random mix of 'Yes' and 'No'.
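The behaviour described in this comment is easy to reproduce. The following sketch uses a stand-in `calc()` that records each call (the 50/50 split, the toy Series, and the `calls` list are assumptions for illustration) and contrasts a scalar fill with a per-row fill.

```python
import numpy as np
import pandas as pd

calls = []  # one entry is appended per calc() invocation

def calc():
    # Stand-in for the question's calc(); records how often it is invoked.
    label = "Yes" if np.random.rand() < 0.5 else "No"
    calls.append(label)
    return label

s = pd.Series(["Yes", np.nan, "No", np.nan, np.nan], name="Self_Employed")

# The argument is evaluated before fillna runs, so calc() is called once
# and the same label is used for every missing entry.
filled_once = s.fillna(calc())
calls_scalar = len(calls)

# Building one value per missing row gives independent draws.
mask = s.isna()
per_row = pd.Series([calc() for _ in range(mask.sum())], index=s.index[mask])
filled_per_row = s.fillna(per_row)
calls_total = len(calls)
```

Passing a Series to `fillna` aligns on the index, so each missing position receives its own value.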

Lukas Thaler · Accepted Answer · 2018-11-08 15:01:25Z

0

You could first identify the locations of your NaNs like

na_loc = df.index[df['Self_Employed'].isnull()]

Count the number of NaNs in your column like

num_nas = len(na_loc)

Then generate the same number of random values, already indexed by those locations

fill_values = pd.DataFrame({'Self_Employed': [random.randint(0,100) for i in range(num_nas)]}, index = na_loc)

And finally substitute those values in your dataframe

df.loc[na_loc, 'Self_Employed'] = fill_values['Self_Employed']
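The steps above can be sketched end to end as follows. This is a variation under stated assumptions, not the answer's exact code: the toy DataFrame, the seed, and the use of `rng.choice` weighted by the observed label shares (to produce 'Yes'/'No' instead of integers) are all choices made for the example.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical), standing in for the real dataset.
df = pd.DataFrame({
    "Self_Employed": ["No", np.nan, "Yes", "No", np.nan, "No"],
    "Credit_History": [1, 0, 1, 1, 0, 1],
})

rng = np.random.default_rng(42)  # seeded only to make the sketch repeatable

# Row labels of the missing entries.
na_loc = df.index[df["Self_Employed"].isna()]
# Observed share of each label among the non-missing values.
shares = df["Self_Employed"].value_counts(normalize=True)

# One independent draw per missing row, indexed by the rows it belongs to.
fill_values = pd.Series(
    rng.choice(shares.index.to_numpy(), size=len(na_loc), p=shares.to_numpy()),
    index=na_loc,
)

# Single .loc call with row labels and column name, so the assignment
# reaches the original frame and leaves the other columns untouched.
df.loc[na_loc, "Self_Employed"] = fill_values
```

Drawing all values in one call avoids a Python-level function call per row, which may matter on larger datasets.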


2 Comments

Manolo Viso Romero Over a year ago

So this did in fact fill the NaNs I intended to fill in my df, but it also replaced all the other values in the same rows with NaN. Row 11, for example, is now: NaN NaN NaN NaN NaN No NaN NaN NaN NaN NaN.

Lukas Thaler Over a year ago

That is because I forgot to select the Self_Employed column in the assign statement. It is corrected now
