Random number from column

Question

The goal is to fill the NaN values in a column with a random number chosen from that same column.

I can do this one column at a time, but when iterating through all the columns in the data frame I get a variety of errors. When I use "random.choice" I get letters rather than column values.

df1 = df_na
df2 = df_nan.dropna()

for i in range(5):
    for j in range(len(df1)):
        if np.isnan(df1.iloc[j, i]):
            df1.iloc[j, i] = np.random.choice(df2.columns[i])

df1

Any suggestions on how to move forward?

Please add a small sample input and the corresponding expected output.
– Dani Mesejo, Commented Jan 23, 2019 at 21:45

YOLO · Accepted Answer · 2019-01-23 22:33:56Z

1

You can do:

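import pandas as pd

# sample data
df = pd.DataFrame({'a': [1, 2, None, 18, 20, None],
                   'b': [22, 33, 44, None, 100, 32]})

# fill missing values with one random value drawn from that column
# (assign the result back; calling fillna(..., inplace=True) on df[col]
# triggers a chained-assignment warning in recent pandas versions)
for col in df.columns:
    df[col] = df[col].fillna(df[col].dropna().sample().values[0])

# one possible result (the fill value is random, so it varies between runs)
      a      b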
0   1.0     22.0
1   2.0     33.0
2   20.0    44.0
3   18.0    100.0
4   20.0    100.0
5   20.0    32.0
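
As a side note on why the code in the question produced letters: df2.columns[i] is the column label (a string), not the data stored under it, so a random choice over it picks from the label's characters. The following is a minimal sketch of the difference, assuming a hypothetical one-column frame with a column named 'speed' (not from the original post):

```python
import random

import numpy as np
import pandas as pd

# hypothetical sample frame, used only for illustration
df = pd.DataFrame({'speed': [1.0, 2.0, None, 18.0]})

label = df.columns[0]          # the column label, i.e. the string 'speed'
values = df[label].dropna()    # the non-missing values stored in that column

# random.choice on a string returns one of its characters
picked_from_label = random.choice(label)

# np.random.choice on the values returns an actual entry of the column
picked_from_values = np.random.choice(values.to_numpy())
```

Indexing the frame with the label (df[label]) rather than using the label itself is what gives access to the column's values.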

2 Comments

JustHereChillin Over a year ago

Thanks, that worked perfectly! This approach is different from others I have seen, so it was helpful and informative.

JustHereChillin Over a year ago

Follow-up question: this method made it so that all NaN values in a given column were replaced by the same value. Is there a method so that each row of a column is treated independently and a new random sample is taken to fill each individual NaN value?

jpp · Accepted Answer · 2019-01-24 01:05:17Z

1

You can use pd.DataFrame.apply with np.random.choice:

df = df.apply(lambda s: s.fillna(np.random.choice(s.dropna())))

5 Comments

JustHereChillin Over a year ago

This worked, and it uses the same .apply function I was originally trying to use. I was getting errors when trying to iterate through the columns with a for loop. Thank you for the insight!

JustHereChillin Over a year ago

One more question: is "s" referencing the data frame df? Will the variable also reference the data frame? For example, in speeds_df.apply(lambda sp: sp.fillna(0)), will sp reference the data frame speeds_df?

jpp Over a year ago

I've used s to stand for "series"; it represents each column in turn, not the whole data frame. You can choose any letter you like, though.
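
To see what the lambda argument actually is, one option is to record what apply hands to the function. This is only an illustrative sketch; the frame, the inspect helper, and the seen list are made up for the example:

```python
import pandas as pd

# hypothetical sample frame, used only for illustration
df = pd.DataFrame({'a': [1, 2, None], 'b': [22, None, 44]})

seen = []  # records the name and type of each object apply passes in

def inspect(col):
    # col is one column of df, passed as a pandas Series
    seen.append((col.name, type(col).__name__))
    return col.fillna(0)

filled = df.apply(inspect)
```

After running this, seen holds one ('a', 'Series') and one ('b', 'Series') entry, so the argument is each column as a Series regardless of what the parameter is called.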

JustHereChillin Over a year ago

Follow-up question: this method made it so that all NaN values in a given column were replaced by the same value. Is there a method so that each row of a column is treated independently and a new random sample is taken to fill each individual NaN value?

jpp Over a year ago

@Dee, Probably, but that's a new question which you should ask separately. If an answer here solves your original problem, do accept it (tick on left) so other users know.
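
For the follow-up about filling each NaN independently, one possible approach (a sketch, not something given in the answers above) is to draw as many random values as there are missing entries and assign them through a boolean mask. The helper name fill_na_independently is hypothetical, and the draws are made with replacement, so two NaNs may still happen to receive the same value:

```python
import numpy as np
import pandas as pd

def fill_na_independently(s, rng):
    """Fill each NaN in Series s with its own random draw from s's non-missing values."""
    pool = s.dropna().to_numpy()   # values available to draw from
    mask = s.isna()                # True where a value is missing
    if pool.size == 0 or not mask.any():
        return s                   # nothing to draw from, or nothing to fill
    out = s.copy()
    # one independent draw (with replacement) per missing entry
    out[mask] = rng.choice(pool, size=int(mask.sum()))
    return out

# same sample data as in the answer above
df = pd.DataFrame({'a': [1, 2, None, 18, 20, None],
                   'b': [22, 33, 44, None, 100, 32]})

rng = np.random.default_rng()
filled = df.apply(lambda s: fill_na_independently(s, rng))
```

Passing a seed to np.random.default_rng would make the fills reproducible between runs.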
