Fill nan values with random value from another DataFrame pandas

Question

I have a DataFrame with millions of rows and a lot of NaN values. An example:

index     Company        Area
    0     Google         Technology
    1     Coca Cola      Drinks
    2     NaN            Drinks
    3     Apple          Technology
    4     NaN            Technology
    5     Gatorade       Drinks
    6     Dell           Technology
    7     Apple          Technology
    8     Coca Cola      Drinks
    9     NaN            Drinks
    10    Google         Technology

My idea is to fill each NaN in Company with one of the 2 most common values for its Area.

For example: if the most frequent companies in the Technology area are Apple and Google, I would like to fill the NaN values in rows where df['Area'] == 'Technology' with one of those values, chosen randomly.

I've already created a grouped DataFrame with the most common values per Area. It looks something like this:

Area          Company
Technology    Google
Technology    Apple
Drinks        Coca Cola
Drinks        Pepsi
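
For reference, a minimal sketch of one way such a lookup table could be built from the data itself. The names df and top2 are assumptions for illustration, and the frame below is only the 11-row sample from the question. On that sample the second Drinks entry comes out as Gatorade rather than Pepsi, since Pepsi presumably only appears in the full data.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumed column names: Company, Area).
df = pd.DataFrame({
    "Company": ["Google", "Coca Cola", np.nan, "Apple", np.nan, "Gatorade",
                "Dell", "Apple", "Coca Cola", np.nan, "Google"],
    "Area": ["Technology", "Drinks", "Drinks", "Technology", "Technology",
             "Drinks", "Technology", "Technology", "Drinks", "Drinks",
             "Technology"],
})

# For each Area, keep the two most frequent non-NaN companies as a list.
top2_lists = (
    df.dropna(subset=["Company"])
      .groupby("Area")["Company"]
      .apply(lambda s: s.value_counts().index[:2].tolist())
)

# One row per (Area, Company) pair, matching the layout shown above.
top2 = top2_lists.explode().reset_index()
print(top2)
```

Ties in frequency (Google and Apple both appear twice here) may come out in either order, so the sketch does not rely on the order within an Area.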

The result should be something like this:

index     Company        Area
    0     Google         Technology
    1     Coca Cola      Drinks
    2     Pepsi          Drinks
    3     Apple          Technology
    4     Google         Technology
    5     Gatorade       Drinks
    6     Dell           Technology
    7     Apple          Technology
    8     Coca Cola      Drinks
    9     Pepsi          Drinks
    10    Google         Technology

I hope you can help me.

Thanks!!!

Should all NaN values for a given key be filled by the same value (chosen randomly)? Your question isn't that clear.
– cs95, Commented Jun 12, 2018 at 2:18
@coldspeed no, each NaN should be filled at random with one of the top 2 values in its category. For example, some Technology NaN values should be filled with "Google" and others with "Apple".
– Raul Dip, Commented Jun 12, 2018 at 2:26

BENY · Accepted Answer · 2018-06-12 02:29:08Z


I came up with this solution using random.choice:

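import random

# df1 holds the top companies per Area. Collect them into one list per Area,
# align the lists to df's rows by Area, then pick one entry per row at random.
s = df1.groupby('Area').Company.apply(list).reindex(df.Area).apply(lambda x: random.choice(x))
s.index = df.index

# Only rows where Company is NaN take a value from s.
df.Company = df.Company.fillna(s)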

df
Out[200]: 
    index   Company        Area
0       0    Google  Technology
1       1  CocaCola      Drinks
2       2  CocaCola      Drinks
3       3     Apple  Technology
4       4    Google  Technology
5       5  Gatorade      Drinks
6       6      Dell  Technology
7       7     Apple  Technology
8       8  CocaCola      Drinks
9       9     Pepsi      Drinks
10     10    Google  Technology


1 Comment

Raul Dip Over a year ago

It doesn't work. I have an error: TypeError: object of type 'float' has no len()
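
One possible cause of this error, offered as an assumption since the full data is not shown: if df contains an Area that has no row in df1 (or Area itself is NaN), reindex produces NaN for those rows, and random.choice then calls len() on a float. A minimal sketch that skips such rows instead of failing is below. The helper pick, the extra "Cars" row, and the fixed seed are hypothetical and only there to make the example reproducible.

```python
import random

import numpy as np
import pandas as pd

# Sample from the question plus one hypothetical row (index 11) whose Area
# is missing from df1, to reproduce the situation that can raise TypeError.
df = pd.DataFrame({
    "Company": ["Google", "Coca Cola", np.nan, "Apple", np.nan, "Gatorade",
                "Dell", "Apple", "Coca Cola", np.nan, "Google", np.nan],
    "Area": ["Technology", "Drinks", "Drinks", "Technology", "Technology",
             "Drinks", "Technology", "Technology", "Drinks", "Drinks",
             "Technology", "Cars"],
})
df1 = pd.DataFrame({
    "Area": ["Technology", "Technology", "Drinks", "Drinks"],
    "Company": ["Google", "Apple", "Coca Cola", "Pepsi"],
})

rng = random.Random(0)  # seeded only so the example is reproducible

# Plain dict: Area -> list of candidate companies.
candidates = df1.groupby("Area")["Company"].apply(list).to_dict()

def pick(area):
    """Return a random candidate for area, or NaN if df1 has none."""
    options = candidates.get(area)
    return rng.choice(options) if options else np.nan

# Draw independently for each row that is missing a Company.
mask = df["Company"].isna()
df.loc[mask, "Company"] = df.loc[mask, "Area"].map(pick)
print(df)
```

Rows whose Area has no candidates stay NaN, so they can be inspected or handled separately. The random draw runs only over the rows selected by mask, not over the whole frame.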
