Filling in missing values in a pandas DataFrame

Question

I'm new to Python and can't find the exact answer I'm looking for, though I believe there is an easier way to fill in this information.

I have df1 and df2

df1: FirstName  LastName  PhNo  uniqueid

df2: uniqueid PhNo

I want to fill the missing values in df1['PhNo'] with the corresponding values from df2, matching rows on uniqueid.

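The code I used is as follows:

# a left merge keeps every df1 row; overlapping columns become PhNo_x (df1) and PhNo_y (df2)
dff = pd.merge(df1, df2, on='uniqueid', how='left')
dff['PhNo'] = dff['PhNo_x']
# take the df2 value only where df1 had no phone number
dff.loc[dff['PhNo_x'].isna(), 'PhNo'] = dff['PhNo_y']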

This seems to do the job, but it does not seem like an efficient way of doing it. I am looking for fewer lines and a better technique than merge.

df1

FirstName  LastName  PhNo    uniqueid
Sam        R         123x    1
John       S         345x    2
Paul       K         NaN     3
Laney      P         NaN     4

df2

uniqueid  PhNo
1         213x
3         675x
4         987x

desired output: df1

FirstName  LastName  PhNo    uniqueid
Sam        R         123x    1
John       S         345x    2
Paul       K         **675x**    3
Laney      P         **987x**    4

Can you add a data sample of 4-5 rows with the expected output?
– jezrael, Commented Mar 10, 2019 at 17:21

jezrael · Accepted Answer · 2019-03-10 17:44:14Z

4

I believe you need Series.map with Series.fillna:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'FirstName': list('abcdef'),
    'LastName': list('aaabbb'),
    'PhNo': [7, np.nan, 9, 4, np.nan, np.nan],
    'uniqueid': [5, 3, 6, 9, 2, 4],
})

print (df1)
  FirstName LastName  PhNo  uniqueid
0         a        a   7.0         5
1         b        a   NaN         3
2         c        a   9.0         6
3         d        b   4.0         9
4         e        b   NaN         2
5         f        b   NaN         4

df2 = pd.DataFrame({
    'PhNo': [10, 90, 30, 20],
    'uniqueid': [3, 6, 9, 4],
})
print (df2)
   PhNo  uniqueid
0    10         3
1    90         6
2    30         9
3    20         4

s = df2.set_index('uniqueid')['PhNo']
df1['PhNo'] = df1['PhNo'].fillna(df1['uniqueid'].map(s))
print (df1)
  FirstName LastName  PhNo  uniqueid
0         a        a   7.0         5
1         b        a  10.0         3
2         c        a   9.0         6
3         d        b   4.0         9
4         e        b   NaN         2
5         f        b  20.0         4
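
For reference, here is the same map + fillna idea applied to the sample data from the question. This is only a sketch: the name lookup is introduced for illustration, and it assumes each uniqueid appears at most once in df2 (with duplicates, map would need df2 de-duplicated first).

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question
df1 = pd.DataFrame({
    'FirstName': ['Sam', 'John', 'Paul', 'Laney'],
    'LastName': ['R', 'S', 'K', 'P'],
    'PhNo': ['123x', '345x', np.nan, np.nan],
    'uniqueid': [1, 2, 3, 4],
})
df2 = pd.DataFrame({
    'uniqueid': [1, 3, 4],
    'PhNo': ['213x', '675x', '987x'],
})

# Series indexed by uniqueid, used as a lookup table (hypothetical name)
lookup = df2.set_index('uniqueid')['PhNo']

# Only the NaN entries of df1['PhNo'] are replaced; existing values win
df1['PhNo'] = df1['PhNo'].fillna(df1['uniqueid'].map(lookup))

print(df1['PhNo'].tolist())
# ['123x', '345x', '675x', '987x']
```

Note that Sam keeps 123x even though df2 has 213x for uniqueid 1, because fillna only touches missing entries.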

edited Mar 10, 2019 at 17:44

answered Mar 10, 2019 at 17:20


4 Comments

jezrael Over a year ago

@anky_91 - I ask for data for 100% verification :)

Shri Over a year ago

@jezrael I am getting 0 rather than the value from df2

Shri Over a year ago

@jezrael found the error: my database had "0" rather than an empty string. df1['PhNo'].replace(0,np.nan,inplace=True) did the trick, though. Would a similar solution work for "0" values, or should I post it as a separate question?

jezrael Over a year ago

@Shri - With 0 the solution is df1['PhNo'] = np.where(df1['PhNo'] == 0, df1['uniqueid'].map(s), df1['PhNo'])
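
A small sketch of the 0-as-missing case discussed in these comments, using made-up data. The names mapped, via_where and keep_zero are hypothetical. One thing to be aware of: with np.where, a 0 that has no match in df2 becomes NaN; if the 0 should be kept in that case, Series.mask with an extra notna() check is one option.

```python
import numpy as np
import pandas as pd

# Made-up data: 0 marks a missing phone number
df1 = pd.DataFrame({'PhNo': [7, 0, 9, 0], 'uniqueid': [5, 3, 6, 2]})
df2 = pd.DataFrame({'PhNo': [10, 90], 'uniqueid': [3, 6]})

s = df2.set_index('uniqueid')['PhNo']
mapped = df1['uniqueid'].map(s)  # NaN where uniqueid is absent from df2

# Variant from the comment: every 0 is replaced by the mapped value,
# so an unmatched 0 (uniqueid 2) turns into NaN
via_where = np.where(df1['PhNo'] == 0, mapped, df1['PhNo'])

# Alternative: replace a 0 only when df2 actually has a value for it
keep_zero = df1['PhNo'].mask(df1['PhNo'].eq(0) & mapped.notna(), mapped)

print(via_where)
print(keep_zero.tolist())
```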

younus · Answer · 2019-03-10 18:14:57Z

0

DataFrame.fillna(value= &n)
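
For context, a hedged sketch of how DataFrame.fillna could be used here. Passing a scalar as value fills every gap with the same constant, which is not the per-row lookup the question asks for. fillna also accepts another DataFrame, in which case values are aligned on index and column labels, so both frames are indexed by uniqueid first. The data and the name filled are made up for illustration, and uniqueid is assumed to be unique in both frames.

```python
import numpy as np
import pandas as pd

# Made-up data in the shape of the question
df1 = pd.DataFrame({
    'FirstName': list('abcd'),
    'PhNo': [7, np.nan, 9, np.nan],
    'uniqueid': [5, 3, 6, 2],
})
df2 = pd.DataFrame({'PhNo': [10, 90], 'uniqueid': [3, 6]})

# Index both frames by uniqueid so fillna can align rows by key;
# only NaN cells in df1 are filled, and only where df2 has a value
filled = (
    df1.set_index('uniqueid')
       .fillna(df2.set_index('uniqueid'))
       .reset_index()
)

print(filled)
```

uniqueid 2 has no entry in df2, so its PhNo stays NaN.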

answered Mar 10, 2019 at 18:14

