
I have gone through the posts about filling multiple pandas columns in one go, but my problem is a little different: I need to fill each missing value in a column with the value from a specific other column, and do that for multiple columns at once.

For example, I can use the commands below individually to fill the NAs:

result1_copy['BASE_B'] = np.where(pd.isnull(result1_copy['BASE_B']), result1_copy['BASE_S'], result1_copy['BASE_B'])

result1_copy['QWE_B'] = np.where(pd.isnull(result1_copy['QWE_B']), result1_copy['QWE_S'], result1_copy['QWE_B'])

However, if I try to populate them in one go, it does not work:

result1_copy['BASE_B','QWE_B'] = result1_copy['BASE_B', 'QWE_B'].fillna(result1_copy['BASE_S','QWE_S'])

Does anyone know why? Note that I have used only 2 columns here for simplicity, but I have tens of columns to impute, and they are object, float, or datetime. Are the data types the issue here?

1 Answer

You need to add double brackets [[]] to select a DataFrame subset, and use rename so that the column names align:

d = {'BASE_S':'BASE_B', 'QWE_S':'QWE_B'}
result1_copy[['BASE_B','QWE_B']] = (result1_copy[['BASE_B', 'QWE_B']]
                                      .fillna(result1_copy[['BASE_S','QWE_S']]
                                      .rename(columns=d)))
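
To make the two points above concrete, here is a minimal sketch on a small made-up frame (the data and the variable names `df`, `unaligned`, and `aligned` are assumptions for illustration, not from the question). It shows that a single-bracket lookup with two names is treated as one tuple label, and that fillna with a DataFrame only fills columns whose names match:

```python
import numpy as np
import pandas as pd

# Small illustrative frame (made-up data).
df = pd.DataFrame({
    'BASE_B': [np.nan, 5, np.nan],
    'BASE_S': [1, 3, 0],
    'QWE_B': [np.nan, 8, np.nan],
    'QWE_S': [5, 3, 4],
})

# Single brackets with two names look up the tuple label
# ('BASE_B', 'QWE_B'), which is not a column, so pandas raises KeyError.
try:
    df['BASE_B', 'QWE_B']
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# Without rename, the fill frame has columns BASE_S/QWE_S, which do not
# match BASE_B/QWE_B, so no value is filled.
unaligned = df[['BASE_B', 'QWE_B']].fillna(df[['BASE_S', 'QWE_S']])

# With rename, the column labels line up and the NaNs are replaced.
d = {'BASE_S': 'BASE_B', 'QWE_S': 'QWE_B'}
aligned = df[['BASE_B', 'QWE_B']].fillna(df[['BASE_S', 'QWE_S']].rename(columns=d))

print(tuple_lookup_failed)
print(unaligned.isna().sum().sum())
print(aligned.isna().sum().sum())
```

The data types are not what breaks the original attempt; the lookup fails before any filling happens.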

A more dynamic solution:

L = ['BASE_','QWE_']
orig = ['{}B'.format(x) for x in L]
new =  ['{}S'.format(x) for x in L]

d = dict(zip(new, orig))
result1_copy[orig] = (result1_copy[orig].fillna(result1_copy[new]
                                        .rename(columns=d)))
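
With tens of columns, the prefix list itself could be derived from the column names rather than typed out. The following is only a sketch under the assumption that every pair follows the `<prefix>B` / `<prefix>S` naming used here; the `prefixes` variable and the sample frame are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': list('abcdef'),
    'BASE_B': [np.nan, 5, 4, 5, 5, np.nan],
    'QWE_B': [np.nan, 8, 9, 4, 2, np.nan],
    'BASE_S': [1, 3, 5, 7, 1, 0],
    'QWE_S': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb'),
})

# Keep only prefixes for which both the _B and the _S column exist.
prefixes = [c[:-1] for c in df.columns
            if c.endswith('_B') and c[:-1] + 'S' in df.columns]

orig = [p + 'B' for p in prefixes]
new = [p + 'S' for p in prefixes]

# Rename the _S columns to their _B counterparts so fillna can align them.
d = dict(zip(new, orig))
df[orig] = df[orig].fillna(df[new].rename(columns=d))

print(prefixes)
print(df[orig])
```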

Another solution, if the columns are paired by their B and S suffixes:

for x in ['BASE_','QWE_']:
    result1_copy[x + 'B'] = result1_copy[x + 'B'].fillna(result1_copy[x + 'S'])
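
Since the question mentions object and datetime columns as well as floats, here is a small sketch suggesting the same loop should work for those types too. The column names `DATE_B`, `DATE_S`, `NAME_B`, and `NAME_S` and their values are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Hypothetical datetime and string columns with missing values in the _B side.
df = pd.DataFrame({
    'DATE_B': pd.to_datetime(['2020-01-01', None, '2020-03-01']),
    'DATE_S': pd.to_datetime(['2019-01-01', '2019-02-01', '2019-03-01']),
    'NAME_B': ['x', None, 'z'],
    'NAME_S': ['p', 'q', 'r'],
})

# Same column-by-column pattern: each _B column is filled from its _S partner.
for x in ['DATE_', 'NAME_']:
    df[x + 'B'] = df[x + 'B'].fillna(df[x + 'S'])

print(df)
```

Filling one Series from another aligns on the row index only, so no rename is needed in this form.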

Sample:

result1_copy = pd.DataFrame({'A':list('abcdef'),
                   'BASE_B':[np.nan,5,4,5,5,np.nan],
                   'QWE_B':[np.nan,8,9,4,2,np.nan],
                   'BASE_S':[1,3,5,7,1,0],
                   'QWE_S':[5,3,6,9,2,4],
                   'F':list('aaabbb')})


print (result1_copy)
   A  BASE_B  BASE_S  F  QWE_B  QWE_S
0  a     NaN       1  a    NaN      5
1  b     5.0       3  a    8.0      3
2  c     4.0       5  a    9.0      6
3  d     5.0       7  b    4.0      9
4  e     5.0       1  b    2.0      2
5  f     NaN       0  b    NaN      4

d = {'BASE_S':'BASE_B', 'QWE_S':'QWE_B'}
result1_copy[['BASE_B','QWE_B']] = (result1_copy[['BASE_B', 'QWE_B']]
                                      .fillna(result1_copy[['BASE_S','QWE_S']]
                                      .rename(columns=d)))
print (result1_copy) 
   A  BASE_B  BASE_S  F  QWE_B  QWE_S
0  a     1.0       1  a    5.0      5
1  b     5.0       3  a    8.0      3
2  c     4.0       5  a    9.0      6
3  d     5.0       7  b    4.0      9
4  e     5.0       1  b    2.0      2
5  f     0.0       0  b    4.0      4

3 Comments

Thanks @jezrael, but I am a little confused about the use of the rename method in this case. Do we need it?
@asimo - try omitting it and nothing changes. :) The problem is that fillna needs the same column names in both DataFrames.
This is nice and tidy :) for x in ['BASE_','QWE_']: result1_copy[x + 'B'] = result1_copy[x + 'B'].fillna(result1_copy[x + 'S'])
