Conditionally Set Multiple Column Values with Dynamic Values Using Loc or Apply

Question

I'm still a novice in Pandas and cannot seem to combine these few basic steps.

Goal:

I would like to perform an efficient lookup and replace of multiple columns based on a conditional.

I have dataframe df, and need an index lookup from another dataframe lookup if columns lower_limit and upper_limit are both NaN.

I couldn't get merge/join to work because there is a difference between index names (think C_something, F_something from DataFrame lookup), left out for simplicity.

Input:

DataFrames:

import pandas as pd; import numpy as np
df = pd.DataFrame([['A', 3, 5],['B', 2, np.NaN],['C', np.NaN, np.NaN],['D', np.NaN, np.NaN]])
df.columns = ['Name','lower_limit','upper_limit']
df = df.set_index('Name')

lookup = pd.DataFrame([['C_Male', 4, 6],['C_Female', 5, 7],['E_Male', 2, 3],['E_Female', 3, 4]])
lookup.columns = ['Name', 'lower', 'upper']
lookup = lookup.set_index('Name')

# index: Name + index_modifier is the lookup index of interest for example
index_modifier = '_Male'

DataFrames visualized:

# df                                  # lookup
      lower_limit  upper_limit                  lower  upper
Name                                  Name              
A             3.0          5.0        C_Male        4      6
B             2.0          NaN        C_Female      5      7
C             NaN          NaN        E_Male        2      3
D             NaN          NaN        E_Female      3      4

Expected output:

# df
      lower_limit  upper_limit
Name                                     
A             3.0          5.0
B             2.0          NaN  #<-- Does not meet conditional
C             4.0          6.0  #<-- Looked-up with index_modifier and changed
D             NaN          NaN  #<-- Looked-up with index_modifier and left unchanged

Broken Code:

I have tried using df.loc() docs and this answer to mask and set values, but cannot seem to get unique values based on that row's index.

Mask and Set Using df.loc

# error: need get index of each row only
df.loc[(df.lower_limit.isnull()) & (df.upper_limit.isnull()), ['lower_limit','upper_limit'] ] = lookup.loc[df.index + index_modifier]

Mask with df.loc and Then Set

ix_of_interest = df.loc[(df.lower_limit.isnull()) & (df.upper_limit.isnull())].index

# only keep index values that are in DataFrame 'lookup'
ix_of_interest = [ix for ix in ix_of_interest if ((ix + index_modifier) in lookup.index)]
lookup_ix = [ix + index_modifier for ix in lookup_ix]

# error: Not changing values. I think there is a mismatch of bracket depths for one
df.loc[ix_of_interest, ['lower_limit','upper_limit'] ] = lookup.loc[lookup_ix]

I have also tried to use df.apply() to set the values. See this question.

def do_lookup(row):
    # error:'numpy.float64' object has no attribute 'is_null'
    if row.lower_limit.isnull() and row.upper_limit.isnull():
        if (row.name + index_modifier) in lookup.index:
            return lookup.loc[row.name + index_modifier]

df['lower_limit', 'upper_limit'] = df.apply(do_lookup, axis=1)

or lambda

df['lower_limit', 'upper_limit'] = df.apply(lambda x: lookup.loc[x.name + index_modifier].to_list()
        # isnull() or isnan() would be better
        if ((x.lower_limit == np.NaN) and (x.upper_limit == np.NaN)) 
        # else may not be needed here
        else [np.NaN, np.NaN], 
    axis=1)

This seems like it should be a series of simple steps, but I cannot get them to work correctly. Any insight would be greatly appreciated - my rubber ducky is tired and confused.

Yes. I added index_modifier as the constant to be added to the string. — rocket_brain
– rocket_brain, Commented Sep 30, 2019 at 23:20
Side Question: Why does the df.apply(do_lookup,axis=1) code throw a "'numpy.float64' object has no attribute 'isnull'" error? — rocket_brain
– rocket_brain, Commented Sep 30, 2019 at 23:40

ansev · Accepted Answer · 2019-09-30 23:27:49Z

0

You can use Series.fillna with DataFrame.add_suffix:

index_modifier = '_Male'

init_index=df.index
df=df.T.add_suffix(index_modifier).T
df['lower_limit'].fillna(lookup['lower'],inplace=True)
df['upper_limit'].fillna(lookup['upper'],inplace=True)
df.index=init_index
print(df)


   lower_limit  upper_limit
A          3.0          5.0
B          2.0          NaN
C          4.0          6.0
D          NaN          NaN

edited Sep 30, 2019 at 23:27

answered Sep 30, 2019 at 22:31

ansev

31k5 gold badges21 silver badges33 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

rocket_brain Over a year ago

Upon first glance this seems to share the issue with merge. Namely it requires the index to be identical (which I took out for simplicity). If lookup has a modified name, 'C_someModifier', how could I apply fillna?

ansev Over a year ago

Depending on the modifications of the index, one form or another would be acted upon. It is best that you update your dataframe with an example of the modified index

ansev Over a year ago

if _something is always the same, perhaps the easiest would be to use df = df.T.add_prefix ('somethin _'). T and then merge or fillna ...

rocket_brain Over a year ago

I could possibly do an index modifier for the appropriate rows and then un-modify after the lookup. There are multiple layers of complexity though and I'd rather a straightforward use of apply/loc mapping/equivalent if possible.

Collectives™ on Stack Overflow

Conditionally Set Multiple Column Values with Dynamic Values Using Loc or Apply

1 Answer 1

4 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related