0

I'm still a novice in Pandas and cannot seem to combine these few basic steps.

Goal:

I would like to perform an efficient lookup and replace of multiple columns based on a conditional.

I have dataframe df, and need an index lookup from another dataframe lookup if columns lower_limit and upper_limit are both NaN.

I couldn't get merge/join to work because there is a difference between index names (think C_something, F_something from DataFrame lookup), left out for simplicity.

Input:

DataFrames:

import pandas as pd; import numpy as np
df = pd.DataFrame([['A', 3, 5],['B', 2, np.NaN],['C', np.NaN, np.NaN],['D', np.NaN, np.NaN]])
df.columns = ['Name','lower_limit','upper_limit']
df = df.set_index('Name')

lookup = pd.DataFrame([['C_Male', 4, 6],['C_Female', 5, 7],['E_Male', 2, 3],['E_Female', 3, 4]])
lookup.columns = ['Name', 'lower', 'upper']
lookup = lookup.set_index('Name')

# index: Name + index_modifier is the lookup index of interest for example
index_modifier = '_Male'

DataFrames visualized:

# df                                  # lookup
      lower_limit  upper_limit                  lower  upper
Name                                  Name              
A             3.0          5.0        C_Male        4      6
B             2.0          NaN        C_Female      5      7
C             NaN          NaN        E_Male        2      3
D             NaN          NaN        E_Female      3      4

Expected output:

# df
      lower_limit  upper_limit
Name                                     
A             3.0          5.0
B             2.0          NaN  #<-- Does not meet conditional
C             4.0          6.0  #<-- Looked-up with index_modifier and changed
D             NaN          NaN  #<-- Looked-up with index_modifier and left unchanged

Broken Code:

I have tried using df.loc() docs and this answer to mask and set values, but cannot seem to get unique values based on that row's index.

Mask and Set Using df.loc

# error: need get index of each row only
df.loc[(df.lower_limit.isnull()) & (df.upper_limit.isnull()), ['lower_limit','upper_limit'] ] = lookup.loc[df.index + index_modifier]

Mask with df.loc and Then Set

ix_of_interest = df.loc[(df.lower_limit.isnull()) & (df.upper_limit.isnull())].index

# only keep index values that are in DataFrame 'lookup'
ix_of_interest = [ix for ix in ix_of_interest if ((ix + index_modifier) in lookup.index)]
lookup_ix = [ix + index_modifier for ix in lookup_ix]

# error: Not changing values. I think there is a mismatch of bracket depths for one
df.loc[ix_of_interest, ['lower_limit','upper_limit'] ] = lookup.loc[lookup_ix]

I have also tried to use df.apply() to set the values. See this question.

def do_lookup(row):
    # error:'numpy.float64' object has no attribute 'is_null'
    if row.lower_limit.isnull() and row.upper_limit.isnull():
        if (row.name + index_modifier) in lookup.index:
            return lookup.loc[row.name + index_modifier]

df['lower_limit', 'upper_limit'] = df.apply(do_lookup, axis=1)

or lambda

df['lower_limit', 'upper_limit'] = df.apply(lambda x: lookup.loc[x.name + index_modifier].to_list()
        # isnull() or isnan() would be better
        if ((x.lower_limit == np.NaN) and (x.upper_limit == np.NaN)) 
        # else may not be needed here
        else [np.NaN, np.NaN], 
    axis=1)

This seems like it should be a series of simple steps, but I cannot get them to work correctly. Any insight would be greatly appreciated - my rubber ducky is tired and confused.

4
  • Is ` _something` always the same str? Commented Sep 30, 2019 at 23:10
  • Yes. I added index_modifier as the constant to be added to the string. Commented Sep 30, 2019 at 23:20
  • Thanks!. Please check my solution! Commented Sep 30, 2019 at 23:25
  • Side Question: Why does the df.apply(do_lookup,axis=1) code throw a "'numpy.float64' object has no attribute 'isnull'" error? Commented Sep 30, 2019 at 23:40

1 Answer 1

0

You can use Series.fillna with DataFrame.add_suffix:

index_modifier = '_Male'

init_index=df.index
df=df.T.add_suffix(index_modifier).T
df['lower_limit'].fillna(lookup['lower'],inplace=True)
df['upper_limit'].fillna(lookup['upper'],inplace=True)
df.index=init_index
print(df)


   lower_limit  upper_limit
A          3.0          5.0
B          2.0          NaN
C          4.0          6.0
D          NaN          NaN
Sign up to request clarification or add additional context in comments.

4 Comments

Upon first glance this seems to share the issue with merge. Namely it requires the index to be identical (which I took out for simplicity). If lookup has a modified name, 'C_someModifier', how could I apply fillna?
Depending on the modifications of the index, one form or another would be acted upon. It is best that you update your dataframe with an example of the modified index
if _something is always the same, perhaps the easiest would be to use df = df.T.add_prefix ('somethin _'). T and then merge or fillna ...
I could possibly do an index modifier for the appropriate rows and then un-modify after the lookup. There are multiple layers of complexity though and I'd rather a straightforward use of apply/loc mapping/equivalent if possible.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.