
How can I merge values into one dataframe based on another lookup dataframe?

This is dataframe A, where I want to replace the values:

  InfoType IncidentType DangerType
0      NaN            A        NaN
1      NaN            C        NaN
2      NaN            B          C
3      NaN            B        NaN

This is the lookup table:

   ID     ParamCode ParamValue ParmDesc1 ParamDesc2  SortOrder  ParamStatus
0   1  IncidentType          A       ABC        DEF          1            1
1   2  IncidentType          B       GHI        JKL          2            1
2   3  IncidentType          C       MNO        PQR          7            1
3   4    DangerType          C       STU        VWX          6            1

The expected output:

  InfoType IncidentType DangerType
0      NaN          ABC        NaN
1      NaN          MNO        NaN
2      NaN          GHI        STU
3      NaN          GHI        NaN

Note that ParamCode holds the column names of dataframe A, and I need to replace each value with its corresponding ParmDesc1 in the respective column. Every column in dataframe A may contain NaN and I don't intend to remove them; just ignore them.

This is what I have done :

ntf_cols = ['InfoType','IncidentType','DangerType']
for c in ntf_cols:
    if (c in ntf.columns) & (c in param['ParamCode'].values):
        paramValue = param['ParamValue'].unique()
        for idx, pv in enumerate(paramValue):
            ntf['NewIncidentType'] = pd.np.where(ntf.IncidentType.str.contains(pv), param['ParmDesc1'].values, "whatever")

Error :

ValueError: operands could not be broadcast together with shapes (25,) (13,) ()
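For context, `np.where` (which `pd.np` aliases; that alias is deprecated in modern pandas) broadcasts its three arguments together, so a condition the length of `ntf` cannot be paired with a replacement array the length of `param`. A minimal sketch reproducing the same failure, with made-up lengths standing in for the (25,) and (13,) shapes from the traceback:

```python
import numpy as np
import pandas as pd

# np.where broadcasts condition, x, and y together; mismatched 1-D lengths
# raise the same "operands could not be broadcast together" ValueError.
cond = pd.Series(['A', 'C', 'B', 'B']).str.contains('A')  # shape (4,)
values = np.array(['ABC', 'GHI', 'MNO'])                  # shape (3,)

try:
    np.where(cond, values, 'whatever')
except ValueError as e:
    print(e)  # shapes (4,) (3,) () cannot be broadcast together
```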

  • This is a common duplicate. See, e.g., stackoverflow.com/questions/36413993/… Commented Apr 21, 2020 at 2:47
  • @EricTruett Before this gets voted closed as a duplicate, is there a better matching question? The OP appears to need to reference the values of two columns in the other df, not one. @dee it would improve your question if you provided what you have tried thus far to solve this problem. Commented Apr 21, 2020 at 2:53
  • How is this a duplicate? @EricTruett My question needs to compare a column's name against values within the other dataframe. It is quite different; it is not a comparison between two columns. Commented Apr 21, 2020 at 2:58
  • @Phillyclause89 I tried multiple times but I keep deleting my attempts. I can't wrap my head around it, since the comparison is between column names and values in another dataframe. I am not sure how to map them in this case. Commented Apr 21, 2020 at 3:00
  • 1
    @Phillyclause89 updated with code I've been trying to do. Commented Apr 21, 2020 at 3:26

2 Answers


Use the lookup table to build a dict, then replace the column values of the original dataframe. Assume the original dataframe is df1 and the lookup table is df2:

dict_map = dict(zip(df2.ParamCode + "-" + df2.ParamValue, df2.ParmDesc1))

df1['IncidentType'] = ("IncidentType" + '-' + df1.IncidentType).replace(dict_map)
df1['DangerType'] = ("DangerType" + '-' + df1.DangerType).replace(dict_map)

3 Comments

This seems to work. I have a question: how can I replace df1.IncidentType with a variable rather than hard-coding it? Because I need to apply this to every column. I can show you my code.
Use a for loop over the columns you want to apply it to, something like for col in df1.columns: df1[col] = (col + "-" + df1[col]).replace(dict_map)
If some columns have dtype float64, you should convert them to str first: for col in df1.columns: df1[col] = (col + "-" + df1[col].astype(str)).replace(dict_map)
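Putting this answer and its comments together, a runnable sketch (my own variation, not the answerer's exact code: it uses `.map` plus `.fillna` instead of `.replace`, and the nullable string dtype instead of `astype(str)`, so NaN rows survive the round-trip; the frames are reconstructed from the question):

```python
import pandas as pd

# Data reconstructed from the question (df1 = dataframe A, df2 = lookup table).
df1 = pd.DataFrame({
    'InfoType': [None] * 4,
    'IncidentType': ['A', 'C', 'B', 'B'],
    'DangerType': [None, None, 'C', None],
})
df2 = pd.DataFrame({
    'ParamCode': ['IncidentType'] * 3 + ['DangerType'],
    'ParamValue': ['A', 'B', 'C', 'C'],
    'ParmDesc1': ['ABC', 'GHI', 'MNO', 'STU'],
})

# One dict keyed on "<column name>-<code>" serves every column.
dict_map = dict(zip(df2.ParamCode + '-' + df2.ParamValue, df2.ParmDesc1))

for col in ['IncidentType', 'DangerType']:
    # The nullable string dtype propagates <NA> through concatenation,
    # so NaN rows never produce a bogus key like "DangerType-nan".
    key = col + '-' + df1[col].astype('string')
    # Unmatched (NA) keys map to NaN; fillna restores the original value.
    df1[col] = key.map(dict_map).fillna(df1[col])

print(df1)
```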

EDIT: Lambda's answer gave me an idea for how you could do this for however many columns you want to apply this logical pattern to:

import pandas as pd

df1 = pd.DataFrame(dict(
    InfoType = [None, None, None, None],
    IncidentType = 'A C B B'.split(),
    DangerType = [None, None, 'C', None],
))

df2 = pd.DataFrame(dict(
    ParamCode = 'IncidentType IncidentType IncidentType DangerType'.split(),
    ParamValue  = 'A B C C'.split(),
    ParmDesc1 = 'ABC GHI MNO STU'.split(),
))


for col in df1.columns[1:]:
    dict_map = dict(
        df2[df2.ParamCode == col][['ParamValue','ParmDesc1']].to_records(index=False)
    )
    df1[col] = df1[col].replace(dict_map)

print(df1)

This assumes every column after the first in df1 needs updating, and that the to-be-updated column names exist as values in the 'ParamCode' column of df2.



This problem could be solved using some custom functions and pandas.Series.apply():

import pandas as pd

def find_incident_type(x):
    if pd.isna(x):
        return x
    return df2[
        (df2['ParamCode'] == 'IncidentType') & (df2['ParamValue']==x)
    ]["ParmDesc1"].values[0]


def find_danger_type(x):
    if pd.isna(x):
        return x
    return df2[
        (df2['ParamCode'] == 'DangerType') & (df2['ParamValue']==x)
    ]["ParmDesc1"].values[0]


df1 = pd.DataFrame(dict(
    InfoType = [None, None, None, None],
    IncidentType = 'A C B B'.split(),
    DangerType = [None, None, 'C', None],
))

df2 = pd.DataFrame(dict(
    ParamCode = 'IncidentType IncidentType IncidentType DangerType'.split(),
    ParamValue  = 'A B C C'.split(),
    ParmDesc1 = 'ABC GHI MNO STU'.split(),
))

df1['IncidentType'] = df1['IncidentType'].apply(find_incident_type)
df1['DangerType'] = df1['DangerType'].apply(find_danger_type)

print(df1)


It is very possible there is a more efficient way to do this; hopefully someone who knows it will share it.

Also, the reference to df2 from the outer scope is hard-coded into the custom functions, so they will only work with that variable name in the outer scope. Use pandas.Series.apply's args parameter if you don't want these functions to depend on that reference.
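For instance, a sketch of that suggestion: the two lookup functions collapse into one hypothetical helper, `find_desc`, which receives the lookup frame and the column name through `apply`'s `args` parameter instead of closing over `df2`:

```python
import pandas as pd

def find_desc(x, lookup, code):
    """Look up ParmDesc1 for value x of column `code` in the lookup frame."""
    if pd.isna(x):
        return x  # leave NaN untouched, as the question requires
    match = lookup[(lookup['ParamCode'] == code) & (lookup['ParamValue'] == x)]
    return match['ParmDesc1'].values[0]

df1 = pd.DataFrame(dict(
    InfoType=[None] * 4,
    IncidentType='A C B B'.split(),
    DangerType=[None, None, 'C', None],
))
df2 = pd.DataFrame(dict(
    ParamCode='IncidentType IncidentType IncidentType DangerType'.split(),
    ParamValue='A B C C'.split(),
    ParmDesc1='ABC GHI MNO STU'.split(),
))

for col in ('IncidentType', 'DangerType'):
    # args passes the extra positional arguments after each element x.
    df1[col] = df1[col].apply(find_desc, args=(df2, col))

print(df1)
```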

3 Comments

Thank you very much. I need to look at this soon! But I try to avoid apply because of performance issues. I'm still trying to write my own version to compare with yours. As you stated above, I can't hard-code the columns, as I have other columns to transform as well.
@dee My idea here was to make a custom function for each column in df1 that you need to replace values for, based on specific values from df2 (your lookup table); you then call the pandas.Series.apply method with the respective function for each column that needs values replaced.
Hey, I also tried your second solution and it worked just as expected! Thank you!
