I have two MultiIndexed DataFrames. One is my reference (about 37000 rows) and the other has far fewer rows (e.g., 10).

I want to replace values in the matching rows of the big one with the values from the smaller one.

Sample df1:

lvl1    lvl2 lvl3   Value   Value2

A       1   I   0,862877333 0,181795348
        1   II  0,787022218 0,292046262
        1   III 0,40516176  0,445079108
        2   I   0,882167166 0,683954412
        2   IV  0,743618024 0,103097267
        3   I   0,901062673 0,729188996
        3   II  0,529989452 0,715379923
        3   IV  0,740272198 0,792457421
B       1   I   0,548587694 0,637462653
        1   II  0,201284924 0,084391963
        2   I   0,999118031 0,558207224
        2   II  0,63353019  0,251377184
        2   V   0,694294638 0,685050861
        3   V   0,436723389 0,310871641
        3   VI  0,630832871 0,869957421
        3   VII 0,157874482 0,639308814

Sample df2:

lvl1    lvl2    lvl3    Value   Value2
A       1       I       0,8654  1
B       2       II      0,264   2

Resulting df3:

lvl1    lvl2 lvl3   Value   Value2

A       1   I   **0,8654**  0,181795348
        1   II  0,787022218 0,292046262
        1   III 0,40516176  0,445079108
        2   I   0,882167166 0,683954412
        2   IV  0,743618024 0,103097267
        3   I   0,901062673 0,729188996
        3   II  0,529989452 0,715379923
        3   IV  0,740272198 0,792457421
B       1   I   0,548587694 0,637462653
        1   II  0,201284924 0,084391963
        2   I   0,999118031 0,558207224
        2   II  **0,264**   0,251377184
        2   V   0,694294638 0,685050861
        3   V   0,436723389 0,310871641
        3   VI  0,630832871 0,869957421
        3   VII 0,157874482 0,639308814

3 Answers

3

You can replace values where the index matches like this:

for ind in df2.index:
    df1.loc[ind, 'Value'] = df2.loc[ind, 'Value']

If you want to replace whole rows:

for ind in df2.index:
    df1.loc[ind,] = df2.loc[ind,]
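
For anyone who wants to try the first snippet end to end, here is a minimal sketch. The four-row df1 and the shortened values are stand-ins I made up for illustration; only the index levels and column names come from the question.

```python
import pandas as pd

names = ["lvl1", "lvl2", "lvl3"]

# Illustrative stand-in for the large reference frame (values shortened).
df1 = pd.DataFrame(
    {"Value": [0.8629, 0.7870, 0.9991, 0.6335],
     "Value2": [0.1818, 0.2920, 0.5582, 0.2514]},
    index=pd.MultiIndex.from_tuples(
        [("A", 1, "I"), ("A", 1, "II"), ("B", 2, "I"), ("B", 2, "II")],
        names=names,
    ),
)

# Illustrative stand-in for the small frame holding the replacement values.
df2 = pd.DataFrame(
    {"Value": [0.8654, 0.264], "Value2": [1.0, 2.0]},
    index=pd.MultiIndex.from_tuples(
        [("A", 1, "I"), ("B", 2, "II")], names=names
    ),
)

# Each ind is a full index tuple such as ("A", 1, "I"),
# so .loc addresses exactly one cell of df1 per iteration.
for ind in df2.index:
    df1.loc[ind, "Value"] = df2.loc[ind, "Value"]

print(df1)
```

Only the Value column changes here; Value2 in df1 keeps its original numbers, which matches the df3 shown in the question.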

3 Comments

This approach with .loc is really efficient! No complicated merging or column dropping needed.
I used your second snippet to replace rows. It worked on one PC, but on another PC it raised a traceback: tuple index out of range
To replace individual values, at[...] may be faster than loc[...]. If you're only replacing a few values it won't make much difference, but on my machine replacing 50,000 values was 6 or 7 times slower using loc than using at.
0

You could use pd.merge:

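import numpy as np
import pandas as pd

# Flat (non-indexed) stand-ins for the reference frame and the replacement rows
temp = pd.DataFrame({"lvl1": ["A","A","B","B"], "lvl2": [1,2,1,2], "lvl3": ["I","II","I","II"], "Value": [0.8628773,0.7870, 0.63353, 0.6998]})
replace = pd.DataFrame({"lvl1": ["A","B"], "lvl2": [1,2], "lvl3": ["I","II"], "Value": [0.8654, 0.264], "Value2": [1,2]})
# The left merge keeps every row of temp; the shared column becomes Value_x / Value_y
df = pd.merge(temp, replace, how="left", on=["lvl1","lvl2","lvl3"])
# Take the replacement where the merge found a match, otherwise keep the original
df["Value_x"] = np.where(df["Value_y"].notnull(), df["Value_y"], df["Value_x"])
# Optionally drop the helper columns afterwards:
# df.drop(["Value_y", "Value2"], axis=1, inplace=True)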
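
If the extra columns are a nuisance, one possible way to finish the merge approach is sketched below. The `_new` suffix, the `keys` list and the `result` name are my own choices for illustration, and the replacement frame here carries only the Value column.

```python
import numpy as np
import pandas as pd

keys = ["lvl1", "lvl2", "lvl3"]

temp = pd.DataFrame({
    "lvl1": ["A", "A", "B", "B"],
    "lvl2": [1, 2, 1, 2],
    "lvl3": ["I", "II", "I", "II"],
    "Value": [0.8628773, 0.7870, 0.63353, 0.6998],
})
replace = pd.DataFrame({
    "lvl1": ["A", "B"],
    "lvl2": [1, 2],
    "lvl3": ["I", "II"],
    "Value": [0.8654, 0.264],
})

# Left merge keeps every row of temp; rows without a match get NaN in Value_new.
merged = pd.merge(temp, replace, how="left", on=keys, suffixes=("", "_new"))

# Prefer the replacement where one exists, otherwise keep the original value.
merged["Value"] = np.where(
    merged["Value_new"].notna(), merged["Value_new"], merged["Value"]
)

# Drop the helper column and restore the three-level index.
result = merged.drop(columns="Value_new").set_index(keys)

print(result)
```

Unlike the .loc loop, this returns a new frame rather than modifying the original, so it has to be assigned back if the reference frame should change.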

2 Comments

I tried merge (and lookup, join, append...), but it adds a new column instead of replacing the Value in the original df.
It adds two columns; then you just need to replace the values using np.where as shown. If you want to delete these columns, use df.drop.
0

You could use

df1.update(df2['Value'])

or if you want to replace all the columns,

df1.update(df2)

Note that unlike many data frame operations, this works in-place – it modifies df1 instead of returning a copy.
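
A small sketch of the in-place behaviour, using made-up stand-in frames (the four-row df1 and the shortened values are assumptions for illustration, not the asker's data):

```python
import pandas as pd

names = ["lvl1", "lvl2", "lvl3"]

# Illustrative stand-in for the large reference frame.
df1 = pd.DataFrame(
    {"Value": [0.8629, 0.7870, 0.9991, 0.6335],
     "Value2": [0.1818, 0.2920, 0.5582, 0.2514]},
    index=pd.MultiIndex.from_tuples(
        [("A", 1, "I"), ("A", 1, "II"), ("B", 2, "I"), ("B", 2, "II")],
        names=names,
    ),
)

# Illustrative stand-in for the small frame with replacement values.
df2 = pd.DataFrame(
    {"Value": [0.8654, 0.264], "Value2": [1.0, 2.0]},
    index=pd.MultiIndex.from_tuples(
        [("A", 1, "I"), ("B", 2, "II")], names=names
    ),
)

# update() aligns on the index, overwrites matching cells of df1 in place,
# and returns None, so the result must not be assigned back to df1.
returned = df1.update(df2["Value"])

print(returned)
print(df1)
```

Keep in mind that update only uses the non-NA values of the other frame, so a NaN in df2 will not blank out a value in df1.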
