replace values in dataframe based on other dataframe filter

Question

I have 2 DataFrames, and I want to replace the values in DF2 with values from DF1. Each cell of DF2 holds a column name of DF1 and should be replaced by DF1's value in that column on the same date. I've included both DataFrames to clarify.

DF1:

             A  B   C   D   E
Date
01/01/2019  1   2   3   4   5
02/01/2019  1   2   3   4   5
03/01/2019  1   2   3   4   5

DF2:

          name1 name2   name3
Date
01/01/2019  A       B       D
02/01/2019  B       C       E
03/01/2019  A       D       E

THE RESULT I WANT:

          name1 name2   name3   
Date
01/01/2019  1       2        4  
02/01/2019  2       3        5  
03/01/2019  1       4        5

divingTobi · Accepted Answer · 2020-03-10 15:28:12Z

1

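Try (this assumes the dates live in a regular column named "index", which is what reset_index() creates when the index has no name):

import pandas as pd

# turn the (named) date index into an ordinary column called "index"
df1 = df1.rename_axis(None).reset_index()
df2 = df2.rename_axis(None).reset_index()

result = df2.melt(id_vars="index").merge(
    df1.melt(id_vars="index"),
    left_on=["index", "value"],
    right_on=["index", "variable"],
).drop(columns=["value_x", "variable_y"]).pivot(
    index="index", columns="variable_x", values="value_y"
)

print(result)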

The two melt calls transform your DataFrames into long form: one column holds all the cell values, and an additional column holds the original column names:

df1.melt(id_vars='index')

         index variable  value
0   01/01/2019        A      1
1   02/01/2019        A      1
2   03/01/2019        A      1
3   01/01/2019        B      2
4   02/01/2019        B      2
5   03/01/2019        B      2
...
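A minimal sketch to check both long forms on the question's data. Like the answer, it assumes the dates sit in an ordinary column named "index"; the frames are rebuilt by hand from the tables in the question:

```python
import pandas as pd

# Hand-built copies of the question's tables, with the dates stored in a
# regular column named "index" (the layout the melt/merge answer assumes).
dates = ["01/01/2019", "02/01/2019", "03/01/2019"]
df1 = pd.DataFrame({"index": dates, "A": 1, "B": 2, "C": 3, "D": 4, "E": 5})
df2 = pd.DataFrame({
    "index": dates,
    "name1": ["A", "B", "A"],
    "name2": ["B", "C", "D"],
    "name3": ["D", "E", "E"],
})

# df1 long form: one row per (date, column letter), the number is in "value"
long1 = df1.melt(id_vars="index")
# df2 long form: one row per (date, nameN), the column letter is in "value"
long2 = df2.melt(id_vars="index")

print(long1.head(6))
print(long2)
```

long2 is the side the merge matches on: its "value" column holds the letter that has to equal long1's "variable" column.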

You can now join these on the date ("index") plus df2's "value" (the letter) matched against df1's "variable" (the column name). The last part just drops the two redundant columns and pivots the table back into the desired wide form.

The result is:

variable_x  name1  name2  name3
index                          
01/01/2019      1      2      4
02/01/2019      2      3      5
03/01/2019      1      4      5
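To see why value_x and variable_y are the columns being dropped, here is a sketch that stops after the merge and inspects it, using the same hand-built frames (dates in a column named "index"). The final rename_axis call is an optional extra, not part of the answer, that restores the question's "Date" label:

```python
import pandas as pd

# Hand-built copies of the question's data (dates in a column named "index").
dates = ["01/01/2019", "02/01/2019", "03/01/2019"]
df1 = pd.DataFrame({"index": dates, "A": 1, "B": 2, "C": 3, "D": 4, "E": 5})
df2 = pd.DataFrame({
    "index": dates,
    "name1": ["A", "B", "A"],
    "name2": ["B", "C", "D"],
    "name3": ["D", "E", "E"],
})

merged = df2.melt(id_vars="index").merge(
    df1.melt(id_vars="index"),
    left_on=["index", "value"],      # date + the letter stored in df2
    right_on=["index", "variable"],  # date + df1's column name
)
# "variable" and "value" exist on both sides, so pandas suffixes them
# with _x (from df2) and _y (from df1).
print(merged.columns.tolist())

# value_x and variable_y both hold the matched letter, so one copy is redundant
assert (merged["value_x"] == merged["variable_y"]).all()

result = merged.drop(columns=["value_x", "variable_y"]).pivot(
    index="index", columns="variable_x", values="value_y"
)
# optional: restore the question's axis labels
result = result.rename_axis(index="Date", columns=None)
print(result)
```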

edited Mar 10, 2020 at 15:28

answered Mar 10, 2020 at 13:59

divingTobi

2 Comments

Poojan Over a year ago

consider adding 1-2 line description explaining your code

divingTobi Over a year ago

Sorry, I was running to a meeting and planned to come back later with the explanation :-)

jezrael · Accepted Answer · 2020-03-11 07:05:11Z

1

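Use DataFrame.lookup for each column separately (DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so this needs an older pandas version):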

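# lookup pairs the i-th row label of df1 with the i-th label in df2[c],
# so both frames need the same number of rows, in the same order
for c in df2.columns:
    df2[c] = df1.lookup(df1.index, df2[c])
print (df2)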
            name1  name2  name3
01/01/2019      1      2      4
02/01/2019      2      3      5
03/01/2019      1      4      5
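Because DataFrame.lookup no longer exists in pandas 2.x, here is a sketch of the same per-column lookup using NumPy fancy indexing with Index.get_indexer. It is a swapped-in technique, not the answer's code, and like lookup it assumes df1 and df2 share the same rows in the same order and that every letter in df2 is a column of df1 (the frames are rebuilt by hand from the question):

```python
import numpy as np
import pandas as pd

# Hand-built copies of the question's data, indexed by the "Date" strings.
dates = pd.Index(["01/01/2019", "02/01/2019", "03/01/2019"], name="Date")
df1 = pd.DataFrame({"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}, index=dates)
df2 = pd.DataFrame(
    {"name1": ["A", "B", "A"], "name2": ["B", "C", "D"], "name3": ["D", "E", "E"]},
    index=dates,
)

values = df1.to_numpy()       # 2-D array, rows in df1's order
rows = np.arange(len(df1))    # row positions 0..n-1

result = df2.copy()
for c in df2.columns:
    # position of each referenced column label inside df1 (-1 if absent)
    col_pos = df1.columns.get_indexer(df2[c])
    if (col_pos < 0).any():
        raise KeyError(f"{c} references a column that df1 does not have")
    # pick values[i, col_pos[i]] for every row i
    result[c] = values[rows, col_pos]

print(result)
```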

A general solution, for when the index values and column names may differ between the two DataFrames:

print (df1)
            A  B  C  D  G
01/01/2019  1  2  3  4  5
02/01/2019  1  2  3  4  5
05/01/2019  1  2  3  4  5

print (df2)
           name1 name2 name3
01/01/2019     A     B     D
02/01/2019     B     C     E
08/01/2019     A     D     E

import pandas as pd

# parse the day-first date strings so both indexes hold real dates
df1.index = pd.to_datetime(df1.index, dayfirst=True)
df2.index = pd.to_datetime(df2.index, dayfirst=True)

cols = df2.stack().unique()                   # every column label referenced in df2
idx = df2.index
df11 = df1.reindex(columns=cols, index=idx)   # missing rows/columns become NaN
print (df11)
              A    B    D    C   E
2019-01-01  1.0  2.0  4.0  3.0 NaN
2019-01-02  1.0  2.0  4.0  3.0 NaN
2019-01-08  NaN  NaN  NaN  NaN NaN

for c in df2.columns:
    df2[c] = df11.lookup(df11.index, df2[c])
print (df2)
            name1  name2  name3
2019-01-01    1.0    2.0    4.0
2019-01-02    2.0    3.0    NaN
2019-01-08    NaN    NaN    NaN
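For pandas versions without DataFrame.lookup, the general case can be sketched by combining the same reindex-on-dates idea with Index.get_indexer, turning missing labels (reported as -1) into NaN instead of raising. This is a swapped-in technique, not the answer's code; the frames are rebuilt by hand from the example above:

```python
import numpy as np
import pandas as pd

# Hand-built copies of the general example, dates parsed day-first.
df1 = pd.DataFrame(
    {"A": 1, "B": 2, "C": 3, "D": 4, "G": 5},
    index=pd.to_datetime(["01/01/2019", "02/01/2019", "05/01/2019"], dayfirst=True),
)
df2 = pd.DataFrame(
    {"name1": ["A", "B", "A"], "name2": ["B", "C", "D"], "name3": ["D", "E", "E"]},
    index=pd.to_datetime(["01/01/2019", "02/01/2019", "08/01/2019"], dayfirst=True),
)

# align df1's rows to df2's dates; dates missing from df1 become all-NaN rows
aligned = df1.reindex(index=df2.index)
values = aligned.to_numpy(dtype=float)
rows = np.arange(len(aligned))

out = {}
for c in df2.columns:
    col_pos = aligned.columns.get_indexer(df2[c])  # -1 where the label is absent
    picked = np.full(len(aligned), np.nan)         # default: NaN
    found = col_pos >= 0
    picked[found] = values[rows[found], col_pos[found]]
    out[c] = picked

result = pd.DataFrame(out, index=df2.index)
print(result)
```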

edited Mar 11, 2020 at 7:05

answered Mar 10, 2020 at 14:05

jezrael

1 Comment

Fede Over a year ago

I tried both and get this error: Row labels must have same size as column labels
