Replace values in a dataframe column, based on another dataframe

Question

I have two DataFrames:

df1
PRA_Kod;WorkerId
1049;9024
0014;60260
0608;10506
20270/9;20270
9511;64473
0639;60264
0767;10509
.. .. .. . .
df2
WorkerId;Day;Time;W
1049;11;u.wyp;WE
1049;14;u.wyp;WE
64454;11;u.wyp;WE
0014;15;u.wyp;WE
64471;12;u.wyp;WE
64471;13;u.wyp;WE
0639;06;u.wyp;WE
0639;01;u.wyp;WE
0639;02;u.wyp;WE
0639;03;u.wyp;WE
64465;04;u.wyp;WE
64465;05;u.wyp;WE
 ... ... .. .

I need to change the value in the column df2['WorkerId'] to the value df1['WorkerId'] if df2['WorkerId'] appears in df1['PRA_Kod'].

A note: PRA_Kod value "20270/9" is no typo. These tables result from combining data from different tables. Some users happen to have different indexes accross different tables.

Desired output:

df_result
WorkerId;Day;Time;W
9024;11;u.wyp;WE
9024;14;u.wyp;WE
64454;11;u.wyp;WE
60260;15;u.wyp;WE
64471;12;u.wyp;WE
64471;13;u.wyp;WE
60264;06;u.wyp;WE
60264;01;u.wyp;WE
60264;02;u.wyp;WE
60264;03;u.wyp;WE
64465;04;u.wyp;WE
64465;05;u.wyp;WE
... ... .. .

How can this be done?

My attempt below results raises ValueError: Can only compare identically-labeled Series objects.

df_result.loc[ df2['WorkerId'].eq == df1['PRA_Kod'].eq, 
               df2['WorkerId']                         ] = df1['WorkerId']

query can do this. statology.org/…

Tim Roberts
– Tim Roberts

2023-09-20 20:42:23 +00:00
Commented Sep 20, 2023 at 20:42 — Tim Roberts
– Tim Roberts, Commented Sep 20, 2023 at 20:42

OCa · Accepted Answer · 2023-09-21 15:51:22Z

The main issue with your line, is that

you mean to use df1 as a dictionary (look for a value in one column, then get the value in the second column on the same row),
but what you do in fact when you write ==, is requesting a screening for a row-by-row match of both columns in df1 and df2.

Things will go much smoother when you first adapt df1 for use as dictionary.

Method:

Convert dataframe df1 into a dictionary where key : val pairs are PRA_Kod : WorkerId, as in Convert a Pandas DataFrame to a dictionary
Map this dictionary onto dataframe df2, as in Set Pandas Value Based on Dictionary

1. Dictionary

It's not directly df1.to_dict(), see? Either we write a loop, or we transpose it and rework columns and index, so that key:value pairs do land as needed:

map_dict = df1.set_index('PRA_Kod').T.to_dict('records')[0]

{'1049': '9024',
 '0014': '60260',
 '0608': '10506',
 '20270/9': '20270',
 '9511': '64473',
 '0639': '60264',
 '0767': '10509'}

2. Map

The mapping finds cells that are also dictionary keys, and replaces those with dictionary values. Other cells are kept as they were.

df_result = df2.assign(WorkerId = df2.WorkerId.map(map_dict).fillna(df2.WorkerId))

   WorkerId Day   Time   W
0      9024  11  u.wyp  WE
1      9024  14  u.wyp  WE
2     64454  11  u.wyp  WE
3     60260  15  u.wyp  WE
4     64471  12  u.wyp  WE
5     64471  13  u.wyp  WE
6     60264  06  u.wyp  WE
7     60264  01  u.wyp  WE
8     60264  02  u.wyp  WE
9     60264  03  u.wyp  WE
10    64465  04  u.wyp  WE
11    64465  05  u.wyp  WE

Refer to each linked question for more details about how both steps work. Without finding this method for step 2, I might have gone for an np.where, testing for value .isin dictionary keys.

Thank you for your help. This is my third python program. While looking for information, I saw df.to_dict() and map and np.where but no one explained it like you did. Thanks for the clarification.

Collectives™ on Stack Overflow

Replace values in a dataframe column, based on another dataframe

1 Answer 1

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related