Replace dataframe rows with identical rows from another dataframe on a column value

Question

I have a dataframe data, from which I took a subset g2_data to perform some operations on. How can I replace values in the original dataframe with values from the subset, using the values in one of the columns as the reference?

The column structure of data is retained in the subset g2_data, as shown below.

data:
         idx  group   x1   y1
0         27      1  0.0  0.0
1         28      1  0.0  0.0
2         29      1  0.0  0.0
3         73      1  0.0  0.0
4         74      1  0.0  0.0
     ...    ...  ...  ...
14612  14674      8  0.0  0.0
14613  14697      8  0.0  0.0
14614  14698      8  0.0  0.0
14615  14721      8  0.0  0.0
14616  14722      8  0.0  0.0

[14617 rows x 4 columns]

g2_data:
       idx  group     x1     y1
1125  1227      2  115.0    0.0
1126  1228      2    0.0  220.0
1127  1260      2    0.0    0.0
1128  1294      2    0.0    0.0
1129  1295      2    0.0    0.0
   ...    ...    ...    ...
3269  3277      2    0.0    0.0
3270  3308      2    0.0    0.0
3271  3309      2    0.0    0.0
3272  3342      2    0.0    0.0
3273  3343      2    0.0    0.0

[2149 rows x 4 columns]

The question "Replace rows in Dataframe using index from another Dataframe" has an answer that does this using the index values of the rows, but I would like to do it using the values from the idx column, in case I need to reset the index of the subset later on (i.e. starting from 0 instead of keeping the index values from the original dataframe). Note that the values in the idx column are all unique, as each one identifies a single observation.

Why not just use that answer, but temporarily set the index to your idx column? data.set_index('idx').loc[g2_data.set_index('idx').index, :] = g2_data[:]
– StevenS, Commented Oct 15, 2021 at 20:56
@StevenS it seems this retains the original info from data.
– tcolbert, Commented Oct 15, 2021 at 21:20
Is it possible an index idx exists in g2_data and not in data?
– Corralien, Commented Oct 15, 2021 at 22:01
@Corralien I wouldn't think so as g2_data was taken directly from data with the condition that group == 2.
– tcolbert, Commented Oct 15, 2021 at 23:49
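
A likely explanation for the behaviour tcolbert reports: set_index returns a new DataFrame, so the .loc assignment in the first comment writes into a temporary object and data itself never changes. The sketch below keeps a reference to the re-indexed frame and uses DataFrame.update, which aligns on the index. The toy frames are made up for illustration, and the sketch assumes idx is unique and that only x1 and y1 need updating.

```python
import pandas as pd

# Toy stand-ins for the question's frames (values are illustrative only).
data = pd.DataFrame({
    "idx": [27, 28, 1227, 1228, 14674],
    "group": [1, 1, 2, 2, 8],
    "x1": [0.0, 0.0, 0.0, 0.0, 0.0],
    "y1": [0.0, 0.0, 0.0, 0.0, 0.0],
})
g2_data = data[data["group"] == 2].reset_index(drop=True)  # index restarts at 0
g2_data.loc[0, "x1"] = 115.0
g2_data.loc[1, "y1"] = 220.0

# Keep the re-indexed frame in a variable instead of assigning into a temporary.
indexed = data.set_index("idx")
# update() aligns rows on the index (idx here) and overwrites values in place.
indexed.update(g2_data.set_index("idx")[["x1", "y1"]])
data = indexed.reset_index()
```

Note that update only copies non-NA values, so a NaN in g2_data would leave the corresponding value in data untouched.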

StevenS · Accepted Answer · 2021-10-15 22:07:29Z

1

This probably isn't optimal, but you can convert g2_data to a dictionary and then map the other columns based on idx, filtering the update to just those ids in the g2_data subset.

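# Build {column: {idx: value}} from the subset, plus the list of idx values to update.
g2_data_dict = g2_data.set_index('idx').to_dict()
g2_data_ids = g2_data['idx'].to_list()

# For each column, overwrite only the rows whose idx appears in g2_data.
for k in g2_data_dict.keys():
    data.loc[data['idx'].isin(g2_data_ids), k] = data['idx'].map(g2_data_dict[k])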
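
For reference, here is a self-contained sketch of the same idea on toy data. The frames are illustrative, not the asker's data, and the name in_subset is introduced here for readability. It assumes idx is unique in g2_data, as the question states, and restricts the map call to the matching rows so that no NaN values are produced for the other rows.

```python
import pandas as pd

# Toy stand-ins for the question's frames (values are illustrative only).
data = pd.DataFrame({
    "idx": [27, 28, 1227, 1228, 14674],
    "group": [1, 1, 2, 2, 8],
    "x1": [0.0, 0.0, 0.0, 0.0, 0.0],
    "y1": [0.0, 0.0, 0.0, 0.0, 0.0],
})
g2_data = data[data["group"] == 2].reset_index(drop=True)
g2_data.loc[0, "x1"] = 115.0
g2_data.loc[1, "y1"] = 220.0

# {column: {idx: value}} for every non-idx column in the subset.
g2_data_dict = g2_data.set_index("idx").to_dict()
g2_data_ids = g2_data["idx"].to_list()

# Boolean mask: rows of data whose idx appears in the subset.
in_subset = data["idx"].isin(g2_data_ids)

for col, idx_to_value in g2_data_dict.items():
    # Look up each matching row's new value by idx and write it back.
    data.loc[in_subset, col] = data.loc[in_subset, "idx"].map(idx_to_value)
```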

answered Oct 15, 2021 at 22:07

StevenS


Corralien · Accepted Answer · 2021-10-15 22:07:30Z

1

Use combine_first:

out = g2_data.set_index('idx').combine_first(data.set_index('idx')).reset_index()
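
A small sketch of this one-liner on toy data follows. The frames are made up for illustration, and idx is assumed to be unique in both. combine_first takes values from the calling frame (the subset) and fills anything missing from the argument (the full frame).

```python
import pandas as pd

# Toy stand-ins for the question's frames (values are illustrative only).
data = pd.DataFrame({
    "idx": [27, 28, 1227, 1228, 14674],
    "group": [1, 1, 2, 2, 8],
    "x1": [0.0, 0.0, 0.0, 0.0, 0.0],
    "y1": [0.0, 0.0, 0.0, 0.0, 0.0],
})
g2_data = data[data["group"] == 2].reset_index(drop=True)
g2_data.loc[0, "x1"] = 115.0
g2_data.loc[1, "y1"] = 220.0

out = (
    g2_data.set_index("idx")
    .combine_first(data.set_index("idx"))  # subset values win; data fills the rest
    .reset_index()
)
```

One caveat: combine_first treats NaN in g2_data as missing, so a NaN there is replaced by the value from data instead of overwriting it.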

answered Oct 15, 2021 at 22:07

Corralien


2 Comments

tcolbert Over a year ago

This does fill in the updated values, but it reorders the dataframe. Does setting the index to 'idx' inherently reorder to fit the new index? Is there a way to maintain the original order while also setting a new index?

StevenS Over a year ago

No, setting an index doesn't reorder a dataframe. What's happening here is that g2_data is the starting point, so the extra IDs in data are added to the bottom. If you want to keep your data sorted by idx then you can just add sort_index() before reset_index().
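
If the goal is to keep the row order of data instead of sorting by idx, one option is to reindex the combined frame on data['idx'] before resetting the index. This is a sketch on toy data with a deliberately unsorted idx column. The frames are illustrative, and idx is assumed to be unique.

```python
import pandas as pd

# Toy frame whose idx column is deliberately not sorted.
data = pd.DataFrame({
    "idx": [28, 27, 1228, 1227, 14674],
    "group": [1, 1, 2, 2, 8],
    "x1": [0.0, 0.0, 0.0, 0.0, 0.0],
    "y1": [0.0, 0.0, 0.0, 0.0, 0.0],
})
g2_data = data[data["group"] == 2].reset_index(drop=True)
g2_data.loc[0, "x1"] = 115.0  # row with idx 1228

out = (
    g2_data.set_index("idx")
    .combine_first(data.set_index("idx"))
    .reindex(data["idx"])  # put the rows back in data's original idx order
    .reset_index()
)
```

The reindex step works regardless of the order combine_first returns, because it selects rows by idx label in the order they appear in data.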
