0

I have a dataframe data, from which I took a subset g2_data to perform some operations on. How can I replace values in the original dataframe with the values from the subset, using the values in one of the columns as the reference?

The column structure from data is retained in the subset g2_data shown below.

data:
         idx  group   x1   y1
0         27      1  0.0  0.0
1         28      1  0.0  0.0
2         29      1  0.0  0.0
3         73      1  0.0  0.0
4         74      1  0.0  0.0
     ...    ...  ...  ...
14612  14674      8  0.0  0.0
14613  14697      8  0.0  0.0
14614  14698      8  0.0  0.0
14615  14721      8  0.0  0.0
14616  14722      8  0.0  0.0

[14617 rows x 4 columns]
g2_data:
       idx  group     x1     y1
1125  1227      2  115.0    0.0
1126  1228      2    0.0  220.0
1127  1260      2    0.0    0.0
1128  1294      2    0.0    0.0
1129  1295      2    0.0    0.0
   ...    ...    ...    ...
3269  3277      2    0.0    0.0
3270  3308      2    0.0    0.0
3271  3309      2    0.0    0.0
3272  3342      2    0.0    0.0
3273  3343      2    0.0    0.0

[2149 rows x 4 columns]

The question "Replace rows in Dataframe using index from another Dataframe" has an answer that does this using the index values of the rows, but I would like to do it using the values from the idx column in case I need to reset the index in the subset later on (i.e. starting from 0 instead of using the index values from the original dataframe). It is important to note that the values in the idx column are all unique, as they pertain to info about each observation.

4
  • why not just use that answer, but temporarily set the index to your idx column? data.set_index('idx').loc[g2_data.set_index('idx').index, :] = g2_data[:] Commented Oct 15, 2021 at 20:56
  • @StevenS it seems this retains the original info from data. Commented Oct 15, 2021 at 21:20
  • Is it possible an index idx exists in g2_data and not in data? Commented Oct 15, 2021 at 22:01
  • @Corralien I wouldn't think so as g2_data was taken directly from data with the condition that group == 2. Commented Oct 15, 2021 at 23:49
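
A note on the one-liner in the first comment: data.set_index('idx') returns a new dataframe, so the assignment lands on a temporary object and data itself is left untouched, which matches what the asker observed. Below is a minimal sketch of the same idea that keeps a reference to the re-indexed frame. The toy values and the names indexed and subset are made up for illustration, and it assumes idx is unique and every idx in g2_data also exists in data.

```python
import pandas as pd

# Toy stand-ins for the question's frames (values are made up for illustration)
data = pd.DataFrame({
    "idx": [27, 28, 1227, 1228, 14722],
    "group": [1, 1, 2, 2, 8],
    "x1": [0.0] * 5,
    "y1": [0.0] * 5,
})

# Subset, modified, with its index reset so that only idx links it back to data
g2_data = data[data["group"] == 2].copy()
g2_data["x1"] = [115.0, 0.0]
g2_data["y1"] = [0.0, 220.0]
g2_data = g2_data.reset_index(drop=True)

# Keep a reference to the re-indexed frame so the assignment is not lost
indexed = data.set_index("idx")
subset = g2_data.set_index("idx")

# .loc aligns the right-hand side on the idx labels and the column names
indexed.loc[subset.index, subset.columns] = subset

# Turn idx back into a regular column; the row order of data is unchanged
data = indexed.reset_index()
```

Because the rows are matched on idx labels, this should keep working after the subset's own index has been reset.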

2 Answers

1

This probably isn't optimal, but you can convert g2_data to a dictionary and then map the other columns based on idx, filtering the update to just those ids in the g2_data subset.

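# Nested dict keyed by column name, then by idx: {'group': {1227: 2, ...}, 'x1': {...}, 'y1': {...}}
g2_data_dict = g2_data.set_index('idx').to_dict()
g2_data_ids = g2_data['idx'].to_list()

# Only rows whose idx appears in g2_data are overwritten
in_subset = data['idx'].isin(g2_data_ids)
for k in g2_data_dict.keys():
    data.loc[in_subset, k] = data['idx'].map(g2_data_dict[k])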

1

Use combine_first:

out = g2_data.set_index('idx').combine_first(data.set_index('idx')).reset_index()

2 Comments

This does fill in the updated values, but it reorders the dataframe. Does setting the index to 'idx' inherently reorder to fit the new index? Is there a way to maintain the original order while also setting a new index?
No, setting an index doesn't reorder a dataframe. What's happening here is that g2_data is the starting point, so the extra IDs in data are added to the bottom. If you want to keep your data sorted by idx then you can just add sort_index() before reset_index().
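
On keeping the original row order: sort_index() only helps if data was sorted by idx to begin with. A possible alternative, sketched below with made-up toy values, is to reindex the combined result by the idx values of data in their original order. This assumes idx is unique; the final column selection is only there to restore the original column order.

```python
import pandas as pd

# Toy frame whose idx column is deliberately not sorted (values are made up)
data = pd.DataFrame({
    "idx": [1227, 1228, 27, 28, 14722],
    "group": [2, 2, 1, 1, 8],
    "x1": [0.0] * 5,
    "y1": [0.0] * 5,
})

# Modified subset for group 2
g2_data = data[data["group"] == 2].copy()
g2_data["x1"] = [115.0, 0.0]
g2_data["y1"] = [0.0, 220.0]

out = (
    g2_data.set_index("idx")
    .combine_first(data.set_index("idx"))  # subset values win where present
    .reindex(data["idx"].to_numpy())       # put rows back in data's order
    .rename_axis("idx")                    # make sure the index is named idx
    .reset_index()
)

# Restore the original column order
out = out[data.columns]
```

Keep in mind that combine_first only takes non-null values from g2_data, so a NaN in the subset would not overwrite the corresponding value in data.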
