Take for example the following dataframe:
df = pd.DataFrame({"val": np.random.rand(8),
                   "id1": [1, 2, 3, 4, 1, 2, 3, 4],
                   "id2": [1, 2, 1, 2, 2, 1, 2, 2],
                   "id3": [1, 1, 1, 1, 2, 2, 2, 2]})
I would like to replace the id2 values in the rows where id3 does not equal an arbitrary reference value with the id2 value of the row that has the same id1 and whose id3 equals the reference.
I have a solution that partially works, but it does not use the second condition (take id2 from the row with the same id1 among the rows where id3 equals the reference). This makes my solution not very robust, as discussed below.
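The rule above can be expressed as a lookup keyed on id1 instead of on row position. Below is a minimal sketch, assuming each id1 appears at most once among the reference rows; the names lookup and mask are illustrative, not from the original code. It uses the three-group data from further down so that it covers the failing case too.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"val": np.random.rand(12),
                   "id1": [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   "id2": [1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2],
                   "id3": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]})
reference = 1

# Lookup table id1 -> id2, built only from the reference rows.
# Assumes id1 is unique within the reference rows.
lookup = df.loc[df["id3"] == reference].set_index("id1")["id2"]

# Every non-reference row takes the id2 of the reference row with the same id1.
# Series.map keeps the original row index, so the assignment aligns by index,
# not by position. An id1 missing from the reference rows would map to NaN.
mask = df["id3"] != reference
df.loc[mask, "id2"] = df.loc[mask, "id1"].map(lookup)
print(df)
```

Because the match is made on id1, this should not depend on how many id3 groups there are or on the order of the rows within each group.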
import pandas as pd
import numpy as np
df = pd.DataFrame({"val": np.random.rand(8),
                   "id1": [1, 2, 3, 4, 1, 2, 3, 4],
                   "id2": [1, 2, 1, 2, 2, 1, 2, 2],
                   "id3": [1, 1, 1, 1, 2, 2, 2, 2]})
reference = 1
df.loc[df['id3'] != reference, "id2"] = df[df["id3"]==reference]["id2"].values
print(df)
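Because .values strips the index, the assignment in this one-liner is positional: the k-th non-reference row receives the k-th reference id2, whatever its id1 is. The sketch below uses a hypothetical reordering of id1 in the second group (not data from the question) to show that the rows then get id2 values from the wrong reference rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the id3 == 2 rows list id1 in reverse order (4, 3, 2, 1).
df = pd.DataFrame({"val": np.random.rand(8),
                   "id1": [1, 2, 3, 4, 4, 3, 2, 1],
                   "id2": [1, 2, 1, 2, 2, 1, 2, 2],
                   "id3": [1, 1, 1, 1, 2, 2, 2, 2]})
reference = 1

# Same one-liner as above: values are assigned by position, not by id1.
df.loc[df["id3"] != reference, "id2"] = df[df["id3"] == reference]["id2"].values

# Row 4 has id1 == 4 but receives id2 == 1 (copied from the id1 == 1 reference
# row), whereas the reference row with id1 == 4 has id2 == 2.
print(df.loc[df["id3"] != reference, ["id1", "id2"]])
```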
Output:
val id1 id2 id3
0 0.580965 1 1 1
1 0.941297 2 2 1
2 0.001142 3 1 1
3 0.479363 4 2 1
4 0.732861 1 1 2
5 0.650075 2 2 2
6 0.776919 3 1 2
7 0.377657 4 2 2
This solution works, but only when id3 has exactly two distinct values. Suppose instead there are three id3 values, e.g.:
df = pd.DataFrame({"val": np.random.rand(12),
                   "id1": [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   "id2": [1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2],
                   "id3": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]})
Expected/desired output:
val id1 id2 id3
0 0.800934 1 1 1
1 0.505645 2 2 1
2 0.268300 3 1 1
3 0.295300 4 2 1
4 0.564372 1 1 2
5 0.154572 2 2 2
6 0.591691 3 1 2
7 0.896055 4 2 2
8 0.275267 1 1 3
9 0.840533 2 2 3
10 0.192257 3 1 3
11 0.543342 4 2 3
With this data, however, my solution stops working: df['id3'] != reference now selects 8 rows, but the reference rows supply only 4 id2 values. Any tips on how to get around this would be much appreciated.
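A quick sketch of the length mismatch with the three-group data: the left-hand side of the one-liner selects every non-reference row, while the right-hand side is a bare array with one entry per reference row, so pandas rejects the assignment once the two counts differ. The variable names n_targets and n_values are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"val": np.random.rand(12),
                   "id1": [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   "id2": [1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2],
                   "id3": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]})
reference = 1

# Rows the one-liner tries to overwrite (all id3 == 2 and id3 == 3 rows).
n_targets = int((df["id3"] != reference).sum())
# Values it supplies: one id2 per reference row, index stripped by .values.
n_values = len(df[df["id3"] == reference]["id2"].values)
print(n_targets, n_values)  # 8 4
```

With two groups the counts happen to be equal (4 and 4), which is why the original approach appeared to work; matching on id1 instead of on position avoids depending on those counts.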