
Take for example the following dataframe:

df = pd.DataFrame({"val":np.random.rand(8),
                   "id1":[1,2,3,4,1,2,3,4],
                   "id2":[1,2,1,2,2,1,2,2],
                   "id3":[1,1,1,1,2,2,2,2]})

I would like to replace id2 in the rows where id3 does not equal an arbitrary reference value with the id2 value of the row that has the same id1 and whose id3 equals the reference.

I have a solution that partially works, but it does not use the second condition (take id2 from the row with the same id1 where id3 equals the reference). This makes it fragile, as discussed below.

import pandas as pd
import numpy as np

df = pd.DataFrame({"val":np.random.rand(8),
                   "id1":[1,2,3,4,1,2,3,4],
                   "id2":[1,2,1,2,2,1,2,2],
                   "id3":[1,1,1,1,2,2,2,2]})

reference = 1
df.loc[df['id3'] != reference, "id2"] = df[df["id3"]==reference]["id2"].values
print(df)

Output:

        val  id1  id2  id3
0  0.580965    1    1    1
1  0.941297    2    2    1
2  0.001142    3    1    1
3  0.479363    4    2    1
4  0.732861    1    1    2
5  0.650075    2    2    2
6  0.776919    3    1    2
7  0.377657    4    2    2

This solution works, but only when id3 has exactly two distinct values. With three id3 values, e.g.

df = pd.DataFrame({"val":np.random.rand(12),
                   "id1":[1,2,3,4,1,2,3,4,1,2,3,4],
                   "id2":[1,2,1,2,2,1,2,2,1,1,2,2],
                   "id3":[1,1,1,1,2,2,2,2,3,3,3,3]})

Expected/desired output:

         val  id1  id2  id3
0   0.800934    1    1    1
1   0.505645    2    2    1
2   0.268300    3    1    1
3   0.295300    4    2    1
4   0.564372    1    1    2
5   0.154572    2    2    2
6   0.591691    3    1    2
7   0.896055    4    2    2
8   0.275267    1    1    3
9   0.840533    2    2    3
10  0.192257    3    1    3
11  0.543342    4    2    3

Unfortunately, my solution no longer works on this dataframe. If anyone could provide some tips on how to get around this, I would be very appreciative.
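A minimal sketch (an illustration added here, not part of the original post) of why the attempt breaks: `.values` drops the index, so the assignment is purely positional. With two id3 groups of four rows each, the four reference values happen to line up with the four non-reference rows. With three groups the mask selects eight rows but there are still only four reference values, and pandas rejects the assignment with a `ValueError` (the exact message depends on the pandas version).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"val": np.random.rand(12),
                   "id1": [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   "id2": [1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2],
                   "id3": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]})
reference = 1

# Rows to overwrite: every row outside the reference group.
mask = df["id3"] != reference
# .values strips the index, so only the order of these values matters.
ref_values = df.loc[df["id3"] == reference, "id2"].values

# 8 target rows, but only 4 reference values.
print(mask.sum(), len(ref_values))

failed = False
try:
    df.loc[mask, "id2"] = ref_values
except ValueError as exc:
    # Lengths differ and cannot be broadcast, so pandas refuses the assignment.
    failed = True
    print("assignment failed:", exc)
```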

Answer:

If the id1 column acts as a counter within each id3 group, create a helper Series from the reference group by filtering and calling DataFrame.set_index, then use Series.map:

reference = 1
s = df[df['id3'] == reference].set_index('id1')['id2']
df['id2'] = df['id1'].map(s)
print (df)
         val  id1  id2  id3
0   0.986277    1    1    1
1   0.873392    2    2    1
2   0.509746    3    1    1
3   0.271836    4    2    1
4   0.336919    1    1    2
5   0.216954    2    2    2
6   0.276477    3    1    2
7   0.343316    4    2    2
8   0.862159    1    1    3
9   0.156700    2    2    3
10  0.140887    3    1    3
11  0.757080    4    2    3
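A self-contained version of the `map` approach, wrapped in a hypothetical helper `copy_reference_id2` so the reference group can be chosen freely (here `reference=2`). It is a sketch that assumes id1 is unique within the reference group; any id1 missing from that group would map to NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"val": np.random.rand(12),
                   "id1": [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   "id2": [1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2],
                   "id3": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]})


def copy_reference_id2(frame, reference):
    """Hypothetical helper: return a copy of frame whose id2 comes from the
    reference id3 group, matched on id1."""
    # Lookup table id1 -> id2 built from the reference group only
    # (assumes id1 is unique inside that group).
    lookup = frame[frame["id3"] == reference].set_index("id1")["id2"]
    out = frame.copy()
    # Every row takes the id2 of the reference row with the same id1;
    # an id1 absent from the reference group would become NaN.
    out["id2"] = out["id1"].map(lookup)
    return out


result = copy_reference_id2(df, reference=2)
print(result)
```

Because the lookup is keyed on id1 rather than on row position, the number of id3 groups and the row order inside them no longer matter.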

If there is no such counter column, create one with GroupBy.cumcount:

reference = 1

df['g'] = df.groupby('id3').cumcount()
s = df[df['id3'] == reference].set_index('g')['id2']
df['id2'] = df['g'].map(s)
print (df)
         val  id1  id2  id3  g
0   0.986277    1    1    1  0
1   0.873392    2    2    1  1
2   0.509746    3    1    1  2
3   0.271836    4    2    1  3
4   0.336919    1    1    2  0
5   0.216954    2    2    2  1
6   0.276477    3    1    2  2
7   0.343316    4    2    2  3
8   0.862159    1    1    3  0
9   0.156700    2    2    3  1
10  0.140887    3    1    3  2
11  0.757080    4    2    3  3
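A variant of the `cumcount` idea that keeps the within-group position in a separate Series instead of adding the `g` column, so the frame needs no cleanup afterwards. The data here is hypothetical (no id1 column), and, like the answer, it assumes rows appear in the same order in every id3 group; a group longer than the reference group would get NaN for its extra positions.

```python
import pandas as pd

# Hypothetical data without an id1 counter column.
df = pd.DataFrame({"id2": [1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2],
                   "id3": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]})
reference = 1

# Position of each row inside its id3 group: 0, 1, 2, ...
pos = df.groupby("id3").cumcount()
is_ref = df["id3"] == reference

# id2 values of the reference group, indexed by their within-group position.
lookup = pd.Series(df.loc[is_ref, "id2"].to_numpy(),
                   index=pos[is_ref].to_numpy())

# Each row takes the reference id2 at the same position; pos shares df's index.
df["id2"] = pos.map(lookup)
print(df)
```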

1 Comment

damn beat me to it, nice one +1
