
I have the following dataset:

import pandas as pd

df = pd.DataFrame( {'user': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 2}, 
    'date': {0: '1995-09-01', 1: '1995-09-02', 2: '1995-10-03', 3: '1995-10-04', 4: '1995-10-05', 5: '1995-11-07', 6: '1995-11-08'}, 
    'x': {0: '1995-09-02', 1: '1995-09-02', 2: '1995-09-02', 3: '1995-10-05', 4: '1995-10-05', 5: '1995-10-05', 6: '1995-10-05'}, 
    'y': {0: '1995-10-03', 1: '1995-10-03', 2: '1995-10-03', 3: '1995-11-08', 4: '1995-11-08', 5: '1995-11-08', 6: '1995-11-08'}, 
    'c1': {0: '1', 1: '0', 2: '0', 3: '2', 4: '0', 5: '9', 6: '0'}, 
    'c2': {0: '1', 1: '0', 2: '0', 3: '2', 4: '0', 5: '9', 6: '0'}, 
    'c3': {0: '1', 1: '0', 2: '0', 3: '2', 4: '0', 5: '9', 6: '0'}, 
    'VTX1': {0: 1, 1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 6: 0}, 
    'VTY1': {0: 0, 1: 1, 2: 0, 3: 0, 4: 0, 5: 1, 6: 0}} )

which gives me:

    user    date         x           y     c1   c2 c3 VTX1 VTY1
0   1   1995-09-01  1995-09-02  1995-10-03  1   1   1   1   0
1   1   1995-09-02  1995-09-02  1995-10-03  0   0   0   0   1
2   1   1995-10-03  1995-09-02  1995-10-03  0   0   0   0   0
3   2   1995-10-04  1995-10-05  1995-11-08  2   2   2   1   0
4   2   1995-10-05  1995-10-05  1995-11-08  0   0   0   0   0
5   2   1995-11-07  1995-10-05  1995-11-08  9   9   9   0   1
6   2   1995-11-08  1995-10-05  1995-11-08  0   0   0   0   0

I want to replace df['c1'] as follows.

- When df['date'] == df['x'],
       change df['c1'] to the df['c1'] value that the same user has on the row where df['VTX1'] == 1

In this example, for user 1, df['date'] == df['x'] holds at index 1. Here we want df['c1'] to be 1, because 1 is the value user 1 has in df['c1'] on the row where df['VTX1'] == 1.

So the end result would be:

   user    date          x         y       c1   c2 c3  VTX1 VTY1
0   1   1995-09-01  1995-09-02  1995-10-03  1   1   1   1   0
1   1   1995-09-02  1995-09-02  1995-10-03  1   0   0   0   1
2   1   1995-10-03  1995-09-02  1995-10-03  0   0   0   0   0
3   2   1995-10-04  1995-10-05  1995-11-08  2   2   2   1   0
4   2   1995-10-05  1995-10-05  1995-11-08  2   0   0   0   0
5   2   1995-11-07  1995-10-05  1995-11-08  9   9   9   0   1
6   2   1995-11-08  1995-10-05  1995-11-08  0   0   0   0   0
  • The first condition, df['date'] == df['x'], does not match any row. Can you explain how it matches the second row? Commented Aug 24, 2021 at 16:16
  • @anky true, amended accordingly Commented Aug 24, 2021 at 17:02
  • In each group, can there be more than one VTX1 value equal to 1? Commented Aug 25, 2021 at 8:04
  • @ShubhamSharma in each user group there is only 1 entry with VTX1 equal to 1 Commented Aug 25, 2021 at 8:44

1 Answer

For each user, select the row where the column VTX1 has the value 1. One way to do this is to set user as the index and use query to pick those rows, which gives a Series d that maps each user to its c1 value. Then mask the values in c1 where date equals x, and fill the masked positions with df['user'].map(d).

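# Map each user to the c1 value on its VTX1 == 1 row
d = df.set_index('user').query('VTX1 == 1')['c1']
# Where date equals x, replace c1 with the mapped value for that user
df['c1'] = df['c1'].mask(df['date'].eq(df['x']), df['user'].map(d))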

   user        date           x           y c1 c2 c3  VTX1  VTY1
0     1  1995-09-01  1995-09-02  1995-10-03  1  1  1     1     0
1     1  1995-09-02  1995-09-02  1995-10-03  1  0  0     0     1
2     1  1995-10-03  1995-09-02  1995-10-03  0  0  0     0     0
3     2  1995-10-04  1995-10-05  1995-11-08  2  2  2     1     0
4     2  1995-10-05  1995-10-05  1995-11-08  2  0  0     0     0
5     2  1995-11-07  1995-10-05  1995-11-08  9  9  9     0     1
6     2  1995-11-08  1995-10-05  1995-11-08  0  0  0     0     0
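
To try this in isolation, here is a minimal sketch on a trimmed copy of the question's data (only the columns the rule touches). It relies on what the asker confirmed in the comments, namely that each user has exactly one row with VTX1 equal to 1, and it checks that before mapping. The variable name result and the uniqueness check are my additions for illustration, not part of the original snippet.

```python
import pandas as pd

# Trimmed copy of the question's data: only the columns the rule touches.
df = pd.DataFrame({
    'user': [1, 1, 1, 2, 2, 2, 2],
    'date': ['1995-09-01', '1995-09-02', '1995-10-03', '1995-10-04',
             '1995-10-05', '1995-11-07', '1995-11-08'],
    'x': ['1995-09-02'] * 3 + ['1995-10-05'] * 4,
    'c1': ['1', '0', '0', '2', '0', '9', '0'],
    'VTX1': [1, 0, 0, 1, 0, 0, 0],
})

# One c1 value per user, taken from the row where VTX1 == 1.
d = df.loc[df['VTX1'].eq(1)].set_index('user')['c1']

# Assumption (from the comments): exactly one VTX1 == 1 row per user.
assert d.index.is_unique, "expected a single VTX1 == 1 row per user"

# Replace c1 only on rows where date equals x; other rows keep their value.
# 'result' is a hypothetical name so the original column stays untouched.
result = df['c1'].mask(df['date'].eq(df['x']), df['user'].map(d))

print(result.tolist())  # ['1', '1', '0', '2', '2', '9', '0']
```

Only indices 1 and 4 change, since those are the rows where date equals x. The comparison works on the date strings as they are; if the columns were parsed with pd.to_datetime first, the same eq comparison should still apply.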