
In the given DataFrame, I am trying to perform a row-wise replacement: every 1 in columns A, B and C should be replaced by that row's value in Values.

Input:

import numpy as np
import pandas as pd
df = pd.DataFrame({'ID': [1,1,1,2,3,3,4,5,6,7],
                   'A': [0,1,0,1,0,0,1,0,np.nan,0],
                   'B': [0,0,0,0,1,1,0,0,0,0],
                   'C': [1,0,1,0,0,0,0,0,1,1],
                   'Values': [10, 2, 3,4,9,3,4,5,2,3]})

Expected Output:

   ID   A   B   C   Values
0   1   0.0 0   10  10
1   1   2.0 0   0   2
2   1   0.0 0   3   3
3   2   4.0 0   0   4
4   3   0.0 9   0   9
5   3   0.0 3   0   3
6   4   4.0 0   0   4
7   5   0.0 0   0   5
8   6   NaN 0   2   2
9   7   0.0 0   3   3

Note: The data is very large.

2 Answers


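Use df.where: keep every cell that is not 1, and otherwise take the row's value from Values (axis=0 aligns the Values Series on the row index):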

 df[['A','B','C']]=df[['A','B','C']].where(df[['A','B','C']].ne(1),df['Values'], axis=0)



   ID    A  B   C  Values
0   1  0.0  0  10      10
1   1  2.0  0   0       2
2   1  0.0  0   3       3
3   2  4.0  0   0       4
4   3  0.0  9   0       9
5   3  0.0  3   0       3
6   4  4.0  0   0       4
7   5  0.0  0   0       5
8   6  NaN  0   2       2
9   7  0.0  0   3       3

Or, equivalently, with df.mask, which replaces the cells where the condition (equal to 1) is True:

df[['A','B','C']]=df[['A','B','C']].mask(df[['A','B','C']].eq(1),df['Values'], axis=0)
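A self-contained sketch of the where/mask approach on the question's sample data, for anyone who wants to run it end to end. The cols list is only a local helper name introduced here; both forms should produce the same frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 2, 3, 3, 4, 5, 6, 7],
                   'A': [0, 1, 0, 1, 0, 0, 1, 0, np.nan, 0],
                   'B': [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
                   'C': [1, 0, 1, 0, 0, 0, 0, 0, 1, 1],
                   'Values': [10, 2, 3, 4, 9, 3, 4, 5, 2, 3]})

cols = ['A', 'B', 'C']  # hypothetical helper: columns in which 1 is replaced

# where: keep cells that are NOT 1, otherwise take that row's Values (axis=0 = align on rows)
via_where = df[cols].where(df[cols].ne(1), df['Values'], axis=0)

# mask: the mirror image, replace cells that ARE 1
via_mask = df[cols].mask(df[cols].eq(1), df['Values'], axis=0)

df[cols] = via_where
print(df)
```

Because the condition is only ever True for cells equal to 1, NaN and 0 are left untouched by both versions.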

Comment:

My data is really large and it is very slow.

My data is really large and it is very slow.

If we exploit the structure of your dataset (columns A, B and C contain only 1s, 0s or NaNs), you simply have to multiply each column by df['Values']: 1 × v gives v, 0 × v gives 0, and NaN × v stays NaN. This should be very fast because it is vectorized.

df['A'] = df['A']*df['Values']
df['B'] = df['B']*df['Values']
df['C'] = df['C']*df['Values']

print(df)
   ID    A  B   C  Values
0   1  0.0  0  10      10
1   1  2.0  0   0       2
2   1  0.0  0   3       3
3   2  4.0  0   0       4
4   3  0.0  9   0       9
5   3  0.0  3   0       3
6   4  4.0  0   0       4
7   5  0.0  0   0       5
8   6  NaN  0   2       2
9   7  0.0  0   3       3
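The three assignments can also be collapsed into a single DataFrame.mul call, which multiplies every selected column by the Values Series aligned on the row index (axis=0). Like the answer above, this sketch assumes A, B and C hold only 0, 1 or NaN; cols is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 2, 3, 3, 4, 5, 6, 7],
                   'A': [0, 1, 0, 1, 0, 0, 1, 0, np.nan, 0],
                   'B': [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
                   'C': [1, 0, 1, 0, 0, 0, 0, 0, 1, 1],
                   'Values': [10, 2, 3, 4, 9, 3, 4, 5, 2, 3]})

cols = ['A', 'B', 'C']  # hypothetical helper: the 0/1 indicator columns

# Multiply each column by Values row by row: 1 -> Values, 0 -> 0, NaN stays NaN
df[cols] = df[cols].mul(df['Values'], axis=0)
print(df)
```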

If you want to check explicitly that the values of A, B and C are 1 (for example, because those columns could contain values other than NaNs and 0s), you can use this:

df[['A','B','C']] = (df[['A','B','C']] == 1)*df[['Values']].values

This replaces columns A, B and C in the original data, but it also turns NaNs into 0 (NaN == 1 is False), and likewise any other value that is not 1.
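If you need the explicit == 1 check but want to keep NaNs (and any other non-1 values) unchanged, a NumPy-level version of the where idea is one option. It is a sketch, not a benchmarked recommendation: the assumption is that working on plain arrays avoids some pandas alignment overhead on very large frames, which is worth measuring on your own data. Note that all three columns come back as float64.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 2, 3, 3, 4, 5, 6, 7],
                   'A': [0, 1, 0, 1, 0, 0, 1, 0, np.nan, 0],
                   'B': [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
                   'C': [1, 0, 1, 0, 0, 0, 0, 0, 1, 1],
                   'Values': [10, 2, 3, 4, 9, 3, 4, 5, 2, 3]})

cols = ['A', 'B', 'C']                    # hypothetical helper name
arr = df[cols].to_numpy(dtype=float)      # 2-D float array; NaN stays NaN
vals = df['Values'].to_numpy()[:, None]   # column vector, broadcasts across A, B, C

# Where a cell equals 1, take that row's Values; otherwise keep the cell as-is
df[cols] = np.where(arr == 1, vals, arr)
print(df)
```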
