1

Based on dataframe column del value, looked into other columns col_0-14 and then set value as 100 for that row (not to update nan value).

Dataframe look like:

id  val_1   del col_1   col_2   col_3   col_4   col_5   col_6   col_7   col_8   col_9   col_10  col_11  col_12  col_13  col14
1   13      0   0   0   0   0   0   0   0   0   0   0   0   0   
2   11      0   0   0   0   0   0   0   0   0   0   0           
3   8   1   0   0   0   0   0   7   7   8                       
4   6   1   500 1000    1500    2000    2500    3000

As del on 3rd, 4th row, code should replace values to 100 up till val_1 value.

3   8   1   100 100 100 100 100 100 100 100 
4   6   1   100 100 100 100 100 100

I tried:

df.loc[df['del'] == 1, df.columns.str.startswith('col')] = 100

It will replace all row values (col1-14) to 100. Is there a way, I can control it to only till val_1 value and rest of col remains as nan values. OR

After above code, I could use val_1 value to update row again with nan values with using loop logic.

def cut_data(row):
    for i in range(1, 15):
        if i > row['val_1']:
            row['col_' + str(i)] = np.NaN
            
    return row

df = df.apply(cut_data, axis=1)

Please suggest any alternative to loop logic i.e without loop.

DDL to generate DataFrame:

df1 = pd.DataFrame({'id': [1, 2, 3, 4],
                   'val_1': [13, 11, 8, 6],
                   'del' : [np.nan,np.nan,1,1], 
                   'col1': [0, 0, 0, 500],
                   'col2': [0, 0, 0, 1000],
                   'col3': [0, 0, 0, 1500],
                   'col4': [0, 0, 0, 2000],
                   'col5' : [0, 0, 0, 2500],
                   'col6': [0, 0, 7, 3000],
                   'col7': [0, 0, 7,np.nan ],
                   'col8': [0, 0, 7, np.nan],
                   'col9': [0, 0, np.nan, np.nan],
                   'col10': [0, 0, np.nan, np.nan],
                   'col11': [0, 0, np.nan, np.nan],
                   'col12': [0, np.nan, np.nan, np.nan],
                   'col13': [0, np.nan, np.nan, np.nan],
                   'col14': [ np.nan, np.nan, np.nan, np.nan]})

Thanks!

1 Answer 1

1

Solution by last EDIT:

#extract columns names starting by col
c = df.columns[df.columns.str.startswith('col')]

#created 2d mask by compare 1d arrange array by length of c for 2d mask
colsarray = np.arange(len(c))
max1 = df['val_1'].to_numpy()[:, None]

print (max1)
[[13]
 [11]
 [ 8]
 [ 6]]

mask1 = colsarray < max1

print (mask1)
[[ True  True  True  True  True  True  True  True  True  True  True  True
   True False]
 [ True  True  True  True  True  True  True  True  True  True  True False
  False False]
 [ True  True  True  True  True  True  True  True False False False False
  False False]
 [ True  True  True  True  True  True False False False False False False
  False False]]

mask2 = df['del'] == 1
print (mask2)
0    False
1    False
2     True
3     True
Name: del, dtype: bool

#chain by mask2 by & for bitwise AND - first 2 rows are set to False
mask = mask1 & mask2.to_numpy()[:, None]
print (mask)
[[False False False False False False False False False False False False
  False False]
 [False False False False False False False False False False False False
  False False]
 [ True  True  True  True  True  True  True  True False False False False
  False False]
 [ True  True  True  True  True  True False False False False False False
  False False]]

#repalce only filtered rows by columns c
df[c] = np.where(mask, 100, df[c])

print (df)
   id  val_1  del   col1   col2   col3   col4   col5   col6   col7   col8  \
0   1     13  NaN    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0   
1   2     11  NaN    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0   
2   3      8  1.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0   
3   4      6  1.0  100.0  100.0  100.0  100.0  100.0  100.0    NaN    NaN   

   col9  col10  col11  col12  col13  col14  
0   0.0    0.0    0.0    0.0    0.0    NaN  
1   0.0    0.0    0.0    NaN    NaN    NaN  
2   NaN    NaN    NaN    NaN    NaN    NaN  
3   NaN    NaN    NaN    NaN    NaN    NaN  
Sign up to request clarification or add additional context in comments.

5 Comments

Thanks jezrael! it worked when del == 1 exist on 1 row. What if del == 1 exist in multiple rows? Also Can you suggest any alternative to above mentioned loop ?
@Anku - Tested and working nice if multiple 1 values in del column.
Apologies @jezrael, your answer is correct! But I missed to point out that value 6 is based on column val_1 value. For each row, its has different value and if I have multiple del on rows like your example, row1 and row3, col1-14 values should get updated up till val_1 value. (Meanwhile I am editing my question for the same).
@Anku - I add numpy solution, complicated like loops, but really fast, because vectorized.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.