Python Pandas set value for specific rows matching condition

Question

Based on dataframe column del value, looked into other columns col_0-14 and then set value as 100 for that row (not to update nan value).

Dataframe look like:

id  val_1   del col_1   col_2   col_3   col_4   col_5   col_6   col_7   col_8   col_9   col_10  col_11  col_12  col_13  col14
1   13      0   0   0   0   0   0   0   0   0   0   0   0   0   
2   11      0   0   0   0   0   0   0   0   0   0   0           
3   8   1   0   0   0   0   0   7   7   8                       
4   6   1   500 1000    1500    2000    2500    3000

As del on 3rd, 4th row, code should replace values to 100 up till val_1 value.

3   8   1   100 100 100 100 100 100 100 100 
4   6   1   100 100 100 100 100 100

I tried:

df.loc[df['del'] == 1, df.columns.str.startswith('col')] = 100

It will replace all row values (col1-14) to 100. Is there a way, I can control it to only till val_1 value and rest of col remains as nan values. OR

After above code, I could use val_1 value to update row again with nan values with using loop logic.

def cut_data(row):
    for i in range(1, 15):
        if i > row['val_1']:
            row['col_' + str(i)] = np.NaN
            
    return row

df = df.apply(cut_data, axis=1)

Please suggest any alternative to loop logic i.e without loop.

DDL to generate DataFrame:

df1 = pd.DataFrame({'id': [1, 2, 3, 4],
                   'val_1': [13, 11, 8, 6],
                   'del' : [np.nan,np.nan,1,1], 
                   'col1': [0, 0, 0, 500],
                   'col2': [0, 0, 0, 1000],
                   'col3': [0, 0, 0, 1500],
                   'col4': [0, 0, 0, 2000],
                   'col5' : [0, 0, 0, 2500],
                   'col6': [0, 0, 7, 3000],
                   'col7': [0, 0, 7,np.nan ],
                   'col8': [0, 0, 7, np.nan],
                   'col9': [0, 0, np.nan, np.nan],
                   'col10': [0, 0, np.nan, np.nan],
                   'col11': [0, 0, np.nan, np.nan],
                   'col12': [0, np.nan, np.nan, np.nan],
                   'col13': [0, np.nan, np.nan, np.nan],
                   'col14': [ np.nan, np.nan, np.nan, np.nan]})

Thanks!

jezrael · Accepted Answer · 2020-07-10 11:37:07Z

1

Solution by last EDIT:

#extract columns names starting by col
c = df.columns[df.columns.str.startswith('col')]

#created 2d mask by compare 1d arrange array by length of c for 2d mask
colsarray = np.arange(len(c))
max1 = df['val_1'].to_numpy()[:, None]

print (max1)
[[13]
 [11]
 [ 8]
 [ 6]]

mask1 = colsarray < max1

print (mask1)
[[ True  True  True  True  True  True  True  True  True  True  True  True
   True False]
 [ True  True  True  True  True  True  True  True  True  True  True False
  False False]
 [ True  True  True  True  True  True  True  True False False False False
  False False]
 [ True  True  True  True  True  True False False False False False False
  False False]]

mask2 = df['del'] == 1
print (mask2)
0    False
1    False
2     True
3     True
Name: del, dtype: bool

#chain by mask2 by & for bitwise AND - first 2 rows are set to False
mask = mask1 & mask2.to_numpy()[:, None]
print (mask)
[[False False False False False False False False False False False False
  False False]
 [False False False False False False False False False False False False
  False False]
 [ True  True  True  True  True  True  True  True False False False False
  False False]
 [ True  True  True  True  True  True False False False False False False
  False False]]

#repalce only filtered rows by columns c
df[c] = np.where(mask, 100, df[c])

print (df)
   id  val_1  del   col1   col2   col3   col4   col5   col6   col7   col8  \
0   1     13  NaN    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0   
1   2     11  NaN    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0   
2   3      8  1.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0   
3   4      6  1.0  100.0  100.0  100.0  100.0  100.0  100.0    NaN    NaN   

   col9  col10  col11  col12  col13  col14  
0   0.0    0.0    0.0    0.0    0.0    NaN  
1   0.0    0.0    0.0    NaN    NaN    NaN  
2   NaN    NaN    NaN    NaN    NaN    NaN  
3   NaN    NaN    NaN    NaN    NaN    NaN

edited Jul 10, 2020 at 11:37

answered Jul 10, 2020 at 10:45

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

Anku Over a year ago

Thanks jezrael! it worked when del == 1 exist on 1 row. What if del == 1 exist in multiple rows? Also Can you suggest any alternative to above mentioned loop ?

jezrael Over a year ago

@Anku - Tested and working nice if multiple 1 values in del column.

Anku Over a year ago

Apologies @jezrael, your answer is correct! But I missed to point out that value 6 is based on column val_1 value. For each row, its has different value and if I have multiple del on rows like your example, row1 and row3, col1-14 values should get updated up till val_1 value. (Meanwhile I am editing my question for the same).

jezrael Over a year ago

@Anku - I add numpy solution, complicated like loops, but really fast, because vectorized.

Anku Over a year ago

Let us continue this discussion in chat.

Collectives™ on Stack Overflow

Python Pandas set value for specific rows matching condition

1 Answer 1

5 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

5 Comments

Your Answer

Sign up or log in

Post as a guest

Related