Replace values in column with NaN based on other values in same column

Question

This is a particular case of the question in the title.

I have the following DataFrame:

import pandas as pd

values = [[100,54,25,26,32,33,15,2],[1,2,3,4,5,6,7,8]]
columns = ["numbers", "order"]
zipped = dict(zip(columns,values))
df = pd.DataFrame(zipped)
print(df)

   numbers  order
0      100      1
1       54      2
2       25      3
3       26      4
4       32      5
5       33      6
6       15      7
7        2      8

Imagine that the DataFrame is sorted in ascending order by the column order. In the column numbers, I want to replace a value with NaN if a bigger value appears further down the rows, to achieve the following result:

   numbers  order
0      100      1
1       54      2
2      NaN      3
3      NaN      4
4      NaN      5
5       33      6
6       15      7
7        2      8

What would be the best approach to achieve this without an explicit loop?

Update: Probably a better example of the initial DataFrame and the expected results (it adds discontiguous blocks of values to be replaced):

values = [[100,54,25,26,34,32,31,33,15,2],[1,2,3,4,5,6,7,8,9,10]]

   numbers  order
0      100      1
1       54      2
2       25      3
3       26      4
4       34      5
5       32      6
6       31      7
7       33      8
8       15      9
9        2     10

Results:

   numbers  order
0    100.0      1
1     54.0      2
2      NaN      3
3      NaN      4
4     34.0      5
5      NaN      6
6      NaN      7
7     33.0      8
8     15.0      9
9      2.0     10

What is the expected output for a case like this: values = [[100,54,25,26,21,27,32,33,15,2],[1,2,3,4,5,6,7,8,9,10]]? Is everything between 25-32 still replaced with np.NaN?
– ALollz, Commented Feb 5, 2019 at 19:26

Andy Hayden · Accepted Answer · 2019-02-05 19:30:36Z

6

I read this slightly differently: if there are bigger numbers below a value, then the reversed cummax at that row is higher than the value itself:

In [11]: df.at[3, 'numbers'] = 24  # more illustrative example 

In [12]: df.numbers[::-1].cummax()[::-1]
Out[12]:
0    100
1     54
2     33
3     33
4     33
5     33
6     15
7      2
Name: numbers, dtype: int64

In [13]: df.loc[df.numbers < df.numbers[::-1].cummax()[::-1], 'numbers'] = np.nan

In [14]: df
Out[14]:
   numbers  order
0    100.0      1
1     54.0      2
2      NaN      3
3      NaN      4
4      NaN      5
5     33.0      6
6     15.0      7
7      2.0      8
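
For reference, here is a self-contained sketch of the same reversed-cummax idea applied to the updated example from the question (the one with discontiguous blocks). It assumes pandas and NumPy are imported as pd and np; the helper name mask_smaller_than_later is hypothetical and only used for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "numbers": [100, 54, 25, 26, 34, 32, 31, 33, 15, 2],
    "order": list(range(1, 11)),
})

def mask_smaller_than_later(s: pd.Series) -> pd.Series:
    """Return a copy of s with NaN wherever a larger value occurs further down.

    Hypothetical helper name, not part of pandas.
    """
    # Reversed cumulative max: for each row, the max of that row and all rows below it.
    later_max = s[::-1].cummax()[::-1]
    # Series.mask replaces the values where the condition is True with NaN.
    return s.mask(s < later_max)

df["numbers"] = mask_smaller_than_later(df["numbers"])
print(df)
```

Using Series.mask instead of assigning through df.loc is a design choice of this sketch: it returns a new float column rather than modifying the integer column in place.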


mongolio Over a year ago

That's exactly what I was looking for. While the previous answers also work for the dataset I provided initially, they will not handle cases with discontiguous segments of smaller values like values = [[100,54,25,26,34,32,31,33,15,2],[1,2,3,4,5,6,7,8,9,10]], but your solution does. Thanks!

busybear · Answer · 2019-02-05 19:32:35Z

1

You can loop through the values of your column and check whether each one is greater than all the elements that come after it:

import numpy as np

arr = df['numbers'].to_numpy()
# Keep x only if it is strictly greater than every element after it
df['numbers'] = [x if (x > arr[n+1:]).all() else np.nan for n, x in enumerate(arr)]
df

Output:

   numbers  order
0    100.0      1
1     54.0      2
2      NaN      3
3      NaN      4
4      NaN      5
5     33.0      6
6     15.0      7
7      2.0      8
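
The comparison against every later slice does quadratic work on long columns. As a hedged alternative, the same strict "greater than everything after it" rule can be sketched without a Python-level loop using np.maximum.accumulate. The variable names suffix_max and later_max are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "numbers": [100, 54, 25, 26, 32, 33, 15, 2],
    "order": [1, 2, 3, 4, 5, 6, 7, 8],
})

arr = df["numbers"].to_numpy()

# Running maximum taken from the end: suffix_max[i] == arr[i:].max()
suffix_max = np.maximum.accumulate(arr[::-1])[::-1]

# Maximum of the elements strictly after position i. The last row has
# nothing after it, so -inf makes it always pass the comparison.
later_max = np.append(suffix_max[1:], -np.inf)

# Keep a value only if it is strictly greater than everything after it.
df["numbers"] = np.where(arr > later_max, arr, np.nan)
print(df)
```

Because the comparison is strict, a value that is equal to a later value is also replaced with NaN, whereas comparing with < against the reversed cummax keeps such ties.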
