3

This is a particular case of the question in the title.

I have the following dataframe:

import pandas as pd

values = [[100,54,25,26,32,33,15,2],[1,2,3,4,5,6,7,8]]
columns = ["numbers", "order"]
zipped = dict(zip(columns,values))
df = pd.DataFrame(zipped)
print(df)

   numbers  order
0      100      1
1       54      2
2       25      3
3       26      4
4       32      5
5       33      6
6       15      7
7        2      8

Imagine that the dataframe is sorted in ascending order by the order column. In the numbers column, I want to replace a value with NaN if a bigger value appears further down the rows, to achieve the following result:

   numbers  order
0      100      1
1       54      2
2      NaN      3
3      NaN      4
4      NaN      5
5       33      6
6       15      7
7        2      8

What is the best approach to achieve this without an explicit loop?

Update: Probably a better example for the initial DF and expected results (adding discontiguous blocks of values to be replaced):

values = [[100,54,25,26,34,32,31,33,15,2],[1,2,3,4,5,6,7,8,9,10]]

   numbers  order
0      100      1
1       54      2
2       25      3
3       26      4
4       34      5
5       32      6
6       31      7
7       33      8
8       15      9
9        2     10

Results:

   numbers  order
0    100.0      1
1     54.0      2
2      NaN      3
3      NaN      4
4     34.0      5
5      NaN      6
6      NaN      7
7     33.0      8
8     15.0      9
9      2.0     10
1
  • What is the expected output for a case like this: values = [[100,54,25,26,21,27,32,33,15,2],[1,2,3,4,5,6,7,8,9,10]]? Is everything between 25 and 32 still replaced with np.nan? Commented Feb 5, 2019 at 19:26

2 Answers

6

I read this slightly differently: if a bigger number appears further down, then the reversed cummax at that row is higher than the value itself:

In [11]: df.at[3, 'numbers'] = 24  # more illustrative example 

In [12]: df.numbers[::-1].cummax()[::-1]
Out[12]:
0    100
1     54
2     33
3     33
4     33
5     33
6     15
7      2
Name: numbers, dtype: int64

In [13]: df.loc[df.numbers < df.numbers[::-1].cummax()[::-1], 'numbers'] = np.nan

In [14]: df
Out[14]:
   numbers  order
0    100.0      1
1     54.0      2
2      NaN      3
3      NaN      4
4      NaN      5
5     33.0      6
6     15.0      7
7      2.0      8
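
As a rough check against the discontiguous example from the question's update, the same reversed-cummax idea can be wrapped in a small function. The name mask_dominated is hypothetical, and Series.where is used here in place of the .loc assignment only so that the integer column is converted to float in one step. Treat it as a sketch that assumes the rows are already in the desired order:

```python
import numpy as np
import pandas as pd


def mask_dominated(s: pd.Series) -> pd.Series:
    """Return s with NaN wherever a strictly larger value occurs later.

    Hypothetical helper; assumes s is already in the desired row order.
    """
    # Reverse, take the running max, reverse back:
    # suffix_max[i] is the max of s[i] and everything after it.
    suffix_max = s[::-1].cummax()[::-1]
    # Keep values that equal their suffix max; everything else becomes NaN.
    return s.where(s >= suffix_max)


df = pd.DataFrame({
    "numbers": [100, 54, 25, 26, 34, 32, 31, 33, 15, 2],
    "order": list(range(1, 11)),
})
df["numbers"] = mask_dominated(df["numbers"])
print(df)
```

With this input, rows 2, 3, 5 and 6 end up as NaN, which matches the expected result given in the update.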

1 Comment

That's exactly what I was looking for. While the previous answers also work for the dataset I provided initially, they will not handle cases with discontiguous segments of smaller values like values = [[100,54,25,26,34,32,31,33,15,2],[1,2,3,4,5,6,7,8,9,10]], but your solution does. Thanks!
1

You can loop over the values of the column and keep each one only if no later value is bigger:

arr = df['numbers'].values
# keep x only when it is >= every value after it; otherwise replace it with NaN
df['numbers'] = [x if all(x >= arr[n+1:]) else np.nan for n, x in enumerate(arr)]
df

Output:

   numbers  order
0    100.0      1
1     54.0      2
2      NaN      3
3      NaN      4
4      NaN      5
5     33.0      6
6     15.0      7
7      2.0      8
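
The comprehension rescans the rest of the array for every row, so the work grows quadratically with the number of rows. If that matters, a possible NumPy-only variant computes the running maximum from the end once with np.maximum.accumulate. This is a sketch of the same idea rather than a drop-in replacement, and it assumes a value equal to a later one should be kept:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "numbers": [100, 54, 25, 26, 32, 33, 15, 2],
    "order": [1, 2, 3, 4, 5, 6, 7, 8],
})

arr = df["numbers"].to_numpy()
# Running maximum taken from the end: suffix_max[i] == arr[i:].max()
suffix_max = np.maximum.accumulate(arr[::-1])[::-1]
# Keep a value only if nothing after it is strictly larger
df["numbers"] = np.where(arr >= suffix_max, arr, np.nan)
print(df)
```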
