
How do I find the first of several minimum values in a dataset? I eventually want to find, moving sequentially through the data, the values that are at least 2 greater than that minimum.

For example,

import pandas as pd
import numpy as np
df = pd.DataFrame({'ID': [1,1,1,1,1,1,1], 'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2]})

I would like to identify df['value'][0], or simply 0.6, as the first minimum in this array, and then identify df['value'][4], or 2.8, as the first value at least 2 greater than that first identified minimum (0.6).

# mark local minima: values no greater than either neighbour
df['loc_min'] = df.value[(df.value.shift(1) >= df.value) & (df.value.shift(-1) >= df.value)]
# carry each local minimum forward within its ID group
df['loc_min'] = df.groupby(['ID'], sort=False)['loc_min'].ffill()
# flag values at least 2 above the most recent local minimum
df['condition'] = (df['value'] >= df['loc_min'] + 2)

This works for other datasets, but not when the minimum comes first: for the first row, df.value.shift(1) is NaN, the comparison evaluates to False, and that row is never marked as a local minimum.

The ideal output would be:

    ID  value loc_min condition
0   1   0.6   nan     False
1   1   1.5   0.6     False
2   1   1.6   0.6     False
3   1   1.2   0.6     False
4   1   2.8   0.6     True
5   1   0.3   0.3     False
6   1   0.2   0.2     False

As suggested in a comment, a loop might be a better way to go about this; a rough sketch follows.
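
For reference, a rough loop-based sketch of that idea (my own illustration, not the original attempt: delta, loc_min, and condition are illustrative names, and the first row takes its own value as the running minimum rather than NaN):

# Rough sketch: track the running minimum within each ID and flag values at least
# `delta` above it. Assumes the df defined above; np is already imported.
delta = 2
loc_min, condition = {}, {}
for _, group in df.groupby('ID', sort=False):
    running_min = np.inf
    for idx, v in group['value'].items():
        running_min = min(running_min, v)            # minimum seen so far in this ID
        loc_min[idx] = running_min
        condition[idx] = v >= running_min + delta    # at least `delta` above that minimum
df['loc_min'] = pd.Series(loc_min)
df['condition'] = pd.Series(condition)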

  • Are you asking how to find local minima in a 1D array? If so, is one of the answers to this question (or one of the others linked from there) what you're looking for? Commented Aug 19, 2018 at 23:44
  • Please add in your expected output to make it clear what it is you want. Commented Aug 19, 2018 at 23:45
  • I should point out that in general, in Numpy, you don't usually find "the first of…", you find "all of…" (maybe even in parallel), and then just use the first one or vectorize (or sometimes iterate over) all of them. So, if short-circuiting at the first one is important for correctness, or is expected to give you more performance gain than vectorizing does, you may need to loop. Commented Aug 19, 2018 at 23:45
  • Can you explain why the first value is NaN? Also, what if the array is [1.5, 0.6, ...], where 0.6 is the second element? Commented Aug 19, 2018 at 23:55
  • @abarnert thank you for your input; I've updated my question accordingly. Unfortunately, the working data is not a 1D array but a large dataset. Commented Aug 19, 2018 at 23:57

1 Answer


Seems like you need cummin and a simple comparison:

df['cummin_'] = df.groupby('ID').value.cummin()  # running minimum within each ID
df['condition'] = df.value >= df.cummin_ + 2     # at least 2 above that running minimum


    ID  value   cummin_ condition
0   1   0.6     0.6     False
1   1   1.5     0.6     False
2   1   1.6     0.6     False
3   1   1.2     0.6     False
4   1   2.8     0.6     True
5   1   0.3     0.3     False
6   1   0.2     0.2     False

Another option is to use expanding. Take, for example,

df = pd.DataFrame({'ID': [1,1,1,1,1,1,1,2,2], 'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2,0.4,2.9]})

Then

df.groupby('ID').value.expanding(2).min()

    ID   
1   0    NaN
    1    0.6
    2    0.6
    3    0.6
    4    0.6
    5    0.3
    6    0.2
2   7    NaN
    8    0.4

The expanding(2) call yields NaN for the first row of each group (it requires at least two observations), while cummin includes the first value. It's just a matter of how you want the results to be interpreted.
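
If you want expanding to include the first value as well, a minimal sketch (assuming the two-ID frame above) is to pass min_periods=1 and align the result back to the original index:

# Minimal sketch: expanding(1), i.e. min_periods=1, includes the first value, like cummin.
exp_min = (
    df.groupby('ID').value
      .expanding(1).min()
      .reset_index(level=0, drop=True)   # drop the ID level so it aligns with df's index
)
df['condition'] = df.value >= exp_min + 2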


1 Comment

The cummin function was precisely what I needed, thank you.
