
How do I find the first of several minimum values in a dataset? I eventually want to find, moving sequentially through the data, the values that are at least 2 greater than that minimum.

For example,

import pandas as pd
import numpy as np
df = pd.DataFrame({'ID': [1,1,1,1,1,1,1], 'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2]})

I would like to identify df['value'][0], or simply 0.6, as the first minimum in this array, and then identify df['value'][4], or 2.8, as the first value at least 2 greater than that first identified minimum (0.6).

# mark local minima: values no greater than either neighbour
df['loc_min'] = df.value[(df.value.shift(1) >= df.value) & (df.value.shift(-1) >= df.value)]
# carry each local minimum forward within its ID group
df['loc_min'] = df.groupby(['ID'], sort=False)['loc_min'].ffill()
# flag values at least 2 above the most recent local minimum
df['condition'] = (df['value'] >= df['loc_min'] + 2)

This works for other datasets, but not when the minimum comes first: for the first row, df.value.shift(1) is NaN, the comparison evaluates to False, and that row is never marked as a local minimum.

The ideal output would be:

    ID  value loc_min condition
0   1   0.6   nan     False
1   1   1.5   0.6     False
2   1   1.6   0.6     False
3   1   1.2   0.6     False
4   1   2.8   0.6     True
5   1   0.3   0.3     False
6   1   0.2   0.2     False

As suggested in a comment, a loop might be a better way to go about this; a rough sketch follows.
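
For reference, a rough loop-based sketch of that idea (my own illustration, not the original attempt: delta, loc_min, and condition are illustrative names, and the first row takes its own value as the running minimum rather than NaN):

# Rough sketch: track the running minimum within each ID and flag values at least
# `delta` above it. Assumes the df defined above; np is already imported.
delta = 2
loc_min, condition = {}, {}
for _, group in df.groupby('ID', sort=False):
    running_min = np.inf
    for idx, v in group['value'].items():
        running_min = min(running_min, v)            # minimum seen so far in this ID
        loc_min[idx] = running_min
        condition[idx] = v >= running_min + delta    # at least `delta` above that minimum
df['loc_min'] = pd.Series(loc_min)
df['condition'] = pd.Series(condition)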

  • Are you asking how to find local minima in a 1D array? If so, is one of the answers to this question (or one of the others linked from there) what you're looking for? Commented Aug 19, 2018 at 23:44
  • Please add in your expected output to make it clear what it is you want. Commented Aug 19, 2018 at 23:45
  • I should point out that in general, in Numpy, you don't usually find "the first of…", you find "all of…" (maybe even in parallel), and then just use the first one or vectorize (or sometimes iterate over) all of them. So, if short-circuiting at the first one is important for correctness, or is expected to give you more performance gain than vectorizing does, you may need to loop. Commented Aug 19, 2018 at 23:45
  • Can you explain why the first value is NaN? Also, what if the array is [1.5, 0.6, ...], where 0.6 is the second element? Commented Aug 19, 2018 at 23:55
  • @abarnert thank you for your input; I've updated my question accordingly. Unfortunately, the working data is not a 1D array but a large dataset. Commented Aug 19, 2018 at 23:57

1 Answer


Seems like you need cummin and a simple comparison:

df['cummin_'] = df.groupby('ID').value.cummin()  # running minimum within each ID
df['condition'] = df.value >= df.cummin_ + 2     # at least 2 above that running minimum


    ID  value   cummin_ condition
0   1   0.6     0.6     False
1   1   1.5     0.6     False
2   1   1.6     0.6     False
3   1   1.2     0.6     False
4   1   2.8     0.6     True
5   1   0.3     0.3     False
6   1   0.2     0.2     False

Another option is to use expanding. Take, for example,

df = pd.DataFrame({'ID': [1,1,1,1,1,1,1,2,2], 'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2,0.4,2.9]})

Then

df.groupby('ID').value.expanding(2).min()

    ID   
1   0    NaN
    1    0.6
    2    0.6
    3    0.6
    4    0.6
    5    0.3
    6    0.2
2   7    NaN
    8    0.4

The expanding(2) call yields NaN for the first row of each group (it requires at least two observations), while cummin includes the first value. It's just a matter of how you want the results to be interpreted.
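
If you want expanding to include the first value as well, a minimal sketch (assuming the two-ID frame above) is to pass min_periods=1 and align the result back to the original index:

# Minimal sketch: expanding(1), i.e. min_periods=1, includes the first value, like cummin.
exp_min = (
    df.groupby('ID').value
      .expanding(1).min()
      .reset_index(level=0, drop=True)   # drop the ID level so it aligns with df's index
)
df['condition'] = df.value >= exp_min + 2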


1 Comment

The cummin function was precisely what I needed, thank you.
