Pandas dataframe first instance of value in column

Question

I have df:

                     Voltage
01-02-2017 00:00       13.1
01-02-2017 00:01       13.2
01-02-2017 00:02       13.3
01-02-2017 00:03       14.1
01-02-2017 00:04       14.3
01-02-2017 00:04       13.5

I would like the time (hh:mm) of the first instance of when the value in the Voltage column >=14.0. There should only be one time value in column 'Time of Full Charge'.

                     Voltage   Time of Full Charge
01-02-2017 00:00       13.1
01-02-2017 00:01       13.2
01-02-2017 00:02       13.3
01-02-2017 00:03       14.1         00:03
01-02-2017 00:04       14.3
01-02-2017 00:04       13.5

I am trying something along these lines, but cannot figure it out:

df.index = pd.to_datetime(df.index)
df.['Time of Full Charge'] = np.where(df.['Voltage'] >= 14.0), (df.index.hour:df.index.minute))

jezrael · Accepted Answer · 2017-04-26 13:54:00Z

11

Use idxmax to get the index label of the first row that meets the condition. This only works as intended if the index is unique:

mask = df['Voltage'] >= 14.0
idx = mask.idxmax()
df.loc[idx, 'Time of Full Charge'] = idx.strftime('%H:%M')
print (df)
                     Voltage Time of Full Charge
2017-01-02 00:00:00     13.1                 NaN
2017-01-02 00:01:00     13.2                 NaN
2017-01-02 00:02:00     13.3                 NaN
2017-01-02 00:03:00     14.1               00:03
2017-01-02 00:04:00     14.3                 NaN
2017-01-02 00:04:00     13.5                 NaN
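
The comparison produces a boolean Series, and idxmax returns the label of its first maximum, which is the first True. Note that on a mask with no True at all, idxmax still returns the first label. Below is a minimal sketch of the same idea with a guard for that case. The sample frame is rebuilt by hand with a unique index (an assumption), and the mask.any() check and the name first_hit are additions, not part of the answer above:

```python
import pandas as pd

# Sample data rebuilt from the question; a unique index is assumed here
index = pd.to_datetime([
    "2017-01-02 00:00", "2017-01-02 00:01", "2017-01-02 00:02",
    "2017-01-02 00:03", "2017-01-02 00:04", "2017-01-02 00:05",
])
df = pd.DataFrame({"Voltage": [13.1, 13.2, 13.3, 14.1, 14.3, 13.5]}, index=index)

mask = df["Voltage"] >= 14.0   # boolean Series, nothing is filtered out

# idxmax on an all-False mask would return the first label,
# so check that at least one row matches before using it
if mask.any():
    first_hit = mask.idxmax()  # label of the first True
    df.loc[first_hit, "Time of Full Charge"] = first_hit.strftime("%H:%M")

print(df)
```

If no reading reaches the threshold, the column is simply not created, which may or may not be what you want.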

Or:

idx = (df['Voltage'] >= 14.0).idxmax()
df['Time of Full Charge'] = np.where(df.index == idx, idx.strftime('%H:%M'), '')
print (df)
                     Voltage Time of Full Charge
2017-01-02 00:00:00     13.1                    
2017-01-02 00:01:00     13.2                    
2017-01-02 00:02:00     13.3                    
2017-01-02 00:03:00     14.1               00:03
2017-01-02 00:04:00     14.3                    
2017-01-02 00:04:00     13.5

For a non-unique index, one option is to use a MultiIndex:

df.index = [np.arange(len(df.index)), df.index]

idx = (df['Voltage'] >= 14.0).idxmax()
df['Time of Full Charge'] = np.where(df.index.get_level_values(0) == idx[0], 
                                     idx[1].strftime('%H:%M'),
                                     '')

df.index = df.index.droplevel(0)
print (df)
                     Voltage Time of Full Charge
2017-01-02 00:00:00     13.1                    
2017-01-02 00:01:00     13.2                    
2017-01-02 00:02:00     13.3                    
2017-01-02 00:03:00     14.1               00:03
2017-01-02 00:04:00     14.3                    
2017-01-02 00:04:00     13.5
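
As a different way to handle a non-unique index (not the MultiIndex approach above), one could work with integer positions instead of labels. This is only a sketch: the names times and pos are illustrative, and the frame is rebuilt from the question, including the duplicated 00:04 timestamp:

```python
import numpy as np
import pandas as pd

# Sample data from the question, including the duplicated 00:04 timestamp
index = pd.to_datetime([
    "2017-01-02 00:00", "2017-01-02 00:01", "2017-01-02 00:02",
    "2017-01-02 00:03", "2017-01-02 00:04", "2017-01-02 00:04",
])
df = pd.DataFrame({"Voltage": [13.1, 13.2, 13.3, 14.1, 14.3, 13.5]}, index=index)

mask = (df["Voltage"] >= 14.0).to_numpy()

times = np.full(len(df), "", dtype=object)   # empty string everywhere by default
if mask.any():
    pos = int(mask.argmax())                 # integer position of the first True
    times[pos] = df.index[pos].strftime("%H:%M")

df["Time of Full Charge"] = times
print(df)
```

Because the assignment is positional, duplicate labels in the index do not matter, and the index never has to be modified and restored.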

edited Apr 26, 2017 at 13:54

answered Apr 26, 2017 at 13:31

jezrael


8 Comments

warrenfitzhenry Over a year ago

Thanks @jezrael. I only need the first instance of when that column reaches 14 or above (there should only be one value in the new column). Is this possible?

warrenfitzhenry Over a year ago

Yes, index is essentially a 24 hour day, so will be unique. thanks!

CuriousLearner Over a year ago

Shouldn't this be idxmin() instead of idxmax()? Because when you say that df['Voltage'] >= 14, it means only those rows with values greater than or equal to 14 will be present. Now, amongst those rows, we just need the minimum one. Kindly let me know where am I wrong.

jezrael Over a year ago

@ArchanJoshi - It is a bit different, because this works on a boolean mask of Trues and Falses. To match the first True you need idxmax, because True is treated as 1 and False as 0, so idxmax returns the index of the first 1 (True). And there is no filtering: (df['Voltage'] >= 14.0) does not filter rows.

CuriousLearner Over a year ago

Source doesn't matter. Just pick any good one and start. You'll be done in no time. It is not at all a difficult language.

MaxU - stand with Ukraine · Accepted Answer · 2017-04-26 13:50:07Z

2

You can use numpy.searchsorted() if the Voltage column is sorted in ascending order:

In [260]: df.index[np.searchsorted(df.Voltage, 14)]
Out[260]: DatetimeIndex(['2017-01-02 00:03:00'], dtype='datetime64[ns]', freq=None)
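
A small self-contained sketch of the searchsorted idea follows. It uses hypothetical, strictly increasing readings, because the sample in the question ends with 13.5 and is therefore not sorted. The bounds check and the names pos and first_time are additions for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical, monotonically increasing readings (searchsorted needs sorted input)
index = pd.to_datetime([
    "2017-01-02 00:00", "2017-01-02 00:01", "2017-01-02 00:02",
    "2017-01-02 00:03", "2017-01-02 00:04",
])
df = pd.DataFrame({"Voltage": [13.1, 13.2, 13.3, 14.1, 14.3]}, index=index)

voltages = df["Voltage"].to_numpy()

# side="left" gives the position of the first element >= 14.0
pos = int(np.searchsorted(voltages, 14.0, side="left"))

# pos == len(df) means every reading is below the threshold
first_time = df.index[pos].strftime("%H:%M") if pos < len(df) else None
print(first_time)
```

On unsorted data the binary search can land on the wrong row without raising any error, so the boolean-mask approach is the safer choice unless the column is known to be sorted.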

answered Apr 26, 2017 at 13:50

MaxU - stand with Ukraine
