Conditional counting in pandas df

Question

I have a dataframe of stock prices:

df = pd.DataFrame([100, 101, 99, 100, 105, 104, 106], columns=['P'])

I would like to create a counter column that increments whenever the current price is higher than the previous row's price. However, if the price has dropped, the counter should only increment again once the earlier high is exceeded (like a watermark). Below is the desired column:

df['counter'] = [np.nan, 1, 1, 1, 2, 2, 3]

The second row's price is 101, which exceeds 100, so the counter is 1. The price then drops to 99 and comes back to 100, but the counter stays at 1 because we have not exceeded the 101 price (the watermark). Once we exceed 101 at index 4, with a price of 105, the counter goes to 2. The price then drops to 104, so we stay at 2, and when it rises to 106 the counter increases to 3.
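For reference, the rule described above can be written as a plain loop. This is only a minimal sketch of the stated behaviour, not a vectorized solution; the function name `watermark_counter` is hypothetical, and it assumes the first row should be NaN as in the desired column.

```python
import numpy as np
import pandas as pd


def watermark_counter(prices):
    """Count how many times the highest price seen so far has been exceeded."""
    counts = []
    high = None  # highest price seen so far (the watermark)
    count = 0
    for price in prices:
        if high is None:
            # First row: nothing to compare against, so record NaN.
            high = price
            counts.append(np.nan)
            continue
        if price > high:
            # Strictly above the watermark: raise it and count the event.
            high = price
            count += 1
        counts.append(count)
    return counts


df = pd.DataFrame([100, 101, 99, 100, 105, 104, 106], columns=["P"])
df["counter"] = watermark_counter(df["P"])
print(df)
```

The loop makes the intended semantics explicit (a strict `>` comparison against the running high), which is handy as a baseline when checking a vectorized version.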

Ben G · Accepted Answer · 2022-01-04 01:18:02Z

3

Algorithm:

For each row, find the maximum value observed so far (inclusive of the current row).
Find what the maximum observed value was as of the preceding row.
Each time these two values differ, we know that a new watermark has been hit in the current row.

Calculate the cumulative sum of the number of times a new watermark has been hit.

 df["current_observed_max"] = df["P"].cummax()
 df["previous_observed_max"] = df["current_observed_max"].shift(1)
 # The first row is compared against NaN, so it is always flagged as a new watermark.
 df["is_new_watermark"] = (df["current_observed_max"] != df["previous_observed_max"]).astype(int)
 df["counter"] = df["is_new_watermark"].cumsum()

With this, the first observed number always counts as a new watermark, so the counter starts at 1. Subtract 1 if you would like it to start at 0 instead.
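As a hedged sketch of one way to handle the first row, assuming the target is the question's `[NaN, 1, 1, 1, 2, 2, 3]` column: subtract 1 from the cumulative sum and then blank out the first entry. The intermediate variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([100, 101, 99, 100, 105, 104, 106], columns=["P"])

running_max = df["P"].cummax()
# True wherever the running maximum changed relative to the previous row.
is_new_watermark = running_max != running_max.shift(1)

# Subtract 1 to discount the first row, then convert to float so NaN fits.
counter = (is_new_watermark.cumsum() - 1).astype(float)
counter.iloc[0] = np.nan  # assumption: the first row should be NaN, as in the question

df["counter"] = counter
print(df)
```

Converting to float is needed because a plain integer column cannot hold NaN.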

edited Jan 4, 2022 at 1:18

answered Jan 4, 2022 at 1:15


1 Comment

Niccola Tartaglia Over a year ago

Very nice, that works perfectly. Yeah, I can just subtract 1, that works.

mozway · Accepted Answer · 2022-01-04 08:19:56Z

3

A very simple and efficient method is to combine pandas.factorize and cummax:

df['counter'] = pd.factorize(df['P'].cummax())[0]

Output:

     P  counter
0  100        0
1  101        1
2   99        1
3  100        1
4  105        2
5  104        2
6  106        3
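To see why this works, it may help to look at the two intermediate results separately. The sketch below only unpacks the one-liner above; the variable names `running_max`, `codes`, and `uniques` are illustrative.

```python
import pandas as pd

df = pd.DataFrame([100, 101, 99, 100, 105, 104, 106], columns=["P"])

# Running maximum: the watermark as of each row.
running_max = df["P"].cummax()

# factorize labels each distinct value with an integer, in order of first appearance.
codes, uniques = pd.factorize(running_max)

print(running_max.tolist())  # [100, 101, 101, 101, 105, 105, 106]
print(list(uniques))         # [100, 101, 105, 106]
print(codes.tolist())        # [0, 1, 1, 1, 2, 2, 3]
```

Because the running maximum never decreases, each new distinct value is a new watermark, so the integer codes double as the counter.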

edited Jan 4, 2022 at 8:19

answered Jan 4, 2022 at 4:21


1 Comment

Niccola Tartaglia Over a year ago

That is very elegant, thank you for suggesting this!!!

wwnde · Accepted Answer · 2022-01-04 01:26:03Z

2

Another way: check whether each row's value equals the cumulative maximum, then apply cumsum() to create unique groups:

df['newP'] = (df['P'].cummax() == df['P']).cumsum() - 1

     P  newP
0  100     0
1  101     1
2   99     1
3  100     1
4  105     2
5  104     2
6  106     3
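One caveat worth checking against your data: the equality test also fires when a price merely ties the existing maximum, so the result can differ from the `factorize` approach when the high is repeated. The series below is made up purely to illustrate that case.

```python
import pandas as pd

# Hypothetical prices in which the high of 101 is repeated.
prices = pd.Series([100, 101, 101, 99, 102])
running_max = prices.cummax()

# Equality with the running max counts the tie at index 2 as a new group.
by_equality = (running_max == prices).cumsum() - 1

# factorize on the running max only counts strictly new highs.
by_factorize = pd.factorize(running_max)[0]

print(by_equality.tolist())   # [0, 1, 2, 2, 3]
print(by_factorize.tolist())  # [0, 1, 1, 1, 2]
```

If a tie should not increment the counter, as the question's "exceeded" wording suggests, the strict variants are the safer choice.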

answered Jan 4, 2022 at 1:26


1 Comment

Niccola Tartaglia Over a year ago

Ah, also very nice. Thank you!!
