Pandas: Increment or reset count based on another column

Question

I have a Pandas DataFrame representing a time series of scores. I want to use that score to calculate a CookiePoints column based on the following criteria:

Every time the score improves compared to the previous score, a CookiePoint is given.
Every time the score does not improve, all CookiePoints are taken away as punishment (CookiePoints is set to 0).
3 Cookiepoints can be traded in for a Cookie. Therefore, after 3 has been reached, the CookiePoints count should either be 1 (if score is higher) or 0 (if score isn't higher).

See below for an example:

Score       CookiePoints
14          0
13          0
14          1
17          2
17          0
19          1
20          2
22          3
23          1
17          0
19          1
20          2
22          3
21          0

Note that this is a minimal, reproducible example. A solution must use a Pandas DataFrame, and ideally only vectorized operations.

This is basically a dynamic cumsum (in this case a column of 1s), which I don't think can be vectorized. See stackoverflow.com/questions/54208023/… — ALollz
– ALollz, Commented Jul 6, 2019 at 14:03
Another link which i came across yesterday: stackoverflow.com/questions/56904390/… — anky
– anky, Commented Jul 6, 2019 at 14:06
@KevinWinata no. So that would also be helpful. Thanks for the links ALollz and anky_91 - I'm reading up on those now. — Joakim
– Joakim, Commented Jul 6, 2019 at 14:22

Mark Wang · Accepted Answer · 2019-07-06 18:12:42Z

5

It's certainly a tricky question, but still possible to solve within Pandas. (Update V3 solution)

Version 3 (OneLiner)

score = pd.Series([14,13,14,17,17,19,20,22,23,17,19,20,22,21])
result = score.diff().gt(0).pipe(lambda x:x.groupby((~x).cumsum()).cumsum().mod(3).replace(0,3).where(x,0).map(int))

Version 2

score = pd.Series([14,13,14,17,17,19,20,22,23,17,19,20,22,21])

mask= score.diff()>0        

result = mask.groupby((~mask).cumsum()).cumsum().mod(3).replace(0,3).where(mask,0).map(int)

Version 1

score = pd.Series([14,13,14,17,17,19,20,22,23,17,19,20,22,21])

mask= score.diff()>0        # Identify score going up

mask 

0     False
1     False
2      True
3      True
4     False
5      True
6      True
7      True
8      True
9     False
10     True
11     True
12     True
13    False
dtype: bool

# Use False Cumsum to group True values

group = (mask==False).cumsum()

group
0     1
1     2
2     2
3     2
4     3
5     3
6     3
7     3
8     3
9     4
10    4
11    4
12    4
13    5
dtype: int64

# Groupby False Cumsum
temp = mask.groupby(group).cumsum().map(int)
temp

0     0
1     0
2     1
3     2
4     0
5     1
6     2
7     3
8     4
9     0
10    1
11    2
12    3
13    0
dtype: int64

# Fix Cap at 3
# result = temp.where(temp<=3,temp.mod(3)) # This is Wrong. 

result = temp.mod(3).replace(0,3).where(mask,0)
result

0     0
1     0
2     1
3     2
4     0
5     1
6     2
7     3
8     1
9     0
10    1
11    2
12    3
13    0
dtype: int64

edited Jul 6, 2019 at 18:12

answered Jul 6, 2019 at 15:01

Mark Wang

2,7579 silver badges18 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

Mark Wang Over a year ago

There's a small issue in the last line. I'll fix it soon. (Fixed)

Umar.H Over a year ago

Great answer, I tried for a little while to produce a one liner, but I wasn't successful. The mod usage is very clever.

Umar.H Over a year ago

Not very readavle but very clever ,will be visiting this post in the future if I come across a similar problem

Mark Wang Over a year ago

@Datanovice well..that's the cost of one liner :)

Joakim Over a year ago

Thank you, that was both very helpful and very impressive.

|

Collectives™ on Stack Overflow

Pandas: Increment or reset count based on another column

1 Answer 1

6 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

6 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related