I have a Pandas DataFrame representing a time series of scores. I want to use that score to calculate a CookiePoints column based on the following criteria:

  • Every time the score improves compared to the previous score, a CookiePoint is given.
  • Every time the score does not improve, all CookiePoints are taken away as punishment (CookiePoints is set to 0).
  • 3 CookiePoints can be traded in for a Cookie. Therefore, once the count has reached 3, the next value should be either 1 (if the score is higher) or 0 (if the score isn't higher).

See below for an example:

Score       CookiePoints
14          0
13          0
14          1
17          2
17          0
19          1
20          2
22          3
23          1
17          0
19          1
20          2
22          3
21          0

Note that this is a minimal, reproducible example. A solution must use a Pandas DataFrame, and ideally only vectorized operations.

4 Comments

  • Do you have your current unvectorized implementation? Commented Jul 6, 2019 at 13:52
  • This is basically a dynamic cumsum (in this case a column of 1s), which I don't think can be vectorized. See stackoverflow.com/questions/54208023/… Commented Jul 6, 2019 at 14:03
  • Another link which I came across yesterday: stackoverflow.com/questions/56904390/… Commented Jul 6, 2019 at 14:06
  • @KevinWinata No, so that would also be helpful. Thanks for the links ALollz and anky_91 - I'm reading up on those now. Commented Jul 6, 2019 at 14:22

1 Answer

It's certainly a tricky question, but it can still be solved within Pandas. (Update: added a Version 3 solution.)

Version 3 (one-liner)

import pandas as pd

score = pd.Series([14,13,14,17,17,19,20,22,23,17,19,20,22,21])
result = score.diff().gt(0).pipe(lambda x: x.groupby((~x).cumsum()).cumsum().mod(3).replace(0, 3).where(x, 0).map(int))

Version 2

score = pd.Series([14,13,14,17,17,19,20,22,23,17,19,20,22,21])

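mask = score.diff() > 0  # True where the score improved

# Count improvements within each run, wrap the count at 3, and zero out non-improvements
result = mask.groupby((~mask).cumsum()).cumsum().mod(3).replace(0, 3).where(mask, 0).map(int)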
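
Since the question asks for a DataFrame rather than a bare Series, the same expression can be written straight into a column. This is only a minimal sketch: the column names `Score` and `CookiePoints` are taken from the question's example table, and `improved` and `streak` are just illustrative variable names.

```python
import pandas as pd

# Column names follow the example table in the question.
df = pd.DataFrame({"Score": [14, 13, 14, 17, 17, 19, 20, 22, 23, 17, 19, 20, 22, 21]})

# True where the score is strictly higher than the previous row.
improved = df["Score"].diff().gt(0)

# Every non-improvement starts a new group; the cumulative sum inside
# each group is the number of consecutive improvements so far.
streak = improved.astype(int).groupby((~improved).cumsum()).cumsum()

# Wrap the streak at 3 (3 stays 3, 4 becomes 1, ...) and reset to 0
# on rows where the score did not improve.
df["CookiePoints"] = streak.mod(3).replace(0, 3).where(improved, 0).astype(int)

print(df)
```

The resulting `CookiePoints` column should match the expected values listed in the question.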

Version 1

score = pd.Series([14,13,14,17,17,19,20,22,23,17,19,20,22,21])

mask = score.diff() > 0  # Identify score going up

mask

0     False
1     False
2      True
3      True
4     False
5      True
6      True
7      True
8      True
9     False
10     True
11     True
12     True
13    False
dtype: bool

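# The cumulative count of False values labels the runs:
# every non-improvement starts a new group, and the True values that follow share its label

group = (~mask).cumsum()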

group
0     1
1     2
2     2
3     2
4     3
5     3
6     3
7     3
8     3
9     4
10    4
11    4
12    4
13    5
dtype: int64

# Cumulative sum of True values within each group = number of consecutive improvements so far
temp = mask.groupby(group).cumsum().map(int)
temp

0     0
1     0
2     1
3     2
4     0
5     1
6     2
7     3
8     4
9     0
10    1
11    2
12    3
13    0
dtype: int64

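# Cap at 3
# result = temp.where(temp<=3,temp.mod(3)) # This is wrong: a streak of 6 would give 0 instead of 3.

# mod(3) maps 3, 6, 9, ... to 0, so replace(0, 3) turns those back into 3;
# where(mask, 0) then resets the rows where the score did not improve.
result = temp.mod(3).replace(0, 3).where(mask, 0)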
result

0     0
1     0
2     1
3     2
4     0
5     1
6     2
7     3
8     1
9     0
10    1
11    2
12    3
13    0
dtype: int64
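
The example data never has more than four improvements in a row, so it does not show why the commented-out `where(temp<=3, ...)` line fails. The sketch below uses a longer streak and compares both variants against a plain loop that applies the three rules directly. `cookie_points_loop` is a hypothetical helper written for this check, not part of the solution above, and the test series is made up.

```python
import pandas as pd

def cookie_points_loop(scores):
    """Reference implementation: apply the rules one row at a time."""
    points, out, prev = 0, [], None
    for s in scores:
        if prev is not None and s > prev:
            # After a Cookie has been earned (3 points), start again at 1.
            points = 1 if points == 3 else points + 1
        else:
            points = 0
        out.append(points)
        prev = s
    return out

# Made-up data: seven improvements in a row.
score = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])
mask = score.diff() > 0
temp = mask.astype(int).groupby((~mask).cumsum()).cumsum()

naive = temp.where(temp <= 3, temp.mod(3))         # the commented-out attempt
fixed = temp.mod(3).replace(0, 3).where(mask, 0)   # the version used above

print(naive.tolist())             # [0, 1, 2, 3, 1, 2, 0, 1]  <- 0 where 3 is expected
print(fixed.tolist())             # [0, 1, 2, 3, 1, 2, 3, 1]
print(cookie_points_loop(score))  # [0, 1, 2, 3, 1, 2, 3, 1]
```

The loop is also a handy baseline for checking the vectorized version on other inputs.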

6 Comments

There's a small issue in the last line. I'll fix it soon. (Fixed)
Great answer, I tried for a little while to produce a one liner, but I wasn't successful. The mod usage is very clever.
Not very readable but very clever. I will be visiting this post in the future if I come across a similar problem.
@Datanovice Well, that's the cost of a one-liner :)
Thank you, that was both very helpful and very impressive.