There are some questions about this topic already (like Pandas: Cumulative sum of one column based on value of another); however, none of them fulfill my requirements. Let's say I have a dataframe like this one:
id flag
a 1
a 1
a 0
a 0
a 1
b 0
b 0
b 1
b 1
b 1
b 1
c 0
c 1
c 1
c 0
c 1
I want to compute the cumulative sum of flag grouped by id, but the sum should be 0 wherever flag is 0, and the cumsum should restart from 0 at that point. I tried combining shift() and groupby('id')['flag'].cumsum() inside np.where, but with no luck. The desired output is:
id flag cum_flag
a 1 1
a 1 2
a 0 0
a 0 0
a 1 1
b 0 0
b 0 0
b 1 1
b 1 2
b 1 3
b 1 4
c 0 0
c 1 1
c 1 2
c 0 0
c 1 1
The code to generate the dataframe:
import pandas as pd
df = pd.DataFrame({'id': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'c'],
                   'flag': [1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1]})
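For reference, here is a minimal sketch of one way the reset logic described above could be expressed. It assumes flag only ever holds 0 or 1 and that the rows are already in the intended order. The helper name `block` is hypothetical, introduced only for illustration. The idea is to label a new block at every 0 and take the cumulative sum within each (id, block) pair:

```python
import pandas as pd

df = pd.DataFrame({
    'id': list('aaaaabbbbbbccccc'),
    'flag': [1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1],
})

# Every 0 starts a new block: the running count of zeros seen so far
# serves as a block label ("block" is an illustrative name).
block = df['flag'].eq(0).cumsum().rename('block')

# Cumulative sum of flag within each (id, block) pair, so the count
# restarts at every 0 and at every change of id.
df['cum_flag'] = df.groupby(['id', block])['flag'].cumsum()

print(df)
```

Since each block begins with the 0 that opened it, the running sum in that block starts at 0 and then counts the consecutive 1s that follow, which matches the desired cum_flag column for the sample data.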
Thanks for your help!