
I have a DataFrame like this:

timestamp             variance
2017-07-10 20:42:42   0
2017-07-10 20:42:42   1
2017-07-10 20:42:42   2
2017-07-10 20:42:43   6
2017-07-10 20:42:43   7
2017-07-10 20:42:43   9
2017-07-10 20:42:43   3
2017-07-10 20:42:43   4
2017-07-10 20:42:43   5
2017-07-10 20:42:43   1
2017-07-10 20:42:43   4
2017-07-10 20:42:43   1
2017-07-10 20:42:43   3
2017-07-10 20:42:43   7
2017-07-10 20:42:43   9

I would like to add a new column that increments for each row where variance is greater than or equal to 5. When variance drops below 5, the count should decrement instead. Once the count reaches 0, it should not go below 0.

This is what it should look like:

timestamp             variance  cumvar
2017-07-10 20:42:42   0         0
2017-07-10 20:42:42   1         0
2017-07-10 20:42:42   2         0
2017-07-10 20:42:43   6         1
2017-07-10 20:42:43   7         2
2017-07-10 20:42:43   9         3
2017-07-10 20:42:43   3         2
2017-07-10 20:42:43   4         1
2017-07-10 20:42:43   5         2
2017-07-10 20:42:43   1         1
2017-07-10 20:42:43   4         0
2017-07-10 20:42:43   1         0
2017-07-10 20:42:43   3         0
2017-07-10 20:42:43   7         1
2017-07-10 20:42:43   9         2

The closest I've come is this:

df['cumvar'] = np.where((df['variance'] > 5), 1, -1).cumsum()

But of course, this doesn't apply a minimum value of 0 to the cumulative sum. How can I adapt this to achieve the above?
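For reference, here is a minimal runnable sketch of the desired behavior using itertools.accumulate (requires Python 3.8+ for the initial= keyword; the threshold check uses >= 5 so a variance of exactly 5 increments, matching the expected table above):

```python
from itertools import accumulate

import pandas as pd

# Sample data matching the question's expected output.
df = pd.DataFrame({"variance": [0, 1, 2, 6, 7, 9, 3, 4, 5, 1, 4, 1, 3, 7, 9]})

# +1 when variance >= 5, otherwise -1.
steps = [1 if v >= 5 else -1 for v in df["variance"]]

# Accumulate the steps with a floor of 0; initial=0 seeds the running
# total, and [1:] drops that seed so lengths match.
df["cumvar"] = list(
    accumulate(steps, lambda acc, step: max(acc + step, 0), initial=0)
)[1:]

print(df["cumvar"].tolist())
```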

  • May be possible to do recursively with scipy.signal.lfilter, see posts here and here. Commented Jul 11, 2017 at 14:52

3 Answers


One-liner:

pd.expanding_apply(df['variance'],
                   lambda s: reduce(lambda x, y: max(x + (1 if y >= 5 else -1), 0), s, 0))

But of course, readability sucks =)

You can do it the way you started doing it:

pd.expanding_apply(np.where((df['variance'] >= 5), 1, -1), lambda s: reduce(lambda x, y: max(x + y, 0), s, 0))

You can improve readability by extracting the reduce function:

def tricky_func(acc, y):
    next_value = 1 if y >= 5 else -1
    return max(acc + next_value, 0)

pd.expanding_apply(df['variance'], lambda s: reduce(tricky_func, s, 0))

Edit:

You need to import reduce from functools first if you are using Python 3.

And if you are using pandas 0.18+, you should use the

df['variance'].expanding().apply(lambda s: reduce(tricky_func, s))

notation (thanks to Brad Solomon)
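To see the modern form end to end, here's a runnable sketch (assumes pandas 0.23+ for the raw=True keyword; the condition is >= 5 so a variance of exactly 5 increments, matching the question's expected table):

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame({"variance": [0, 1, 2, 6, 7, 9, 3, 4, 5, 1, 4, 1, 3, 7, 9]})

def tricky_func(acc, y):
    # Step up when the value clears the threshold, otherwise step down,
    # clamping the running total at zero.
    next_value = 1 if y >= 5 else -1
    return max(acc + next_value, 0)

# expanding().apply re-runs the reduce over each prefix of the series,
# so this is O(n^2), but correct; the result column is float-typed.
df["cumvar"] = df["variance"].expanding().apply(
    lambda s: reduce(tricky_func, s, 0), raw=True
)

print(df["cumvar"].tolist())
```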


2 Comments

This is a good answer ~ thank you (PS: I was thinking along the same lines with the reduce function) +1
Nice answer, might want to specify from functools import reduce for 3.x, and also expanding_apply is deprecated in favor of .expanding().apply. (New API.)

It's probably not the most elegant way to do it, but it works:

def cum_sum_limited(val, threshold=5, min_sum=0):
    global tot
    # Increment at or above the threshold, decrement below it.
    tot += 1 if val >= threshold else -1
    # Clamp the running total at the minimum.
    tot = max(tot, min_sum)
    return tot

tot = 0
df['cumvar'] = df['variance'].apply(cum_sum_limited)
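If the global feels fragile, the same idea can be wrapped in a closure so the state lives with the function (a sketch; make_counter is an illustrative name, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({"variance": [0, 1, 2, 6, 7, 9, 3, 4, 5, 1, 4, 1, 3, 7, 9]})

def make_counter(threshold=5, min_sum=0):
    # Mutable state captured by the closure instead of a module-level global.
    state = {"tot": 0}

    def step(val):
        # Same logic as the answer: step up at/above the threshold,
        # step down below it, clamped at min_sum.
        state["tot"] += 1 if val >= threshold else -1
        state["tot"] = max(state["tot"], min_sum)
        return state["tot"]

    return step

df["cumvar"] = df["variance"].apply(make_counter())
print(df["cumvar"].tolist())
```

This relies on Series.apply visiting elements in order, which holds for a plain positional index.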

Let me know what you think


0

I would try a different approach. I'd iterate over df['variance'].values to build a list, then concatenate it to the dataframe as a new Series:

x = 0
l = []
for val in df['variance'].values:
    x = max(x + 1 if val >= 5 else x - 1, 0)
    l.append(x)
s = pd.DataFrame([l]).T
df = pd.concat([df, s], axis=1, ignore_index=True, join_axes=[df.index])
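For what it's worth, the concat step can be skipped by assigning the list directly as a new column, which also preserves the original column names (a sketch; note that join_axes was removed in later pandas versions, so direct assignment is more future-proof):

```python
import pandas as pd

df = pd.DataFrame({"variance": [0, 1, 2, 6, 7, 9, 3, 4, 5, 1, 4, 1, 3, 7, 9]})

x = 0
result = []
for val in df["variance"].values:
    # Increment at or above 5, decrement below, never dropping under zero.
    x = max(x + 1 if val >= 5 else x - 1, 0)
    result.append(x)

# Direct assignment aligns by position and keeps df's index and columns.
df["cumvar"] = result
print(df["cumvar"].tolist())
```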

