
I have a DataFrame like this:

timestamp             variance
2017-07-10 20:42:42   0
2017-07-10 20:42:42   1
2017-07-10 20:42:42   2
2017-07-10 20:42:43   6
2017-07-10 20:42:43   7
2017-07-10 20:42:43   9
2017-07-10 20:42:43   3
2017-07-10 20:42:43   4
2017-07-10 20:42:43   5
2017-07-10 20:42:43   1
2017-07-10 20:42:43   4
2017-07-10 20:42:43   1
2017-07-10 20:42:43   3
2017-07-10 20:42:43   7
2017-07-10 20:42:43   9

I would like to add a new column that increments for each row where variance is greater than or equal to 5. When variance drops below 5, the count should decrement instead. Once the count reaches 0, it should not go below 0.

This is what it should look like:

timestamp             variance  cumvar
2017-07-10 20:42:42   0         0
2017-07-10 20:42:42   1         0
2017-07-10 20:42:42   2         0
2017-07-10 20:42:43   6         1
2017-07-10 20:42:43   7         2
2017-07-10 20:42:43   9         3
2017-07-10 20:42:43   3         2
2017-07-10 20:42:43   4         1
2017-07-10 20:42:43   5         2
2017-07-10 20:42:43   1         1
2017-07-10 20:42:43   4         0
2017-07-10 20:42:43   1         0
2017-07-10 20:42:43   3         0
2017-07-10 20:42:43   7         1
2017-07-10 20:42:43   9         2

The closest I've come is this:

df['cumvar'] = np.where((df['variance'] > 5), 1, -1).cumsum()

But of course, this doesn't apply a minimum value of 0 to the cumulative sum. How can I adapt this to achieve the above?
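For reference, here is a minimal runnable sketch of the desired behavior using itertools.accumulate (requires Python 3.8+ for the initial= keyword; the threshold check uses >= 5 so a variance of exactly 5 increments, matching the expected table above):

```python
from itertools import accumulate

import pandas as pd

# Sample data matching the question's expected output.
df = pd.DataFrame({"variance": [0, 1, 2, 6, 7, 9, 3, 4, 5, 1, 4, 1, 3, 7, 9]})

# +1 when variance >= 5, otherwise -1.
steps = [1 if v >= 5 else -1 for v in df["variance"]]

# Accumulate the steps with a floor of 0; initial=0 seeds the running
# total, and [1:] drops that seed so lengths match.
df["cumvar"] = list(
    accumulate(steps, lambda acc, step: max(acc + step, 0), initial=0)
)[1:]

print(df["cumvar"].tolist())
```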

  • May be possible to do recursively with scipy.signal.lfilter, see posts here and here. Commented Jul 11, 2017 at 14:52

3 Answers


One-liner:

pd.expanding_apply(df['variance'],
                   lambda s: reduce(lambda x, y: max(x + (1 if y >= 5 else -1), 0), s, 0))

But of course, readability sucks =)

You can do it the way you started doing it:

pd.expanding_apply(np.where((df['variance'] >= 5), 1, -1), lambda s: reduce(lambda x, y: max(x + y, 0), s, 0))

You can improve readability by extracting the reduce function:

def tricky_func(acc, y):
    next_value = 1 if y >= 5 else -1
    return max(acc + next_value, 0)

pd.expanding_apply(df['variance'], lambda s: reduce(tricky_func, s, 0))

Edit:

You need to import reduce from functools first if you are using Python 3.

And if you are using pandas 0.18+, you should use the

df['variance'].expanding().apply(lambda s: reduce(tricky_func, s))

notation (thanks to Brad Solomon)
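To see the modern form end to end, here's a runnable sketch (assumes pandas 0.23+ for the raw=True keyword; the condition is >= 5 so a variance of exactly 5 increments, matching the question's expected table):

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame({"variance": [0, 1, 2, 6, 7, 9, 3, 4, 5, 1, 4, 1, 3, 7, 9]})

def tricky_func(acc, y):
    # Step up when the value clears the threshold, otherwise step down,
    # clamping the running total at zero.
    next_value = 1 if y >= 5 else -1
    return max(acc + next_value, 0)

# expanding().apply re-runs the reduce over each prefix of the series,
# so this is O(n^2), but correct; the result column is float-typed.
df["cumvar"] = df["variance"].expanding().apply(
    lambda s: reduce(tricky_func, s, 0), raw=True
)

print(df["cumvar"].tolist())
```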


2 Comments

This is a good answer ~ thank you (PS: I was thinking along the same lines with the reduce function) +1
Nice answer, might want to specify from functools import reduce for 3.x, and also expanding_apply is deprecated in favor of .expanding().apply. (New API.)

It's probably not the most elegant way to do it, but it works:

def cum_sum_limited(val, threshold=5, min_sum=0):
    global tot
    # Increment at or above the threshold, decrement below it.
    tot += 1 if val >= threshold else -1
    # Clamp the running total at the minimum.
    tot = max(tot, min_sum)
    return tot

tot = 0
df['cumvar'] = df['variance'].apply(cum_sum_limited)
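If the global feels fragile, the same idea can be wrapped in a closure so the state lives with the function (a sketch; make_counter is an illustrative name, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({"variance": [0, 1, 2, 6, 7, 9, 3, 4, 5, 1, 4, 1, 3, 7, 9]})

def make_counter(threshold=5, min_sum=0):
    # Mutable state captured by the closure instead of a module-level global.
    state = {"tot": 0}

    def step(val):
        # Same logic as the answer: step up at/above the threshold,
        # step down below it, clamped at min_sum.
        state["tot"] += 1 if val >= threshold else -1
        state["tot"] = max(state["tot"], min_sum)
        return state["tot"]

    return step

df["cumvar"] = df["variance"].apply(make_counter())
print(df["cumvar"].tolist())
```

This relies on Series.apply visiting elements in order, which holds for a plain positional index.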

Let me know what you think


0

I would try a different approach. I'd iterate over df['variance'].values to build a list, then concatenate it to the dataframe as a new Series:

x = 0
l = []
for val in df['variance'].values:
    x = max(x + 1 if val >= 5 else x - 1, 0)
    l.append(x)
s = pd.DataFrame([l]).T
df = pd.concat([df, s], axis=1, ignore_index=True, join_axes=[df.index])
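For what it's worth, the concat step can be skipped by assigning the list directly as a new column, which also preserves the original column names (a sketch; note that join_axes was removed in later pandas versions, so direct assignment is more future-proof):

```python
import pandas as pd

df = pd.DataFrame({"variance": [0, 1, 2, 6, 7, 9, 3, 4, 5, 1, 4, 1, 3, 7, 9]})

x = 0
result = []
for val in df["variance"].values:
    # Increment at or above 5, decrement below, never dropping under zero.
    x = max(x + 1 if val >= 5 else x - 1, 0)
    result.append(x)

# Direct assignment aligns by position and keeps df's index and columns.
df["cumvar"] = result
print(df["cumvar"].tolist())
```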

