2

I'm trying to create a new dataframe column that acts as a running variable that resets to zero or "passes" under certain conditions. Below is a simplified example of what I'm looking to accomplish. Let's say I'm trying to quit drinking coffee and I'm tracking the number of days in a row I've gone without drinking any. On days where I forgot to note whether I drank coffee, I put "forgot", and my tally is not affected.

Below is how I'm currently accomplishing this, though I suspect there's a much more efficient way of going about it.

Thanks in advance!

import pandas as pd

Day = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
DrankCoffee = ['no', 'no', 'forgot', 'yes', 'no', 'no', 'no', 'no', 'no', 'yes', 'no']

df = pd.DataFrame(list(zip(Day, DrankCoffee)), columns=['Day', 'DrankCoffee'])

df['Streak'] = 0

s = 0  # running count of consecutive coffee-free days

for index, row in df.iterrows():
    if row['DrankCoffee'] == 'no':
        s += 1  # extend the streak
    elif row['DrankCoffee'] == 'yes':
        s = 0   # reset the streak
    # 'forgot' leaves the tally unchanged

    df.loc[index, 'Streak'] = s


1
  • Could you give more details of how the problem is structured? It seems you could use iloc and keep track of the last 0 in your streak column; let us call it zero_streak. If the next entry is yes, then just add +1 from the zero_streak index to the current index. If no, then set the new row for streak as 0 and update your zero_streak to the new index. Commented May 2, 2018 at 21:34

3 Answers

4

You can use groupby.transform.

Within each streak, what you're looking for is something like this:

def my_func(group):
    return (group == 'no').cumsum()

You can separate the different streaks with a simple comparison and cumsum:

streak = (df['DrankCoffee'] == 'yes').cumsum()
0     0
1     0
2     0
3     1
4     1
5     1
6     1
7     1
8     1
9     2
10    2

Then apply the transform:

df['Streak'] = df.groupby(streak)['DrankCoffee'].transform(my_func)
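Put together, the pieces above can be run end to end. This is a minimal sketch that rebuilds the question's sample data as a dict-based DataFrame (an assumption made for brevity; the question builds it with zip) and otherwise reuses the names from this answer.

```python
import pandas as pd

# Sample data from the question, rebuilt here so the snippet is self-contained.
df = pd.DataFrame({
    "Day": range(1, 12),
    "DrankCoffee": ["no", "no", "forgot", "yes", "no", "no",
                    "no", "no", "no", "yes", "no"],
})


def my_func(group):
    # Running count of 'no' days inside a single streak group.
    return (group == "no").cumsum()


# Every 'yes' bumps the counter, so all rows from one 'yes' up to the
# next one share the same group label.
streak = (df["DrankCoffee"] == "yes").cumsum()

df["Streak"] = df.groupby(streak)["DrankCoffee"].transform(my_func)
print(df)
```

On the sample data this should reproduce the Streak column produced by the loop in the question, with 'forgot' days carrying the previous value forward.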

1 Comment

Thanks!! All of the responses are great, but I think this one was the easiest for me to understand the step by step process.
3

First, map DrankCoffee to 0/1 (as I understand it, 'yes' and 'forgot' should be 0 and 'no' should be 1). Then take a cumsum over the 'yes' rows to create the group key, so that every 'yes' starts a new round, and use a grouped cumsum to count those events:

df.DrankCoffee.replace({'no':1,'forgot':0,'yes':0}).groupby((df.DrankCoffee=='yes').cumsum()).cumsum()
Out[111]: 
0     1
1     2
2     2
3     0
4     1
5     2
6     3
7     4
8     5
9     0
10    1
Name: DrankCoffee, dtype: int64

1 Comment

Mapping DrankCoffee to 0/1 can be easier with == 'no'.
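A minimal sketch of what the comment suggests, assuming the same sample data: the boolean comparison replaces the dict-based replace, and the rest of the answer stays the same. The names coffee, is_no, group_key and result are illustrative and do not appear in the answer.

```python
import pandas as pd

# Sample column from the question (hypothetical standalone Series).
coffee = pd.Series(
    ["no", "no", "forgot", "yes", "no", "no", "no", "no", "no", "yes", "no"],
    name="DrankCoffee",
)

# 1 on 'no' days, 0 on 'yes' and 'forgot' days, with no explicit mapping.
is_no = (coffee == "no").astype(int)

# Same group key as in the answer: a new group starts at every 'yes'.
group_key = (coffee == "yes").cumsum()

result = is_no.groupby(group_key).cumsum()
print(result)
```

Since only 'no' should count, comparing against 'no' also treats any other unexpected label as 0, which may or may not be what you want.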
2

Use:

df['Streak'] = df.assign(streak=df['DrankCoffee'].eq('no'))\
                 .groupby(df['DrankCoffee'].eq('yes').cumsum())['streak'].cumsum().astype(int)

Output:

    Day DrankCoffee  Streak
0     1          no       1
1     2          no       2
2     3      forgot       2
3     4         yes       0
4     5          no       1
5     6          no       2
6     7          no       3
7     8          no       4
8     9          no       5
9    10         yes       0
10   11          no       1
  1. First, create the streak increment: True when the value is 'no'.
  2. Next, label the streaks: each 'yes' starts a new streak, using cumsum().
  3. Lastly, count the increments within each streak with a grouped cumsum().
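To make the three steps easier to inspect, they can be split into separate variables and laid side by side. This is only a sketch on the question's sample data; increment, streak_id and steps are hypothetical names introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Day": range(1, 12),
    "DrankCoffee": ["no", "no", "forgot", "yes", "no", "no",
                    "no", "no", "no", "yes", "no"],
})

# Step 1: True on days that extend the streak.
increment = df["DrankCoffee"].eq("no")

# Step 2: streak label; it goes up by one at every 'yes'.
streak_id = df["DrankCoffee"].eq("yes").cumsum()

# Step 3: running total of increments within each streak label.
df["Streak"] = increment.groupby(streak_id).cumsum().astype(int)

# Side-by-side view of the intermediate values, for inspection only.
steps = pd.DataFrame({
    "DrankCoffee": df["DrankCoffee"],
    "increment": increment,
    "streak_id": streak_id,
    "Streak": df["Streak"],
})
print(steps)
```

Because a 'yes' row opens its own group and contributes False, its running total starts at 0, which is what resets the streak.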
