1

I have a dataframe which is populated by answers from a Google Form. The new_cycle column is yes/no question on the form. In my dataframe, I need to calculate the cycle_day for each response. In other words, I need the 'yes' to be 1, and every 'no' afterwards to increment by 1. Then, once another 'yes' response is recorded, the count resets.

Here is a minimal, reproducible example. Note that I'm actually doing this on a much larger dataframe, so an optimal solution is paramount.

df = pd.DataFrame(['yes', 'no', 'no', 'no', 'yes', 'no'], columns=['new_cycle'])

# df:
    new_cycle
0   yes
1   no
2   no
3   no
4   yes
5   no

My desired output would be:

    new_cycle
0   1
1   2
2   3
3   4
4   1
5   2

# OR:
    new_cycle   cycle_day
0   yes         1
1   no          2
2   no          3
3   no          4
4   yes         1
5   no          2

How would I do this?

2 Answers 2

2

Let's try this and do it one step:

df['cycle_day'] = df.groupby(df['new_cycle'].eq('yes').cumsum()).cumcount() + 1

Output:

 new_cycle  cycle_day
0       yes          1
1        no          2
2        no          3
3        no          4
4       yes          1
5        no          2
Sign up to request clarification or add additional context in comments.

Comments

1
  1. first add a new column and set them all to 1

    df['val'] = 1

  2. transform new_cycle column into 1 (yes) and 0 (no)

    df['reset'] = df['new_cycle'] == 'yes'

  3. do a cumsum on the new reset column

    df['cumsum']=df['reset'].cumsum()

  4. group by the cumsum column, then do a cumsum on val column and get the result

    df['cycle_day'] = df.groupby(['cumsum'])['val'].cumsum()

This is adapted from this answer

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.