0

I'm trying to calculate running difference on the date column depending on "event column". So, to add another column with date difference between 1 in event column (there only 0 and 1).

Spo far I came to this half-working crappy solution

Dataframe:

    df = pd.DataFrame({'date':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17],'event':[0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0],'duration':None})

Code:

x = df.loc[df['event']==1, 'date']
k = 0
for i in range(len(x)):
    df.loc[k:x.index[i], 'duration'] = x.iloc[i] - k
    k = x.index[i]

But I'm sure there is a more elegant solution. Thanks for any advice.

Output format:

+------+-------+----------+
| date | event | duration |
+------+-------+----------+
|    1 |     0 |        3 |
|    2 |     0 |        3 |
|    3 |     1 |        3 |
|    4 |     0 |        6 |
|    5 |     0 |        6 |
|    6 |     0 |        6 |
|    7 |     0 |        6 |
|    8 |     0 |        6 |
|    9 |     1 |        6 |
|   10 |     0 |        4 |
|   11 |     0 |        4 |
|   12 |     0 |        4 |
|   13 |     1 |        4 |
|   14 |     0 |        2 |
|   15 |     1 |        2 |
+------+-------+----------+
3
  • Can you share desired output format? Commented Feb 2, 2021 at 13:50
  • Thanks, added. I mean for example here duration till first event is 3 days, between first and second - 6, etc. Commented Feb 2, 2021 at 13:58
  • I'll post an answer that I think matches the structure of your output table and addresses what you're looking for, but your code sample and test dataframe don't match the output you added. Commented Feb 2, 2021 at 14:18

1 Answer 1

1

Using your initial dataframe:

df = pd.DataFrame({'date':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17],'event':[0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0],'duration':None})

Add an index-like column to mark where the transitions occur (you could also base this on the date column if it is unique):

df = df.reset_index().rename(columns={'index':'idx'})
df.loc[df['event']==0, 'idx'] = np.nan
df['idx'] = df['idx'].fillna(method='bfill')

Then, use a groupby() to count the records, and backfill them to match your structure:

df['duration'] = df.groupby('idx')['event'].count() 
df['duration'] = df['duration'].fillna(method='bfill') 

# Alternatively, the previous two lines can be combined as pointed out by OP
# df['duration'] = df.groupby('idx')['event'].transform('count')

df = df.drop(columns='idx')

print(df)
    date  event  duration
0      1      0       2.0
1      2      1       2.0
2      3      0       3.0
3      4      0       3.0
4      5      1       3.0
5      6      0       5.0
6      7      0       5.0
7      8      0       5.0
8      9      0       5.0
9     10      1       5.0
10    11      0       6.0
11    12      0       6.0
12    13      0       6.0
13    14      0       6.0
14    15      0       6.0
15    16      1       6.0
16    17      0       NaN

It ends up as a float value because of the NaN in the last row. This approach works well in general if there are obvious "groups" of things to count.

As an alternative, because the dates are already there as integers you can look at the differences in dates directly:

df = pd.DataFrame({'date':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17],'event':[0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0]})

tmp = df[df['event']==1].copy()
tmp['duration'] = (tmp['date'] - tmp['date'].shift(1)).fillna(tmp['date'])
df = pd.merge(df, tmp[['date','duration']], on='date', how='left').fillna(method='bfill')
Sign up to request clarification or add additional context in comments.

2 Comments

Can't say this is an elegant solution, but hey, it's working, thanks! BTW, we can skip fillna df['duration'] = df['duration'].fillna(method='bfill') with transform in previous line df['duration'] = df.groupby('idx')['event'].count() => df['duration'] = df.groupby('idx')['event'].transform('count')
I edited it to reflect your use of transform(), and added a second solution while I was in there. :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.