I am trying to add rows to a DataFrame. Whenever a user comes back to the app after more than 300 seconds, I need to add a row. My code is below. It works, but it takes a long time to run because the real DataFrame has 10 million rows.
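    for i in range(1, len(df)):
        # Same user as the previous row, and more than 300 seconds since that row
        if df['user_id'][i] == df['user_id'][i-1] and (df['start_time'][i] - df['start_time'][i-1]).total_seconds() > 300:
            # Append a marker row at the end of the DataFrame
            df.loc[len(df)] = [df['user_id'][i], df['start_time'][i], 'psuedo_App_start_2']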
Input:
    user_id  start_time      event
    100      03/04/19 6:11   psuedo_App_start
    100      03/04/19 6:11   notification_receive
    100      03/04/19 8:56   notification_dismiss
    10       03/04/19 22:05  psuedo_App_start
    10       03/04/19 22:05  subcategory_click
    10       03/04/19 22:06  subcategory_click
The output should look like:
    user_id  start_time      event
    100      03/04/19 6:11   psuedo_App_start
    100      03/04/19 6:11   notification_receive
    100      03/04/19 8:56   psuedo_App_start_2
    100      03/04/19 8:56   notification_dismiss
    10       03/04/19 22:05  psuedo_App_start
    10       03/04/19 22:05  subcategory_click
    10       03/04/19 22:06  subcategory_click
As the output shows, a row is added for user_id = 100 because that user came back at 8:56, i.e. more than 300 seconds after their previous event.
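Group by user_id, then compute the time difference between consecutive start_time values within each user (for example with groupby('user_id')['start_time'].diff()) and check whether it is greater than 300 seconds. Wherever the condition is met, insert a new row using the user_id and start_time of the row that triggered it.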
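A minimal vectorized sketch of that idea, in case it helps. It assumes start_time is already a datetime column and that the rows are ordered by time within each user. The function name add_return_rows, its parameters, and the month/day/year date format in the sample data are assumptions made for illustration, not part of the original post. One difference from the loop: it compares each row with the previous row of the same user rather than the physically adjacent row, which is the same thing when each user's rows are contiguous.

```python
import pandas as pd


def add_return_rows(df, gap_seconds=300, event_name="psuedo_App_start_2"):
    """Insert a marker row before every event that follows a gap of more
    than gap_seconds for the same user (hypothetical helper)."""
    df = df.reset_index(drop=True)

    # Time elapsed since the previous row of the same user
    # (NaT for each user's first row, which compares as False below).
    gap = df.groupby("user_id", sort=False)["start_time"].diff()
    came_back = gap > pd.Timedelta(seconds=gap_seconds)

    # Build all marker rows in one step instead of appending row by row.
    markers = df.loc[came_back, ["user_id", "start_time"]].assign(event=event_name)

    # Markers keep the index of the row that triggered them. Putting them
    # first and using a stable sort places each marker directly before it.
    out = pd.concat([markers, df]).sort_index(kind="mergesort")
    return out.reset_index(drop=True)


# Sample data from the question; the date format is an assumption.
df = pd.DataFrame({
    "user_id": [100, 100, 100, 10, 10, 10],
    "start_time": pd.to_datetime(
        ["03/04/19 6:11", "03/04/19 6:11", "03/04/19 8:56",
         "03/04/19 22:05", "03/04/19 22:05", "03/04/19 22:06"],
        format="%m/%d/%y %H:%M",
    ),
    "event": ["psuedo_App_start", "notification_receive", "notification_dismiss",
              "psuedo_App_start", "subcategory_click", "subcategory_click"],
})

result = add_return_rows(df)
print(result)
```

Because the comparison and the construction of the new rows each happen in a single pandas operation, this avoids the per-row df.loc[len(df)] = ... append, which is likely the main cost of the loop on 10 million rows.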