Given the following dataset DF:
uuid,eventTime,Op.progress,Op.progressPercentage, AnotherAttribute
C0972765-8436-0000-0000-000000000000,2017-08-19T12:52:39,P,3.0,01:57:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:52:49,P,3.0,01:56:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:53:18,P,4.0,01:55:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:53:49,P,5.0,01:55:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:54:27,P,5.0,01:54:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:55:07,P,6.0,01:54:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:55:27,P,6.0,01:53:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:33:46,W,40.0,01:13:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:10,N,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:16,N,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:18,N,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:55,P,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:41:15,P,1.0,01:59:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:41:31,P,3.0,01:57:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:41:51,P,3.0,01:56:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:42:22,P,4.0,01:56:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:42:51,P,4.0,01:55:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:29:22,S,98.0,00:04:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:29:27,S,98.0,00:03:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:30:27,S,99.0,00:02:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:31:27,S,100.0,00:01:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:33:01,F,100.0,00:01:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:33:01,F,100.0,00:01:00
I would like to split into two:
df1:
uuid,eventTime,Op.progress,Op.progressPercentage, AnotherAttribute
C0972765-8436-0000-0000-000000000000,2017-08-19T12:52:39,P,3.0,01:57:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:52:49,P,3.0,01:56:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:53:18,P,4.0,01:55:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:53:49,P,5.0,01:55:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:54:27,P,5.0,01:54:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:55:07,P,6.0,01:54:00
C0972765-8436-0000-0000-000000000000,2017-08-19T12:55:27,P,6.0,01:53:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:33:46,W,40.0,01:13:00
df2:
uuid,eventTime,Op.progress,Op.progressPercentage, AnotherAttribute
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:10,N,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:16,N,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:18,N,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:40:55,P,1.0,02:00:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:41:15,P,1.0,01:59:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:41:31,P,3.0,01:57:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:41:51,P,3.0,01:56:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:42:22,P,4.0,01:56:00
C0972765-8436-0000-0000-000000000000,2017-08-19T13:42:51,P,4.0,01:55:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:29:22,S,98.0,00:04:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:29:27,S,98.0,00:03:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:30:27,S,99.0,00:02:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:31:27,S,100.0,00:01:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:33:01,F,100.0,00:01:00
C0972765-8436-0000-0000-000000000000,2017-08-19T15:33:01,F,100.0,00:01:00
The split should be based on the Op.progressPercentage attribute which can assume value from 1 to 100.
When I try to apply the solution provided at splitting a pandas Dataframe as shown below, I do not get the right and expected result.
df_dataset = pd.read_csv(filepath) #your input data saved here
wash_list = []
shifted = df_dataset['Op.progressPercentage'].shift()
m = shifted.diff(-1).ne(0) & shifted.eq(100)
a = m.cumsum()
aa = df_dataset.groupby([df_dataset.uuid,a])
for k, gp in aa:
wash_list.append(gp.sort_values(['uuid', 'eventTime'], ascending=[1, 1]))
for wash in wash_list :
print("")
print(wash.to_string())
print("")
Please, any help would be very appreciated. Thank you very much in advance, Best Regards, Carlo