
I want to keep the last few rows of a dataframe, but once there is a time gap above 100ms, cut off everything above that gap. For example:

Input:

           Time  X
0   12:30:00.00  A
1  12:30:00.100  B
2  12:30:00.202  C
3  12:30:00.300  D

Output:

           Time  X
2  12:30:00.202  C
3  12:30:00.300  D

Explanation: there's more than 100ms between rows B and C, so we throw away everything above row C.
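For reference, the example input can be reproduced with something like this (a sketch; it assumes Time is stored as plain strings):

import pandas as pd

df = pd.DataFrame({'Time': ['12:30:00.00', '12:30:00.100',
                            '12:30:00.202', '12:30:00.300'],
                   'X': ['A', 'B', 'C', 'D']})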

2 Comments
  • What is your expected behavior when there are multiple 100ms+ gaps in the data? Take the last group past the gaps? Commented May 31, 2016 at 14:01
  • No, truncate at the first time there is a 100ms gap, and by first time I mean when looking from the end towards the start (top). Commented May 31, 2016 at 14:31

1 Answer


You can use diff and compare the result with a Timedelta created by to_timedelta, then take the cumsum and check whether it is >= 1. Finally, use boolean indexing:

import pandas as pd

df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S.%f')

print (df)
                     Time  X
0 1900-01-01 12:30:00.000  A
1 1900-01-01 12:30:00.100  B
2 1900-01-01 12:30:00.202  C
3 1900-01-01 12:30:00.300  D

print (df.Time.diff())
0               NaT
1   00:00:00.100000
2   00:00:00.102000
3   00:00:00.098000
Name: Time, dtype: timedelta64[ns]

mask = (((df.Time.diff() > pd.to_timedelta('00:00:00.100000')).cumsum()) >= 1)
print (mask)
0    False
1    False
2     True
3     True
Name: Time, dtype: bool

print (df[mask])
                     Time  X
2 1900-01-01 12:30:00.202  C
3 1900-01-01 12:30:00.300  D
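As a side note (not part of the original answer), the 100ms threshold can also be written with an explicit Timedelta, which is equivalent to the to_timedelta string above and reads a bit more directly:

#same mask, written with a keyword Timedelta
mask = (df.Time.diff() > pd.Timedelta(milliseconds=100)).cumsum() >= 1
print (df[mask])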

If you need the column Time to stay unchanged and want to split on the first gap greater than 100ms:

df['Time1']= pd.to_datetime(df['Time'], format='%H:%M:%S.%f')
print (df)
           Time  X                   Time1
0   12:30:00.00  A 1900-01-01 12:30:00.000
1  12:30:00.100  B 1900-01-01 12:30:00.100
2  12:30:00.202  C 1900-01-01 12:30:00.202
3  12:30:00.300  D 1900-01-01 12:30:00.300
1  12:30:00.100  E 1900-01-01 12:30:00.100
2  12:30:00.202  F 1900-01-01 12:30:00.202

print (df.Time1.diff())
0                        NaT
1            00:00:00.100000
2            00:00:00.102000
3            00:00:00.098000
1   -1 days +23:59:59.800000
2            00:00:00.102000
Name: Time1, dtype: timedelta64[ns]

mask = (((df.Time1.diff() > pd.to_timedelta('00:00:00.100000')).cumsum()) >= 1)
print (mask)
0    False
1    False
2     True
3     True
1     True
2     True
Name: Time1, dtype: bool

print (df[mask].drop('Time1',axis=1))
           Time  X
2  12:30:00.202  C
3  12:30:00.300  D
1  12:30:00.100  E
2  12:30:00.202  F
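If this has to be done repeatedly, the logic can be wrapped in a small helper function (a sketch with assumed names, not part of the original answer):

def truncate_at_first_gap(frame, col='Time', threshold='100ms', fmt='%H:%M:%S.%f'):
    #drop all rows before the first gap larger than threshold
    #mirrors the masks above: if no gap exceeds threshold, an empty frame is returned
    times = pd.to_datetime(frame[col], format=fmt)
    mask = (times.diff() > pd.Timedelta(threshold)).cumsum() >= 1
    return frame[mask]

#usage: truncate_at_first_gap(df)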

If you need to split by the last gap (keep only the last group):

print (df)
           Time  X
0   12:30:00.00  A
1  12:30:00.100  B
2  12:30:00.202  C
3  12:30:00.300  D
1  12:30:00.100  E
2  12:30:00.202  F

#create helper series
time_ser= pd.to_datetime(df['Time'], format='%H:%M:%S.%f')
#get differences
print (time_ser.diff())
0                        NaT
1            00:00:00.100000
2            00:00:00.102000
3            00:00:00.098000
1   -1 days +23:59:59.800000
2            00:00:00.102000
Name: Time, dtype: timedelta64[ns]
#compare with 100ms timedelta and count gaps cumulatively
mask = (((time_ser.diff() > pd.to_timedelta('00:00:00.100000')).cumsum()))
print (mask)
0    0
1    0
2    1
3    1
1    1
2    2
Name: Time, dtype: int32

#get last value of mask
last_val = mask.iat[-1]
print(last_val)
2

#compare mask with last value and use boolean indexing
print (df[mask == last_val])
           Time  X
2  12:30:00.202  F
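An alternative sketch (my own variant, not from the original answer) locates the position of the last 100ms+ gap and slices the frame positionally instead of comparing against the last mask value:

gaps = time_ser.diff() > pd.Timedelta(milliseconds=100)
if gaps.any():
    #position of the last gap, counted from the start
    last_gap_pos = len(gaps) - 1 - gaps.values[::-1].argmax()
else:
    last_gap_pos = 0
print (df.iloc[last_gap_pos:])
           Time  X
2  12:30:00.202  F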

1 Comment

I edited the answer to add splitting by the last value, please check the solution. Thanks.
