I have a large dataframe of the form
timestamp | col1 | col2 ...
I want to select rows spaced out by an interval of at least x minutes, where x can be 5, 10, 30, etc. The problem is that the timestamps aren't equally spaced, so I can't do a simple "take every nth row" trick.
Example:
timestamp | col1 | col2
'2019-01-15 17:52:29.955000', x, b
'2019-01-15 17:58:29.531000', x, b
'2019-01-16 03:21:48.255000', x, b
'2019-01-16 03:27:46.324000', x, b
'2019-01-16 03:33:09.984000', x, b
'2019-01-16 07:22:08.170000', x, b
'2019-01-16 07:28:27.406000', x, b
'2019-01-16 07:34:35.194000', x, b
if interval = 10:
result:
'2019-01-15 17:52:29.955000', x, b
'2019-01-16 03:21:48.255000', x, b
'2019-01-16 03:33:09.984000', x, b
'2019-01-16 07:22:08.170000', x, b
'2019-01-16 07:34:35.194000', x, b
if interval = 30:
result:
'2019-01-15 17:52:29.955000', x, b
'2019-01-16 03:21:48.255000', x, b
'2019-01-16 07:22:08.170000', x, b
I could do a brute-force n^2 approach, but I'm sure there's a pandas way for this that I'm missing.
Thank you! :)
EDIT: Just to clarify, this is not a duplicate of "Calculate time difference between Pandas Dataframe indices". I need to subset a dataframe based on a given interval.
With values such as 1, 2, 3, 4, 12, 20, 27, you don't know to keep 12 until you've dropped 2, 3 and 4 for being too close to 1 (if the diff is >10).
A for loop handles this. It's O(n), not O(n**2).
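A minimal sketch of that single-pass idea, assuming the timestamps are already sorted in ascending order. The helper name select_spaced is hypothetical (it is not a pandas function), and it uses only the standard library so it can be run as-is on the example data from the question:

```python
from datetime import datetime, timedelta


def select_spaced(timestamps, minutes):
    """Return positions of rows that are at least `minutes` after the
    previously kept row. Assumes `timestamps` is sorted ascending."""
    gap = timedelta(minutes=minutes)
    keep = []
    last = None  # timestamp of the most recently kept row
    for i, ts in enumerate(timestamps):
        # Always keep the first row; afterwards compare against the last
        # KEPT row, not the previous row, so dropped rows are ignored.
        if last is None or ts - last >= gap:
            keep.append(i)
            last = ts
    return keep


# Timestamps copied from the example in the question.
stamps = [datetime.fromisoformat(s) for s in [
    "2019-01-15 17:52:29.955000",
    "2019-01-15 17:58:29.531000",
    "2019-01-16 03:21:48.255000",
    "2019-01-16 03:27:46.324000",
    "2019-01-16 03:33:09.984000",
    "2019-01-16 07:22:08.170000",
    "2019-01-16 07:28:27.406000",
    "2019-01-16 07:34:35.194000",
]]

print(select_spaced(stamps, 10))  # [0, 2, 4, 5, 7]
print(select_spaced(stamps, 30))  # [0, 2, 5]
```

The returned positions match the expected results listed in the question for both intervals. On a dataframe sorted by timestamp, something like df.iloc[select_spaced(df['timestamp'], 10)] should select the corresponding rows, assuming the column holds datetime values. Each row is visited once, so the cost is O(n).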