I have a device that collects data. Two columns of this data (col1, col2) hold 64-bit unsigned integers. These values may overflow in some extreme cases, and I need to handle that, but only under certain conditions.
There are 4 columns: uptime, type, col1, col2. The conditions are checked on the uptime and type columns. The overflows are handled on col1 and col2.
uptime is the time in seconds since the device rebooted; col1 and col2 hold values accumulated up until that time.
Example Data:
uptime  type   col1  col2
44      type0  980   561
104     type0  1422  902
164     type0  2304  1522
224     type1  690   623
284     type1  1603  1245
44      type1  752   698
104     type1  1304  1125
As you can see, when the type changes or the uptime decreases, the col1 and col2 values reset too. So I only need to handle the overflow when the type has not changed and the uptime has not decreased.
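To make the reset rule concrete, here is a minimal sketch that labels each uninterrupted stretch of rows with a run number, using the example data above. The names `same_run` and `run_id` are only illustrative and are not part of my actual code.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "uptime": [44, 104, 164, 224, 284, 44, 104],
    "type": ["type0"] * 3 + ["type1"] * 4,
    "col1": [980, 1422, 2304, 690, 1603, 752, 1304],
    "col2": [561, 902, 1522, 623, 1245, 698, 1125],
})

# A row continues the previous run only if uptime increased AND type is unchanged.
# The first row compares against NaN, so it is never a continuation.
same_run = (df["uptime"] > df["uptime"].shift()) & (df["type"] == df["type"].shift())

# Every row that does not continue a run starts a new one.
run_id = (~same_run).cumsum()
print(run_id.tolist())  # [1, 1, 1, 2, 2, 3, 3]
```

Overflow only needs to be considered between rows that share the same run number.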
I managed to do this with a loop by iterating through rows, as you can see below:
# Convert to Python integers (object dtype) so values can exceed 64 bits.
df[['col1', 'col2']] = df[['col1', 'col2']].fillna(0).astype(int).astype(object)
of_flag_col1 = False
of_flag_col2 = False
# Assumes the default integer index (0, 1, 2, ...), so i-1 is the previous row.
for i, x in df.iterrows():
    if i == 0:
        continue  # the first row has no previous row to compare against
    if (of_flag_col1 or of_flag_col2):
        if of_flag_col1:
            # Keep applying the offset while uptime increases and type is unchanged.
            if df.loc[i, 'uptime'] > df.loc[i-1, 'uptime'] and df.loc[i, 'type'] == df.loc[i-1, 'type']:
                df.loc[i, 'col1'] = 2**64 + df.loc[i, 'col1']
            else:
                of_flag_col1 = False
        if of_flag_col2:
            if df.loc[i, 'uptime'] > df.loc[i-1, 'uptime'] and df.loc[i, 'type'] == df.loc[i-1, 'type']:
                df.loc[i, 'col2'] = 2**64 + df.loc[i, 'col2']
            else:
                of_flag_col2 = False
    elif (df.loc[i, 'uptime'] > df.loc[i-1, 'uptime'] and df.loc[i, 'type'] == df.loc[i-1, 'type']):
        # A drop in value without a reset means the counter wrapped around.
        if df.loc[i, 'col1'] < df.loc[i-1, 'col1']:
            df.loc[i, 'col1'] = 2**64 + df.loc[i, 'col1']
            of_flag_col1 = True
        if df.loc[i, 'col2'] < df.loc[i-1, 'col2']:
            df.loc[i, 'col2'] = 2**64 + df.loc[i, 'col2']
            of_flag_col2 = True
Python integers have no size limit, so I converted the columns to Python integers beforehand.
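A small illustration of why the data type matters, as far as I understand it: a fixed-width `uint64` array wraps around at 2^64, while an object-dtype Series of Python integers keeps growing. The values here are made up for the demonstration.

```python
import numpy as np
import pandas as pd

# Fixed-width unsigned integers wrap around modulo 2**64.
fixed = np.array([2**64 - 1], dtype=np.uint64)
wrapped = fixed + np.uint64(1)
print(int(wrapped[0]))  # 0

# Python integers stored in an object-dtype Series do not wrap.
unbounded = pd.Series([2**64 - 1], dtype=object) + 1
print(unbounded.iloc[0] == 2**64)  # True
```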
The conditions are basically:
- if the value in uptime is higher than the previous value AND the value in type is the same as the previous value:
  - if the value in col1 is lower than the previous value:
    - add 2^64 to the value in col1
    - keep adding 2^64 to the following rows for as long as both conditions in the first if remain true
  - if the value in col2 is lower than the previous value:
    - add 2^64 to the value in col2
    - keep adding 2^64 to the following rows for as long as both conditions in the first if remain true
I know that updating a Pandas DataFrame by iterating over its rows is discouraged, but I couldn't manage to do it any other way. Is it possible to do this without iterating through and updating rows?
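One possible vectorized direction, offered only as a sketch: split the rows into runs (uptime increasing, type unchanged), count the drops inside each run with a grouped cumulative sum, and add that count times 2^64. Note that this is a variation on the loop, not an exact translation: it treats every drop inside a run as one more wrap, whereas the loop applies a single offset. The function name `unwrap_counters`, its parameters, and the demo values are all hypothetical.

```python
import pandas as pd

def unwrap_counters(df, cols=("col1", "col2"), modulus=2**64):
    """Return a copy of df with wrapped counters unwrapped within each run."""
    out = df.copy()
    # A row continues a run if uptime increased and type is unchanged.
    same_run = (out["uptime"] > out["uptime"].shift()) & (out["type"] == out["type"].shift())
    run_id = (~same_run).cumsum()
    for col in cols:
        raw = out[col].fillna(0)
        # A drop inside a run is taken to mean the counter wrapped once.
        wrapped = same_run & (raw < raw.shift(fill_value=0))
        # Number of wraps seen so far within the current run.
        n_wraps = wrapped.astype("int64").groupby(run_id).cumsum()
        # Build Python ints so the result can exceed the 64-bit range.
        out[col] = pd.Series(
            [int(v) + int(k) * modulus for v, k in zip(raw, n_wraps)],
            index=out.index,
            dtype=object,
        )
    return out

# Made-up data: col1 wraps twice in the first run, then the device reboots.
demo = pd.DataFrame({
    "uptime": [44, 104, 164, 224, 44, 104],
    "type": ["type0"] * 6,
    "col1": pd.Series([2**64 - 100, 50, 300, 10, 5, 20], dtype="uint64"),
    "col2": pd.Series([1, 2, 3, 4, 1, 2], dtype="uint64"),
})
result = unwrap_counters(demo)
print(result["col1"].tolist())
```

The list comprehension still runs in Python, but it only builds the final column; the comparisons and the per-run counting are done with whole-column operations.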