I'm dealing with a bunch of .asc (ASCII) files that are the output of continuous monitoring of various electronic equipment for certification purposes. We monitor parameters of the equipment such as voltage, current, and temperature at different states (modes) of operation [sleep mode, minimal load, maximum load, etc.]. The tests run for an average of 600-700 hours and the data is recorded every 2 seconds. At the end I have datasets of hundreds of MBs that I want to reduce in size. On average about a million data points are generated, which are valid but not necessarily important. It doesn't make sense for me to keep, for example, 5 hours' worth of the same voltage reading that is well inside our tolerance levels (9,000 data points of the same value).
What is crucial for me is that my program monitors the incoming stream of data and looks for errors (tolerance breaches due to a device fault). If no error occurs for a certain amount of time (e.g., 10 minutes after startup), the data should be bunched into a smaller set of data points (reduction by a factor of 2 or 5 or similar). This continues until an error occurs, at which point the program records the point of error as well as the 10 subsequent data points as-is before switching back to the compression method.
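To make the intended behaviour concrete, here is a rough sketch of that logic in Python. The tolerance band, reduction factor, and post-error count are placeholder values, and I'm assuming each sample arrives as a (timestamp, value) pair:

```python
# Sketch of adaptive stream reduction: average blocks of samples while the
# signal stays inside tolerance, but keep any breach (plus the next 10
# samples) at full resolution. All limits below are placeholders.

LOW, HIGH = 11.5, 12.5   # tolerance band (placeholder values)
BLOCK = 5                # reduction factor in the error-free state
POST_ERROR = 10          # raw samples to keep after each breach

def reduce_stream(samples):
    """samples: iterable of (timestamp, value); yields reduced (timestamp, value)."""
    block = []           # samples accumulated for averaging
    raw_left = 0         # raw samples still owed after a breach
    for t, v in samples:
        if not (LOW <= v <= HIGH):          # tolerance breach
            if block:                       # flush any partial block first
                yield (block[0][0], sum(x[1] for x in block) / len(block))
                block = []
            yield (t, v)                    # keep the breach itself as-is
            raw_left = POST_ERROR
        elif raw_left > 0:
            yield (t, v)                    # keep post-error samples as-is
            raw_left -= 1
        else:
            block.append((t, v))
            if len(block) == BLOCK:         # emit one averaged point per block
                yield (block[0][0], sum(x[1] for x in block) / len(block))
                block = []
    if block:                               # flush the trailing partial block
        yield (block[0][0], sum(x[1] for x in block) / len(block))
```

Because it is a generator over a stream, this could run online during the test rather than as a post-processing step; only the current block and the post-error counter are held in memory.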
What approaches could I take to reduce this data so that during analysis we end up with sensible data (representative of the success of the tests) that is also significantly smaller than what we get right now? Would averaging the data be a good option here? If so, would averaging over a fixed number of samples or over a fixed time window be more appropriate? I was also told to consider filtering (Kalman or moving average), but I'm not sure those serve my purpose, since I'm not looking to eliminate any wild data but rather to reduce 100 numbers in a similar range into 10 numbers.
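For the count-versus-time question, here is a minimal comparison of the two kinds of block averaging (the window sizes are illustrative, not from my actual setup):

```python
import numpy as np

def average_by_count(values, n):
    """Average every n consecutive samples (count-based reduction)."""
    trimmed = values[: len(values) // n * n]   # drop the incomplete tail block
    return trimmed.reshape(-1, n).mean(axis=1)

def average_by_time(times, values, window):
    """Average all samples falling in each `window`-second bin (time-based)."""
    bins = ((times - times[0]) // window).astype(int)
    return np.array([values[bins == b].mean() for b in np.unique(bins)])
```

With a fixed 2-second sampling interval the two are equivalent (a block of n samples is exactly a 2n-second window), so time-based binning would only matter if the sample rate varied or the log had gaps.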
Thank you in advance. As a first-time poster, I'm open to any suggestions regarding further research into this topic or regarding posting style. The problem is fairly critical, so I'd be grateful for any and all suggestions pertaining to it.