You can do this by using diff. diff returns the difference between pairs of values in your array. This also computes the discrete approximation to the derivative, such that for an index i in your array x, and x_i accesses the ith element of your array, the output at index i for the output array y is thus:
y_i = x_{i+1} - x_i
Note that diff will return a M - 1 element vector given that your vector is of size M. As such, we will pad the last element with inf so that we can capture the last value of the array.
As soon as diff returns a non-zero difference, this is when a change happens and so this is the location that you need to keep track of that results in the very last time a number in date2 remained the same before transitioning.
As such, this is the algorithm I would do:
- Find where a change occurs in
date2
- Use these indices to access the fields in your structure.
- Create a new structure that subsets each of the fields' arrays using the output in Step #1
Without further ado:
%// Set up example data
s.id = [1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2];
s.date1 = [1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4];
s.date2 = [1 1 1 1 2 2 2 2 1 1 1 1 2 2 2 2];
s.value = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16];
%// Find where the transitions are
ind = find(diff([s.date2 inf]));
%// Subset each of the other arrays in the structure and
%// assign to output
sOut.id = s.id(ind);
sOut.date1 = s.date1(ind);
sOut.date2 = s.date2(ind);
sOut.value = s.value(ind);
sOut will contain the output structure. Given your example, the output is:
sOut =
id: [1 1 2 2]
date1: [4 4 4 4]
date2: [1 2 1 2]
value: [4 8 12 16]
Bonus - Time Measurement
I decided to time how long this took using randomly generated integers for id, date1, date2 and value of 100000 elements each using a seed of 123. I'm running MATLAB R2014a (v. 8.3.0.532) on an Intel i7-4770 @ 3.40 GHz with 8 GB of RAM. I used timeit to time how long this took. Here's the code I used:
function [] = measureTime()
format long;
T = timeit(@testing5);
disp(['Time elapsed: ' num2str(T)]);
end
function [] = testing5()
rng(123); %//Set for reproducibility
s.id = randi(100, 1, 100000);
s.date1 = randi(100, 1, 100000);
s.date2 = randi(100, 1, 100000);
s.value = randi(100, 1, 100000);
ind = find(diff([s.date2 inf]));
sOut.id = s.id(ind);
sOut.date1 = s.date1(ind);
sOut.date2 = s.date2(ind);
sOut.value = s.value(ind);
end
All you have to do is copy this code to a file, call it measureTime.m and run this in the MATLAB command prompt. As such, the time elapsed was:
>> measureTime
Time elapsed: 0.0086272
... so roughly 8.6 ms... pretty good!