TL;DR an efficient algorithm based on factor decomposition of differences that relies on an O(n²) component for counting subsequences, just barely avoiding the timeout for worst-case input (full PoC available as C# fiddle)
Each pair of consecutive values in the sequence has a certain absolute difference d (non-zero because all values are distinct). This means that the pair MUST be part of a contiguous subsequence with stride d and it MAY be part of contiguous subsequences whose strides are equal to the other factors of d. It also means that candidate subsequences with other strides cannot continue across the current pair and must necessarily terminate at the first value of the pair.
Ergo you can sweep across the input sequence looking at each pair of consecutive values in order to collect candidate subsequences as follows:
- for each factor f of the absolute difference d between ai and ai+i:
- if there is no open candidate sequence with stride f, create one with the tuple (ai, i)
- append the tuple (ai+1, i+1) to the open subsequence with stride f
- for each open candidate sequence whose stride is not a factor of d:
- count the number of contiguous subsequences that form progressions
- throw the candidate sequence away
Here is the example sequence with the difference factors indicated below the gaps between the values, in separate rows per factor. This shows how the candidate sequences come into being:
5 1 3 2 4
1 1 1 1
2 2 2
4
The tricky bit is the counting of the contiguous subsequences contained in a collected candidate sequence. This sounds a lot like the overall problem description again, but at this point we are in a position to mine the raw number ore efficiently.
Here is the first of the two candidate sequences for factor/stride 2. It contains tuples (value, index), but values and indices are shown in separate rows for clarity:
5 1 3
0 1 2
If the tuples of the candidate sequence are not sorted by value already (i.e. if they are not collected with the aid of a structure that orders them by value), sort them.
1 3 5
1 2 0
In this form it becomes possible to isolate unbroken runs of values and then mine those for compliant subsequences. The example sequence does not generate candidate sequences with separate runs but some other sequence might generate a stride 2 candidate sequence like this, with a gap between 3 and 7:
1 3 7 9
1 2 0 3
Candidate sequence processing
Remember your current position as base position. Step through the tuples, tracking index minima and maxima. If the value chain is unbroken and the index extrema indicate a compact range of the same size then you have found a compliant subsequence; add 1 to your count and step forward. If the value chain is broken, remember the current position for later (as the potential start of a new unbroken subsequence) and repeat the process starting at base position + 1. When you are done with this unbroken run of values, continue with the next one whose beginning you stored earlier.
This caterpillar movement is square in nature, and a test implementation clocked in at 1.6 seconds for the worst-case input (ordered sequence 1 .. 50000 with N * (N - 1) / 2 = 1,249,975,000 subsequences for stride 1).
That is roughly 1 nanosecond per subsequence, meaning there is little speed-up potential left in this approach. Or any other approach that is based on finding/counting all compliant subsequences individually, for that matter, because this necessarily leads to the big bad O(n²).
However, all is not lost. Perhaps we have reduced the original, complex problem to a smaller one. The crucial - and so far O(n²) - operation is this:
- given a sequence of integers, count all the pairings where the difference in position plus 1 is equal to the difference in between minimal and maximal value between them (indicating that they represent a contiguous subrange)
Note that 'count' here means 'perform some sub-square algorithm that effectively computes the count without counting each pairing individually'.
As illustration, here's the candidate sequence for stride 1 that is generated by the example sequence, sorted by value and separated into unbroken value runs (of which stride 1 only ever has exactly one that covers the whole input).
1 2 3 4 5 <- value components of the tuples
1 3 2 4 0 <- index components of the tuples, input for above subproblem
I do not have a solution at the moment, but this reduced problem may be easier to solve than the original one.
I have looked at the displacement of inversions as a promising avenue of research, but I quickly shelved that (consider 4 3 2 1 0, which loses no subsequences at all). Still in the running is looking at the absolute values of gaps in the index sequence (other than ±1). Each of those gaps causes a certain number of sequences to be lost and the reach of its effect grows with its size.
Last but not least, it may be possible to curb the worst quadratic excesses by finding unbroken runs - regardless of whether descending or ascending - and treating them wholesale, like a single hop that contributes run_length * (run_length - 1) / 2 counts instead of one. This would make the worst-case input of the original algorithm the easiest input of all,, and so this little trick may just be enough to beat the time-out. ;-) Caveat: an index sequence like 0 3 1 4 2 5 ... does not contain any runs and so the trick cannot work in this case.
nbe? What's the time limit?D = 2seems to be missing the subsequence[1, 3].