Efficient algorithm to count contiguous subarrays that can form arithmetic progressions

Question

I'm working on a problem where I need to count, for each possible common difference D, the number of contiguous subarrays whose elements can be rearranged to form an arithmetic progression with common difference D.

Problem Description

Given an array arr[1..n] which is a permutation of 1..n, for each D from 1 to n, count the number of contiguous subarrays where the set of elements can be rearranged into an arithmetic progression with common difference D.

Example:

text

arr = [5, 1, 3, 2, 4]  (permutation of 1..5)

Expected results:

D = 1: 5 subarrays ([3,2], [1,3,2], [3,2,4], [1,3,2,4], [5, 1, 3, 2, 4])
D = 2: 2 subarrays ([5,1,3], [1,3], [2,4])
D = 3: 0 subarrays
D = 4: 1 subarray ([5,1])
D = 5: 0 subarrays

I've tried an O(n²) solution that checks all contiguous subarrays, but it's too slow for large n .

You could analyse the prime factor decomposition of differences between subsequent array elements. If the difference between two elements is 12, for example, then they MUST be part of some subsequence with stride 12 and they MAY be part of subsequences with strides 2, 3, 4, and 6. Basically an indexing exercise. — DarthGizka
– DarthGizka, Commented Nov 21 at 7:01
n ≤ 10⁵, and the time limit is 2 seconds. The array is a permutation of 1..n, so all values are distinct and consecutive integers. — YAR
– YAR, Commented Nov 21 at 13:42
The expected result for D = 2 seems to be missing the subsequence [1, 3]. — DarthGizka
– DarthGizka, Commented Nov 22 at 18:36

DarthGizka · Accepted Answer · 2025-11-29 09:41:51Z

TL;DR an efficient algorithm based on factor decomposition of differences that relies on an O(n²) component for counting subsequences, just barely avoiding the timeout for worst-case input (full PoC available as C# fiddle)

Each pair of consecutive values in the sequence has a certain absolute difference d (non-zero because all values are distinct). This means that the pair MUST be part of a contiguous subsequence with stride d and it MAY be part of contiguous subsequences whose strides are equal to the other factors of d. It also means that candidate subsequences with other strides cannot continue across the current pair and must necessarily terminate at the first value of the pair.

Ergo you can sweep across the input sequence looking at each pair of consecutive values in order to collect candidate subsequences as follows:

for each factor f of the absolute difference d between a_i and a_i+i:
- if there is no open candidate sequence with stride f, create one with the tuple (a_i, i)
- append the tuple (a_i+1, i+1) to the open subsequence with stride f
for each open candidate sequence whose stride is not a factor of d:
- count the number of contiguous subsequences that form progressions
- throw the candidate sequence away

Here is the example sequence with the difference factors indicated below the gaps between the values, in separate rows per factor. This shows how the candidate sequences come into being:

The tricky bit is the counting of the contiguous subsequences contained in a collected candidate sequence. This sounds a lot like the overall problem description again, but at this point we are in a position to mine the raw number ore efficiently.

Here is the first of the two candidate sequences for factor/stride 2. It contains tuples (value, index), but values and indices are shown in separate rows for clarity:

5 1 3
0 1 2

If the tuples of the candidate sequence are not sorted by value already (i.e. if they are not collected with the aid of a structure that orders them by value), sort them.

1 3 5
1 2 0

In this form it becomes possible to isolate unbroken runs of values and then mine those for compliant subsequences. The example sequence does not generate candidate sequences with separate runs but some other sequence might generate a stride 2 candidate sequence like this, with a gap between 3 and 7:

1 3 7 9
1 2 0 3

Candidate sequence processing

Remember your current position as base position. Step through the tuples, tracking index minima and maxima. If the value chain is unbroken and the index extrema indicate a compact range of the same size then you have found a compliant subsequence; add 1 to your count and step forward. If the value chain is broken, remember the current position for later (as the potential start of a new unbroken subsequence) and repeat the process starting at base position + 1. When you are done with this unbroken run of values, continue with the next one whose beginning you stored earlier.

This caterpillar movement is square in nature, and a test implementation clocked in at 1.6 seconds for the worst-case input (ordered sequence 1 .. 50000 with N * (N - 1) / 2 = 1,249,975,000 subsequences for stride 1).

That is roughly 1 nanosecond per subsequence, meaning there is little speed-up potential left in this approach. Or any other approach that is based on finding/counting all compliant subsequences individually, for that matter, because this necessarily leads to the big bad O(n²).

However, all is not lost. Perhaps we have reduced the original, complex problem to a smaller one. The crucial - and so far O(n²) - operation is this:

given a sequence of integers, count all the pairings where the difference in position plus 1 is equal to the difference in between minimal and maximal value between them (indicating that they represent a contiguous subrange)

Note that 'count' here means 'perform some sub-square algorithm that effectively computes the count without counting each pairing individually'.

As illustration, here's the candidate sequence for stride 1 that is generated by the example sequence, sorted by value and separated into unbroken value runs (of which stride 1 only ever has exactly one that covers the whole input).

1 2 3 4 5 <- value components of the tuples
1 3 2 4 0 <- index components of the tuples, input for above subproblem

I do not have a solution at the moment, but this reduced problem may be easier to solve than the original one.

I have looked at the displacement of inversions as a promising avenue of research, but I quickly shelved that (consider 4 3 2 1 0, which loses no subsequences at all). Still in the running is looking at the absolute values of gaps in the index sequence (other than ±1). Each of those gaps causes a certain number of sequences to be lost and the reach of its effect grows with its size.

Last but not least, it may be possible to curb the worst quadratic excesses by finding unbroken runs - regardless of whether descending or ascending - and treating them wholesale, like a single hop that contributes run_length * (run_length - 1) / 2 counts instead of one. This would make the worst-case input of the original algorithm the easiest input of all,, and so this little trick may just be enough to beat the time-out. ;-) Caveat: an index sequence like 0 3 1 4 2 5 ... does not contain any runs and so the trick cannot work in this case.

Collectives™ on Stack Overflow

Efficient algorithm to count contiguous subarrays that can form arithmetic progressions

1 Answer 1

Candidate sequence processing

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Candidate sequence processing

Comments

Your Answer

Sign up or log in

Post as a guest

Related