How to intersect two arrays of ranges?

Question

Let a range be an array of two integers: the start and the end (e.g. [40, 42]).

Having two arrays of ranges (which are sorted), I want to find the optimal way to calculate their intersection (which will result into another array of ranges):

A = [[1, 3], [7, 9], [12, 18]]
B = [[2, 3], [4,5], [6,8], [13, 14], [16, 17]]

Intersection:

[[2, 3], [7, 8], [13, 14], [16, 17]]

What is the optimal algorithm for this?

The naive way would be to check each one with all the other ones, but that's obviously not optimal.

I found a similar question asking for the same thing in VBA: Intersection of two arrays of ranges

are all the ranges sorted and distinct? Could the array 1 be [[4, 6], [1,2]]? or [[1, 5], [4,6]]? — Liora Haydont
– Liora Haydont, Commented Mar 23, 2018 at 16:06
Assuming arrays are sorted, this seems fairly trivial. Am I missing something? — Amit
– Amit, Commented Mar 23, 2018 at 16:07
@PraveenKumar: the input arrays are just that: input, a given to the program. The intersection is also just that: the intersection of the sets of ranges, values that appear in both A and B's ranges. — Prune
– Prune, Commented Mar 23, 2018 at 16:07
@LioraHaydont No that can't be. They are normalized already. First would be [[1, 2], [4, 6]] and the second would be just [[1, 6]]. — Ionică Bizău
– Ionică Bizău, Commented Mar 23, 2018 at 16:08

Ted Hopp · Accepted Answer · 2018-03-23 20:02:09Z

Since the input arrays are sorted, this should be fairly straightforward to work out. I'm assuming that the ranges in any one input array don't intersect one another (otherwise, "which are sorted" would be ambiguous). Consider one range from each array (defined by "current range" indexes a and b). There are several cases (each case other than "full overlap" has a mirror image where A and B are reversed):

No intersection:

A[a]: |------|
B[b]:          |---|

Because the arrays are sorted, A[a] cannot intersect anything in B, so it can be skipped (increment a).

Partial overlap (B[b] extends beyond A[a]):

A[a]: |-------|
B[b]:      |-------|

In this case, add the intersection to the output and then increment a because A[a] cannot intersect anything else in B.

Containment (possibly with coinciding ends):

A[a]: |------|
B[b]:   |--|

Again add the intersection to the output and this time increment b. Note that a further slight optimization is that if A[a] and B[b] end at the same value, then you can increment b as well, since B[b] also cannot intersect anything else in A. (The case of coinciding ends could have been lumped into the partial overlap case. This case could then have been called "strict containment".)

Full overlap:

A[a]: |------|
B[b]: |------|

Add the intersection to the output and increment both a and b (neither range can intersect anything else in the other array).

Continue iterating the above until either a or b runs off the end of the corresponding array and you're done.

It should be ~~trivial~~ straightforward to translate the above to code.

EDIT: To back up that last sentence (okay, it wasn't trivial), here's my version of the above in code. It's a little tedious because of all the cases, but each branch is quite straightforward.

const A = [[1, 3], [7, 9], [12, 18]];
const B = [[2, 3], [4, 5], [6, 8], [13, 14], [16, 17]];

const merged = [];

var i_a = 0,
    i_b = 0;

while (i_a < A.length && i_b < B.length) {
  const a = A[i_a];
  const b = B[i_b];

  if (a[0] < b[0]) {
    // a leads b
    if (a[1] >= b[1]) {
      // b contained in a
      merged.push([b[0], b[1]]);
      i_b++;
      if (a[1] === b[1]) {
        // a and b end together
        i_a++;
      }
    } else if (a[1] >= b[0]) {
      // overlap
      merged.push([b[0], a[1]]);
      i_a++;
    } else {
      // no overlap
      i_a++;
    }
  } else if (a[0] === b[0]) {
    // a and b start together
    if (a[1] > b[1]) {
      // b contained in a
      merged.push([a[0], b[1]]);
      i_b++;
    } else if (a[1] === b[1]) {
      // full overlap
      merged.push([a[0], a[1]]);
      i_a++;
      i_b++;
    } else /* a[1] < b[1] */ {
      // a contained in b
      merged.push([a[0], a[1]]);
      i_a++;
    }
  } else /* a[0] > b[0] */ {
    // b leads a
    if (b[1] >= a[1]) {
      // containment: a in b
      merged.push([a[0], b[1]]);
      i_a++;
      if (b[1] === a[1]) {
        // a and b end together
        i_b++;
      }
    } else if (b[1] >= a[0]) {
      // overlap
      merged.push([a[0], b[1]]);
      i_b++
    } else {
      // no overlap
      i_b++;
    }
  }
}
console.log(JSON.stringify(merged));

You asked for an optimal algorithm. I believe mine is very close to optimal. It runs in linear time with the number of ranges in the two arrays, since each iteration completes the processing of at least one range (and sometimes two). It requires constant memory plus the memory required to build the result.

I should note that unlike the answer by CertainPerformance (the only other answer posted here at the time I'm writing this) my code works for any kind of numeric range data, not just integers. (You might want to replace === with == in the above if you're mixing numbers and string representations of numbers). The algorithm by CertainPerformance flattens the ranges into arrays of consecutive integers that span the ranges. If that total number of integers is n, then his algorithm runs in O(n²) time and O(n) space. (So, for instance, if one of the ranges were [1, 50000], that would require memory for 50,000 numbers and time proportional to the square of that.)

Salman Arshad · Accepted Answer · 2018-03-24 14:05:44Z

The idea suggested by @Ted Hopp could be implemented in fewer lines of code as follows:

var A = [[1, 3], [7, 9], [12, 18]];
var B = [[2, 3], [4, 5], [6, 8], [13, 14], [16, 17]];

var result = [];
var ai = 0, alength = A.length, ax, ay;
var bi = 0, blength = B.length, bx, by;
while (ai < alength && bi < blength) {
  ax = A[ai][0];
  ay = A[ai][1];
  bx = B[bi][0];
  by = B[bi][1];
  if (ay < bx) {
    // a ends before b
    ai++;
  } else if (by < ax) {
    // b ends before a
    bi++;
  } else {
    // a overlaps b
    result.push([ax > bx ? ax : bx, ay < by ? ay : by]);
    // the smaller range is considered processed
    if (ay < by) {
      ai++;
    } else {
      bi++;
    }
  }
}
console.log(result);

Below is a comprehensive test with large arrays:

var A = [];
var B = [];
var R = [];
(function(rangeArray1, rangeArray2, bruteForceResult) {
  // create random, non-overlapping, sorted ranges
  var i, n, x, y;
  for (i = 0, n = 0; i < 1000; i++) {
    x = n += Math.floor(Math.random() * 100) + 1;
    y = n += Math.floor(Math.random() * 100);
    rangeArray1.push([x, y]);
  }
  for (i = 0, n = 0; i < 1000; i++) {
    x = n += Math.floor(Math.random() * 100) + 1;
    y = n += Math.floor(Math.random() * 100);
    rangeArray2.push([x, y]);
  }
  // calculate intersections using brute force
  rangeArray1.forEach(function(a) {
    rangeArray2.forEach(function(b) {
      if (b[1] >= a[0] && a[1] >= b[0]) {
        bruteForceResult.push([Math.max(a[0], b[0]), Math.min(a[1], b[1])]);
      }
    });
  });
})(A, B, R);

var result = [];
var ai = 0, alength = A.length, ax, ay;
var bi = 0, blength = B.length, bx, by;
while (ai < alength && bi < blength) {
  ax = A[ai][0];
  ay = A[ai][1];
  bx = B[bi][0];
  by = B[bi][1];
  if (ay < bx) {
    // a ends before b
    ai++;
  } else if (by < ax) {
    // b ends before a
    bi++;
  } else {
    // a overlaps b
    result.push([ax > bx ? ax : bx, ay < by ? ay : by]);
    // the smaller range is considered processed
    if (ay < by) {
      ai++;
    } else {
      bi++;
    }
  }
}
console.log(JSON.stringify(R) === JSON.stringify(result) ? "test passed" : "test failed");

I like this solution. It shows what a little creativity can do to simplify a bookkeeping problem.

CertainPerformance · Accepted Answer · 2018-03-23 16:42:07Z

2

Pretty straightforward, just a decent amount of code to write. Flatten a and b into individual elements instead of ranges, find their intersection, and turn it back into an array of ranges again.

const a = [[1, 3], [7, 9], [12, 18]];
const b = [[2, 3], [4,5], [6,8], [13, 14], [16, 17]];
const rangeToArr = ([start, end]) => Array.from({ length: end - start + 1 }, (_, i) => start + i);
const flat = inputArr => inputArr.reduce((arr, elm) => arr.concat(...elm), []);
const aRange = flat(a.map(rangeToArr));
const bRange = flat(b.map(rangeToArr));
const intersection = aRange.filter(num => bRange.includes(num));
console.log(intersection);


// Have the intersection of elements
// now we have to turn the intersection back into an array of ranges again:
const { partialIntersectionRange, thisRangeStarted, lastNum }
= intersection.reduce(({ partialIntersectionRange, thisRangeStarted, lastNum }, num) => {
  // Initial iteration only: populate with initial values
  if (typeof thisRangeStarted !== 'number') {
    return { partialIntersectionRange, thisRangeStarted: num, lastNum: num };
  }
  // If this element is a continuation of the range from the last element
  // then just increment lastNum:
  if (lastNum + 1 === num) {
    return { partialIntersectionRange, thisRangeStarted, lastNum: num };
  }
  // This element is not a continuation of the previous range
  // so make a range out of [thisRangeStarted, lastNum] and push it to the range array
  // (in case thisRangeStarted === lastNum, only push a single value)
  if (thisRangeStarted !== lastNum) partialIntersectionRange.push([thisRangeStarted, lastNum]);
  else partialIntersectionRange.push([thisRangeStarted]);
  return { partialIntersectionRange, thisRangeStarted: num, lastNum: num };
}, { partialIntersectionRange: [] });
if (thisRangeStarted !== lastNum) partialIntersectionRange.push([thisRangeStarted, lastNum]);
else partialIntersectionRange.push([thisRangeStarted]);

console.log(JSON.stringify(partialIntersectionRange));

The difficulty isn't the intersection logic, but getting it formatted the desired way.

edited Mar 23, 2018 at 16:42

answered Mar 23, 2018 at 16:31

CertainPerformance

373k55 gold badges354 silver badges359 bronze badges

2 Comments

Liora Haydont Over a year ago

Could you add somme comments to say what your code is doing? It's hard to read a block like this

Ted Hopp Over a year ago

This is going to be pretty inefficient if the ranges contain huge numbers (e.g., [5000030, 1470000250]). Also, because it relies on includes, this seems like an O(n^2) algorithm, where n is the total number of integers spanned by all the ranges, not the number of ranges.

Collectives™ on Stack Overflow

How to intersect two arrays of ranges?

3 Answers 3

Comments

1 Comment

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

1 Comment

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related