3

Is there a datastructure that will maintain a unique set of ranges, merging an contiguous or overlapping ranges that are added? I need to track which ranges have been processed, but this may occur in an arbitrary order. E.g.:

range_set = RangeSet() # doesn't exist that I know of, this is what I need help with

def process_data(start, end):
    global range_set
    range_set.add_range(start, end)
    # ...

process_data(0, 10)
process_data(20, 30)
process_data(5, 15)
process_data(50, 60)

print(range_set.missing_ranges())
# [[16,19], [31, 49]]

print(range_set.ranges())
# [[0,15], [20,30], [50, 60]]

Notice that overlapping or contiguous ranges get merged together. What is the best way to do this? I looked at using the bisect module, but its use didn't seem terribly clear.

3

7 Answers 7

1

Another approach is based on sympy.sets.

>>> import sympy as sym
>>> a = sym.Interval(1, 2, left_open=False, right_open=False)
>>> b = sym.Interval(3, 4, left_open=False, right_open=False)
>>> domain = sym.Interval(0, 10, left_open=False, right_open=False)
>>> missing = domain - a - b
>>> missing
[0, 1) U (2, 3) U (4, 10]
>>> 2 in missing
False
>>> missing.complement(domain)
[1, 2] U [3, 4]
Sign up to request clarification or add additional context in comments.

Comments

0

You could get some similar functionality with pythons built-in set data structure; supposing only integer values are valid for start and end.

>>> whole_domain = set(range(12))
>>> A = set(range(0,1))
>>> B = set(range(4,9))
>>> C = set(range(3,6))  # processed range(3,5) twice
>>> done = A | B | C
>>> print done
set([0, 3, 4, 5, 6, 7, 8])
>>> missing = whole_domain - done
>>> print missing
set([1, 2, 9, 10, 11])

This still lacks many 'range'-features but might be sufficient.

A simple query if a certain range was already processed could look like this:

>>> isprocessed = [foo in done for foo in set(range(2,6))]
>>> print isprocessed
[False, True, True, True]

1 Comment

This uses more memory than it should. Larger ranges will require more memory than smaller ranges. Two range sets containing two different ranges of different sizes should use the same amount of memory.
0

I've only lightly tested it, but it sounds like you're looking for something like this. You'll need to add the methods to get the ranges and missing ranges yourself, but it should be very straighforward as RangeSet.ranges is a list of Range objects maintained in sorted order. For a more pleasant interface you could write a convenience method that converted it to a list of 2-tuples, for example.

EDIT: I've just modified it to use less-than-or-equal comparisons for merging. Note, however, that this won't merge "adjacent" entries (e.g. it won't merge (1, 5) and (6, 10)). To do this you'd need to simply modify the condition in Range.check_merge().

import bisect


class Range(object):

    # Reduces memory usage, overkill unless you're using a lot of these.
    __slots__ = ["start", "end"]

    def __init__(self, start, end):
        """Initialise this range."""
        self.start = start
        self.end = end

    def __cmp__(self, other):
        """Sort ranges by their initial item."""
        return cmp(self.start, other.start)

    def check_merge(self, other):
        """Merge in specified range and return True iff it overlaps."""
        if other.start <= self.end and other.end >= self.start:
            self.start = min(other.start, self.start)
            self.end = max(other.end, self.end)
            return True
        return False


class RangeSet(object):

    def __init__(self):
        self.ranges = []

    def add_range(self, start, end):
        """Merge or insert the specified range as appropriate."""
        new_range = Range(start, end)
        offset = bisect.bisect_left(self.ranges, new_range)
        # Check if we can merge backwards.
        if offset > 0 and self.ranges[offset - 1].check_merge(new_range):
            new_range = self.ranges[offset - 1]
            offset -= 1
        else:
            self.ranges.insert(offset, new_range)
        # Scan for forward merges.
        check_offset = offset + 1
        while (check_offset < len(self.ranges) and
                new_range.check_merge(self.ranges[offset+1])):
            check_offset += 1
        # Remove any entries that we've just merged.
        if check_offset - offset > 1:
            self.ranges[offset+1:check_offset] = []

Comments

0

You have hit on a good solution in your example use case. Rather than try to maintain a set of the ranges that have been used, keep track of the ranges that haven't been used. This makes the problem pretty easy.

class RangeSet:
    def __init__(self, min, max):
        self.__gaps = [(min, max)]
        self.min = min
        self.max = max

    def add(self, lo, hi):
        new_gaps = []
        for g in self.__gaps:
            for ng in (g[0],min(g[1],lo)),(max(g[0],hi),g[1]):
                if ng[1] > ng[0]: new_gaps.append(ng)
        self.__gaps = new_gaps

    def missing_ranges(self):
        return self.__gaps

    def ranges(self):
        i = iter([self.min] + [x for y in self.__gaps for x in y] + [self.max])
        return [(x,y) for x,y in zip(i,i) if y > x]

The magic is in the add method, which checks each existing gap to see whether it is affected by the new range, and adjusts the list of gaps accordingly.

Note that the behaviour of the tuples used for ranges here is the same as Python's range objects, i.e. they are inclusive of the start value and exclusive of the stop value. This class will not behave in exactly the way you described in your question, where your ranges seem to be inclusive of both.

Comments

0

Have a look at portion (https://pypi.org/project/portion/). I'm the maintainer of this library, and it supports disjuction of continuous intervals out of the box. It automatically simplifies adjacent and overlapping intervals.

Consider the intervals provided in your example:

>>> import portion as P
>>> i = P.closed(0, 10) | P.closed(20, 30) | P.closed(5, 15) | P.closed(50, 60)

>>> # get "used ranges"
>>> i
[0,15] | [20,30] | [50,60]

>>> # get "missing ranges"
>>> i.enclosure - i
(15,20) | (30,50)

Comments

0

Similar to DavidT's answer – also based on sympy's sets, but using a list of any length and addition (union) in a single operation:

import sympy

intervals = [[1,4], [6,10], [3,5], [7,8]] # pairs of left,right
print(intervals)

symintervals = [sympy.Interval(i[0],i[1], left_open=False, right_open=False) for i in intervals]
print(symintervals)

merged = sympy.Union(*symintervals) # one operation; adding to an union one by one is much slower for a large number of intervals
print(merged)

for i in merged.args: # assumes that the "merged" result is an union, not a single interval
    print(i.left, i.right) # getting bounds of merged intervals

Comments

0

Here's my solution:

def flatten(collection):
subset = set()
for elem in collection:
    to_add = elem
    to_remove = set()
    for s in subset:
        if s[0] <= to_add[0] <= s[1] or s[0] <= to_add[1] <= s[1] or (s[0] > to_add[0] and s[1] < to_add[1]):
            to_remove.add(s)
            to_add = (min(to_add[0], s[0]), max(to_add[1], s[1]))
    subset -= to_remove
    subset.add(to_add)
return subset


range_set = {(-12, 4), (3, 20), (21, 25), (25, 30), (-13, -11), (5, 10), (-13, 20)}
print(flatten(range_set))
# {(21, 30), (-13, 20)}

1 Comment

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.