python - unique set of ranges, merging when needed

Question

Is there a datastructure that will maintain a unique set of ranges, merging an contiguous or overlapping ranges that are added? I need to track which ranges have been processed, but this may occur in an arbitrary order. E.g.:

range_set = RangeSet() # doesn't exist that I know of, this is what I need help with

def process_data(start, end):
    global range_set
    range_set.add_range(start, end)
    # ...

process_data(0, 10)
process_data(20, 30)
process_data(5, 15)
process_data(50, 60)

print(range_set.missing_ranges())
# [[16,19], [31, 49]]

print(range_set.ranges())
# [[0,15], [20,30], [50, 60]]

Notice that overlapping or contiguous ranges get merged together. What is the best way to do this? I looked at using the bisect module, but its use didn't seem terribly clear.

A data structure named Interval Tree can help you with this - it allows range-based queries and insertion. There are multiple packages as far as I know, e.g. pypi.python.org/pypi/intervaltree/2.0.4 — Maciej Gol
– Maciej Gol, Commented Jul 21, 2015 at 13:26
If the Interval Tree is too complex, writing a class probably wouldn't be too difficult. — Connor S
– Connor S, Commented Jul 21, 2015 at 13:29
You could use SortedSet with a special OverlappingIntervalsUpdator for some algorithms e.g., Python - Removing overlapping lists. Somewhat related: to merge discrete sets, you could use "disjoint set" (aka UnionFind structure) or connected components (in a graph). — jfs
– jfs, Commented Jul 21, 2015 at 14:59

DavidT · Accepted Answer · 2015-08-26 13:25:32Z

1

Another approach is based on sympy.sets.

>>> import sympy as sym
>>> a = sym.Interval(1, 2, left_open=False, right_open=False)
>>> b = sym.Interval(3, 4, left_open=False, right_open=False)
>>> domain = sym.Interval(0, 10, left_open=False, right_open=False)
>>> missing = domain - a - b
>>> missing
[0, 1) U (2, 3) U (4, 10]
>>> 2 in missing
False
>>> missing.complement(domain)
[1, 2] U [3, 4]

edited Aug 26, 2015 at 13:25

answered Aug 26, 2015 at 12:57

DavidT

7910 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

DavidT · Accepted Answer · 2015-07-21 14:06:28Z

0

You could get some similar functionality with pythons built-in set data structure; supposing only integer values are valid for start and end.

>>> whole_domain = set(range(12))
>>> A = set(range(0,1))
>>> B = set(range(4,9))
>>> C = set(range(3,6))  # processed range(3,5) twice
>>> done = A | B | C
>>> print done
set([0, 3, 4, 5, 6, 7, 8])
>>> missing = whole_domain - done
>>> print missing
set([1, 2, 9, 10, 11])

This still lacks many 'range'-features but might be sufficient.

A simple query if a certain range was already processed could look like this:

>>> isprocessed = [foo in done for foo in set(range(2,6))]
>>> print isprocessed
[False, True, True, True]

edited Jul 21, 2015 at 14:06

answered Jul 21, 2015 at 13:58

DavidT

7910 bronze badges

1 Comment

Noctis Skytower Over a year ago

This uses more memory than it should. Larger ranges will require more memory than smaller ranges. Two range sets containing two different ranges of different sizes should use the same amount of memory.

Cartroo · Accepted Answer · 2015-07-21 14:40:13Z

I've only lightly tested it, but it sounds like you're looking for something like this. You'll need to add the methods to get the ranges and missing ranges yourself, but it should be very straighforward as RangeSet.ranges is a list of Range objects maintained in sorted order. For a more pleasant interface you could write a convenience method that converted it to a list of 2-tuples, for example.

EDIT: I've just modified it to use less-than-or-equal comparisons for merging. Note, however, that this won't merge "adjacent" entries (e.g. it won't merge (1, 5) and (6, 10)). To do this you'd need to simply modify the condition in Range.check_merge().

import bisect


class Range(object):

    # Reduces memory usage, overkill unless you're using a lot of these.
    __slots__ = ["start", "end"]

    def __init__(self, start, end):
        """Initialise this range."""
        self.start = start
        self.end = end

    def __cmp__(self, other):
        """Sort ranges by their initial item."""
        return cmp(self.start, other.start)

    def check_merge(self, other):
        """Merge in specified range and return True iff it overlaps."""
        if other.start <= self.end and other.end >= self.start:
            self.start = min(other.start, self.start)
            self.end = max(other.end, self.end)
            return True
        return False


class RangeSet(object):

    def __init__(self):
        self.ranges = []

    def add_range(self, start, end):
        """Merge or insert the specified range as appropriate."""
        new_range = Range(start, end)
        offset = bisect.bisect_left(self.ranges, new_range)
        # Check if we can merge backwards.
        if offset > 0 and self.ranges[offset - 1].check_merge(new_range):
            new_range = self.ranges[offset - 1]
            offset -= 1
        else:
            self.ranges.insert(offset, new_range)
        # Scan for forward merges.
        check_offset = offset + 1
        while (check_offset < len(self.ranges) and
                new_range.check_merge(self.ranges[offset+1])):
            check_offset += 1
        # Remove any entries that we've just merged.
        if check_offset - offset > 1:
            self.ranges[offset+1:check_offset] = []

Sam · Accepted Answer · 2015-07-21 15:10:55Z

You have hit on a good solution in your example use case. Rather than try to maintain a set of the ranges that have been used, keep track of the ranges that haven't been used. This makes the problem pretty easy.

class RangeSet:
    def __init__(self, min, max):
        self.__gaps = [(min, max)]
        self.min = min
        self.max = max

    def add(self, lo, hi):
        new_gaps = []
        for g in self.__gaps:
            for ng in (g[0],min(g[1],lo)),(max(g[0],hi),g[1]):
                if ng[1] > ng[0]: new_gaps.append(ng)
        self.__gaps = new_gaps

    def missing_ranges(self):
        return self.__gaps

    def ranges(self):
        i = iter([self.min] + [x for y in self.__gaps for x in y] + [self.max])
        return [(x,y) for x,y in zip(i,i) if y > x]

The magic is in the add method, which checks each existing gap to see whether it is affected by the new range, and adjusts the list of gaps accordingly.

Note that the behaviour of the tuples used for ranges here is the same as Python's range objects, i.e. they are inclusive of the start value and exclusive of the stop value. This class will not behave in exactly the way you described in your question, where your ranges seem to be inclusive of both.

Guybrush · Accepted Answer · 2020-03-15 09:29:58Z

0

Have a look at portion (https://pypi.org/project/portion/). I'm the maintainer of this library, and it supports disjuction of continuous intervals out of the box. It automatically simplifies adjacent and overlapping intervals.

Consider the intervals provided in your example:

>>> import portion as P
>>> i = P.closed(0, 10) | P.closed(20, 30) | P.closed(5, 15) | P.closed(50, 60)

>>> # get "used ranges"
>>> i
[0,15] | [20,30] | [50,60]

>>> # get "missing ranges"
>>> i.enclosure - i
(15,20) | (30,50)

answered Mar 15, 2020 at 9:29

Guybrush

2,8421 gold badge13 silver badges20 bronze badges

Comments

dolphin · Accepted Answer · 2020-06-06 12:05:46Z

0

Similar to DavidT's answer – also based on sympy's sets, but using a list of any length and addition (union) in a single operation:

import sympy

intervals = [[1,4], [6,10], [3,5], [7,8]] # pairs of left,right
print(intervals)

symintervals = [sympy.Interval(i[0],i[1], left_open=False, right_open=False) for i in intervals]
print(symintervals)

merged = sympy.Union(*symintervals) # one operation; adding to an union one by one is much slower for a large number of intervals
print(merged)

for i in merged.args: # assumes that the "merged" result is an union, not a single interval
    print(i.left, i.right) # getting bounds of merged intervals

answered Jun 6, 2020 at 12:05

dolphin

1,26313 silver badges28 bronze badges

Comments

user13024860 · Accepted Answer · 2022-12-15 17:21:08Z

0

Here's my solution:

def flatten(collection):
subset = set()
for elem in collection:
    to_add = elem
    to_remove = set()
    for s in subset:
        if s[0] <= to_add[0] <= s[1] or s[0] <= to_add[1] <= s[1] or (s[0] > to_add[0] and s[1] < to_add[1]):
            to_remove.add(s)
            to_add = (min(to_add[0], s[0]), max(to_add[1], s[1]))
    subset -= to_remove
    subset.add(to_add)
return subset


range_set = {(-12, 4), (3, 20), (21, 25), (25, 30), (-13, -11), (5, 10), (-13, 20)}
print(flatten(range_set))
# {(21, 30), (-13, 20)}

answered Dec 15, 2022 at 17:21

user13024860

1

1 Comment

NameVergessen Over a year ago

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.

Collectives™ on Stack Overflow

python - unique set of ranges, merging when needed

7 Answers 7

Comments

1 Comment

Comments

Comments

Comments

Comments

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

7 Answers 7

Comments

1 Comment

Comments

Comments

Comments

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related