0

Say I want to group together ints which are separated by less than a certain threshold. My concrete use case is identifying the largest chunks of uncovered code in test coverage results, e.g.:

groupruns('53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311, 317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787, 792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148, 1158', 3)
#=> [[53, 55, 57, 59], [83], [200], [205], [211], [217, 219], [306], [311], [317], [323, 325], [367], [631], [636], [645], [658], [686], [692], [787], [792], [801], [870], [875], [884], [947], [993], [1134], [1139], [1148], [1158]]

3 Answers 3

1

You could use Raymond Hettinger's cluster function:

def cluster(data, maxgap, key=None):
    """Arrange data into groups where successive elements
       differ by no more than *maxgap*

        >>> cluster([1, 6, 9, 100, 102, 105, 109, 134, 139], maxgap=10)
        [[1, 6, 9], [100, 102, 105, 109], [134, 139]]

        >>> cluster([1, 6, 9, 99, 100, 102, 105, 134, 139, 141], maxgap=10)
        [[1, 6, 9], [99, 100, 102, 105], [134, 139, 141]]

    http://stackoverflow.com/a/14783998/190597 (Raymond Hettinger)
    """
    data.sort()
    groups = [[data[0]]]
    for item in data[1:]:
        if key:
            val = key(item, groups[-1])
        else:
            val = abs(item - groups[-1][-1])
        if val <= maxgap:
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups

data = [53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311, 317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787, 792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148, 1158]
print(cluster(data, maxgap=3))

yields

[[53, 55, 57, 59], [83], [200], [205], [211], [217, 219], [306], [311], [317], [323, 325], [367], [631], [636], [645], [658], [686], [692], [787], [792], [801], [870], [875], [884], [947], [993], [1134], [1139], [1148], [1158]]
Sign up to request clarification or add additional context in comments.

Comments

0

itertools.groupby can help us here. The only difficulty is that groupby groups by a "key" which needs to be calculated for each item, such that each group is made of consecutive items with the same key. This means that our keyfunc object needs to save state to perform this task:

class runner(object):
     def __init__(self, threshold=1):
         self.threshold = threshold
         self.last = None
         self.key = None
     def __call__(self,item):
         if self.last is None:
             self.last = item
             self.key = item
             return item
         if item - self.last <= self.threshold:
             self.last = item
             return self.key
         else:
             self.last = item
             self.key = item
             return item

The basic idea is that if we're within the threshold of the last item, we return the current key, which is the first item of the current run.

Let's see that in action:

[list(g) for k,g in itertools.groupby((int(s) for s in '53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311, 317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787, 792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148, 1158'.split(',')), runner(3))]
#=> [[53, 55, 57, 59], [83], [200], [205], [211], [217, 219], [306], [311], [317], [323, 325], [367], [631], [636], [645], [658], [686], [692], [787], [792], [801], [870], [875], [884], [947], [993], [1134], [1139], [1148], [1158]]

Comments

0

How about this without using any module:

NB: assuming they are already sorted.

#!/usr/bin/python

group = (53, 55, 57, 59, 83, 200, 205, 211, 217, 219, 306, 311,
         317, 323, 325, 367, 631, 636, 645, 658, 686, 692, 787,
         792, 801, 870, 875, 884, 947, 993, 1134, 1139, 1148,
         1158)

def group_runs(group, step):
    mark = [0]
    diff = map(lambda x: (x[1] - x[0]), zip(group[:],group[1:]))
    [mark.append(i+1) for i,j in enumerate(diff) if j > step]
    return [list(group[x[0]:x[1]]) for x in zip(mark[::], mark[1::])]

print group_runs(group, 3)

Output:

 [[53, 55, 57, 59], [83], [200], [205], [211], [217, 219], [306], [311], [317], [323,
   325], [367], [631], [636], [645], [658], [686], [692], [787], [792], [801], [870],
  [875], [884], [947], [993], [1134], [1139], [1148]]

1 Comment

This is not easily read. Readability counts. You might like to simplify AND explain this code. Also side effects in list comprehensions are discouraged unless there's a specific reason for it.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.