How to find the groups of consecutive elements in a NumPy array

Question

I have to cluster the consecutive elements from a NumPy array. Considering the following example

a = [ 0, 47, 48, 49, 50, 97, 98, 99]

The output should be a list of tuples as follows

[(0), (47, 48, 49, 50), (97, 98, 99)]

Here the difference is just one between the elements. It will be great if the difference can also be specified as a limit or a hardcoded number.

I found this answer having EXACTELY the same problem... Small world! :o) — heltonbiker
– heltonbiker, Commented Aug 9, 2012 at 19:07
Possible duplicate of Identify groups of continuous numbers in a list — Sede
– Sede, Commented Jun 29, 2016 at 12:14

unutbu · Accepted Answer · 2015-03-08 23:02:46Z

242

def consecutive(data, stepsize=1):
    return np.split(data, np.where(np.diff(data) != stepsize)[0]+1)

a = np.array([0, 47, 48, 49, 50, 97, 98, 99])
consecutive(a)

yields

[array([0]), array([47, 48, 49, 50]), array([97, 98, 99])]

edited Mar 8, 2015 at 23:02

answered Sep 8, 2011 at 19:21

unutbu

886k197 gold badges1.9k silver badges1.7k bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

z0r Over a year ago

And for finding runs of identical strings: partitions = np.where(a[1:] != a[:-1])[0] + 1 (np.diff doesn't work for strings)

Ben Over a year ago

Thank you! And if we use np.split(np.r_[:len(data)], np.where(np.diff(data) != stepsize)[0]+1) we get contiguous index lists, so if data is one column of a big table, we could use that result to index the same groups of rows.

Gabriel Lucas Over a year ago

@unutbu May I know what has to be changed in this function if I want to select only the groups with at least 5 consecutive elements?

dkamins · Accepted Answer · 2020-06-29 03:24:47Z

25

Here's a lil func that might help:

def group_consecutives(vals, step=1):
    """Return list of consecutive lists of numbers from vals (number list)."""
    run = []
    result = [run]
    expect = None
    for v in vals:
        if (v == expect) or (expect is None):
            run.append(v)
        else:
            run = [v]
            result.append(run)
        expect = v + step
    return result

>>> group_consecutives(a)
[[0], [47, 48, 49, 50], [97, 98, 99]]
>>> group_consecutives(a, step=47)
[[0, 47], [48], [49], [50, 97], [98], [99]]

P.S. This is pure Python. For a NumPy solution, see unutbu's answer.

edited Jun 29, 2020 at 3:24

answered Sep 8, 2011 at 18:56

dkamins

22.1k7 gold badges58 silver badges62 bronze badges

3 Comments

dkamins Over a year ago

P.S. If you want tuples instead of lists, you can do tuple(map(tuple, group_consecutives(a)))

marscher Over a year ago

This is not a NumPy solution!

Ben Over a year ago

This should be much slower than the answer using np.split.

Paul · Accepted Answer · 2011-09-08 18:44:52Z

13

(a[1:]-a[:-1])==1 will produce a boolean array where False indicates breaks in the runs. You can also use the built-in numpy.grad.

answered Sep 8, 2011 at 18:44

Paul

43.9k17 gold badges112 silver badges126 bronze badges

2 Comments

Luke Skywalker Over a year ago

I don't understand this answer although it's the only one that looks "functional" (as in functional language). What you are doing here is apply the minus operator to list. I don't see how could that work.

Paul Over a year ago

@LukeSkywalker It's not functional. a in this case is a numpy array, not a list and the minus operator does element-wise subtraction.

Pablo Rosales · Accepted Answer · 2020-11-19 21:46:52Z

6

Tested for one dimensional arrays

Get where diff isn't one

diffs = numpy.diff(array) != 1

Get the indexes of diffs, grab the first dimension and add one to all because diff compares with the previous index

indexes = numpy.nonzero(diffs)[0] + 1

Split with the given indexes

groups = numpy.split(array, indexes)

edited Nov 19, 2020 at 21:46

answered Nov 17, 2020 at 8:22

Pablo Rosales

1,82419 silver badges13 bronze badges

Comments

fabmilo · Accepted Answer · 2011-09-08 19:14:42Z

5

this is what I came up so far: not sure is 100% correct

import numpy as np
a = np.array([ 0, 47, 48, 49, 50, 97, 98, 99])
print np.split(a, np.cumsum( np.where(a[1:] - a[:-1] > 1) )+1)

returns:

>>>[array([0]), array([47, 48, 49, 50]), array([97, 98, 99])]

answered Sep 8, 2011 at 19:14

fabmilo

48.5k16 gold badges108 silver badges123 bronze badges

1 Comment

Back2Basics Over a year ago

Counter example: a = np.array([ 0, 47, 48, 49, 50, 97, 98, 99, 101, 102, 103, 140, 141]) print(np.split(a, np.cumsum( np.where(a[1:] - a[:-1] > 1) )+1)) produces [array([0]), array([47, 48, 49, 50]), array([ 97, 98, 99, 101, 102, 103, 140]), array([141]), array([], dtype=int64)]

user7864386 · Accepted Answer · 2022-02-22 00:06:22Z

It turns out that instead of np.split, list comprehension is more performative. So the below function (almost like @unutbu's consecutive function except it uses a list comprehension to split the array) is much faster:

def consecutive_w_list_comprehension(arr, stepsize=1):
    idx = np.r_[0, np.where(np.diff(arr) != stepsize)[0]+1, len(arr)]
    return [arr[i:j] for i,j in zip(idx, idx[1:])]

For example, for an array of length 100_000, consecutive_w_list_comprehension is over 4x faster:

arr = np.sort(np.random.choice(range(150000), size=100000, replace=False))

%timeit -n 100 consecutive(arr)
96.1 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit -n 100 consecutive_w_list_comprehension(arr)
23.2 ms ± 858 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In fact, this relationship holds up no matter the size of the array. The plot below shows the runtime difference between the answers on here.

Code used to produce the plot above:

import perfplot
import numpy as np

def consecutive(data, stepsize=1):
    return np.split(data, np.where(np.diff(data) != stepsize)[0]+1)

def consecutive_w_list_comprehension(arr, stepsize=1):
    idx = np.r_[0, np.where(np.diff(arr) != stepsize)[0]+1, len(arr)]
    return [arr[i:j] for i,j in zip(idx, idx[1:])]

def group_consecutives(vals, step=1):
    run = []
    result = [run]
    expect = None
    for v in vals:
        if (v == expect) or (expect is None):
            run.append(v)
        else:
            run = [v]
            result.append(run)
        expect = v + step
    return result


def JozeWs(array):
    diffs = np.diff(array) != 1
    indexes = np.nonzero(diffs)[0] + 1
    groups = np.split(array, indexes)
    return groups

perfplot.show(
    setup = lambda n: np.sort(np.random.choice(range(2*n), size=n, replace=False)),
    kernels = [consecutive, consecutive_w_list_comprehension, group_consecutives, JozeWs],
    labels = ['consecutive', 'consecutive_w_list_comprehension', 'group_consecutives', 'JozeWs'],
    n_range = [2 ** k for k in range(5, 22)],
    equality_check = lambda *lst: all((x==y).all() for x,y in zip(*lst)),
    xlabel = '~len(arr)'
)

lifeisstillgood · Accepted Answer · 2011-09-08 18:28:01Z

-4

This sounds a little like homework, so if you dont mind I will suggest an approach

You can iterate over a list using

for i in range(len(a)):
    print a[i]

You could test the next element in the list meets some criteria like follows

if a[i] == a[i] + 1:
    print "it must be a consecutive run"

And you can store results seperately in

results = []

Beware - there is an index out of range error hidden in the above you will need to deal with

answered Sep 8, 2011 at 18:28

lifeisstillgood

3,4792 gold badges23 silver badges22 bronze badges

1 Comment

Paul Over a year ago

Please don't suggest using python iterators over numpy arrays when more obvious solutions exist. It defeats the purpose of using numpy. (usually.) If the OP did not care about performance, they probably would have used a python list.

Collectives™ on Stack Overflow

How to find the groups of consecutive elements in a NumPy array

7 Answers 7

3 Comments

3 Comments

2 Comments

Comments

1 Comment

Comments

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

7 Answers 7

3 Comments

3 Comments

2 Comments

Comments

1 Comment

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related