I'm looking for a solution to the following problem:

Let's say I have an array with shape (4, 4):

[5. 4. 5. 4.]
[2. 3. 5. 5.]
[2. 1. 5. 1.]
[1. 3. 1. 3.]

Within this array there is one column in which the value "5" appears 3 times in a row. That is, they are not scattered across the column, as exemplified below.

[5.] # This
[1.] # Should
[5.] # Not
[5.] # Count

Now let's say I have a bigger array with shape (M,N) and various integer values in the same range of 1-5. How would I go about counting the maximum number of identical values appearing in a row per column? Furthermore, is it possible to obtain the indices these values would appear at? The expected output of the above example would be

Found 3 in a row of number 5 in column 2
(0,2), (1,2), (2,2)

I assume that the implementation would be similar if the search should concern rows. If not I'd love to know how this is done as well.

3 Answers

Approach #1

Here's one approach -

import numpy as np

def find_longest_island_indices(a, values):
    b = np.pad(a, ((1,1),(0,0)), 'constant')
    shp = np.array(b.shape)[::-1] - [0,1]
    maxlens = []
    final_out = []
    for v in values:
        m = b==v        
        idx = np.flatnonzero((m[:-1] != m[1:]).T)
        s0,s1 = idx[::2], idx[1::2]        
        l = s1-s0
        maxidx = l.argmax()
        longest_island_flatidx = np.r_[s0[maxidx]:s1[maxidx]]            
        r,c = np.unravel_index(longest_island_flatidx, shp)
        final_out.append(np.c_[c,r])
        maxlens.append(l[maxidx])
    return maxlens, final_out

Sample run -

In [169]: a
Out[169]: 
array([[5, 4, 5, 4],
       [2, 3, 5, 5],
       [2, 1, 5, 1],
       [1, 3, 1, 3]])

In [173]: maxlens
Out[173]: [1, 2, 1, 1, 3]

In [174]: out
Out[174]: 
[array([[3, 0]]), array([[1, 0],
        [2, 0]]), array([[1, 1]]), array([[0, 1]]), array([[0, 2],
        [1, 2],
        [2, 2]])]

# With "pretty" printing
In [171]: maxlens, out = find_longest_island_indices(a, [1,2,3,4,5])
     ...: for  l,o,i in zip(maxlens,out,[1,2,3,4,5]):
     ...:     print("For "+str(i)+" : L= "+str(l)+", Idx = "+str(o.tolist()))
For 1 : L= 1, Idx = [[3, 0]]
For 2 : L= 2, Idx = [[1, 0], [2, 0]]
For 3 : L= 1, Idx = [[1, 1]]
For 4 : L= 1, Idx = [[0, 1]]
For 5 : L= 3, Idx = [[0, 2], [1, 2], [2, 2]]
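
To see why the padding and the neighbour comparison work, here is a minimal one-column sketch of the same idea. The helper name `longest_run_1d` is made up for illustration. It assumes the searched value is never 0 (the padding value), and it returns a length of 0 when the value is absent, a case in which `argmax` on an empty array would otherwise raise.

```python
import numpy as np

def longest_run_1d(col, v):
    """Return (length, start) of the longest run of v in a 1-D array.

    Hypothetical helper for illustration; assumes v != 0 because the
    column is padded with zeros on both ends.
    """
    m = np.pad(np.asarray(col), 1, 'constant') == v  # mask is False at both ends
    idx = np.flatnonzero(m[:-1] != m[1:])             # positions where the mask flips
    starts, stops = idx[::2], idx[1::2]               # flips alternate: start, stop, ...
    if starts.size == 0:                              # v does not occur at all
        return 0, -1
    lengths = stops - starts
    k = lengths.argmax()                              # the first longest run wins ties
    return int(lengths[k]), int(starts[k])

print(longest_run_1d([5, 5, 5, 1], 5))  # (3, 0)
print(longest_run_1d([5, 1, 5, 5], 5))  # (2, 2)
```

Because the padded mask starts and ends with False, the flips always come in start/stop pairs, which is what makes the `[::2]` and `[1::2]` slicing safe.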

Approach #2

With a bit of modification and outputting the start and end indices for the max-length island, here's one -

def find_longest_island_indices_v2(a, values):
    b = np.pad(a.T, ((0,0),(1,1)), 'constant')
    shp = b.shape
    out = []
    for v in values:
        m = b==v        
        idx = np.flatnonzero(m.flat[:-1] != m.flat[1:])
        s0,s1 = idx[::2], idx[1::2]        
        l = s1-s0
        maxidx = l.argmax()
        start_index = np.unravel_index(s0[maxidx], shp)[::-1]
        end_index = np.unravel_index(s1[maxidx]-1, shp)[::-1]
        maxlen = l[maxidx]
        out.append([v,maxlen, start_index, end_index])
    return out  

Sample run -

In [251]: a
Out[251]: 
array([[5, 4, 5, 4],
       [2, 3, 5, 5],
       [2, 1, 5, 1],
       [1, 3, 1, 3]])

In [252]: out = find_longest_island_indices_v2(a, [1,2,3,4,5])

In [255]: out
Out[255]: 
[[1, 1, (3, 0), (3, 0)],
 [2, 2, (1, 0), (2, 0)],
 [3, 1, (1, 1), (1, 1)],
 [4, 1, (0, 1), (0, 1)],
 [5, 3, (0, 2), (2, 2)]]

# With some pandas styled printing 
In [253]: import pandas as pd

In [254]: pd.DataFrame(out, columns=['Val','MaxLen','StartIdx','EndIdx'])
Out[254]: 
   Val  MaxLen StartIdx  EndIdx
0    1       1   (3, 0)  (3, 0)
1    2       2   (1, 0)  (2, 0)
2    3       1   (1, 1)  (1, 1)
3    4       1   (0, 1)  (0, 1)
4    5       3   (0, 2)  (2, 2)

1 Comment

Thanks! Works exactly as intended.

If we store the maximum length of a run of identical values in a column in a variable, then we can iterate through looking for runs of greater length.

If the following requires more explanation, just say!

import numpy as np

a = np.array([[5,4,5,4],[2,3,5,5],[2,1,5,1],[1,3,1,3]])
rows, cols = a.shape
max_length = 0
for ci in range(cols):
    for ri in range(rows):
        if ri > 0 and a[ri,ci] == a[ri-1,ci]:  # run continues
            length += 1
        else:                                  # a new run starts here
            start_pos = (ri, ci)
            length = 1
        if length > max_length:                # checked at every cell, so runs
            max_length = length                # reaching the end of a column count
            max_pos = start_pos

max_row, max_col = max_pos
print('Found {} in a row of number {} in column {}'.format(max_length, a[max_pos], max_col))
for i in range(max_length):
    print((max_row+i, max_col))

Output:

Found 3 in a row of number 5 in column 2
(0, 2)
(1, 2)
(2, 2)

Note that if you would like the output of the tuples to be in the exact format you stated, then you can use a generator-expression with str.join:

print(', '.join('({},{})'.format(max_row+i, max_col) for i in range(max_length)))

Another approach is to use the itertools.groupby as suggested by @user, a possible implementation is the following:

import numpy as np
from itertools import groupby


def runs(column):
    max_run_length, start, indices, max_value = -1, 0, 0, 0
    for val, run in groupby(column):
        run_length = sum(1 for _ in run)
        if run_length > max_run_length:
            max_run_length, start, max_value = run_length, indices, val
        indices += run_length

    return max_value, max_run_length, start

For a given column (or row), the function above returns the value of the longest run, its length and its start index. With these values you can compute your expected output. groupby does all the heavy lifting; for the array [5., 5., 5., 1.],

[(val, sum(1 for _ in run)) for val, run in groupby([5., 5., 5., 1.])]

The previous line outputs [(5.0, 3), (1.0, 1)]. The loop keeps track of the longest run seen so far: its value, its length and its starting index. To apply the function to every column you can use numpy.apply_along_axis:

data = np.array([[5., 4., 5., 4.],
                 [2., 3., 5., 5.],
                 [2., 1., 5., 1.],
                 [1., 3., 1., 3.]])

result = [tuple(row) for row in np.apply_along_axis(runs, 0, data).T]
print(result)

Output

[(2.0, 2.0, 1.0), (4.0, 1.0, 0.0), (5.0, 3.0, 0.0), (4.0, 1.0, 0.0)]

In the output above, the third tuple corresponds to the third column (index 2): the value of its longest consecutive run is 5, the run has length 3, and it starts at index 0. To search rows instead of columns, change the axis to 1 and drop the T, like this:

result = [tuple(row) for row in np.apply_along_axis(runs, 1, data)]

Output

[(5.0, 1.0, 0.0), (5.0, 2.0, 2.0), (2.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
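
As a rough sketch of the last step, the triples returned by `runs` can be turned into the output format asked for in the question. The function name `report_longest_run` is an assumption made for this example. It assumes integer-valued data, and ties are resolved in favour of the first column that reaches the maximum length.

```python
import numpy as np
from itertools import groupby

def runs(column):
    # Same logic as above: value, length and start of the longest run.
    best_len, best_start, best_val, pos = -1, 0, 0, 0
    for val, run in groupby(column):
        n = sum(1 for _ in run)
        if n > best_len:
            best_len, best_start, best_val = n, pos, val
        pos += n
    return best_val, best_len, best_start

def report_longest_run(data):
    """Hypothetical helper: describe the longest vertical run in a 2-D array."""
    # One (value, length, start) triple per column.
    per_col = [runs(data[:, c]) for c in range(data.shape[1])]
    # max() returns the first column with the greatest run length.
    col = max(range(len(per_col)), key=lambda c: per_col[c][1])
    val, length, start = per_col[col]
    indices = [(start + i, col) for i in range(length)]
    # int(val) assumes the array holds integer-valued numbers.
    header = 'Found {} in a row of number {} in column {}'.format(length, int(val), col)
    return header, indices

data = np.array([[5., 4., 5., 4.],
                 [2., 3., 5., 5.],
                 [2., 1., 5., 1.],
                 [1., 3., 1., 3.]])

header, indices = report_longest_run(data)
print(header)                                                  # Found 3 in a row of number 5 in column 2
print(', '.join('({},{})'.format(r, c) for r, c in indices))  # (0,2), (1,2), (2,2)
```

Calling `runs` directly on each column, instead of going through `np.apply_along_axis`, keeps the start index and length as plain Python integers, so they can be used for indexing without converting back from floats.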
