1

I was wondering what the simplest method for doing the following is:

Suppose we have the following 2d arrays:

>>> a = np.array([['z', 'z', 'z', 'f', 'z','f', 'f'], ['z', 'z', 'z', 'f', 'z','f', 'f']])

array([['z', 'z', 'z', 'f', 'z', 'f', 'f'],
       ['z', 'z', 'z', 'f', 'z', 'f', 'f']],
      dtype='<U1')



>>> b = np.array(range(0,14)).reshape(2, -1)


array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13]])


>>> idxs = list(zip(*np.where(a == 'f')))

[(0, 3), (0, 5), (0, 6), (1, 3), (1, 5), (1, 6)]


>>> [b[x] for x in idxs]

[3, 5, 6, 10, 12, 13]

However, I would like to keep the original row structure (the first index), i.e.:

[[3, 5, 6], [7, 11]]

Is there a way to keep this structure easily?

3
  • That's a mix of length 3 and length 2 lists; it can't be a 2d array. Commented Aug 19, 2017 at 2:05
  • @hpaulj yes it would end up being a list of lists, it can't be a numpy array at the end Commented Aug 19, 2017 at 2:08
  • @Alexander I fixed the small errors Commented Aug 19, 2017 at 2:11

4 Answers

2

This is a more complicated, but pure NumPy, solution:

  1. Get the indices (in a flattened version of a) where it's an 'f'.
  2. Get the indices where a new row begins
  3. Find where the row starts from step 2 fall within the sorted indices from step 1.
  4. Split the array at these indices.

The code would look like this:

>>> indices = np.flatnonzero(a.ravel() == 'f')
>>> rows = np.arange(1, a.shape[0])*a.shape[1]
>>> np.split(b.ravel()[indices], np.searchsorted(indices, rows))
[array([3, 5, 6], dtype=int64), array([10, 12, 13], dtype=int64)]
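To make the four steps easier to follow, here is a sketch that prints nothing until the end but names each intermediate value. The second row of `a` is a hypothetical variant of the question's data, chosen so that the two rows return different numbers of matches; `cuts` and `parts` are names introduced only for this illustration.

```python
import numpy as np

# Hypothetical variant of the question's `a`: the second row has only two 'f's,
# so the rows yield results of different lengths.
a = np.array([['z', 'z', 'z', 'f', 'z', 'f', 'f'],
              ['f', 'z', 'z', 'z', 'f', 'z', 'z']])
b = np.arange(a.size).reshape(a.shape)

# Step 1: positions of 'f' in the flattened array.
indices = np.flatnonzero(a.ravel() == 'f')      # [3, 5, 6, 7, 11]
# Step 2: flat index at which each row after the first begins.
rows = np.arange(1, a.shape[0]) * a.shape[1]    # [7]
# Step 3: where those row starts fall within the sorted indices.
cuts = np.searchsorted(indices, rows)           # [3]
# Step 4: split the selected values at those positions.
parts = np.split(b.ravel()[indices], cuts)

result = [p.tolist() for p in parts]
print(result)  # [[3, 5, 6], [7, 11]]
```

Because `indices` is already sorted, `np.searchsorted` can locate every row boundary in one call, which is what keeps the approach free of Python-level loops.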

It is a bit longer than the other solutions, and I'm not sure it will be faster (see the timings below).

Although, personally, I would go with a list comprehension and a zip:

[b_row[a_row] for a_row, b_row in zip(a == 'f', b)]

It's much shorter and according to my timings quite performant.
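As a minimal sketch, the comprehension can be wrapped in a small helper that also converts each row to a plain list. The name `select_per_row` is hypothetical, and the second array `a_ragged` is an assumed variant of the question's data used only to show rows of different lengths.

```python
import numpy as np

def select_per_row(a, b, value):
    """Return, for each row, the entries of b where a equals value."""
    mask = (a == value)  # compare once, outside the loop
    return [b_row[m_row].tolist() for m_row, b_row in zip(mask, b)]

# The arrays from the question.
a = np.array([['z', 'z', 'z', 'f', 'z', 'f', 'f'],
              ['z', 'z', 'z', 'f', 'z', 'f', 'f']])
b = np.arange(14).reshape(2, -1)
same_lengths = select_per_row(a, b, 'f')
print(same_lengths)  # [[3, 5, 6], [10, 12, 13]]

# Hypothetical variant where the second row has only two matches.
a_ragged = np.array([['z', 'z', 'z', 'f', 'z', 'f', 'f'],
                     ['f', 'z', 'z', 'z', 'f', 'z', 'z']])
ragged = select_per_row(a_ragged, b, 'f')
print(ragged)  # [[3, 5, 6], [7, 11]]
```

Iterating over a 2d array yields its rows, so `zip(mask, b)` pairs each boolean row with the matching row of `b`.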


Timing:

import numpy as np
a = np.array([['z', 'z', 'z', 'f', 'z','f', 'f']]*10000)
b = np.arange(a.size).reshape(-1, a.shape[1])

%%timeit

indices = np.flatnonzero(a.ravel() == 'f')
rows = np.arange(1, a.shape[0])*a.shape[1]
np.split(b.ravel()[indices], np.searchsorted(indices, rows))

123 ms ± 8.25 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit [b[i][a[i] == 'f'] for i in range(len(a))]

162 ms ± 14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Both are a lot slower than the variant I suggested in a comment on Psidom's answer:

%timeit [b_row[a_row] for a_row, b_row in zip(a == 'f', b)]

44.9 ms ± 1.93 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
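The `%timeit` magics above only work inside IPython or Jupyter. A rough equivalent for a plain Python script, assuming the standard-library `timeit` module is acceptable, might look like the sketch below. The function names are hypothetical, the array is smaller (1000 rows) to keep the run short, and the numbers it prints will differ from the ones quoted above.

```python
import timeit
import numpy as np

a = np.array([['z', 'z', 'z', 'f', 'z', 'f', 'f']] * 1000)
b = np.arange(a.size).reshape(-1, a.shape[1])

def with_split():
    # Pure NumPy: split the flat selection at the row boundaries.
    indices = np.flatnonzero(a.ravel() == 'f')
    rows = np.arange(1, a.shape[0]) * a.shape[1]
    return np.split(b.ravel()[indices], np.searchsorted(indices, rows))

def with_index_loop():
    # Compare inside the loop, one row at a time.
    return [b[i][a[i] == 'f'] for i in range(len(a))]

def with_zip():
    # Compare once for the whole array, then index row by row.
    return [b_row[a_row] for a_row, b_row in zip(a == 'f', b)]

for func in (with_split, with_index_loop, with_zip):
    seconds = timeit.timeit(func, number=5) / 5
    print(f"{func.__name__}: {seconds * 1e3:.2f} ms per call")
```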


2

Use a list comprehension that loops over the rows:

[b[i][a[i] == 'f'] for i in range(len(a))]
# [array([3, 5, 6]), array([10, 12, 13])]

2 Comments

or with zip: [b_row[a_row == 'f'] for a_row, b_row in zip(a, b)]. You could even go a step further and do the comparison outside of the loop: [b_row[a_row] for a_row, b_row in zip(a == 'f', b)] (that could be a bit faster).
@MSeifert Nice thought on the second option. I can see a speed up there.
1

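import numpy as np

a = np.array([['z', 'z', 'z', 'f', 'z','f', 'f'], ['z', 'z', 'z', 'f', 'z','f', 'f']])

b = np.array(range(0,14)).reshape(2, -1)

idxs = list(zip(*np.where(a == 'f')))

# One empty list per row of a; each match is appended to its own row's list
c = [[] for _ in range(len(a))]
for x in idxs:
    c[x[0]].append(b[x])

print(c)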


1
In [89]: idx = np.where(a == 'f')
In [90]: idx
Out[90]: 
(array([0, 0, 0, 1, 1, 1], dtype=int32),
 array([3, 5, 6, 3, 5, 6], dtype=int32))

We can apply the where tuple to select items in b:

In [93]: b[idx]
Out[93]: array([ 3,  5,  6, 10, 12, 13])

Equivalently apply the boolean mask:

In [94]: b[a == 'f']
Out[94]: array([ 3,  5,  6, 10, 12, 13])

np.argwhere takes the transpose of where, producing a 2d array like your idxs.

In [95]: np.argwhere(a == 'f')
Out[95]: 
array([[0, 3],
       [0, 5],
       [0, 6],
       [1, 3],
       [1, 5],
       [1, 6]], dtype=int32)
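A short check of that relationship, as a sketch on the question's arrays (the names `mask` and `pairs` are introduced here for illustration): the `argwhere` result equals the transposed `where` tuple, and transposing it back recovers an index that selects the same items from `b`.

```python
import numpy as np

a = np.array([['z', 'z', 'z', 'f', 'z', 'f', 'f'],
              ['z', 'z', 'z', 'f', 'z', 'f', 'f']])
b = np.arange(14).reshape(2, -1)

mask = (a == 'f')
idx = np.where(mask)         # tuple of (row indices, column indices)
pairs = np.argwhere(mask)    # 2d array with one (row, column) pair per match

# argwhere is the transpose of the where tuple.
same_layout = np.array_equal(pairs, np.transpose(idx))

# Transposing the pairs back into a tuple gives a usable index again.
same_selection = np.array_equal(b[tuple(pairs.T)], b[idx])

print(same_layout, same_selection)  # True True
```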

As noted in "Delete all elements in an array corresponding to Boolean mask", we can't, in general, select elements with a mask and retain some sort of 2d structure. In selected cases we can reshape the 1d result into something meaningful.

In [96]: b[idx].reshape(2,-1)
Out[96]: 
array([[ 3,  5,  6],
       [10, 12, 13]])
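The reshape only works when every row contributes the same number of matches. One way to guard against that, sketched here under the assumption that a list of per-row arrays is an acceptable fallback, is to count the matches per row first (`counts` and `selected` are illustrative names):

```python
import numpy as np

a = np.array([['z', 'z', 'z', 'f', 'z', 'f', 'f'],
              ['z', 'z', 'z', 'f', 'z', 'f', 'f']])
b = np.arange(a.size).reshape(a.shape)

mask = (a == 'f')
counts = mask.sum(axis=1)    # number of matches in each row

if (counts == counts[0]).all():
    # Equal counts: the flat selection reshapes cleanly into 2d.
    selected = b[mask].reshape(len(a), counts[0])
else:
    # Unequal counts: fall back to one array per row.
    selected = [b_row[m_row] for m_row, b_row in zip(mask, b)]

print(selected)
```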

An easy way to collect these values row by row, allowing for a different number of results in each row, is to iterate:

In [100]: [j[i=='f'] for i,j in zip(a,b)]
Out[100]: [array([3, 5, 6]), array([10, 12, 13])]
In [101]: [j[i=='f'].tolist() for i,j in zip(a,b)]
Out[101]: [[3, 5, 6], [10, 12, 13]]
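If iterating in Python is undesirable, a possible alternative (not part of the answer above, offered only as a sketch) is to split the flat masked result at the cumulative per-row match counts. The second row of `a` below is a hypothetical variant of the question's data so that the rows differ in length.

```python
import numpy as np

# Hypothetical variant of the question's `a` with two matches in the second row.
a = np.array([['z', 'z', 'z', 'f', 'z', 'f', 'f'],
              ['f', 'z', 'z', 'z', 'f', 'z', 'z']])
b = np.arange(a.size).reshape(a.shape)

mask = (a == 'f')
counts = mask.sum(axis=1)               # matches per row: [3, 2]
# b[mask] is in row-major order, so each row's values are contiguous;
# the cumulative counts (minus the last) mark where each new row starts.
pieces = np.split(b[mask], np.cumsum(counts)[:-1])

result = [p.tolist() for p in pieces]
print(result)  # [[3, 5, 6], [7, 11]]
```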
