
I need to apply the same function to every row of a numpy array and store the results in another numpy array.

# states will contain results of function applied to a row in array
states = np.empty_like(array)

for i, ar in enumerate(array):
    states[i] = function(ar, *args)

# do some other stuff on states

function does some non-trivial filtering of my data and returns an array marking where the conditions are True and where they are False. function can be either pure Python or Cython-compiled. The filtering operations on the rows are complicated and can depend on previous values in the row, which means I can't operate on the whole array in an element-by-element fashion.

Is there a way to do something like this in dask for example?

  • It still doesn't make sense. Where is i coming from? Are you trying to call enumerate? Commented Sep 28, 2015 at 8:29
  • Your function takes only current row or it can take any other row also? Commented Sep 28, 2015 at 8:54
  • The function accepts any 1D numpy array. It doesn't care where that array came from. Commented Sep 28, 2015 at 13:46

2 Answers


Dask solution

You could do this with dask.array by chunking the array by row, calling map_blocks, and then computing the result:

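import dask.array as da

ar = ...
# one chunk per row
x = da.from_array(ar, chunks=(1, ar.shape[1]))
# map_blocks returns a new lazy array, so keep the result
x = x.map_blocks(function, *args)
states = x.compute()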
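One detail worth spelling out: with one-row chunks, map_blocks hands each block to the function as a 2D array of shape (1, ncols), not as a 1D row. The sketch below shows one way to adapt a function that expects a 1D array. The names function and on_block and the offset argument are hypothetical stand-ins, not part of the question's code.

```python
import dask.array as da
import numpy as np

def function(row, offset):
    # Hypothetical 1D row function: each output depends on previous values.
    return np.cumsum(row) + offset

def on_block(block, *args):
    # block has shape (1, ncols); unwrap it to 1D, then restore the row axis.
    return function(block[0], *args)[np.newaxis, :]

ar = np.arange(6.0).reshape(2, 3)
x = da.from_array(ar, chunks=(1, ar.shape[1]))
# Extra positional arguments are forwarded to on_block for every block.
states = x.map_blocks(on_block, 10.0, dtype=ar.dtype).compute()
```

Passing dtype explicitly keeps dask from having to infer the output type by calling the function on dummy data.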

By default this will use threads. You can use processes in the following way:

# older dask releases: from dask.multiprocessing import get; x.compute(get=get)
states = x.compute(scheduler="processes")

Pool solution

However, dask is probably overkill for embarrassingly parallel computations like this. You could get by with a thread pool:

import numpy as np
from multiprocessing.pool import ThreadPool
pool = ThreadPool()

ar = ...
states = np.empty_like(ar)

def f(i):
    # threads share memory, so each worker can write its row in place
    states[i] = function(ar[i], *args)

pool.map(f, range(len(ar)))

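You could switch to processes with the following change, but then f has to return its row, with states built from the list that pool.map returns, because worker processes do not share the parent's memory and cannot write into states directly: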

from multiprocessing import Pool
pool = Pool()
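As a minimal sketch of the pool approach that works with either pool type, the helper below returns each row from the worker and assembles the result afterwards instead of writing into a shared array. The names apply_rows and running_clip are hypothetical; running_clip only stands in for a row function whose output depends on previous values.

```python
from multiprocessing.pool import ThreadPool

import numpy as np

def apply_rows(pool, func, arr, *args):
    """Call func(row, *args) for every row of arr and stack the results."""
    rows = pool.starmap(func, [(row, *args) for row in arr])
    return np.array(rows)

def running_clip(row, limit):
    # Hypothetical row function: each value may move at most `limit`
    # away from the previously emitted value.
    out = np.empty_like(row)
    prev = row[0]
    for j, value in enumerate(row):
        prev = min(max(value, prev - limit), prev + limit)
        out[j] = prev
    return out

data = np.array([[0.0, 5.0, 5.0], [1.0, 1.0, -4.0]])
with ThreadPool(2) as pool:
    states = apply_rows(pool, running_clip, data, 1.0)
```

The same helper should accept a multiprocessing.Pool, provided the row function is defined at module level so it can be pickled and, on platforms that spawn workers, the pool is created under an if __name__ == "__main__": guard.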


Turn your function into a universal function: http://docs.scipy.org/doc/numpy/reference/ufuncs.html.

Then: states = function(array, *args).
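For a function that works on whole rows rather than single elements, one related option is np.vectorize with a signature, which behaves like a generalized ufunc over the last axis. This is a sketch under the assumption that the row function maps a length-n row to a length-n row; running_max is a hypothetical stand-in. Note that np.vectorize is a convenience wrapper around a Python loop, so it does not add any parallelism.

```python
import numpy as np

def running_max(row):
    # Hypothetical row function: each output depends on earlier values in the row.
    return np.maximum.accumulate(row)

# signature "(n)->(n)" tells NumPy to pass whole 1D rows, not single elements.
row_wise = np.vectorize(running_max, signature="(n)->(n)")

array = np.array([[1, 3, 2], [5, 4, 6]])
states = row_wise(array)
```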

2 Comments

I can't operate on the array in an element-by-element fashion. The filtering of a single row depends on previous values.
@kain88: You don't use i properly in your for loop and you don't seem to pass it to the function. It doesn't make any sense.
