
I have code that looks something like this:

import numpy as np
A = np.zeros((10000, 10))

for i in range(10000):

    # Some time-consuming calculations which result in a 10 element 1D array 'a'

    A[i, :] = a

How can I parallelize the for loop, so that the array A is filled out in parallel? My understanding is that multiple processes normally shouldn't be writing to the same variable, so it's not clear to me what the proper way of doing this is.

  • I would just do A = np.array([some_calculation(i) for i in range(len(A))]). Commented Jul 30, 2020 at 16:01
  • Does this answer your question? Parallel for loop over numpy matrix Commented Jul 30, 2020 at 16:05
  • Collecting results in a list, and joining that into one array after might be the most robust approach. Making an array from a list of array is relatively fast. Whether this speeds things up depends on the complexity of the calculations compared to the multiprocessing speed. Commented Jul 30, 2020 at 17:30
  • Check out numba, they have good tools for parallelized writing to a single array. Commented Jul 30, 2020 at 17:37
  • Where do the calculations get their inputs from? Disk? Commented Jul 30, 2020 at 20:28

2 Answers


The code below creates a thread for each row of the array; I'm not sure how efficient it is, though.

import numpy as np
import threading

def thread_function(index, array):
  # aforementioned time-consuming calculation, resulting in 'a'
  a = np.ones(10)    # placeholder for calculation

  array[index, :] = a

if __name__ == "__main__":
  A = np.zeros((10000, 10))
  threads = []

  for i in range(10000):
    threads.append(threading.Thread(target=thread_function, args=(i, A)))
    threads[i].start()

  for i in range(10000):
    threads[i].join()

  print(A)

https://repl.it/@RobertClarke64/Python-Multithreading
As it stands it isn't particularly fast, but it should be noticeably faster than running the calculations in series when each calculation takes a long time.
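A more resource-friendly variant of the same idea uses a fixed-size thread pool instead of spawning 10000 threads at once. This sketch uses concurrent.futures from the standard library, with np.ones as a placeholder for the real calculation as in the answer above; note that because of Python's GIL, threads only give a real speedup if the per-row calculation releases it (as NumPy operations and I/O do).

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

A = np.zeros((10000, 10))

def fill_row(i):
    # placeholder for the time-consuming calculation
    a = np.ones(10)
    # each thread writes a distinct row, so no locking is needed
    A[i, :] = a

with ThreadPoolExecutor(max_workers=8) as executor:
    executor.map(fill_row, range(10000))
# leaving the "with" block waits for all submitted tasks to finish

print(A.sum())  # 100000.0
```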

0

Why not use the GPU's ability to do it in parallel:

import numpy as np
from numba import vectorize

@vectorize(["float32(float32, float32)"], target='cuda')
def f(a, b):
    for i in range(10000):
        a = 4 * b
    return a


a = np.ones((10000,10), dtype=np.float32)
b = np.random.rand(10000,10).astype('f')

f(a,b)
