
I am using the MetPy package to calculate many different weather parameters for many locations across North America over many hours. I want to fill arrays of shape [hrs, stns] with these parameters. Unfortunately, I am not able to vectorize these operations (see the MetPy documentation: many of its calculations cannot operate on the arrays this data normally comes in).

Here is a very simple example of my code. How would I run the following code in parallel?

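import numpy as np

# One row per hour, one column per station
wx_array1 = np.empty(shape=(3000, 600))
wx_array2 = np.empty(shape=(3000, 600))

for hr in range(3000):
    for stn in range(600):
        # Placeholders for the real per-hour, per-station calculations
        wx_array1[hr, stn] = hr * stn
        wx_array2[hr, stn] = hr + stn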
  • 2
    Rule no. 1 in NumPy: avoid CPython loops like the plague and write vectorized code instead. More than 99% of the time spent in your loop is CPython/NumPy overhead. Throwing more cores at very inefficient code is not reasonable. See np.outer for example. Commented Dec 27, 2024 at 15:25
  • 2
    If you're aware of a vectorization restriction specific to metpy then, if possible, provide a reference link to that information rather than asking readers to search the metpy documentation for it. Commented Dec 27, 2024 at 17:17
  • 1
    @user8229029 To clarify what Jerome said, CPython is the standard Python implementation (PyPy is a less common alternative). So, when you write a for loop as you've done, that is a CPython loop. Commented Dec 27, 2024 at 17:44
  • 3
    Your example seems to be vectorizable since you are only working with numpy arrays and integers produced by the for loops. Since you say your actual problem cannot be vectorized because of metpy, I recommend rewriting your example to something that is representative of your real problem since this current example doesn't seem to showcase the nuances of your issue. Commented Dec 27, 2024 at 17:46
  • 2
    "Your conclusion that I'm working with numpy arrays and integers is FALSE" Weird, I wonder where I got that "unjustified" assumption from. Oh, right, wx_array1/2 are numpy arrays and hr and stn are integers. If you're not working with numpy arrays and integers, then your question, which only contains numpy arrays and integers, is clearly not representative (that's what I have been trying to say all along). Commented Dec 27, 2024 at 21:08

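For the simplified example in the question only, the vectorization suggested in the comments could look roughly like the sketch below. It uses NumPy broadcasting, which gives the same values as the np.outer call mentioned in the first comment. The names n_hrs, n_stns, hrs and stns are introduced here for illustration. This does not address the MetPy calculations that the question says cannot be vectorized.

```python
import numpy as np

n_hrs, n_stns = 3000, 600

# Column vector of hours and row vector of stations; broadcasting
# expands them to the full (n_hrs, n_stns) grid without a Python loop.
hrs = np.arange(n_hrs)[:, np.newaxis]
stns = np.arange(n_stns)[np.newaxis, :]

# Cast to float to match the dtype of the np.empty arrays in the question
wx_array1 = (hrs * stns).astype(float)  # same values as np.outer of the two ranges
wx_array2 = (hrs + stns).astype(float)
```
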
1 Answer

3

You could use the Numba package to speed up the calculations. Timings for the loop in the question:

Normal: 962 ms ± 228 ms per loop
With numba: 3.77 ms ± 52 µs per loop

import numpy as np
import numba

wx_array1 = np.empty(shape=(3000, 600))
wx_array2 = np.empty(shape=(3000, 600))

# njit compiles the function to machine code on first call (nopython mode)
@numba.njit
def create_array(wx_array1, wx_array2):
    # Fills both arrays in place and returns them for convenience
    for hr in range(3000):
        for stn in range(600):
            wx_array1[hr, stn] = hr * stn
            wx_array2[hr, stn] = hr + stn
    return wx_array1, wx_array2

wx_array1, wx_array2 = create_array(wx_array1, wx_array2)

2 Comments

Thank you for simply providing an answer, though this doesn't really solve the problem, as I am doing much more complex calculations in the for loop that do not natively run vectorized. So, the only way I know of to speed this up would be to run a for loop in parallel.
Numba supports parallelization, vectorization, and GPU acceleration ...
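
If the per-element work really cannot be vectorized or compiled, one generic way to run the outer loop in parallel is a process pool from the standard library. The following is only a sketch under assumptions: compute_hour is a hypothetical stand-in for the real per-hour calculations, and fill_arrays with its executor_cls and max_workers parameters is likewise invented here. Each worker computes whole rows and the parent process copies them into the [hrs, stns] arrays.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor


def compute_hour(hr, n_stns):
    """Hypothetical stand-in for the real work: returns both rows for one hour."""
    row1 = np.empty(n_stns)
    row2 = np.empty(n_stns)
    for stn in range(n_stns):
        row1[stn] = hr * stn  # placeholder for a real calculation
        row2[stn] = hr + stn  # placeholder for a real calculation
    return hr, row1, row2


def fill_arrays(n_hrs, n_stns, executor_cls=ProcessPoolExecutor, max_workers=None):
    wx_array1 = np.empty((n_hrs, n_stns))
    wx_array2 = np.empty((n_hrs, n_stns))
    with executor_cls(max_workers=max_workers) as pool:
        # chunksize sends several hours per task to reduce inter-process overhead
        results = pool.map(compute_hour, range(n_hrs), [n_stns] * n_hrs, chunksize=50)
        for hr, row1, row2 in results:
            wx_array1[hr] = row1
            wx_array2[hr] = row2
    return wx_array1, wx_array2


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing this module
    wx_array1, wx_array2 = fill_arrays(3000, 600)
    print(wx_array1.shape, wx_array2[7, 11])
```

Processes only pay off when each task does enough work to outweigh the cost of starting workers and pickling results, so for the trivial arithmetic above this is likely slower than the plain loop. It is meant for expensive per-hour calculations.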
