
I am writing a function to bin points by angle in a polar coordinate system. I would like the option to perform some nonlinear downsampling of the points in each bin (computing the median coordinate, or the minimum/maximum coordinate by distance). I am able to split my array into views of each bin, but since the bins vary in size, I have not found a way to operate on each slice while fully utilizing vectorization.

My best solution so far sorts the points by angle, computes a quantized copy of the angles, and finds the indices where the quantized value changes. I then split the sorted array at those indices.

At this point, I would like to compute metrics for each bin without using loops. I can't simply stack the slices into a 3D array, since they are inhomogeneous in length. What I have done instead is build an array of NaNs of shape [num_slices, length_of_largest_slice, 2], populate it along axis 0 with each slice (leaving the unindexed portions as NaN), and finally compute my metrics with NaN-ignoring operations. I don't believe this is memory efficient, and I assume that populating the array is quite slow.

Example code below:

import numpy as np

def downsample_bins(points, bin_size, mode='mean'):
    polar_points = get_polar(points)            # convert points to polar (r, theta), sorted by angle
    quantized = polar_points[:, 1] // bin_size  # quantize the angles to the provided resolution

    split_key = np.nonzero(np.diff(quantized))[0] + 1  # indices where the bin index changes
    max_size_key = np.append(np.insert(split_key, 0, 0), quantized.shape[0])  # prepend first and append last index for size computation

    split_polar = np.split(polar_points, split_key)  # split the sorted points at the gap indices

    dim_0 = len(split_polar)            # number of bins
    dim_1 = max(np.diff(max_size_key))  # size of the largest bin

    # init NaN-padded array for the inhomogeneous slices
    reshaped_array = np.full(shape=(dim_0, dim_1, 2), fill_value=np.nan)

    for idx, arr in enumerate(split_polar):
        reshaped_array[idx, :arr.shape[0], :] = arr

    if mode == 'mean':
        res = np.nanmean(reshaped_array, axis=1)
    elif mode == 'median':
        res = np.nanmedian(reshaped_array, axis=1)
    elif mode == 'closest':
        min_indices = np.nanargmin(reshaped_array[:, :, 0], axis=-1)  # idx of min r for each bin
        res = reshaped_array[np.arange(dim_0), min_indices, :]
    elif mode == 'furthest':
        max_indices = np.nanargmax(reshaped_array[:, :, 0], axis=-1)  # idx of max r for each bin
        res = reshaped_array[np.arange(dim_0), max_indices, :]
    else:
        raise ValueError(f"unknown mode: {mode}")

    return get_cartesian(res)

I'm wondering if numpy.ufunc methods or numpy.vectorize could be used to solve this? I have seen map used to similar ends, but I'm not sure how efficient that would be compared to a fully vectorized NumPy solution.
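For reference, here is a rough sketch of what a np.ufunc.reduceat-based version might look like for the 'mean', 'closest', and 'furthest' modes. It is untested and reuses polar_points, quantized, and split_key from above; note there is no median ufunc, so the 'median' mode would still need the padded array or a sort-and-index trick.

import numpy as np

starts = np.insert(split_key, 0, 0)                     # first index of every bin
counts = np.diff(np.append(starts, len(polar_points)))  # number of points per bin

# 'mean': np.add.reduceat sums each segment a[starts[i]:starts[i+1]];
# dividing by the counts gives the per-bin mean of r and theta.
sums = np.add.reduceat(polar_points, starts, axis=0)
means = sums / counts[:, None]

# 'closest'/'furthest': np.minimum.reduceat / np.maximum.reduceat return the
# extreme r per bin, but not the index of the point it came from, so the
# paired theta would be lost. A workaround is to re-sort within bins by r
# (lexsort keeps the bins contiguous and in order, since quantized is the
# primary key) and take the first/last row of each segment.
order = np.lexsort((polar_points[:, 0], quantized))     # sort by (bin, then r)
by_r = polar_points[order]
closest = by_r[starts]                                          # min-r point per bin
furthest = by_r[np.append(starts[1:], len(polar_points)) - 1]   # max-r point per bin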

  • You might want to review the code for the various np.nan... functions. I believe most replace the NaN values with some harmless value, e.g. 0 or 1, depending on the operation. np.vectorize etc. does not improve speed; depending on the operation, it is marginally better than a list iteration.
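For example, np.nanmean is roughly equivalent to masking the NaNs and adjusting the divisor (a simplified sketch of the idea, not the actual NumPy internals):

import numpy as np

a = np.array([[1.0, np.nan],
              [3.0, 4.0]])
mask = np.isnan(a)
sums = np.where(mask, 0.0, a).sum(axis=1)  # NaNs replaced with a harmless 0
counts = (~mask).sum(axis=1)               # divide by the non-NaN count
print(sums / counts)                       # [1.  3.5], matches np.nanmean(a, axis=1)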
