
I am writing a function to bin points by angle in a polar coordinate system. I would like the option to perform some nonlinear downsampling of the points in each bin (computing the median coordinate, or the minimum/maximum coordinate by distance). I am able to split my array into views of each bin, but since the bins vary in size, I have not found a way to operate on each slice while fully utilizing vectorization.

My best solution so far sorts the points by angle, computes a quantized copy of the angles, and finds the indices where the quantized value changes. I then split the sorted array at those indices.

At this point, I would like to compute metrics for each bin without using loops. I can't simply stack the slices into a 3D array, since they are inhomogeneous in length. What I have done instead is build an array of NaNs of shape [num_slices, length_of_largest_slice, 2], populate it along axis 0 with each slice (leaving the unindexed portions as NaN), and finally compute my metrics with NaN-ignoring operations. I don't believe this is memory efficient, and I assume that populating the array is quite slow.

Example code below:

import numpy as np

def downsample_bins(points, bin_size, mode='mean'):
    polar_points = get_polar(points)            # convert points to polar (r, theta), sorted by angle
    quantized = polar_points[:, 1] // bin_size  # quantize the angles to the provided resolution

    split_key = np.nonzero(np.diff(quantized))[0] + 1  # indices where the bin index changes
    max_size_key = np.append(np.insert(split_key, 0, 0), quantized.shape[0])  # prepend first and append last index for size computation

    split_polar = np.split(polar_points, split_key)  # split the sorted points at the gap indices

    dim_0 = len(split_polar)            # number of bins
    dim_1 = max(np.diff(max_size_key))  # size of the largest bin

    # init NaN-padded array for the inhomogeneous slices
    reshaped_array = np.full(shape=(dim_0, dim_1, 2), fill_value=np.nan)

    for idx, arr in enumerate(split_polar):
        reshaped_array[idx, :arr.shape[0], :] = arr

    if mode == 'mean':
        res = np.nanmean(reshaped_array, axis=1)
    elif mode == 'median':
        res = np.nanmedian(reshaped_array, axis=1)
    elif mode == 'closest':
        min_indices = np.nanargmin(reshaped_array[:, :, 0], axis=-1)  # idx of min r for each bin
        res = reshaped_array[np.arange(dim_0), min_indices, :]
    elif mode == 'furthest':
        max_indices = np.nanargmax(reshaped_array[:, :, 0], axis=-1)  # idx of max r for each bin
        res = reshaped_array[np.arange(dim_0), max_indices, :]
    else:
        raise ValueError(f"unknown mode: {mode}")

    return get_cartesian(res)

I'm wondering if numpy.ufunc methods or numpy.vectorize could be used to solve this? I have seen map used to similar ends, but I'm not sure how efficient that would be compared to a fully vectorized NumPy solution.
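For reference, here is a rough sketch of what a np.ufunc.reduceat-based version might look like for the 'mean', 'closest', and 'furthest' modes. It is untested and reuses polar_points, quantized, and split_key from above; note there is no median ufunc, so the 'median' mode would still need the padded array or a sort-and-index trick.

import numpy as np

starts = np.insert(split_key, 0, 0)                     # first index of every bin
counts = np.diff(np.append(starts, len(polar_points)))  # number of points per bin

# 'mean': np.add.reduceat sums each segment a[starts[i]:starts[i+1]];
# dividing by the counts gives the per-bin mean of r and theta.
sums = np.add.reduceat(polar_points, starts, axis=0)
means = sums / counts[:, None]

# 'closest'/'furthest': np.minimum.reduceat / np.maximum.reduceat return the
# extreme r per bin, but not the index of the point it came from, so the
# paired theta would be lost. A workaround is to re-sort within bins by r
# (lexsort keeps the bins contiguous and in order, since quantized is the
# primary key) and take the first/last row of each segment.
order = np.lexsort((polar_points[:, 0], quantized))     # sort by (bin, then r)
by_r = polar_points[order]
closest = by_r[starts]                                          # min-r point per bin
furthest = by_r[np.append(starts[1:], len(polar_points)) - 1]   # max-r point per bin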

  • You might want to review the code for the various np.nan... functions. I believe most replace the NaN values with some harmless value, e.g. 0 or 1, depending on the operation. np.vectorize etc. does not improve speed; depending on the operation, it is marginally better than a list iteration.
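For example, np.nanmean is roughly equivalent to masking the NaNs and adjusting the divisor (a simplified sketch of the idea, not the actual NumPy internals):

import numpy as np

a = np.array([[1.0, np.nan],
              [3.0, 4.0]])
mask = np.isnan(a)
sums = np.where(mask, 0.0, a).sum(axis=1)  # NaNs replaced with a harmless 0
counts = (~mask).sum(axis=1)               # divide by the non-NaN count
print(sums / counts)                       # [1.  3.5], matches np.nanmean(a, axis=1)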
