I have some dynamically created arrays of varying lengths, and I would like to resize them all to the same length of 5000 elements by popping every nth element.

Here is what I have so far:

import numpy as np
random_array = np.random.rand(26975,3)

n_to_pop = int(len(random_array) / 5000)
print(n_to_pop)  # 5

If I downsample with n = 5 (keeping every 5th element), I get 5395 elements.

I can compute 5395 / 5000 = 1.079, but I don't know how to work out how often I should pop an additional element to get rid of the remaining 395 elements (about 7.9%).

If I can get within a length of 5000-5050, that would also be acceptable; the remainder can then be sacrificed with a simple .resize

This is probably just a simple math question, but I couldn't seem to find an answer anywhere.

Any help is much appreciated.

Best regards

Martin

  • not exactly what you want but maybe you could simply use np.random.choice(random_array, 5000, replace=False)? Commented Jun 21, 2022 at 8:18
  • It surely is an option, but for uniformity it would be nice with a uniform way to downsample the array to length 5000 Commented Jun 21, 2022 at 8:20
  • Unfortunately, np.random.choice only works on one dimensional arrays, my arrays are 3 dimensional. Commented Jun 21, 2022 at 9:56
  • well, you can always do random_array[np.random.choice(random_array.shape[0], 5000, replace=False)] Commented Jun 21, 2022 at 11:13

2 Answers

1

You can use something like np.linspace to make your solution as uniform as possible:

subset = random_array[np.round(np.linspace(0, len(random_array), 5000, endpoint=False)).astype(int)]

A fixed integer step does not always work. Consider reducing a 5003-element array to 5000 elements versus a 50003-element array: in the first case the integer step is 1, which drops nothing. The trick is to pick the set of elements to keep (or drop) so that it is as close to linear in the index as possible, which is exactly what np.linspace does.
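The 5003 versus 50003 comparison can be checked with a small sketch. The helper name keep_indices is hypothetical and only wraps the np.linspace expression from above; the loop prints how many indices are kept and the smallest and largest gap between neighbouring kept rows.

```python
import numpy as np

def keep_indices(n_rows, target):
    """Return `target` nearly evenly spaced row indices in [0, n_rows).

    Hypothetical helper: it just wraps the np.linspace expression above.
    """
    return np.round(np.linspace(0, n_rows, target, endpoint=False)).astype(int)

for n_rows in (5003, 26975, 50003):
    idx = keep_indices(n_rows, 5000)
    gaps = np.diff(idx)  # distance between consecutive kept rows
    print(n_rows, len(idx), gaps.min(), gaps.max())
```

The gaps are not all equal: they alternate between two neighbouring integers so that the average step matches n_rows / 5000, which is something a single fixed step cannot do.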

You could also do something like

np.delete(random_array, np.round(np.linspace(0, len(random_array), len(random_array) - 5000, endpoint=False)).astype(int), axis=0)
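A minimal sketch of the drop-based variant, spelled out step by step. One detail worth knowing: np.delete flattens its input when no axis is given, so axis=0 is passed here to remove whole rows. The variable names n_rows, n_drop and drop are introduced only for illustration.

```python
import numpy as np

random_array = np.random.rand(26975, 3)
n_rows = len(random_array)
n_drop = n_rows - 5000  # how many rows have to go

# Nearly evenly spaced indices of the rows to drop.
drop = np.round(np.linspace(0, n_rows, n_drop, endpoint=False)).astype(int)

# axis=0 removes whole rows; without it np.delete works on the flattened array.
subset = np.delete(random_array, drop, axis=0)
print(subset.shape)  # (5000, 3)
```

Because the spacing between drop indices is larger than 1, the rounded indices are all distinct, so exactly n_drop rows are removed.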

3 Comments

👍🏼 it works, could you possibly elaborate on how it works? or else I could probably just google it. Anyway, thanks
I honestly dunno how I can make it any more elaborate
😁 That's alright, I think I just need to google what np.linspace does and understand the difference between linear and other kinds of drops. Thanks
1

For a random selection of rows, you can use np.random.choice or np.random.permutation:

random_array[np.random.permutation(random_array.shape[0])[:5000]]
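If the original row order should be preserved after a random selection, one option is to sort the sampled indices before indexing. This is only a sketch: it uses NumPy's Generator API instead of the legacy np.random functions, and the seed is an arbitrary choice made so the example is repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
random_array = rng.random((26975, 3))

# Sample 5000 distinct row indices, then sort them to keep the original order.
rows = rng.choice(random_array.shape[0], size=5000, replace=False)
rows.sort()

subset = random_array[rows]
print(subset.shape)  # (5000, 3)
```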

To remove the rows nearly uniformly instead, one way is:

indices = np.linspace(0, random_array.shape[0], endpoint=False, num=5000, dtype=int)
# [    0     5    10    16    ...    26958 26964 26969] --> shape = (5000,)

result = random_array[indices]
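Note that passing dtype=int to np.linspace converts the float positions to integers by flooring rather than rounding to the nearest integer. A quick sketch to compare the two index sets, with variable names chosen only for illustration:

```python
import numpy as np

n_rows, target = 26975, 5000

# Integer dtype: positions are floored.
truncated = np.linspace(0, n_rows, num=target, endpoint=False, dtype=int)

# Explicit rounding to the nearest integer.
rounded = np.round(np.linspace(0, n_rows, num=target, endpoint=False)).astype(int)

print(truncated[:4], rounded[:4])
print(np.abs(truncated - rounded).max())  # largest disagreement between the two
```

Either variant yields 5000 valid row indices; they can differ by at most one position per index, which rarely matters for downsampling.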
