So I have a large 3D array (~2000 x 1000 x 1000). I want to update each value in the array to a random integer between 1 and the current max, such that all cells equal to some value x are updated to the same random integer. I want to keep zeros unchanged. Also there can't be any repeats, i.e. different values in the original array can't be mapped to the same random integer. The values currently form a continuous range between 0 and 9000, and there are quite a lot of them:

np.amax(arr) #output = 9000
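To make the requirement concrete, here's a tiny hypothetical example (the actual replacement values would be random; only the properties matter):

```python
import numpy as np

# Hypothetical 2x3 input with values 0..4
arr = np.array([[0, 1, 2],
                [3, 4, 1]])

# One possible valid outcome: 1->3, 2->1, 3->4, 4->2, zeros untouched
out = np.array([[0, 3, 1],
                [4, 2, 3]])

# Properties the mapping must satisfy:
assert (out[arr == 0] == 0).all()                  # zeros unchanged
mapping = {a: b for a, b in zip(arr.ravel(), out.ravel())}
assert len(set(mapping.values())) == len(mapping)  # no two values collide
```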

So I tried the method below...

import random
import numpy as np
from tqdm import tqdm

max_v = np.amax(arr)
vlist = list(range(1, max_v + 1))
for l in tqdm(range(1, max_v + 1)):
    n = random.choice(vlist)  # avoids the off-by-one of indexing with randint(1, len(vlist))
    # note: if n > l, the newly written values can be re-replaced on a later iteration
    arr = np.where(arr == l, n, arr)
    vlist.remove(n)

My current code takes about 13 s per iteration with 9000 iterations (for the first few iterations at least), which is far too slow. I've thought about parallelisation with concurrent.futures, but I'm sure I've missed something obvious here XD

Comments
  • Example input/output might help clarify. Commented Nov 7, 2022 at 21:54
  • Can you write this in a form with minimal mutations (e.g. no vlist.remove(n))? Commented Nov 7, 2022 at 21:56
  • I think you're overcomplicating this. The values of the array are indices into a simple permutation. The entire thing can be done with a shuffle and index. Commented Nov 7, 2022 at 22:16

2 Answers

If your current values are in a continuous range, and you want another continuous range, you're in luck! At that point, you aren't really generating 2 billion random numbers: you're just permuting 9000 or so integers. For example:

arr = np.random.randint(9001, size=(10, 20, 20))
p = np.arange(arr.max(None) + 1)
np.random.shuffle(p[1:])  # shuffle a view so p[0] stays 0 and zeros are left unchanged
arr = p[arr]

The replacement values do not have to start with zero, but if you plan on doing this iteratively, you will have to subtract off the offset before using arr as an index into p.
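For instance, if the values spanned some range lo..hi rather than starting at zero, a sketch of the offset handling might look like this (the variable names are illustrative):

```python
import numpy as np

arr = np.random.randint(100, 200, size=(5, 5))  # values in 100..199
lo = arr.min()

# permutation of the offsets 0..(hi-lo), shifted back into the original range
p = lo + np.random.permutation(arr.max() - lo + 1)
arr = p[arr - lo]  # subtract the offset before indexing into p
```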

4 Comments

np.random.permutation(n) might be easier than shuffle(arange(n))
@SamMason. If this is done iteratively, you only allocate the arange once and keep shuffling it. permutation will keep returning copies.
yup, I was writing an answer doing basically the same as your answer. Using the OP's full array takes my computer ~10 secs. Also, the new-style RNG is ~3x faster at generating ints for me
@SamMason. Since OP is concerned with speed, and you did a bunch of benchmarks in addition to using the new API, I would recommend posting another answer.
As suggested by Mad Physicist, here's my almost identical solution:

from sys import getsizeof
import numpy as np

# create a new-style random generator
rng = np.random.default_rng()

# takes ~20 seconds, ~60 secs with legacy generator
X = rng.integers(9001, size=(2000, 1000, 1000), dtype=np.uint16)

# output: 3.73 GiB, uint16 takes 1/4 space of the default int64
print(f"{getsizeof(X) / 2**30:.2f} GiB")

# generate a permutation of 1..max, keeping index 0 fixed so zeros stay zero;
# converting to the same datatype makes indexing slightly faster
p = np.concatenate(([0], 1 + rng.permutation(np.max(X)))).astype(X.dtype)

# iterate applying permutation, takes ~10 seconds in total
for i in range(len(X)):
    X[i] = p[X[i]]

I'm iterating while applying the permutation to reduce transient memory demands: it only needs one slice of the first dimension at a time (~2 MiB) rather than allocating a complete new copy of the array.

MadPhysicist asked why I'm doing the for loop at the end rather than just directly executing X[:] = p[X]. This is about reducing the memory demands of the program. Under Linux, I'd use something like:

from resource import getrusage, RUSAGE_SELF

print(getrusage(RUSAGE_SELF).ru_maxrss)

to tell me the most RAM that had been allocated to the Python process (in KiB). If I run that after running the above code I get 3938904 printed, so 3.76 GiB. If I don't use the for loop, this goes up to 7.48 GiB. If I don't ensure the permutation is also of type uint16 (i.e. with .astype(X.dtype)), then my laptop would start swapping, as it would require more than 16 GiB of RAM.

2 Comments

Not sure why the loop at the end. Did you mean X[:] = p[X]?
@MadPhysicist have added more explanation, hope that makes sense
