Question

Suppose I now have an array that looks like arr = np.random.randint(1, 4, 100), where the unique numbers are 1, 2, and 3.

Now I would like to model data corruption, where some of the numbers may change into others. For example, if arr[k] is 1, it most likely stays the same, but it may also become 2 or 3 (with equal probability).

I could implement this using the following code:

import numpy as np
arr = np.random.randint(1, 4, 100)

mask = np.random.choice([0, 1], size=100, p=[0.8, 0.2])
for idx in range(100):
    if mask[idx] != 0:
        arr[idx] = np.random.choice([1, 2, 3])

This works fine but I really do not like the loop. Is there some way I could eliminate the (ugly) loop?


2 Answers

import numpy as np
arr = np.random.randint(1, 4, 100)
arr2 = np.random.randint(1, 4, 100)
mask = np.random.choice([0, 1], size=100, p=[0.8, 0.2]) != 0
arr[mask] = arr2[mask]

Alternatively, you can count the True entries in mask (mask.sum()) and make arr2 exactly that many elements long, then assign it with arr[mask] = arr2.
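
A minimal sketch of that alternative, assuming the same 20% selection probability as in the question. The names `original` and `n_selected` are introduced here for illustration only; `original` exists just so the result can be checked.

```python
import numpy as np

arr = np.random.randint(1, 4, 100)   # values 1, 2, 3 (upper bound is exclusive)
original = arr.copy()                # illustrative copy, kept only to inspect the result

# True marks an element selected for possible corruption (probability 0.2).
mask = np.random.choice([0, 1], size=100, p=[0.8, 0.2]) != 0
n_selected = int(mask.sum())         # number of True entries in mask

# Draw exactly one replacement per selected element and assign in place.
arr[mask] = np.random.randint(1, 4, n_selected)
```

Elements where mask is False are left untouched, and no random numbers are generated for positions that will not be replaced.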


1 Comment

Your suggestion is the more efficient option

I think this works fine:

np.where(np.random.choice([1,0],100, p=[0.8,0.2]), arr, np.random.randint(1,4,100))

To compute the final proportion of corrupted elements:

y = np.where(np.random.choice([1,0],100, p=[0.8,0.2]), arr, np.random.randint(1,4,100))
(arr != y).mean()
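
Note that the measured proportion will usually come out below 0.2: a replacement drawn uniformly from 1, 2, 3 equals the original value one third of the time, so the expected fraction of changed elements is about 0.2 × 2/3 ≈ 0.13. The sketch below checks this empirically. The array length `n` and the names `keep`, `observed`, and `expected` are assumptions made for illustration; `n` is deliberately larger than the 100 used above so that the average is stable.

```python
import numpy as np

n = 100_000  # assumption: larger than the question's 100 to reduce sampling noise
arr = np.random.randint(1, 4, n)

keep = np.random.choice([1, 0], n, p=[0.8, 0.2])   # 1 keeps the original value
y = np.where(keep, arr, np.random.randint(1, 4, n))

observed = (arr != y).mean()   # fraction of elements whose value actually changed
expected = 0.2 * 2 / 3         # selected (0.2) and redrawn to a different value (2/3)
print(observed, expected)
```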
