0

Is there a way to directly sample a matrix of random integers that are unique within each row? Sampling each row separately can be slow.

import random as rd
import pandas as pd

N = 1000000 # number of rows/number of draws (try N=1000)
M = 100000  # range to sample from
K = 3       # size of each sample
# note: K<=M
numbers = pd.DataFrame(columns=['A', 'B', 'C'], index=range(N))
for i in range(N):
    numbers.iloc[i,:] = rd.sample(range(M),K)

#  duration in seconds (M=100)
#  N                      1,000   10,000   100,000   1,000,000
#  method in question       2.2      3.3        13          99
#  method by Nin17       0.0085      0.1      0.57         5.6
#  (method by Nin17 = list comprehension: [rd.sample(range(M),K) for _ in range(N)])
2
  • You can look into Latin Hypercube Sampling. I believe SciPy has a method for it, though you may need to apply an adjustment to ensure the indices are integers. Commented Apr 16, 2022 at 9:39
  • Do you mean that each row has to be different, or that the numbers in each row cannot be reused in any other row? Can numbers be repeated within one row? Also, please post a minimal reproducible example (i.e. what are numbers, N, M, and K). Commented Apr 16, 2022 at 9:39

2 Answers

1

One way I know is to use NumPy.

NumPy has a function called reshape(), which rearranges a flat array into any compatible shape, such as a matrix.

Try this code:

import random as rd, numpy as np
matrix = np.array(rd.sample(range(M), K*N)).reshape(N, K)

Here N is the number of rows and K is the number of columns.


5 Comments

This doesn't produce the same result as the sample code. With K=M=N=2 it raises ValueError: Sample larger than population or is negative, whereas the sample code runs without error.
@Nin17 Use a range M that is at least N*K and the error goes away.
That's my point: with the other method you don't need to.
The condition for the original method is M>=K, not M>=N*K.
This is a wholly different sample. In the question, a given integer may appear independently in multiple rows. In your solution, once an integer appears in one row, it cannot appear in any other row. E.g. [[0, ...], [0, ...]], where zero appears in both the first and second row, could be a valid output. Your solution will never produce this as output.
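
The difference discussed in these comments can be seen with small values. The following is only an illustrative sketch; the values of M, K and N and the names rows, flat and global_error are chosen here for the demonstration and do not come from the question.

```python
import random as rd

M, K, N = 5, 2, 4  # M >= K holds, but M < N*K

# Per-row sampling (as in the question): values are unique within a row,
# but the same value may show up again in another row.
rows = [rd.sample(range(M), K) for _ in range(N)]

# Global sampling (as in the answer): needs N*K distinct values overall,
# so it raises ValueError whenever M < N*K.
try:
    flat = rd.sample(range(M), K * N)
    global_error = None
except ValueError as exc:
    global_error = str(exc)

print(rows)
print(global_error)
```

With these values the per-row version always succeeds, while the single global draw fails because only 5 distinct integers are available for 8 slots.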
0

A list comprehension is faster than the loop in the question if N, M and K are large:

numbers = [rd.sample(range(M), K) for _ in range(N)]
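
If the Python-level loop is still the bottleneck, one possible vectorized alternative is rejection sampling with NumPy: draw all rows with replacement, then redraw only the rows that contain a duplicate. This is not the method from the question or from either answer, just a sketch; the function name sample_unique_rows and its parameters are hypothetical, and it assumes K is much smaller than M so that few rows need redrawing.

```python
import numpy as np

def sample_unique_rows(n_rows, m, k, rng=None):
    """Return an (n_rows, k) integer array with values in [0, m),
    unique within each row; rows are independent of each other."""
    if k > m:
        raise ValueError("k must not exceed m")
    rng = np.random.default_rng() if rng is None else rng
    out = rng.integers(0, m, size=(n_rows, k))
    while True:
        # A row has a duplicate iff two neighbours are equal after sorting.
        s = np.sort(out, axis=1)
        bad = (s[:, 1:] == s[:, :-1]).any(axis=1)
        n_bad = int(bad.sum())
        if n_bad == 0:
            return out
        # Redraw only the offending rows and check again.
        out[bad] = rng.integers(0, m, size=(n_bad, k))

numbers = sample_unique_rows(1000, 100, 3, np.random.default_rng(0))
print(numbers[:5])
```

When K approaches M, most draws contain duplicates and the redraw loop becomes slow, so the per-row rd.sample approach is likely the better choice in that regime.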

2 Comments

You just simplified it and used a loop again!
The problem was that it's slow; this is a faster way.
