Generate a matrix of random integers unique per row in Python

Question

Is there a way to directly sample a matrix of random integers that are unique within each row? Sampling each row separately can be slow.

import random as rd
import pandas as pd

N = 1000000 # number of rows/number of draws (try N=1000)
M = 100000  # range to sample from
K = 3       # size of each sample
# note: K<=M
numbers = pd.DataFrame(columns=['A', 'B', 'C'], index=range(N))
for i in range(N):
    numbers.iloc[i,:] = rd.sample(range(M),K)

#  duration in seconds (M=100)
#  N                       1,000    10,000   100,000   1,000,000
#  method in question      2.2      3.3      13        99
#  method by Nin17         0.0085   0.1      0.57      5.6
#  (list comprehension: [rd.sample(range(M), K) for _ in range(N)])

You can look into Latin Hypercube Sampling. I believe SciPy has a method for it, though you may need to apply an adjustment to ensure the indices are integers.
– ChaddRobertson, Commented Apr 16, 2022 at 9:39
Do you mean that each row has to be different, or that the numbers in each row cannot be reused in any other row? Can numbers be repeated within one row? Also, please post a minimal reproducible example (i.e. what are numbers, N, M, and K).
– AJH, Commented Apr 16, 2022 at 9:39

Amir Shamsi · Accepted Answer · 2022-04-16 09:43:26Z

1

One way I know of is to use NumPy.

NumPy has a function called reshape() that rearranges an array into any compatible shape, such as a matrix.

Try this code:

import random as rd, numpy as np
matrix = np.array(rd.sample(range(M), K*N)).reshape(N, K)

Here N is the number of rows and K is the number of columns.

answered Apr 16, 2022 at 9:43


5 Comments

Nin17 Over a year ago

This doesn't produce the same result as the sample code. With K=M=N=2 it raises ValueError: Sample larger than population or is negative, whereas the sample code runs without error.

Amir Shamsi Over a year ago

@Nin17 Use a value of M that is greater than or equal to N*K and the error will go away.

Nin17 Over a year ago

That's my point: with the other method you don't need to.

Nin17 Over a year ago

The condition for the original method is M>=K, not M>=N*K.

Dunes Over a year ago

This is a wholly different sample. In the question, a given integer may appear independently in multiple rows. In your solution, once an integer appears in one row, it cannot appear in any other row. E.g. [[0, ...], [0, ...]], where zero appears in both the first and second row, could be a valid output. Your solution will never produce this as output.
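The difference raised in these comments can be checked with a few lines. This is only a minimal sketch with small illustrative values (N=4, M=10, K=3 are assumptions picked so that K <= M < N*K); it contrasts the per-row sampling from the question with the single global sample used in this answer.

```python
import random as rd

N, M, K = 4, 10, 3  # illustrative values with K <= M < N*K

# Per-row sampling, as in the question: only requires K <= M,
# and the same integer may show up in several rows.
per_row = [rd.sample(range(M), K) for _ in range(N)]
rows_ok = all(len(set(row)) == K for row in per_row)

# One global sample of N*K values, as in this answer: requires N*K <= M,
# so with these values rd.sample raises ValueError.
try:
    flat = rd.sample(range(M), K * N)
except ValueError:
    flat = None

print(rows_ok, flat)
```

With M >= N*K the global sample succeeds, but every integer then appears at most once in the whole matrix, which is a stricter constraint than uniqueness within each row.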

Nin17 · Accepted Answer · 2022-04-16 09:48:49Z

0

A list comprehension is faster if N, M and K are large:

numbers = [rd.sample(range(M), K) for _ in range(N)]

edited Apr 16, 2022 at 9:48

answered Apr 16, 2022 at 9:43


2 Comments

Amir Shamsi Over a year ago

You just simplified it and used a loop again!

Nin17 Over a year ago

The problem was that it's slow; this is a faster way.
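For readers who want to avoid the Python-level loop over rows altogether, one possible approach (not taken from either answer) is rejection resampling with NumPy: draw the whole matrix with replacement, then redraw only the rows that contain a duplicate. The function name sample_unique_rows and its parameters are hypothetical, and the sketch assumes K is much smaller than M, as in the question; when K is close to M most rows get rejected and this becomes very slow.

```python
import numpy as np


def sample_unique_rows(n_rows, population, k, rng=None):
    """Return an (n_rows, k) array whose rows each hold k distinct
    integers from range(population). Hypothetical helper, sketch only."""
    if k > population:
        raise ValueError("k must not exceed population")
    rng = np.random.default_rng() if rng is None else rng

    # First attempt: sample with replacement for every row at once.
    out = rng.integers(0, population, size=(n_rows, k))
    while True:
        # A row has a duplicate iff two neighbours are equal after sorting.
        srt = np.sort(out, axis=1)
        bad = (srt[:, 1:] == srt[:, :-1]).any(axis=1)
        n_bad = int(bad.sum())
        if n_bad == 0:
            return out
        # Redraw the offending rows in full, leaving the others untouched.
        out[bad] = rng.integers(0, population, size=(n_bad, k))


numbers = sample_unique_rows(1000, 100, 3)
```

Because each rejected row is redrawn as a whole, rows stay independent of one another, so the same integer can still appear in several rows, and the only requirement is K <= M.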
