
Using randn, the following code gives me what I want, except that the elements come from a normal distribution. What if I want random integers instead?

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

randint seems to take a range rather than array dimensions like randn does. How do I fill the DataFrame with random integers within a given range?


3 Answers


numpy.random.randint accepts a third argument, size, which specifies the shape of the output array. You can use it to create your DataFrame:

df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

Here, np.random.randint(0, 100, size=(100, 4)) creates an array of shape (100, 4) filled with random integers from the half-open interval [0, 100), i.e. 0 through 99.
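A quick sanity check of the shape and the half-open range, using the same bounds as above. Because high is exclusive, you would pass high=101 if you want 100 itself to be a possible value.

```python
import numpy as np
import pandas as pd

# Legacy API: randint(low, high, size) draws integers from [low, high)
arr = np.random.randint(0, 100, size=(100, 4))
df = pd.DataFrame(arr, columns=list('ABCD'))

print(df.shape)                # (100, 4)
print(df.values.min() >= 0)    # True
print(df.values.max() <= 99)   # True: 100 itself is never drawn

# To make 100 a possible value, raise the exclusive upper bound by one
inclusive = np.random.randint(0, 101, size=(100, 4))
```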


Demo:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

which produces output like the following (the values differ on every run):

     A   B   C   D
0   45  88  44  92
1   62  34   2  86
2   85  65  11  31
3   74  43  42  56
4   90  38  34  93
5    0  94  45  10
6   58  23  23  60
..  ..  ..  ..  ..

Comment:

Why does this work? What function does pandas.DataFrame(x, y) call when x is a numpy.ndarray (here, a 2D array)?
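In short, it is just the pandas.DataFrame constructor: it accepts a 2D ndarray as its data argument, each row of the array becomes a row of the frame, and the columns argument supplies the labels for the second axis. The number of labels must match the array's second dimension, otherwise pandas raises a ValueError. A small illustration with a fixed, non-random array:

```python
import numpy as np
import pandas as pd

# A 2x3 array: 2 rows, 3 columns
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# data=arr, columns=labels for axis 1; the index defaults to 0..n-1
df = pd.DataFrame(arr, columns=['x', 'y', 'z'])
print(df)
#    x  y  z
# 0  1  2  3
# 1  4  5  6
```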

The recommended way to generate random integers with NumPy these days is numpy.random.Generator.integers (see the NumPy documentation). Create a Generator with np.random.default_rng() and call its integers method:

import numpy as np
import pandas as pd

rng = np.random.default_rng()
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list('ABCD'))
df
----------------------
      A    B    C    D
 0   58   96   82   24
 1   21    3   35   36
 2   67   79   22   78
 3   81   65   77   94
 4   73    6   70   96
... ...  ...  ...  ...
95   76   32   28   51
96   33   68   54   77
97   76   43   57   43
98   34   64   12   57
99   81   77   32   50
100 rows × 4 columns
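Two Generator options that are often handy here: passing a seed to default_rng makes the frame reproducible, and endpoint=True makes the upper bound inclusive. The seed value 42 below is arbitrary.

```python
import numpy as np
import pandas as pd

# Seeding the generator makes the output reproducible across runs
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list('ABCD'))

# A fresh generator with the same seed yields the same frame
df_again = pd.DataFrame(np.random.default_rng(42).integers(0, 100, size=(100, 4)),
                        columns=list('ABCD'))
print(df.equals(df_again))  # True

# endpoint=True draws from the closed interval [0, 100] instead of [0, 100)
df_incl = pd.DataFrame(rng.integers(0, 100, size=(100, 4), endpoint=True),
                       columns=list('ABCD'))
```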


You can also use np.random.Generator.choice.

df = pd.DataFrame(np.random.default_rng().choice(100, size=(100, 4)), columns=['A','B','C','D'])

The advantage of this method over integers is that you can sample from any list or array you want. For example, to draw a random sample from [2, 5, 10]:

df = pd.DataFrame(np.random.default_rng().choice([2,5,10], size=(100, 4)), columns=['A','B','C','D'])
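The pool does not even have to be numeric. A minimal sketch, assuming a hypothetical list of category labels as the pool:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng()

# Hypothetical pool of labels; any 1-D array-like works
levels = ['low', 'mid', 'high']
df = pd.DataFrame(rng.choice(levels, size=(100, 4)), columns=list('ABCD'))

# Every cell is drawn from the pool
print(set(df.values.ravel()) <= set(levels))  # True
```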

You can even assign a probability to each entry. For example, to choose 2 with probability 0.8 and 5 with probability 0.2, pass the p= argument:

df = pd.DataFrame(np.random.default_rng().choice([2,5], p=[.8,.2], size=(100, 4)), columns=['A','B','C','D'])
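A rough empirical check of the p= weights, using a larger frame so the observed proportions settle near their targets (p must have the same length as the pool and sum to 1). The seed is arbitrary, and the exact frequencies vary with it.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.choice([2, 5], p=[.8, .2], size=(100_000, 4)),
                  columns=list('ABCD'))

# Fraction of all 400,000 cells equal to each value
freq = pd.Series(df.values.ravel()).value_counts(normalize=True)
print(freq)  # close to 0.8 for 2 and 0.2 for 5
```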

Also, with the Generator, choice is about as fast as integers, and both are faster than the legacy np.random.randint, as these timings show:

%timeit pd.DataFrame(np.random.default_rng().choice(100, size=(100_000,4)), columns=[*'ABCD'])
# 3.34 ms ± 308 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit pd.DataFrame(np.random.default_rng().integers(0, 100, size=(100_000,4)), columns=[*'ABCD'])
# 3.81 ms ± 708 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit pd.DataFrame(np.random.randint(100, size=(100_000,4)), columns=[*'ABCD'])
# 6.78 ms ± 776 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
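The %timeit lines above are IPython/Jupyter magics. A plain-Python sketch of the same comparison with the standard-library timeit module follows; absolute numbers depend on the machine and NumPy version, so treat any single set of figures as one measurement rather than a guarantee.

```python
import timeit

import numpy as np
import pandas as pd

cols = [*'ABCD']
shape = (100_000, 4)

candidates = {
    'Generator.choice': lambda: pd.DataFrame(
        np.random.default_rng().choice(100, size=shape), columns=cols),
    'Generator.integers': lambda: pd.DataFrame(
        np.random.default_rng().integers(0, 100, size=shape), columns=cols),
    'randint': lambda: pd.DataFrame(
        np.random.randint(100, size=shape), columns=cols),
}

results = {}
for name, fn in candidates.items():
    # Best of 3 repeats of 10 calls each, reported per call in milliseconds
    # (%timeit reports mean and std. dev. instead of the best run)
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    results[name] = best * 1e3
    print(f'{name:>20}: {results[name]:.2f} ms')
```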
