What numbers can I put in numpy.random.seed()?

Question

I have noticed that you can put various numbers inside of numpy.random.seed(), for example numpy.random.seed(1), numpy.random.seed(101). What do the different numbers mean? How do you choose the numbers?

Doesn't really matter as it just determines what the subsequent "random" sequence will be. Useful for testing so the values don't vary from run-to-run. I usually use 42...
– martineau, Commented Apr 25, 2016 at 17:50

score 13 · Accepted Answer · 2016-04-26 05:41:07Z


Consider a very basic random number generator:

Z[i] = (a*Z[i-1] + c) % m

Here, Z[i] is the ith random number, a is the multiplier, c is the increment, and m is the modulus; different combinations of a, c, and m give different generators. This is known as the linear congruential generator, introduced by Lehmer. The remainder of the division by m (the % operator) is a number between zero and m-1, and by setting U[i] = Z[i] / m you get random numbers in the range [0, 1).

As you may have noticed, to start this generative process (to have a Z[1]) you need a Z[0], an initial value. This initial value that starts the process is called the seed.

Suppose the seed is chosen as 7 to start the process. That value is not itself returned as a random number. Instead, it is used to generate the first Z, Z[1].
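A minimal sketch of this recurrence with seed 7. The parameters a = 5, c = 3, m = 16 and the function name `lcg` are assumptions chosen for illustration, not values taken from the answer; real generators use far larger constants.

```python
def lcg(seed, a=5, c=3, m=16, n=5):
    """Return the first n values Z[1..n] of Z[i] = (a*Z[i-1] + c) % m."""
    z = seed              # Z[0], the seed; never returned itself
    values = []
    for _ in range(n):
        z = (a * z + c) % m
        values.append(z)
    return values


M = 16
zs = lcg(7, m=M)                  # integers in 0 .. m-1
us = [z / M for z in zs]          # U[i] = Z[i] / m, in [0, 1)
print(zs)
print(us)
print(lcg(7, m=M) == zs)          # same seed, same sequence
```

Calling `lcg` with a different seed starts the same cycle at a different point, which is all a seed does in a generator like this.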

The most important feature of a pseudo-random number generator would be its unpredictability. Generally, as long as you don't share your seed, any seed is fine, because today's generators are much more complex than this one. As a further step, you can generate the seed randomly as well, or you can skip the first n numbers.

Main source: Law, A. M. (2007). Simulation modeling and analysis. Tata McGraw-Hill.

edited Apr 26, 2016 at 5:41

answered Apr 25, 2016 at 17:52

user2285236


3 Comments

en_Knight Over a year ago

I like this answer. Few things though: "most important feature... is unpredictability": you might want algorithmic reproducibility (because you're doing modelling not security) in which case you want the opposite - predictability. Also: How does one generate a seed randomly before they have a seed to generate random numbers with? Take the timestamp of the recursion-overflow exception that was raised :)? Also, numpy explicitly does not use that random generator - does this advice hold for numpy's random module, which was asked about?

user2285236 Over a year ago

I agree that reproducibility is very important; that's why we share seeds. But I think reproducibility does not exactly mean predictability. The numbers would still have the uniformity and independence features. They couldn't be predicted from the numbers alone, but they could be calculated by someone who knows the underlying process.

user2285236 Over a year ago

For defining the seed: AFAIK many applications use something similar to what you described: nanoseconds of the current time mod some other thing, etc. Python uses the Mersenne Twister, and I'm not sure but possibly numpy uses it too: en.wikipedia.org/wiki/Mersenne_Twister It is different from linear congruential generators, but it also has this recursive process, so as long as you have something called a seed to start the process, the same logic should apply, IMHO. :)

Jason S · Accepted Answer · 2017-04-15 02:47:56Z

The short answer:

There are three ways to seed() a random number generator in numpy.random:

1. use no argument or use None -- the RNG initializes itself from the OS's random number generator (which generally is cryptographically random)
2. use some 32-bit integer N -- the RNG will use this to initialize its state based on a deterministic function (same seed → same state)
3. use an array-like sequence of 32-bit integers n₀, n₁, n₂, etc. -- again, the RNG will use this to initialize its state based on a deterministic function (same values for seed → same state). This is intended to be done with a hash function of sorts, although there are magic numbers in the source code and it's not clear why they are doing what they're doing.
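The three forms can be sketched as follows. This uses isolated numpy.random.RandomState instances, whose constructor takes the same seed argument as seed(); the variable names and the particular seed values are arbitrary choices for illustration.

```python
import numpy as np

# No argument (or None): the state comes from the OS, so it differs per run.
rs_os = np.random.RandomState()

# A single integer between 0 and 2**32 - 1.
rs_int_a = np.random.RandomState(12345)
rs_int_b = np.random.RandomState(12345)

# An array-like sequence of such integers.
rs_seq_a = np.random.RandomState([1, 2, 3, 4])
rs_seq_b = np.random.RandomState([1, 2, 3, 4])

# get_state()[1] is the array of 624 Mersenne Twister state words.
same_int = np.array_equal(rs_int_a.get_state()[1], rs_int_b.get_state()[1])
same_seq = np.array_equal(rs_seq_a.get_state()[1], rs_seq_b.get_state()[1])
print(same_int, same_seq)
```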

If you want to do something repeatable and simple, use a single integer.

If you want to do something repeatable but unlikely for a third party to guess, use a tuple or a list or a numpy array containing some sequence of 32-bit integers. You could, for example, use numpy.random with a seed of None to generate a bunch of 32-bit integers (say, 32 of them, for a total of 1024 bits) from the OS's RNG, store them as a seed S which you save in some secret place, then use that seed to generate whatever sequence R of pseudorandom numbers you wish. You can later recreate that sequence by re-seeding with S, and as long as you keep the value of S secret (as well as the generated numbers R), no one would be able to reproduce the sequence R. If you just use a single integer, there are only about 4 billion (2**32) possibilities and someone could potentially try them all. That may be a bit on the paranoid side, but you could do it.
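A rough sketch of that procedure, assuming 32 seed words as in the example above. The names `S`, `R`, and `replay` are only illustrative, and how S is stored secretly is left out.

```python
import numpy as np

# Seed of None: this instance is initialized from the OS's RNG.
os_seeded = np.random.RandomState()

# Draw 32 integers in [0, 2**32), i.e. 32 * 32 = 1024 bits of seed material.
S = os_seeded.randint(0, 2**32, size=32, dtype=np.int64)

# Use S as an array seed and generate the sequence R.
R = np.random.RandomState(S).random_sample(5)

# Later: anyone holding S can recreate exactly the same R.
replay = np.random.RandomState(S).random_sample(5)
print(np.array_equal(R, replay))
```

S changes on every run of this script, so it has to be saved somewhere for the replay to be possible later.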

Longer answer

The numpy.random module uses the Mersenne Twister algorithm, which you can confirm yourself in one of two ways:

Either by looking at the documentation for numpy.random.RandomState, of which numpy.random uses an instance for the numpy.random.* functions (but you can also use an isolated independent instance of)
Looking at the source code in mtrand.pyx which uses something called Pyrex to wrap a fast C implementation, and randomkit.c and initarray.c.

In any case here's what the numpy.random.RandomState documentation says about seed():

Compatibility Guarantee A fixed seed and a fixed series of calls to RandomState methods using the same parameters will always produce the same results up to roundoff error except when the values were incorrect. Incorrect values will be fixed and the NumPy version in which the fix was made will be noted in the relevant docstring. Extension of existing parameter ranges and the addition of new parameters is allowed as long the previous behavior remains unchanged.

Parameters:
seed : {None, int, array_like}, optional

Random seed used to initialize the pseudo-random number generator. Can be any integer between 0 and 2**32 - 1 inclusive, an array (or other sequence) of such integers, or None (the default). If seed is None, then RandomState will try to read data from /dev/urandom (or the Windows analogue) if available or seed from the clock otherwise.

It doesn't say how the seed is used, but if you dig into the source code it refers to the init_by_array function: (docstring elided)

def seed(self, seed=None):
    cdef rk_error errcode
    cdef ndarray obj "arrayObject_obj"
    try:
        if seed is None:
            with self.lock:
                errcode = rk_randomseed(self.internal_state)
        else:
            idx = operator.index(seed)
            if idx > int(2**32 - 1) or idx < 0:
                raise ValueError("Seed must be between 0 and 2**32 - 1")
            with self.lock:
                rk_seed(idx, self.internal_state)
    except TypeError:
        obj = np.asarray(seed).astype(np.int64, casting='safe')
        if ((obj > int(2**32 - 1)) | (obj < 0)).any():
            raise ValueError("Seed must be between 0 and 2**32 - 1")
        obj = obj.astype('L', casting='unsafe')
        with self.lock:
            init_by_array(self.internal_state, <unsigned long *>PyArray_DATA(obj),
                PyArray_DIM(obj, 0))

And here's what the init_by_array function looks like:

extern void
init_by_array(rk_state *self, unsigned long init_key[], npy_intp key_length)
{
    /* was signed in the original code. RDH 12/16/2002 */
    npy_intp i = 1;
    npy_intp j = 0;
    unsigned long *mt = self->key;
    npy_intp k;

    init_genrand(self, 19650218UL);
    k = (RK_STATE_LEN > key_length ? RK_STATE_LEN : key_length);
    for (; k; k--) {
        /* non linear */
        mt[i] = (mt[i] ^ ((mt[i - 1] ^ (mt[i - 1] >> 30)) * 1664525UL))
            + init_key[j] + j;
        /* for > 32 bit machines */
        mt[i] &= 0xffffffffUL;
        i++;
        j++;
        if (i >= RK_STATE_LEN) {
            mt[0] = mt[RK_STATE_LEN - 1];
            i = 1;
        }
        if (j >= key_length) {
            j = 0;
        }
    }
    for (k = RK_STATE_LEN - 1; k; k--) {
        mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1566083941UL))
             - i; /* non linear */
        mt[i] &= 0xffffffffUL; /* for WORDSIZE > 32 machines */
        i++;
        if (i >= RK_STATE_LEN) {
            mt[0] = mt[RK_STATE_LEN - 1];
            i = 1;
        }
    }

    mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */
    self->gauss = 0;
    self->has_gauss = 0;
    self->has_binomial = 0;
}

This essentially "munges" the random number state in a nonlinear, hash-like method using each value within the provided sequence of seed values.
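For experimenting with this, here is a pure-Python transcription of the C function above. The `init_genrand` helper is not shown in this answer; the version below follows the reference Mersenne Twister initialization and is an assumption about what numpy's helper does. The last line compares the result with the state words numpy reports.

```python
import numpy as np

RK_STATE_LEN = 624
MASK = 0xFFFFFFFF  # keep every word to 32 bits, as the C code does


def init_genrand(s):
    # Assumed helper: reference MT19937 initialization from one integer.
    mt = [0] * RK_STATE_LEN
    mt[0] = s & MASK
    for i in range(1, RK_STATE_LEN):
        mt[i] = (1812433253 * (mt[i - 1] ^ (mt[i - 1] >> 30)) + i) & MASK
    return mt


def init_by_array(init_key):
    mt = init_genrand(19650218)
    key_length = len(init_key)
    i, j = 1, 0
    # First pass: mix every key word into the state.
    for _ in range(max(RK_STATE_LEN, key_length)):
        prev = mt[i - 1] ^ (mt[i - 1] >> 30)
        mt[i] = ((mt[i] ^ (prev * 1664525)) + init_key[j] + j) & MASK
        i += 1
        j += 1
        if i >= RK_STATE_LEN:
            mt[0] = mt[RK_STATE_LEN - 1]
            i = 1
        if j >= key_length:
            j = 0
    # Second pass: scramble once more without the key.
    for _ in range(RK_STATE_LEN - 1):
        prev = mt[i - 1] ^ (mt[i - 1] >> 30)
        mt[i] = ((mt[i] ^ (prev * 1566083941)) - i) & MASK
        i += 1
        if i >= RK_STATE_LEN:
            mt[0] = mt[RK_STATE_LEN - 1]
            i = 1
    mt[0] = 0x80000000  # MSB is 1, so the state is never all zero
    return mt


key = [1, 2, 3]
numpy_words = [int(w) for w in np.random.RandomState(key).get_state()[1]]
print(init_by_array(key) == numpy_words)
```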

Arshdeep Singh · Accepted Answer · 2017-07-09 10:42:07Z

5

What is normally called a random number sequence in reality is a "pseudo-random" number sequence because the values are computed using a deterministic algorithm and probability plays no real role.

The "seed" is a starting point for the sequence and the guarantee is that if you start from the same seed you will get the same sequence of numbers. This is very useful for example for debugging (when you are looking for an error in a program you need to be able to reproduce the problem and study it, a non-deterministic program would be much harder to debug because every run would be different).
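As a small illustration of that guarantee, the sketch below runs a stand-in computation twice with the same seed. The function `noisy_measurement` is hypothetical and only plays the role of a program that depends on random numbers.

```python
import numpy as np


def noisy_measurement(seed):
    # Hypothetical stand-in for code whose output depends on random draws.
    rng = np.random.RandomState(seed)
    return rng.normal(loc=10.0, scale=2.0, size=4)


run1 = noisy_measurement(101)
run2 = noisy_measurement(101)   # same seed: identical values, run after run
other = noisy_measurement(1)    # different seed: a different sequence

print(np.array_equal(run1, run2))
print(np.array_equal(run1, other))
```

When a failure shows up with a particular seed, rerunning with that seed reproduces the same random draws, so the failure can be studied.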

answered Jul 9, 2017 at 10:42

Arshdeep Singh


sjwarner · Accepted Answer · 2016-04-25 17:41:59Z

0

Basically the number guarantees the same 'randomness' every time.

More properly, the number is a seed, which can be an integer, an array (or other sequence) of integers of any length, or the default (None). If the seed is None, the generator will try to read data from /dev/urandom if available, or make a seed from the clock otherwise.

Edit: In all honesty, as long as your program isn't something that needs to be super secure, it shouldn't matter what you pick. If it does need to be secure, don't use these methods; use os.urandom() or SystemRandom if you require a cryptographically secure pseudo-random number generator.
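A brief sketch of those two alternatives from the Python standard library. The variable names and the sizes (16 bytes, a six-digit number) are arbitrary choices for illustration.

```python
import os
import random

# 16 bytes straight from the operating system's random source.
token = os.urandom(16)

# SystemRandom offers the random-module interface on top of os.urandom();
# it has no reproducible state, so seeding it has no effect.
secure = random.SystemRandom()
pin = secure.randrange(10**6)

print(token.hex())
print(pin)
```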

The most important concept to understand here is that of pseudo-randomness. Once you understand this idea, you can determine if your program really needs a seed etc. I'd recommend reading here.

edited Apr 25, 2016 at 17:41

answered Apr 25, 2016 at 17:21

sjwarner


2 Comments

en_Knight Over a year ago

This answer seems incomplete to me. Is it okay for me to write "random.seed(random.random())"? Is 7 better than 4? Is it just going to hash the array? Basically, what's going on here? "How do you choose the number" seems like the most important line in the question, imo

Jason S Over a year ago

@en_Knight -- I just attempted to give this kind of detail in my answer.

Community · Accepted Answer · 2017-05-23 12:25:39Z

0

To understand random seeds, you first need to understand that a "random" number sequence is really "pseudo-random": the values are computed using a deterministic algorithm.

So you can think of this number as a starting value used to calculate the next number you get from the random generator. Putting the same value here makes your program get the same "random" values every time, so your program becomes deterministic.

As said in this post

they (numpy.random and random.random) both use the Mersenne twister sequence to generate their random numbers, and they're both completely deterministic - that is, if you know a few key bits of information, it's possible to predict with absolute certainty what number will come next.

If you really care about randomness, ask the user to generate some noise (some arbitrary words) or just use the system time as the seed.

If your code runs on an Intel CPU (or a recent AMD chip), I also suggest checking the RdRand package, which uses the CPU instruction rdrand to collect "true" (hardware) randomness.


edited May 23, 2017 at 12:25

CommunityBot


answered Apr 25, 2016 at 17:38

gdlmx


5 Comments

en_Knight Over a year ago

This seems important: "If you really care about randomness, ask the user to generate some noise (some arbitrary words) or just put the system time as seed" but I think I buy only half of that. Seems like users might "randomly" pick similar common words every time - people aren't great at being random. Time is pretty solid advice. But there isn't enough here for me to know what to do - e.g., based on what you've written it seems wise for me to continue resetting the seed in session to ensure super unpredictable randomness, when in fact doing so would be counterproductive

gdlmx Over a year ago

I didn't suggest that continually resetting the seed is good. On the contrary, I think you should set the seed only once. The key point is to prevent an attacker from guessing your random numbers. If someone can guess your seed the first time, then I assume he/she can also work out the rest.

en_Knight Over a year ago

I don't see any security tag.. numpy is used for modelling and simulation a lot, and if that's the case then maybe we don't care about attackers as much as being able to reproduce our experiments, right? My comment about "continue resetting" wasn't that you explicitly advocate for it, but that there isn't a lot of information here about how the seeds actually work. Imo, if the question asks how to choose a seed, there should be enough info for me to know how to set a seed and what things are bad to do, and there isn't anything here to dispel that common mistake, for example.

en_Knight Over a year ago

I think this is actually a pretty good answer, I'm just looking for a little more before I upvote : )

gdlmx Over a year ago

I agree the context of this question should be scientific computing so I add the RdRand package here.

rsmith54 · Accepted Answer · 2020-01-31 15:19:30Z

0

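One very specific answer: np.random.seed can take integer values from 0 to 2**32 - 1 inclusive, which interestingly differs from random.seed, which can take any hashable object (since Python 3.11, random.seed accepts only None, int, float, str, bytes, or bytearray).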
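A quick way to check the integer limits yourself. The helper name `seed_is_accepted` is made up for this sketch, and it seeds a throwaway RandomState (which applies the same range check as np.random.seed) so the global generator is left untouched.

```python
import numpy as np


def seed_is_accepted(value):
    # Hypothetical helper: True if numpy accepts the value as a seed.
    try:
        np.random.RandomState(value)
        return True
    except (ValueError, TypeError):
        return False


candidates = (0, 2**32 - 1, 2**32, -1)
results = {v: seed_is_accepted(v) for v in candidates}
print(results)
```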

answered Jan 31, 2020 at 15:19

rsmith54


Yiwei Jiang · Accepted Answer · 2021-10-07 14:38:22Z

0

A side comment: it is better to set your seed to a fairly large number that is still within the generator's limit. That gives the seed a good balance of 0 and 1 bits. Avoid having many 0 bits in the seed.

Reference: PyTorch documentation

answered Oct 7, 2021 at 14:38

Yiwei Jiang

