6

I have a dataset named "admissions".

I am trying to carry out holdout validation on a simple dataset. In order to carry out permutation on the index of the dataset, I use the following command:

import numpy as np
np.random.permutation(admissions.index)

Do I need to call np.random.seed() before the permutation? If so, why, and what does the number in np.random.seed(number) represent?

4
  • 3
    check it here: docs.scipy.org/doc/numpy/reference/generated/… Commented Nov 25, 2016 at 16:09
  • 6
    If you want to be able to repeat the experiment with exactly the same permutation (say for debugging purposes), you need to set a reproducible seed. If you don't need to be able to repeat, then you can skip the explicit seeding part. If you do set an explicit seed for debugging, remove it when you are done debugging. Commented Nov 25, 2016 at 16:12
  • 3
    Possible duplicate of random.seed(): What does it do? Commented Nov 25, 2016 at 16:21
  • C.f. stackoverflow.com/questions/5836335/… Commented Nov 25, 2016 at 19:48

2 Answers 2

7

You don't need to set the seed before calling the random permutation, because NumPy already seeds its generator for you. According to the documentation of RandomState:

Parameters:
seed : {None, int, array_like}, optional Random seed initializing the pseudo-random number generator. Can be an integer, an array (or other sequence) of integers of any length, or None (the default). If seed is None, then RandomState will try to read data from /dev/urandom (or the Windows analogue) if available or seed from the clock otherwise.

The concept of seed is relevant for the generation of random numbers. You can read more about it here.
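As a minimal sketch of what the quoted parameter description means in practice (the seed value 42 and the variable names are arbitrary choices for illustration, not taken from the question): two generators built from the same integer seed produce the same permutation, while generators built with the default seed=None draw fresh entropy and almost surely differ.

```python
import numpy as np

# Two independent generators built from the same integer seed
# (42 is an arbitrary illustrative value).
a = np.random.RandomState(42)
b = np.random.RandomState(42)
same = np.array_equal(a.permutation(10), b.permutation(10))

# With seed=None (the default), each generator is seeded from
# operating-system entropy, so their streams almost surely differ.
c = np.random.RandomState()
d = np.random.RandomState()
differs = not np.array_equal(c.random_sample(8), d.random_sample(8))

print(same, differs)
```

So the number passed as the seed has no meaning of its own; it only selects which reproducible sequence the generator will emit.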

To integrate this answer with a comment (from JohnColeman) to your question, I want to mention this example:

>>> import numpy
>>> numpy.random.seed(0)
>>> numpy.random.permutation(4)
array([2, 3, 1, 0])
>>> numpy.random.seed(0)
>>> numpy.random.permutation(4)
array([2, 3, 1, 0])

4

Note that np.random.seed is a legacy function, kept around only for backwards compatibility; NumPy's documentation recommends a dedicated Generator instance for new code. That's because re-seeding an existing, shared random-number generator (RNG) is bad practice. If you need to seed (e.g., to make computations reproducible for tests), create a new RNG:

import numpy as np


rng = np.random.default_rng(seed=0)
out = rng.random(5)
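Applied to the holdout split from the question, this could look roughly like the sketch below. It assumes a 10-row dataset and uses np.arange(10) as a hypothetical stand-in for admissions.index; the 20% test fraction and the seed value are likewise arbitrary.

```python
import numpy as np

# Hypothetical stand-in for admissions.index (assumes a 10-row dataset).
index = np.arange(10)

# A dedicated generator; the seed value 0 is arbitrary.
rng = np.random.default_rng(seed=0)
shuffled = rng.permutation(index)

# Hold out the last 20% of the shuffled labels as the test set.
n_test = int(0.2 * len(shuffled))
train_idx, test_idx = shuffled[:-n_test], shuffled[-n_test:]
```

With a real DataFrame, the two label arrays could then be passed to admissions.loc[train_idx] and admissions.loc[test_idx]. Creating a new generator with the same seed reproduces the same split; dropping the seed argument gives a different split on each run.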
