How do random_state=0 and random_state=numpy.random.RandomState(0) differ from each other?

Question

I am testing some possibilities around random_state. Can you explain how random_state=0 and random_state=numpy.random.RandomState(0) differ from each other?

Code

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import numpy as np

for i in range(5):

    # --- random_state = numpy.random.RandomState(i) ---
    # One generator instance is shared by all three calls below.
    rng = np.random.RandomState(i)
    X, y = make_classification(random_state=rng)
    rf = RandomForestClassifier(random_state=rng)
    X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                        random_state=rng)
    p1 = rf.fit(X_train, y_train).score(X_test, y_test)

    # --- random_state = integer ---
    # Each call is seeded independently with the same integer i.
    X, y = make_classification(random_state=i)
    rf = RandomForestClassifier(random_state=i)
    X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                        random_state=i)
    p2 = rf.fit(X_train, y_train).score(X_test, y_test)
    print(i, p1, p2)

Output

0  0.84   0.92
1  1.0    0.92
2  0.88   0.92
3  0.84   0.88
4  1.0    1.0

I think random_state=0 just fixes the seed, so you will always get the same results. An rng = np.random.RandomState(i) instance, on the other hand, produces new random values every time you use it. Try using the same rng to create two trees and you will get different results. Try using the same i to create two trees and you will get the same results.
– furas, Commented Aug 24, 2022 at 12:05
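
The experiment suggested in this comment can be sketched roughly as follows. This is only an illustration, not the asker's code: the dataset size, `n_estimators=10`, and the helper name `fit_proba` are assumptions made for the example, and a small forest stands in for "two trees".

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic dataset; the integer seed makes it identical on every run.
X, y = make_classification(n_samples=200, random_state=0)


def fit_proba(random_state):
    """Hypothetical helper: fit a small forest, return its probabilities on X."""
    forest = RandomForestClassifier(n_estimators=10, random_state=random_state)
    return forest.fit(X, y).predict_proba(X)


# Same integer twice: each forest is seeded from scratch, so both fits match.
same_with_int = np.array_equal(fit_proba(0), fit_proba(0))

# Same RandomState instance twice: the first fit advances the generator,
# so the second fit draws different numbers and builds a different forest.
rng = np.random.RandomState(0)
same_with_rng = np.array_equal(fit_proba(rng), fit_proba(rng))

print("integer seed, identical forests:", same_with_int)
print("shared RandomState, identical forests:", same_with_rng)
```

The comparison uses predicted probabilities rather than scores, since two different forests can still happen to reach the same accuracy.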

francand · Accepted Answer · 2022-08-24 12:45:46Z

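Setting random_state=1 passes a fixed seed (here, 1). Each function or estimator that receives it, such as train_test_split, seeds its own fresh generator from that integer, so it produces the same result on every call.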

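Setting random_state=np.random.RandomState(1) passes a generator object that was seeded with 1. Its internal state advances every time it is used, so each call that receives the same instance (for example, each split) draws different numbers and gives a different result. The sequence itself is still deterministic: rerunning the script from the start reproduces it, but repeated calls within one run do not repeat each other. In the question's loop, make_classification, train_test_split and the forest all consume the same stream in turn, which is why the first column differs from the integer-seeded one.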
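
A minimal sketch of this behaviour with train_test_split is shown below. The toy array and the helper name `held_out` are assumptions made for the example. It checks three things: an integer seed repeats the split, one shared instance gives a different split on each call, and rebuilding the instance from the same seed replays the same sequence of splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

data = np.arange(100)  # toy data: the values double as row indices


def held_out(random_state):
    """Hypothetical helper: return the test portion of one split."""
    _, test = train_test_split(data, random_state=random_state)
    return test


# Integer seed: every call starts from the same state, so the split repeats.
int_repeats = np.array_equal(held_out(1), held_out(1))

# One RandomState instance: each call consumes part of the stream,
# so consecutive splits differ from each other.
rng = np.random.RandomState(1)
first, second = held_out(rng), held_out(rng)
rng_repeats = np.array_equal(first, second)

# Rebuilding the instance from the same seed replays the same sequence,
# so the script as a whole is still reproducible.
rng_again = np.random.RandomState(1)
replayed = (np.array_equal(first, held_out(rng_again))
            and np.array_equal(second, held_out(rng_again)))

print(int_repeats, rng_repeats, replayed)
```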

Pass an integer as random_state if you want repeatable splits, or leave it unset (None) to get a different split on every run.

Passing a RandomState instance makes sense when you want several calls to draw from one seeded stream, for example so that repeated splits or fits differ from each other while the script as a whole stays reproducible. See the official NumPy docs for details.
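
One way to see where the difference comes from is scikit-learn's public helper `sklearn.utils.check_random_state`, which, as far as I know, is what estimators and splitters use to interpret the random_state argument. The sketch below only inspects that helper; the variable names are made up for the example.

```python
import numpy as np
from sklearn.utils import check_random_state

# An integer is turned into a brand-new generator on every call,
# so each consumer starts from the same state.
a = check_random_state(0)
b = check_random_state(0)
fresh_each_time = a is not b
same_first_draw = a.randint(1000) == b.randint(1000)

# An existing RandomState instance is returned unchanged, so every
# consumer shares (and advances) the same stream.
rng = np.random.RandomState(0)
shared = check_random_state(rng) is rng

print(fresh_each_time, same_first_draw, shared)
```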
