I am testing some possibilities around random_state. Can you explain how random_state = 0 and random_state = numpy.random.RandomState(0) differ from each other?

Code

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import numpy as np

for i in range(5):

    # random_state = numpy.random.RandomState(i): one generator shared by all three calls
    rng = np.random.RandomState(i)
    X, y = make_classification(random_state=rng)
    rf = RandomForestClassifier(random_state=rng)
    X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                        random_state=rng)
    p1 = rf.fit(X_train, y_train).score(X_test, y_test)

    # random_state = integer: the same seed i is passed to each call
    X, y = make_classification(random_state=i)
    rf = RandomForestClassifier(random_state=i)
    X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                        random_state=i)
    p2 = rf.fit(X_train, y_train).score(X_test, y_test)
    print(i, p1, p2)

Output

0  0.84   0.92
1  1.0    0.92
2  0.88   0.92
3  0.84   0.88
4  1.0    1.0
  • I think random_state = 0 reseeds with the same value on every call, so you will always get the same results. But rng = np.random.RandomState(i) produces new random values each time you use rng. Try using the same rng to create two trees and you will get different results. Try using the same i to create two trees and you will get the same results. Commented Aug 24, 2022 at 12:05

1 Answer

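Setting random_state = 1 uses a fixed seed (here 1). Every call that receives it, such as the train/test split, starts from the same generator state and therefore gives the same result each time.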

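Setting random_state = np.random.RandomState(1) passes a generator object that was seeded with 1. Each call that draws from this instance advances its internal state, so the next call that receives the same instance continues from where the previous one stopped. Calling the same function twice with the same instance therefore gives different splits, although rerunning the whole script from the start still replays the same sequence. In the question's code, make_classification, train_test_split and the forest's fit all draw from one shared stream, whereas with an integer each of them starts from the same seed; this is why the two columns differ.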
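
A minimal sketch of this behaviour, using train_test_split on a small toy array (the array and the variable names are made up for illustration only):

```python
import numpy as np
from sklearn.model_selection import train_test_split

data = np.arange(20)  # toy data, purely illustrative

# Integer seed: every call builds a fresh generator from the same seed,
# so two calls give the same split.
_, test_a = train_test_split(data, random_state=1)
_, test_b = train_test_split(data, random_state=1)
int_seed_repeats = np.array_equal(test_a, test_b)

# Shared RandomState instance: each call advances the generator's state,
# so the second split will generally differ from the first.
rng = np.random.RandomState(1)
_, test_c = train_test_split(data, random_state=rng)
_, test_d = train_test_split(data, random_state=rng)

# Re-creating the instance with the same seed replays the same sequence.
rng = np.random.RandomState(1)
_, test_e = train_test_split(data, random_state=rng)
_, test_f = train_test_split(data, random_state=rng)
sequence_repeats = (np.array_equal(test_c, test_e)
                    and np.array_equal(test_d, test_f))

print(int_seed_repeats, sequence_repeats)
```

The first split drawn from a fresh RandomState(1) should match the split obtained with random_state=1; only the later calls on the shared instance drift away from it.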

Use a plain integer as random_state if you want repeatable splits, or leave it unset (None, the default) to get a different random split on each run.

Passing a RandomState instance makes sense when you want several steps to draw from a single seeded stream, so that each step gets different random numbers while the script as a whole stays reproducible. See the official numpy docs about it.
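
To see how the two kinds of argument are treated, one can look at the public helper sklearn.utils.check_random_state, which turns a random_state argument into a generator. The snippet below is only a small sketch of that conversion; the variable names are illustrative:

```python
import numpy as np
from sklearn.utils import check_random_state

# An integer is turned into a brand-new generator on every call,
# so both generators start from the same state.
gen_a = check_random_state(0)
gen_b = check_random_state(0)
separate_objects = gen_a is not gen_b
same_first_draw = gen_a.randint(1000) == gen_b.randint(1000)

# An existing instance is passed through unchanged, so whoever
# receives it shares (and advances) the same state.
rng = np.random.RandomState(0)
same_object = check_random_state(rng) is rng

print(separate_objects, same_first_draw, same_object)
```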
