I am testing some possibilities around random_state. Can you explain how random_state = 0 and random_state = numpy.random.RandomState(0) differ from each other?
Code
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import numpy as np
for i in range(5):
    ########### code for random_state = numpy.random.RandomState(i) ##############
    # One generator object is shared by all three calls below; every call
    # that uses it draws from it and advances its state.
    rng = np.random.RandomState(i)
    X, y = make_classification(random_state=rng)
    rf = RandomForestClassifier(random_state=rng)
    X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                        random_state=rng)
    p1 = rf.fit(X_train, y_train).score(X_test, y_test)

    ########### code for random_state = integer ##############
    # Each call seeds its own fresh generator with the same integer i.
    X, y = make_classification(random_state=i)
    rf = RandomForestClassifier(random_state=i)
    X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                        random_state=i)
    p2 = rf.fit(X_train, y_train).score(X_test, y_test)

    print(i, p1, p2)
Output
0 0.84 0.92
1 1.0 0.92
2 0.88 0.92
3 0.84 0.88
4 1.0 1.0
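Passing an integer such as random_state = 0 makes every function or estimator create its own fresh generator seeded with that value, so each call draws exactly the same random numbers and you always get the same results. Passing rng = np.random.RandomState(i) instead hands every call the same generator object: each call that uses rng draws numbers from it and advances its state, so later calls see different values than earlier ones. That is why p1 and p2 usually differ in your loop: train_test_split and the forest's fit draw from a generator whose state has already been advanced by the earlier calls, instead of restarting from seed i. The RandomState version is still reproducible from one run of the whole script to the next, because rng itself is seeded. Try using the same rng to fit two forests and you get different results; use the same i for both and you get identical results.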
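A minimal sketch of the mechanism, using only NumPy. The helpers as_generator and draw are hypothetical: as_generator mimics what scikit-learn's sklearn.utils.check_random_state does (an integer becomes a brand-new RandomState seeded with it, an existing RandomState is returned unchanged), and draw stands in for any call that consumes randomness, such as fitting a forest.

```python
import numbers

import numpy as np


def as_generator(random_state):
    """Hypothetical helper mimicking sklearn.utils.check_random_state."""
    if isinstance(random_state, numbers.Integral):
        # An integer seeds a brand-new generator on every call.
        return np.random.RandomState(random_state)
    if isinstance(random_state, np.random.RandomState):
        # An existing generator is reused, so its state keeps advancing.
        return random_state
    raise ValueError(f"cannot use {random_state!r} as a random_state")


def draw(random_state):
    """Hypothetical stand-in for any call that consumes randomness."""
    return as_generator(random_state).randint(0, 1000, size=5)


# Same integer twice: each call restarts from seed 0, so the draws match.
int_first, int_second = draw(0), draw(0)

# Same RandomState twice: the second call continues where the first stopped.
rng = np.random.RandomState(0)
rng_first, rng_second = draw(rng), draw(rng)

print(np.array_equal(int_first, int_second))   # True
# The first draw from a fresh RandomState(0) matches the integer draw...
print(np.array_equal(int_first, rng_first))    # True
# ...but the next draw from the shared generator does not.
print(np.array_equal(int_second, rng_second))  # False

# Re-seeding replays the whole sequence, so a script that creates
# rng = RandomState(0) once is still reproducible from run to run.
rng_again = np.random.RandomState(0)
replay = [draw(rng_again), draw(rng_again)]
print(np.array_equal(replay[0], rng_first)
      and np.array_equal(replay[1], rng_second))  # True
```

Swapping draw for something like RandomForestClassifier(random_state=...).fit(X_train, y_train) should show the same pattern: two fits given the same integer agree, while two fits sharing one rng generally do not.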