Python scikit-learn (using grid_search.GridSearchCV)

Question

I'm using grid search to fit machine learning model parameters.

I typed in the following code (modified from the sklearn documentation page: http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html):

from sklearn import svm, grid_search, datasets, cross_validation

# getting data
iris = datasets.load_iris()

# grid of parameters
parameters = {'kernel':('linear', 'poly'), 'C':[1, 10]}

# predictive model (support vector machine)
svr = svm.SVC()

# cross validation procedure
mycv = cross_validation.StratifiedKFold(iris.target, n_folds = 2)

# grid search engine
clf = grid_search.GridSearchCV(svr, parameters, cv=mycv)

# fitting engine
clf.fit(iris.data, iris.target)

However, when I look at clf.estimator, I get the following:

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

How did I end up with a 'rbf' kernel? I didn't specify it as an option in my parameters.

What's going on?

Thanks!

P.S. I'm using version '0.15-git' of sklearn.

Addendum: I noticed that clf.best_estimator_ gives the right output. So what is clf.estimator doing?

I believe in your parameters dictionary the kernel key should have a list as its values. i.e. ['linear', 'poly'] (square brackets). rbf just showed up because it is the default. — o-90
– o-90, Commented Jun 6, 2014 at 18:12
Thanks. So clf.estimator doesn't really do anything? It's more like a placeholder for default values? — monkeybiz7
– monkeybiz7, Commented Jun 6, 2014 at 19:53
estimator is an object of the GridSearchCV class. If you create an instance of this class, i.e. clf, .estimator will return the object and in this case, since your initial code was erroneous, it returned the default. — o-90
– o-90, Commented Jun 6, 2014 at 20:08
Got it! Thanks! Although fixing the code to 'kernel':['linear', 'poly'] still returns kernel='rbf' for the clf.estimator attribute. — monkeybiz7
– monkeybiz7, Commented Jun 6, 2014 at 21:09

DavidS · Accepted Answer · 2015-01-26 19:31:46Z

1

clf.estimator is simply the estimator passed as the first argument to the GridSearchCV object; it is stored as given and is not modified by the search. Any parameters not grid searched over are determined by this estimator. Since you did not explicitly set any parameters for the SVC object svr, it was given all default values. Therefore, because clf.estimator is just svr, printing clf.estimator shows an SVC object with default parameters, including kernel='rbf'. Had you instead written, e.g.,

svr = svm.SVC(C=4.3)

then the value of clf.estimator would have been:

SVC(C=4.3, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

There is no real value to the user in accessing clf.estimator, but then again it wasn't meant to be a public attribute anyways (since it doesn't end with a "_").
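
A minimal sketch of the difference between the two attributes. It assumes a recent scikit-learn, where GridSearchCV and StratifiedKFold are imported from sklearn.model_selection (the sklearn.grid_search and sklearn.cross_validation modules used in the question belong to older releases), so treat it as an illustration rather than a drop-in for version 0.15:

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, StratifiedKFold

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'poly'), 'C': [1, 10]}

svr = svm.SVC()                      # all defaults, so kernel='rbf'
mycv = StratifiedKFold(n_splits=2)   # assumption: modern API takes n_splits, not the labels
clf = GridSearchCV(svr, parameters, cv=mycv)
clf.fit(iris.data, iris.target)

# clf.estimator is the very object that was passed in
print(clf.estimator is svr)
print(clf.estimator.kernel)          # the SVC default, not a grid value

# the outcome of the search lives in the fitted attributes
print(clf.best_params_)
print(clf.best_estimator_.kernel)    # one of the kernels from the grid
```

The kernel reported by clf.estimator stays at the default because the search works on clones of svr and never changes svr itself.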

edited Jan 26, 2015 at 19:31

answered Jun 6, 2014 at 22:43

DavidS


2 Comments

monkeybiz7 Over a year ago

Interesting. Thanks!

Fred Foo Over a year ago

It's certainly meant to be a public attribute since it doesn't start with _. Attributes whose names end in _ are estimated by fit. So both estimator and best_estimator_ are public. The latter is the "result" of the grid search.
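
The naming convention described in this comment can be checked directly. This is a small sketch, again assuming a recent scikit-learn with sklearn.model_selection; the variable names are only illustrative:

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
clf = GridSearchCV(svm.SVC(), {'C': [1, 10]}, cv=2)

# constructor parameters such as estimator exist right away
has_estimator_before_fit = hasattr(clf, 'estimator')

# attributes with a trailing underscore only appear once fit has run
has_best_before_fit = hasattr(clf, 'best_estimator_')
clf.fit(iris.data, iris.target)
has_best_after_fit = hasattr(clf, 'best_estimator_')

print(has_estimator_before_fit, has_best_before_fit, has_best_after_fit)
```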

Stacy Martin · Accepted Answer · 2025-02-25 11:46:01Z

0

Grid Search is a hyperparameter tuning method that systematically tests multiple parameter combinations to find the best model configuration. In Scikit-Learn, this can be done using GridSearchCV.
For a more detailed explanation, you can check this video: https://youtu.be/819tMzaZ94s
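
To see which combinations a grid like the one in the question expands to, one option is scikit-learn's ParameterGrid helper. It is not mentioned elsewhere in this thread; the sketch assumes a recent scikit-learn where it is importable from sklearn.model_selection:

```python
from sklearn.model_selection import ParameterGrid

# same grid as in the question; a tuple of values works as well as a list
parameters = {'kernel': ('linear', 'poly'), 'C': [1, 10]}

grid = list(ParameterGrid(parameters))
for combo in grid:
    print(combo)

print(len(grid))  # 2 kernels x 2 values of C = 4 combinations
```

Each of these dictionaries is applied to a clone of the base estimator, which is why 'rbf' never appears among the candidates.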

answered Feb 25 at 11:46

Stacy Martin

1 Comment

Community Feb 26 at 9:27

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
