I'm using grid search to fit machine learning model parameters.

I typed in the following code (modified from the sklearn documentation page, http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html):

from sklearn import svm, grid_search, datasets, cross_validation

# getting data
iris = datasets.load_iris()

# grid of parameters
parameters = {'kernel':('linear', 'poly'), 'C':[1, 10]}

# predictive model (support vector machine)
svr = svm.SVC()

# cross validation procedure
mycv = cross_validation.StratifiedKFold(iris.target, n_folds = 2)

# grid search engine
clf = grid_search.GridSearchCV(svr, parameters, cv=mycv)  # pass cv by keyword; the third positional argument is not cv

# fitting engine
clf.fit(iris.data, iris.target)

However, when I look at clf.estimator, I get the following:

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

How did I end up with an 'rbf' kernel? I didn't specify it as an option in my parameters.

What's going on?

Thanks!

P.S. I'm using sklearn version '0.15-git'.

Addendum: I noticed that clf.best_estimator_ gives the right output. So what is clf.estimator doing?

4
  • I believe in your parameters dictionary the kernel key should have a list as its values. i.e. ['linear', 'poly'] (square brackets). rbf just showed up because it is the default. Commented Jun 6, 2014 at 18:12
  • Thanks. So clf.estimator doesn't really do anything? It's more like a placeholder for default values? Commented Jun 6, 2014 at 19:53
  • estimator is an object of the GridSearchCV class. If you create an instance of this class, i.e. clf, .estimator will return the object and in this case, since your initial code was erroneous, it returned the default. Commented Jun 6, 2014 at 20:08
  • Got it! Thanks! Although fixing the code to 'kernel':['linear', 'poly'] still returns kernel='rbf' for the clf.estimator attribute. Commented Jun 6, 2014 at 21:09

2 Answers 2

1

clf.estimator is simply a copy of the estimator passed as the first argument to the GridSearchCV object. Any parameters not grid searched over are determined by this estimator. Since you did not explicitly set any parameters for the SVC object svr, it was given all default values. Therefore, because clf.estimator is just a copy of svr, printing the value of clf.estimator returns an SVC object with default parameters. Had you instead written, e.g.,

svr = svm.SVC(C=4.3)

then the value of clf.estimator would have been:

SVC(C=4.3, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

There is no real value to the user in accessing clf.estimator, but then again it wasn't meant to be a public attribute anyways (since it doesn't end with a "_").
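The difference between the two attributes can be checked directly. This is only a minimal sketch, and it assumes a recent scikit-learn release in which GridSearchCV and StratifiedKFold are imported from sklearn.model_selection (the grid_search and cross_validation modules used in the question belong to older versions). The parameter grid is the one from the question; the names estimator_params and best_params are just illustrative.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, StratifiedKFold

iris = datasets.load_iris()

# same grid as in the question
parameters = {"kernel": ["linear", "poly"], "C": [1, 10]}

# template estimator: C is set explicitly, kernel is left at its default
svr = svm.SVC(C=4.3)

# in recent versions the splitter takes n_splits and no longer needs the labels
mycv = StratifiedKFold(n_splits=2)

clf = GridSearchCV(svr, parameters, cv=mycv)
clf.fit(iris.data, iris.target)

# clf.estimator still carries the template's parameters, untouched by the search
estimator_params = clf.estimator.get_params()
print(estimator_params["kernel"], estimator_params["C"])

# clf.best_estimator_ is refit with the best combination found in the grid
best_params = clf.best_estimator_.get_params()
print(best_params["kernel"], best_params["C"])
print(clf.best_params_)
```

Here clf.estimator keeps kernel='rbf' and C=4.3, while clf.best_estimator_ has a kernel and a C taken from the grid. The fitted model to use for prediction is clf.best_estimator_ (or clf itself, which delegates predict to it when refit is left enabled).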


2 Comments

Interesting. Thanks!
It's certainly meant to be a public attribute since it doesn't start with _. Attributes whose names end in _ are estimated by fit. So both estimator and best_estimator_ are public. The latter is the "result" of the grid search.
0

Grid Search is a hyperparameter tuning method that systematically tests multiple parameter combinations to find the best model configuration. In Scikit-Learn, this can be done using GridSearchCV.
For a more detailed explanation, you can check this video: https://youtu.be/819tMzaZ94s

