Difference between .score() and .predict in the sklearn library?

Question

I have instantiated an SVC object using the sklearn library with the following code:

clf = svm.SVC(kernel='linear', C=1, cache_size=1000, max_iter = -1, verbose = True)

I then fit data to it using:

model = clf.fit(X_train, y_train)

where X_train is a (301, 60) ndarray and y_train is a (301,) ndarray (y_train consists of the class labels "1", "2" and "3").

Before I stumbled across the .score() method, I was computing the accuracy of my model on the training set as follows:

prediction = np.divide((y_train == model.predict(X_train)).sum(), y_train.size, dtype = float)

which gives a result of approximately 62%.

However, when using the model.score(X_train, y_train) method I get a result of approximately 83%.

Could anyone explain why this is the case? As far as I understand, the two should return the same result.

ADDENDUM:

The first 10 values of y_true are:

2, 3, 1, 3, 2, 3, 2, 2, 3, 1, ...

Whereas for y_pred (when using model.predict(X_train)), they are:

2, 3, 3, 2, 2, 3, 2, 3, 3, 3, ...

That's weird, can you post some subset of your data (at least some y_true and y_pred values)?
– elyase, Commented Jan 22, 2015 at 23:32

Andreas Mueller · Accepted Answer · 2015-01-23 03:22:56Z

6

Because your y_train has shape (301, 1) and not (301,), numpy broadcasts the comparison, so

(y_train == model.predict(X_train)).shape == (301, 301)

which is not what you intended. The correct version of your code would be

np.mean(y_train.ravel() == model.predict(X_train))

which will give the same result as

model.score(X_train, y_train)
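As a minimal sketch of the broadcasting effect described above, the snippet below reuses the first five y_true and y_pred values from the question's addendum; the variable names and the (5, 1) column shape are assumptions made for illustration, not the asker's actual arrays.

```python
import numpy as np

# First five labels and predictions from the addendum; the column shape
# of the labels is an assumption made to reproduce the pitfall.
y_column = np.array([[2], [3], [1], [3], [2]])   # shape (5, 1)
y_pred = np.array([2, 3, 3, 2, 2])               # shape (5,)

# Comparing (5, 1) with (5,) broadcasts to a (5, 5) matrix of all pairs.
pairwise = (y_column == y_pred)
print(pairwise.shape)            # (5, 5)

# Flattening the labels first gives the intended element-wise comparison.
elementwise = (y_column.ravel() == y_pred)
print(elementwise.shape)         # (5,)

# Summing the pairwise matrix counts matches across all pairs, so the
# resulting "accuracy" is not a fraction of correct predictions.
wrong_accuracy = pairwise.sum() / y_column.size
right_accuracy = np.mean(elementwise)
print(wrong_accuracy, right_accuracy)
```

Checking the shape of the boolean array before averaging it is a cheap way to catch this kind of mismatch.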


7 Comments

precicely Over a year ago

Unfortunately, I was incorrect when stating the question: y_train is in fact a (301,) array. My mistake (the question has been edited)!

precicely Over a year ago

That being said, when using np.mean(y_train.ravel() == model.predict(X_train)) I still get a training accuracy of 60ish percent. :(

Andreas Mueller Over a year ago

What are the shape and dtype of y_train, X_train, model.predict(X_train) and y_train == model.predict(X_train)?

precicely Over a year ago

y_train : (301,) int64; X_train : (301, 60) float64; model.predict(X_train) : (301,) int64; y_train == model.predict(X_train) : bool; Does that help?

precicely Over a year ago

Turns out, due to a nuance in the way I was handling my data set, X_train was slightly modified between the two function calls, hence the discrepancy in the accuracy results. Thank you for your help and I apologise for sending you on a wild goose chase. Cheers!
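The resolution above can be sketched roughly as follows. The data here is synthetic (generated with make_classification as a stand-in, since the real data set is not shown), and the snapshot check and the added noise are hypothetical illustrations of how an input array might change between two calls.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification

# Synthetic stand-in for the asker's data (an assumption, not the real set).
X_train, y_train = make_classification(
    n_samples=301, n_features=60, n_informative=10,
    n_classes=3, random_state=0,
)

clf = svm.SVC(kernel='linear', C=1, cache_size=1000, max_iter=-1)
model = clf.fit(X_train, y_train)

# Keep a snapshot so that later accidental changes can be detected.
X_snapshot = X_train.copy()

manual_accuracy = np.mean(y_train == model.predict(X_train))
score_accuracy = model.score(X_train, y_train)

# On identical inputs the two numbers agree.
assert np.array_equal(X_train, X_snapshot)
assert np.isclose(manual_accuracy, score_accuracy)

# Simulate an accidental in-place modification between the two calls.
X_train += np.random.default_rng(0).normal(scale=5.0, size=X_train.shape)
inputs_unchanged = np.array_equal(X_train, X_snapshot)   # now False
modified_accuracy = model.score(X_train, y_train)        # scored on different data
```

Comparing the inputs with np.array_equal before comparing the accuracies separates a data-handling problem from a genuine disagreement between .predict() and .score().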
