I have instantiated an SVC object using the sklearn library with the following code:

clf = svm.SVC(kernel='linear', C=1, cache_size=1000, max_iter = -1, verbose = True)

I then fit data to it using:

model = clf.fit(X_train, y_train)

Here X_train is a (301, 60) ndarray and y_train is a (301,) ndarray (y_train consists of the class labels "1", "2" and "3").

Before I stumbled across the .score() method, I was computing the accuracy of my model on the training set as follows:

prediction = np.divide((y_train == model.predict(X_train)).sum(), y_train.size, dtype = float)

which gives a result of approximately 62%.

However, when using the model.score(X_train, y_train) method I get a result of approximately 83%.

Could anyone explain why the two results differ? As far as I understand, they should be the same.

ADDENDUM:

The first 10 values of y_true are:

  • 2, 3, 1, 3, 2, 3, 2, 2, 3, 1, ...

Whereas for y_pred (when using model.predict(X_train)), they are:

  • 2, 3, 3, 2, 2, 3, 2, 3, 3, 3, ...
  • That's weird, can you post some subset of your data (at least some y_true and y_pred values)? Commented Jan 22, 2015 at 23:32

1 Answer

Because your y_train has shape (301, 1) and not (301,), NumPy broadcasts the comparison, so

(y_train == model.predict(X_train)).shape == (301, 301)

which is not what you intended. The correct version of your code would be

np.mean(y_train.ravel() == model.predict(X_train))

which will give the same result as

model.score(X_train, y_train)
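
To see the broadcasting effect in isolation, here is a minimal sketch using NumPy only. The labels and predictions are made up for illustration (random labels from 1 to 3, with the first 50 predictions deliberately wrong); they are not the asker's data.

```python
import numpy as np

# Synthetic stand-ins for the real arrays (assumption: labels are 1, 2 or 3).
rng = np.random.default_rng(0)
y_flat = rng.integers(1, 4, size=301)   # shape (301,)
y_col = y_flat.reshape(-1, 1)           # shape (301, 1), a column vector

# Pretend predictions: copy the labels and make the first 50 wrong.
y_pred = y_flat.copy()
y_pred[:50] = 0                         # 0 is never a valid label here

# (301, 1) == (301,) broadcasts to a full (301, 301) comparison matrix.
broadcast_cmp = (y_col == y_pred)

# Flattening the labels first gives the intended element-wise comparison.
elementwise_cmp = (y_col.ravel() == y_pred)

print(broadcast_cmp.shape)      # (301, 301)
print(elementwise_cmp.shape)    # (301,)

# Fraction of matching labels, i.e. the accuracy.
right_acc = np.mean(elementwise_cmp)
print(right_acc)
```

Checking the shape of the comparison result before summing it is a quick way to catch this kind of mismatch.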

7 Comments

Unfortunately, I was incorrect when stating the question: y_train is in fact (301,) - my mistake (question has been edited)!
That being said, when using np.mean(y_train.ravel() == model.predict(X_train)) I still get a training accuracy of about 60 percent. :(
What are the shape and dtype of y_train, X_train, model.predict(X_train) and y_train == model.predict(X_train)?
y_train : (301,) int64; X_train : (301, 60) float64; model.predict(X_train) : (301,) int64; y_train == model.predict(X_train) : bool; Does that help?
Turns out, due to a nuance in the way I was handling my data set, X_train was slightly modified between the two function calls, hence the discrepancy in the accuracy results. Thank you for your help and I apologise for sending you on a wild goose chase. Cheers!
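
One way to catch this kind of problem is to snapshot the array before the first call and compare it afterwards. The sketch below is only an illustration: the data is random, and the in-place centering step is a hypothetical stand-in for whatever modified X_train in the asker's code.

```python
import numpy as np

# Random stand-in for the training features (not the real data).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(301, 60))

# Keep an independent copy to compare against later.
snapshot = X_train.copy()

# Hypothetical in-place preprocessing step that silently alters X_train.
X_train -= X_train.mean(axis=0)

# False here means the array changed between the two points in the code.
unchanged = np.array_equal(snapshot, X_train)
print(unchanged)
```

If the check reports a change, the two accuracy computations were not run on the same inputs and their results are not comparable.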