Difference between predict vs predict_proba in scikit-learn

Question

Suppose I have trained a model whose target variable is 0, 1, or 2. If I use predict, the answer is one of 0, 1, or 2. But if I use predict_proba, I get three columns for each row, for example:

   model = ... Classifier       # It could be any classifier
   m1 = model.predict(mytest)
   m2 = model.predict_proba(mytest)

   # Now suppose  m1[3] = [0.6, 0.2, 0.2]

Suppose I use both predict and predict_proba. If predict_proba gives the above result at index 3, then I should see 0 at index 3 of the predict result. Is this the case? I am trying to understand how the outputs of predict and predict_proba on the same model relate to each other.

Please, instead of "suppose", post an actual code example of using both predict and predict_proba, so we can ground the discussion in an actual (and not hypothetical) case.
– desertnaut, Commented Apr 13, 2020 at 9:50
Still unclear. m1 is supposed to contain single numbers (classes), while here you show it as if containing probabilities. Please, take your time, focus, and update/clarify the question accordingly (the idea was to get rid of "suppose", by showing an actual example of both predict and predict_proba on the same test sample and focus the question on this, but you haven't done so).
– desertnaut, Commented Apr 13, 2020 at 15:37
Possible duplicate: stackoverflow.com/questions/56397128/…
– M.Mavini, Commented Oct 16, 2021 at 9:39

Giorgos Myrianthous · Accepted Answer · 2022-03-14 17:43:28Z

38

predict() is used to predict the actual class (in your case one of 0, 1, or 2).
predict_proba() is used to predict the class probabilities, one value per class for each sample.

From the example output that you shared,

predict() would output class 0, since class 0 has the highest probability (0.6).
[0.6, 0.2, 0.2] is the output of predict_proba, which simply denotes that the class probabilities for classes 0, 1, and 2 are 0.6, 0.2, and 0.2 respectively.
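
To make the index-by-index relationship concrete, here is a minimal sketch. It assumes a LogisticRegression model trained on the iris dataset (neither appears in the question; they stand in for "any classifier" with labels 0, 1, and 2). For this model, row i of predict is the class with the largest probability in row i of predict_proba. This holds for most scikit-learn classifiers, but not for all of them: the documentation for SVC with probability=True, for example, warns that the two may disagree.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Iris has three classes labelled 0, 1 and 2, like the target in the question.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

labels = model.predict(X_test)        # shape (n_samples,)
proba = model.predict_proba(X_test)   # shape (n_samples, 3)

# For each row, pick the class whose probability is largest.
labels_from_proba = model.classes_[np.argmax(proba, axis=1)]

# Row 3 of both outputs refers to the same test sample.
print(proba[3], labels[3])
print(np.array_equal(labels, labels_from_proba))
```

Both arrays are aligned with the rows of the test set, so index 3 of predict and index 3 of predict_proba describe the same sample.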

Now as the documentation mentions for predict_proba, the resulting array is ordered based on the labels you've been using:

The returned estimates for all classes are ordered by the label of classes.

Therefore, in your case, where the class labels are [0, 1, 2], the columns of the predict_proba output follow that same order: 0.6 is the probability that the instance belongs to class 0, and the two 0.2 values are the probabilities that it belongs to classes 1 and 2 respectively.
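
The column order can be checked through the fitted model's classes_ attribute. The sketch below uses a DecisionTreeClassifier and made-up toy data (both are assumptions for illustration, not part of the question), with the labels deliberately supplied out of order to show that the columns follow the sorted labels rather than the order in which they first appear.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one feature, labels deliberately given out of order.
X = np.array([[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]])
y = np.array([2, 2, 0, 0, 1, 1])

model = DecisionTreeClassifier(random_state=0).fit(X, y)

print(model.classes_)  # sorted labels: [0 1 2]

proba = model.predict_proba(np.array([[0.05]]))

# Column i of proba belongs to model.classes_[i].
for label, p in zip(model.classes_, proba[0]):
    print(label, p)
```

Here the sample at 0.05 sits among the training points labelled 2, so the probability mass lands in the last column, which is the column for class 2.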

For a more comprehensive explanation, refer to the article What is the difference between predict() and predict_proba() in scikit-learn on TDS.

edited Mar 14, 2022 at 17:43

answered Apr 13, 2020 at 10:03


1 Comment

User 19826 Over a year ago

@Giorgos, please note my question is regarding the relationship between exact indexes of these two. Also, I wonder if there is a typo in your answer, there are two ones as output of predict()
