Suppose I have built a model whose target variable is 0, 1, or 2. When I call predict, each result is one of 0, 1, or 2. When I call predict_proba, I get a row with 3 columns for each test row. For example:

   model = ... Classifier       # it could be any classifier
   m1 = model.predict(mytest)
   m2 = model.predict_proba(mytest)

   # Now suppose m2[3] = [0.6, 0.2, 0.2]

Suppose I call both predict and predict_proba on the same test set. If row 3 of the predict_proba output is the result shown above, should element 3 of the predict output be 0? I am trying to understand how the outputs of predict and predict_proba on the same model relate to each other.

  • Please, instead of "suppose", post an actual code example of using both predict and predict_proba, so we can ground the discussion in an actual (and not hypothetical) case. Commented Apr 13, 2020 at 9:50
  • Thanks, I will edit my question Commented Apr 13, 2020 at 15:26
  • Still unclear. m1 is supposed to contain single numbers (classes), while here you show it as if containing probabilities. Please, take your time, focus, and update/clarify the question accordingly (the idea was to get rid of "suppose", by showing an actual example of both predict and predict_proba on the same test sample and focus the question on this, but you haven't done so). Commented Apr 13, 2020 at 15:37
  • Possible duplicate: stackoverflow.com/questions/56397128/… Commented Oct 16, 2021 at 9:39

1 Answer

  • predict() returns the predicted class label (in your case one of 0, 1, or 2).
  • predict_proba() returns the predicted probability of each class.

From the example output that you shared,

  • predict() would output class 0, since class 0 has the highest probability (0.6).
  • [0.6, 0.2, 0.2] is the output of predict_proba; it means that the probabilities of classes 0, 1, and 2 are 0.6, 0.2, and 0.2 respectively.
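To make the row-by-row correspondence concrete, here is a minimal sketch. The iris dataset and RandomForestClassifier are stand-ins I picked for illustration (the question does not say which data or classifier is used); only the names m1, m2, and mytest come from the question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: iris (labels 0, 1, 2) stands in for the asker's data.
X, y = load_iris(return_X_y=True)
X_train, mytest, y_train, _ = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Assumption: a random forest stands in for "any classifier".
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

m1 = model.predict(mytest)        # shape (n_samples,): one class label per row
m2 = model.predict_proba(mytest)  # shape (n_samples, 3): one probability per class

# Element i of m1 and row i of m2 describe the same test sample.
print(m2[3], "->", m1[3])

# For this classifier, the predicted label is the column with the
# highest probability in the matching row.
same = np.array_equal(m1, m2.argmax(axis=1))
print(same)
```

So index 3 of m1 and index 3 of m2 refer to the same row of mytest. Note that the argmax shortcut works here only because the labels happen to be 0, 1, 2, which coincide with the column positions.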

Now as the documentation mentions for predict_proba, the resulting array is ordered based on the labels you've been using:

The returned estimates for all classes are ordered by the label of classes.

Therefore, since your class labels are [0, 1, 2], the columns of the predict_proba output follow that same order: 0.6 is the probability that the instance belongs to class 0, and 0.2 is the probability for each of classes 1 and 2.
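When the labels are not 0, 1, 2, the column order can be read from the fitted model's classes_ attribute. The sketch below uses made-up string labels ("low", "mid", "high") on the iris data, again with a random forest as an assumed stand-in, to show how a predict_proba column index maps back to a label.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y_int = load_iris(return_X_y=True)

# Hypothetical string labels, chosen so that their sorted order
# differs from the original 0/1/2 order.
y = np.array(["low", "mid", "high"])[y_int]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# classes_ holds the sorted labels; column j of predict_proba
# corresponds to clf.classes_[j].
print(clf.classes_)

proba = clf.predict_proba(X)

# Map the most probable column of each row back to its label.
labels_from_proba = clf.classes_[proba.argmax(axis=1)]
matches = np.array_equal(clf.predict(X), labels_from_proba)
print(matches)
```

This equivalence is not guaranteed for every estimator. For example, the scikit-learn documentation notes that SVC with probability=True can return probabilities that are inconsistent with predict, so it is worth checking for the classifier you actually use.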


For a more comprehensive explanation, refer to the article What is the difference between predict() and predict_proba() in scikit-learn on TDS.

1 Comment

@Giorgos, please note my question is regarding the relationship between exact indexes of these two. Also, I wonder if there is a typo in your answer, there are two ones as output of predict()
