
I am working on a binary classification problem using the random forest algorithm.

Currently, I am trying to explain the model predictions using SHAP values.

So, I referred to this useful post and tried the following:

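from shap import TreeExplainer, Explanation
from shap.plots import waterfall

# model: the fitted RandomForestClassifier; ord_test_t: the test DataFrame
explainer = TreeExplainer(model)
sv = explainer(ord_test_t)
exp = Explanation(sv.values[:, :, 1],      # SHAP values for class 1
                  sv.base_values[:, 1],
                  data=ord_test_t.values,
                  feature_names=ord_test_t.columns)
idx = 20
waterfall(exp[idx])                       # picks the row at position 20, not label 20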

I like this approach because it displays the feature values along with the waterfall plot, so I would like to keep using it.

However, it doesn't let me get the waterfall plot for a specific row of ord_test_t (the test data) selected by its index label.

For example, suppose ord_test_t.index.tolist() returns [3, 5, 8, 9, ...].

Now I want the waterfall plot for the row whose index label is 9 (ord_test_t.loc[[9]]), but exp[9] returns the row at position 9 rather than the row labelled 9.

When I try exp.iloc[[9]], it throws an error because the Explanation object has no iloc attribute.
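The behaviour described above is the difference between positional and label-based indexing: integer indexing on the Explanation works by position (like iloc), so a label such as 9 first has to be translated into a position, for example with pandas' `Index.get_loc`. A minimal pandas-only sketch, using a hypothetical four-row stand-in for ord_test_t and a plain NumPy array standing in for the Explanation's rows:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ord_test_t: index labels are 3, 5, 8, 9, not 0..3
ord_test_t = pd.DataFrame(
    {"f1": [0.1, 0.2, 0.3, 0.4], "f2": [10, 20, 30, 40]},
    index=[3, 5, 8, 9],
)

# Stand-in for the Explanation's rows: an array aligned row-by-row with
# ord_test_t and, like exp[...], indexed by position only
data = ord_test_t.to_numpy()

label = 9
pos = ord_test_t.index.get_loc(label)  # translate the label into a position

row_by_position = data[pos]                      # what exp[pos] would select
row_by_label = ord_test_t.loc[label].to_numpy()  # the row the question wants
print(label, pos, np.array_equal(row_by_position, row_by_label))
```

Here data[9] would raise an IndexError; with a larger test set it would silently return a different row, which is the symptom described in the question.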

Can anyone help me with this, please?

  • What model do you use? LGB, XGB, RF? Commented May 19, 2022 at 2:55
  • I built my classification model using random forest algorithm Commented May 19, 2022 at 5:24

1 Answer


My suggestion is as follows:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from shap import TreeExplainer, Explanation
from shap.plots import waterfall

import shap

print(shap.__version__)

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

idx = 9
model = RandomForestClassifier(max_depth=5, n_estimators=100).fit(X, y)
explainer = TreeExplainer(model)
sv = explainer(X.loc[[idx]])    # corrected, pass the row of interest as df
exp = Explanation(
    sv.values[:, :, 1],         # class to explain
    sv.base_values[:, 1],
    data=X.loc[[idx]].values,   # corrected: feature values of the row of interest
    feature_names=X.columns,
)
waterfall(exp[0])               # exp holds a single row, so it is at position 0

0.40.0

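For reference, the waterfall plot relies on SHAP's additivity property: it starts at the base value (the average class-1 output the explainer uses as its reference, stored in sv.base_values[0, 1] here) and adds one SHAP value per feature until it reaches the model output for that row. Writing \phi_j for the class-1 SHAP value of feature j (sv.values[0, j, 1]) and M for the number of features (30 for this dataset):

```latex
f_1(x) = \phi_0 + \sum_{j=1}^{M} \phi_j(x), \qquad \phi_0 = \mathbb{E}\left[f_1(X)\right]
```

So the endpoint f(x) shown on the plot should equal model.predict_proba(X.loc[[idx]])[0, 1], i.e. the second entry of the output in the proof below.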

Proof:

model.predict_proba(X.loc[[idx]]) # corrected

array([[0.95752656, 0.04247344]])

Comments

One quick question: for display purposes, is it okay if I switch the sign of the values? For example, my business wants a negative sign for the positive outcome (the minority class of interest)
and a positive sign for the majority class. Of course, the model will still work towards predicting class 1 (the minority class), but I wish to change the signs alone and keep the SHAP values as they are, so the final probability becomes 1 - P.
Is it okay to do it this way?
It sounds to me as if you want to substitute class 1 with class 0. Why not do it this way?
But that will degrade the model performance because, as you may know, some model metrics such as precision and recall focus on the positive class only. You can read more about this at the linked URL; they say inverting the labels is the best option. stats.stackexchange.com/questions/563545/…