SHAP plotting waterfall using an index value in dataframe

Question

I am working on a binary classification problem using a random forest algorithm.

Currently, I am trying to explain the model predictions using SHAP values.

I referred to this useful post and tried the code below.

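from shap import TreeExplainer, Explanation
from shap.plots import waterfall

# Assumed: `model` is the fitted random forest; the original snippet omitted this line.
explainer = TreeExplainer(model)
sv = explainer(ord_test_t)
# Keep only the SHAP values and base values for class 1
exp = Explanation(sv.values[:,:,1],
                  sv.base_values[:,1],
                  data=ord_test_t.values,
                  feature_names=ord_test_t.columns)
idx = 20
waterfall(exp[idx])  # idx is a row position, not a DataFrame index label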

I like the above approach because it displays the feature values along with the waterfall plot, so I would like to keep using it.

However, it doesn't let me get the waterfall for a specific row in ord_test_t (the test data) selected by its index label.

For example, suppose ord_test_t.index.tolist() returns 3, 5, 8, 9, etc.

Now, I want the waterfall plot for ord_test_t.loc[[9]], but when I pass exp[9], it returns the row at position 9, not the row whose index label is 9.

When I try exp.iloc[[9]], it throws an error because the Explanation object doesn't have iloc.

Can you help me with this, please?

I built my classification model using random forest algorithm. – The Great, Commented May 19, 2022 at 5:24

Sergey Bushmanov · Accepted Answer · 2022-05-30 17:16:12Z


My suggestion is as follows:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
import shap
from shap import TreeExplainer, Explanation
from shap.plots import waterfall

print(shap.__version__)

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

idx = 9                         # index label of the row to explain
model = RandomForestClassifier(max_depth=5, n_estimators=100).fit(X, y)
explainer = TreeExplainer(model)
sv = explainer(X.loc[[idx]])    # corrected: select the row by label, as a one-row DataFrame
exp = Explanation(
    sv.values[:, :, 1],         # SHAP values for class 1 (the class to explain)
    sv.base_values[:, 1],       # base value for class 1
    data=X.loc[[idx]].values,   # corrected: feature values of the same row, as an array
    feature_names=X.columns,
)
waterfall(exp[0])               # exp holds a single row, so it sits at position 0

0.40.0

Proof:

model.predict_proba(X.loc[[idx]]) # corrected

array([[0.95752656, 0.04247344]])
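If you would rather compute the explanation once for the whole test set and then pick rows by label, one option is to translate the label into a row position with pandas' Index.get_loc. The sketch below is only an illustration: X_test and stand_in_values are hypothetical names, and the random array merely stands in for SHAP values so the snippet runs without shap. With a real Explanation built over the full frame in the same row order, the last step would be waterfall(exp[pos]).

```python
import numpy as np
import pandas as pd

# Hypothetical test frame whose index labels are not 0..n-1, as in the question.
X_test = pd.DataFrame(
    {"f1": [0.1, 0.2, 0.3, 0.4], "f2": [10, 20, 30, 40]},
    index=[3, 5, 8, 9],
)

# Stand-in for per-row SHAP values (assumption: not real SHAP output).
# With shap this would be exp.values, computed once on all of X_test.
rng = np.random.default_rng(0)
stand_in_values = rng.normal(size=X_test.shape)

label = 9
pos = X_test.index.get_loc(label)   # index label -> integer row position

row_values = stand_in_values[pos]   # with shap: waterfall(exp[pos])

print(pos)                                          # row position of label 9
print(X_test.iloc[pos].equals(X_test.loc[label]))   # same row either way
```

Note that get_loc returns a single integer only when the index is unique; with duplicate labels it returns a slice or a boolean mask, so this approach assumes unique index labels.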

edited May 30, 2022 at 17:16

answered May 19, 2022 at 17:06


14 Comments

The Great Over a year ago

One quick question. For display purposes, is it okay if I switch the sign of the values? For example, my business wants a negative sign for the positive outcome (the minority class of interest)

The Great Over a year ago

And a positive sign for the majority class. Of course, the model will still work towards predicting class 1 (the minority class), but I wish to change the signs alone and keep the SHAP value magnitudes as they are. So the final probability becomes 1 - P

The Great Over a year ago

Is it okay to do it this way?

Sergey Bushmanov Over a year ago

It sounds to me as if you want to substitute class 1 for class 0. Why not do it this way?

The Great Over a year ago

But that will degrade the model performance because, as you may know, some model metrics such as precision and recall focus only on the positive class (majority). You can read more about this at the linked URL. They say inverting the label is the best option. stats.stackexchange.com/questions/563545/…
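The arithmetic behind the sign switch discussed in these comments can be sketched as follows. This assumes the explanation is in probability space and that the two class outputs sum to 1, so that the base value plus the SHAP values for class 1 gives P. All numbers and variable names here are made up for illustration and are not output from any model.

```python
import math

# Made-up class-1 explanation (assumption: probability space, not real model output).
base_class1 = 0.37
shap_class1 = [0.10, -0.25, 0.05, -0.20]
p_class1 = base_class1 + sum(shap_class1)   # predicted P for class 1

# Display-only flip: negate every contribution and complement the base value.
base_class0 = 1.0 - base_class1
shap_class0 = [-v for v in shap_class1]
p_class0 = base_class0 + sum(shap_class0)   # what the flipped plot would end at

# The flipped explanation ends at 1 - P, so additivity still holds.
print(math.isclose(p_class0, 1.0 - p_class1))
```

The magnitudes of the contributions are unchanged; only their signs and the starting point differ. Under the same assumption, this matches explaining class 0 directly by taking sv.values[:, :, 0] and sv.base_values[:, 0] instead of index 1, without retraining or relabeling anything.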
