
I am working on a binary classification problem with random forest and neural network models, and I am using SHAP to explain the model's predictions. I followed the tutorial and wrote the code below to get the waterfall plot:

row_to_show = 20
data_for_prediction = ord_test_t.iloc[row_to_show]  # use 1 row of data here. Could use multiple rows if desired
data_for_prediction_array = data_for_prediction.values.reshape(1, -1)
rf_boruta.predict_proba(data_for_prediction_array)
explainer = shap.TreeExplainer(rf_boruta)
# Calculate Shap values
shap_values = explainer.shap_values(data_for_prediction)
shap.plots._waterfall.waterfall_legacy(explainer.expected_value[0], shap_values[0], ord_test_t.iloc[row_to_show])

This generated the plot as shown below

[waterfall plot image]

However, I want to export this to a dataframe. How can I do it?

I expect my output to look like the example below, and I want to export this for the full dataframe. Can you help me, please?

[expected output image]

3 Answers


Let's do a small experiment:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from shap import TreeExplainer

# as_frame=True returns pandas objects, so X keeps its column names and index
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(max_depth=5, n_estimators=100).fit(X, y)
explainer = TreeExplainer(model)

What is explainer here? If you run dir(explainer) you'll find that it has a number of methods and attributes, among which is:

explainer.expected_value

which is of interest to you, because this is the base value on top of which the SHAP values add up.

Furthermore:

sv = explainer.shap_values(X)
len(sv)

will show that sv is a list of 2 arrays: the SHAP values for class 0 and class 1. These must be symmetric, because whatever moves the prediction towards 1 moves it by exactly the same amount, but with the opposite sign, towards 0.

Hence:

sv1 = sv[1]

Now you have everything to pack it into the desired format:

df = pd.DataFrame(sv1, columns=X.columns)
df.insert(0, 'bv', explainer.expected_value[1])

11 Comments

When you do sv[1], does it get the SHAP values for class 1 or class 0? Or is it just the index position? Because we want the SHAP values of class 1.
These are the SHAP values for class 1.
One thing I realized is that when we put the SHAP values into a dataframe (df), it loses the indices. How do we then know which row belongs to which record in X?
Should we assume that the SHAP dataframe is in the same order as our input dataframe X? If my X dataframe's first row starts with an index of 14, am I right to understand that index 0 in the SHAP dataframe belongs to index 14 of the input dataframe?
I believe it's not an assumption but the way it works: if you put your data into a pandas dataframe, the order is preserved.
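The alignment can also be made explicit rather than relied on implicitly: pass the original frame's `index` (and `columns`) when building the SHAP dataframe. A small sketch with a stand-in array in place of real SHAP values:

```python
import numpy as np
import pandas as pd

# a frame whose index does not start at 0, like the asker's scenario
X = pd.DataFrame(np.random.rand(3, 2), columns=['a', 'b'], index=[14, 7, 21])
sv1 = np.zeros((3, 2))  # stand-in for the SHAP array; rows match X's order

# reusing X's index makes the row-to-record mapping visible
shap_df = pd.DataFrame(sv1, columns=X.columns, index=X.index)
print(shap_df.index.tolist())  # [14, 7, 21]
```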

If I recall correctly, you can do something like this with pandas:

import pandas as pd

shap_values = explainer.shap_values(data_for_prediction)
# for a classifier, shap_values is a list (one array per class); take class 1
shap_values_df = pd.DataFrame(shap_values[1])

To get the feature names, you can do something like this (if data_for_prediction is a dataframe):

feature_names = data_for_prediction.columns.tolist()
# shap_values is a list/array, not a DataFrame, so index it directly
shap_df = pd.DataFrame(shap_values[1], columns=feature_names)

8 Comments

Thanks for the help. Upvoted. Can you show me how we can get the base value, SHAP value and feature name along with the row index?
Currently, your code gives only the SHAP values and features. How can I get the base value and row IDs?
@TheGreat I don't have an example to run right now, but I edited the post. Let me know if the update improves things for what you want.
But your code doesn't get the base value, right? It only gets the SHAP values?
@TheGreat If you mean the base feature values, you should concat the dataframe of SHAP values with your original data, doing a reshape if necessary: pd.concat([base_values, shap_values_df], axis=1), or do a join on the instance id.
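The concat mentioned in the comment could look like this (a sketch with made-up frames; real code would use the actual feature frame and SHAP dataframe):

```python
import numpy as np
import pandas as pd

features = pd.DataFrame(np.random.rand(3, 2), columns=['f1', 'f2'])
shap_values_df = pd.DataFrame(np.random.rand(3, 2),
                              columns=['shap_f1', 'shap_f2'])

# axis=1 aligns on the index, so both frames must share the same index
combined = pd.concat([features, shap_values_df], axis=1)
print(combined.columns.tolist())  # ['f1', 'f2', 'shap_f1', 'shap_f2']
```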

I'm currently using this:

import shap
import pandas as pd

def getShapReport(classifier, X_test):
    shap_values = shap.TreeExplainer(classifier).shap_values(X_test)
    shap.summary_plot(shap_values, X_test)
    shap.summary_plot(shap_values[1], X_test)
    return pd.DataFrame(shap_values[1])

It first displays the summary plot of SHAP values for the whole model, then the one for the positive class, and finally returns the dataframe for the positive class (I'm in an imbalanced context).

It uses a TreeExplainer rather than a waterfall plot, but it is basically the same idea.

7 Comments

Thanks for the help. Upvoted. Yes, I am working on imbalanced data as well and need the values for the positive class (label 1). Does your code do the same? Can you also show in your code how we can get the base value for each instance/row along with the Shapley values?
pandas keeps the index, so you can basically concat (pandas.concat) your Shapley values and your prediction set to match feature values with feature importances.
Thanks, but my comment is about the Shapley base/expected value. That is different from a feature's SHAP value.
I don't quite get it. The per-row Shapley values are based on predict; if you want the base row values, you need to predict on the test set just to compute them. Or, if I'm not wrong, on some models like LightGBM you can ask the model to compute it; I'm not sure if it's available at train time.
If you look at my expected output, I would like to get the feature importances and the base value for the test dataframe.
