CatBoost SHAP Summary Plot

Python

In this code snippet we use CatBoost's native SHAP implementation to calculate SHAP values, which explain how much each feature contributes to the final prediction, and the shap library to plot them.

We first create a Pool object from the test data. We then compute the SHAP values for the test data by calling model.get_feature_importance(pool, type='ShapValues'), which returns a matrix with one row per instance and one column per feature, plus a final column that holds the expected value (the baseline prediction).
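The layout of that matrix can be sketched with a small made-up array. The numbers below are hypothetical and only stand in for what get_feature_importance would return for 3 rows and 2 features; the point is how the per-feature columns and the final expected-value column fit together.

```python
import numpy as np

# Hypothetical stand-in for
# model.get_feature_importance(pool, type='ShapValues'):
# 3 rows, 2 features, plus a final column with the expected value.
shap_matrix = np.array([
    [0.5, -0.2, 10.0],
    [-0.1, 0.3, 10.0],
    [0.0, 0.4, 10.0],
])

# Per-feature SHAP values: every column except the last.
contributions = shap_matrix[:, :-1]

# Expected value (baseline prediction): the last column, same on every row.
expected_value = shap_matrix[:, -1]

# For a regressor, the contributions of a row plus the expected value
# add up to the model's raw prediction for that row.
reconstructed = contributions.sum(axis=1) + expected_value
```

On a real model, comparing this reconstruction against model.predict(X_test) is a quick sanity check that the matrix has been sliced correctly.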

Finally, we plot the SHAP values with shap.summary_plot(), passing the per-feature SHAP values (every column except the last, which holds the expected value) and the test data as arguments.

    import shap
    from catboost import CatBoostRegressor, Pool

    # X_train, y_train and X_test are assumed to be defined already,
    # for example as pandas DataFrames/Series from a train/test split.

    # Step 1: Initialise and fit CatBoost regression model
    model = CatBoostRegressor(
        loss_function='RMSE',
        n_estimators=1000,
        max_depth=4,
        learning_rate=0.1,
        colsample_bylevel=0.9,
        subsample=0.9,
        random_state=101,
    )
    model.fit(X_train, y_train)

    # Step 2: Evaluate feature importance using SHAP values
    pool = Pool(X_test)
    # Shape: (n_rows, n_features + 1); the last column is the expected value
    shap_values = model.get_feature_importance(pool, type='ShapValues')

    # Drop the expected-value column so the columns line up with X_test
    shap.summary_plot(shap_values[:, :-1], X_test)
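As a numeric companion to the plot, the features can also be ranked by their mean absolute SHAP value. To my understanding this is close to the ordering summary_plot uses by default, but treat that as an assumption rather than a guarantee. The helper name rank_features_by_mean_abs_shap is hypothetical and not part of CatBoost or shap; it assumes the matrix has CatBoost's layout, with the expected value in the last column.

```python
import numpy as np


def rank_features_by_mean_abs_shap(shap_matrix, feature_names):
    """Return (feature, mean |SHAP|) pairs, most influential first.

    Assumes CatBoost's layout: one column per feature followed by a
    final column holding the expected value.
    """
    contributions = np.asarray(shap_matrix, dtype=float)[:, :-1]
    if contributions.shape[1] != len(feature_names):
        raise ValueError("feature_names does not match the SHAP matrix width")

    # Average magnitude of each feature's contribution across all rows.
    mean_abs = np.abs(contributions).mean(axis=0)

    # Sort from largest to smallest average magnitude.
    order = np.argsort(mean_abs)[::-1]
    return [(feature_names[i], float(mean_abs[i])) for i in order]
```

With the snippet above, and assuming X_test is a pandas DataFrame, it could be called as rank_features_by_mean_abs_shap(shap_values, list(X_test.columns)).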