Never Forget Another Line of Code

Datasnips is a free code snippet hosting platform for Data Science & AI. It enables your code snippets to be organized, searchable & shareable.

CatBoost Shap Summary Plot

Python

In this code snippet we use CatBoost's native SHAP implementation to calculate SHAP values and the shap library to plot them. SHAP values explain how much each feature contributes to the final prediction.

We first create a Pool object from the test data. We then compute the SHAP values for the test data by calling model.get_feature_importance(pool, type='ShapValues'), which returns a matrix with one row per instance and one column per feature, plus a final column holding the expected value of the model's prediction.
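To make the shape of that matrix concrete, here is a minimal sketch using NumPy only. The array `example` is a hand-written, hypothetical stand-in for the output of get_feature_importance(..., type='ShapValues') on a regression model, and `split_shap_matrix` is an illustrative helper, not part of CatBoost. It relies on the matrix having n_features + 1 columns, with the last column holding the expected value, so that each row sums to the model's prediction for that instance.

```python
import numpy as np


def split_shap_matrix(shap_matrix):
    """Split a (n_samples, n_features + 1) SHAP matrix into
    per-feature contributions and the expected (base) value column."""
    shap_matrix = np.asarray(shap_matrix, dtype=float)
    if shap_matrix.ndim != 2 or shap_matrix.shape[1] < 2:
        raise ValueError("expected a 2D matrix with at least two columns")
    contributions = shap_matrix[:, :-1]  # one column per feature
    base_values = shap_matrix[:, -1]     # expected value, repeated per row
    return contributions, base_values


# Hypothetical values: 3 instances, 2 features, expected value 10.0
example = np.array([
    [0.5, -0.2, 10.0],
    [-1.0, 0.3, 10.0],
    [0.0, 0.0, 10.0],
])

contributions, base_values = split_shap_matrix(example)

# Contributions plus the base value reconstruct each prediction
reconstructed = contributions.sum(axis=1) + base_values
print(contributions.shape)
print(reconstructed)
```

With a fitted model, the same check could be run by comparing the reconstructed values against model.predict(X_test).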

Finally, we plot the SHAP values with shap.summary_plot(), passing the SHAP values (with the last column, the expected value, dropped) and the test data as arguments.

from catboost import CatBoostRegressor, Pool
import shap

# X_train, y_train and X_test are assumed to be defined already,
# e.g. from a train/test split of your dataset.

# Step 1: Initialise and fit CatBoost regression model
model = CatBoostRegressor(loss_function='RMSE',
                          n_estimators=1000,
                          max_depth=4,
                          learning_rate=0.1,
                          colsample_bylevel=0.9,
                          subsample=0.9,
                          random_state=101)
model.fit(X_train, y_train)

# Step 2: Compute SHAP values for the test data
pool = Pool(X_test)
shap_values = model.get_feature_importance(pool, type='ShapValues')

# Step 3: Plot the SHAP values, dropping the last column (the expected value)
shap.summary_plot(shap_values[:, :-1], X_test)
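As a numeric companion to the plot, the sketch below ranks features by their mean absolute SHAP value, which is broadly the ordering a summary plot uses for its rows. It is an illustration only: `rank_features_by_mean_abs_shap` is a hypothetical helper, and the contributions and the feature names "age" and "income" are made up for the example.

```python
import numpy as np


def rank_features_by_mean_abs_shap(contributions, feature_names):
    """Return (feature, mean |SHAP|) pairs sorted from most to least important."""
    contributions = np.asarray(contributions, dtype=float)
    if contributions.ndim != 2 or contributions.shape[1] != len(feature_names):
        raise ValueError("need one feature name per column of contributions")
    mean_abs = np.abs(contributions).mean(axis=0)
    order = np.argsort(mean_abs)[::-1]  # indices from largest to smallest
    return [(feature_names[i], float(mean_abs[i])) for i in order]


# Hypothetical per-feature contributions (expected-value column already dropped)
example_contributions = np.array([
    [0.5, -0.2],
    [-1.0, 0.3],
    [0.0, 0.1],
])

ranking = rank_features_by_mean_abs_shap(example_contributions, ["age", "income"])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With the snippet above, and assuming X_test is a pandas DataFrame, the helper could be called as rank_features_by_mean_abs_shap(shap_values[:, :-1], list(X_test.columns)).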