CatBoostClassifier - Binary Classification with CatBoost
Python
This code snippet trains a binary classification model using CatBoost. The model is fitted on the training data (X_train, y_train) with explicitly set hyperparameters: iterations, depth, learning_rate and l2_leaf_reg.
The code then plots the feature importances of the trained model as a horizontal bar chart. The importance values are obtained from CatBoost's get_feature_importance() method.
Finally, the code makes predictions on the test set (X_test) and evaluates the model with a classification report, log loss and ROC AUC.
from catboost import CatBoostClassifier, Pool
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, log_loss, roc_auc_score

# Assumes X_train, X_test (pandas DataFrames) and y_train, y_test (binary labels) already exist.

# Step 1: Initialise and fit CatBoost binary classification model
model = CatBoostClassifier(
    iterations=1000,
    depth=4,
    learning_rate=0.1,
    l2_leaf_reg=1,
    random_seed=101,
    thread_count=-1,
    train_dir='/train-dir'
)
# Note: the test set doubles as the evaluation set here, so the metrics
# reported in Step 3 can be optimistic; a separate validation split avoids this.
model.fit(
    X_train, y_train,
    eval_set=(X_test, y_test),
    verbose=False,
    plot=False
)

model.save_model('catboost_classification.model')

# Step 2: Plot feature importances
features = X_train.columns
importance_values = model.get_feature_importance()

plt.barh(y=range(len(features)),
         width=importance_values,
         tick_label=features)
plt.show()

# Step 3: Make predictions for test data & evaluate performance
test_pool = Pool(X_test)
y_pred = model.predict(test_pool)               # hard class labels
y_proba = model.predict_proba(test_pool)[:, 1]  # probability of the positive class
print('Classification Report:\n', classification_report(y_test, y_pred))
# Log loss and ROC AUC are computed from probabilities, not hard labels
print('Log Loss:', log_loss(y_test, y_proba))
print('ROC AUC:', roc_auc_score(y_test, y_proba))
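Log loss in Step 3 is normally computed from predicted probabilities, not from hard class labels. The following is a minimal sketch of why, using only the standard library. The helper binary_log_loss and the toy values are hypothetical and not part of the snippet above. It compares the loss from probability scores with the loss from the same scores thresholded at 0.5.

```python
import math

def binary_log_loss(y_true, y_score, eps=1e-15):
    """Mean negative log-likelihood for 0/1 labels and scores in [0, 1]."""
    total = 0.0
    for y, p in zip(y_true, y_score):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical toy data for illustration only
y_true = [1, 0, 1, 0]
y_proba = [0.9, 0.2, 0.4, 0.1]                      # stands in for predict_proba(...)[:, 1]
y_label = [1 if p >= 0.5 else 0 for p in y_proba]   # stands in for predict(...)

loss_from_proba = binary_log_loss(y_true, y_proba)
loss_from_labels = binary_log_loss(y_true, y_label)

print('Log loss from probabilities:', loss_from_proba)
print('Log loss from hard labels:  ', loss_from_labels)
```

With hard labels, every misclassified row is scored as if the model were fully certain of the wrong class, so a single error dominates the average. ROC AUC has a similar problem: with only two distinct score values, the curve collapses to a single operating point.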