Never Forget Another Line of Code

Datasnips is a free code snippet hosting platform for Data Science & AI. It enables your code snippets to be organized, searchable & shareable.

CatBoostClassifier - Binary Classification with CatBoost

Python

This code snippet trains a binary classification model with CatBoost. The model is fitted on the training data with explicitly set hyperparameters (iterations, depth, learning_rate and l2_leaf_reg) and is then saved to disk.

The code then plots the feature importances of the trained model as a horizontal bar chart. The importance values come from CatBoost's get_feature_importance() method.
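
A bar chart of importances is usually easier to read when the bars are ordered. The sketch below shows one way to pair feature names with their importance values and sort them before plotting. The helper name rank_features and the example names and scores are hypothetical and not part of the original snippet.

```python
def rank_features(feature_names, importance_values):
    """Pair each feature with its importance, sorted from most to least important."""
    if len(feature_names) != len(importance_values):
        raise ValueError("feature_names and importance_values must have the same length")
    pairs = zip(feature_names, importance_values)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical feature names and importance scores, for illustration only.
example_features = ["age", "income", "tenure"]
example_importances = [12.5, 61.0, 26.5]

ranked = rank_features(example_features, example_importances)
for name, value in ranked:
    print(f"{name}: {value:.1f}")
```

In the snippet itself, the same helper could be called with X_train.columns and the array returned by get_feature_importance(). Because plt.barh draws its first entry at the bottom of the chart, passing the ranked list in reverse order puts the most important feature at the top.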

Finally, the code makes predictions on the test set (X_test) with the trained model and evaluates its performance using a classification report along with the log loss and ROC AUC metrics.

from catboost import CatBoostClassifier, Pool
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, log_loss, roc_auc_score

# Assumes X_train and X_test are pandas DataFrames and that
# y_train and y_test hold the matching binary labels.

# Step 1: Initialise and fit CatBoost binary classification model
model = CatBoostClassifier(
    iterations=1000,
    depth=4,
    learning_rate=0.1,
    l2_leaf_reg=1,
    random_seed=101,
    thread_count=-1,
    train_dir='/train-dir'
)
# eval_set is monitored during training; passing the test set here
# can make the test scores reported in Step 3 optimistic.
model.fit(
    X_train, y_train,
    eval_set=(X_test, y_test),
    verbose=False,
    plot=False
)

model.save_model('catboost_classification.model')

# Step 2: Plot feature importances
features = X_train.columns
importance_values = model.get_feature_importance()

plt.barh(y=range(len(features)),
         width=importance_values,
         tick_label=features)
plt.xlabel('Importance')
plt.show()

# Step 3: Make predictions for test data & evaluate performance
test_pool = Pool(X_test)
y_pred = model.predict(test_pool)              # predicted class labels
y_prob = model.predict_proba(test_pool)[:, 1]  # probability of the positive class
print('Classification Report:\n', classification_report(y_test, y_pred))
# Log loss and ROC AUC are computed from probabilities, not hard labels
print('Log Loss:', log_loss(y_test, y_prob))
print('ROC AUC:', roc_auc_score(y_test, y_prob))
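
Log loss and ROC AUC are designed to score predicted probabilities, while a classification report works on hard class labels. The sketch below uses plain-Python versions of the two metrics to show how much the scores change when thresholded labels are passed in place of probabilities. The functions binary_log_loss and roc_auc and the example data are hypothetical. They are written for illustration and are not the scikit-learn implementations.

```python
import math


def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    total = 0.0
    for label, prob in zip(y_true, y_prob):
        prob = min(max(prob, eps), 1 - eps)  # clip to avoid log(0)
        total -= label * math.log(prob) + (1 - label) * math.log(1 - prob)
    return total / len(y_true)


def roc_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for label, s in zip(y_true, y_score) if label == 1]
    negatives = [s for label, s in zip(y_true, y_score) if label == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical true labels and predicted positive-class probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.4, 0.6, 0.55]
# Hard labels obtained by thresholding the probabilities at 0.5.
y_label = [1 if p >= 0.5 else 0 for p in y_prob]

loss_from_prob = binary_log_loss(y_true, y_prob)
loss_from_label = binary_log_loss(y_true, y_label)
auc_from_prob = roc_auc(y_true, y_prob)
auc_from_label = roc_auc(y_true, y_label)

print("Log loss (probabilities):", loss_from_prob)
print("Log loss (hard labels):  ", loss_from_label)
print("ROC AUC (probabilities): ", auc_from_prob)
print("ROC AUC (hard labels):   ", auc_from_label)
```

With hard labels, a single wrong prediction is treated as a fully confident mistake, which inflates the log loss, and the ranking information that ROC AUC relies on is reduced to two levels. With CatBoost, positive-class probabilities are available from model.predict_proba(...)[:, 1].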