How to Train XGBoost with Imbalanced Data Using scale_pos_weight

Python

In this example, we train a binary classification model using XGBoost on an imbalanced dataset.

To remedy the imbalance, we calculate spw: the number of records with a negative target (value 0 in y_train) divided by the number with a positive target (value 1 in y_train). For example, 900 negative and 100 positive records give spw = 9.0.

Finally, this ratio is passed to the model as the scale_pos_weight parameter.

from xgboost import XGBClassifier

# Step 1: Calculate the ratio of negative to positive
# records in y_train
positive_records = y_train.sum()
negative_records = len(y_train) - positive_records
spw = negative_records / positive_records

# Step 2: Pass this ratio to the scale_pos_weight
# parameter
model = XGBClassifier(booster='gbtree',
                      objective='binary:logistic',
                      max_depth=12, learning_rate=0.1,
                      n_estimators=10,
                      scale_pos_weight=spw,
                      random_state=101,
                      n_jobs=-1)

# Step 3: Fit the model
model.fit(X_train, y_train)
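
The snippet assumes X_train and y_train already exist. Below is a minimal end-to-end sketch that fills that gap with a synthetic imbalanced dataset from scikit-learn and checks the result with a classification report; the dataset, its 9:1 class ratio, and the evaluation step are illustrative assumptions, not part of the original example. Because scale_pos_weight scales the loss gradient of positive examples, setting it to negatives/positives roughly balances the total weight of the two classes.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from xgboost import XGBClassifier

# Assumption: a synthetic binary dataset with ~90% negatives
# and ~10% positives, standing in for real training data
X, y = make_classification(n_samples=10_000, n_features=20,
                           weights=[0.9, 0.1], random_state=101)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=101)

# Same ratio as Step 1 above; here roughly 9.0
positive_records = y_train.sum()
negative_records = len(y_train) - positive_records
spw = negative_records / positive_records

model = XGBClassifier(objective='binary:logistic',
                      scale_pos_weight=spw,
                      random_state=101, n_jobs=-1)
model.fit(X_train, y_train)

# With the weighting applied, recall on the minority class should
# improve relative to an unweighted baseline
print(classification_report(y_test, model.predict(X_test)))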