How to Train XGBoost with Imbalanced Data Using Scale_pos_weight
Python
In this example, we train a binary classification model with XGBoost on an imbalanced dataset.
To remedy the imbalance, we calculate spw: the number of records with a negative target (value 0 in y_train) divided by the number with a positive target (value 1 in y_train). For example, with 900 negative and 100 positive training records, spw = 900 / 100 = 9.
Finally, this ratio is passed to the model as the scale_pos_weight parameter, so that errors on the minority positive class are weighted more heavily during training.
from xgboost import XGBClassifier

# Step 1: Calculate the ratio of negative to positive
# records in y_train
positive_records = y_train.sum()
negative_records = len(y_train) - positive_records
spw = negative_records / positive_records

# Step 2: Pass this ratio to the scale_pos_weight
# parameter
model = XGBClassifier(booster='gbtree',
                      objective='binary:logistic',
                      max_depth=12, learning_rate=0.1,
                      n_estimators=10,
                      scale_pos_weight=spw,
                      random_state=101,
                      n_jobs=-1)

# Step 3: Fit the model
model.fit(X_train, y_train)
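If you want to run the recipe end to end, below is a minimal self-contained sketch. It substitutes a synthetic imbalanced dataset from scikit-learn's make_classification for your own X_train and y_train, and adds an average-precision check on a held-out split; both the dataset and the metric are illustrative choices, not part of the original example.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import average_precision_score
from xgboost import XGBClassifier

# Illustrative imbalanced binary dataset: roughly 95% negatives,
# 5% positives (an assumption, standing in for your real data)
X, y = make_classification(n_samples=10_000, n_features=20,
                           weights=[0.95, 0.05], random_state=101)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=101)

# Same ratio as above: negatives divided by positives in y_train
positive_records = y_train.sum()
negative_records = len(y_train) - positive_records
spw = negative_records / positive_records

model = XGBClassifier(booster='gbtree',
                      objective='binary:logistic',
                      max_depth=12, learning_rate=0.1,
                      n_estimators=10,
                      scale_pos_weight=spw,
                      random_state=101,
                      n_jobs=-1)
model.fit(X_train, y_train)

# Average precision is a reasonable sanity check for imbalanced
# classes, since plain accuracy can look high even for a model
# that ignores the minority class
proba = model.predict_proba(X_test)[:, 1]
print(f"spw = {spw:.2f}")
print(f"Average precision: {average_precision_score(y_test, proba):.3f}")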