Sklearn Cross Validation with Logistic Regression
Python
Here we use the sklearn cross_validate function to score our model by splitting the data into five folds.
We start by importing our data and splitting it into a dataframe containing our model features and a series containing our target. We then initialise a simple logistic regression model.
We then score the model over five folds with the cross_validate function, using accuracy as the evaluation metric. The model is trained and tested once per fold, and cross_validate returns a dictionary whose 'test_score' entry is an array holding one score per fold. We can then print the test scores for each fold.
For additional metrics, see the scoring parameter reference in the scikit-learn documentation: https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

df = pd.read_csv('data/processed_data.csv')
X = df[['Retail_Price', 'Discount']]
y = df['Returned_Units']

model = LogisticRegression()

# We pass in our logistic regression model, our features and target,
# our scoring metric and the number of folds we want to consider.
scores = cross_validate(model, X, y, scoring='accuracy', cv=5)
print(scores['test_score'])
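To try more than one metric at once, a minimal sketch is shown here. It is an illustration rather than part of the original snippet: the data is a synthetic stand-in generated with make_classification (an assumption, since the CSV is not available here), and the choice of 'accuracy' and 'f1' as metrics is ours. When a list of scorer names is passed, cross_validate returns one 'test_<name>' array per metric.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the CSV data (assumption: binary target, two features).
X, y = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

model = LogisticRegression()

# A list of scorer names produces one 'test_<name>' entry per metric.
scores = cross_validate(model, X, y, scoring=['accuracy', 'f1'], cv=5)

# Summarise each metric across the five folds.
for name in ('test_accuracy', 'test_f1'):
    fold_scores = scores[name]
    print(f"{name}: mean={np.mean(fold_scores):.3f}, std={np.std(fold_scores):.3f}")
```

Reporting the mean and standard deviation across folds gives a single summary figure while still showing how much the score varies from fold to fold.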