
Sklearn Cross Validation with Logistic Regression

Supervised Learning

Here we use the sklearn cross_validate function to score our model by splitting the data into five folds.

We start by importing our data and splitting it into a dataframe containing our model features and a series containing our target. We then initialise a simple logistic regression model.

We then score the model over five folds with the cross_validate function, using accuracy as the evaluation metric. For each fold, the model is trained on the other four folds and tested on the held-out fold, and the scores are stored in an array under the 'test_score' key of the returned dictionary. We can then print the test scores for each fold.

For additional metrics, see the scoring parameter reference in the scikit-learn documentation: https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

df = pd.read_csv('data/processed_data.csv')
X = df[['Retail_Price','Discount']]
y = df['Returned_Units']

model = LogisticRegression()

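# We pass in our logistic regression model, our features and target,
# our scoring metric and the number of folds we want to consider.
scores = cross_validate(model, X, y, scoring='accuracy', cv=5)

# cross_validate returns a dict; the per-fold test scores are under 'test_score'.
print(scores['test_score'])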
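To evaluate more than one metric in a single run, the scoring parameter also accepts a list of metric names. The sketch below is a minimal illustration and not part of the original snippet: it uses synthetic data from make_classification in place of the CSV file, and the names X_demo, y_demo, demo_model and demo_scores are made up for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic binary classification data (an assumption, standing in for the CSV).
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)

demo_model = LogisticRegression()

# Passing a list of metric names scores every fold on each metric.
demo_scores = cross_validate(
    demo_model,
    X_demo,
    y_demo,
    scoring=['accuracy', 'f1'],
    cv=5,
)

# With several metrics, results are keyed as 'test_<metric name>'.
for metric in ('accuracy', 'f1'):
    fold_scores = demo_scores[f'test_{metric}']
    print(metric, fold_scores.round(3), 'mean:', round(fold_scores.mean(), 3))
```

Each entry holds one score per fold, so taking the mean gives a single summary figure per metric.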

By analyseup - Last Updated Jan. 10, 2022, 11:27 p.m.
