Stratified K-Fold - Splitting Data & Saving to File

Python
Data Preparation for Models

An example that splits the data into 5 stratified training and validation folds and saves each set to its own CSV file for later use.

import pandas as pd
from sklearn.model_selection import StratifiedKFold

df = pd.read_csv('data/raw/train.csv')

# Stratify on the target column so each fold keeps the overall class balance
skf = StratifiedKFold(n_splits=5)
target = df['label']

# split() yields positional row indices, so select rows with iloc rather than loc
for fold_no, (train_index, val_index) in enumerate(skf.split(df, target), start=1):
    train = df.iloc[train_index]
    val = df.iloc[val_index]
    # The output directory must already exist
    train.to_csv(f'data/processed/folds/train_fold_{fold_no}.csv')
    val.to_csv(f'data/processed/folds/val_fold_{fold_no}.csv')
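
To check that the folds really are stratified, a small self-contained sketch like the one below may help. It uses a made-up DataFrame (the `feature` column and the 80/20 class split are assumptions for illustration, not part of the original data) and turns on `shuffle` with a fixed `random_state`, which the snippet above does not do.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical data for illustration: 100 rows, 80 of class 0 and 20 of class 1.
df = pd.DataFrame({
    "feature": range(100),
    "label": [0] * 80 + [1] * 20,
})

# shuffle/random_state are optional; they make the fold assignment random but repeatable.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

val_sizes = []
val_positive_rates = []
for fold_no, (train_idx, val_idx) in enumerate(skf.split(df, df["label"]), start=1):
    val = df.iloc[val_idx]
    rate = val["label"].mean()  # share of class 1 in this validation fold
    val_sizes.append(len(val))
    val_positive_rates.append(rate)
    print(f"fold {fold_no}: {len(val)} validation rows, class-1 share {rate:.2f}")
```

With these class sizes, each validation fold should come out at 20 rows with a class-1 share of 0.20, matching the full dataset.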

By detro - Last Updated Dec. 13, 2020, 9 p.m.
