Never Forget Another Line of Code

Datasnips is a free code snippet hosting platform for Data Science & AI. It enables your code snippets to be organized, searchable & shareable.

Pandas Undersampling for Imbalanced Binary Classification

Python

An example of how to handle imbalanced data in Python using random undersampling, based on the Titanic dataset. We split the main dataframe into separate survived and deceased dataframes. The deceased dataframe is the larger of the two, so we randomly sample from it the same number of rows as there are in the survived dataframe, making both the same size. We then concatenate the two dataframes to create a single balanced dataframe.

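    import pandas as pd

    # df is assumed to be the Titanic dataframe with a binary 'survived' column
    survived = df[df['survived'] == 1]
    deceased = df[df['survived'] == 0]
    # Downsample the larger (deceased) class to the size of the survived class
    deceased = deceased.sample(n=len(survived), random_state=101)
    df = pd.concat([survived, deceased], axis=0)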
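
The snippet above assumes that the deceased class is the larger one. As a rough sketch of the same idea for any binary column, the function below works out which class is smaller and downsamples the other to match. The function name `undersample_binary`, the toy dataframe, and the final shuffle are illustrative assumptions and are not part of the original snippet.

```python
import pandas as pd


def undersample_binary(df, target, random_state=101):
    """Downsample the larger class of a binary column to the size of the smaller one.

    Hypothetical helper for illustration; `target` is the name of the label column.
    """
    counts = df[target].value_counts()
    minority_label = counts.idxmin()
    n_minority = int(counts.min())

    parts = []
    for label, group in df.groupby(target):
        if label == minority_label:
            # Keep every row of the smaller class
            parts.append(group)
        else:
            # Randomly sample the larger class down to the minority size
            parts.append(group.sample(n=n_minority, random_state=random_state))

    balanced = pd.concat(parts, axis=0)
    # Shuffle so the two classes are not stored in contiguous blocks
    return balanced.sample(frac=1, random_state=random_state).reset_index(drop=True)


# Toy data standing in for the Titanic dataframe: six rows of class 0, three of class 1
toy = pd.DataFrame({"survived": [0] * 6 + [1] * 3, "age": range(20, 29)})
balanced = undersample_binary(toy, "survived")
print(balanced["survived"].value_counts().to_dict())
```

With the toy data, both classes end up with three rows each. Undersampling discards rows from the larger class, so it is usually best applied to the training split only, leaving the evaluation data at its original class balance.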