SNIPPET
1 Upvote

Undersampling Imbalanced Data for Binary Classification

Python
Data Preprocessing

An example of how to handle imbalanced data in Python. This is based on the titanic dataset. Here we split the main dataframe into separate survived and deceased dataframe. The deceased dataframe is the larger dataframe so we sample the same number of rows from this dataframe as there are in the survived dataframe to make them the same size. We then concat both data frames back together to create a dataframe that is balanced.

survived = df[df['survived']==1]
deceased = df[df['survived']==0]
deceased = deceased.sample(n=len(survived), random_state=101)
df = pd.concat([survived,deceased],axis=0)

By analyseup - Last Updated Jan. 23, 2021, 5:28 p.m.

COMMENTS
RELATED SNIPPETS
Pivoting Pandas Dataframes
Python
Data Preprocessing

Pandas

3
SparkSession instantiate
Python
Data Preprocessing

2
Converting Data Types
Python
Data Preprocessing

Data-cleaning

2
Data Minification
Python
Data Preprocessing

Cleaning

2
Search Snippets by Tag: