1 Upvote

How to Perform Stratified Sampling Using Pandas

Python
Statistics & Probability

A stratified sample is one that takes a sample with an even amount of representation from a certain group within the population. For example if we were taking a sample from data relating to individuals we might want to make sure we had equal representation of men and women or equal representation from each age group.

In the example below we want create a sample from our df dataframe that contains equal representation of data with the three categories A, B & C.

# Create three stratums, one for each category
stratum_A = df[df['category']=='A']
stratum_B = df[df['category']=='B']
stratum_C = df[df['category']=='C']

strata = [stratum_A,
          stratum_B,
          stratum_C]

# Create empty dataframe that will contain the stratified sample
stratified_sample = pd.DataFrame(columns=df.columns)

# Loop through each stratum, sample 100 rows from each and add to stratified sample dataframe
for stratum in strata:
    sample = stratum.sample(n=100,random_state=101)
    stratified_sample = pd.concat([stratified_sample, sample])

By GregHe1979 - Last Updated July 9, 2022, 10:26 a.m.

Did you find this snippet useful?

Sign up to bookmark this in your snippet library

COMMENTS
RELATED SNIPPETS
How to Calculate the Sampling Error
Python
Statistics & Probability

Sampling

1
Numpy Descriptive Statistics
Python
Statistics & Probability

1
Top Contributors
103
100