 GREGHE1979
1 Upvote

# How to Perform Stratified Sampling Using Pandas

Python
Statistics & Probability

A stratified sample is one that takes a sample with an even amount of representation from a certain group within the population. For example if we were taking a sample from data relating to individuals we might want to make sure we had equal representation of men and women or equal representation from each age group.

In the example below we want create a sample from our df dataframe that contains equal representation of data with the three categories A, B & C.

```# Create three stratums, one for each category
stratum_A = df[df['category']=='A']
stratum_B = df[df['category']=='B']
stratum_C = df[df['category']=='C']

strata = [stratum_A,
stratum_B,
stratum_C]

# Create empty dataframe that will contain the stratified sample
stratified_sample = pd.DataFrame(columns=df.columns)

# Loop through each stratum, sample 100 rows from each and add to stratified sample dataframe
for stratum in strata:
sample = stratum.sample(n=100,random_state=101)
stratified_sample = pd.concat([stratified_sample, sample])```

By GregHe1979 - Last Updated July 9, 2022, 10:26 a.m. Did you find this snippet useful?

RELATED SNIPPETS
Get z-score for a column and add to Datafame
Python
Statistics & Probability
3
How to Calculate a Factorial Using Python
Python
Statistics & Probability
1
How to Calculate the Sampling Error
Python
Statistics & Probability
1
Numpy Descriptive Statistics
Python
Statistics & Probability

1
Top Contributors 103 100 48
Find Snippets by Language
Find Snippets by Use