# Stratified Sampling with Pandas: How to Create a Representative Sample by Category

Python
Statistics & Probability

A stratified sample is one that takes a sample with an even amount of representation from a certain group within the population. For example if we were taking a sample from data relating to individuals we might want to make sure we had equal representation of men and women or equal representation from each age group.

In the example below we want create a sample from our df dataframe that contains equal representation of data with the three categories A, B & C.

``` 1|  # Create three stratums, one for each category
2|  stratum_A = df[df['category']=='A']
3|  stratum_B = df[df['category']=='B']
4|  stratum_C = df[df['category']=='C']
5|
6|  strata = [stratum_A,
7|            stratum_B,
8|            stratum_C]
9|
10|  # Create empty dataframe that will contain the stratified sample
11|  stratified_sample = pd.DataFrame(columns=df.columns)
12|
13|  # Loop through each stratum, sample 100 rows from each and add to stratified sample dataframe
14|  for stratum in strata:
15|      sample = stratum.sample(n=100,random_state=101)
16|      stratified_sample = pd.concat([stratified_sample, sample])
``` GREG HE
1 Upvote Did you find this snippet useful?

Get z-score for a column and add to Datafame
Python
Statistics & Probability
3
How to Calculate a Factorial Using Python
Python
Statistics & Probability
1
How to Calculate the Sampling Error
Python
Statistics & Probability
1
Numpy Descriptive Statistics
Python
Statistics & Probability

1