Pandas Groupby Transform

Python
Data Preprocessing

In this example we demonstrate how to use the Pandas groupby transform function and add its output to the original dataframe as new columns.

We first create a dataframe, players, that contains the points and assists of 3 players across different games. We want to add two columns, total_points and total_assists, that hold each player's totals on every row of the players dataframe. To do this we use groupby transform.

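Groupby transform applies a function to each column within each group, where the groups are defined by the column(s) passed to groupby. Unlike an aggregation, the result has the same length and index as the original dataframe: every row receives the value computed for its group. Transforming a single column returns a series; transforming several columns returns a dataframe.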
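To make the difference from a plain aggregation concrete, here is a minimal sketch that uses a cut-down version of the players data from this snippet. It passes the string 'sum', which Pandas accepts as a shorthand for the built-in sum; the variable names per_group and per_row are only illustrative.

```python
import pandas as pd

players = pd.DataFrame({
    'player': ['A', 'A', 'B', 'B', 'C'],
    'points': [22, 27, 8, 12, 16],
})

# agg collapses each group to a single value, indexed by the group key
per_group = players.groupby('player')['points'].agg('sum')

# transform broadcasts each group's value back onto the original rows
per_row = players.groupby('player')['points'].transform('sum')

print(len(per_group))    # 3, one entry per player
print(len(per_row))      # 5, one entry per original row
print(per_row.tolist())  # [49, 49, 20, 20, 16]
```

Because per_row shares its index with players, it can be assigned straight back as a new column without a merge.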

In the example below, we show two equivalent ways to sum the points and assists columns for each player: a lambda function and a regular function. The outputs of groupby transform are then assigned to new columns total_points and total_assists in the players dataframe. Finally, we show how to apply groupby transform to the points column only.

import pandas as pd

players = pd.DataFrame({'player': ['A', 'A', 'B', 'B', 'C'],
                        'points': [22, 27, 8, 12, 16],
                        'assists': [1, 2, 7, 4, 11]})

"""GROUPBY TRANSFORM USING LAMBDA"""
# Select points and assists explicitly so that exactly two columns come back
players[['total_points','total_assists']] = players.groupby('player')[['points','assists']].transform(lambda x: x.sum())


"""GROUPBY TRANSFORM USING A REGULAR FUNCTION"""
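def get_totals(x):
    # x is a single column restricted to one player's rows
    return x.sum()

# Select points and assists explicitly: players already holds the two total
# columns, and transforming all four would not fit the two target columns
players[['total_points','total_assists']] = players.groupby('player')[['points','assists']].transform(get_totals)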

>> 
	player	points	assists	total_points	total_assists
0	A	22	1	49	3
1	A	27	2	49	3
2	B	8	7	20	11
3	B	12	4	20	11
4	C	16	11	16	11


"""GROUPBY TRANSFORM FOR THE POINTS COLUMN ONLY"""
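# Selecting a single column returns a series (output shown for a freshly created players dataframe)
players['total_points'] = players.groupby('player')['points'].transform(lambda x: x.sum())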

>> 
	player	points	assists	total_points
0	A	22	1	49
1	A	27	2	49
2	B	8	7	20
3	B	12	4	20
4	C	16	11	16
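Since the transformed values line up row by row with the original dataframe, they can be combined directly with existing columns. The sketch below is one possible use, not part of the original snippet: it computes each game's share of the player's total points. The names player_totals and points_share are hypothetical, chosen only for illustration.

```python
import pandas as pd

players = pd.DataFrame({
    'player': ['A', 'A', 'B', 'B', 'C'],
    'points': [22, 27, 8, 12, 16],
})

# Per-player totals, repeated on every row of that player
player_totals = players.groupby('player')['points'].transform('sum')

# Row-wise division works because both series share the same index
players['points_share'] = players['points'] / player_totals

print(players)
```

Within each player the shares add up to 1, which is a quick sanity check that the grouping worked as intended.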

By analyseup - Last Updated Jan. 16, 2022, 1:02 a.m.
