Pandas: Group By Index and Perform Calculation


You can use the following methods to group by one or more index columns in pandas and perform some calculation:

Method 1: Group By One Index Column

df.groupby('index1')['numeric_column'].max()

Method 2: Group By Multiple Index Columns

df.groupby(['index1', 'index2'])['numeric_column'].sum()

Method 3: Group By Index Column and Regular Column

df.groupby(['index1', 'numeric_column1'])['numeric_column2'].nunique()

The following examples show how to use each method with the following pandas DataFrame that has a MultiIndex:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'F', 'G', 'G', 'F', 'F', 'F'],
                   'points': [7, 7, 7, 19, 16, 9, 10, 10, 8, 8],
                   'rebounds': [8, 8, 8, 10, 11, 12, 13, 13, 15, 11]})

#set 'team' column to be index column
df.set_index(['team', 'position'], inplace=True)

#view DataFrame
df

		 points	 rebounds
team	position		
A	G	 7	 8
        G	 7	 8
        G	 7	 8
        F	 19	 10
        F	 16	 11
B	G	 9	 12
        G	 10	 13
        F	 10	 13
        F	 8	 15
        F	 8	 11

Method 1: Group By One Index Column

The following code shows how to find the max value of the ‘points’ column, grouped by the ‘position’ index column:

#find max value of 'points' grouped by 'position index column
df.groupby('position')['points'].max()

position
F    19
G    10
Name: points, dtype: int64

Method 2: Group By Multiple Index Columns

The following code shows how to find the sum of the ‘points’ column, grouped by the ‘team’ and ‘position’ index columns:

#find max value of 'points' grouped by 'position index column
df.groupby(['team', 'position'])['points'].sum()

team  position
A     F           35
      G           21
B     F           26
      G           19
Name: points, dtype: int64

Method 3: Group By Index Column & Regular Column

The following code shows how to find the number of unique values in the ‘rebounds’ column, grouped by the index column ‘team’ and the ordinary column ‘points’:

#find max value of 'points' grouped by 'position index column
df.groupby(['team', 'points'])['rebounds'].nunique()

team  points
A     7         1
      16        1
      19        1
B     8         2
      9         1
      10        1
Name: rebounds, dtype: int64

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

x
Scroll to Top