Perform Label Encoding in Python (With Example)


Often in machine learning, we want to convert categorical variables into some type of numeric format that can be readily used by algorithms.

One way to do this is through label encoding, which assigns each categorical value an integer value based on alphabetical order.
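To make the idea concrete, here is a minimal sketch of the same mapping done by hand in plain Python, without any library. The sample values and the names teams, classes, mapping, and encoded are hypothetical and used only for illustration:

```python
# Hypothetical sample values, chosen only for illustration
teams = ["B", "A", "C", "A", "B"]

# Sort the unique values so the integer codes follow alphabetical order
classes = sorted(set(teams))

# Map each unique value to its position in the sorted list
mapping = {label: code for code, label in enumerate(classes)}

# Replace each value with its integer code
encoded = [mapping[t] for t in teams]

print(mapping)
print(encoded)
```

The integer assigned to a value depends only on where that value falls in the sorted list of unique values, not on the order in which the values appear in the data.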

For example, label encoding converts each unique value in a categorical variable called Team into an integer value based on alphabetical order.

You can use the following syntax to perform label encoding in Python:

from sklearn.preprocessing import LabelEncoder

#create instance of label encoder
lab = LabelEncoder()

#perform label encoding on 'my_column' column
df['my_column'] = lab.fit_transform(df['my_column'])

The following example shows how to use this syntax in practice.

Example: Label Encoding in Python

Suppose we have the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29]})

#view DataFrame
print(df)

  team  points
0    A      25
1    A      12
2    B      15
3    B      14
4    B      19
5    B      23
6    C      25
7    C      29

We can use the following code to perform label encoding to convert each categorical value in the team column into an integer value:

from sklearn.preprocessing import LabelEncoder

#create instance of label encoder
lab = LabelEncoder()

#perform label encoding on 'team' column
df['team'] = lab.fit_transform(df['team'])

#view updated DataFrame
print(df)

   team  points
0     0      25
1     0      12
2     1      15
3     1      14
4     1      19
5     1      23
6     2      25
7     2      29

From the output we can see:

  • Each “A” value has been converted to 0.
  • Each “B” value has been converted to 1.
  • Each “C” value has been converted to 2.
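If you want to check the mapping directly rather than read it off the output, one option is the classes_ attribute of the fitted encoder, which stores the unique labels in sorted order. The following is a small self-contained sketch; the names codes and label_map are hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

#recreate the 'team' column from the example
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C']})

#fit the encoder and keep the integer codes in a separate variable
lab = LabelEncoder()
codes = lab.fit_transform(df['team'])

#classes_ holds the unique labels in sorted order, so the position
#of each label is the integer code assigned to it
label_map = {str(label): code for code, label in enumerate(lab.classes_)}

print(label_map)
```

Keeping the codes in a separate variable, as done here, leaves the original team column untouched, which can be convenient when you still need the text labels later.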

Note that you can also use the inverse_transform() method of the fitted encoder to recover the original labels from the encoded team column:

#display original team labels
lab.inverse_transform(df['team'])

array(['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'], dtype=object)
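A fitted encoder can also be reused on new values with transform(), which applies the mapping learned during fitting. The sketch below is illustrative only: the label 'D' and the names new_codes, decoded, and unseen_ok are hypothetical. It also shows that a label the encoder never saw during fitting raises a ValueError:

```python
from sklearn.preprocessing import LabelEncoder

#fit the encoder on the known labels
lab = LabelEncoder()
lab.fit(['A', 'B', 'C'])

#encode new values using the mapping learned during fit
new_codes = lab.transform(['C', 'A']).tolist()

#decode integer codes back into the original labels
decoded = lab.inverse_transform([2, 0]).tolist()

#a label that was not present during fit cannot be encoded
try:
    lab.transform(['D'])
    unseen_ok = True
except ValueError:
    unseen_ok = False

print(new_codes)
print(decoded)
print(unseen_ok)
```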
