**White’s test** is used to determine if heteroscedasticity is present in a regression model.

Heteroscedasticity refers to the unequal scatter of residuals at different levels of a response variable, which violates the assumption that the residuals are equally scattered at each level of the response variable.
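To make the definition concrete, here is a minimal sketch that simulates two sets of residuals, one with constant spread and one whose spread grows with a variable `x`, and compares the spread in the lower and upper halves of `x`. The data are synthetic and the helper name `spread_ratio` is hypothetical; none of this comes from the mtcars example used in the rest of the tutorial.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(1, 10, size=1000)

# Simulated residuals (an illustrative assumption, not real model output)
homo_resid = rng.normal(0, 2.0, size=x.size)  # constant spread at every level of x
hetero_resid = rng.normal(0, 0.5 * x)         # spread grows with x

def spread_ratio(resid, x):
    """Std. dev. of residuals where x is above its median, divided by
    the std. dev. of residuals where x is at or below its median."""
    upper = x > np.median(x)
    return resid[upper].std() / resid[~upper].std()

homo_ratio = spread_ratio(homo_resid, x)
hetero_ratio = spread_ratio(hetero_resid, x)

print(f"Homoscedastic spread ratio:   {homo_ratio:.2f}")
print(f"Heteroscedastic spread ratio: {hetero_ratio:.2f}")
```

A ratio close to 1 suggests equal scatter, while a ratio well above 1 suggests the residuals fan out as `x` increases.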

The following step-by-step example shows how to perform White’s test in Python to determine whether or not heteroscedasticity is a problem in a given regression model.

**Step 1: Load Data**

In this example we will fit a multiple linear regression model using the **mtcars** dataset.

The following code shows how to load this dataset into a pandas DataFrame:

```python
from sklearn.linear_model import LinearRegression
from statsmodels.stats.diagnostic import het_white
import statsmodels.api as sm
import pandas as pd

#define URL where dataset is located
url = "https://raw.githubusercontent.com/Statology/Python-Guides/main/mtcars.csv"

#read in data
data = pd.read_csv(url)

#view summary of data
data.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 12 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   model   32 non-null     object
 1   mpg     32 non-null     float64
 2   cyl     32 non-null     int64
 3   disp    32 non-null     float64
 4   hp      32 non-null     int64
 5   drat    32 non-null     float64
 6   wt      32 non-null     float64
 7   qsec    32 non-null     float64
 8   vs      32 non-null     int64
 9   am      32 non-null     int64
 10  gear    32 non-null     int64
 11  carb    32 non-null     int64
dtypes: float64(5), int64(6), object(1)
```

**Step 2: Fit Regression Model**

Next, we will fit a regression model using **mpg** as the response variable and **disp** and **hp** as the two predictor variables:

```python
#define response variable
y = data['mpg']

#define predictor variables
x = data[['disp', 'hp']]

#add constant to predictor variables
x = sm.add_constant(x)

#fit regression model
model = sm.OLS(y, x).fit()
```

**Step 3: Perform White’s Test**

Next, we will use the `het_white()` function from the statsmodels package to perform White’s test to determine if heteroscedasticity is present in the regression model:

```python
#perform White's test
white_test = het_white(model.resid, model.model.exog)

#define labels to use for output of White's test
labels = ['Test Statistic', 'Test Statistic p-value', 'F-Statistic', 'F-Test p-value']

#print results of White's test
print(dict(zip(labels, white_test)))
```

```
{'Test Statistic': 7.076620330416624,
 'Test Statistic p-value': 0.21500404394263936,
 'F-Statistic': 1.4764621093131864,
 'F-Test p-value': 0.23147065943879694}
```

Here is how to interpret the output:

- The test statistic is X<sup>2</sup> = **7.0766**.
- The corresponding p-value is **0.215**.

White’s test uses the following null and alternative hypotheses:

- **Null (H<sub>0</sub>):** Homoscedasticity is present (residuals are equally scattered)
- **Alternative (H<sub>A</sub>):** Heteroscedasticity is present (residuals are not equally scattered)

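Since the p-value (0.215) is not less than the common significance level of 0.05, we fail to reject the null hypothesis. This means we do not have sufficient evidence to say that heteroscedasticity is present in the regression model.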
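For intuition about where the test statistic comes from, the sketch below computes it by hand. White’s test regresses the squared residuals on the original predictors, their squares, and their cross-products, and the statistic is the number of observations multiplied by the R-squared of that auxiliary regression. The function name `white_lm_statistic` is hypothetical and the data are synthetic (not mtcars); it assumes the design matrix already contains a constant in its first column, as `sm.add_constant()` produces.

```python
import numpy as np

def white_lm_statistic(resid, exog):
    """Return (LM statistic, degrees of freedom) for White's test.

    Assumes `exog` is a 2-D design matrix whose first column is a constant.
    """
    resid = np.asarray(resid, dtype=float)
    exog = np.asarray(exog, dtype=float)
    n, k = exog.shape

    # All pairwise column products: because of the constant column this
    # yields the constant, the predictors, their squares and cross-products
    i, j = np.triu_indices(k)
    aux = exog[:, i] * exog[:, j]

    # Auxiliary regression of the squared residuals on those terms
    target = resid ** 2
    beta, *_ = np.linalg.lstsq(aux, target, rcond=None)
    fitted = aux @ beta
    r_squared = 1 - np.sum((target - fitted) ** 2) / np.sum((target - target.mean()) ** 2)

    # Degrees of freedom: auxiliary regressors excluding the constant
    return n * r_squared, aux.shape[1] - 1

# Synthetic data for illustration: noise spread grows with x1
rng = np.random.default_rng(0)
n_obs = 100
x1 = rng.uniform(1, 10, n_obs)
x2 = rng.uniform(1, 10, n_obs)
exog = np.column_stack([np.ones(n_obs), x1, x2])
y = 1 + 2 * x1 - x2 + rng.normal(0, 0.5 * x1)

# Ordinary least squares fit and its residuals
coef, *_ = np.linalg.lstsq(exog, y, rcond=None)
resid = y - exog @ coef

lm_stat, df = white_lm_statistic(resid, exog)
print(f"LM statistic: {lm_stat:.4f}, degrees of freedom: {df}")
```

With a constant and two predictors there are five auxiliary terms, so the statistic is compared against a chi-square distribution with 5 degrees of freedom to obtain the p-value.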

**What To Do Next**

If you fail to reject the null hypothesis of White’s test, there is no evidence of heteroscedasticity and you can proceed to interpret the output of the original regression.

However, if you reject the null hypothesis, there is evidence that heteroscedasticity is present. In this case, the standard errors shown in the regression output table may be unreliable.

There are two common ways to fix this issue:

**1. Transform the response variable.**

You can try performing a transformation on the response variable, such as taking the log of the response variable. This often causes heteroscedasticity to go away.

**2. Use weighted regression.**

Weighted regression assigns a weight to each data point based on the variance of its fitted value. Essentially, this gives small weights to data points that have higher variances, which shrinks their squared residuals. When the proper weights are used, this can eliminate the problem of heteroscedasticity.
