Problem 1
Using ordinary least squares, is there evidence that the training program positively influenced earnings?
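The code chunk that generated the summary below is not shown; reconstructing it from the Call line in the output, and naming the fitted model lmtrain to match its use in Problems 2 and 4:

# Fit the OLS model from the Call line below and print its summary
lmtrain <- lm(re78 ~ re75 + educ + age + train, data=data)
summary(lmtrain)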
##
## Call:
## lm(formula = re78 ~ re75 + educ + age + train, data = data)
##
## Residuals:
##     Min     1Q Median    3Q     Max
## -62.475 -4.653 -0.481 4.017 115.850
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.60487    1.22266   0.495   0.6208
## re75         0.79532    0.01652  48.151   <2e-16 ***
## educ         0.61154    0.07281   8.399   <2e-16 ***
## age         -0.04655    0.02063  -2.257   0.0241 *
## train       -0.59937    0.84109  -0.713   0.4761
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.28 on 2670 degrees of freedom
## Multiple R-squared: 0.5684, Adjusted R-squared: 0.5677
## F-statistic: 878.9 on 4 and 2670 DF, p-value: < 2.2e-16
No. The regression coefficient on train is not statistically significantly different from zero (p-value = 0.4761), so there is no evidence that the training program positively influenced earnings.
Problem 2
Plot the residuals (vertical axis) against the predicted values (horizontal axis). Is there evidence of heteroskedasticity? If so, describe the behavior of the variance of the residuals.
library(ggplot2)

# Collect residuals and fitted values from the Problem 1 model
reg.df <- data.frame(resids=lmtrain$residuals, predicts=lmtrain$fitted.values)
ggplot(reg.df, aes(x=predicts, y=resids)) +
  geom_point() +
  theme_bw() +
  labs(title="Visual Inspection for Changing Variance",
       x="Predicted Values",
       y="Residuals")
Yes. The residuals fan out as the predicted values increase: the variance of the residuals grows with the fitted values.
Problem 3
Is there statistical evidence for heteroskedasticity? Conduct the appropriate hypothesis test.
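The chunk that produced the output below is not shown; reconstructing it from the Call line, the test is a White-style auxiliary regression of the squared residuals on the fitted values and their squares (the object name white.lm is chosen here for illustration):

# White's test: regress squared residuals on fitted values and their squares
white.lm <- lm(I(resids^2) ~ predicts + I(predicts^2), data=reg.df)
summary(white.lm)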
##
## Call:
## lm(formula = I(resids^2) ~ predicts + I(predicts^2), data = reg.df)
##
## Residuals:
##    Min    1Q Median    3Q     Max
## -911.3 -87.0  -69.6 -30.2 13336.9
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)   101.41377   21.53978   4.708 2.63e-06 ***
## predicts       -4.01343    1.54824  -2.592  0.00959 **
## I(predicts^2)   0.15436    0.02526   6.110 1.14e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 412.3 on 2672 degrees of freedom
## Multiple R-squared: 0.02959, Adjusted R-squared: 0.02887
## F-statistic: 40.74 on 2 and 2672 DF, p-value: < 2.2e-16
Yes, there is evidence of heteroskedasticity. The F-test on the auxiliary regression has p-value \(<2.2 \times 10^{-16}\), so we reject the null hypothesis of homoskedasticity: the squared residuals are explained by the fitted values and their squares.
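As a cross-check (not in the original output), the LM form of White's test compares \(nR^2\) from the auxiliary regression to a chi-squared distribution with 2 degrees of freedom; plugging in the numbers from the summary above:

# LM form of White's test: n * R^2 ~ chi-squared(2) under homoskedasticity
n <- 2672 + 3                # residual df plus 3 estimated coefficients
lm.stat <- n * 0.02959       # 2675 * 0.02959, approximately 79.2
1 - pchisq(lm.stat, df=2)    # essentially 0, so reject homoskedasticity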
Problem 4
Using heteroskedasticity-consistent standard errors, is there evidence that the training program positively influenced earnings?
library(sandwich)  # vcovHC() for heteroskedasticity-consistent covariance estimates
library(lmtest)    # coeftest() for coefficient tests with a supplied covariance matrix

lmtrain <- lm(re78 ~ re75 + educ + age + train, data=data)
vv <- vcovHC(lmtrain, type="HC1")  # HC1 robust covariance matrix
coeftest(lmtrain, vcov=vv)
##
## t test of coefficients:
##
##              Estimate Std. Error t value  Pr(>|t|)
## (Intercept)  0.604867   1.382882  0.4374   0.66186
## re75         0.795321   0.030123 26.4028 < 2.2e-16 ***
## educ         0.611545   0.087973  6.9515 4.527e-12 ***
## age         -0.046554   0.022314 -2.0863   0.03705 *
## train       -0.599370   0.745676 -0.8038   0.42159
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
No. The regression coefficient on train is still not statistically significantly different from zero (p-value = 0.4216), so the conclusion from Problem 1 is unchanged under robust standard errors.
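As a follow-up (not part of the original output), lmtest's coefci() reports the corresponding robust confidence intervals, which for train will straddle zero:

# 95% confidence intervals using the same HC1 robust covariance matrix
coefci(lmtrain, vcov.=vv, level=0.95)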