Problem 1
Using ordinary least squares, is there evidence that the training program positively influenced earnings?
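The code chunk that generated the summary below is not shown; reconstructing it from the Call line in the output, and naming the fitted model lmtrain to match its use in Problems 2 and 4:

# Fit the OLS model from the Call line below and print its summary
lmtrain <- lm(re78 ~ re75 + educ + age + train, data=data)
summary(lmtrain)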
##
## Call:
## lm(formula = re78 ~ re75 + educ + age + train, data = data)
##
## Residuals:
##     Min     1Q Median    3Q     Max
## -62.475 -4.653 -0.481 4.017 115.850
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.60487    1.22266   0.495   0.6208
## re75         0.79532    0.01652  48.151   <2e-16 ***
## educ         0.61154    0.07281   8.399   <2e-16 ***
## age         -0.04655    0.02063  -2.257   0.0241 *
## train       -0.59937    0.84109  -0.713   0.4761
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.28 on 2670 degrees of freedom
## Multiple R-squared: 0.5684, Adjusted R-squared: 0.5677
## F-statistic: 878.9 on 4 and 2670 DF, p-value: < 2.2e-16
No. The regression coefficient on train is not statistically significantly different from zero (p-value = 0.4761), so there is no evidence that the training program positively influenced earnings.
Problem 2
Plot the residuals (vertical axis) against the predicted values (horizontal axis). Is there evidence of heteroskedasticity? If so, describe the behavior of the variance of the residuals.
library(ggplot2)

# Collect residuals and fitted values from the Problem 1 model
reg.df <- data.frame(resids=lmtrain$residuals, predicts=lmtrain$fitted.values)
ggplot(reg.df, aes(x=predicts, y=resids)) +
  geom_point() +
  theme_bw() +
  labs(title="Visual Inspection for Changing Variance",
       x="Predicted Values",
       y="Residuals")
Yes. The residuals fan out as the predicted values increase: the variance of the residuals grows with the fitted values.
Problem 3
Is there statistical evidence for heteroskedasticity? Conduct the appropriate hypothesis test.
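The chunk that produced the output below is not shown; reconstructing it from the Call line, the test is a White-style auxiliary regression of the squared residuals on the fitted values and their squares (the object name white.lm is chosen here for illustration):

# White's test: regress squared residuals on fitted values and their squares
white.lm <- lm(I(resids^2) ~ predicts + I(predicts^2), data=reg.df)
summary(white.lm)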
##
## Call:
## lm(formula = I(resids^2) ~ predicts + I(predicts^2), data = reg.df)
##
## Residuals:
##    Min    1Q Median    3Q     Max
## -911.3 -87.0  -69.6 -30.2 13336.9
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)   101.41377   21.53978   4.708 2.63e-06 ***
## predicts       -4.01343    1.54824  -2.592  0.00959 **
## I(predicts^2)   0.15436    0.02526   6.110 1.14e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 412.3 on 2672 degrees of freedom
## Multiple R-squared: 0.02959, Adjusted R-squared: 0.02887
## F-statistic: 40.74 on 2 and 2672 DF, p-value: < 2.2e-16
Yes, there is evidence of heteroskedasticity. The F-test on the auxiliary regression has p-value \(<2.2 \times 10^{-16}\), so we reject the null hypothesis of homoskedasticity: the squared residuals are explained by the fitted values and their squares.
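As a cross-check (not in the original output), the LM form of White's test compares \(nR^2\) from the auxiliary regression to a chi-squared distribution with 2 degrees of freedom; plugging in the numbers from the summary above:

# LM form of White's test: n * R^2 ~ chi-squared(2) under homoskedasticity
n <- 2672 + 3                # residual df plus 3 estimated coefficients
lm.stat <- n * 0.02959       # 2675 * 0.02959, approximately 79.2
1 - pchisq(lm.stat, df=2)    # essentially 0, so reject homoskedasticity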
Problem 4
Using heteroskedasticity-consistent standard errors, is there evidence that the training program positively influenced earnings?
library(sandwich)  # vcovHC() for heteroskedasticity-consistent covariance estimates
library(lmtest)    # coeftest() for coefficient tests with a supplied covariance matrix

lmtrain <- lm(re78 ~ re75 + educ + age + train, data=data)
vv <- vcovHC(lmtrain, type="HC1")  # HC1 robust covariance matrix
coeftest(lmtrain, vcov=vv)
##
## t test of coefficients:
##
##              Estimate Std. Error t value  Pr(>|t|)
## (Intercept)  0.604867   1.382882  0.4374   0.66186
## re75         0.795321   0.030123 26.4028 < 2.2e-16 ***
## educ         0.611545   0.087973  6.9515 4.527e-12 ***
## age         -0.046554   0.022314 -2.0863   0.03705 *
## train       -0.599370   0.745676 -0.8038   0.42159
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
No. The regression coefficient on train is still not statistically significantly different from zero (p-value = 0.4216), so the conclusion from Problem 1 is unchanged under robust standard errors.
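As a follow-up (not part of the original output), lmtest's coefci() reports the corresponding robust confidence intervals, which for train will straddle zero:

# 95% confidence intervals using the same HC1 robust covariance matrix
coefci(lmtrain, vcov.=vv, level=0.95)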