Problem 1

Using ordinary least squares, is there evidence that the training program positively influenced earnings?

## 
## Call:
## lm(formula = re78 ~ re75 + educ + age + train, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -62.475  -4.653  -0.481   4.017 115.850 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.60487    1.22266   0.495   0.6208    
## re75         0.79532    0.01652  48.151   <2e-16 ***
## educ         0.61154    0.07281   8.399   <2e-16 ***
## age         -0.04655    0.02063  -2.257   0.0241 *  
## train       -0.59937    0.84109  -0.713   0.4761    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.28 on 2670 degrees of freedom
## Multiple R-squared:  0.5684, Adjusted R-squared:  0.5677 
## F-statistic: 878.9 on 4 and 2670 DF,  p-value: < 2.2e-16

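No. Controlling for re75, educ, and age, the estimated coefficient on train is negative (-0.599) and is not statistically significantly different from 0 (p-value = 0.4761). There is no evidence that the training program raised earnings.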
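The question asks about a positive effect, so the relevant alternative is one-sided, \(H_1: \beta_{\text{train}} > 0\). `lm` reports two-sided p-values. The small sketch below converts a reported two-sided p-value into a one-sided one. It relies only on the symmetry of the t distribution, and the helper name `one_sided_p_greater` is hypothetical.

```python
def one_sided_p_greater(t_value: float, p_two_sided: float) -> float:
    """One-sided p-value for H1: beta > 0, given a two-sided p-value.

    By symmetry of the t distribution, P(T > t) equals p/2 when t > 0
    and 1 - p/2 when t <= 0.
    """
    half = p_two_sided / 2.0
    return half if t_value > 0 else 1.0 - half


# Reported OLS values for train (Problem 1): t = -0.713, two-sided p = 0.4761.
p_one = one_sided_p_greater(-0.713, 0.4761)
print(f"one-sided p-value for H1: beta_train > 0 is {p_one:.3f}")
```

The t value is negative, so the one-sided p-value is about 0.762. The one-sided test therefore supports a positive effect even less than the two-sided p-value of 0.476. Applying the same conversion to the robust test in Problem 4 (t = -0.8038, p = 0.42159) gives about 0.789.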

Problem 2

Plot the residuals (vertical axis) against the predicted values (horizontal axis). Is there evidence of heteroskedasticity? If so, describe the behavior of the variance of the residuals.

Yes. The residuals fan out as the predicted values increase. The variance of the residuals is not constant; it grows with the predicted values.
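The fan shape can also be checked numerically. The sketch below, in Python with NumPy, does the following:

- fits OLS and sorts the observations by predicted value;
- splits them into equal-sized groups;
- reports the residual standard deviation within each group.

Fan-out shows up as a rising sequence of spreads. The example runs on synthetic data whose error standard deviation grows with the regressor. This is an assumption made for illustration, not the earnings data, and the helper name `residual_spread_by_fitted` is hypothetical.

```python
import numpy as np


def residual_spread_by_fitted(y, X, n_bins=4):
    """Fit OLS, then return the residual standard deviation within
    equal-sized groups of the fitted values (lowest fitted values first)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    order = np.argsort(fitted)              # observation indices sorted by fitted value
    groups = np.array_split(order, n_bins)  # equal-sized groups of indices
    return [float(resid[idx].std(ddof=1)) for idx in groups]


# Synthetic illustration (NOT the earnings data): error sd is 0.5 * x.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(1.0, 10.0, size=n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5 * x)
X = np.column_stack([np.ones(n), x])

spreads = residual_spread_by_fitted(y, X)
print([round(s, 2) for s in spreads])
```

To draw the residual plot asked for in Problem 2, scatter the residuals against the fitted values with any plotting library. The grouped standard deviations only summarize what that plot shows.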

Problem 3

Is there statistical evidence for heteroskedasticity? Conduct the appropriate hypothesis test.

## 
## Call:
## lm(formula = I(resids^2) ~ predicts + I(predicts^2), data = reg.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
##  -911.3   -87.0   -69.6   -30.2 13336.9 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   101.41377   21.53978   4.708 2.63e-06 ***
## predicts       -4.01343    1.54824  -2.592  0.00959 ** 
## I(predicts^2)   0.15436    0.02526   6.110 1.14e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 412.3 on 2672 degrees of freedom
## Multiple R-squared:  0.02959,    Adjusted R-squared:  0.02887 
## F-statistic: 40.74 on 2 and 2672 DF,  p-value: < 2.2e-16

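Yes, there is evidence of heteroskedasticity. The auxiliary regression above regresses the squared residuals on the predicted values and their squares (the special form of the White test). Its joint F-statistic is 40.74 on 2 and 2672 DF, with p-value \(<2.2 \times 10^{-16}\). We therefore reject the null hypothesis of homoskedasticity: the squared residuals are systematically related to the predicted values.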
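The special White test regresses \(\hat u^2\) on \(\hat y\) and \(\hat y^2\) and then tests the two slopes jointly. The test statistic can be either the F-statistic that R prints or the LM statistic \(nR^2\), which is \(\chi^2_2\) under homoskedasticity. Below is a minimal NumPy sketch. The function names are hypothetical, and the LM p-value uses the fact that the \(\chi^2_2\) survival function is exactly \(e^{-x/2}\).

```python
import math

import numpy as np


def stats_from_r2(r2, n):
    """LM and F statistics from the auxiliary-regression R^2 with 2 slopes."""
    q = 2                                     # slopes on yhat and yhat^2
    lm = n * r2                               # LM = n R^2 ~ chi2(2) under H0
    p_lm = math.exp(-lm / 2.0)                # exact chi2(2) survival function
    f = (r2 / q) / ((1.0 - r2) / (n - q - 1)) # joint F on (q, n - q - 1) df
    return r2, lm, p_lm, f


def special_white_test(resid, fitted):
    """Regress e^2 on a constant, yhat and yhat^2; return (R^2, LM, p, F)."""
    resid = np.asarray(resid, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    n = resid.size
    e2 = resid ** 2
    Z = np.column_stack([np.ones(n), fitted, fitted ** 2])
    gamma, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    ssr = np.sum((e2 - Z @ gamma) ** 2)
    sst = np.sum((e2 - e2.mean()) ** 2)
    return stats_from_r2(1.0 - ssr / sst, n)


# Reported auxiliary regression (Problem 3): R^2 = 0.02959 with
# n = 2675 observations (2672 residual df + 3 coefficients).
r2, lm, p_lm, f = stats_from_r2(0.02959, 2675)
print(f"LM = {lm:.2f}, p = {p_lm:.1e}, F = {f:.2f}")

# Synthetic heteroskedastic data (NOT the earnings data): error sd is 0.5 * x.
rng = np.random.default_rng(1)
n_sim = 2000
x = rng.uniform(1.0, 10.0, size=n_sim)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5 * x)
X = np.column_stack([np.ones(n_sim), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
_, lm_sim, p_sim, _ = special_white_test(y - fitted, fitted)
```

Using the reported numbers, LM is about 79.2, far above the 5% \(\chi^2_2\) critical value of 5.99. The recomputed F is about 40.74, which matches the reported F-statistic up to rounding of \(R^2\).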

Problem 4

Using heteroskedasticity-consistent standard errors, is there evidence that the training program positively influenced earnings?

## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)  0.604867   1.382882  0.4374   0.66186    
## re75         0.795321   0.030123 26.4028 < 2.2e-16 ***
## educ         0.611545   0.087973  6.9515 4.527e-12 ***
## age         -0.046554   0.022314 -2.0863   0.03705 *  
## train       -0.599370   0.745676 -0.8038   0.42159    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

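No. The point estimate is unchanged (-0.599). With the heteroskedasticity-consistent standard error (0.746), the coefficient on train is still not statistically significantly different from 0 (p-value = 0.4216).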
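Heteroskedasticity-consistent standard errors keep the OLS coefficients and replace only the variance matrix. The new matrix is the sandwich \((X'X)^{-1} \left(\sum_i w_i x_i x_i'\right) (X'X)^{-1}\), with weights \(w_i\) built from the squared residuals. The output above does not say which HC variant produced the table; R's `sandwich::vcovHC` defaults to HC3. The NumPy sketch below computes classical, HC0, HC1 and HC3 standard errors. It runs on synthetic data with a hypothetical binary "treatment" column that has no true effect, and the function name `ols_hc_se` is hypothetical.

```python
import numpy as np


def ols_hc_se(y, X, kind="HC1"):
    """OLS coefficients with classical and heteroskedasticity-consistent SEs.

    kind: "HC0" (White), "HC1" (HC0 scaled by n / (n - k)), or
    "HC3" (each e_i^2 divided by (1 - h_ii)^2).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Classical (homoskedastic) standard errors, as in summary(lm()).
    sigma2 = resid @ resid / (n - k)
    se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))
    # Weights w_i for the "meat" of the sandwich.
    if kind == "HC0":
        w = resid ** 2
    elif kind == "HC1":
        w = resid ** 2 * n / (n - k)
    elif kind == "HC3":
        h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverages h_ii
        w = resid ** 2 / (1.0 - h) ** 2
    else:
        raise ValueError(f"unknown kind: {kind}")
    meat = (X * w[:, None]).T @ X                     # sum_i w_i x_i x_i'
    V = XtX_inv @ meat @ XtX_inv
    return beta, se_classical, np.sqrt(np.diag(V))


# Synthetic data (NOT the earnings data): error sd grows with x,
# and d is a hypothetical binary treatment with zero true effect.
rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(1.0, 10.0, size=n)
d = rng.integers(0, 2, size=n)
y = 2.0 + 3.0 * x + 0.0 * d + rng.normal(0.0, 0.5 * x)
X = np.column_stack([np.ones(n), x, d])

beta, se_cl, se_hc1 = ols_hc_se(y, X, kind="HC1")
_, _, se_hc0 = ols_hc_se(y, X, kind="HC0")
_, _, se_hc3 = ols_hc_se(y, X, kind="HC3")
t_robust = beta / se_hc1  # robust t values, analogous to coeftest output
```

Only the standard errors change between the two tables. This is why the train estimate stays at -0.599 while its t value moves from -0.713 to -0.804 (-0.599370 / 0.745676).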