PDF file location: http://www.murraylax.org/rtutorials/regression_coefficients.pdf

HTML file location: http://www.murraylax.org/rtutorials/regression_coefficients.html

## 1. Example: Monthly Earnings and Years of Education

In this tutorial, we will focus on an example that explores the relationship between total monthly earnings and years of education. We will estimate the following regression equation:

$$y_i = b_0 + b_1 x_i + e_i$$

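where $y_i$ denotes the total monthly earnings of individual $i$, and $x_i$ denotes the number of years of education of individual $i$. The coefficient $b_0$ is the intercept, $b_1$ is the slope on years of education, and $e_i$ is the residual for individual $i$.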

The code below downloads a CSV file containing data from 1980 on 935 individuals. The variables include total monthly earnings (MonthlyEarnings) and a number of variables that could influence earnings, including years of education (YearsEdu). The data set originally comes from the textbook website for Stock and Watson’s Introduction to Econometrics.

wages <- read.csv("http://murraylax.org/datasets/wage2.csv")

We estimate the simple regression with the following call to lm() and store the output in an object we call lmwages:

lmwages <- lm(MonthlyEarnings ~ YearsEdu, data=wages)
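For intuition about what lm() computes in a simple regression, the slope and intercept can also be obtained from the standard closed-form least-squares formulas. The following is a minimal sketch on simulated data, not the wage data: the names x, y, b0, b1, and fit, and the numbers used to generate the data, are assumptions chosen only for illustration.

```r
# Simulated data for illustration only; this is NOT the wage data set.
set.seed(1)
x <- rnorm(200, mean = 13, sd = 2)          # hypothetical explanatory variable
y <- 150 + 60 * x + rnorm(200, sd = 380)    # hypothetical outcome with noise

# Closed-form least-squares estimates for a simple (one-variable) regression
b1 <- cov(x, y) / var(x)          # slope: sample covariance over sample variance
b0 <- mean(y) - b1 * mean(x)      # intercept: the line passes through the means

# Estimates from lm() for comparison
fit <- lm(y ~ x)

print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

The two printed pairs of numbers should agree up to floating-point rounding, which is one way to check that lm() is fitting the least-squares line.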

We can print a summary of the results with the following call to the summary() function:

summary(lmwages)
##
## Call:
## lm(formula = MonthlyEarnings ~ YearsEdu, data = wages)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -877.38 -268.63  -38.38  207.05 2148.26
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  146.952     77.715   1.891   0.0589 .
## YearsEdu      60.214      5.695  10.573   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 382.3 on 933 degrees of freedom
## Multiple R-squared:  0.107,  Adjusted R-squared:  0.106
## F-statistic: 111.8 on 1 and 933 DF,  p-value: < 2.2e-16

The ‘Estimate’ column of the coefficients table implies that the equation for the best-fitting line is $\hat{y}_i = 146.95 + 60.21 x_i$.
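As a small worked example of using the fitted line, the sketch below plugs two education levels into the estimated equation. The coefficients are copied by hand from the summary output; the education levels of 12 and 16 years, and the names b0, b1, years, and predicted, are assumptions chosen for illustration.

```r
# Coefficients copied from the 'Estimate' column of the summary output
b0 <- 146.952   # intercept
b1 <- 60.214    # slope on YearsEdu

# Hypothetical education levels chosen for illustration
years <- c(12, 16)

# Predicted monthly earnings from the fitted line
predicted <- b0 + b1 * years

print(data.frame(YearsEdu = years, PredictedEarnings = predicted))
# Predicted values: 869.520 for 12 years and 1110.376 for 16 years
```

With the fitted model object available, the same predictions could be obtained (up to rounding of the copied coefficients) with predict(lmwages, newdata = data.frame(YearsEdu = c(12, 16))).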