Homework 2: Introduction to Regression

Introduction

The questions below use the dataset discrim.RData, which contains zip-code-level data on the prices of various items at fast food restaurants (the average price of soda, fries, and entrees such as burgers and chicken sandwiches), along with a number of variables that may determine these price levels. The last page of this assignment includes a full list and description of the variables. The goal of the research project was to determine whether there is racial discrimination in fast food prices: do neighborhoods with higher proportions of racial minorities face higher fast food prices?

The dataset includes the following variables, but not all of these variables are used in the assignment.

  • psoda: price of medium soda

  • pfries: price of small fries

  • pentree: price of entree (burger or chicken)

  • wagest: starting wage, 1st wave

  • nmgrs: number of managers, 1st wave

  • nregs: number of registers, 1st wave

  • hrsopen: hours open, 1st wave

  • emp: number of employees, 1st wave

  • psoda2: price of medium soda, 2nd wave

  • pfries2: price of small fries, 2nd wave

  • pentree2: price of entree, 2nd wave

  • wagest2: starting wage, 2nd wave

  • nmgrs2: number of managers, 2nd wave

  • nregs2: number of registers, 2nd wave

  • hrsopen2: hours open, 2nd wave

  • emp2: number of employees, 2nd wave

  • compown: 1 if company owned

  • chain: BK = 1, KFC = 2, Roy Rogers = 3, Wendy’s = 4

  • density: population density, town

  • crmrte: crime rate, town

  • state: NJ = 1, PA = 2

  • prpblck: proportion black, zipcode

  • prppov: proportion in poverty, zipcode

  • prpncar: proportion no car, zipcode

  • hseval: median housing value, zipcode

  • nstores: number of stores, zipcode

  • income: median family income, zipcode

  • county: county label

  • lpsoda: log(psoda)

  • lpfries: log(pfries)

  • lhseval: log(hseval)

  • lincome: log(income)

  • ldensity: log(density)

  • NJ: 1 for New Jersey

  • BK: 1 if Burger King

  • KFC: 1 if Kentucky Fried Chicken

  • RR: 1 if Roy Rogers

The dataset should be stored in your RStudio.cloud project files. You can open the dataset with the following call:
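A minimal sketch of that call, assuming the file is saved as discrim.RData in the project's working directory (the `data_file` name and the path are assumptions; adjust them if the file lives elsewhere):

```r
# Assumption: discrim.RData sits in the current working directory.
data_file <- "discrim.RData"

if (file.exists(data_file)) {
  # load() restores every object saved in the file under its original name
  # and returns the names of the restored objects
  loaded_objects <- load(data_file)
  print(loaded_objects)
} else {
  message("Could not find ", data_file, " in ", getwd())
}
```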

This loads the data frame into memory as an object named discrim.

Load the tidyverse library to use the functions in this assignment. If necessary, first install the packages with a call to install.packages().
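A minimal sketch of that setup. The install step is guarded so that it only runs when the package is not already available:

```r
# Install the tidyverse only if it is missing, then attach it
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}
library(tidyverse)
```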

You can view the structure of the data frame with a call to glimpse(), which is available in the tidyverse package.
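A sketch of that call, assuming the data frame has already been loaded into an object named discrim. The guard is only there so the snippet does not fail when run before the data are loaded:

```r
library(dplyr)  # glimpse() is provided by dplyr, which the tidyverse attaches

# Assumption: discrim was created by loading discrim.RData beforehand.
if (exists("discrim")) {
  glimpse(discrim)
} else {
  message("No object named discrim was found; load discrim.RData first.")
}
```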

## Observations: 410
## Variables: 37
## $ psoda    <dbl> 1.12, 1.06, 1.06, 1.12, 1.12, 1.06, 1.17, 1.17, 1.18,...
## $ pfries   <dbl> 1.06, 0.91, 0.91, 1.02, NA, 0.95, 0.95, 1.02, 1.02, 1...
## $ pentree  <dbl> 1.02, 0.95, 0.98, 1.06, 0.49, 1.01, 0.95, 1.06, 1.06,...
## $ wagest   <dbl> 4.25, 4.75, 4.25, 5.00, 5.00, 4.25, 4.65, 4.50, NA, 4...
## $ nmgrs    <dbl> 3, 3, 3, 4, 3, 4, 3, 3, 4, 3, 3, 4, 3, 4, 2, 4, 4, 3,...
## $ nregs    <int> 5, 3, 5, 5, 3, 4, 2, 5, 4, 5, 5, 2, 5, NA, 2, 4, 5, 2...
## $ hrsopen  <dbl> 16.0, 16.5, 18.0, 16.0, 16.0, 15.0, 16.0, 17.0, 17.0,...
## $ emp      <dbl> 27.5, 21.5, 30.0, 27.5, 5.0, 17.5, 22.5, 18.5, 17.0, ...
## $ psoda2   <dbl> 1.11, 1.05, 1.05, 1.15, 1.04, 1.05, 1.05, 1.11, 1.10,...
## $ pfries2  <dbl> 1.11, 0.89, 0.94, 1.05, 1.01, 0.94, 0.94, 1.06, 1.01,...
## $ pentree2 <dbl> 1.05, 0.95, 0.98, 1.05, 0.58, 1.00, 0.94, 1.05, 0.99,...
## $ wagest2  <dbl> 5.05, 5.05, 5.05, 5.05, 5.05, 5.05, 5.05, 5.05, 5.05,...
## $ nmgrs2   <dbl> 5, 4, 4, 4, 3, 3, 3, 3, 4, 6, 5, 4, 2, 0, 2, 5, 4, 3,...
## $ nregs2   <int> 5, 3, 5, 5, 3, 4, 2, 5, 4, 5, 5, 2, 5, NA, 2, 4, 5, 2...
## $ hrsopen2 <dbl> 15.0, 17.5, 17.5, 16.0, 16.0, 15.0, 16.0, 16.0, 18.0,...
## $ emp2     <dbl> 27.0, 24.5, 25.0, NA, 12.0, 28.0, 18.5, 17.0, 34.0, 2...
## $ compown  <int> 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0,...
## $ chain    <int> 3, 1, 1, 3, 1, 1, 1, 3, 1, 3, 2, 4, 2, 1, 1, 1, 1, 4,...
## $ density  <dbl> 4030, 4030, 11400, 8345, 720, 4424, 2678, 6405, 18388...
## $ crmrte   <dbl> 0.0528866, 0.0528866, 0.0360003, 0.0484232, 0.0615890...
## $ state    <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ prpblck  <dbl> 0.1711542, 0.1711542, 0.0473602, 0.0528394, 0.0344800...
## $ prppov   <dbl> 0.0365789, 0.0365789, 0.0879072, 0.0591227, 0.0254145...
## $ prpncar  <dbl> 0.0788428, 0.0788428, 0.2694298, 0.1366903, 0.0738020...
## $ hseval   <int> 148300, 148300, 169200, 171600, 249100, 148000, 21270...
## $ nstores  <int> 3, 3, 3, 3, 1, 2, 1, 1, 5, 5, 5, 5, 5, 1, 2, 2, 2, 1,...
## $ income   <int> 44534, 44534, 41164, 50366, 72287, 44515, 62056, 5365...
## $ county   <int> 18, 18, 12, 10, 10, 18, 10, 24, 10, 10, 10, 10, 10, 2...
## $ lpsoda   <dbl> 0.11332869, 0.05826885, 0.05826885, 0.11332869, 0.113...
## $ lpfries  <dbl> 0.058268853, -0.094310649, -0.094310649, 0.019802609,...
## $ lhseval  <dbl> 11.90699, 11.90699, 12.03884, 12.05292, 12.42561, 11....
## $ lincome  <dbl> 10.70401, 10.70401, 10.62532, 10.82707, 11.18840, 10....
## $ ldensity <dbl> 8.301521, 8.301521, 9.341369, 9.029418, 6.579251, 8.3...
## $ NJ       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ BK       <int> 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0,...
## $ KFC      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0,...
## $ RR       <int> 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...

Answer the questions below. Type up your answers and submit them to the appropriate Canvas assignment folder. Include in your answers (1) the code you used, (2) the output from the code, and (3) your written interpretation of the output, as appropriate to answer the question.

Problems

Ordinary Least Squares

Estimate a linear regression that predicts the average price of a fast food burger or chicken entree in a zip code based on the following explanatory variables: starting wage for fast food workers, median family income, the proportion of the population that lives in poverty, the average crime rate (per 1000 in population), the population density, and the proportion of the population that is black / African American.

  1. Report the estimated regression equation.

  2. Is there statistical evidence that there is racial discrimination in fast food prices, after accounting for fast food workers' starting wage, median family income, proportion of the population in poverty, crime rate, and population density? Test the appropriate hypothesis.

  3. Report the explanatory variables for which you have statistical evidence of an influence on fast food prices.

  4. What percentage of the variability in fast food prices is accounted for by your explanatory variables?

  5. Test the hypothesis that at least one of your regression variables is useful in explaining prices of fast food entrees.

  6. What does your regression predict would be the change in fast food entree prices for each $1,000 of additional median family income?
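The general workflow behind these questions can be sketched on simulated data. This is an illustration under stated assumptions, not the assignment's model: the data frame `sim` and the variables `x1`, `x2`, and `y` are hypothetical stand-ins and have nothing to do with the discrim data.

```r
set.seed(307)  # arbitrary seed so the simulated numbers are reproducible

# Simulated data, NOT the discrim dataset: y depends on x1 but not on x2
n   <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 0.5 * sim$x1 + rnorm(n)

# Ordinary least squares fit
fit         <- lm(y ~ x1 + x2, data = sim)
fit_summary <- summary(fit)

# Coefficient table: estimates, standard errors, t statistics, p-values.
# Each p-value tests H0: that coefficient equals zero, holding the others fixed.
coef_table <- coef(fit_summary)
print(coef_table)

# Share of the variability in y accounted for by the regressors
r_squared <- fit_summary$r.squared
print(r_squared)

# Overall F test of H0: all slope coefficients are jointly zero
f_stat   <- fit_summary$fstatistic
f_pvalue <- pf(f_stat[["value"]], f_stat[["numdf"]], f_stat[["dendf"]],
               lower.tail = FALSE)
print(f_pvalue)

# Predicted change in y for a 1,000-unit increase in x1, others held fixed
change_per_1000 <- unname(coef(fit)["x1"]) * 1000
print(change_per_1000)
```

In a level-level model like this one, each slope is the predicted change in the outcome for a one-unit change in that regressor, so a change of 1,000 units scales the slope by 1,000.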

Log Transformations

Estimate a linear regression that predicts the natural log of the average price of a fast food burger or chicken entree in a zip code based on the following explanatory variables: starting wage for fast food workers, the natural log of median family income, the proportion of the population that lives in poverty, the average crime rate, the natural log of the population density, and the proportion of the population that is black / African American.

  1. The same outcome and explanatory variables are used in this problem as the previous problem, but the outcome variable and some of the explanatory variables are expressed instead as a natural log. How much variability in this transformed outcome variable is accounted for by your explanatory variables? How does this compare to the previous model?

  2. With this new regression structure, is there statistical evidence that there is racial discrimination in fast food prices? Test the appropriate hypothesis.

  3. Accounting for all the explanatory variables in your regression model, how does a 1% increase in median income influence fast food prices? Construct and interpret a 95% confidence interval.

  4. Accounting for all the explanatory variables in your regression model, how does a 1% increase in population density influence fast food prices? Construct and interpret a 95% confidence interval.
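When both the outcome and a regressor enter in natural logs, the slope is an elasticity: a 1% increase in the regressor is associated with approximately a (slope)% change in the outcome. A minimal sketch on simulated data, where `sim`, `x`, and `y` are hypothetical and unrelated to the discrim data:

```r
set.seed(307)  # arbitrary seed so the simulated numbers are reproducible

# Simulated data, NOT the discrim dataset: the true elasticity of y
# with respect to x is set to 0.2
n   <- 200
sim <- data.frame(x = exp(rnorm(n, mean = 10, sd = 0.3)))
sim$y <- exp(0.2 * log(sim$x) + rnorm(n, sd = 0.1))

# Log-log regression
log_fit <- lm(log(y) ~ log(x), data = sim)

# 95% confidence interval for the elasticity of y with respect to x
ci <- confint(log_fit, "log(x)", level = 0.95)
print(ci)
```

The interval endpoints are read in the same way as the slope: they bound the approximate percentage change in the outcome associated with a 1% increase in the regressor, holding the other regressors fixed.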

Submission

Upload your submission to the Canvas assignment folder titled, Homework 2 - Introduction to Regression, by Monday, February 25, 5:30 PM.

Include in your upload both the .html document and the .Rmd document. Canvas will only accept files with these extensions. I will only give credit for submissions correctly uploaded to Canvas.

To download these files in RStudio.cloud, go to the documents view tab in the lower right section, click the checkbox next to the file you want to download, click More, and click Export.

ECO 307: Econometrics

Due Monday, February 25, 11:59 PM