Homework 2: Introduction to Regression
Introduction
The questions below use the dataset discrim.RData, which includes zip-code level data on prices for various items at fast food restaurants, including the average price of soda, fries, entrees like burgers and chicken sandwiches, and a number of variables that may determine these price levels. The last page of this assignment includes a full list and a description of the variables. The goal of the research project was to determine if there is racial discrimination in fast food prices. Do neighborhoods that have higher proportions of racial minorities experience higher fast food prices?
The dataset includes the following variables, but not all of these variables are used in the assignment.
psoda: price of medium soda
pfries: price of small fries
pentree: price entree (burger or chicken)
wagest: starting wage, 1st wave
nmgrs: number of managers, 1st wave
nregs: number of registers, 1st wave
hrsopen: hours open, 1st wave
emp: number of employees, 1st wave
psoda2: price of medium soda, 2nd wave
pfries2: price of small fries, 2nd wave
pentree2: price entree, 2nd wave
wagest2: starting wage, 2nd wave
nmgrs2: number of managers, 2nd wave
nregs2: number of registers, 2nd wave
hrsopen2: hours open, 2nd wave
emp2: number of employees, 2nd wave
compown: 1 if company owned
chain: BK = 1, KFC = 2, Roy Rogers = 3, Wendy’s = 4
density: population density, town
crmrte: crime rate, town
state: NJ = 1, PA = 2
prpblck: proportion black, zipcode
prppov: proportion in poverty, zipcode
prpncar: proportion no car, zipcode
hseval: median housing value, zipcode
nstores: number of stores, zipcode
income: median family income, zipcode
county: county label
lpsoda: log(psoda)
lpfries: log(pfries)
lhseval: log(hseval)
lincome: log(income)
ldensity: log(density)
NJ: 1 for New Jersey
BK: 1 if Burger King
KFC: 1 if Kentucky Fried Chicken
RR: 1 if Roy Rogers
The dataset should be stored in your RStudio.cloud project files. You can open the dataset with the following call:
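A minimal sketch of such a call, assuming discrim.RData sits in the project's working directory (the path and the file-existence guard are assumptions, not part of the assignment):

```r
# Assumption: discrim.RData is in the current working directory.
data_file <- "discrim.RData"

if (file.exists(data_file)) {
  # load() restores the saved objects and invisibly returns their names
  loaded <- load(data_file)
  print(loaded)
} else {
  message("File not found; check getwd() and the Files pane: ", data_file)
}
```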
This loads the data frame into memory as an object named discrim.
Load the tidyverse library to make use of its functions. If necessary, install the packages first with a call to install.packages():
# Necessary only once per rstudio.cloud project
# (So don't put it in your script!)
install.packages("tidyverse")
You can view the structure of the data frame with a call to glimpse(), which is available in the tidyverse package.
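One way to make that call is sketched here, assuming discrim has already been loaded. The helper show_structure() is hypothetical (it is not part of the assignment); it falls back to base R's str() when dplyr, the tidyverse package that exports glimpse(), is not installed.

```r
# Hypothetical helper (not part of the assignment): use glimpse() when dplyr
# is installed, otherwise fall back to base R's str().
show_structure <- function(df) {
  if (requireNamespace("dplyr", quietly = TRUE)) {
    dplyr::glimpse(df)
  } else {
    str(df)
  }
  invisible(df)
}

# Assumes `discrim` was created by the earlier load() call.
if (exists("discrim")) show_structure(discrim)
```

In a session where library(tidyverse) has been run, calling glimpse(discrim) directly does the same thing.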
## Observations: 410
## Variables: 37
## $ psoda <dbl> 1.12, 1.06, 1.06, 1.12, 1.12, 1.06, 1.17, 1.17, 1.18,...
## $ pfries <dbl> 1.06, 0.91, 0.91, 1.02, NA, 0.95, 0.95, 1.02, 1.02, 1...
## $ pentree <dbl> 1.02, 0.95, 0.98, 1.06, 0.49, 1.01, 0.95, 1.06, 1.06,...
## $ wagest <dbl> 4.25, 4.75, 4.25, 5.00, 5.00, 4.25, 4.65, 4.50, NA, 4...
## $ nmgrs <dbl> 3, 3, 3, 4, 3, 4, 3, 3, 4, 3, 3, 4, 3, 4, 2, 4, 4, 3,...
## $ nregs <int> 5, 3, 5, 5, 3, 4, 2, 5, 4, 5, 5, 2, 5, NA, 2, 4, 5, 2...
## $ hrsopen <dbl> 16.0, 16.5, 18.0, 16.0, 16.0, 15.0, 16.0, 17.0, 17.0,...
## $ emp <dbl> 27.5, 21.5, 30.0, 27.5, 5.0, 17.5, 22.5, 18.5, 17.0, ...
## $ psoda2 <dbl> 1.11, 1.05, 1.05, 1.15, 1.04, 1.05, 1.05, 1.11, 1.10,...
## $ pfries2 <dbl> 1.11, 0.89, 0.94, 1.05, 1.01, 0.94, 0.94, 1.06, 1.01,...
## $ pentree2 <dbl> 1.05, 0.95, 0.98, 1.05, 0.58, 1.00, 0.94, 1.05, 0.99,...
## $ wagest2 <dbl> 5.05, 5.05, 5.05, 5.05, 5.05, 5.05, 5.05, 5.05, 5.05,...
## $ nmgrs2 <dbl> 5, 4, 4, 4, 3, 3, 3, 3, 4, 6, 5, 4, 2, 0, 2, 5, 4, 3,...
## $ nregs2 <int> 5, 3, 5, 5, 3, 4, 2, 5, 4, 5, 5, 2, 5, NA, 2, 4, 5, 2...
## $ hrsopen2 <dbl> 15.0, 17.5, 17.5, 16.0, 16.0, 15.0, 16.0, 16.0, 18.0,...
## $ emp2 <dbl> 27.0, 24.5, 25.0, NA, 12.0, 28.0, 18.5, 17.0, 34.0, 2...
## $ compown <int> 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0,...
## $ chain <int> 3, 1, 1, 3, 1, 1, 1, 3, 1, 3, 2, 4, 2, 1, 1, 1, 1, 4,...
## $ density <dbl> 4030, 4030, 11400, 8345, 720, 4424, 2678, 6405, 18388...
## $ crmrte <dbl> 0.0528866, 0.0528866, 0.0360003, 0.0484232, 0.0615890...
## $ state <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ prpblck <dbl> 0.1711542, 0.1711542, 0.0473602, 0.0528394, 0.0344800...
## $ prppov <dbl> 0.0365789, 0.0365789, 0.0879072, 0.0591227, 0.0254145...
## $ prpncar <dbl> 0.0788428, 0.0788428, 0.2694298, 0.1366903, 0.0738020...
## $ hseval <int> 148300, 148300, 169200, 171600, 249100, 148000, 21270...
## $ nstores <int> 3, 3, 3, 3, 1, 2, 1, 1, 5, 5, 5, 5, 5, 1, 2, 2, 2, 1,...
## $ income <int> 44534, 44534, 41164, 50366, 72287, 44515, 62056, 5365...
## $ county <int> 18, 18, 12, 10, 10, 18, 10, 24, 10, 10, 10, 10, 10, 2...
## $ lpsoda <dbl> 0.11332869, 0.05826885, 0.05826885, 0.11332869, 0.113...
## $ lpfries <dbl> 0.058268853, -0.094310649, -0.094310649, 0.019802609,...
## $ lhseval <dbl> 11.90699, 11.90699, 12.03884, 12.05292, 12.42561, 11....
## $ lincome <dbl> 10.70401, 10.70401, 10.62532, 10.82707, 11.18840, 10....
## $ ldensity <dbl> 8.301521, 8.301521, 9.341369, 9.029418, 6.579251, 8.3...
## $ NJ <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ BK <int> 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0,...
## $ KFC <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0,...
## $ RR <int> 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
Answer the questions below. Type up your answers and submit them to the appropriate Canvas assignment folder. Include in your answers (1) the code you used, (2) the output from the code, and (3) your written interpretation of the output, as appropriate to answer the question.
Problems
Ordinary Least Squares
Estimate a linear regression that predicts the average price of a fast food burger or chicken entree in a zip code based on the following explanatory variables: starting wage for fast food workers, median family income, the proportion of the population that lives in poverty, the average crime rate (per 1000 in population), the population density, and the proportion of the population that is black / African American.
Report the estimated regression equation.
Is there statistical evidence that there is racial discrimination in fast food prices, after accounting for fast food workers' starting wage, median family income, proportion of the population in poverty, crime rate, and population density? Test the appropriate hypothesis.
Report the explanatory variables where you have statistical evidence that they influence the fast food prices.
What percentage of the variability in fast food prices is accounted for by your explanatory variables?
Test the hypothesis that at least one of your regression variables is useful in explaining prices of fast food entrees.
What does your regression predict would be the change in fast food entree prices for each $1,000 of additional median family income?
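The mechanics behind these questions can be sketched on R's built-in mtcars data. This is only an illustration of the workflow, not the homework model: the outcome and predictors below are mtcars columns chosen arbitrarily, and the homework regression would instead use pentree and the six explanatory variables named above.

```r
# Illustration on R's built-in mtcars data, NOT the homework data.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
fit_summary <- summary(fit)

# Coefficient table: estimates, standard errors, t statistics, p-values.
# Each p-value tests H0: that coefficient is zero, holding the others fixed.
coef_table <- coef(fit_summary)
print(coef_table)

# Share of the outcome's variance accounted for by the predictors
r_squared <- fit_summary$r.squared

# Overall F-test of H0: all slope coefficients are zero
f_stat <- fit_summary$fstatistic  # named vector: value, numdf, dendf
f_p_value <- unname(pf(f_stat["value"], f_stat["numdf"], f_stat["dendf"],
                       lower.tail = FALSE))

# Predicted change in the outcome for a 100-unit increase in one predictor
# is 100 times that predictor's slope.
change_per_100hp <- unname(100 * coef(fit)["hp"])
```

The same scaling logic applies to any predictor measured in small units: multiply the slope by the size of the change you want to describe.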
Log Transformations
Estimate a linear regression that predicts the natural log of the average price of a fast food burger or chicken entree in a zip code based on the following explanatory variables: starting wage for fast food workers, the natural log of median family income, the proportion of the population that lives in poverty, the average crime rate, the natural log of the population density, and the proportion of the population that is black / African American.
The same outcome and explanatory variables are used in this problem as the previous problem, but the outcome variable and some of the explanatory variables are expressed instead as a natural log. How much variability in this transformed outcome variable is accounted for by your explanatory variables? How does this compare to the previous model?
With this new regression structure, is there statistical evidence that there is racial discrimination in fast food prices? Test the appropriate hypothesis.
Accounting for all the explanatory variables in your regression model, how does a 1% increase in median income influence fast food prices? Construct and interpret a 95% confidence interval.
Accounting for all the explanatory variables in your regression model, how does a 1% increase in population density influence fast food prices? Construct and interpret a 95% confidence interval.
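A sketch of the log-log mechanics and of confint(), again on the built-in mtcars data rather than the homework data (the columns used are arbitrary stand-ins). Note that the variable list above already provides lincome and ldensity, but it does not list a logged version of pentree, so the outcome would need to be logged in the formula or beforehand.

```r
# Illustration on R's built-in mtcars data, NOT the homework data.
# log() can be applied directly inside the model formula.
log_fit <- lm(log(mpg) ~ log(hp) + wt, data = mtcars)

# 95% confidence intervals for every coefficient
ci <- confint(log_fit, level = 0.95)
print(ci)

# With a logged outcome and a logged predictor, the slope is an elasticity:
# a 1% increase in hp is associated with roughly a (slope)% change in mpg,
# holding the other predictors fixed.
elasticity_ci <- ci["log(hp)", ]

# This R-squared describes variance in log(mpg), not in mpg itself.
log_r_squared <- summary(log_fit)$r.squared
```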
Submission
Upload your submission to the Canvas assignment folder titled, Homework 2 - Introduction to Regression, by Monday, February 25, 5:30 PM.
Include in your upload both the .html document and the .Rmd document. Canvas will only accept files with these extensions. I will only give credit for submissions correctly uploaded to Canvas.
To download these files in RStudio.cloud, go to the documents view tab in the lower right section, click the checkbox next to the file you want to download, click More, and click Export.