Problem 4 (20 points). Consider the Newmarket 5K dataset. A cleaned version of the dataset, with rows deleted where no Age and/or Sex was entered, has been created. It is called Newmarket_Cleaned.csv.
Direct HTTP link: https://unh.box.com/shared/static/p8x4xlbean3rlslmfskfu74fe89yjui8.csv
Please use it for this problem. Consider the “Model 4” developed in class, which contained Age, Age^2, and Sex as independent variables. First re-create this model with the cleaned dataset (you will need to create the Age^2 column; you can verify your model summary against the one from class), and then proceed with this problem.
Make the necessary changes to the model to incorporate Year as a categorical explanatory variable (you may need to change how R interprets the variable before running this model). Does the year of the race seem to be related to mean finishing time, after taking into account the other variables in Model 4? Use the 0.05 significance level. Explain your work and logic, and include key output to support your answer.
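As a rough illustration of the mechanics only, the R sketch below fits a Model 4 style regression and then adds Year as a factor, comparing the two fits with a nested F-test. It runs on simulated toy data, not the Newmarket file: the column names `Time`, `Age`, `Sex`, and `Year`, the range of years, and all simulated values are assumptions made for illustration, and the real file's columns may be named differently.

```r
# Hypothetical toy data standing in for Newmarket_Cleaned.csv.
# With the real file, the data frame would instead come from read.csv()
# on the link given in the problem; the column names used here are assumed.
set.seed(1)
n <- 400
race <- data.frame(
  Age  = sample(10:75, n, replace = TRUE),
  Sex  = sample(c("F", "M"), n, replace = TRUE),
  Year = sample(2004:2014, n, replace = TRUE)
)
# Made-up finishing times (minutes) with no built-in year effect
race$Time <- 30 - 0.4 * race$Age + 0.006 * race$Age^2 +
  ifelse(race$Sex == "M", -3, 0) + rnorm(n, sd = 4)

# Model 4: Age, Age^2, and Sex as independent variables
race$Age2 <- race$Age^2
model4 <- lm(Time ~ Age + Age2 + Sex, data = race)

# Store Year as a factor so R builds one indicator per non-baseline year
race$YearF <- factor(race$Year)
model_year <- lm(Time ~ Age + Age2 + Sex + YearF, data = race)

# Nested (partial) F-test: do the Year indicators jointly improve on Model 4?
anova(model4, model_year)
```

The p-value in the second row of the `anova()` table is the one to compare against 0.05; `summary(model_year)` lists the individual year coefficients, each measured relative to the baseline level of the factor.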
Using the updated model, after accounting for age and sex, what is the expected difference in finishing times for runners from 2008 versus runners from 2004?
Using the updated model, after accounting for age and sex, what is the expected difference in finishing times for runners from 2014 versus 2011?
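The two year-versus-year questions above can be read off the fitted coefficients. The notation below is illustrative and not taken from the class notes: it assumes Sex enters as a 0/1 indicator and that Year is coded with one indicator per year except a baseline year $y_0$.

```latex
\mathrm{Time} = \beta_0 + \beta_1\,\mathrm{Age} + \beta_2\,\mathrm{Age}^2
  + \beta_3\,\mathrm{Sex}
  + \sum_{y \neq y_0} \gamma_y\,\mathbf{1}[\mathrm{Year} = y] + \varepsilon

% For two runners of the same age and sex, running in years a and b:
E[\mathrm{Time} \mid \mathrm{Year} = a] - E[\mathrm{Time} \mid \mathrm{Year} = b]
  = \gamma_a - \gamma_b, \qquad \gamma_{y_0} = 0
```

If one of the two years is the baseline level, its coefficient is zero and the expected difference is just the other year's coefficient; otherwise it is the difference between the two year coefficients.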
In the above, the directions are to treat Year as a categorical variable. Explain how the regression model would differ if Year were instead treated as a numeric variable. What assumptions about the effect of Year does each of the two approaches make?
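For comparison, a sketch of the numeric version, in illustrative notation that again assumes Sex is a 0/1 indicator: Year contributes a single slope $\delta$ instead of a separate coefficient for each year.

```latex
\mathrm{Time} = \beta_0 + \beta_1\,\mathrm{Age} + \beta_2\,\mathrm{Age}^2
  + \beta_3\,\mathrm{Sex} + \delta\,\mathrm{Year} + \varepsilon

% Expected difference between years a and b, same age and sex:
E[\mathrm{Time} \mid \mathrm{Year} = a] - E[\mathrm{Time} \mid \mathrm{Year} = b]
  = \delta\,(a - b)
```

Under this form every one-year step shifts the mean finishing time by the same amount $\delta$, so the year effect is forced onto a straight-line trend and uses one degree of freedom. The categorical form imposes no ordering or trend on the years and lets each year have its own shift, at the cost of one coefficient per non-baseline year.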