Compile your answers in one Microsoft Word document. Copy and paste all figures from Excel.
Question 1 (5 Points)
You are provided with data for several nations from the Human Development Report, 2017.
a. Compute summary statistics (mean, median, mode, standard deviation, etc.) for the variables GDP (GDP per capita in thousands of dollars) and Internet Use. Describe each distribution based on the summary statistics, and address skewness in your description.
b. Generate a scatterplot of Human Development Index and Life Expectancy at Birth with clearly labeled axes. Place Life Expectancy on the Y axis.
c. Compute the correlation between Human Development Index and Life Expectancy at Birth and interpret it. Refer to both the strength and direction of the correlation in your interpretation.
d. Using the variable Internet Use, construct the intervals within 1, 2, and 3 standard deviations of the mean and calculate the percentage of observations that fall in each interval. Then briefly interpret whether these percentages match those in the Empirical Rule.
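For part d, the interval calculation can be cross-checked outside Excel. The following is a minimal Python sketch, not a required method. The function name `empirical_rule_shares` and the sample values are hypothetical and are not taken from the Human Development Report data. It assumes the sample standard deviation (dividing by n - 1) is the one intended. The Empirical Rule predicts roughly 68%, 95%, and 99.7% for a bell-shaped distribution.

```python
import statistics

def empirical_rule_shares(values):
    """Percentage of values within 1, 2, and 3 sample SDs of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1); an assumption about which SD is wanted
    shares = {}
    for k in (1, 2, 3):
        lower, upper = mean - k * sd, mean + k * sd
        inside = sum(1 for v in values if lower <= v <= upper)
        shares[k] = 100 * inside / len(values)
    return shares

# Hypothetical values for illustration only, NOT the assignment data.
internet_use = [10, 20, 30, 40, 50, 60, 70, 80, 90]
shares = empirical_rule_shares(internet_use)
for k, pct in shares.items():
    print(f"within {k} SD of the mean: {pct:.1f}% of observations")
```

Comparing each printed percentage with 68%, 95%, and 99.7% gives the basis for the short interpretation the question asks for.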
Question 2 (5 Points)
In the Excel file entitled earthquake_asst1, you are given data on the earthquake magnitudes and depths for various geological events in a Midwestern state over time. As a researcher, you are particularly interested in the linear relationship between these two variables. Hopefully the results will help emergency officials prepare better for these natural phenomena.
a. Generate a scatterplot of the two variables, magnitude and depth. Describe the relationship depicted on the scatterplot.
b. Compute Pearson's correlation coefficient between the two variables.
c. Provide an interpretation of the correlation obtained. Refer to both the strength and direction of the correlation in your interpretation.
d. Also interpret the correlation in terms of r-squared (coefficient of determination).
e. Manually construct a relative frequency distribution for the variable magnitude. Hint: there are 50 observations, so you should use 5 or 6 classes. Make a statement about the distribution of this variable.
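The calculations in parts b, d, and e can be sketched in Python as a check on Excel output. This is an illustration under stated assumptions, not the required workflow. The helper names `pearson_r` and `relative_frequencies` are hypothetical, the data are made up (only 10 points rather than the 50 in earthquake_asst1), and equal-width classes are one common choice among several.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient computed from deviations about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def relative_frequencies(values, n_classes):
    """Equal-width classes; returns (lower, upper, relative frequency) per class."""
    low, high = min(values), max(values)
    width = (high - low) / n_classes
    counts = [0] * n_classes
    for v in values:
        # The maximum value is placed in the last class.
        index = min(int((v - low) / width), n_classes - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, c / len(values))
            for i, c in enumerate(counts)]

# Hypothetical data for illustration only, NOT the earthquake_asst1 values.
magnitude = [1.2, 1.5, 1.9, 2.3, 2.4, 2.8, 3.1, 3.3, 3.9, 4.2]
depth = [5.0, 7.5, 6.0, 9.0, 11.0, 10.5, 14.0, 12.5, 16.0, 18.0]

r = pearson_r(magnitude, depth)
r_squared = r ** 2  # share of variation in one variable explained linearly by the other
table = relative_frequencies(magnitude, 5)

print(f"r = {r:.3f}, r-squared = {r_squared:.3f}")
for lower, upper, share in table:
    print(f"{lower:.2f} to {upper:.2f}: {share:.2f}")
```

The relative frequencies should sum to 1, which is a quick way to confirm that no observation was dropped or counted twice.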
Question 3 (5 Points)
Using the data provided in the Excel file entitled homes_asst1, estimate a sample regression function of linear form to predict the sale price of a home from its acreage. Present the results of the regression by pasting the Excel output into your document, and then answer the questions below.
a. What is the value of the intercept a?
b. What is the value of the slope b?
c. What is the predicted sale price for a home with 1.39 acres? Calculate the sample residual for this observation. Show your work.
d. Interpret the value of the R-squared for the regression model.
e. Which two other variables, in the data set or in general, do you think would also be good predictors of the sale price of a home? Explain in a sentence or two.
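The arithmetic behind parts a through c can be sketched as follows. This is a minimal least-squares illustration in Python, assuming a simple regression of price on acreage. The function name `fit_line` and all data values are hypothetical and are not the homes_asst1 figures. The residual is taken as observed minus predicted.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line y = a + b * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data for illustration only (price in thousands of dollars).
acreage = [0.50, 1.00, 1.39, 2.00, 3.00]
sale_price = [150.0, 180.0, 210.0, 240.0, 310.0]

a, b = fit_line(acreage, sale_price)

# Prediction and residual for the home with 1.39 acres.
observed = sale_price[acreage.index(1.39)]
predicted = a + b * 1.39
residual = observed - predicted  # residual = observed value minus predicted value

print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
print(f"predicted = {predicted:.2f}, residual = {residual:.2f}")
```

A positive residual means the home sold for more than the line predicts, and a negative residual means it sold for less.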
Question 4 (5 Points)
Find an experimental study from a journal of your choice and provide the citation.
a. Identify the explanatory variable.
b. Identify the dependent/response variable.
c. What were the treatments?
d. What were the experimental units?
e. How were the experimental units assigned to the treatments?
f. Can you identify any source of bias? Explain.
Question 5 (10 Points)
A data set is provided, entitled oldfaithful_asst, on the duration and height of eruptions of the Old Faithful geyser in Yellowstone National Park.
a. Construct a scatterplot of the variables “duration” and “height” using Excel or any other software (SPSS or Minitab). Please title the graph “Scatterplot 1 Old Faithful” and create labels for both axes.
b. There seems to be an outlier in the data set. Although an outlier is not necessarily detrimental to the data analysis, as an exercise, identify the outlier and delete it (simply erase its value; do not replace it with zero).
Show the new (second) scatterplot and title the graph “Scatterplot 2 Old Faithful.”
Describe the pattern that emerges. What might this relationship imply?
c. Compute the correlation coefficient between the two variables and interpret this correlation. Refer to both the strength and direction of the correlation in your interpretation. Also interpret the correlation in terms of r-squared (coefficient of determination).
d. Conduct a regression to predict the duration of an eruption using height, and interpret the sample slope coefficient in words.
e. What is the predicted duration for an eruption with a height of 115?
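The instruction in part b to erase the outlier rather than replace it with zero matters because a zero is treated as a real observation. The Python sketch below illustrates the difference. All values are hypothetical and are not the oldfaithful_asst data, the helper `pearson_r` is written out only for this illustration, and it is assumed that the software in use skips blank cells.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient computed from deviations about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration only; the last duration is the outlier.
height = [100, 110, 120, 130, 140]
duration = [2.0, 2.5, 3.0, 3.5, 40.0]
outlier_index = 4

# Option 1: erase the observation, so the whole pair drops out of the analysis.
kept = [(h, d) for i, (h, d) in enumerate(zip(height, duration))
        if i != outlier_index]
r_erased = pearson_r([h for h, _ in kept], [d for _, d in kept])

# Option 2 (what the assignment warns against): overwrite the value with zero.
zeroed = list(duration)
zeroed[outlier_index] = 0.0
r_zeroed = pearson_r(height, zeroed)

print(f"outlier erased:  r = {r_erased:.3f}")
print(f"outlier zeroed:  r = {r_zeroed:.3f}")
```

In this made-up example the zero acts as a new outlier and even reverses the sign of the correlation, while erasing the pair leaves the pattern in the remaining points intact.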