Wine These data give ratings and prices of 257 red and white wines that appeared in Wine Spectator in 2009. For this analysis, we are interested in how the rating given to a wine is associated with its price, and if this association depends on whether it’s a red or white wine.
(a) Plot the natural log of the price on the score given to the wine. Use color-coding or distinct symbols to distinguish red from white wines in the plot. Is the association between price and score dependent on the type of wine?
(b) Fit a multiple regression of log price on the score that allows the effect of the score on price to depend on the type of wine, using an interaction. Check whether this model meets the conditions of the MRM, noting any flaws.
(c) Assume for the following test that the model meets the conditions for the MRM. Use the incremental F-test (see Exercise 45) to assess the difference between a simple regression and a model that allows the effect of score to differ for red and white wines. Does the test agree with the t-statistics observed in (b)? Explain any differences.
(d) Refine the model fit in (b) and summarize the results, noting any problems that remain. If problems remain, note how these affect your conclusions.
Exercise 45
R&D Expenses This data file contains a variety of accounting and financial values that describe companies operating in the information and professional services sectors of the economy. One column gives the expenses on research and development (R&D), and another gives the total assets of the companies. Both of these columns are reported in millions of dollars. This data table adds data for professional services. To estimate regression models, we need to transform both expenses and assets to a log scale.
(a) Plot the log of R&D expenses on the log of assets for both sectors together in one scatterplot. Use color-coding or distinct symbols to distinguish the groups. Does it appear that the relationship is different in these two sectors or can you capture the association with a single simple regression?
A common question asked when fitting models to subsets is “Do the equations for the two groups differ from each other?” For example, does the equation for the information sector differ from the equation for professional services? We’ve been answering this question informally, using the t-statistics for the slopes of the dummy variable and interaction. There’s just one small problem: We’re using two tests to answer one question. What’s the chance for a false-positive error? If you’ve got one question, better to use one test.
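To see if there’s any difference, we can use a variation on the F-test for $R^2$. The idea is to test both slopes at once rather than separately. The method uses the change in the size of $R^2$. If the $R^2$ of the model increases by a statistically significant amount when we add both the dummy variable and interaction to the model, then something changed and the model is different. The form of this incremental, or partial, F-test is

$$F = \frac{\left(R^2_{\text{full}} - R^2_{\text{reduced}}\right) / (\text{number of added slopes})}{\left(1 - R^2_{\text{full}}\right) / (n - k - 1)}$$

In this formula, $k$ denotes the number of variables in the model with the extra features, including dummy variables and interactions. $R^2_{\text{full}}$ is the $R^2$ for that model, $R^2_{\text{reduced}}$ is the $R^2$ for the simpler model without the added terms, and $n$ is the number of observations. As usual, a value of this F-statistic larger than about 4 counts as big.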
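The arithmetic of the incremental F-test can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard partial F-test based on the change in $R^2$; the function name `incremental_f` and the input values are hypothetical and are not results from the exercise data.

```python
def incremental_f(r2_full, r2_reduced, n, k, added):
    """Incremental (partial) F-statistic from the change in R^2.

    r2_full    -- R^2 of the model with the extra terms
    r2_reduced -- R^2 of the simpler model
    n          -- number of observations
    k          -- number of explanatory variables in the full model
    added      -- number of slopes added to the simpler model
    """
    if not 0 <= r2_reduced <= r2_full < 1:
        raise ValueError("need 0 <= r2_reduced <= r2_full < 1")
    if added <= 0 or n - k - 1 <= 0:
        raise ValueError("need added > 0 and n > k + 1")
    numerator = (r2_full - r2_reduced) / added
    denominator = (1 - r2_full) / (n - k - 1)
    return numerator / denominator


# Made-up inputs for illustration only (not taken from the R&D data).
f_stat = incremental_f(r2_full=0.60, r2_reduced=0.50, n=104, k=3, added=2)
print(round(f_stat, 2))
```

With these made-up inputs the statistic works out to 12.5, well above the rough threshold of 4, so the added dummy variable and interaction would be judged to improve the fit by a statistically significant amount.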
(b) Add a dummy variable (coded as 0 for information companies and 1 for those in professional services) and its interaction with Log Assets to the model. Does the fit of this model meet the conditions for the MRM? Comment on the consequences of any problem that you identify.
(c) Assuming that the model meets the conditions for the MRM, use the incremental F-test to assess the size of the change in $R^2$. Does the test agree with your visual impression? (The value of $k$ for the model with dummy and interaction is 3, with 2 slopes added. You will need to fit the simple regression of Log R&D Expenses on Log Assets to get the $R^2$ from this model.)
(d) Summarize the fit of the model that best captures what is happening in these two sectors.
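The mechanics of parts (b) and (c) can be sketched in plain Python: code the dummy variable and its interaction as extra columns, fit both models by least squares, and compare their $R^2$ values. Everything here is an assumption made for illustration. The data are synthetic, generated with made-up coefficients rather than read from the exercise file, and the helpers `solve` and `r_squared` are hypothetical names. In practice a statistics package would do the fitting.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(rows, y):
    """Least-squares fit of y on the columns of rows (first column is the intercept); returns R^2."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * v for bj, v in zip(beta, r)) for r in rows]
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst


# Synthetic stand-in for the two sectors (made-up coefficients, not real data).
random.seed(1)
n = 200
log_assets = [random.uniform(2, 8) for _ in range(n)]
group = [i % 2 for i in range(n)]  # 0 = information, 1 = professional services
log_rd = [1.0 + 0.8 * x + d * (0.5 - 0.3 * x) + random.gauss(0, 0.5)
          for x, d in zip(log_assets, group)]

# Simple regression: intercept and Log Assets only.
reduced = [[1.0, x] for x in log_assets]
# Full model: add the dummy variable and its interaction with Log Assets.
full = [[1.0, x, d, d * x] for x, d in zip(log_assets, group)]

r2_reduced = r_squared(reduced, log_rd)
r2_full = r_squared(full, log_rd)

k, added = 3, 2  # 3 explanatory variables in the full model, 2 slopes added
f_stat = ((r2_full - r2_reduced) / added) / ((1 - r2_full) / (n - k - 1))
print(r2_reduced, r2_full, f_stat)
```

Because the synthetic groups were generated with different intercepts and slopes, the full model fits noticeably better here. With the real data, the same comparison indicates whether a single simple regression is enough for both sectors.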