The dataset ModelingPVCapacity.xls has data on the amount of installed photovoltaic cells (solar...

Question

The dataset ModelingPVCapacity.xls has data on the amount of installed photovoltaic cells (solar panels) in New Zealand. The amount installed is called the Capacity, and is measured in MW. We want to model this as a function of Time (measured in hundreds of days since 31 July 2013).

Using Analyze> Curve Estimation, fit a cubic regression model for Capacity vs Time. Is the cube power term significant? Show the graph.

It is unbelievable that the amount of installed PV capacity will continue to increase steeply in future years. Eventually it must flatten out. Therefore use the data to fit a nonlinear regression model Capacity = C*exp(b0+ b1*Time)/(1+exp(b0+b1*Time)).

Guess at initial parameter estimates. (If the model doesn’t converge try other guesses until you get convergence.) Save the predicted values.
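One possible way to arrive at starting values is sketched below. It is a heuristic and not part of the assignment: guess C a little above the largest observed Capacity, then fit a straight line to ln(Capacity/(C - Capacity)) against Time to get b0 and b1. The function name `initial_guesses`, the `headroom` factor and the synthetic data are all assumptions made for illustration; the real values would come from ModelingPVCapacity.xls.

```python
import math

def initial_guesses(times, capacities, headroom=1.2):
    """Rough starting values (C, b0, b1) for the logistic model.

    Assumption: C is guessed as `headroom` times the largest observed
    capacity; b0 and b1 come from a least-squares line fitted to
    ln(y / (C - y)) against time.
    """
    c0 = headroom * max(capacities)
    logits = [math.log(y / (c0 - y)) for y in capacities]
    n = len(times)
    t_bar = sum(times) / n
    z_bar = sum(logits) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxz = sum((t - t_bar) * (z - z_bar) for t, z in zip(times, logits))
    b1 = sxz / sxx
    b0 = z_bar - b1 * t_bar
    return c0, b0, b1

def logistic(t, c, b0, b1):
    """The model from the question: C*exp(b0+b1*t)/(1+exp(b0+b1*t))."""
    e = math.exp(b0 + b1 * t)
    return c * e / (1 + e)

# Synthetic illustration only (NOT the assignment data).
times = [0.5 * k for k in range(28)]
caps = [logistic(t, 50.0, -2.0, 0.3) for t in times]
c0, b0, b1 = initial_guesses(times, caps)
print(c0, b0, b1)
```

The three numbers printed can be typed into the SPSS Nonlinear Regression parameter dialog as starting values; if the fit does not converge, changing `headroom` gives a different set of guesses.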

Show the nonlinear regression output. Also use Graph > Legacy Dialogs > Scatter/Dot > Overlay Scatterplot > Define to plot the Capacity vs Time and Predicted values vs Time overlaid on the same graph. (Change the plotting symbol for the predicted values to +).

What is the predicted maximum capacity based on this model?

Does this nonlinear regression model fit better or worse than the cubic model? Quote evidence.

From other considerations, the NZ Electricity Authority believes that in the long run (i.e. asymptotically) about half of New Zealand households will find it economic to install PV systems. In view of the current population, that equates to about C = 2800 MW of Capacity. Calculate a new variable

logitC = ln( Capacity / (2800 - Capacity) )

and fit a regression model logitC = b0 + b1*Time + b2*Change. Save the predicted values. (The variable Change is in the data file.)

Show the regression output, and use Graphs > Legacy Dialogs etc. to plot an overlay scatterplot of logitC vs Time and the predicted values vs Time.

The second variable Change was chosen because in November 2014 Contact Energy slashed the amount it would pay when customers sold PV-generated power back to the national grid, a move that was followed by other companies. Is there evidence that there was a change in the slope of the line for logitC vs time? Quote evidence.

Using the estimated regression coefficients, predict the value of logitC at 30 April 2017 (Time= 13.69, Change= 8.82). Convert that back to estimate the PV Capacity in MW on 30 April 2017.

(Hint: if Y is the predicted value of logitC, then Capacity = 2800*e^Y / (1 + e^Y).)
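The back-transformation in the hint can be sketched in a few lines of Python. The function names are hypothetical, and the coefficients b0, b1 and b2 are left as arguments because they have to be read from the SPSS Coefficients table; only C = 2800 comes from the question.

```python
import math

C = 2800.0  # asymptotic capacity in MW, as stated in the question

def logit_c(capacity, c=C):
    """logitC = ln(Capacity / (C - Capacity))."""
    return math.log(capacity / (c - capacity))

def capacity_from_logit(y, c=C):
    """Inverse of logit_c: C*e^Y/(1+e^Y), written as C/(1+e^-Y)."""
    return c / (1.0 + math.exp(-y))

def predict_logit(b0, b1, b2, time, change):
    """Fitted line logitC = b0 + b1*Time + b2*Change."""
    return b0 + b1 * time + b2 * change

# Round trip on an arbitrary capacity to check the two functions agree.
print(capacity_from_logit(logit_c(40.0)))
```

For the prediction at 30 April 2017, one would call `predict_logit` with the estimated coefficients and Time = 13.69, Change = 8.82, then pass the result to `capacity_from_logit`.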
Q2. The dataset Dengue.xls reports a study of 196 people living in a Mexican city, of whom 57 were found to have dengue fever, a nasty mosquito-borne disease. The response variable is Dengue (=1 if diseased and 0 if not). Explanatory variables include the person’s Age (in years), whether or not they used a mosquito net (MosNet=1 if yes, 0 if no), and which Sector of the city the person lived in (sector 1, 2, 3, 4 or 5).

Fit a binary logistic regression of Dengue vs Age. Save the predicted probabilities.

Show the regression output. Also plot the predicted probabilities against Age.

At what age is it 50% likely that the person will have dengue fever?
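For a logistic regression with a single predictor, the fitted probability equals 0.5 exactly where the linear predictor b0 + b1*Age is zero, i.e. at Age = -b0/b1. A minimal sketch follows; the coefficient values are hypothetical placeholders, not results from Dengue.xls, and should be replaced by the B column of the SPSS output.

```python
import math

def prob_dengue(age, b0, b1):
    """Fitted probability from a logistic regression on Age alone."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))

def age_at_half(b0, b1):
    """Age at which the fitted probability is 0.5 (linear predictor = 0)."""
    return -b0 / b1

# Hypothetical coefficients for illustration only.
b0_demo, b1_demo = -2.0, 0.05
age50 = age_at_half(b0_demo, b1_demo)
print(age50, prob_dengue(age50, b0_demo, b1_demo))
```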

Remove Age, and add the indicator variables for Sector (Sector1, Sector2, Sector3, Sector4) to the regression.

Quote an overall statistic and sig value for whether the probability of dengue differs between the sectors.

Looking at the coefficients, which sectors have significantly higher rates of dengue fever than other sectors?

Re-fit the binary logistic regression with Age, MosNet and the sector indicator variables.

Does the use of a mosquito net make a significant difference to the probability of having dengue fever?

What proportion of individuals are correctly classified by the binary logistic model?
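SPSS reports this figure in its Classification Table. As a rough sketch of the idea, each person is classified as diseased when the saved predicted probability reaches a cut value, and the proportion correct is the share of people whose classification matches the observed Dengue value. The 0.5 cut value, the "greater than or equal" rule and the five-person toy data are assumptions for illustration, not values from the study.

```python
def proportion_correct(observed, predicted_probs, cutoff=0.5):
    """Share of cases whose predicted class matches the observed 0/1 value.

    Assumption: a case is classified as 1 when its predicted
    probability is at least `cutoff`.
    """
    hits = 0
    for y, p in zip(observed, predicted_probs):
        predicted_class = 1 if p >= cutoff else 0
        if predicted_class == y:
            hits += 1
    return hits / len(observed)

# Toy data for illustration only (not from Dengue.xls).
observed = [1, 0, 0, 1, 0]
probs = [0.8, 0.3, 0.6, 0.4, 0.1]
accuracy = proportion_correct(observed, probs)
print(accuracy)
```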

Q3. The dataset NSWHospitals.sav contains information about the costs and treatment of various hospitals in different areas (“SLA”s) of New South Wales, Australia.
The response variable we will focus on is called StdCostRatio, which is a standardised measure of how expensive various hospitals are, per patient (average= 100). The remaining variables are described in an appendix. You don’t need to know the definitions.

Fit a linear regression of StdCostRatio on all the columns from SupplyBeds10000 to Nocar. (i.e. exclude StdSeparationRatio. ) Include Collinearity statistics. Show output.

Comment on what evidence there is of multicollinearity in the regression.

Fit a stepwise linear regression of StdCostRatio on the columns from SupplyBeds10000 to Nocar. Show output.

Use backwards elimination to choose a model for StdCostRatio.

Show the Model Summary and Coefficients results.

Which model is best in terms of adjusted R²?

Which model is best in terms of Std Error of the Estimate?

Is the final backwards elimination model better or worse than the model chosen by Stepwise? (State your reasons for your answer.)

Fit the models chosen in part (c) iii and iv. (Use Enter now, not stepwise etc.) and save the studentized dffits. Show output. Compare the standard deviation of deleted residuals. Which model is best by this criterion? What is the idea behind this criterion?

Using the regression coefficients for the model you have chosen in part (d), write a sentence or two describing the type of SLA which is predicted to have a high StdCostRatio.

Re-fit that last regression, showing the partial regression plots (added variable plots) and saving the standardised DFFITS. Which graph seems to be most dependent on a single point for its slope? Identify the point, e.g. by row number.


David · Accepted Answer

161.221 Assignment 2       Due Sunday 4 June 2017 
Q1. The dataset ModelingPVCapacity.xls has data on the amount of installed photovoltaic cells (solar panels) in New Zealand. The amount installed is called the Capacity, and is measured in MW. We want to model this as a function of Time (measured in hundreds of days since 31 July 2013).
(a) Using Analyze > Curve Estimation, fit a cubic regression model for Capacity vs Time. Is the cube power term significant? Show the graph.
Regression model:
Capacity = 4.524 + 1.813*time + .266*time^2 - .010*time^3

H0_i: beta_i = 0 (term i is not significant)
H1_i: beta_i ≠ 0 (term i is significant)
With p-value  Legacy Dialogs > Scatter/Dot 
> Overlay Scatterplot > Define    to plot the Capacity vs Time and Predicted values vs Time 
overlaid on the same graph. (Change the plotting symbol for the predicted values to +).
Parameter Estimates
Parameter   Estimate   Std. Error   95% CI Lower Bound   95% CI Upper Bound
c           52.847     1.303        50.190               55.504
a           -2.196     .020         -2.237               -2.155
b           .318       .007         .303                 .333
Correlations of Parameter Estimates
     c       a       b
c    1.000   -.389   -.928
a    -.389   1.000   .042
b    -.928   .042    1.000
ANOVA (a)
Source              Sum of Squares   df   Mean Squares
Regression          18927.740        3    6309.247
Residual            5.717            31   .184
Uncorrected Total   18933.457        34
Corrected Total     3919.920         33
Dependent variable: Capacity
a. R squared = 1 - (Residual Sum of Squares) / (Corrected Sum of Squares) = .999.
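The summary figures in this ANOVA table can be checked by hand from the sums of squares. The short sketch below redoes the arithmetic using only the numbers printed in the table; the variable names are illustrative.

```python
# Values copied from the nonlinear regression ANOVA table.
ss_regression = 18927.740
ss_residual = 5.717
ss_corrected_total = 3919.920
df_regression = 3
df_residual = 31

# R squared as defined in the table footnote.
r_squared = 1 - ss_residual / ss_corrected_total

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

print(round(r_squared, 3), round(ms_regression, 3), round(ms_residual, 3))
```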
ii. What is the predicted maximum capacity based on this model?
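Capacity = 52.847*exp(-2.196 + .318*Time)/(1 + exp(-2.196 + .318*Time))
As Time increases, exp(-2.196 + .318*Time)/(1 + exp(-2.196 + .318*Time)) tends to 1, so the predicted maximum capacity is the estimate of c, 52.847 MW (95% confidence interval 50.190 to 55.504).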
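To see how the fitted curve levels off, the model can be evaluated at a few values of Time. The sketch below uses the parameter estimates from the table (c = 52.847, a = -2.196, b = .318); the function name and the chosen Time values are illustrative. The formula is written as c/(1 + exp(-z)), which is algebraically the same as c*exp(z)/(1 + exp(z)) and avoids overflow when Time is large.

```python
import math

# Parameter estimates from the nonlinear regression output.
C_HAT = 52.847
A_HAT = -2.196
B_HAT = 0.318

def fitted_capacity(time):
    """Fitted Capacity (MW) at a given Time (hundreds of days)."""
    z = A_HAT + B_HAT * time
    return C_HAT / (1.0 + math.exp(-z))

# The fitted values rise towards C_HAT but never exceed it.
for t in (0, 10, 20, 50, 1000):
    print(t, fitted_capacity(t))
```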
iii. Does this nonlinear regression model fit better or worse than the cubic model? Quote evidence.
On the basis of R^2, the two models are equal, since both have R^2 = 99.9%. Comparing the residual sum of squares (SSE), however, I prefer the cubic model: its SSE is 2.553, which is less than the SSE of 5.717 for the nonlinear model.
(c) From other considerations, the NZ Electricity Authority believes that in the long run (i.e. asymptotically) about half of New Zealand households will find it economic to install PV systems. In view of the current population, that equates to about C = 2800 MW of Capacity. Calculate a new variable
      logitC = ln( Capacity / (2800 - Capacity) )
and fit a regression model logitC = b0 + b1*Time + b2*Change. Save the predicted values. (The variable Change is in the data file.)
i. Show the regression output, and use Graphs > Legacy Dialogs etc. to plot an overlay scatterplot of logitC vs Time and the predicted values vs Time.
Variables Entered/Removed (a)
Model   Variables Entered   Variables Removed   Method
1       Change, Time (b)    .                   Enter
a. Dependent Variable: ln_C
b. All requested variables entered.
Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .999 (a)   .997       .997                .03285
a. Predictors: (Constant), Change, Time
b. Dependent Variable: ln_C
ANOVA (a)
Model 1      Sum of Squares   df   Mean Square   F          Sig.
Regression   12.432           2    6.216         5759.154   .000 (b)
Residual     .033             31   .001
Total        12.465           33
a. Dependent Variable: ln_C
b. Predictors: (Constant), Change, Time
Coefficients (a)
Model   Unstandardized Coefficients   Standardized Coefficients   t   Sig.
        B   Std.
