Q1. The dataset ModelingPVCapacity.xls has data on the amount of installed photovoltaic cells (solar panels) in New Zealand. The amount installed is called the Capacity, and is measured in MW. We want to model this as a function of Time (measured in hundreds of days since 31 July 2013).

  (a) Using Analyze > Curve Estimation, fit a cubic regression model for Capacity vs Time. Is the cube power term significant? Show the graph.

  (b) It is unbelievable that the amount of installed PV capacity will continue to increase steeply in future years. Eventually it must flatten out. Therefore use the data to fit a nonlinear regression model Capacity = C*exp(b0 + b1*Time)/(1 + exp(b0 + b1*Time)). Guess at initial parameter estimates. (If the model doesn’t converge, try other guesses until you get convergence.) Save the predicted values.

    i. Show the nonlinear regression output. Also use Graph > Legacy Dialogs > Scatter/Dot > Overlay Scatterplot > Define to plot the Capacity vs Time and Predicted values vs Time overlaid on the same graph. (Change the plotting symbol for the predicted values to +.)

    ii. What is the predicted maximum capacity based on this model?

    iii. Does this nonlinear regression model fit better or worse than the cubic model? Quote evidence.

  (c) From other considerations, the NZ Electricity Authority believes that in the long run (i.e. asymptotically) about half of New Zealand households will find it economic to install PV systems. In view of the current population, that equates to about C = 2800 MW of Capacity. Calculate a new variable

    logitC = ln( Capacity / (2800 - Capacity) )

  and fit a regression model logitC = b0 + b1*Time + b2*Change. Save the predicted values. (The variable Change is in the data file.)

    i. Show the regression output, and use Graphs > Legacy Dialogs etc. to plot an overlay scatterplot of logitC vs Time and the predicted values vs Time.

    ii. The second variable Change was chosen because in November 2014 Contact Energy slashed the amount it would pay when customers sold PV-generated power back to the national grid, a move that was followed by other companies. Is there evidence that there was a change in the slope of the line for logitC vs Time? Quote evidence.

    iii. Using the estimated regression coefficients, predict the value of logitC at 30 April 2017 (Time = 13.69, Change = 8.82). Convert that back to estimate the PV Capacity in MW on 30 April 2017.

  (Hint: if Y is the predicted value of logitC then Capacity = 2800*e^Y / (1 + e^Y).)
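
The logit transform and the back-transformation in the hint above can be sketched in Python. This is only an illustration under stated assumptions: the function names `logit_c`, `capacity_from_logit` and `predict_logit` are made up for this sketch, and any coefficients passed to `predict_logit` would have to come from your own regression output, since none are given in the question.

```python
import math

C_MAX = 2800.0  # asymptotic capacity in MW, as stated in the question


def logit_c(capacity, c_max=C_MAX):
    """logitC = ln(Capacity / (c_max - Capacity)); needs 0 < capacity < c_max."""
    if not 0 < capacity < c_max:
        raise ValueError("capacity must lie strictly between 0 and c_max")
    return math.log(capacity / (c_max - capacity))


def capacity_from_logit(y, c_max=C_MAX):
    """Back-transform: Capacity = c_max * e^y / (1 + e^y)."""
    # c_max / (1 + e^-y) is algebraically identical to the hint's formula.
    return c_max / (1.0 + math.exp(-y))


def predict_logit(b0, b1, b2, time, change):
    """Linear predictor logitC = b0 + b1*Time + b2*Change (coefficients supplied by the caller)."""
    return b0 + b1 * time + b2 * change
```

A round trip such as `capacity_from_logit(logit_c(40.0))` should return the starting capacity up to floating-point error, which is a quick way to check that the new variable was computed correctly. The value 40.0 is an arbitrary test input, not a figure from the dataset.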
Q2. The dataset Dengue.xls reports a study of 196 people living in a Mexican city, of whom 57 were found to have dengue fever, a nasty mosquito-borne disease. The response variable is Dengue (=1 if diseased and 0 if not). Explanatory variables include the person’s Age (in years), whether or not they used a mosquito net (MosNet=1 if yes, 0 if no), and which Sector of the city the person lived in (sector 1, 2, 3, 4 or 5).

  (a) Fit a binary logistic regression of Dengue vs Age. Save the predicted probabilities.

    i. Show the regression output. Also plot the predicted probabilities against Age.

    ii. At what age is it 50% likely that the person will have dengue fever?

  (b) Remove Age, and add the indicator variables for Sector (Sector1, Sector2, Sector3, Sector4) to the regression.

    i. Quote an overall statistic and sig value for whether the probability of dengue differs between the sectors.

    ii. Looking at the coefficients, which sectors have significantly higher rates of dengue fever than other sectors?

  (c) Re-fit the binary logistic regression with Age, MosNet and the sector indicator variables.

    i. Does the use of a mosquito net make a significant difference to the probability of having dengue fever?

    ii. What proportion of individuals are correctly classified by the binary logistic model?
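
Two of the Q2 calculations follow directly from the form of the logistic model, and a short Python sketch may make them concrete. With logit(p) = b0 + b1*Age, the probability equals 0.5 exactly where b0 + b1*Age = 0, so the age is -b0/b1. The proportion correctly classified compares each 0/1 outcome with the class predicted at a cutoff, taken here to be 0.5 as an assumption. The function names and the coefficients in the usage note are hypothetical and are not estimates from Dengue.xls.

```python
import math


def predicted_probability(b0, b1, age):
    """P(Dengue = 1) under logit(p) = b0 + b1 * age."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))


def age_at_half(b0, b1):
    """Age at which p = 0.5, i.e. where b0 + b1 * age = 0."""
    if b1 == 0:
        raise ValueError("b1 must be non-zero for the curve to cross 0.5")
    return -b0 / b1


def proportion_correct(observed, probabilities, cutoff=0.5):
    """Share of cases whose predicted class (p >= cutoff) matches the 0/1 outcome."""
    if len(observed) != len(probabilities) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(
        1
        for y, p in zip(observed, probabilities)
        if (p >= cutoff) == (y == 1)
    )
    return hits / len(observed)
```

For example, with made-up coefficients b0 = -2.0 and b1 = 0.05, `age_at_half(-2.0, 0.05)` gives an age of about 40, and `predicted_probability(-2.0, 0.05, 40.0)` returns 0.5 to within rounding. The real answer requires the coefficients from your own output.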



Q3. The dataset NSWHospitals.sav contains information about the costs and treatment of various hospitals in different areas (“SLA”s) of New South Wales, Australia.
The response variable we will focus on is called StdCostRatio, which is a standardised measure of how expensive various hospitals are, per patient (average = 100). The remaining variables are described in an appendix. You don’t need to know the definitions.

  (a) Fit a linear regression of StdCostRatio on all the columns from SupplyBeds10000 to Nocar (i.e. exclude StdSeparationRatio). Include Collinearity statistics. Show output. Comment on what evidence there is of multicollinearity in the regression.

  (b) Fit a stepwise linear regression of StdCostRatio on the columns from SupplyBeds10000 to Nocar. Show output.

  (c) Use backwards elimination to choose a model for StdCostRatio.

    i. Show the Model Summary and Coefficients results.

    ii. Which model is best in terms of adjusted R²?

    iii. Which model is best in terms of Std Error of the Estimate?

    iv. Is the final backwards elimination model better or worse than the model chosen by Stepwise? (State your reasons for your answer.)

  (d) Fit the models chosen in part (c) iii and iv. (Use Enter now, not stepwise etc.) and save the studentized dffits. Show output. Compare the standard deviation of deleted residuals. Which model is best by this criterion? What is the idea behind this criterion?

  (e) Using the regression coefficients for the model you have chosen in part (d), write a sentence or two describing the type of SLA which is predicted to have a high StdCostRatio.

  (f) Re-fit that last regression, showing the partial regression plots (added variable plots) and saving the standardised DFFITS. Which graph seems to be most dependent on a single point for its slope? Identify the point, e.g. by row number.
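
For the collinearity and model-comparison parts of Q3, the quantities SPSS reports are related by simple formulas, sketched here in Python. Tolerance for predictor j is 1 - R_j², where R_j² comes from regressing that predictor on the other predictors, and VIF is its reciprocal. Adjusted R² penalises R² for the number of predictors p given n cases. The function names are made up for this sketch, and the numbers in the usage note are arbitrary, not values from NSWHospitals.sav.

```python
def tolerance(r_squared_j):
    """Tolerance = 1 - R_j^2, with R_j^2 from regressing predictor j on the others."""
    if not 0.0 <= r_squared_j <= 1.0:
        raise ValueError("R_j^2 must lie between 0 and 1")
    return 1.0 - r_squared_j


def vif(r_squared_j):
    """Variance inflation factor = 1 / tolerance."""
    tol = tolerance(r_squared_j)
    if tol == 0.0:
        raise ValueError("predictor is an exact linear combination of the others")
    return 1.0 / tol


def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1) for n cases, p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need more cases than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)
```

As an illustration, a predictor with R_j² = 0.9 has tolerance 0.1 and a VIF of about 10. A common rule of thumb treats VIFs of that size as a warning sign of multicollinearity, although the threshold used in your course may differ.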


David answered on Dec 26 2021

161.221 Assignment 2 Due Sunday 4 June 2017
Q1. The dataset ModelingPVCapacity.xls has data on the amount of installed photovoltaic cells (solar panels) in New Zealand. The amount installed is called the Capacity, and is measured in MW. We want to model this as a function of Time (measured in hundreds of days since 31 July 2013).
(a) Using Analyze > Curve Estimation, fit a cubic regression model for Capacity vs Time. Is the cube power term significant? Show the graph.
Regression model:
Capacity = 4.524 + 1.813*Time + .266*Time^2 - .010*Time^3

H0_i: beta_i = 0 (term i is not significant)
H1_i: beta_i ≠ 0 (term i is significant)
Every coefficient has a p-value (Sig.) below 0.05, so I reject H0 at the 5% level of significance for each term and conclude that Time, Time^2 and Time^3 are all significant. In particular, the cube power term is significant (t = -4.353, Sig. = .000).
Model Summary

| R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|
| 1.000 | .999 | .999 | .292 |

The independent variable is Time.
ANOVA

| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 3917.367 | 3 | 1305.789 | 15342.694 | .000 |
| Residual | 2.553 | 30 | .085 | | |
| Total | 3919.920 | 33 | | | |

The independent variable is Time.
Coefficients

| | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
|---|---|---|---|---|---|
| Time | 1.813 | .180 | .504 | 10.051 | .000 |
| Time ** 2 | .266 | .039 | .812 | 6.814 | .000 |
| Time ** 3 | -.010 | .002 | -.321 | -4.353 | .000 |
| (Constant) | 4.524 | .225 | | 20.079 | .000 |
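
As a sanity check, the summary figures in the cubic model's output can be re-derived from the ANOVA sums of squares. The Python sketch below does this and also evaluates the fitted cubic. The function names are made up for this sketch, and because the inputs are the three-decimal values printed by SPSS, the recomputed statistics agree with the reported ones only up to that rounding.

```python
import math

# Figures copied from the cubic model's ANOVA table (printed to three decimals).
SS_REGRESSION, DF_REGRESSION = 3917.367, 3
SS_RESIDUAL, DF_RESIDUAL = 2.553, 30
SS_TOTAL = 3919.920


def anova_summary(ss_reg, df_reg, ss_res, df_res, ss_total):
    """Recompute mean squares, F, the standard error of the estimate and R^2."""
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res
    return {
        "ms_regression": ms_reg,
        "ms_residual": ms_res,
        "f": ms_reg / ms_res,
        "std_error_estimate": math.sqrt(ms_res),
        "r_squared": ss_reg / ss_total,
    }


def cubic_capacity(time):
    """Fitted cubic: Capacity in MW, Time in hundreds of days since 31 July 2013."""
    return 4.524 + 1.813 * time + 0.266 * time**2 - 0.010 * time**3


summary = anova_summary(
    SS_REGRESSION, DF_REGRESSION, SS_RESIDUAL, DF_RESIDUAL, SS_TOTAL
)
```

Inspecting `summary` should give values close to the reported Mean Square, F, Std. Error of the Estimate and R Square. A large discrepancy would suggest a transcription error in the tables.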




(b) It is unbelievable that the amount of installed PV capacity will continue to increase steeply in future years. Eventually it must flatten out. Therefore use the data to fit a nonlinear regression model Capacity = C*exp(b0 + b1*Time)/(1 + exp(b0 + b1*Time)). Guess at initial parameter estimates. (If the model doesn’t converge try other guesses until you get convergence.) Save the predicted values.
i. Show the nonlinear regression output. Also use Graph > Legacy Dialogs > Scatter/Dot > Overlay Scatterplot > Define to plot the Capacity vs Time and Predicted values vs Time overlaid on the same graph. (Change the plotting symbol for the predicted values to +).
Parameter Estimates

| Parameter | Estimate | Std. Error | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|
| c | 52.847 | 1.303 | 50.190 | 55.504 |
| a | -2.196 | .020 | -2.237 | -2.155 |
| b | .318 | .007 | .303 | .333 |
Correlations of Parameter Estimates

| | c | a | b |
|---|---|---|---|
| c | 1.000 | -.389 | -.928 |
| a | -.389 | 1.000 | .042 |
| b | -.928 | .042 | 1.000 |
ANOVA (a)

| Source | Sum of Squares | df | Mean Squares |
|---|---|---|---|
| Regression | 18927.740 | 3 | 6309.247 |
| Residual | 5.717 | 31 | .184 |
| Uncorrected Total | 18933.457 | 34 | |
| Corrected Total | 3919.920 | 33 | |

Dependent variable: Capacity
a. R squared = 1 - (Residual Sum of Squares) / (Corrected Sum of Squares) = .999.



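ii. What is the predicted maximum capacity based on this model?
The fitted model is Capacity = 52.847*exp(-2.196 + .318*Time)/(1 + exp(-2.196 + .318*Time)), where the output's a and b correspond to b0 and b1. As Time increases, the ratio exp(.)/(1 + exp(.)) approaches 1, so the predicted maximum capacity is the estimate of C: about 52.8 MW (95% confidence interval 50.190 to 55.504).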
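
The behaviour of the fitted curve can be checked numerically. The Python sketch below evaluates the model with the estimates from the Parameter Estimates output, taking the output's a and b as b0 and b1 in the same way as the fitted equation above does. The function names are made up for this sketch, and it is meant for non-negative Time values as in the data.

```python
import math

# Estimates from the Parameter Estimates output: c, a (= b0) and b (= b1).
C_HAT, A_HAT, B_HAT = 52.847, -2.196, 0.318


def logistic_capacity(time, c=C_HAT, a=A_HAT, b=B_HAT):
    """Capacity = c * exp(a + b*time) / (1 + exp(a + b*time)), in MW."""
    # Written as c / (1 + exp(-(a + b*time))), which is algebraically the same.
    return c / (1.0 + math.exp(-(a + b * time)))


def midpoint_time(a=A_HAT, b=B_HAT):
    """Time (hundreds of days) at which the curve reaches half of c."""
    return -a / b


# For a very large Time the prediction is numerically indistinguishable from c.
long_run_capacity = logistic_capacity(1000.0)
```

Evaluating `logistic_capacity` at increasing Time values shows the predictions rising towards, but never exceeding, the estimate of C. This is why that estimate is read as the predicted maximum capacity.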
iii. Does this nonlinear regression model fit better or worse than the cubic model? Quote evidence.
On the basis of R^2 the two models are equivalent, since both have R^2 = .999. Comparing the residual sum of squares (SSE), however, I prefer the cubic model: its SSE is 2.553 (residual mean square .085 on 30 df), which is less than the nonlinear model's SSE of 5.717 (residual mean square .184 on 31 df).
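
The comparison can be put on a per-degree-of-freedom basis, which matters because the two models use different numbers of parameters. The Python sketch below takes the residual sums of squares and degrees of freedom from the two ANOVA tables and computes the residual mean square, its square root, and R² against the corrected total. The function and variable names are made up for this sketch.

```python
import math

SS_CORRECTED_TOTAL = 3919.920  # corrected total sum of squares, same data for both fits

# Residual sum of squares and residual df from each model's ANOVA table.
MODELS = {
    "cubic": {"sse": 2.553, "df": 30},
    "logistic": {"sse": 5.717, "df": 31},
}


def fit_measures(sse, df, ss_total=SS_CORRECTED_TOTAL):
    """Residual mean square, its square root, and R^2 = 1 - SSE / corrected total."""
    mse = sse / df
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "r_squared": 1.0 - sse / ss_total,
    }


results = {name: fit_measures(m["sse"], m["df"]) for name, m in MODELS.items()}
better = min(results, key=lambda name: results[name]["mse"])
```

Both R² values round to .999, so R² alone does not separate the models. The residual mean square does, and on these figures it favours the cubic fit within the observed range of Time. It says nothing about which model extrapolates more sensibly.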
(c) From other considerations, the NZ Electricity Authority believes that in the long run (i.e. asymptotically) about half of New Zealand households will find it economic to install PV systems. In view of the current population, that equates to about C = 2800 MW of Capacity. Calculate a new variable
logitC = ln( Capacity / (2800 - Capacity) )
and fit a regression model logitC = b0 + b1*Time + b2*Change.
Save the predicted values. (The variable Change is in the data file.)
i. Show the regression output, and use Graphs > Legacy Dialogs etc. to plot an overlay scatterplot of logitC vs Time and the predicted...