Introduction
It is late October, 2021. You are the CEO of Metro Hospital System. You are concerned that the hospitals in your health network will not have enough intensive-care (ICU) beds to provide hospitalization to all who need it.[1] If that occurs, you plan to re-route patients to other hospitals in the state. However, the Governor of your state is concerned that the whole state might run out of ICU beds, and that very much concerns your hospital system as well. (See the last page of this case for your state assignments.)
Your job is to develop a regression model that predicts the number of ICU beds needed to accommodate future covid patients. You will use past data on daily statewide ICU usage by covid patients as the regression model’s dependent variable. The goal of your model is to estimate the effect of newly diagnosed cases (and perhaps temperature and vaccination rates) on the need for ICU beds for covid patients in the state.
The CDC gives states and hospital systems projections of new covid cases two months ahead. Your model will be used by the state government and the Metro Hospital System executives to predict how many ICU beds will be needed by covid patients two months into the future (based on CDC projected new cases), so that policy-makers and hospital administrators can take appropriate actions to adjust resources as needed.
You will summarize your regression model in a memo to the Governor and to your hospital system’s Board.
To create your regression model, you have daily data for the period from Sept. 1, 2020 to late October, 2021 on the following variables:
· Number of intensive care beds currently in use by covid patients in the state (each day). This should be your dependent variable.
· The new covid cases diagnosed each day.
· Number of people fully vaccinated in the state (as of each day).
· The average temperature each month in the state.
(Data is under QM222 All Sections\Resources\Case Assignments, with different states’ data on different worksheet tables. DO NOT collect any additional data to use in your analysis.)
READ AND FOLLOW THE DIRECTIONS EXACTLY
Creating Your Regression Model
You need to develop a regression model showing how covid ICU bed usage (demand) depends on new covid cases, so you can predict future ICU needs based on the CDC’s new case predictions. You will probably end up running quite a few regressions until you find the best fitting one. You have collected data on some additional variables that might affect covid bed usage (even controlling for new cases), and you will include them in your model if they do have an impact.
A few notes on other variables you may want to include:
· You know that covid spreads more easily and also may cause more serious illness in cold weather.
· You know that vaccinations make it less likely that new covid cases will worsen and lead to hospitalization and intensive care. As a result, you may decide to include the # of vaccinated individuals in your regression model.
· You know that it is possible that the demand (usage) for ICU beds might differ for different days of the week (although you are not sure why).
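One way to let the model pick up a day-of-the-week pattern is to add 0/1 indicator (dummy) variables. The case itself is worked in Excel; the Python sketch below is only an illustration, and the column names (`date`, `weekday`, `dow_...`) are assumptions, not names from the case data.

```python
import pandas as pd

# Hypothetical daily frame; the column names are illustrative, not from the case data.
df = pd.DataFrame({"date": pd.date_range("2020-09-01", periods=14, freq="D")})

# Day name for each date (Monday, Tuesday, ...).
df["weekday"] = df["date"].dt.day_name()

# One 0/1 column per weekday, dropping one day as the reference category
# so the dummies are not perfectly collinear with the intercept.
dummies = pd.get_dummies(df["weekday"], prefix="dow", drop_first=True, dtype=int)
df = pd.concat([df, dummies], axis=1)

print(df.head())
```

Six dummies are enough for seven days: the omitted day becomes the baseline, and each coefficient is read as the difference from that day. Note that these six columns count toward any limit on the number of X variables.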
You realize that it takes a while for new cases to progress from being diagnosed to ending up in an ICU bed. Therefore, you need to include lagged values of new cases in your model. Moreover, some people may get sicker faster than others, which suggests that you should include more than one lag. Your professor will demonstrate how to make variables for “new cases lagged 1”, “new cases lagged 2” etc. For instance, you might add 3 lags and run:
Covid ICU Beds = b0 + b1 NewCases_t + b2 NewCases_{t-1} + b3 NewCases_{t-2} + b4 NewCases_{t-3}
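The lagged regression above can be sketched outside Excel as follows. This is a minimal illustration on synthetic data: the column names (`new_cases`, `icu_beds`, `new_cases_lag1`, ...) and the made-up coefficients used to generate the data are assumptions, not part of the case.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (an assumption for illustration, not the case data).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "date": pd.date_range("2020-09-01", periods=n, freq="D"),
    "new_cases": rng.integers(500, 5000, size=n).astype(float),
})
# Fake ICU series built from lagged cases plus noise, so the regression has something to find.
df["icu_beds"] = (
    50
    + 0.02 * df["new_cases"].shift(1)
    + 0.03 * df["new_cases"].shift(3)
    + rng.normal(0, 5, size=n)
)

# Create new cases lagged 1, 2 and 3 days; shift(k) moves values down k rows.
max_lag = 3
for k in range(1, max_lag + 1):
    df[f"new_cases_lag{k}"] = df["new_cases"].shift(k)

# The first max_lag rows have no lagged values, so drop them before fitting.
fit = df.dropna().reset_index(drop=True)

x_cols = ["new_cases"] + [f"new_cases_lag{k}" for k in range(1, max_lag + 1)]
X = np.column_stack([np.ones(len(fit)), fit[x_cols].to_numpy()])  # intercept column first
y = fit["icu_beds"].to_numpy()

# Ordinary least squares: b = [b0, b1, b2, b3, b4]
b, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, coef in zip(["intercept"] + x_cols, b):
    print(f"{name:>16}: {coef: .4f}")
```

Each lag costs one row at the start of the sample, so a 14-day lag removes the first 14 days from the regression.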
Which lags should you include? Ideally, you would want to start by putting into the regression all possible lags up to the maximum you think it could be. However, we don’t know this maximum, and certainly not for each state.[2] And unfortunately, Excel doesn’t let you put in more than 16 X variables. So try a reasonable first guess (such as including from a 1-day lag to a 14-day lag), and see which lags you might drop out (one at a time) because |t|
The choices of which lag or lags to use should be based on which combinations, in addition to the other variables in the model, predict ICU bed usage (demand) most accurately.
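Dropping lags one at a time by their t-statistics can be sketched as below. This is only an illustration under stated assumptions: the helper names (`ols_t_stats`, `drop_weak_lags`) are hypothetical, the data are synthetic, and the cutoff `t_cut` is a placeholder parameter to be set to whatever threshold your course specifies. Since the final choice should rest on prediction accuracy, treat this as a screening step rather than the decision rule.

```python
import numpy as np

def ols_t_stats(X, y):
    """Return OLS coefficients and t-statistics; X must already include an intercept column."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (n - k)                        # residual variance
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))  # coefficient standard errors
    return b, b / se

def drop_weak_lags(X, y, names, t_cut):
    """Drop the weakest non-intercept variable, one at a time, until every |t| >= t_cut.

    t_cut is a hypothetical parameter, not a value taken from the case.
    """
    names = list(names)
    while len(names) > 1:
        _, t = ols_t_stats(X, y)
        weakest = 1 + int(np.argmin(np.abs(t[1:])))     # skip the intercept at index 0
        if abs(t[weakest]) >= t_cut:
            break
        X = np.delete(X, weakest, axis=1)
        names.pop(weakest)
    return names, X

# Synthetic demo: lag1 drives y, lag2 is unrelated to y by construction.
rng = np.random.default_rng(1)
n = 300
lag1 = rng.normal(size=n)
lag2 = rng.normal(size=n)
y = 1.0 + 2.0 * lag1 + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), lag1, lag2])

kept, X_kept = drop_weak_lags(X, y, ["intercept", "lag1", "lag2"], t_cut=2.0)  # placeholder cutoff
print(kept)
```

Re-running the regression after each removal matters because the remaining t-statistics change once a correlated lag is dropped.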
Training data and Testing data
You plan to do what is standard practice for data analysts: You will develop the regression model using some of your data, which is called “training the model” (with the “training data set”), and then use the rest of the data (the “testing data set”) to test how well your model fits.
Specifically, you will use the data through sometime in August, 2021 (which is around 90% of the data) as your “training” data set, and use it to develop your regression model showing how covid ICU bed usage depends on past new covid cases, vaccination rates, and average temperatures. The exact cutoff date in August will be different for different students, as described on the last page of this assignment.
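Splitting the data at a cutoff date can be sketched as follows. The cutoff shown is a hypothetical placeholder (use the date assigned to you), and the column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical daily frame spanning the case's date range; values are placeholders.
df = pd.DataFrame({"date": pd.date_range("2020-09-01", "2021-10-24", freq="D")})
df["icu_beds"] = range(len(df))

# Hypothetical cutoff; use the date assigned to you on the last page of the case.
cutoff = pd.Timestamp("2021-08-15")

train = df[df["date"] <= cutoff]   # used to fit the regression
test = df[df["date"] > cutoff]     # held out to check predictions

print(len(train), len(test), round(len(train) / len(df), 3))
```

If lagged columns are used, it is probably simplest to create them on the full series before splitting, so the first testing days can still draw on new cases from the last training days.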
Your second task is to use the regression equation that you develop to predict the covid ICU usage (demand) for each day of the testing period (from sometime in August 2021 to October 24, 2021) and then evaluate how accurately your regression model predicted ICU covid usage during the testing period.
To measure how well your regression model predicts the testing data, do the following:
(1) Use your regression model developed on the training data set to predict covid ICU bed usage for each day of the testing period, based on the actual data about new cases, vaccination rates (and average monthly temperatures) from the testing period.
(2) For each day of the testing period, calculate the difference between the actual covid ICU usage and the predicted covid ICU usage.
(3) Add up the absolute values of these daily differences.
(4) Divide this sum by the number of days in the testing period to get the average prediction error.
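Steps (2) through (4) amount to a mean absolute error. A minimal sketch, with a hypothetical function name and made-up numbers for a 4-day testing period:

```python
import numpy as np

def average_prediction_error(actual, predicted):
    """Mean of the absolute daily differences between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    diffs = actual - predicted            # step (2): daily differences
    total_abs = np.abs(diffs).sum()       # step (3): sum of absolute values
    return total_abs / len(diffs)         # step (4): divide by number of days

# Made-up numbers for a 4-day testing period.
actual = [120, 130, 125, 140]
predicted = [118, 135, 125, 131]
print(average_prediction_error(actual, predicted))  # (2 + 5 + 0 + 9) / 4 = 4.0
```

Taking absolute values before summing keeps over-predictions and under-predictions from cancelling each other out.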
You should expect the average prediction error to be similar in magnitude to the SEE from your training data final regression, although it is likely to be somewhat larger. (The average prediction error will be different in the different states.) The lower the average prediction error, the more accurate your model is in predicting future covid ICU bed usage.
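For the comparison with the SEE, the standard error of the estimate is the square root of the sum of squared training residuals divided by (n - k - 1), where k is the number of X variables. Excel's regression output reports it directly; the sketch below only shows the arithmetic, with a hypothetical function name and made-up values.

```python
import numpy as np

def standard_error_of_estimate(actual, fitted, n_x_vars):
    """SEE = sqrt(sum of squared residuals / (n - k - 1)), k = number of X variables."""
    resid = np.asarray(actual, dtype=float) - np.asarray(fitted, dtype=float)
    dof = len(resid) - n_x_vars - 1
    return float(np.sqrt((resid ** 2).sum() / dof))

# Made-up training values from a regression with one X variable.
actual = [10, 12, 15, 19, 24]
fitted = [11, 12, 14, 20, 23]
see = standard_error_of_estimate(actual, fitted, n_x_vars=1)
print(round(see, 4))
```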
[1] Very sick patients who require constant care are in hospitals’ intensive care units (ICUs). If the patients have covid, they are put on oxygen and may be on ventilators. Ventilators are in limited supply, ventilator technicians are in short supply, and ICU beds are in limited supply.
[2] Respiratory problems tend to start 5 to 10 days after you get covid. However, (a) some people may get diagnosed days after they actually get covid, and (b) if someone has respiratory problems bad enough to go to the ICU, they often end up staying there for a while, so that the date that people in the ICU got diagnosed can be weeks before.