First, go to Cocalc.com and log in using the following username and password:
username [email protected] Peaceonearth1!.
There is a lesson file and a homework file; the homework number is Homework 4. You can do the work in CoCalc, or you can do it in your own Jupyter notebook and paste the code into CoCalc.





Once you have logged in to CoCalc, click on the one named Fall2021. It contains two folders, a homework folder and a lesson folder. Some of the questions may ask you to go to the lesson to get the starting code.
Answered 3 days after Oct 03, 2021


Robert answered on Oct 06 2021
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Problem Set up\n",
"*Note: this information is not included in the Canvas quiz.*\n",
"\n",
"The file *Airfares.xlsx* contains real data that were collected between Q3-1996 and Q2-1997. The first sheet contains variable descriptions while the second sheet contains the data. A csv file of the data is also provided (called *Airfares.csv*).\n",
"\n",
"We're copying the instructions from the presentation file here for ease of use.\n",
"\n",
"The following problem takes place in the United States in the late 1990s, when many major US cities were facing issues with airport congestion, partly as a result of the 1978 deregulation of airlines. Both fares and routes were freed from regulation, and low-fare carriers such as Southwest (SW) began competing on existing routes and starting non-stop service on routes that previously lacked it. Building new airports is not generally feasible, but sometimes decommissioned military bases or smaller municipal airports can be reconfigured as regional or larger commercial airports. There are numerous players and interests involved in the issue (airlines, city, state, and federal authorities, civic groups, the military, airport operators), and an aviation consulting firm is seeking advisory contracts with these players.\n",
"\n",
"A consulting firm wishes to determine the maximum average fare (FARE) as a function of three variables: COUPON, HI, and DISTANCE. These are variables an airline could control when deciding where to locate new routes.\n",
"\n",
"Moreover, the firm needs to impose constraints on:\n",
"- the number of passengers on that route (PAX) $\\leq 20000$\n",
"- the starting city’s average personal income (S_INCOME) $\\leq 30000$\n",
"- the ending city’s average personal income (E_INCOME) $\\geq 30000$\n",
"\n",
"Additional constraints:\n",
"* restrict COUPON to no more than 1.5\n",
"* limit HI to between 4000 and 8000, inclusive\n",
"* consider only routes with DISTANCE between 500 and 1000 miles, inclusive.\n",
"\n",
"However, the variables PAX, S_INCOME, and E_INCOME are not decision variables, so the firm must first model each of them by linear regression with COUPON, HI, and DISTANCE as predictors (predictive analytics). It will also use linear regression to model FARE as a linear function of COUPON, HI, and DISTANCE. Armed with these predictive models, the firm will build a linear program (prescriptive analytics) to maximize the average fare.\n",
"\n",
"Suppose you work for the aviation consulting firm and want to maximize airfares under the particular set of circumstances described above.\n",
"\n",
"*NOTE: This problem scenario is developed from pp. 170-171 of Data Mining for Business Analytics: Concepts, Techniques, and Applications in R, by Shmueli, Bruce, Yahav, Patel, and Lichtendahl (Wiley, 2017).*\n",
"\n",
"## Part 1: The Predictive Models\n",
"Since each of these models uses the same predictors and only the response variable varies, write a function that takes in the dataframe, a list of predictors, and a response variable string, and that:\n",
"* runs the linear regression of the response on the predictors\n",
"* returns the model\n",
"* prints the regression equation.\n",
"\n",
"Use a non-repetitive approach to run multiple linear regression **through the origin** (that is, with no intercept term), using the average number of coupons (COUPON) for that route, the Herfindahl Index (HI), and the distance between the two endpoint airports in miles (DISTANCE) as predictors. You'll build 4 multiple linear regression models, one for each of the following response variables:\n",
"\n",
"- the average fare (FARE)\n",
"- the number of passengers on that route (PAX)\n",
"- the starting city’s average personal income (S_INCOME)\n",
"- the ending city’s average personal income (E_INCOME)\n",
"\n",
"For each of the models, you'll need to:\n",
"\n",
"* print the resulting linear equation, for instance $FARE = X_1 \\cdot COUPON + X_2 \\cdot HI + X_3 \\cdot DISTANCE$ with the $X_n$ coefficients filled in.\n",
"* print the $R^2$ for each model. (Hint: it's stored in the `.rsquared` attribute of the results object returned when you fit the model.)\n",
"* store the data in such a way that you can use the coefficients directly in the linear program.\n",
"\n",
"\n",
"\n",
"There are multiple ways to do this and get full credit: you could write a function and call it 4 times, use a loop without a function, or combine a loop and a function. Non-repetitive code means that you are not copying and pasting the same lines of code over and over; to get full credit, your code must not replicate the same blocks."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"import statsmodels.api as sm\n",
"import pandas as pd\n",
"import numpy as np\n",
"from sklearn.linear_model import LinearRegression\n",
"import plotly.graph_objects as go\n",
"\n",
"# Path assumes Airfares.csv is in the same folder as this notebook\n",
"airfare = pd.read_csv(\"Airfares.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" FARE PAX S_INCOME E_INCOME COUPON HI DISTANCE\n",
"0 64.11 7864 28637.0 21112.0 1.00 5291.99 312\n",
"1 174.47 8820 26993.0 29838.0 1.06 5419.16 576\n",
"2 207.76 6452 30124.0 29838.0 1.06 9185.28 364\n",
"3 85.47 25144 29260.0 29838.0 1.06 2657.35 612\n",
"4 85.47 25144 29260.0 29838.0 1.06 2657.35 612"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Keep only the response and predictor columns used below\n",
"cols = ['FARE', 'PAX', 'S_INCOME', 'E_INCOME', 'COUPON', 'HI', 'DISTANCE']\n",
"df = airfare[cols].copy()\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# regModel() function"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [],
"source": [
"def regModel(X, y):\n",
"    # Adds an intercept (const) term; regression through the origin would omit this call\n",
"    X = sm.add_constant(X)\n",
"    model = sm.OLS(y, X)\n",
"    result = model.fit()\n",
"    beta_values = result.params\n",
"    rsq_val = result.rsquared_adj  # adjusted R^2; result.rsquared is the unadjusted R^2\n",
"    \n",
"    # params is indexed by column name, so use .iloc for positional access\n",
"    B0 = beta_values.iloc[0]\n",
"    B1 = beta_values.iloc[1]\n",
"    B2 = beta_values.iloc[2]\n",
"    B3 = beta_values.iloc[3]\n",
"    \n",
"    reg_eqn = f'{y.name}={B0}+({B1})*Coupon+({B2})*HI+({B3})*DISTANCE'\n",
"    print(reg_eqn)\n",
"    print(rsq_val)\n",
"    print(result.summary())\n",
"    return beta_values"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [],
"source": [
"X = df[['COUPON', 'HI', 'DISTANCE']]  # predictor columns"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"FARE=0.15091723772984267+(22.472163154481805)*Coupon+(0.011792019438414687)*HI+(0.08335424286644894)*DISTANCE\n",
"0.5090942382057464\n",
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: FARE R-squared: 0.511\n",
"Model: OLS Adj. R-squared: 0.509\n",
"Method: Least Squares F-statistic: 221.2\n",
"Date: Sun, 03 Oct 2021 Prob (F-statistic): 3.59e-98\n",
"Time: 15:11:25 Log-Likelihood: -3439.5\n",
"No. Observations: 638 AIC: 6887.\n",
"Df Residuals: 634 BIC: 6905.\n",
"Df Model: 3 \n",
"Covariance Type: nonrobust \n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"const 0.1509 18.363 0.008 0.993 -35.909 36.211\n",
"COUPON 22.4722 15.829 1.420 0.156 -8.612 53.556\n",
"HI 0.0118 0.001 9.002 0.000 0.009 0.014\n",
"DISTANCE 0.0834 0.005 16.913 0.000 0.074 0.093\n",
"==============================================================================\n",
...
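The value printed on the second line of the FARE output (0.5090...) is the adjusted R², not the plain R² (0.511) that the task asks for. The two are linked by adj = 1 - (1 - R²)(n - 1)/(n - p - 1), where n - p - 1 is the residual degrees of freedom (634 in the summary). The sketch below reproduces the adjusted figure from the rounded summary values. It covers only the intercept case that `regModel` fits. The helper name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for an OLS model with an intercept and p predictors.

    adj = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
    """
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Values read off the FARE summary: R-squared 0.511 (rounded), 638 observations, 3 predictors
print(round(adjusted_r2(0.511, 638, 3), 3))  # prints 0.509
```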
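Part 1 asks for regression **through the origin**, fitted for four responses without repeated code. The sketch below is a loop-plus-function approach. It fits y = Xb with no intercept column, using NumPy's least-squares solver in place of statsmodels. In the notebook, the statsmodels equivalent would be `sm.OLS(df[response], df[PREDICTORS]).fit()` without calling `sm.add_constant`.

The R² returned here is the uncentered one, 1 - SSR/Σy², which statsmodels reports as `.rsquared` when a model has no constant. The data are synthetic stand-ins for *Airfares.csv*, and the "true" coefficients are made up for illustration. With the real file, `X` would be `df[PREDICTORS].to_numpy()` and each response `df[response].to_numpy()`.

```python
import numpy as np

PREDICTORS = ["COUPON", "HI", "DISTANCE"]
RESPONSES = ["FARE", "PAX", "S_INCOME", "E_INCOME"]


def fit_through_origin(X, y):
    """Least-squares fit of y = X @ b with no intercept column.

    Returns the coefficients b and the uncentered R^2, 1 - SSR / sum(y**2).
    """
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    r2 = 1.0 - (resid @ resid) / (y @ y)
    return b, r2


def equation(response, predictors, coefs):
    """Format e.g. 'FARE = (20.1)*COUPON + (0.0101)*HI + (0.0799)*DISTANCE'."""
    terms = " + ".join(f"({c:.6g})*{name}" for c, name in zip(coefs, predictors))
    return f"{response} = {terms}"


# Synthetic stand-in for the Airfares data (illustration only)
rng = np.random.default_rng(42)
n = 300
X = np.column_stack([
    rng.uniform(1.0, 2.0, n),     # COUPON
    rng.uniform(2000, 10000, n),  # HI
    rng.uniform(100, 2500, n),    # DISTANCE
])
# Made-up "true" coefficients and noise standard deviations
made_up = {
    "FARE": ([20.0, 0.01, 0.08], 30.0),
    "PAX": ([3000.0, 1.0, 2.0], 5000.0),
    "S_INCOME": ([15000.0, 0.5, 1.0], 3000.0),
    "E_INCOME": ([15000.0, 0.5, 1.0], 3000.0),
}
data = {name: X @ np.array(coefs) + rng.normal(0.0, sd, n)
        for name, (coefs, sd) in made_up.items()}

# One loop, one function: coefficients stored per response for the LP
models = {}
for response in RESPONSES:
    coefs, r2 = fit_through_origin(X, data[response])
    models[response] = dict(zip(PREDICTORS, coefs))
    print(equation(response, PREDICTORS, coefs))
    print(f"R^2 = {r2:.4f}")
```

Storing each model as a `{predictor: coefficient}` dict keeps the coefficients addressable by name when they are copied into the linear program.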

Related Questions & Answers

More Questions »

Submit New Assignment

Copy and Paste Your Assignment Here
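The problem setup ends with a linear program: maximize predicted FARE over COUPON, HI, and DISTANCE. The three regression-based constraints are predicted PAX ≤ 20000, predicted S_INCOME ≤ 30000, and predicted E_INCOME ≥ 30000. The box limits are COUPON ≤ 1.5, 4000 ≤ HI ≤ 8000, and 500 ≤ DISTANCE ≤ 1000.

The notebook's own LP code is not shown, so the following is a sketch using SciPy's `linprog`, which minimizes. The objective is therefore negated, and the ≥ constraint is multiplied by -1. The lower bound of 0 on COUPON is an assumption, since the problem gives only an upper limit. The coefficient values in `example` are hypothetical placeholders for the through-origin fits from Part 1.

```python
import numpy as np
from scipy.optimize import linprog

DECISION_VARS = ["COUPON", "HI", "DISTANCE"]


def max_fare_lp(coefs):
    """coefs maps response name -> {predictor: coefficient} from through-origin fits."""
    def row(response):
        return [coefs[response][v] for v in DECISION_VARS]

    c = -np.array(row("FARE"))  # linprog minimizes, so negate to maximize FARE
    A_ub = np.array([
        row("PAX"),                       # PAX <= 20000
        row("S_INCOME"),                  # S_INCOME <= 30000
        [-a for a in row("E_INCOME")],    # E_INCOME >= 30000  ->  -E_INCOME <= -30000
    ])
    b_ub = np.array([20000.0, 30000.0, -30000.0])
    # COUPON lower bound of 0 is an assumption; the HI and DISTANCE limits are inclusive
    bounds = [(0, 1.5), (4000, 8000), (500, 1000)]
    return linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")


# Hypothetical coefficients, for illustration only
example = {
    "FARE": {"COUPON": 20.0, "HI": 0.01, "DISTANCE": 0.08},
    "PAX": {"COUPON": 2000.0, "HI": 1.5, "DISTANCE": 8.0},
    "S_INCOME": {"COUPON": 10000.0, "HI": 1.0, "DISTANCE": 5.0},
    "E_INCOME": {"COUPON": 12000.0, "HI": 1.5, "DISTANCE": 6.0},
}
res = max_fare_lp(example)
if res.success:
    for name, value in zip(DECISION_VARS, res.x):
        print(f"{name} = {value:.4g}")
    print(f"max predicted FARE = {-res.fun:.2f}")
else:
    print("LP not solved:", res.message)
```

Checking `res.success` matters here because the E_INCOME lower bound can make the program infeasible for some fitted coefficients.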