Regression analysis


Sheet4

mage meduc monpre npvis fage feduc omaps fmaps cigs drink male mwhte mblck moth fwhte fblck foth bwght

695262472391010010697 6812310611146251111001001290 711236461227211210100101490 591618481678211000010011720 481246391229171301001001956 67114840849161401001001984 5412212461299171210100102050 711447511198151301001002068 56121953148914911001002148 581221261169913600100102180 6011785817991001001002266 421621248169922900010012310 2912311321399131001001002359 301246431299151101001002438 341621239169923900010012490 4317212391691014710010012500 331221636168913600010012523 5316225381689161200010012523 421421236169918700010012530 38133313316922601001002544 341721039179914500010012580 4014112361499161010100102580 441621244169913910010012590 401228401289191011001002608 511631137149916900010012630 2914115371581013911001002633 3116115341299111010101002637 301331039123623611001002637 421321042112518801001002639 331621042169917810101002658 4716110391689201010010012700 2811310351191016701001002721 371221028109911801001002722 3714310371291019811001002770 32122103912999501001002770 24121102612896810010012778 3212210291699161000010012780 231431436138919800101002790 331231238129914710100102790 471211343118912401001002799 3816212391799101010010012807 501611251169918810010012810 58162854179911700010012839 5211255812896500100102850 491021047123711810100102855 6412135491091010401001002890 5213376811299711001002899 48122866129912700100102900 471232062169924610010012905 50143846179919410010012920 571211155179914610010012920 621431254149916711001002939 70143959168981000010012948 61161155316995801001002955 611721273177817900010012980 61172134916994400010012990 421221662128915401001003000 6411835811992610100103000 51142206716992900010013005 341711244178912410010013005 421421330128916810100103010 461231250149917510100103030 381631039168914910010013030 4414120371291014610100103033 371621038169914510010013050 38114336128917401001003090 261611224166921400010013090 4014312451491017400100103090
321611236128915500100103104 29152123816996600100103130 43161124112994700100103147 42161935179917700010013147 44123946129911910100103150 4715239179919710010013170 421371126992410101003170 311611235169913410010013180 301611239169925810010013200 3612393214999810100103200 491723029179917410100103203 34172143817895800010013218 481621235169915400010013240 3911310317881511001003254 3812492912897600100103255 3916312321691015300010013300 33122153612991911001003300 441111226129910410100103310 381411241148911400100103317 25161172817995610010013320 39142123812898500100103325 40142123712996610100103330 451331249129911710100103345 361429331291013801001003359 331431225129912710100103370 481221246129920400100103374 3617284316892610100103389 341421243148914310011003430 301611240169910310011003440 29142730118912511001003449 4812384112998510100103455 511768388917600100013459 481621342169913210010013459 291221043125921200100103490 501611136178918400010013500 411421929179914510100103515 41162113017995801001003518 411621241169915500010013530 301011232129911410011003590 3514310351791013410010013600 311721242178921400011003600 281211346108912500100103600 461141123128913400010103600 38134177911710010013610 29162123416881801001003619 261621635168917711001003620 431936895810010013620 421327481710911611001003621 2912115321391017411001003629 30171154314998710100103629 34164123915995800010013647 4311211391291010811001003650 33132123614899500100103650 531321438149911310100103650 4216263717994510100103652 38168933169918300010013655 3416218391681015201001003657 401728361691013410011003657 44847448283410100103657 40122144012990700100103663 51122123712990400100103680 29125113712899100100103680 311611434179910500010013685 30122134512995410100103700 421421242149910311001003700 33162124216994110010013710 471611239149912400100103710 28152113514995410100103714 37124628128913400100103714 3712273712993610100103720 321611239169914200010013730 241621026169911410100103730 3217212291791013510010013742 
231221236116910401001003742 33102123812898710100103742 47131174316990110100103742 3810393914895511001003742 37154739121099701001003746 321231130129107411001003756 361611437169919400010013770 411421235109916110010013770 40163103617996310100103770 38121123614993610100103775 391221234129910600100103790 41122303412377611001003792 37128540794010100013799 27162123616998010100103810 35172164717999200010013820 3212282512994300100103827 351378301110108601001003834 291511638159105611001003850 42122153812899511001003856 39122103514898500100103860 42121844128915201001003870 391721149178912500010013870 40122124812890210011003884 361236317916200100013900 35135531138107201001003905 281611334168911510100103910 50163102312381110100103912 32121122612996010100103912 491621235169916010010013912 3314311351310913001001003919 3413482616990510100103925 48142133516894111001003940 401311533128912111001003941 38143134114799500100103941 27171123117996200101003950 33162123616990010010013950 34132123812999510011003970 4116192716994010011004040 2912293712996011001004050 25122113312891501001004050 31122103816991100010014050 41152123416997110010014090 421611239128108511001004111 24154143515898010100104139 3812232149913010100104210 42162122616995110011004224 35122123512896210010014259 33162123516891310010014315 49112123916990300100104470 43156113612891110100104536 301251238573210100014610 411421247168910410100104660 33152153813880101001004678 31172643149108410100104791 39172103812890000011004933

Regression Model Development

In this assignment, you are tasked with using the information in our course case to build a predictive model on a continuous response variable (Y-variable). This assignment encompasses feature engineering, model preparation, variable selection, and model development.

A) Deliverable: Jupyter Notebook or .zip file containing your Jupyter Notebook.
Please use the following naming convention: FamilyName_FirstName_A1_Regression_Analysis.ipynb

B) Modeling Criteria and Violation Penalties

Your deliverable needs to meet the following criteria. Failure to meet the coding criteria listed below will result in a reduction in your model points score.

Your grade will be determined by the performance of your final model as follows:

Final Model Points = Final Model R-Square on the Test Set – Modeling Violation Penalties

Criterion 1 – Train-Test Gap
The gap between training and testing scores must be less than or equal to 0.05. In the train-test split, make sure your random_state is set to 219 and your test_size is set to 0.25.

Violation Penalty
If the gap is greater than 0.05, your final model points will be reduced by the amount of the gap that exceeds 0.05. For example, if the train-test gap is 0.06, your final model points will be reduced by 0.01. If your random_state is not set to 219 or your test_size is not set to 0.25, this will be manually adjusted when your deliverable is being graded and your grade will be reduced by 0.025.

Criterion 2 – Response Variable Usage
The response variable cannot be used in any form as an explanatory variable (the Y-variable cannot be used on the X-side). This includes logarithmic versions of the y-variable, and features that were engineered using the y-variable.

Violation Penalty
Both of the following will occur if the response variable was used as an explanatory variable:
· The model will be rescored after this variable has been removed. This will likely result in a major reduction in your final score, so be careful!

Criterion 3 – Model Types
Model types are appropriate for the task at hand and come from statsmodels or scikit-learn (other packages and engines are not permitted).
Permitted Model Types
· OLS Regression (standard linear regression)
· Lasso Regression
· Bayesian Automatic Relevance Determination (ARD)
· K-Nearest Neighbors Regression (KNN)

Note that you are permitted to adjust the optional arguments of the permitted model types.

Violation Penalty
Final models that are not in the list of permitted model types will be discarded and the last appropriate model that ran in your code will be used as your final model. Final model points will be reduced by 0.025.

Criterion 4 – Code is Well-Commented and Runs Without Errors
For this assignment, aim for a minimum of one quality comment for every 10 lines of code.

Violation Penalty
· Not being well-commented will reduce your final model points by 0.025.
· Submitting code with at least one error will reduce your final model points by 0.025.

Criterion 5 – Code Processing Time
Your code must process from beginning to end in 60 seconds or less, based on your computer’s processing speed. There is no need to calculate processing time in your code.

Violation Penalty
· Going over the processing limit will result in a 0.25 reduction in model points.

Criterion 6 – Model Output
Model results are outputted as a dynamic string (i.e., f-string) at the end of your script. This must be the last thing that your Jupyter Notebook outputs. Writing this as markdown or exporting it as an Excel file is not acceptable (it must be a dynamic string). The output table of candidate models is well-formatted and contains the following information:
· Model Type
· Training Score
· Testing Score
· Train-Test Gap
· It is clear which model is your final model (label it accordingly). The final model MUST be labeled in your dynamic string to meet this criterion.

Violation Penalty
Not including all of the above information in a well-formatted dynamic string will result in a 0.25 reduction in model points.
If it is unclear which model was selected as the final model, the last model in the dynamic string will be used as your final model.

Criterion 7 – X-Variable Usage
The original and logarithmic versions of an x-variable may not be used in the same model. This does not include engineered features based on these variables.

Violation Penalty
Using both the original and logarithmic versions of an x-variable will result in the logarithmic version of the x-variable being removed from the model, and the model will be run and rescored again. Your final model points will also be reduced by 0.025.

Criterion 8 – Full Dataset Usage
You are not permitted to remove or modify any observations from the original dataset, with the exception of imputing missing values (you are not permitted to remove observations with missing values). Also, your Jupyter Notebook must be able to be run from the original dataset (no feature engineering or alterations in Excel or other tools are permitted).

Violation Penalty
If the above is violated, your Jupyter Notebook will be rerun from the original dataset and any errors that result from this will be subject to Criterion 4 above. Also, your final model points will be reduced by 0.025.
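
The scoring arithmetic from Criterion 1 and the dynamic-string output from Criterion 6 can be sketched in plain Python. This is an illustration only, not the grader's code. The names gap_penalty, final_model_points, candidates and final_model are hypothetical, the scores in the table are made-up placeholders rather than results from the Sheet4 data, and taking the gap as an absolute difference is an assumption.

```python
# Hypothetical sketch of the scoring rule and the required f-string output.
# All names and scores below are illustrative assumptions, not assignment data.

GAP_LIMIT = 0.05  # Criterion 1: maximum allowed train-test gap


def gap_penalty(train_score, test_score):
    """Return the part of the train-test gap that exceeds 0.05 (0 if none)."""
    gap = abs(train_score - test_score)  # assumption: gap is an absolute difference
    return round(max(0.0, gap - GAP_LIMIT), 4)


def final_model_points(train_score, test_score, other_penalties=0.0):
    """Test-set R-square minus the gap penalty and any other penalties."""
    penalty = gap_penalty(train_score, test_score) + other_penalties
    return round(test_score - penalty, 4)


# Placeholder (model type, training score, testing score) triples.
candidates = [
    ("OLS", 0.72, 0.70),
    ("Lasso", 0.71, 0.70),
    ("ARD", 0.72, 0.69),
    ("KNN", 0.80, 0.74),
]
final_model = "Lasso"  # assumed choice, labeled explicitly in the output

# Build one table row per candidate model, marking the final model.
rows = ""
for name, train, test in candidates:
    label = name + (" (FINAL)" if name == final_model else "")
    rows += f"{label:<16}{train:>10.4f}{test:>10.4f}{abs(train - test):>10.4f}\n"

# Dynamic string printed as the last output of the notebook.
summary = f"""
{'Model Type':<16}{'Train':>10}{'Test':>10}{'Gap':>10}
{'-' * 46}
{rows}"""
print(summary)
```

In a real notebook the placeholder scores would be replaced by each fitted model's score on the training and testing sets, using a split made with test_size = 0.25 and random_state = 219.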
Nov 18, 2021