Please do your work in an R Markdown file.


Assignment 2
MA 347, Spring 2024
Due End of Day, Sep 29, 2024

• Reading
  - Textbook: read selected sections in Chapter 10 (pp. 237-247) and Chapter 5 (skipping all discussions involving costs).
  - Read the application paper on logistic regression analysis by Jeffrey S. Morrison.

• Problems

The problems should be written up neatly and any output clearly labeled; if your grader cannot read them, your grader cannot grade them. The final write-up should be your own. (Good rule of thumb: if a fellow student asks you to explain a problem to them, say yes; if they ask how you approached solving the problem, say no.) Please show R code and relevant R output. If you are using R Markdown, please submit the knitted file (either an HTML page saved as a PDF file or a PDF file).

1. Import “PredResults.csv” into R, and answer the questions below. This dataset includes a small set of predictive model validation results for a classification model, with both actual values and predicted probabilities.
   (a) Produce a confusion matrix for each of the cutoffs 0.25, 0.5, and 0.75.
   (b) Calculate the error rate (misclassification rate), sensitivity, and specificity for the three different cutoffs.
   (c) If the goal is to find the strategy, i.e., the cutoff value, that minimizes the error rate, which cutoff value will you choose?
   (d) Create a lift chart. What is the lift of the 2nd 10% of the data according to the gains table? What does it tell you about the classification model compared to a naive model?

2. The file eBayLogistic.csv contains information on 1972 auctions transacted on eBay.com during May-June 2004. The goal is to use these data to build a model that will classify auctions as competitive or noncompetitive. A competitive auction is defined as an auction with at least one bid placed on the item auctioned. Details of the predictors and response are as follows.

   Variable       Description
   SellerRating   A rating by eBay.
   Duration       Number of days the auction lasted
   ClosePrice     Price the item sold for (in USD)
   currency       A categorical variable indicating the type of currency used in a transaction
   Competitive    Whether or not the auction is competitive: 1 = competitive (yes), 0 = noncompetitive (no)

   (a) Randomly split the dataset into training and validation sets. Use 60% of the data in your training set.
   (b) Use the first four variables in the table above as predictors to fit a logistic regression model for this classification problem.
   (c) Write down the fitted logistic regression model.
   (d) Describe all of the dummy variables used in the logistic regression model.
   (e) Interpret the estimated logistic regression coefficients associated with the predictors Duration and currency.
   (f) Produce predicted probabilities of competitive auctions for the transactions in your validation set.
   (g) Produce ROC curves on both the training and validation data.
   (h) Report the AUCs on both the training and validation data.

3. A data mining routine has been applied to a transaction dataset and has classified 88 records as fraudulent (30 correctly so) and 952 as non-fraudulent (920 correctly so). Construct the confusion matrix and calculate the overall error rate (i.e., the misclassification rate).

4. Let p denote the probability of event A, with 0 ≤ p ≤ 1. Run the following R code and show the plot you obtain.

       p <- seq(0, 1, by = 0.01)
       y <- p / (1 - p)   # odds of event A; infinite at p = 1
       plot(p, y, type = "l", ylab = "odds", lwd = 4)

   Question: what is the relationship between the odds and the probability of event A? Hint: how do the odds of event A change when p increases? If the odds increase, how does p change?
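As a hint for Problem 1 (a)-(b), the following is a minimal sketch of how a cutoff turns predicted probabilities into a confusion matrix and the three summary rates. The vectors are made-up toy values, not the contents of PredResults.csv, and the helper name `confusion_at` is hypothetical. The sketch also assumes a record is classified as 1 when its probability is at or above the cutoff; use your course's convention if it differs.

```r
# Hypothetical toy data; NOT the values in PredResults.csv.
actual <- c(1, 1, 1, 0, 0, 0, 1, 0)
prob   <- c(0.90, 0.60, 0.30, 0.80, 0.40, 0.20, 0.70, 0.10)

# Hypothetical helper: classify at a cutoff and summarise the result.
confusion_at <- function(actual, prob, cutoff) {
  # Assumption: predict class 1 when the probability is >= cutoff.
  predicted <- as.integer(prob >= cutoff)
  # Fix the factor levels so the table is always 2 x 2,
  # even if one class is never predicted at this cutoff.
  tab <- table(Actual    = factor(actual,    levels = c(0, 1)),
               Predicted = factor(predicted, levels = c(0, 1)))
  tp <- tab["1", "1"]; tn <- tab["0", "0"]
  fp <- tab["0", "1"]; fn <- tab["1", "0"]
  list(table       = tab,
       error_rate  = (fp + fn) / sum(tab),
       sensitivity = tp / (tp + fn),
       specificity = tn / (tn + fp))
}

res <- confusion_at(actual, prob, cutoff = 0.5)
print(res$table)
print(unlist(res[c("error_rate", "sensitivity", "specificity")]))
```

Calling the helper once per cutoff (for example with `lapply(c(0.25, 0.5, 0.75), ...)`) gives the side-by-side comparison that part (c) asks about.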
Sep 22, 2024