NO REFERENCE NEEDED. YOU NEED XL FRONTLINE ANALYTICS SOLVER AND DATA / DATA ANALYSIS TO SOLVE THESE PROBLEMS
1- ISYS3374 Business Analytics – Final Exam

Note: You need to submit your answers in a Word document. You need to transfer the results from the Excel file into the Word document. In addition, you must submit your Excel files, but note that only the Word document will be marked. If you think there is an issue with any question, please make your assumptions and clearly explain them in your report.

SECTION A: Discussion Questions

1- Explain the concept of imbalanced data in classification techniques and how it should be treated when developing classification models.
2- Explain the concept of overfitting. Explain how overfitting can be avoided.
3- Give two examples of how logistic regression can be used. You only need to explain the problem. One example is a bank that uses logistic regression to classify its new customers for loan approval: the bank wants to identify customers who are more likely to default on their loans. Explain why you cannot use linear regression in your examples.

SECTION B: QUANTITATIVE QUESTIONS

1- There are 500 client records in the first sheet of the file Toy-Info, for clients who have purchased special toys from an e-business website. Each record includes data on the types of product purchased (between 1 and 5), purchase amount ($), age, gender, marital status, whether the client has a membership, and whether the client has a discount card. A business analyst has applied the k-means clustering method to all seven variables. The analyst increased the number of clusters in order to recommend a proper value of k. The test results for k=5 and k=6, shown in the following sheets of the file, revealed the best k to be k=6.
a) Explain how the analyst found that k=6 is a proper number of clusters. Refer to the relevant sheet name, table name and the values you compared.
b) Describe all 6 clusters by their average characteristics.

2- A company provides maintenance service for washing machines in Victoria.
The analyst of the company aims to estimate the repair time and the service cost for each maintenance job. He treats repair time as the dependent variable, which can be related to the number of months since the last service, the type of repair and the repairperson. The following table reports 10 sample maintenance jobs.

Repair time (hours)   Months since last service   Type of repair   Repairperson
2.1                   2                           Mechanical       John
2.8                   2                           Electrical       John
1.6                   3                           Mechanical       John
3.9                   4                           Electrical       Bob
2.5                   6                           Mechanical       John
3.1                   6                           Electrical       John
4.5                   7                           Electrical       Bob
4.7                   8                           Electrical       Bob
3.8                   9                           Mechanical       Bob
4.6                   9                           Electrical       Bob

a) Create an estimated simple regression model for this data where months since last service is the independent variable. What does the model indicate about the relationship between months since last service and repair time? How strong is the relationship? Report the accuracy measures and the equation.
b) Calculate the residual for each repair in the table and interpret the meaning of positive and negative residuals in this analysis. Which type of repair (electrical or mechanical) is more desirable, and which repairperson (John or Bob) has worked more efficiently?
c) Create a scatter chart of repair time against months since last service (on the x axis) in which the points representing electrical and mechanical repairs are shown in different colors. Create a similar chart of months since last service and repair time in which the points representing repairs by John and Bob are shown in different colors. Do these charts suggest any potential modifications to your simple linear regression model? Why?

3- The following data are the results of a 4-year study conducted to assess how age, weight, and gender influence the risk of diabetes. Risk is interpreted as the probability (times 100) that the patient will have diabetes over the next 4-year period.
a) Develop a multiple regression model that relates the risk of diabetes to the person’s age, weight and gender.
Present the regression formula as a mathematical equation. Interpret the coefficients of the regression and comment on the strength of the regression.
b) Develop an estimated multiple regression model that relates the risk of diabetes to the person’s age, weight, gender and lifestyle. Present the regression formula as a mathematical equation. Interpret the coefficients of the regression and comment on the strength of the regression.
c) What is the risk percentage of diabetes over the next 4 years for a 55-year-old man who weighs 70 kg and lives in a big city? Use both models to estimate the risk and compare the results.

Age   Weight (kg)   Gender   Lifestyle    Risk (%)
53    78            Female   Small town   40
24    77            Male     Big city     23
77    83            Female   Country      67
88    89            Female   Small town   71
56    65            Male     Big city     45
71    82            Female   Country      54
53    79            Female   Small town   48
70    66            Male     Small town   49
80    80            Female   Big city     65
78    67            Male     Big city     59
71    69            Male     Big city     56
70    78            Female   Small town   59
67    75            Male     Country      46
77    95            Female   Big city     64
60    57            Male     Country      39
82    100           Female   Big city     73
66    85            Male     Small town   63
80    96            Male     Big city     87
62    83            Female   Country      52
59    93            Male     Big city     61

4- An internet provider company in Australia is interested in identifying why some individuals are still undecided about buying the company's new NBN service. The file NBN-service contains data on its first sheet for a sample of customers, with variables that track the decision outcome. A business analyst has created a standard partition of the data with all tracked variables, with 40% of observations in the training set, 35% in the validation set, and 25% in the test set. The analyst applied two logistic regression models to classify undecided customers of the company. The resulting output of the Solver software for both models is included in the following sheets.
a) Determine the selected input variables in each model and explain why the analyst has changed one of the input variables.
b) Write the logistic regression equation obtained for the first model, shown in worksheet “4-1-1”, and predict whether a customer with a Contract duration of 16 months, Bonus data of 63 GB and Usage of 237 GB will decide to buy the new service. Explain how you found the prediction.
c) Find the class 1 and class 0 errors based on the sheet “4-1-2” and compare your results with the confusion matrix. Explain which of these errors is more undesirable in this model.
d) Compare the accuracy of the second model (shown in worksheet “4-2-1”) with that of the first model. Which one do you recommend?

5- Paul has a new job in project management. He plans to invest the same amount of $15,000 into a retirement account at the end of every year for the next 30 years. Suppose that the annual return is 6%. Then:
a) Create a data table that shows Paul the balance of the retirement account for various levels of annual investment and return.
b) If Paul aims to have $1,500,000 at the end of the 30th year, how much money should he invest annually?

6- FSUB is a company that intends to introduce its products by advertising them on 3 relevant websites. The marketing manager of the company has named these websites A, B and C. Viewer estimates, cost per advertisement, and maximum usage