ISYS3374 Business Analytics – Final Exam
Note: You need to submit your answers in a Word document. You need to transfer the results from the Excel file into the Word document. In addition, you must...

Answered Same Day, May 30, 2020


Pooja answered on Jun 01 2020
Section A
1)
Approaches to handling imbalanced datasets
1) Data-level approach: resampling techniques
· Random Under-Sampling
· Random Over-Sampling
· Cluster-Based Over-Sampling
· Informed Over-Sampling: Synthetic Minority Over-sampling Technique (SMOTE)
· Modified Synthetic Minority Over-sampling Technique (MSMOTE)
2) Algorithmic ensemble techniques
· Bagging-Based
· Boosting-Based
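The two simplest resampling techniques in the list can be sketched in a few lines of Python with NumPy. This is a minimal sketch, assuming a binary problem with a numeric feature matrix `X` and a label vector `y`; the function names `random_over_sample` and `random_under_sample` and the toy data are hypothetical.

```python
import numpy as np

def random_over_sample(X, y, minority_label, rng):
    """Duplicate randomly chosen minority rows (with replacement) until both classes match."""
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    n_extra = len(majority_idx) - len(minority_idx)
    extra = rng.choice(minority_idx, size=n_extra, replace=True)  # exact replicas
    keep = np.concatenate([majority_idx, minority_idx, extra])
    return X[keep], y[keep]

def random_under_sample(X, y, minority_label, rng):
    """Discard randomly chosen majority rows (without replacement) until both classes match."""
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    kept_majority = rng.choice(majority_idx, size=len(minority_idx), replace=False)
    keep = np.concatenate([kept_majority, minority_idx])
    return X[keep], y[keep]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # hypothetical toy features
y = np.array([1] * 10 + [0] * 90)        # 10% minority class (label 1)

X_over, y_over = random_over_sample(X, y, minority_label=1, rng=rng)
X_under, y_under = random_under_sample(X, y, minority_label=1, rng=rng)
print(np.bincount(y_over))   # [90 90]
print(np.bincount(y_under))  # [10 10]
```

Over-sampling keeps all the information but repeats minority rows exactly, which can encourage over-fitting; under-sampling avoids duplicates but throws away majority data.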
When faced with an imbalanced dataset, there is no one-stop solution for improving the performance of a prediction model. It is usually necessary to try several methods to find the sampling technique best suited to the dataset. In many cases, synthetic techniques such as SMOTE and MSMOTE outperform conventional random over-sampling and under-sampling.
For better results, synthetic sampling methods such as SMOTE and MSMOTE can be combined with advanced boosting methods such as gradient boosting and XGBoost.
SMOTEBagging is an advanced bagging technique commonly used to counter the imbalanced-dataset problem. It differs from conventional bagging in how each bag (bootstrap sample) is created: in each iteration, the positive (minority) instances are generated by the SMOTE algorithm at a SMOTE resampling rate set for that iteration, while the negative (majority) instances are bootstrapped.
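A simplified sketch of how SMOTEBagging could build its bags, assuming numeric features and a binary problem. Each bag bootstraps the negative (majority) class and fills an equally sized positive (minority) half with a mix of bootstrap copies and SMOTE samples. The per-bag resampling-rate schedule (10%, 20%, ..., 100%) and the helper names `smote` and `smote_bags` are assumptions for illustration; the published algorithm may split the minority half differently.

```python
import numpy as np

def smote(X_min, n_new, k, rng):
    """Basic SMOTE: interpolate between minority points and their k nearest minority neighbours."""
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))
    return X_min[base] + lam * (X_min[chosen] - X_min[base])

def smote_bags(X_maj, X_min, n_bags, k, rng):
    """Balanced bags: bootstrapped majority plus a minority half of bootstrap and SMOTE samples."""
    n_maj = len(X_maj)
    bags = []
    for i in range(n_bags):
        rate = (i % 10 + 1) / 10                                  # assumed schedule: 10%, ..., 100%
        maj = X_maj[rng.integers(0, n_maj, size=n_maj)]           # bootstrap of the majority class
        n_boot = int(round(rate * n_maj))
        boot = X_min[rng.integers(0, len(X_min), size=n_boot)]    # bootstrapped minority part
        syn = smote(X_min, n_maj - n_boot, k, rng)                # SMOTE fills the rest
        X_bag = np.vstack([maj, boot, syn])
        y_bag = np.concatenate([np.zeros(n_maj, dtype=int), np.ones(n_maj, dtype=int)])  # 1 = minority
        bags.append((X_bag, y_bag))
    return bags

rng = np.random.default_rng(1)
X_maj = rng.normal(0.0, 1.0, size=(90, 2))   # hypothetical negative (majority) class
X_min = rng.normal(3.0, 1.0, size=(10, 2))   # hypothetical positive (minority) class
bags = smote_bags(X_maj, X_min, n_bags=5, k=3, rng=rng)
for X_bag, y_bag in bags:
    print(X_bag.shape, np.bincount(y_bag))   # (180, 2) [90 90] for every bag
```

Each bag would then train one base classifier, and the ensemble would combine their predictions by voting; that step is left out here.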
The most effective technique depends on the characteristics of the imbalanced dataset, so models should be compared using evaluation metrics suited to imbalanced data, such as precision, recall, F1-score and the area under the ROC curve, rather than accuracy alone.
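Why the choice of evaluation metric matters can be shown with a hypothetical 5% minority class and a "model" that always predicts the majority class. The helper `imbalance_report` is illustrative, and it reports 0.0 whenever a metric's denominator is zero (one common convention).

```python
def imbalance_report(y_true, y_pred, positive=1):
    """Accuracy plus minority-focused metrics for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

y_true = [1] * 5 + [0] * 95      # hypothetical: 5% minority class
always_majority = [0] * 100      # never predicts the minority class
print(imbalance_report(y_true, always_majority))
# {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

Accuracy looks strong at 0.95 while recall on the minority class is 0, which is why minority-focused metrics belong in the model comparison.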
2)
Informed Over-Sampling: Synthetic Minority Over-sampling Technique (SMOTE)
SMOTE is an informed over-sampling technique. It is used to avoid the over-fitting that occurs when exact replicas of minority instances are added to the main dataset. A subset of the minority class is taken, and new, similar synthetic instances are created from it: each synthetic instance is placed at a random point on the line segment between a minority instance and one of its nearest minority-class neighbours. These synthetic instances are added to the original dataset, and the new dataset is used to train the classification models.
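The interpolation step of SMOTE can be written as $x_{\text{new}} = x_i + \lambda\,(x_{nn} - x_i)$ with $\lambda$ drawn uniformly from $[0, 1)$, where $x_{nn}$ is one of the $k$ nearest minority neighbours of $x_i$. The sketch below implements this basic version in NumPy. It assumes numeric features and uses brute-force distances (fine for small minority classes); the function name `smote`, the choice `k=5` and the toy data are illustrative assumptions rather than any library's API.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority points (basic SMOTE, numeric features only)."""
    rng = np.random.default_rng() if rng is None else rng
    # Pairwise Euclidean distances between all minority points.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                  # exclude each point from its own neighbours
    neighbours = np.argsort(dist, axis=1)[:, :k]    # k nearest minority neighbours per point
    base = rng.integers(0, len(X_min), size=n_new)              # random seed points x_i
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]   # one random neighbour x_nn each
    lam = rng.random((n_new, 1))                                # lambda in [0, 1)
    return X_min[base] + lam * (X_min[chosen] - X_min[base])

rng = np.random.default_rng(7)
X_min = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(20, 2))  # hypothetical minority class
X_syn = smote(X_min, n_new=60, k=5, rng=rng)
X_min_augmented = np.vstack([X_min, X_syn])  # original + synthetic minority rows
print(X_min_augmented.shape)  # (80, 2)
```

Because every synthetic point lies between two real minority points, the new rows stay inside the region the minority class already occupies instead of repeating existing rows exactly.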
3)
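Logistic regression is used when the dependent variable is categorical, most commonly binary (for example, default vs. no default). It models the log-odds of the outcome as a linear function of the predictors.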
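Example 1: I want to predict the probability of default for a credit card company on the basis of income and age. In this case, logistic regression is applied. The expected regression equation is of the form:
$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1\,\text{Income} + \beta_2\,\text{Age}$$
where $p$ is the probability of default.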
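Example 1 can be sketched by simulating data and fitting the model by maximum likelihood with Newton-Raphson (iteratively reweighted least squares). Everything here is hypothetical: the standardized income and age values, the "true" coefficients used to simulate defaults, and the function name `fit_logistic`. In practice the coefficients would be estimated from real customer data with a statistics package.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # predicted probabilities
        w = p * (1.0 - p)                         # Bernoulli variances (IRLS weights)
        grad = X.T @ (y - p)                      # gradient of the log-likelihood
        hess = X.T @ (X * w[:, None])             # information matrix
        beta = beta + np.linalg.solve(hess, grad) # Newton step
    return beta

rng = np.random.default_rng(42)
n = 5000
income = rng.normal(size=n)                 # hypothetical standardized income
age = rng.normal(size=n)                    # hypothetical standardized age
true_beta = np.array([-2.0, -1.0, 0.5])     # hypothetical beta_0, beta_1 (income), beta_2 (age)
X = np.column_stack([np.ones(n), income, age])
p_true = 1.0 / (1.0 + np.exp(-X @ true_beta))
default = (rng.random(n) < p_true).astype(float)  # 1 = default, 0 = no default

beta_hat = fit_logistic(X, default)
print(beta_hat)  # estimates should land near [-2.0, -1.0, 0.5]
```

A negative income coefficient means higher income lowers the odds of default; $e^{\beta_1}$ is the multiplicative change in those odds per one-unit increase in (standardized) income.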
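Example 2: I want to predict the probability of electricity theft for an electricity department on the basis of the number of consumption units and the grade of the area (categorized as either high or low). In this case, logistic regression is an appropriate method of analysis. The logistic regression equation is of the form:
$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1\,\text{Units} + \beta_2\,\text{Grade}$$
where $p$ is the probability of theft and Grade is a dummy variable (e.g., 1 = high, 0 = low).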
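For Example 2, the high/low area grade enters the model as a 0/1 dummy variable. The sketch below computes predicted theft probabilities from coefficients that are purely hypothetical (not estimated from any data) and checks the standard interpretation that $e^{\beta_2}$ is the odds ratio for a high-grade versus a low-grade area at the same consumption level. The function name `theft_probability` is illustrative.

```python
import math

def theft_probability(units, grade_high, b0, b1, b2):
    """P(theft) from the logit model b0 + b1*units + b2*grade, with grade = 1 (high) or 0 (low)."""
    grade = 1 if grade_high else 0      # dummy coding of the two-level area grade
    z = b0 + b1 * units + b2 * grade    # log-odds of theft
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, for illustration only.
b0, b1, b2 = -3.0, 0.01, -0.8

p_high = theft_probability(200, True, b0, b1, b2)
p_low = theft_probability(200, False, b0, b1, b2)
odds_ratio = (p_high / (1 - p_high)) / (p_low / (1 - p_low))
print(round(p_high, 3), round(p_low, 3), round(odds_ratio, 3))  # odds ratio equals exp(b2)
```

With these made-up values, a high-grade area has lower odds of theft than a low-grade area using the same number of units, because $\beta_2 < 0$.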
 
Section B
1)
a)
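Considering the table of original coordinates in sheet 1-a-1-1, there are 5 clusters with 122, 100, 61, 105 and 112 observations respectively (500 observations in total). The observations are therefore not evenly distributed across the clusters: the largest cluster (122) has twice as many observations as the smallest (61).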
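The imbalance in sheet 1-a-1-1 can be quantified by comparing each cluster with an equal split of 500 / 5 = 100 observations. The counts are those listed for that sheet; numbering the clusters 1 to 5 in the listed order is an assumption.

```python
# Cluster sizes listed for sheet 1-a-1-1 (labels assumed to follow the listed order).
sizes = [122, 100, 61, 105, 112]
total = sum(sizes)                  # 500 observations
equal_share = total / len(sizes)    # 100 per cluster if perfectly balanced

for label, n in enumerate(sizes, start=1):
    print(f"Cluster {label}: {n} obs ({n / total:.1%}), {n - equal_share:+.0f} vs. an equal split")

print("largest / smallest =", max(sizes) / min(sizes))  # 122 / 61 = 2.0
```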
In sheet 1-a-2-1, the table of original coordinates in the data summary section shows approximately 80 observations in each cluster. This indicates that all points are nearly...