LEARNING OUTCOMES 1. Evaluate the various data types, data storage systems and associated techniques for indexing and retrieving data. 2. Design feature engineering techniques to transform...

Answered Same DayJan 18, 2021

Sakshi answered on Mar 12 2021
Predictive Analytics of Automobile Insurance
Abstract
    Automobile insurance has been, and is likely to remain, a competitive business over the coming decades. Unpredictable natural disasters complicate people's lives, and sensible people ease that risk with decisions such as insuring their possessions. This study analyses customers' purchasing behaviour for insurance policies across the attributes provided, and aims to predict whether a given customer will buy car insurance. Descriptive statistical analysis combined with data mining techniques is used to analyse the data. The analysis proceeds in 3 phases comprising 6 stages. Data observation and data cleaning are performed in SAS Studio. Exploratory analysis is done using Tableau and SAS. Data transformation and feature engineering are done using data mining and statistical methods. The K-nearest neighbour algorithm is used to predict outcomes for the test data. Hypotheses are formulated and their results generated using Hive.
Table of Contents
Abstract
Introduction
Related work
Phase 1: [using SAS Studio]
Data observation
Data cleaning
Phase 2: [using SAS and Tableau]
EDA
Data transformation
Feature engineering
Phase 3: [SAS and Cloudera Hadoop & Hive]
Prediction & classification
Hypothesis
Discussion
i.    Initial data exploration
Data pre-processing/cleaning
Prediction/classification
Loading on HDFS/Hive
Hypothesis with Hive SQL
Conclusion
References
Introduction
    Automobile insurance is one of the leading lines of business in the insurance sector. Identifying the best customers and retaining good ones is a challenging task for marketers. Predictive analysis is one of the best ways to understand customer behaviour from historical data and to predict the behaviour of future customers.
    The goal of this study is a comprehensive analysis of customers' purchasing behaviour for automobile insurance, and the prediction of buying behaviour for unseen customers. A car insurance data set is provided to analyse whether a customer will purchase car insurance. The data set comes with several interesting attributes and has been split into train and test sets of 4000 and 1000 observations respectively.
    First, the data is loaded and explored in SAS Studio to establish the number of attributes and their data types. The data set is then examined for inconsistencies. Messy data is cleaned before further processing, because noisy data cannot produce reliable results. The distribution of the data is inspected, and data transformation is done to tame the data and reduce the influence of outliers.
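    The inspection and transformation step can be illustrated with a minimal Python sketch. The study itself uses SAS Studio and does not say which transformation was applied, so the log transform, Tukey's 1.5 x IQR rule and the `balances` values below are assumptions chosen only for illustration.

```python
import math
import statistics


def log_transform(values):
    """Apply log(1 + x) to non-negative values to compress a long right tail."""
    return [math.log1p(v) for v in values]


def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


# Hypothetical numeric attribute with one extreme value (not from the data set).
balances = [120, 150, 180, 200, 220, 260, 300, 9000]

raw_outliers = iqr_outliers(balances)
transformed = log_transform(balances)

print("flagged in raw data:", raw_outliers)
print("raw max / median:", max(balances) / statistics.median(balances))
print("log max / median:", max(transformed) / statistics.median(transformed))
```

    The transform does not delete the extreme observation; it only narrows the gap between that observation and the rest, which matters for distance-based methods such as nearest-neighbour classification.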
    Feature engineering and feature extraction are also carried out on the data set to identify the variables that have the most impact on the outcome. The binary attribute ‘car insurance’ is the target variable in this study. Exploratory data analysis (EDA), done using SAS and Tableau, examines the relationship between each attribute and the target. It reveals several interesting correlations between the attributes. Hidden patterns are uncovered and the important features are identified through this data visualization.
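    One simple way to screen attributes against a binary target is to rank them by the absolute value of their Pearson correlation with it. The sketch below is an assumption about how such a screen could look, not the procedure used in the study; the attribute names `call_duration` and `prior_contacts` and all values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical attributes and a hypothetical 0/1 purchase target.
features = {
    "call_duration": [30, 45, 60, 200, 250, 300],
    "prior_contacts": [1, 3, 2, 2, 1, 3],
}
bought = [0, 0, 0, 1, 1, 1]

# Rank attributes from strongest to weakest linear association with the target.
ranked = sorted(
    ((name, pearson(values, bought)) for name, values in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for name, r in ranked:
    print(f"{name}: r = {r:.3f}")
```

    Correlation only captures linear association, so an attribute that ranks low here may still matter to a non-linear model.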
    The training portion of the split data is used to train a k-nearest neighbour (KNN) classification algorithm, and the remaining portion is used for testing, where the outcomes are predicted. The cleaned and transformed data is used to train and test the model, and the model is evaluated for accuracy with a confusion matrix. The data set is then loaded into the Hadoop Distributed File System (HDFS) of the Cloudera distribution, where the data is converted into a table using queries. Simple and complex hypotheses are formed and their results scrutinized.
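    The train, predict and evaluate cycle described above can be sketched in plain Python as follows. This is a minimal illustration, not the SAS implementation used in the study: the Euclidean distance, the choice of k = 3, and the two scaled features per row are assumptions, and the data points are made up.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training rows."""
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for a binary 0/1 target."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(cm):
    """Share of correct predictions: (TP + TN) / all cases."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())


# Hypothetical scaled features; 1 = bought car insurance, 0 = did not.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.8, 0.9), (0.9, 0.8), (0.85, 0.95)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(0.12, 0.18), (0.88, 0.9), (0.2, 0.3), (0.7, 0.8)]
test_y = [0, 1, 0, 1]

predictions = [knn_predict(train_X, train_y, row, k=3) for row in test_X]
cm = confusion_matrix(test_y, predictions)
print(cm, "accuracy:", accuracy(cm))
```

    An odd k avoids tied votes for a binary target, and features should be on comparable scales before distances are computed, which is one reason the transformation step comes first.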
    The data is analysed using statistical, data mining and exploratory analysis techniques, the outcomes for the test data are predicted, and the prediction accuracy is calculated. The findings from the hypotheses are useful for designing marketing strategies, identifying the best customers and retaining them for their revenue.
Related work
    The study “Predictive Analysis of Auto Insurance Purchasing Behavior” by Roosevelt C. Mosley proposes two major types of analysis. One is the likelihood of customers purchasing the quote; the other is the likelihood of customers purchasing the policy. The outcomes helped to identify the potential customers who buy maximum policy coverage and to design marketing strategies. The work analyses internet automobile insurance purchasing using a data set from ComScore Inc., drawn from 1 million US internet users who gave permission for their online behaviour to be tracked. Decision tree, neural network and linear regression are the predictive models used to predict the binary target variable. An ensemble method is created to calculate the accuracy. Univariate and multivariate exploratory analysis was also done using SAS. The study finds that first drivers aged 25 to 45 have the maximum likelihood of purchasing a policy, while those below 25 and above 45 have the minimum likelihood. Customers with higher education also have the maximum likelihood of purchasing a policy (Roosevelt C & Mosley J 2011).
    Kittipong Trongsawad and Jongsawas Chongwatpol of Bangkok analysed “Revaluating Policy and Claims Analytics”. The study examines the difference in purchasing and claiming behaviour between corporate fleet and non-fleet customers. It analyses how companies retain their best customers, and covers policy claims and claim losses. The SEMMA methodology is used to analyse the two categories of data, fleet and non-fleet, with 25 variables. The target variables are claim_occured and sum_insured. Three main data mining prediction methods, linear regression, decision tree and neural networks, are used to predict the targets. False negatives, prediction accuracy and misclassification rate are the three main criteria used to select the best-fitting model (Kittipong Trongsawad & Jongsawas Chongwatpo 2014).
    Richard McCarthy and Wendy Ceccucci analysed “A Predictive Analytics Case to Analyse Automobile Insurance Fraud”. Data extracted from Alpha Insurance Company is used to identify fraudulent claims among customers. In this study, historical data is considered to have good potential for analysing fraud claims. The data set has 22 variables and 4008 claims to be analysed. After cleansing and pre-processing, the redundant variables are removed from the data set. Hypotheses are defined and statistical analysis is done to test them. Five data mining techniques, regression, neural network, gradient boosting, decision tree and ensemble, were considered as models, and the best optimized model is selected based on an accuracy measure. The accuracy measures used in this study are average squared error, misclassification rate and receiver operating characteristic (ROC) (Richard McCarthy & Wendy Ceccucci 2018).
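    Two of the accuracy measures named above, average squared error and misclassification rate, are easy to compute by hand, and selecting the best model then amounts to taking the minimum. The sketch below assumes each model outputs a predicted probability for the positive class and uses a 0.5 cut-off; the model names and probabilities are hypothetical and do not come from the cited study.

```python
def average_squared_error(actual, probabilities):
    """Mean of squared differences between 0/1 outcomes and predicted probabilities."""
    return sum((a - p) ** 2 for a, p in zip(actual, probabilities)) / len(actual)


def misclassification_rate(actual, probabilities, threshold=0.5):
    """Share of cases whose thresholded prediction differs from the outcome."""
    predicted = [1 if p >= threshold else 0 for p in probabilities]
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


# Hypothetical outcomes and predicted probabilities from two candidate models.
actual = [1, 0, 1, 0]
model_outputs = {
    "model_a": [0.9, 0.2, 0.4, 0.6],
    "model_b": [0.8, 0.1, 0.7, 0.3],
}

scores = {
    name: (average_squared_error(actual, probs), misclassification_rate(actual, probs))
    for name, probs in model_outputs.items()
}

# Pick the model with the lowest average squared error.
best = min(scores, key=lambda name: scores[name][0])

for name, (ase, mis) in scores.items():
    print(f"{name}: ASE = {ase:.4f}, misclassification rate = {mis:.2f}")
print("selected:", best)
```

    Average squared error rewards well-calibrated probabilities, whereas misclassification rate depends only on which side of the threshold each prediction falls, so the two measures can rank models differently.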
    Dr. Joseph D. Petruccelli explains the study “Data Mining for Car Insurance Claims Prediction”. The aim of the study is to analyse the performance of statistical and data mining methods in predicting insurance claims. The data set used for this analysis was downloaded from the Kaggle website. The data contains insurance claims of the Allstate Insurance Company from 2005 to 2011. Data from 2005 to 2007 is used to train the model, and data from 2008 to 2011 to test the best model. The 2005-2007 data consists of 34 variables and 13184290 cases. The main challenges the data poses to the analyst are weak correlation between claims and predictors, high dimensionality, missing and noisy data, and, last but not least, the fact that it is big data. The statistical models used to train and test the data are Tweedie’s compound gamma-Poisson model, logistic regression, response averaging, principal component analysis (PCA), and decision trees...