Files have attachedAssignment-1 MIS771 Descriptive Analytics and Visualisations Page 1 of 10 MIS771...

Question

Files have attachedAssignment-1 MIS771 Descriptive Analytics and Visualisations Page 1 of 10  MIS771 Descriptive Analytics and Visualisation DEPARTMENT OF INFORMATION SYSTEMS AND BUSINESS ANALYTICS  DEAKIN BUSINESS SCHOOL  FACULTY OF BUSINESS AND LAW, DEAKIN UNIVERSITY  Assignment Two  Background  This is an individual assignment. You need to analyse a given dataset, and then interpret and draw  conclusions from your analysis. You then need to convey your findings in a written report to an expert in  Business Analytics.   Percentage of the final grade 35%  The Due Date and Time 8 pm Thursday 17th September 2020  Submission instructions  The assignment must be submitted by the due date, electronically in CloudDeakin. When submitting  electronically, you must check that you have submitted the work correctly by following the instructions  provided in CloudDeakin. Please note that we will NOT accept any paper or email copies, or part of the  assignment submitted after the due date.  Information for students seeking an extension BEFORE the due date  If you wish to seek an extension for this assignment before the due date, you need to apply directly to  the Unit Chair by completing the Assignment and Online Test Extension Application Form before Friday  5 pm 17th Thursday September 2020. Please make sure you attach all supporting documentation and a  draft of your assignment. The request for extension needs to occur as soon as you become aware that  you will have difficulty in meeting the due date.  Please note: Unit Chairs can only grant extensions of up to two weeks beyond the original due date. If  you require more than two weeks, or have already been provided with an extension by the Unit Chair  and require additional time, you must apply for Special Consideration via StudentConnect within 3  business days of the due date.  Conditions under which an extension will usually be considered include:  • Medical – to cover medical conditions of a severe nature, e.g. hospitalisation, serious injury or chronic illness. Note: temporary minor ailments such as headaches, colds and minor gastric upsets are not serious medical conditions and are unlikely to be accepted. However, serious cases of these may be considered. • Compassionate – e.g. death of a close family member, significant family and relationship problems. • Hardship/Trauma – e.g. sudden loss or gain of employment, severe disruption to domestic arrangements, a victim of crime. Note: misreading the due date, assignment anxiety, or multiple assignments will not be accepted as  grounds for consideration.  https://www.deakin.edu.au/students/faculties/buslaw/student-support/assignment-extensions MIS771 Descriptive Analytics and Visualisations  Page 2 of 10  Information for students seeking an extension AFTER the due date  If the due date has passed; you require more than two weeks extension, or you have already been  provided with an extension and require additional time, you must apply for Special Consideration via  StudentConnect. Please be aware that applications are governed by University procedures and must be  submitted within three business days of the due date or extension due date.  Please be aware that in most instances the maximum amount of time that can be granted for an  assignment extension is three weeks after the due date, as Unit Chairs are required to have all  assignment submitted before results/feedback can be released back to students.  Penalties for late submission  The following marking penalties will apply if you submit an assessment task after the due date without  an approved extension:  • 5% will be deducted from available marks for each day, or part thereof, up to five days.  • Work that is submitted more than five days after the due date will not be marked; you will  receive 0% for the task.  Note: 'Day' means calendar day.  The Unit Chair may refuse to accept a late submission where it is unreasonable or impracticable to  assess the task after the due date.  Additional information: For advice regarding academic misconduct, special consideration, extensions,  and assessment feedback, please refer to the document "Rights and responsibilities as a student" in the  "Unit Guide and Information" folder under the "Resources" section in the MIS771 CloudDeakin site.  The assignment uses the dataset file T22020MIS771_A2Data.xlsx, which can be downloaded from  CloudDeakin. Analysis of the data requires the use of techniques studied in Module-2.  Assurance of Learning  This assignment assesses the following Graduate Learning Outcomes and related Unit Learning  Outcomes:  Graduate Learning Outcome (GLO) Unit Learning Outcome (ULO)  GLO1: Discipline-specific knowledge and  capabilities - appropriate to the level of  study related to a discipline or profession.  GLO2: Communication - using oral, written and  interpersonal communication to inform,  motivate and effect change    GLO5: Problem Solving - creating solutions to  authentic (real world and ill-defined)  problems.    GLO6: Self-Management - working and learning  independently, and taking responsibility  for personal actions      ULO 1: Apply quantitative reasoning skills to  solve complex problems.    ULO 2: Plan, monitor, and evaluate own learning  as a data analyst.    ULO 3: Deduce clear and unambiguous solutions  in a form that they useful for decision  making and research purposes and for  communication to the wider public.       MIS771 Descriptive Analytics and Visualisations  Page 3 of 10    Feedback before submission  You can seek assistance from the teaching staff to ascertain whether the assignment conforms to  submission guidelines.  Feedback after submission  An overall mark together with feedback, will be released via CloudDeakin, usually within 15 working  days. You are expected to refer and compare your answers to the feedback to understand any areas of  improvement.    The Case Study    ANALYTICs7, a leading data analysis consulting company, has extensive experience in analysing data for  both local and global, small to medium companies. By solving their business problems, ANALYTICs 7  helps these businesses to plan ahead and thrive.     Your Role in ANALYTICS7   Dr Hugo Barra, the lead data scientist at ANALYTICs7 has engaged you to lead the modelling component  for the TPM and AP projects and construct a report of your key findings and recommendations in  response to the questions posed in the meeting minutes of the last team meeting on the next page.     Datasets (accessible via T22020MIS771_A2Data.xlsx file)    There are two datasets available for this assignment: TPM_Employee_Attrition and  Monthly_EnergyCon_MW       Employee Survey data (TPM_Employee_Attrition )– TassPaperMill (TPM), a subsidiary of Pinnon Paper  Industries (PPI), is an Australian company with a long history of manufacturing paper rolls. To address  numerous concerns raised in their recent employee survey TPM is currently reviewing how they  calculate salary increments for their employees. TPM has hired ANALYTICs7 to extract a random sample  of 1470 employee records from their HR database. Their ultimate goal is to adopt a more holistic  rewarding system factoring the key relations between remuneration indicators and demographic  characteristics, employment history and various other potential contributors to boost performance. In  addition, human resource manager at TPM reported in her recent presentation to the company  executive management team that the staff turnover rate at TPM is higher compared to their  competitors.  Thus, TMP wants to identify key contributing factors before they lose more talented,  motivated and focused employees who contribute to the organisation's overall success.      Energy consumption data (Monthly_EnergyCon_MW) – Australian Power (AP) is one of the largest  generators of electricity in Australia, servicing for more than three million households in Victoria. AP  operates an electric transmission system that covers much of Victoria and serves over 30% of the  electricity demand in Victoria. This dataset consists of monthly power consumption data in megawatts  (MW) comes from AP’s data warehouse during 2010-2019. AP wishes to review their current resources  allocation strategy to plan and prioritise the provision of resources based on rapidly growing energy  demand in Victoria.     A complete listing of variables is provided in the T22020MIS771_A2Data.xlsx file.  Note: All data, reports, people and scenarios in this assignment are either fictitious or have been  modified from their original state. Any similarity to actual events is purely coincidental. It has been  produced for the sole purpose of assessing performance of summative assessment task 2.   MIS771 Descriptive Analytics and Visualisations  Page 4 of 10  Form 210-3   ANALYTICS7 Team Meeting    ANALYTICS7  727 Collins St, Docklands VIC 3008  Phone: (+61 3 212 66 000)  infor@analytics7.com.au  Reference AP-211 TPM Project  Revised 27th August 2020  Level Expert Analysis    Meeting Chair Dr Hugo Barra  Date 24 August 2020 Time 10:00 AM Location ANALYTICS7 L4.320  Topic TPM  and AP Research Projects – Analytics Details    Meeting  Purpose:  Specifying and Allocating Data Analytics Tasks  Discussion  items:          1. Variable(s) description  2. Modelling PercentSalaryHike   3. Modelling the likelihood of an employee leaving the company  4. Forecasting monthly energy consumption in Megawatts   5. Producing a technical report  Detailed  Action  Items        Who:   Modeller      What:    1. Providing an overall summary of the following two variables:  1.1. Percentage increase in salary (PercentSalaryHike)   1.2. Attrition     2. Identify potential variables that may influence PercentSalaryHike:  2.1. Identify a list of possible variables that influence percentage increase in  salary. Which three independent variables have the more impactful linear  relationship with PercentSalaryHike? What form of relationship(s) exist  between the independent variable(s) and PercentSalaryHike? Are there any  potential multi-collinearity problems? If so, which variables are they?  2.2. Build a regression model to estimate percentage increase in salary.  2.3. Perform residual analysis. Based on your residual plots, does there appear to  be any problems with the regression model?    3. Hugo has performed some preliminary analysis and discovered that the  performance rating is a significant predictor of the Percentage increase in salary.  Prior research shows that the strength of the relationship between performance  rating and percentage increase in salary may vary according to satisfaction with  the job. Generally speaking increased job satisfaction creates a more productive  workforce as they are more motivated to improve their job performance.  MIS771 Descriptive Analytics and Visualisations  Page 5 of 10  Therefore, Hugo believes that the relationship between performance rating and  percentage increase in salary should be stronger for employees who

Biswajit · Accepted Answer

Assignment Solutions
Introduction-TassPaperMill (TPM), a subsidiary of Pinnon Paper Industries (PPI), is an Australian company with a long history of manufacturing paper rolls. To address numerous concerns raised in their recent employee survey,TPM is currently reviewing how they calculate salary increments for their employees.We will extract a random sample of 1470 employee records from their HR database.The ultimate goal is to adopt a more holistic rewarding system factoring the key relations between remuneration indicators and demographic characteristics, employment history and various other potential contributors to boost performance. In addition, human resource manager at TPM reported in her recent presentation to the company executive management team that the staff turnover rate at TPM is higher compared to their competitors.Thus,TMP wants to identify key contributing factors before they lose more talented, motivated and focused employees who contribute to the organisation's overall success.
Main Body-
	Variable Name 
	Description of variables
	Age
	Measured to the nearest year
	Attrition
	Employee leaving the company (0=No, 1=Yes)
	Education
	1 Below College; 2 College; 3 Bachelor; 4 Master
	EnvironmentSatisfaction
	SATISFACTION WITH THE ENVIROMENT (working conditions; development opportunities; company culture; management styles; hierarchies and human resources policies etc)-
Responses were obtained on a 4 point scale ranging from 1 to 4 where 1 indicating low and 4 indicating very high
	Gender
	(1=FEMALE, 2=MALE)
	JobInvolvement
	Degree to which a job is central to a person's identity - Responses were obtained on a 4 point scale ranging from 1 to 4 where 1 indicating low and 4 indicating very high
	Job level
	Responsibility level and expectations of roles - 1 'Low' 2 'Medium'3 'High' 4 'Very High
	JobSatisfaction
	SATISFACTION WITH THE JOB - Responses were obtained on a 4 point scale ranging from 1 to 4 where 1 indicating low and 4 indicating very high
	MaritalStatus
	(1=DIVORCED, 2=MARRIED, 3=SINGLE)
	NumCompaniesWorked
	NO. OF COMPANIES WORKED AT
	OverTime
	(1=NO, 2=YES)
	PercentSalaryHike
	PERCENTAGE INCREASE IN SALARY
	PerformanceRating
	PERFORMANCE RATING - Responses were obtained on a 4 point scale ranging from 1 to 4 where 1 indicating low and 4 indicating very high
	WorkLifeBalance
	TIME SPENT BEWTWEEN WORK AND OUTSIDE 1 'Bad' 2 'Good' 3 'Better' 4 'Best'
	YearsAtCompany
	TOTAL NUMBER OF YEARS AT THE COMPNAY
	YearsInCurrentRole
	YEARS IN CURRENT ROLE
	YearsSinceLastPromotion
	YEARS SINCE LAST PROMOTION
Job satisfaction rating of 11 was replaced with imputed value of 2 considering its gender & other variables score similar to this row.
Q1.Description variables :
There are in total 17 variables in the dataset.Out of these 17 variables,6 are categorical variables & rest 11 are continuous/discrete variables.
We analysed the two variables Attrition & PercentSalaryHike.Below are the details.
	PercentSalaryHike
	Mean
	15.21
	Standard Error
	0.10
	Median
	14.00
	Mode
	11.00
	Standard Deviation
	3.66
	Sample Variance
	13.40
	Kurtosis
	-0.30
	Skewness
	0.82
	Range
	14.00
	Minimum
	11.00
	Maximum
	25.00
	Sum
	22358.00
	Count
	1470.00
        Table 1
Assuming that the sample was representative of the entire employee population,the average salary hike for all employees was 15.21 & standard deviation was 3.66.The data is right skewed with skewness of 0.82 & also negative kurtosis.It does not follow the standard Bell curve used in HR.While the maximum salary increase is 25%,the minimum is 11%.
As evident from the histogram,it is a long tail distribution.Around 55% of employees have salary hike between 11 -14.3 %. Around 27 % of employees salary hike is within 11-12.1 %. 
  
Above Box plot comapres the percent salary hike for people who left & who stayed back.Better would be a hypothesis test to check the statistical significance of difference.
	Row Labels
	Average of PercentSalaryHike
	No
	15.23
	Yes
	15.10
	Grand Total
	15.21
	Row Labels
	StdDev of PercentSalaryHike
	No
	3.64
	Yes
	3.77
	Grand Total
	3.66
	Row Labels
	Max of PercentSalaryHike
	No
	25.00
	Yes
	25.00
	Grand Total
	25.00
	Row Labels
	Min of PercentSalaryHike
	No
	11.00
	Yes
	11.00
	Grand Total
	11.00
	Row Labels
	Count of Employees
	No
	1233
	Yes
	237
	Grand Total
	1470
          Table 2
In Table 2,Yes stands for employees who left the organisation & No stands for employees who stayed back.
Conclusion :From 1470 employees,237 employees left which is around 16.12%.From table 2,we see that Minimum salary hike for both categories of employees i.e who left & who stayed back  are 11 percent.Similarly,maximum salary hike for both categories of employees are also 25%.Average salary hike of employees who left is 15.10 % while those who stayed back are 15.23%.The standard deviation of salary hike for employees who left is 3.77 while who stayed back are 3.64.The coefficient of variation of salary hike for employees who left is 24.97% in comparison to 23.89% for employees who stayed back.So basically,there is higher variability of salary hike for employees who left.
Q2.1
Introduction : In this case,we have 17 variables out of which one PercentSalaryHike is dependent variable & rest 16 are independent variables.Out of the 16 independent variables,6 are categorical variables & rest 10 are continuous variables.Before building the regression model,we need to do feature/variable selection by using correlation matrix where we will check correlation coefficient between independent.We will pick only the variables which have significant correlation with dependent variables.This will help us identify the continuous variables which are significant.For categorical variables,we will be directly take those in the model & if p value is less than the assumed 0.05,we will accept or else drop those.
Description :
Correlation Matrix :
	 
	Age
	Education
	EnvironmentSatisfaction
	JobInvolvement
	JobLevel
	JobSatisfaction
	NumCompaniesWorked
	PercentSalaryHike
	PerformanceRating
	WorkLifeBalance
	YearsAtCompany
	YearsInCurrentRole
	YearsSinceLastPromotion
	Age
	1
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	Education
	0.04752484
	1
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	EnvironmentSatisfaction
	0.01014643
	-0.00754
	1
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	JobInvolvement
	-0.01183258
	0.002385
	-0.0095
	1
	 
	 
	 
	 
	 
	 
	 
	 
	 
	JobLevel
	0.50960423
	0.031794
	0.001212
	-0.056062233
	1
	 
	 
	 
	 
	 
	 
	 
	 
	JobSatisfaction
	0.01479432
	-0.05039
	-0.03417
	0.398666134
	-0.018348
	1
	 
	 
	 
	 
	 
	 
	 
	NumCompaniesWorked
	0.07866392
	0.052225
	0.013452
	-0.351162338
	0.036036
	-0.388621527
	1
	 
	 
	 
	 
	 
	 
	PercentSalaryHike
	0.00363358
	-0.04893
	-0.0317
	0.550392895
	-0.03473
	0.689970561
	-0.510569943
	1
	 
	 
	 
	 
	 
	PerformanceRating
	-0.00686535
	-0.04805
	-0.03536
	0.335945013
	-0.010407
	0.431331828
	-0.245052431
	0.764765885
	1
	 
	 
	 
	 
	WorkLifeBalance
	-0.02149003
	-0.02017
	0.027627
	-0.019344333
	0.037818
	0.023805154
	0.014317645
	-0.003279636
	0.002898784
	1
	 
	 
	 
	YearsAtCompany
	0.31130877
	-0.00607
	0.001458
	-0.068457051
	0.534739
	-0.028841981
	-0.001827975
	-0.035991262
	0.003300928
	0.012089185
	1
	 
	 
	YearsInCurrentRole
	0.21290106
	0.048399
	0.018007
	-0.030374295
	0.389447
	0.000304992
	-0.002212759
	-0.001520027
	0.033066195
	0.049856498
	0.758753737
	1
	 
	YearsSinceLastPromotion
	0.21651337
	0.008263
	0.016194
	-0.065300624
	0.353885
	-0.006542608
	0.009712666
	-0.022154313
	0.01199545
	0.008941249
	0.618408865
	0.548056248
	1
Q2.2
First we create multiple linear regression with all variables 
	SUMMARY OUTPUT
	 
	 
	 
	 
	 
	 
	 
	 
	Regression Statistics
	 
	 
	 
	 
	 
	 
	 
	 
	Multiple R
	0.909365
	 
	 
	 
	 
	 
	 
	 
	R Square
	0.826945
	 
	 
	 
	 
	 
	 
	 
	Adjusted R Square
	0.825039
	 
	 
	 
	 
	 
	 
	 
	Standard Error
	1.530891
	 
	 
	 
	 
	 
	 
	 
	Observations
	1470
	 
	 
	 
	 
	 
	 
	 
	ANOVA
	 
	 
	 
	 
	 
	 
	 
	 
	 
	df
	SS
	MS
	F
	Significance F
	 
	 
	 
	Regression
	16
	16272.18
	1017.011
	433.9475618
	0
	 
	 
	 
	Residual
	1453
	3405.29
	2.343627
	 
	 
	 
	 
	 
	Total
	1469
	19677.47
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	Coefficients
	Standard Error
	t Stat
	P-value
	Lower 95%
	Upper 95%
	Lower 95.0%
	Upper 95.0%
	Intercept
	-2.45267
	0.435049
	-5.63769
	2.0675E-08
	-3.306066038
	-1.59928
	-3.30607
	-1.59928
	Age
	0.012813
	0.0052
	2.464003
	0.013854579
	0.002612631
	0.023014
	0.002613
	0.023014
	Education2
	-0.01757
	0.113468
	-0.15486
	0.876952406
	-0.24015004
	0.205007
	-0.24015
	0.205007
	Education3
	0.040894
	0.110801
	0.369079
	0.712122274
	-0.176453037
	0.258242
	-0.17645
	0.258242
	Education4
	-0.18423
	0.300032
	-0.61403
	0.539293303
	-0.772769475
	0.404314
	-0.77277
	0.404314
	Gender
	-0.8442
	0.092629
	-9.11376
	2.56733E-19
	-1.02589697
	-0.6625
	-1.0259
	-0.6625
	JobInvolvement
	0.633671
	0.049803
	12.72344
	3.07961E-35
	0.535976447
	0.731365
	0.535976
	0.731365
	Joblevel2
	-0.09591
	0.096155
	-0.9975
	0.318686742
	-0.28453263
	0.092703
	-0.28453
	0.092703
	Joblevel3
	-0.08817
	0.128929
	-0.68388
	0.494158156
	-0.341078021
	0.164734
	-0.34108
	0.164734
	Joblevel4
	-0.15145
	0.180404
	-0.83953
	0.401310951
	-0.505333689
	0.202426
	-0.50533
	0.202426
	Joblevel5
	-0.41499
	0.211314
	-1.96385
	0.049737758
	-0.829503976
	-0.00048
	-0.8295
	-0.00048
	JobSatisfaction
	0.934493
	0.042254
	22.1161
	1.18132E-93
	0.85160792
	1.017379
	0.851608
	1.017379
	Married
	-0.07691
	0.093049
	-0.82659
	0.408605586
	-0.25943844
	0.105612
	-0.25944
	0.105612
	Divorced
	-0.16533
	0.110929
	-1.49041
	0.136334201
	-0.382927229
	0.052269
	-0.38293
	0.052269
	NumCompaniesWorked
	-0.27864
	0.018575
	-15.001
	2.11031E-47
	-0.315076903
	-0.2422
	-0.31508
	-0.2422
	OverTime
	-0.0642
	0.089028
	-0.72111
	0.47096053
	-0.238834902
	0.110438
	-0.23883
	0.110438
	PerformanceRating
	4.799953
	0.12132
	39.56434
	6.2769E-233
	4.561971183
	5.037934
	4.561971
	5.037934
From the above table,we see that adjusted R square is around 82.5% & standard error is 1.53 & model is statistically significant as shown in lower p value 0.05.Below model is the refined model with all significant variables.Here in the final model we find  Female gender,Job involvement,Job satisfaction,Num companies worked,performance rating are significant.
	SUMMARY OUTPUT
	 
	 
	 
	 
	 
	 
	 
	 
	Regression Statistics
	 
	 
	 
	 
	 
	 
	 
	 
	Multiple R
	0.908644
	 
	 
	 
	 
	 
	 
	 
	R Square
	0.825634
	 
	 
	 
	 
	 
	 
	 
	Adjusted R Square
	0.825038
	 
	 
	 
	 
	 
	 
	 
	Standard Error
	1.530895
	 
	 
	 
	 
	 
	 
	 
	Observations
	1470
	 
	 
	 
	 
	 
	 
	 
	ANOVA
	 
	 
	 
	 
	 
	 
	 
	 
	 
	df
	SS
	MS
	F
	Significance F
	 
	 
	 
	Regression
	5
	16246.38
	3249.276
	1386.424
	0
	 
	 
	 
	Residual
	1464
	3431.087
	2.343639
	 
	 
	 
	 
	 
	Total
	1469
	19677.47
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	Coefficients
	Standard Error
	t Stat
	P-value
	Lower 95%
	Upper 95%
	Lower 95.0%
	Upper 95.0%
	Intercept
	-2.17115
	0.392657
	-5.52937
	3.8E-08
	-2.94138
	-1.40092
	-2.94138
	-1.40092
	Female
	-0.84082
	0.092404
	-9.0994
	2.89E-19
	-1.02207
	-0.65956
	-1.02207
	-0.65956
	JobInvolvement
	0.6423
	0.049547
	12.96332
	1.84E-36
	0.545108
	0.739491
	0.545108
	0.739491
	JobSatisfaction
	0.939155
	0.042092
	22.31182
	3.68E-95
	0.856587
	1.021722
	0.856587
	1.021722
	NumCompaniesWorked
	-0.27491
	0.018407
	-14.9352
	4.71E-47
	-0.31102
	-0.2388
	-0.31102
	-0.2388
	PerformanceRating
	4.795484
	0.121134
	39.58816
	1.2E-233
	4.557869
	5.033099
	4.557869
	5.033099
Q2.3
Residual analysis  shows that the dependent variable percentsalary hike is not exactly normally distributed & the residuals of independent variables Job involvement,performance rating & job satisfaction are not homoscedastic.The details of the residual analysis graphs are given in the Excel sheet.
Conclusion :As seen from above correlation matrix,we can see the variables such as Performance rating,Job involvement,Job satisfaction,NumCompanies worked are significantly correlated with Percentsalary hike as correlation coefficient is higher.Also,the individual scatter plots between the independent & dependent variables shows that the relationship is linear.Also,the correlation among independent variables seem to be low hence eliminating the possibility of multicollinearity.The significant variables are marked green.As most independent variables have less than 0.5 correlation coefficient among themselves,there is less chance of multicolleanirity.But still we can check for multicolleanirity if any by finding the VIF (variance inflation factor) in regression.Eventhough the model violates few of the assumptions of multiple linear regression,still we can accept it for descriptive purpose.
As per the final model,the famales have on an average 0.8402 % less PercentsalaryHike than that of male. 1 unit increase in job involvement results in 0.6423 % increase in  PercentSalaryHike.1 unit increase in job satisfaction results in 0.9391 % increase in PercentSalaryhike on average.Every additional no of companies worked results in -.2741 % decrease in PercentSalaryHike.1 unit increase in performance rating results in 4.79% increase in PercentSalaryHike.
The model is statistically significant as P value in ANOVA test less than 0.05 & adjusted R square is 82.5% 
Q3.
	SUMMARY OUTPUT
	 
	 
	 
	 
	 
	 
	 
	 
	Regression Statistics
	 
	 
	 
	 
	 
	 
	 
	 
	Multiple R
	0.91014884
	 
	 
	 
	 
	 
	 
	 
	R Square
	0.828370911
	 
	 
	 
	 
	 
	 
	 
	Adjusted R Square
	0.827667032
	 
	 
	 
	 
	 
	 
	 
	Standard Error
	1.519350169
	 
	 
	 
	 
	 
	 
	 
	Observations
	1470
	 
	 
	 
	 
	 
	 
	 
	ANOVA
	 
	 
	 
	 
	 
	 
	 
	 
	 
	df
	SS
	MS
	F
	Significance F
	 
	 
	 
	Regression
	6
	16300.24098
	2716.707
	1176.866
	0
	 
	 
	 
	Residual
	1463
	3377.225683
	2.308425
	 
	 
	 
	 
	 
	Total
	1469
	19677.46667
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	Coefficients
	Standard Error
	t Stat
	P-value
	Lower 95%
	Upper 95%
	Lower 95.0%
	Upper 95.0%
	Intercept
	3.627857474
	1.262195506
	2.874244
	0.004109
	1.151951411
	6.103763536
	1.151951411
	6.103763536
	Female
	-0.825286881
	0.091763043
	-8.99367
	7.22E-19
	-1.005288056
	-0.645285707
	-1.005288056
	-0.645285707
	JobInvolvement
	0.633468834
	0.049207801
	12.87334
	5.28E-36
	0.53694346
	0.729994208
	0.53694346
	0.729994208
	JobSatisfaction
	-0.826316075
	0.367873788
	-2.24619
	0.02484
	-1.547932447
	-0.104699703
	-1.547932447
	-0.104699703
	NumCompaniesWorked
	-0.276258441
	0.018270092
	-15.1208
	4.18E-48
	-0.312096813
	-0.240420069
	-0.312096813
	-0.240420069
	PerformanceRating
	2.877047033
	0.414958503
	6.933337
	6.15E-12
	2.063069905
	3.691024161
	2.063069905
	3.691024161
	Performance rating xJob satisfaction
	0.57910428
	0.119888274
	4.830366
	1.51E-06
	0.343933023
	0.814275538
	0.343933023
	0.814275538
The final model with interaction term between performance rating & job satisfaction is 
PercentSalaryHike = 3.62 -0.82 Female +0.63 JobInvolvement -0.82 Jobsatisfaction-0.27 NumCompaniesworked +2.87 performancerating+0.57 performancerating x job satisfaction.
As the p value associated with interaction term is less than 0.05,so this term is statistically significant.So increase in performance rating affects increase in percent salary in different way depending upon job satisfaction.
Conclusion :
So  final model shows that Female gender,Job involvement,performance rating,job satisfaction,number of companies worked & interaction between performance rating & job satisfaction are important for salary hike.
Females have on average 0.82 % salary hike in comparison to male.1 unit increase in job involvement increases % salary hike by 0.63%.1 unit increase in job satisfaction results in (0.57 performance rating -0.82) % salary hike.so basically job satisfaction results in higher % salary hike in the presence of higher performance rating.similarly higher performance rating results in higher % salary hike in presence of high job satisfaction.
Q4.The employee attrition problem :
We did a logistic regression for the dependent variable Attrition (1=yes,0=No) with all other variables as independent variables.Among these Education,Gender,Job level,Marital status,overtime are categorical variables for which dummy codes were used.Education has 4 levels so 3 dummy codes were created with Education1 being reference level.Gender 1 was female with male being reference level.similarly,job level 1 was reference category with others being dummy coded.marrital status is dummy coded with single being reference & so also overtime where “No” is reference category.
	Variable name
	Coeff value
	odds
	 intercept
	1.706330944
	 
	Age
	-0.030831391
	0.970
	Education2
	0.096060844
	1.101
	Education3
	0.307956911
	1.361
	Education4
	0.227199341
	1.255
	EnvironmentSatisfaction
	-0.367395095
	0.693
	Female
	-0.319340636
	0.727
	JobInvolvement
	-0.175083026
	0.839
	Joblevel2
	-0.961435136
	0.382
	Joblevel3
	-0.319954909
	0.726
	Joblevel4
	-1.682080948
	0.186
	Joblevel5
	-1.08606482
	0.338
	JobSatisfaction
	-0.1699398
	0.844
	Married
	-0.857916847
	0.424
	Divorced
	-1.194120663
	0.303
	NumCompaniesWorked
	0.045318512
	1.046
	OverTime
	1.631282149
	5.110
	PercentSalaryHike
	0.036695796
	1.037
	PerformanceRating
	0.049571141
	1.051
	WorkLifeBalance
	-0.227926049
	0.796
	YearsAtCompany
	-0.00855152
	0.991
	YearsInCurrentRole
	-0.153279842
	0.858
	YearsSinceLastPromotion
	0.142827598
	1.154
In the table above,we have attached coefficient value with odds of each.where the odds are greater than 1,it means higher value of the variable results in greater probability of exit & where odds are lower,the probability of exit is lower.
Different cases :
1. Medium satisfaction with job & work environment,5 years since last promotion,the person has higher probability of leaving as odds is higher (around 2.17 if we plug in the values of medium as 3 & 5 for years since last promotion)
2. Higher number of years in current role & overtime results in higher odds of exit
3. A 45 years old married employee with a very-high level job classification and maintaining a good work-life balance has lower odds of leaving.
The final equation is  log(p/1-p) = 1.7 +0.3 Age +0.09 Education2 +0.307 Education3+0.227 Education4 +0.367 Environmentsatisfaction +.319 female + 0.175 Job involvement +0.961 Job level2 +0.313 Job level3+ 1.68 Job level4 -1.08 job level5 -.169 Job satisfaction+.857 married +1.19 divorce +0.045 Numcompaniesworked +1.63 Overtime +0.036 Percent salary hike +0.049 performance rating+.227 worklife balance-.008 yearsAtCompany+0.153YearsInCurrentRole+0.142 YearsSinceLastPromotion where p is the probability of attrition.
If we plug the individual values of an employee in the above equation,it will give the predicted probability of exit for each employee.

Assignment-1 MIS771 Descriptive Analytics and Visualisations Page 1 of 10 MIS771 Descriptive Analytics and Visualisation DEPARTMENT OF INFORMATION SYSTEMS AND BUSINESS ANALYTICS DEAKIN BUSINESS SCHOOL...

Answer To: Assignment-1 MIS771 Descriptive Analytics and Visualisations Page 1 of 10 MIS771 Descriptive...

Answer To This Question Is Available To Download

Related Questions & Answers

Submit New Assignment

Sun	Mon	Tue	Wed	Thu	Fri	Sat
30	31	1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	1	2	3