Assignment 4 Scope and Methods Correlation
1) You want to study the relationship between salary and job turnover. You gathered data on the average salary within an agency, along with the number of employees who leave on average yearly. Run a Pearson’s r on the data below in SPSS.
Pay      Turnover
45.8     102
44.2     100
67.1     44
44.1     98
36.7     77
28.5     198
29       83
29       123
33       78
34       66
31       188
32       211
55       34
56       122
44       154
34       211
44.5     78
34.12    123
35       133
a) What is the correlation coefficient value? Is pay associated with job turnover? In 2-3 sentences, explain this relationship in simplistic terms.
Correlations
                                  pay       turnover
pay         Pearson Correlation   1         -.522*
            Sig. (2-tailed)                 .022
            N                     19        19
turnover    Pearson Correlation   -.522*    1
            Sig. (2-tailed)       .022
            N                     19        19
*. Correlation is significant at the 0.05 level (2-tailed).
Correlation coefficient = -0.522
There is a moderate negative linear relationship between pay and turnover (r = -0.522). Agencies that pay higher average salaries tend to lose fewer employees each year, although the relationship is far from perfect.
b) Now determine the p-value. Is there a significant relationship between pay and job turnover?
P-value = 0.022
H0: there is no linear relationship between pay and turnover (the population correlation is zero)
H1: there is a linear relationship between pay and turnover (the population correlation is not zero)
Since p = 0.022 < 0.05, I reject H0 and conclude that there is a statistically significant linear relationship between pay and turnover.
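As a cross-check on the SPSS table, the coefficient and its test statistic can be recomputed by hand. The sketch below is not part of the original submission; it uses only the Python standard library, and the names `pearson_r` and `t_stat` are illustrative. It computes the test statistic rather than the p-value itself, which can be compared with the two-tailed 5% critical value of 2.110 for 17 degrees of freedom.

```python
import math

# Data as listed in question 1 (average pay, yearly turnover).
pay = [45.8, 44.2, 67.1, 44.1, 36.7, 28.5, 29, 29, 33, 34,
       31, 32, 55, 56, 44, 34, 44.5, 34.12, 35]
turnover = [102, 100, 44, 98, 77, 198, 83, 123, 78, 66,
            188, 211, 34, 122, 154, 211, 78, 123, 133]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(pay, turnover)
n = len(pay)
# Test statistic for H0: population correlation = 0, with n - 2 df.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(f"r = {r:.3f}")
print(f"t = {t_stat:.3f} on {n - 2} df")
```

Because the absolute value of t exceeds 2.110, the sketch reaches the same decision as the SPSS p-value of .022.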
2) Input the following data into Excel to determine whether population, median income, poverty, and federal grants are correlated with one another:
Population   Med     Poverty   Federal Allocation
83429        55.2    14        52
21002        44.9    16        140
12457        39.6    15.2      85.8
1220033      32      18.7      59.1
898          38      14.5      175.3
1342         45      12.4      224
1567         42      13.5      82.5
345213       45      12.8      111.5
8923         36.2    16        156
4544         26.9    23.7      69
109012       61      7.5       950
85323        32.2    20        572
121452       37.7    13.5      334
89012        47.4    10.4      63
92111        35.5    14.2      257
14533        58.2    7.1       247
a) Run a Pearson’s r correlation for each of these four variables. Report the correlation coefficient (r) in a correlation matrix.
Correlations
                                           Population   Med       Poverty   Federal_Allocation
Population           Pearson Correlation   1            -.229     .211      -.138
                     Sig. (2-tailed)                    .393      .434      .609
                     N                     16           16        16        16
Med                  Pearson Correlation   -.229        1         -.864**   .330
                     Sig. (2-tailed)       .393                   .000      .211
                     N                     16           16        16        16
Poverty              Pearson Correlation   .211         -.864**   1         -.323
                     Sig. (2-tailed)       .434         .000                .222
                     N                     16           16        16        16
Federal_Allocation   Pearson Correlation   -.138        .330      -.323     1
                     Sig. (2-tailed)       .609         .211      .222
                     N                     16           16        16        16
**. Correlation is significant at the 0.01 level (2-tailed).
b) Finally, calculate the p-values for each variable
The p-values for each pair of variables are given below:
(Population, Med) = 0.393
(Population, Poverty) = 0.434
(Population, Federal_Allocation) = 0.609
(Med, Poverty) < 0.001 (displayed by SPSS as .000)
(Med, Federal_Allocation) = 0.211
(Poverty, Federal_Allocation) = 0.222
c) Explain your results.
With p > 0.05, there is no statistically significant linear relationship for the pairs (Population, Med), (Population, Poverty), (Population, Federal_Allocation), (Med, Federal_Allocation), and (Poverty, Federal_Allocation).
With p < 0.001, there is a significant, strong negative linear relationship between Med and Poverty (r = -0.864): areas with higher median income tend to have lower poverty.
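The full matrix can also be rebuilt outside SPSS or Excel. The following is a minimal sketch using only the Python standard library, assuming the data were entered exactly as listed in the question; the dictionary `data` and the helper `pearson_r` are illustrative names, not part of the original work.

```python
import math

# Columns as listed in question 2.
data = {
    "Population": [83429, 21002, 12457, 1220033, 898, 1342, 1567, 345213,
                   8923, 4544, 109012, 85323, 121452, 89012, 92111, 14533],
    "Med": [55.2, 44.9, 39.6, 32, 38, 45, 42, 45,
            36.2, 26.9, 61, 32.2, 37.7, 47.4, 35.5, 58.2],
    "Poverty": [14, 16, 15.2, 18.7, 14.5, 12.4, 13.5, 12.8,
                16, 23.7, 7.5, 20, 13.5, 10.4, 14.2, 7.1],
    "Federal_Allocation": [52, 140, 85.8, 59.1, 175.3, 224, 82.5, 111.5,
                           156, 69, 950, 572, 334, 63, 257, 247],
}


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


names = list(data)
# Every pairwise coefficient, stored as a nested dictionary.
corr = {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}

for a in names:
    print(f"{a:<20}" + "".join(f"{corr[a][b]:>8.3f}" for b in names))
```

The matrix is symmetric with ones on the diagonal, so only the six values above (or below) the diagonal carry information.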
3) You want to evaluate citizens’ perceptions of city services using survey data from three variables. Input the following data into Stata:
City_Services   Resident   Miles_from_City
4               4          14
4               4          16
2               3          22
3               2          7
3               2          19
3               4          0.5
2               5          18
1               5          33
2               1          12
3               1          15
5               1          11
5               2          14
4               2          16
3               3          6
5               2          3
Codebook:
City services: Citizen’s perception of public service, from 1 (poor) to 5 (excellent)
Resident: How long a citizen has resided in the city (1 = one year or less, 2 = two years or less, 3 = three years or less, 4 = four years or less, 5 = more than 4 years)
Miles from city: The distance a citizen lives from the town center
a) Run a Kendall’s tau b analysis on the three variables above.
Correlations
                                                              City_Services   Resident   Miles_from_City
Kendall's tau_b   City_Services     Correlation Coefficient   1.000           -.289      -.372
                                    Sig. (2-tailed)           .               .186       .072
                                    N                         15              15         15
                  Resident          Correlation Coefficient   -.289           1.000      .222
                                    Sig. (2-tailed)           .186            .          .282
                                    N                         15              15         15
                  Miles_from_City   Correlation Coefficient   -.372           .222       1.000
                                    Sig. (2-tailed)           .072            .282       .
                                    N                         15              15         15
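To make the coefficient in this table less of a black box, here is a small sketch of how Kendall's tau-b can be computed by counting concordant, discordant, and tied pairs. It is an illustrative standard-library implementation, not the routine SPSS uses internally, and the function name `kendall_tau_b` is an assumption of this sketch.

```python
import math

# Survey data as listed in question 3.
city_services = [4, 4, 2, 3, 3, 3, 2, 1, 2, 3, 5, 5, 4, 3, 5]
resident = [4, 4, 3, 2, 2, 4, 5, 5, 1, 1, 1, 2, 2, 3, 2]
miles = [14, 16, 22, 7, 19, 0.5, 18, 33, 12, 15, 11, 14, 16, 6, 3]


def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((n0 - n1) * (n0 - n2))."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                ties_x += 1          # pair tied on the first variable
            if dy == 0:
                ties_y += 1          # pair tied on the second variable
            if dx * dy > 0:
                concordant += 1      # both variables move the same way
            elif dx * dy < 0:
                discordant += 1      # the variables move in opposite ways
    total_pairs = n * (n - 1) // 2
    denom = math.sqrt((total_pairs - ties_x) * (total_pairs - ties_y))
    return (concordant - discordant) / denom


tau_services_resident = kendall_tau_b(city_services, resident)
tau_services_miles = kendall_tau_b(city_services, miles)
tau_resident_miles = kendall_tau_b(resident, miles)

print(f"City_Services vs Resident:        {tau_services_resident:.3f}")
print(f"City_Services vs Miles_from_City: {tau_services_miles:.3f}")
print(f"Resident vs Miles_from_City:      {tau_resident_miles:.3f}")
```

The tie correction in the denominator is what distinguishes tau-b from tau-a, and it matters here because the 1 to 5 scales produce many tied pairs.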
b) Now, recode the miles from city variable to reflect the following ordinal values:
0-1 miles = 1
1-5 miles = 2
6-10 miles = 3
11-15 =4
15 or more miles = 5
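The recoding scheme above can be sketched as a small function. Note that the listed ranges overlap at 1 and 15 miles and leave gaps between 5 and 6 and between 10 and 11; the sketch below assumes, purely for illustration, that a boundary value falls in the lower category and that the gaps are closed upward. The name `recode_miles` is hypothetical. For the fifteen distances in this dataset these choices do not affect any assigned code.

```python
from collections import Counter

# Miles_from_City as listed in question 3.
miles = [14, 16, 22, 7, 19, 0.5, 18, 33, 12, 15, 11, 14, 16, 6, 3]


def recode_miles(distance):
    """Map a distance in miles to the 1-5 ordinal code.

    Assumption: boundary values go to the lower category
    (1 mile -> 1, 15 miles -> 4) and non-integer gaps are closed.
    """
    if distance <= 1:
        return 1
    if distance <= 5:
        return 2
    if distance <= 10:
        return 3
    if distance <= 15:
        return 4
    return 5


miles_recoded = [recode_miles(d) for d in miles]
counts = Counter(miles_recoded)

print(miles_recoded)
print(sorted(counts.items()))
```

Eleven of the fifteen respondents land in the top two categories, so the recoded variable has far more ties than the original distances.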
c) Now, run a Kendall’s tau b on the recoded variables.
Correlations
                                                            City_Services   Resident   miles_recoded
Kendall's tau_b   City_Services   Correlation Coefficient   1.000           -.289      -.267
                                  Sig. (2-tailed)           .               .186       .233
                                  N                         15              15         15
                  Resident        Correlation Coefficient   -.289           1.000      .193
                                  Sig. (2-tailed)           .186            .          .386
                                  N                         15              15         15
                  miles_recoded   Correlation Coefficient   -.267           .193       1.000
                                  Sig. (2-tailed)           .233            .386       .
                                  N                         15              15         15
d) In your own words, what can you conclude from your analyses? Are there any differences that occurred from recoding?
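After recoding miles from city into five ordinal categories, the Kendall's tau-b coefficients involving distance move closer to zero and their p-values increase: for City_Services, tau-b changes from -.372 (p = .072) to -.267 (p = .233), and for Resident from .222 (p = .282) to .193 (p = .386). The coefficient between City_Services and Resident (-.289, p = .186) is unchanged because neither variable was recoded. None of the correlations is significant at the 5% level before or after recoding, so the data give no evidence of an association among perceptions of city services, length of residence, and distance from the town center.
Rank-based coefficients such as Kendall's tau-b and Spearman's rho are appropriate for describing the relationship between variables measured on an ordinal scale.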
Log
NEW FILE.
DATASET CLOSE DataSet0.
CORRELATIONS
/VARIABLES=pay turnover
/PRINT=TWOTAIL NOSIG
/MISSING=PAIRWISE.
SAVE OUTFILE='C:\TRANSWEB\2019\42685_assignment4.sav'
/COMPRESSED.
CORRELATIONS
/VARIABLES=Population Med Poverty Federal_Allocation
/PRINT=TWOTAIL NOSIG
/MISSING=PAIRWISE.
SAVE OUTFILE='C:\TRANSWEB\2019\42685_assignment4.sav'
/COMPRESSED.
NONPAR CORR
/VARIABLES=City_Services Resident Miles_from_City
/PRINT=KENDALL TWOTAIL NOSIG
/MISSING=PAIRWISE.
RECODE Miles_from_City (0 thru 1=1) (1 thru 5=2) (6 thru 10=3) (11 thru 15=4) (ELSE=5) INTO miles_recoded.
VARIABLE LABELS miles_recoded 'miles_recoded'.
EXECUTE.
NONPAR CORR
/VARIABLES=City_Services Resident miles_recoded
/PRINT=KENDALL TWOTAIL NOSIG
/MISSING=PAIRWISE.
SAVE OUTFILE='C:\TRANSWEB\2019\42685_assignment4.sav'
/COMPRESSED.
Homework 5 Scope and Methods Linear Regression
1) Input the following data into SPSS:
X (Average job satisfaction)   Y (Job Turnover Rate)
50.0                           2.0
38.0                           4.25
16.0                           58.5
28.5                           222.5
25.0                           46.75
44.0                           194.25
49.0                           13.5
38.5                           10.0
37.5                           8.75
22.5                           256.0
44.5                           66.75
48.5                           12.0
49.5                           13.25
47.0                           1.25
41.0                           206.0
0.0                            358.75
49.0                           38.25
22.0                           41.75
40.5                           8.0
33.5                           108.0
0.0                            13.5
44.0                           2.0
50.0                           30.75
9.0                            0.5
49.5                           14.25
a) Run a simple linear regression of the above data. Report the unstandardized beta coefficient and the standard error (note the standard error is found in the output next to each beta coefficient). Interpret your results. Is there a relationship between job satisfaction and job turnover?
Output:
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .395(a)   .156       .119                92.07665
a. Predictors: (Constant), Average job satisfaction

ANOVA(b)
Model            Sum of Squares   df   Mean Square   F       Sig.
1   Regression   36027.298        1    36027.298     4.249   .051(a)
    Residual     194996.512       23   8478.109
    Total        231023.810       24
a. Predictors: (Constant), Average job satisfaction
b. Dependent Variable: Job Turnover Rate

Coefficients(a)
                                   Unstandardized Coefficients   Standardized Coefficients
Model                              B          Std. Error         Beta                        t        Sig.
1   (Constant)                     156.010    45.935                                         3.396    .002
    Average job satisfaction       -2.473     1.200              -.395                       -2.061   .051
a. Dependent Variable: Job Turnover Rate
Unstandardized beta coefficients: beta0 = 156.010; beta1 = -2.473
Standard errors: beta0 = 45.935; beta1 = 1.200
Interpretation: Beta0 = The predicted job turnover rate is 156.010 when average job satisfaction is zero.
Beta1 = For each one-unit increase in average job satisfaction, the predicted job turnover rate decreases by 2.473 units.
H0: there is no linear relationship between job satisfaction and job turnover. H1: there is a linear relationship between job satisfaction and job turnover. With F = 4.249 and p = .051 > .05, I fail to reject H0: the negative slope is not statistically significant at the 5% level, although it falls only just short of that threshold.
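The coefficients and standard errors reported above follow from the usual least-squares formulas. The sketch below recomputes them with the Python standard library as a check; the function name `simple_ols` and the dictionary keys are illustrative choices, not anything produced by SPSS.

```python
import math

# Data as listed in question 1 (X = job satisfaction, Y = turnover rate).
satisfaction = [50.0, 38.0, 16.0, 28.5, 25.0, 44.0, 49.0, 38.5, 37.5, 22.5,
                44.5, 48.5, 49.5, 47.0, 41.0, 0.0, 49.0, 22.0, 40.5, 33.5,
                0.0, 44.0, 50.0, 9.0, 49.5]
turnover_rate = [2.0, 4.25, 58.5, 222.5, 46.75, 194.25, 13.5, 10.0, 8.75,
                 256.0, 66.75, 12.0, 13.25, 1.25, 206.0, 358.75, 38.25,
                 41.75, 8.0, 108.0, 13.5, 2.0, 30.75, 0.5, 14.25]


def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x with standard errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = syy - slope * sxy            # residual sum of squares
    mse = sse / (n - 2)                # residual mean square
    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": math.sqrt(mse / sxx),
        "se_intercept": math.sqrt(mse * (1 / n + mean_x ** 2 / sxx)),
        "r_squared": 1 - sse / syy,
    }


fit = simple_ols(satisfaction, turnover_rate)
t_slope = fit["slope"] / fit["se_slope"]

for key, value in fit.items():
    print(f"{key:<13} {value:10.3f}")
print(f"{'t (slope)':<13} {t_slope:10.3f}")
```

The slope's t-ratio can be compared with the two-tailed 5% critical value of 2.069 for 23 degrees of freedom; it falls just inside that bound, which agrees with the reported p-value of .051.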
b) Finally, plot the results on a graph (note: it can be any type of graph).
The scatterplot shows a weak negative linear relationship between average job satisfaction and job turnover rate (R = .395).
2) Have SPSS open the “Non-profit.xls” Excel file, then answer the questions below:
Hopecore, a non-profit agency that collects donations by selling autographed memorabilia through the telephone, has hired you to do a study on what factors cause people to donate/buy more products over the phone. You proposed and gathered data for the following model:
Y (donation) = Bo(constant) + B1 (donorstatus) + B2 (onhold) + B3(region) + B4(industry) + e
The coding for the dependent variable (donation) is the total amount spent (in actual dollars) by the donor during the phone call.
a) Estimate the parameters from your model above. From the Excel or Stata output, fill in the corresponding values in the below table based on the above model:
Parameter        Coefficient   SE
Bo (constant)    3192.0991     109.9870
B1               -138.7919     50.5810
B2               -254.1293     22.0083
B3               27.0411       22.1089
B4               8.7566        31.0067
R-Sq             0.0888
Notes: You can just report the unstandardized beta coefficients.
SE is simply the standard error of each beta coefficient.
b) Now, in the space below, interpret the unstandardized beta coefficients of EACH of the above variables ONLY if they are significant.
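At the 5% level, B0, B1, and B2 are significant (each coefficient is more than twice its standard error). B3 (region) and B4 (industry) are not significant, so they are not interpreted.
B0: The predicted donation is $3,192.10 when all predictors equal zero, that is, for a non-regular donor with zero on-hold time in the reference (zero-coded) region and industry categories.
B1: A regular donor gives about $138.79 less than a non-regular donor, holding the other variables constant.
B2: Each one-unit increase in on-hold time is associated with a decrease of about $254.13 in the donation, holding the other variables constant.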
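The significance calls above can be checked by dividing each coefficient by its standard error. The sketch below does this for the values reported in part (a). The sample size is not stated in the assignment, so the cutoff of 1.96 is an assumption (the large-sample two-tailed 5% value); with a small sample the exact t critical value would be somewhat larger. The variable names are illustrative.

```python
# Coefficients and standard errors as reported in part (a).
estimates = {
    "B0 (constant)": (3192.0991, 109.9870),
    "B1 (donorstatus)": (-138.7919, 50.5810),
    "B2 (onhold)": (-254.1293, 22.0083),
    "B3 (region)": (27.0411, 22.1089),
    "B4 (industry)": (8.7566, 31.0067),
}

# Assumption: large-sample two-tailed 5% critical value.
CRITICAL_VALUE = 1.96

# t-ratio = coefficient / standard error.
t_ratios = {name: b / se for name, (b, se) in estimates.items()}
significant = [name for name, t in t_ratios.items()
               if abs(t) > CRITICAL_VALUE]

for name, t in t_ratios.items():
    flag = "significant" if name in significant else "not significant"
    print(f"{name:<18} t = {t:8.3f}  {flag}")
```

Only the constant, donor status, and on-hold time clear the cutoff, which is why region and industry are left uninterpreted.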
c) Now, write a paragraph or two describing your findings to the executive board at Hopecore (note: they are unfamiliar with regression modeling, so you have to explain it to them in laymen’s terms).
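The model is a poor fit for the data: R-squared is 0.0888, so donor status, on-hold time, region, and industry together explain only about 9% of the variation in donations. Within that limited picture, regular donors and callers kept on hold longer tended to give less, while region and industry made no detectable difference.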
3) Input the following data:
Couple   Divorce Index   Parents   Income   Children
1        78              2         20.4     0
2        34              0         44.8     1
3        45              0         39.7     0
4        78              2         12.1     1
5        60              2         18.7     1
6        13              0         77.3     1
7        18              0         70.1     0
9        12              0         102.1    0
10       78              2         55       0
11       64              0         57       1
12       45              1         60       1
13       50              1         67       0
14       67              1         45       0
15       80              2         21       1
You want to discover what factors increase a couple’s risk for getting a divorce.
The coding for the following is:
Couple: The ID variable for the couple
Divorce Index: How at risk they are for getting a divorce (0 is most likely to stay together and 100 is most likely to get a divorce) based on the interpretation of a divorce counselor.
Parents : The divorce status of the couple’s parents where: 0 = Both couple’s parents are together 1 = One of the members of the couple has divorced parents 2 = Both couples’ parents are divorced.
Income: The combined family income in thousands.
Children: Whether or not the couple has children 0 = No 1 = Yes
a) In SPSS, run a simple linear regression on the variables: Y = Divorce index and X = Income. Print out or report your results here. Report and interpret the r-squared (the coefficient of determination), the p-value (or the t-ratio), the beta coefficient, and the standard error. Interpret each and describe the relationship.
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .780(a)   .609       .576                16.09800
a. Predictors: (Constant), Income

ANOVA(b)
Model            Sum of Squares   df   Mean Square   F        Sig.
1   Regression   4835.681         1    4835.681      18.660   .001(a)
    Residual     3109.748         12   259.146
    Total        7945.429         13
a. Predictors: (Constant), Income
b. Dependent Variable: Divorce_Index

Coefficients(a)
                     Unstandardized Coefficients   Standardized Coefficients
Model                B         Std. Error          Beta                        t        Sig.
1   (Constant)       88.515    9.574                                           9.246    .000
    Income           -.749     .173                -.780                       -4.320   .001
a. Dependent Variable: Divorce_Index
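R^2 = .609 (R = .780). About 61% of the variation in the divorce index is explained by income.
p-value = .001 (t = -4.320); the relationship is significant at the 5% level.
beta coefficient = -.749; since income is measured in thousands, each additional $1,000 of combined family income is associated with a 0.749-point decrease in the divorce index.
standard error of the beta coefficient = .173, which is small relative to the coefficient itself. The standard error of the estimate is 16.098, meaning predicted divorce index values are typically off by about 16 points.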
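Because R and R-squared are easy to confuse in the SPSS Model Summary, the sketch below recomputes both, along with the slope and its standard error, from the listed data. It uses only the Python standard library; the variable names are illustrative, and couple 8 is absent because it does not appear in the data table.

```python
import math

# Divorce index and income (in thousands) for the 14 listed couples.
divorce_index = [78, 34, 45, 78, 60, 13, 18, 12, 78, 64, 45, 50, 67, 80]
income = [20.4, 44.8, 39.7, 12.1, 18.7, 77.3, 70.1, 102.1,
          55, 57, 60, 67, 45, 21]

n = len(income)
mean_x = sum(income) / n
mean_y = sum(divorce_index) / n
sxx = sum((x - mean_x) ** 2 for x in income)
syy = sum((y - mean_y) ** 2 for y in divorce_index)
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(income, divorce_index))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / math.sqrt(sxx * syy)       # correlation (R, with its sign)
r_squared = r ** 2                   # share of variance explained
mse = (syy - slope * sxy) / (n - 2)  # residual mean square
se_slope = math.sqrt(mse / sxx)      # standard error of the slope
se_estimate = math.sqrt(mse)         # standard error of the estimate

print(f"R = {r:.3f}, R squared = {r_squared:.3f}")
print(f"slope = {slope:.3f} (SE {se_slope:.3f}), intercept = {intercept:.3f}")
print(f"standard error of the estimate = {se_estimate:.3f}")
```

The slope is more than four standard errors from zero, well beyond the two-tailed 5% critical value of 2.179 for 12 degrees of freedom.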
b) Now, run a multiple regression with all of the other variables (except the ID variable). Report and interpret each beta value if they are significant. Interpret and describe your results. What can you conclude from your analysis?
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .871(a)   .759       .686                13.84426
a. Predictors: (Constant), Children, Parents, Income

ANOVA(b)
Model            Sum of Squares   df   Mean Square   F        Sig.
1   Regression   6028.794         3    2009.598      10.485   .002(a)
    Residual     1916.634         10   191.663
    Total        7945.429         13
a. Predictors: (Constant), Children, Parents, Income
b. Dependent Variable: Divorce_Index

Coefficients(a)
                     Unstandardized Coefficients   Standardized Coefficients
Model                B         Std. Error          Beta                        t        Sig.
1   (Constant)       63.948    17.589                                          3.636    .005
    Income           -.450     .224                -.468                       -2.012   .072
    Parents          13.329    5.982               .494                        2.228    .050
    Children         -5.155    7.954               -.108                       -.648    .531
a. Dependent Variable: Divorce_Index
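Beta (Parents) = 13.329: each one-step increase in the Parents code (one more member of the couple with divorced parents) is associated with a 13.3-point increase in the divorce index, holding income and children constant. With t = 2.228 and p = .050, this effect is significant only at the margin of the 5% level.
With t = -2.012 (p = .072) for income and t = -.648 (p = .531) for children, there is not sufficient evidence to conclude that income or children are significant predictors of the divorce index once the other variables are controlled. Overall the model is significant (F = 10.485, p = .002) and explains about 76% of the variation in the divorce index (R Square = .759).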
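For readers who want to see what the multiple regression is doing, the sketch below fits the same three-predictor model by solving the normal equations with plain Gaussian elimination. It is an illustrative standard-library implementation, not SPSS's algorithm, and the helper name `solve` is hypothetical. If the data were entered as listed, the printed coefficients should be close to the SPSS Coefficients table, but they are not asserted here.

```python
# Data for the 14 listed couples (couple 8 is absent from the table).
divorce_index = [78, 34, 45, 78, 60, 13, 18, 12, 78, 64, 45, 50, 67, 80]
income = [20.4, 44.8, 39.7, 12.1, 18.7, 77.3, 70.1, 102.1,
          55, 57, 60, 67, 45, 21]
parents = [2, 0, 0, 2, 2, 0, 0, 0, 2, 0, 1, 1, 1, 2]
children = [0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


# Design matrix with a leading column of ones for the constant.
rows = [[1.0, inc, par, ch]
        for inc, par, ch in zip(income, parents, children)]
k = 4
xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
       for i in range(k)]
xty = [sum(row[i] * y for row, y in zip(rows, divorce_index))
       for i in range(k)]
coef = solve(xtx, xty)  # [constant, income, parents, children]

fitted = [sum(c * v for c, v in zip(coef, row)) for row in rows]
residuals = [y - f for y, f in zip(divorce_index, fitted)]
mean_y = sum(divorce_index) / len(divorce_index)
sse = sum(e ** 2 for e in residuals)
sst = sum((y - mean_y) ** 2 for y in divorce_index)
r_squared = 1 - sse / sst

for name, value in zip(["Constant", "Income", "Parents", "Children"], coef):
    print(f"{name:<9} {value:9.3f}")
print(f"R squared {r_squared:.3f}")
```

Solving the normal equations directly is adequate for a model this small; for larger or nearly collinear designs a QR-based routine from a numerical library would be the safer choice.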
c) Test the assumptions of the regression model. What did you find? What do you recommend?
The normal probability plot of the residuals is S-shaped rather than following the diagonal line, which suggests that the residuals depart from normality. The residual plot does not show randomly scattered points, so the assumption of equal variances (homoscedasticity) is also questionable.
Log
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT Y
/METHOD=ENTER X.
SAVE OUTFILE='C:\TRANSWEB\2019\42685_assignment5.sav'
/COMPRESSED.
* Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=X Y MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: X=col(source(s), name("X"))
DATA: Y=col(source(s), name("Y"))
GUIDE: axis(dim(1), label("Average job satisfaction"))
GUIDE: axis(dim(2), label("Job Turnover Rate"))
ELEMENT: point(position(X*Y))
END GPL.
...