CSI 5810 Assignment # 4
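1. Let x1 = (4, 5)^t, x2 = (1, 4)^t, x3 = (0, 1)^t, and x4 = (5, 0)^t be four items for clustering.
Consider the following three partitions:
A. P1 = {x1, x2}, P2 = {x3, x4}
B. P1 = {x1, x4}, P2 = {x2, x3}
C. P1 = {x1, x2, x3}, P2 = {x4}
Determine the partition favored by the sum-of-square-error (SSE) clustering
criterion.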
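As a way to check a hand calculation, the SSE criterion can be sketched in a few lines of Python. This sketch assumes SSE means the sum, over clusters, of squared Euclidean distances from each item to its cluster mean; the helper names `centroid` and `sse` are illustrative and not part of the assignment.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(partition):
    """Sum over clusters of squared Euclidean distances to the cluster centroid."""
    total = 0.0
    for cluster in partition:
        c = centroid(cluster)
        total += sum((xi - ci) ** 2 for p in cluster for xi, ci in zip(p, c))
    return total


# The four items and three partitions from Question 1.
x1, x2, x3, x4 = (4, 5), (1, 4), (0, 1), (5, 0)
partitions = {
    "A": [[x1, x2], [x3, x4]],
    "B": [[x1, x4], [x2, x3]],
    "C": [[x1, x2, x3], [x4]],
}

scores = {name: sse(p) for name, p in partitions.items()}
for name, value in scores.items():
    print(f"Partition {name}: SSE = {value:.3f}")
print("Lowest SSE:", min(scores, key=scores.get))
```

The partition with the lowest SSE is the one the criterion favors; it is worth asking whether that partition also looks like the most natural grouping.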
2. Consider the following eight records; each record is described by two quantitative
attributes:
A = (2, 10)^t, B = (2, 5)^t, C = (8, 4)^t, D = (5, 8)^t, E = (7, 5)^t, F = (6, 4)^t, G = (1, 2)^t,
H = (4, 9)^t.
Your task is to apply complete link clustering to this data and produce the
dendrogram.
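The merge order needed for the dendrogram can be cross-checked with a naive agglomerative sketch. It assumes Euclidean distance and defines the complete-link distance between two clusters as the largest pairwise distance between their members. Several pairs here are equally close, so ties are broken by enumeration order and a hand solution may legitimately merge tied pairs in a different order. The function name `complete_link` is illustrative.

```python
import math
from itertools import combinations


def complete_link(points):
    """Return the list of merges as (cluster_a, cluster_b, merge_distance)."""
    clusters = [(name,) for name in points]  # start with singleton clusters
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Complete link: distance between the two farthest members.
            d = max(math.dist(points[p], points[q]) for p in a for q in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, d))
    return merges


records = {
    "A": (2, 10), "B": (2, 5), "C": (8, 4), "D": (5, 8),
    "E": (7, 5), "F": (6, 4), "G": (1, 2), "H": (4, 9),
}

merges = complete_link(records)
for a, b, d in merges:
    print(f"merge {''.join(a)} + {''.join(b)} at height {d:.3f}")
```

Each printed height is the level at which the corresponding join is drawn in the dendrogram.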
3. In this exercise, you will perform k-means clustering on wine data. You will repeat
the clustering using the following values of k: 2, 3, 4, and 5. In each case you will
determine the SSE value and calculate the value of the Rand index and tabulate your
results.
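For the Rand index, a minimal sketch is given here, assuming the usual definition: the fraction of item pairs on which the clustering and the reference class labels agree (both place the pair together, or both place it apart). The label lists in the example are made up for illustration and are not the wine data.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of item pairs treated consistently by both labelings."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical labels for six items: class labels vs. cluster assignments.
true_labels = [0, 0, 0, 1, 1, 1]
cluster_ids = [1, 1, 0, 0, 0, 0]
print(f"Rand index = {rand_index(true_labels, cluster_ids):.3f}")
```

Cluster numbering does not matter for this measure, so the k-means cluster ids need not be matched to the class labels first.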
4. In this exercise, you will build a linear predictive model to predict crime rate based
on a number of factors. The data is in the “crime-rate” file. You will build the model
by writing your own script for gradient search. Experiment with 2-3 learning rates
to see the effect of learning rate on the search.
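One possible shape for such a script is sketched below. It assumes batch gradient descent on the mean squared error of a linear model with an intercept. The toy data (`X_toy`, `y_toy`) stands in for the “crime-rate” file, whose layout is not given here, and the learning rates are arbitrary illustrations.

```python
def gradient_descent(X, y, lr, epochs):
    """Batch gradient descent for y ~ w.x + b, minimizing mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    history = []  # MSE recorded at the start of each epoch
    for _ in range(epochs):
        errors = [sum(wj * xj for wj, xj in zip(w, row)) + b - yi
                  for row, yi in zip(X, y)]
        history.append(sum(e * e for e in errors) / n)
        grad_w = [2.0 / n * sum(e * row[j] for e, row in zip(errors, X))
                  for j in range(d)]
        grad_b = 2.0 / n * sum(errors)
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b, history


# Hypothetical toy data generated from y = 2x + 1 (not the crime-rate data).
X_toy = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y_toy = [1.0, 3.0, 5.0, 7.0, 9.0]

for lr in (0.001, 0.01, 0.05):
    w, b, history = gradient_descent(X_toy, y_toy, lr, epochs=2000)
    print(f"lr={lr}: w={w[0]:.4f}, b={b:.4f}, final MSE={history[-1]:.6f}")
```

On real data with features on very different scales, standardizing the features first usually makes a single learning rate workable; a rate that is too large makes the recorded error grow instead of shrink.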
5. Build a model to predict corn yield with two independent variables, fertilizers
and insecticides. The data for this task is as follows. You will use the
pseudoinverse approach to build the model. Also calculate the R-square coefficient
to assess the model’s goodness.
Corn    Fertilizer    Insecticides
40      6             4
44      10            4
46      12            5
48      14            7
52      16            9
58      18            12
60      22            14
68      24            20
74      26            21
80      32            24
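A sketch of the pseudoinverse approach on the table above follows. It assumes the design matrix X has a leading column of ones and full column rank, in which case the pseudoinverse solution w = (X^t X)^(-1) X^t y can be obtained by solving the normal equations. The small `solve` helper is an illustrative Gaussian elimination routine; with NumPy available, `numpy.linalg.pinv` would serve the same purpose.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


corn = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80]
fertilizer = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32]
insecticide = [4, 4, 5, 7, 9, 12, 14, 20, 21, 24]

# Design matrix with an intercept column.
X = [[1.0, f, i] for f, i in zip(fertilizer, insecticide)]
XtX = [[sum(r[a] * r[b] for r in X) for b in range(3)] for a in range(3)]
Xty = [sum(r[a] * y for r, y in zip(X, corn)) for a in range(3)]
w = solve(XtX, Xty)  # [intercept, fertilizer coefficient, insecticide coefficient]

pred = [sum(wj * xj for wj, xj in zip(w, r)) for r in X]
mean_y = sum(corn) / len(corn)
ss_res = sum((y - p) ** 2 for y, p in zip(corn, pred))
ss_tot = sum((y - mean_y) ** 2 for y in corn)
r2 = 1.0 - ss_res / ss_tot

print("coefficients:", [round(v, 4) for v in w])
print("R-square:", round(r2, 4))
```

R-square is computed here as 1 - SS_res / SS_tot, the share of the variance in corn yield that the fitted model accounts for.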
6. In this exercise, you will use the 250 examples of two classes that were generated in
Q6 of Assignment #3. Each example therein had two features, x1 and x2. You
will augment these features with three new features defined as follows: feature x3 =
x1*x2, feature x4 = x1*x1, and feature x5 = x2*x2. You will then train and test a
logistic classifier with the augmented vectors. Use an 80:20 ratio to split your data into
training and test sets. Once the model is trained and tested, you will use the model
parameters to plot the decision boundary of the logistic classifier.
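The augmentation and training steps might look like the sketch below. The Assignment #3 data is not reproduced here, so the 250 examples are synthetic stand-ins (points labeled by whether they fall inside a circle), and the learning rate and epoch count are arbitrary choices. The logistic classifier is trained with plain batch gradient descent on the log-loss.

```python
import math
import random


def augment(x1, x2):
    """Map (x1, x2) to (x1, x2, x1*x2, x1*x1, x2*x2)."""
    return [x1, x2, x1 * x2, x1 * x1, x2 * x2]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, features):
    return sigmoid(b + sum(wj * fj for wj, fj in zip(w, features)))


def train_logistic(X, y, lr=0.5, epochs=1000):
    """Batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = [predict_proba(w, b, row) - yi for row, yi in zip(X, y)]
        w = [wj - lr * sum(e * row[j] for e, row in zip(errors, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errors) / n
    return w, b


# Hypothetical stand-in data: class 1 if the point lies inside a circle.
random.seed(0)
raw = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(250)]
labels = [1 if a * a + c * c < 2.5 else 0 for a, c in raw]
X_aug = [augment(a, c) for a, c in raw]

# 80:20 split (the synthetic points are already in random order).
split = int(0.8 * len(X_aug))
X_train, y_train = X_aug[:split], labels[:split]
X_test, y_test = X_aug[split:], labels[split:]

w, b = train_logistic(X_train, y_train)
accuracy = sum((predict_proba(w, b, row) >= 0.5) == bool(yi)
               for row, yi in zip(X_test, y_test)) / len(X_test)
print("weights:", [round(v, 3) for v in w], "bias:", round(b, 3))
print("test accuracy:", accuracy)
```

With the trained parameters, the decision boundary is the set of points where b + w1*x1 + w2*x2 + w3*x1*x2 + w4*x1*x1 + w5*x2*x2 = 0, which is a conic in the (x1, x2) plane; it can be drawn as the zero-level contour of that expression over a grid. Real data that is ordered by class should be shuffled before the split.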