It is application of linear algebra assignment.
8.6. EXERCISES 9 the sum is zero, meaning there are no outbound links, then it assigns each entry in the column to be the same and add to 1, which pretends that the page is linked to every other page equally. Finally we assign the matrix 0.15/pagecount*part1 + 0.85*part2 to return. 8.6 Exercises Exercise 8.1. Consider this mini-internet: P1 P2 (a) Try to rank the pages in order of importance without doing any calculation. (b) Find the pagerank of each of the pages. (c) If there are any disparities between your answer to (a) and (b) explain (if you can) what the cause of this disparity might be. Exercise 8.2. Consider this mini-internet: P1 P2 P3 P4 P5 P6 (a) Try to rank the pages in order of importance without doing any calculation. (b) Find the pagerank of each of the pages. (c) If there are any disparities between your answer to (a) and (b) explain (if you can) what the cause of this disparity might be. Exercise 8.3. Suppose that the rule that there is a 15% probability that RW jumps to a random page were removed, and instead the full 100% (instead of 85%) from each node were distributed across all outbound links. If there are no outbound links then there is still a 100% probability that RW jumps to a random page. This could lead to the possiblility that T is not regular. (a) Give an example of an internet with a non-regular T such that there is no x̄∗ such that T kx̄0 converges to x̄∗ for all x̄0. 10 CHAPTER 8. GOOGLE PAGERANK (b) Give an example of an internet with a non-regular T such that there is some x̄∗ such that T kx̄0 converges to x̄∗ for all x̄0 but that this causes serious problems with the ranking of the pages. Hint: Can you design an internet where some of the pages will end up with a pagerank of zero? Exercise 8.4. The Google Pagerank algorithm works even if the internet is disconnected. Consider this example: P1 P2 P3 P4 P5 P6 Find the pagerank of each of the pages. Exercise 8.5. Given the transition matrix T we generally find the pagerank vector by solving the eigenvector equation (A − I)x̄ = 0̄, meaning we find the eigenvector corresponding to the eigenvalue λ = 1. However it’s also possible simply to pick an arbitrary x̄0 and then find T kx̄0 for successive values of k until successive entries of T kx̄0 all differ by a predetermined number. Starting with x̄0 = [1, 0, 0, 0, 0, 0]T and using the T from the previous problem, find the smallest k so that all entries in T kx̄0 and T k−1x̄0 are equal up to and including the fourth decimal place. Exercise 8.6. In an internet with n pages all of which link to one another it makes sense that all of the pages have the same pagerank. Show that this is the case - find the matrix T and show that the vector with all entries equal (and adding to 1) is an eigenvector corresponding to eigenvalue λ = 1. Exercise 8.7. A valid question is whether the 85/15 split has an impact not just on the pagerank but on the order of the pages in terms of ranking. For example if we used 60/40 instead would a higher ranked page using 85/15 still be higher ranked using 60/40. (a) Justify informally why it seems reasonable that the order of the pages in terms of ranking would not be affected. (b) Test this assumption on the following internet by finding the pageranks using both 85/15 and 60/40 splits. 8.6. EXERCISES 11 P1 P2 P3 (c) The above internet can also be analyzed with a 1/0 split. Find the pageranks using this split. Exercise 8.8. Find the pagerank of the pages in the following internet. You will definitely want to use technology for this! P1 P2P3 P4 P5 P6 P7 P8 P9 P10 Exercise 8.9. Explain why having outbound links on your webpage will not affect your Google Pagerank. Exercise 8.10. Write down the transition matrix for an internet with n pages for which the only links are from page 1 to page 2, page 2 to page 3, ... page n− 1 to page n. Exercise 8.11. Why does the Google Pagerank produce more reasonable re- sults than simply assigning a page a ranking in accordance with the number of pages that link to it? 10 CHAPTER 9. SINGULAR VALUE DECOMPOSITION >> A=[ 1 2 0 3 1 4 1 1 -1 0 1 2 ] A = 1 2 0 3 1 4 1 1 -1 0 1 2 >> [U,S,V] = svd(A) U = -0.2079 0.7246 -0.3584 -0.5507 -0.9171 -0.1139 0.3788 -0.0501 -0.0103 0.6615 0.2666 0.7009 -0.3400 -0.1560 -0.8106 0.4506 S = 5.5200 0 0 0 2.5538 0 0 0 1.4170 0 0 0 V = -0.5380 0.4090 0.7371 -0.3049 0.7208 -0.6225 -0.7859 -0.5596 -0.2631 9.6 Exercises Exercise 9.1. Find the singular value decomposition of each of the following matrices. First do this by computing both AAT and ATA, finding the eigen- value/eigenvector pairs of each, finding the corresponding singular values, and putting the results together. Then check your answer via technology. (a) A = ⎡ ⎢ ⎢ ⎣ 1.0 2.0 −3.0 0 1.0 1.0 1.0 2.0 5.0 −1.0 0 2.0 ⎤ ⎥ ⎥ ⎦ (b) A = ⎡ ⎣ −1.0 0 2.0 2.0 2.0 0 2.0 3.0 0 1.0 1.0 2.0 −2.0 1.0 2.0 ⎤ ⎦ 10.8. EXERCISES 15 >> A=[ 1 2 0 3 1 4 1 1 -1 0 1 2 ] A = 1 2 0 3 1 4 1 1 -1 0 1 2 >> [U,S,V] = svd(A) U = -0.2079 0.7246 -0.3584 -0.5507 -0.9171 -0.1139 0.3788 -0.0501 -0.0103 0.6615 0.2666 0.7009 -0.3400 -0.1560 -0.8106 0.4506 S = 5.5200 0 0 0 2.5538 0 0 0 1.4170 0 0 0 V = -0.5380 0.4090 0.7371 -0.3049 0.7208 -0.6225 -0.7859 -0.5596 -0.2631 >> SP=S;SP(3,3)=0; >> U*SP*transpose(V) ans = 1.3743 1.6839 -0.1336 2.6044 1.3341 4.1412 0.7216 1.2351 -0.9006 0.8466 0.2851 1.6979 10.8 Exercises Exercise 10.1. Find the (vector) direction in which the following set of points have the most variance, then plot the points and the corresponding line. (1, 2), (4, 3), (−1,−3), (−5,−8) Exercise 10.2. Find the (vector) direction in which the following set of points 16 CHAPTER 10. MATRIX (DATA) APPROXIMATION have the most variance, then plot the points and the corresponding line. (1,−1), (3,−2), (6,−4), (10,−4) Exercise 10.3. For each of the following matrices find the SVD and identify the direction in which the column data is most spread out, as well as the variance and proportion of total variance in that direction. (a) A = ⎡ ⎣ 5.0 6.0 5.0 5.0 6.0 6.0 5.0 5.0 6.0 5.0 5.0 6.0 5.0 6.0 5.0 ⎤ ⎦ (b) A = ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ 0 1.0 1.0 2.0 2.0 1.0 2.0 2.0 0 1.0 7.0 8.0 8.0 8.0 9.0 8.0 9.0 9.0 8.0 7.0 7.0 7.0 7.0 9.0 7.0 ⎤ ⎥ ⎥ ⎥ ⎥ ⎦ (c) A = ⎡ ⎢ ⎢ ⎣ 4.0 5.0 0 0 0 5.0 5.0 0 0 0 0 1.0 7.0 8.0 8.0 0 0 8.0 7.0 7.0 ⎤ ⎥ ⎥ ⎦ (d) A = ⎡ ⎢ ⎢ ⎣ 2.0 3.0 0 0 0 0 0 0 1.0 0 0 0 3.0 3.0 0 0 0 0 0 0 5.0 5.0 5.0 5.0 ⎤ ⎥ ⎥ ⎦ Exercise 10.4. Given the matrix A = ⎡ ⎢ ⎢ ⎣ 2.0 3.0 0 0 0 0 0 0 1.0 0 0 0 3.0 3.0 0 0 0 0 0 0 5.0 5.0 5.0 5.0 ⎤ ⎥ ⎥ ⎦ (a) Find the singular value decomposition of A. (b) Find an approximation A′ to A preserving the largest two singular values only. What proportion of the total variance is preserved? Exercise 10.5. Given the matrix A = ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ 1 2 −1 0 5 0 1 1 −1 6 −1 0 −2 1 4 2 2 1 2 −5 2 1 0 0 −5 ⎤ ⎥ ⎥ ⎥ ⎥ ⎦ (a) Find the singular value decomposition of A. 10.8. EXERCISES 17 (b) Find an approximation A′ to A preserving the largest three singular values only. What proportion of the total variance is preserved? Exercise 10.6. Find an approximation to the following matrix which preserves at least 95% of the total variance using as few singular values as possible: A = ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 0 0 1 0 0 1 0 0 1 1 1 0 1 0 1 1 0 0 1 0 1 1 0 1 1 1 1 0 1 0 1 0 1 0 1 0 1 1 1 0 1 ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ Exercise 10.7. Consider the points stored in the columns of this matrix: ⎡ ⎣ 2.0 −0.1 5.9 2.3 2.9 −5.1 1.9 3.6 3.2 4.8 −2.2 5.5 0.9 −1.7 3.0 3.6 17.0 13.0 1.5 21.0 3.7 −18.0 15.0 15.0 ⎤ ⎦ These points mostly lie close to a plane through the origin. (a) Find the equation of this plane by finding the two directions with the most variance, finding a normal vector for the plane, then finding the equation. (b) Using this equation predict which z-value would correspond to x = 10 and y = −3. Exercise 10.8. Consider the points stored in the columns of this matrix: ⎡ ⎣ 1.9 0.9 2.6 6.1 3.4 −0.5 4.5 6.0 5.3 11.0 3.9 7.0 8.2 9.2 12.0 2.6 −0.5 1.6 −1.6 −13.0 −10.0 3.3 −18.0 −8.8 ⎤ ⎦ These points mostly lie close to a plane but not through the