Download AB125166 and X68309 from the NCBI nucleotide
databases (see Appendix B for the URL). Trim each sequence to its first
1000 positions and save it in FASTA format. Then perform a Smith-Waterman
local alignment using the resources at http://www.cmb.usc.edu/, setting the
mismatch and gap parameters to 1000 and requesting the top 100 alignments.
a. How many alignments are there of length t ≥ 8?
b. Use the expression for λ in Section 7.4.2 to compute the expected number
of alignments of length at least 8 for sequences of this size (see Exercise 4
for computing p).
c. Use R to simulate ten pairs I and J of iid sequences having the same base
compositions as in the first 1000 nucleotides of AB125166 and X68309.
Then perform the Smith-Waterman alignment on each of the 10 pairs,
and calculate the average number of alignments of length at least 8. Compare
your result with those from part a and part b above, and explain
agreements or disagreements.
d. The probability that there is a local alignment of length t or more is
approximately
$$1 - \exp\left[-nm(1-p)\,p^{t}\right].$$
Calculate this probability (called a p-value) with t equal to the optimal
alignment score from part a. What do you conclude from this p-value?
Explain your answer carefully.
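The arithmetic behind parts b and d can be sketched as follows, in Python rather than R. This is only an illustration under stated assumptions: it takes λ from Section 7.4.2 to be the exponent nm(1 − p)p^t that appears in part d, and it takes p to be the probability that two independently drawn letters match, i.e. the sum over bases of the product of the two base frequencies. The function names and the uniform base frequencies are hypothetical placeholders, not values measured from AB125166 or X68309.

```python
import math

BASES = "ACGT"

def match_probability(freqs_i, freqs_j):
    """Probability that one letter drawn from each composition is the same base.

    Assumption: this is the p meant in Exercise 4 (independent letters).
    """
    return sum(freqs_i[b] * freqs_j[b] for b in BASES)

def expected_count(n, m, p, t):
    """Assumed form of lambda: n * m * (1 - p) * p**t."""
    return n * m * (1 - p) * p ** t

def p_value(n, m, p, t):
    """Approximate probability of a local alignment of length t or more."""
    return 1 - math.exp(-expected_count(n, m, p, t))

# Placeholder compositions: replace with the observed base frequencies
# of the two trimmed sequences.
uniform = {b: 0.25 for b in BASES}

p = match_probability(uniform, uniform)
lam = expected_count(1000, 1000, p, 8)
print("p =", p)
print("lambda =", lam)
print("p-value for t = 8:", p_value(1000, 1000, p, 8))
```

With the real base frequencies substituted for the placeholder dictionary, the same three functions give the value to compare against the count from part a and the simulation average from part c.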