Project description
In this project, we will first expand the epidemic model to include a cyclical and random component. Specifically, we will model a cyclical viral escape from immunity and randomly draw the transmission rate parameter of the new viral strain(s). Second, we will write functions to both transcribe and translate DNA. Then, along with some helper function, we will use these to calculate amino acid distances between SARS-CoV-2 strain spikes, as in assignment 08, except this time we will do it for protein (amino acid) sequences, which are the focus of immune response and evasion (and so the real crux of the problem). Finally, we will transcribe the viral sequence, given as DNA, remove alignment marks that were helpful in previous steps, and output a list with sequences of a trivalent vaccine, containing the original strain, and two variants of concern (in "BA.2" and "BA.5" lineages).
You may wish to make use of functions you already wrote for earlier lab assignments. In order to submit your project as one file, copy and paste any code you are reusing inside your submission file for this project before turning it in.
Part 1: Random fluctuation in SIRVD
Until now, our models have only allowed a single bout of the disease and its local spread. Now, using a previoussirvd_sim()
function as the starting point, the objective of the first part of the project is to model the cyclical arrival of new strains.
(a)Beginning with the pieces provided in the template, write a new function namedsirvd_sim
, implementing your SIRVD simulation code similar to Assignment 08, and as illustrated below.
Briefly, we assume that our population has baseline values of the transmission parameter,beta=0.3
, the recovery parametergamma=0.1
, the death rate among infected populationmu=0.01
, and vaccination parameter ofnu=0.1
(be careful with the transliteration of the Greek letters mu and nu!). It should also specify the default time steps to be simulated ast=200
.
The new bit here is a new default argumentevolve=True
, which takes on a boolean value (True/False). It will be used to allow invasion of a newly evolved strain that can evade immunity. The function will now takesixnumeric parameters (four floats, an integer, and a boolean) and still output a list of five lists (of floats; one for each compartment).
Initially, our model population will again contain starting fractions ofs=1-i
,i=0.001
, and0
for all other compartments at time step 0.
The big difference is that, if "evolve=True", in the 200th generation and each subsequent 200th generations (200, 400, 600, …, given thatt
is sufficiently large), the function will return all resistant and vaccinated individuals to susceptible, modeling loss of previously acquired immunity. It will also call another functionnew_strain()
to determine the new beta.
(b)Write a new functionnew_strain
, which will take no input and return a float. Specifically, it will draw a random uniform value between0.0
and0.4
and return it. In yoursirvd_sim
function, this will become the new beta for the next 200 time steps, if the simulation goes that far. Otherwise, thesirvd_sim
will proceed as before, tracking five compartments,s
,i
,r
,v
, andd
.
The included plotting functions are super-useful for checking your outcomes.
(c)Run your newsirvd_sim
function with the following parameter values (below), while checking the previously described default arguments):
sim1a = sirvd_sim(nu=0.3,t=100, evolve=False)
sim1b = sirvd_sim(nu=0.3,t=100)
sim2a = sirvd_sim(nu=0.3,t=1000, evolve=False)
sim3a = sirvd_sim(nu=0.3,t=1000)
sim3b = sirvd_sim(nu=0.3,t=1000)
sim3c = sirvd_sim(nu=0.3,t=1000)
Note that the first two simulations should be identical. Why? Prior to the 200th generation, the function will rely on the same, defaultbeta
values. Only in generationt=201
will this change, after the first call is made tonew_strain
. Explore other parameter values and take stock of ways in which our simulations are realistic or depart from reality. Some parameters sets will yield wild results and are worth playing with.
Part 2: Viral strain divergence and vaccine design
As we saw previously, the initial vaccines targeting SARS-CoV-2 were developed based on the sequence of the first sequenced strain, isolated in Wuhan, China (designated WH Hu-1). For example, both successful mRNA vaccines, Moderna (mRNA-1273) and Pfizer-BioNTech (BNT162b2), consist of the entire sequence of the spike protein, lining the outer surface of a free SARS-CoV-2.
The design of new multivalent vaccines, which include spikes from multiple strains of this viral group, will likely proceed by cyclical inclusion of new strain spikes. Two such strains, variants of concern (BA.2.10.1 from Thailand and BA.5.1.5 from Maine) are again included here.
You are providedread_fasta()
, a function whose utility you are to interpret and a file namedspikes_FA2022_aln.fasta
with three spike RNA sequences: Wuhan Hu-1 and two current variants of concern. The sequences do not have equal lengths and are aligned for you with dashes ("-") inserted to help fill gaps and line up homologous regions, those with similarity due to shared ancestry. As it happens, the Wuhan Hu-1 sequence does not have any gaps.
We will in several layers of short programs that will calculate amino acid distances between SARS-CoV-2 strain spikes, which are the focus of our immune response and viral evolution.
We will also use the viral sequence, given as DNA, remove alignment marks helpful in previous steps, and return a list of three sequences of mRNA (sufficient for trivalent vaccine), containing the original strain, and two variants of concern.
(a)Write a functiontranscribe
, which takes as its input a single string (DNA sequence) and returns the RNA sequence (string), after substituting "T" with "U" characters.
(b)Alter the provided translation RNA-amino acid table,RNA_AA_tab
, to that it can handle gaps in your DNA alignment. The gaps are denoted with "-", so that "---" codons should become "-" amino acids.
(c)Write a function translate, which takes as its input an RNA sequence (string) and the modified translation table
RNAAAtab(to handle gaps/"---") and returns the translated amino acid (protein) sequence. Your function will slice each codon (RNA triplet) in the input sequence to find a matching dictionary item and replace it with a single-letter amino acid. Stop codons (given as
*` in the translation table) should be omitted, not included/returned at the end of the sequence.
(d)Next, write or re-use a function namedhamming
to calculate the Hamming distance (integer) between two strings. It should take two strings as inputs and return the computed number of characters that differ at the same index position between those strings (assumed of equal length).
(e)Write a functionaa_dist
, which takes two dna sequences (strings with "ATGC-"), transcribes, and translates each, and returns their Hamming distance (integer).
(f)Write a functionmake_vax
, which takes a list of arbitrary number of dna sequences, and both removes gaps ("-" characters) and transcribes the sequence to mRNA. It returns a list with gap-less mRNA sequences of the same length as the input list.
(g)Finally, create five global variables with the contents as described here: (i)dna
, with all three spike DNA sequences fromspikes_FA2022_DNA_aln.fasta
, (ii)vaccines
, with gap-less mRNA sequences, (iii) and three variableshu1_ba5
,hu1_ba2
,ba5_ba2
, which are to hold integer values of Hamming distances between the protein sequences of Wuhan Hu-1 vs Maine BA.5, Wuhan Hu-1 vs Thailand BA.2, and Maine BA.5 vs Thailand BA.2, respectively.
-------------------
Your sequences stored in the variablevaccines
(hopefully) are very nearly the contents of the current "bivalent" vaccine.
The spike protein is a homotrimer--it consists of three identical subunits (monomers). It initially locks into an energetically unfavorable conformation, on the outside of the coronavirus. However, when the receptor-binding subunit engages animal ACE2, its structure changes profoundly, allowing fusion and infection. A solid association between the subunits would not allow receptor binding and virus-cell fusion. Virus evolution creates a "sweet spot," a trimer with just the right degree of stability. Unfortunately, in doing so, evolution also creates substantial problems for the designers of vaccines: these initial, wobbly spike proteins are a poor target for our immune system antibodies. We would potentially generate antibodies to the wrong protein conformation.
Consequently, vaccine design of both mRNA-1273 and BNT162b2 includes a pair of mutations in Wuhan Hu-1 strain: K986P/V987P. Two prolines are added at positions 986 and 987 (instead of a lysine and a valine), which freezes the spike in its initial shape, providing a solid target for our antibodies. That is the sole difference between your sequences and those in vaccine circulation.