

Language: C++. You may only use if/else statements, for and while loops, arrays, functions, and vectors.


#include#include#include#include#include#include


The first file ecoli.fa is a FASTA file which contains the DNA sequence data. Here is an excerpt from the file:


>Chromosome dna_rm:chromosome chromosome:ASM584v2:Chromosome:1:4641652:1 REF
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTC
TGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGG
TCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTAC
ACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGT
AACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAGCCCGCACCTGACAGTGCGGG
CTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAAGTTCGGCGGT
ACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC
AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTG
GCGATGATTGAAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAA
CGTATTTTTGCCGAACTTTTGACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCG
CAATTGAAAACTTTCGTCGATCAGGAATTTGCCCAAATAAAACATGTCCTGCATGGCATT
AGTTTGTTGGGGCAGTGCCCGGATAGCATCAACGCTGCGCTGATTTGCCGTGGCGAGAAA
ATGTCGATCGCCATTATGGCCGGCGTATTAGAAGCGCGCGGTCACAACGTTACTGTTATC
GATCCGGTCGAAAAACTGCTGGCAGTGGGGCATTACCTCGAATCTACCGTCGATATTGCT
GAGTCCACCCGCCGTATTGCGGCAAGCCGCATTCCGGCTGATCACATGGTGCTGATGGCA
GGTTTCACCGCCGGTAATGAAAAAGGCGAACTGGTGGTGCTTGGACGCAACGGTTCCGAC
TACTCTGCTGCGGTGCTGGCTGCCTGTTTACGCGCCGATTGTTGCGAGATTTGGACGGAC
GTTGACGGGGTCTATACCTGCGACCCGCGTCAGGTGCCCGATGCGAGGTTGTTGAAGTCG
ATGTCCTACCAGGAAGCGATGGAGCTTTCCTACTTCGGCGCTAAAGTTCTTCACCCCCGC
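The main function further down calls readFastaFile, but the statement does not show its body. The following is only a minimal sketch of what such a helper might look like, assuming the header line always starts with '>' and that every other line holds sequence data. The body is an assumption, not the course's reference implementation.

```cpp
#include <fstream>
#include <string>
using namespace std;

// Hypothetical body for readFastaFile: joins all sequence lines into one string.
// Lines starting with '>' are treated as headers and skipped (assumption).
string readFastaFile(string filename) {
    ifstream file(filename);
    string sequence = "";
    string line;
    while (getline(file, line)) {
        if (line.size() > 0 && line[0] != '>') {
            sequence += line;
        }
    }
    // If the file could not be opened, the loop never runs and "" is returned.
    return sequence;
}
```

Stray characters such as a trailing carriage return are left in the string here, since transcribe is required to ignore anything other than A, T, C, and G anyway.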


The second file in the project folder is a CSV file named codon_table.csv which contains the codon list. Here is an excerpt from the file:


Codon   AA.Abv   AA.Code   AA.Name
UUU     Phe      F         Phenylalanine
UUC     Phe      F         Phenylalanine
UUA     Leu      L         Leucine
UUG     Leu      L         Leucine
CUU     Leu      L         Leucine

In the above table, AA.Abv is the abbreviation of the amino acid, AA.Code is the code for the amino acid, and AA.Name is its full name. There are 64 codons in the file. One amino acid can be represented by multiple codons, all of which produce the same amino acid. For example, both UUU and UUC are translated as phenylalanine.
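One way to hold the codon table with only vectors and loops is a pair of parallel vectors searched linearly. The sketch below assumes the CSV rows are comma-separated in the column order shown above (Codon, AA.Abv, AA.Code, AA.Name). The names codons, codes, splitCsvLine, addCodonRow, and lookupCode are hypothetical and do not come from the assignment.

```cpp
#include <string>
#include <vector>
using namespace std;

// Hypothetical globals: codons[i] is translated to codes[i].
vector<string> codons;
vector<string> codes;

// Splits one CSV line on commas, e.g. "UUU,Phe,F,Phenylalanine".
vector<string> splitCsvLine(string line) {
    vector<string> fields;
    string field = "";
    for (char c : line) {
        if (c == ',') {
            fields.push_back(field);
            field = "";
        } else if (c != '\r') {  // drop a Windows line ending if present
            field += c;
        }
    }
    fields.push_back(field);
    return fields;
}

// Stores the codon (column 0) and the amino acid code (column 2) of one row.
void addCodonRow(string line) {
    vector<string> fields = splitCsvLine(line);
    if (fields.size() >= 3) {
        codons.push_back(fields[0]);
        codes.push_back(fields[2]);
    }
}

// Returns the amino acid code for a codon, or "" if the codon is not in the table.
string lookupCode(string codon) {
    for (size_t i = 0; i < codons.size(); i++) {
        if (codons[i] == codon) {
            return codes[i];
        }
    }
    return "";
}
```

A readCsvFile function could call addCodonRow for every line after the header row. With only 64 rows, a linear search is fast enough.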


Write a function transcribe(dna_string) that creates the mRNA string from the DNA string. Each base in dna_string must be matched to its corresponding mRNA base. The DNA string may contain characters other than A, T, C, and G; these should be ignored. The only valid matchings are A -> U, T -> A, G -> C, and C -> G.


string transcribe(string dna_string) {
    // this function must take the DNA string and construct a new mRNA string,
    // then return the mRNA string
}
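A minimal sketch of how the transcribe skeleton might be filled in, using only a loop and if/else as the rules require. It assumes the bases are uppercase; lowercase letters would be ignored like any other unexpected character.

```cpp
#include <string>
using namespace std;

// Builds the mRNA string base by base: A -> U, T -> A, G -> C, C -> G.
// Any other character (newlines, N, lowercase letters, ...) is skipped.
string transcribe(string dna_string) {
    string mrna_string = "";
    for (char base : dna_string) {
        if (base == 'A') {
            mrna_string += 'U';
        } else if (base == 'T') {
            mrna_string += 'A';
        } else if (base == 'G') {
            mrna_string += 'C';
        } else if (base == 'C') {
            mrna_string += 'G';
        }
    }
    return mrna_string;
}
```

For example, the DNA fragment TACCTGCCG from the excerpt transcribes to AUGGACGGC.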


Write a function translate that accepts the mRNA string as a parameter and creates a string vector of proteins. Each item in the vector is a string that consists of the amino acid codes of one protein. The function must return the protein vector as its result.


Each protein's amino acid sequence starts with M (Methionine), the starting amino acid, and ends with a Stop codon. So the function should:


look for mRNA sequences that start with the AUG codon;

detect the end (a UAG, UGA, or UAA codon);

in between, identify the corresponding amino acids for the codons to construct the protein;

save the protein string in the vector (use the push_back function).


vector<string> translate(string mrna_string) {
    // create a protein vector and return it.
}
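The steps above can be sketched as follows. This is an illustration under several assumptions, not the reference solution: the codon table lives in two hypothetical global vectors (codons and codes) that readCsvFile is assumed to fill, stop codons are assumed to carry the code "Stop" (as the expected output suggests), scanning resumes right after each stop codon, and a protein with no stop codon before the end of the string is discarded.

```cpp
#include <string>
#include <vector>
using namespace std;

// Hypothetical globals, assumed to be filled by readCsvFile.
vector<string> codons;
vector<string> codes;

// Linear search in the codon table; returns "" for an unknown codon.
string lookupCode(string codon) {
    for (size_t i = 0; i < codons.size(); i++) {
        if (codons[i] == codon) {
            return codes[i];
        }
    }
    return "";
}

bool isStopCodon(string codon) {
    return codon == "UAG" || codon == "UGA" || codon == "UAA";
}

vector<string> translate(string mrna_string) {
    vector<string> proteins;
    size_t i = 0;
    while (i + 3 <= mrna_string.size()) {
        if (mrna_string.substr(i, 3) == "AUG") {
            // Read codon by codon from the start codon until a stop codon.
            string protein = "";
            bool finished = false;
            size_t j = i;
            while (!finished && j + 3 <= mrna_string.size()) {
                string codon = mrna_string.substr(j, 3);
                protein += lookupCode(codon);
                finished = isStopCodon(codon);
                j += 3;
            }
            if (finished) {
                proteins.push_back(protein);
            }
            i = j;  // continue scanning after the stop codon
        } else {
            i++;    // no start codon here, move one base forward
        }
    }
    return proteins;
}
```

Applied by hand to the start of the FASTA excerpt with the standard codon table, this scanning rule gives MDGTHLILKStop and MKLVISVSRVCLFLMSHVLStop, which matches the first two lines of the expected output. An AUG inside an open protein is read as an ordinary M and does not start a new protein.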


Use the following print and main functions to connect the processes and print the resulting protein vector:


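void print_protein_list(vector<string> list) {
    int index = 0; // loop body reconstructed to match the expected output format below (assumption)
    for (string line : list) { cout << index << " -> " << line << endl; index++; }
    cout << endl;
}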

int main() {
    string dnastring = readFastaFile("ecoli.fa");
    readCsvFile("codon_table.csv");
    string mrnastring = transcribe(dnastring);
    vector<string> protein_list = translate(mrnastring);
    print_protein_list(protein_list);
    return 0;
}


The first few lines of the output should look like this:


0 -> MDGTHLILKStop
1 -> MKLVISVSRVCLFLMSHVLStop
2 -> MVVVVVMVSIATPDCACPLCLFSGVDCHARKKKLVSIAPLLVRSQLQAAMStop
3 -> MGYSRYGLHKNGLKTALSGGGSAPRATALTFESSStop
4 -> MTIARPAFStop
5 -> MELRWQLStop
6 -> MRRRHDRRTNARLTTLStop
7 -> MDAGRSPRATLQQLQLQDGPSLPRKDEAAISRSGGVVMGVAGQGLGNGLIFMAFRSSWSMRVTTVGTTSAStop
8 -> MRStop
9 -> MSStop
10 -> MDLDFLPNDLGDRHCLADRStop
11 -> MSSLRQVMASPIASQTLTAATATATRRPRStop
12 -> MHRRHNGStop


....

Nov 25, 2021