11/24/2019 Assignment #2 - click here - CCPS393 310 - Introduction to Unix, C, & C++ - F2019
https://courses.ryerson.ca/d2l/le/content/313352/viewContent/2555135/View

CPS393 - Assignment #2
DNA Decoder

Assignment Description

Write a C program which decodes a sequence of DNA. The alphabet of DNA consists of four letters (bases): A, C, G, and T. Each group of three bases (a codon) codes for one of 20 amino acids. Use a DNA codon table such as: https://en.wikipedia.org/wiki/DNA_codon_table

Example: the sequence AGT codes for Serine.
Example: the sequence GCA codes for Alanine.

Note that because there are 64 possible 3-base sequences (codons) but only 20 amino acids, many amino acids have more than one codon. For example, the sequences TTT and TTC both code for Phenylalanine.

Key Requirements

- Your program will read from a file specified as a command-line argument (i.e. do not prompt for the file name).
- Your program will read characters one at a time from the file (probably with fgetc(), the file-stream counterpart of getchar(), since getchar() itself reads only standard input; see ihypress sec 2 example 4: https://www.ihypress.net/programming/c/prog.php?chap=02&pgm=04).
- Print the corresponding amino acids to standard output.
- Your program must work with an input file of any length. In particular, your program must not read the entire data file into memory or use temporary files holding essentially the same data. (Such a temporary file would double the storage requirement -- a poor, unnecessary solution.) Because you can identify a codon after reading three characters, do not store more than three characters at a time. A data file could potentially contain billions of bases, as in the human genome.
- For the input: Recognize upper or lower case bases [ACGTacgt] and ignore (skip) all other characters (e.g. space, newline, punctuation, etc.).
- For the output: Use the standard 1-letter abbreviation for the amino acid. (Don't make up your own.)
- Note that the input may have *any* number of bases, not necessarily an exact multiple of 3.
- If the number of bases in the input stream is not an exact multiple of 3, still include the leftover bases in the total number of bases.
- If you encounter a Stop codon (e.g. TAA), print an asterisk (*) for the amino acid but keep processing until the end of file. For simplicity, include a Stop in the count of total amino acids.
- Finally, print the total number of bases processed and the number of amino acids decoded.
- Avoid embedded hard-coded constants within your code (use a #define instead).

Make your own data file(s) for your own testing. The instructor/TA will use his/her own data at the time of evaluation, or you will be given instructions on the required data file. I am providing one sample data file as follows:

~lhiraki/open/ccps393/dnasample1.txt

If you correctly decode the file, you should see a medical message with a slight misspelling.

Sample Executions (not exhaustive):

Sample #1
If file yourdata1.txt contains:
CGTT AaAAG

./dnadecode yourdata1.txt
R*K
Total number bases processed: 9
Total number of amino acids decoded: 3

Sample #2
If file yourdata2.txt contains:
GGgaA Z TT

./dnadecode yourdata2.txt
GN
Total number bases processed: 7
Total number of amino acids decoded: 2

Implementation recommendations

1. You may hard-code the DNA Codon Table and Amino Acid abbreviations in the form of a character table.
2. Use a table look-up method to translate the 3-base codon to the corresponding amino acid.
3. Read one character at a time (checking for end-of-file after each), rather than in groups of 3, in case the number of bases in the file is not an exact multiple of 3.

Grading Scheme

Functionality, accuracy, and completeness (8):
- correct translation of codons
- correct total base count (within a few, leniency given here)
- correct total amino acid count
- ability to handle input file of any length
- command line argument support
- proper program logic (no premature aborts, etc.)
- no embedded hard-coded constants (e.g. data structure sizing, etc.)

Documentation & Style (2):
- adheres to documentation guidelines (See Content -> Assignments -> Widget Corp Style Guide on course website)
- self-describing variable names (result, operand, etc. NOT a, b, c, etc.)
- must pass Functionality section (>4/8) in order to count

Penalties:
- wrong translation of codon to amino acid (-3 each)
- program would not handle input of any length (-3)
    - loads input file into local data structure
    - uses temporary file containing essentially the original data file
- command-line argument not properly supported/implemented (-2)
- incorrect base count at end (-1) / missing altogether (-2)
- incorrect amino acid count at end (-1) / missing altogether (-2)
- each hard-coded constant (-1 per instance, max -2)
- fail to clean up temp files (if used): -1

Bonus: Code readability - indentation (+1)
- logic block level consistently represented
- indent level uses identical number of spaces (e.g. 3 spaces for each level, never mix with 2 or 4, etc.)
- no tabs allowed
- at most 1 infraction to still qualify for bonus
- must pass Functionality section (>4/8) in order to count

If you wish to be considered for the bonus, you must request this at the time of presentation -- "I am applying for the bonus".
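
The table look-up suggested under Implementation recommendations can be sketched roughly as follows. This is only one possible approach, not the required one: it assumes the bases are ordered T, C, A, G so that a codon maps to an index from 0 to 63 in a 64-character string of the standard genetic code. The names AMINO_TABLE, base_index, codon_to_amino, and UNKNOWN_AMINO are made up for this illustration.

```c
#include <ctype.h>

#define BASES_PER_CODON 3
#define NUM_BASES 4
#define INVALID_BASE (-1)
#define UNKNOWN_AMINO '?'   /* assumption: returned for a codon with a non-base character */

/* Assumed ordering of the bases within the table: T, C, A, G. */
enum { BASE_T, BASE_C, BASE_A, BASE_G };

/* Standard genetic code as 64 one-letter abbreviations ('*' = Stop),
   laid out so that index = 16*first + 4*second + third. */
static const char AMINO_TABLE[] =
    "FFLLSSSSYY**CC*W"   /* first base T */
    "LLLLPPPPHHQQRRRR"   /* first base C */
    "IIIMTTTTNNKKSSRR"   /* first base A */
    "VVVVAAAADDEEGGGG";  /* first base G */

/* Map one base (upper or lower case) to its position in the ordering. */
static int base_index(int base)
{
    switch (toupper(base)) {
    case 'T': return BASE_T;
    case 'C': return BASE_C;
    case 'A': return BASE_A;
    case 'G': return BASE_G;
    default:  return INVALID_BASE;
    }
}

/* Translate a 3-base codon to its 1-letter amino acid abbreviation. */
static char codon_to_amino(const char codon[BASES_PER_CODON])
{
    int table_index = 0;

    for (int position = 0; position < BASES_PER_CODON; position++) {
        int digit = base_index((unsigned char)codon[position]);
        if (digit == INVALID_BASE)
            return UNKNOWN_AMINO;
        /* Treat the codon as a 3-digit number in base NUM_BASES. */
        table_index = table_index * NUM_BASES + digit;
    }
    return AMINO_TABLE[table_index];
}
```

With this layout, codon_to_amino("CGT") gives 'R', codon_to_amino("TAA") gives '*', and codon_to_amino("aag") gives 'K', matching Sample #1. Check any table you hard-code against the DNA codon table before relying on it.
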
You will be given a provisional grade at the time of the Customer Presentation. As an option, you may make corrections and present again once and only once. If you choose to make this second presentation, your assignment mark will be the average of the first and second presentations. (Example: First presentation 4/10; second presentation (with corrections) 8/10; therefore your assignment mark would be 6/10.)

Submission Requirements

Part 1: Customer Presentation
You must demonstrate your program in the lab on or before the assignment due date.

Part 2: Code Submission into Brightspace
To submit your code:
1. In this course shell, select Assessment -> Assignments.
2. Click on the assignment you wish to submit, e.g. "Assig #2".
3. Follow the submission instructions from that point.
If you make a mistake, you may resubmit before the deadline without late penalty.

Supplementary Notes if working in a group

Appoint a Group Leader. List the names and student numbers of all group members in:
1. each source file (at the top as in the Documentation Guidelines)
2. the Comment submission box (at the time of submission)
Example: Our group consists of 3 members:
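
As a separate illustration of the Key Requirements (read one character at a time, skip anything outside [ACGTacgt], store at most three characters), the reading loop might look something like the sketch below. It only counts bases and complete codons. The names StreamTotals, CodonHandler, is_base, and scan_bases are hypothetical, and the callback is just one assumed way to hand each complete codon to a translation step.

```c
#include <ctype.h>
#include <stdio.h>

#define BASES_PER_CODON 3

/* Running totals reported at the end of the input. */
typedef struct {
    long bases_seen;    /* every valid base, including leftovers */
    long codons_seen;   /* complete groups of BASES_PER_CODON bases */
} StreamTotals;

/* Hypothetical callback: called once for each complete codon. */
typedef void (*CodonHandler)(const char codon[BASES_PER_CODON]);

/* Return nonzero for A, C, G, or T in either case. */
static int is_base(int ch)
{
    switch (toupper(ch)) {
    case 'A': case 'C': case 'G': case 'T':
        return 1;
    default:
        return 0;
    }
}

/* Read the stream one character at a time, never holding more than
   BASES_PER_CODON characters, and check for end-of-file after each read. */
static StreamTotals scan_bases(FILE *input, CodonHandler on_codon)
{
    StreamTotals totals = {0, 0};
    char codon[BASES_PER_CODON];
    int filled = 0;
    int ch;

    while ((ch = fgetc(input)) != EOF) {
        if (!is_base(ch))
            continue;                 /* skip spaces, newlines, punctuation, etc. */

        codon[filled++] = (char)toupper(ch);
        totals.bases_seen++;

        if (filled == BASES_PER_CODON) {
            totals.codons_seen++;
            if (on_codon != NULL)
                on_codon(codon);      /* e.g. translate and print the amino acid */
            filled = 0;
        }
    }
    /* Leftover bases (filled > 0) are already in bases_seen
       but do not form a codon. */
    return totals;
}
```

A main() function would typically open the file named in argv[1] with fopen(), report an error if that fails, pass the resulting stream to a function like scan_bases, and then print the two totals. For the Sample #2 input (GGgaA Z TT), this sketch counts 7 bases and 2 complete codons.
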