HOMEWORK INSTRUCTIONS
Homework #3 DNA Sequencing Problem
Introduction to Python Programming
West Virginia University
Version 4.0, Modified 11/13/2019

Background Information

Deoxyribonucleic acid (DNA) is a molecule that carries the genetic instructions used in the growth, development, functioning, and reproduction of all known living organisms and many viruses. The two DNA strands are composed of simpler units called nucleotides. Each nucleotide contains one of four nitrogen-containing nucleobases: cytosine (C), guanine (G), adenine (A), or thymine (T). In a gene, the DNA that is transcribed is divided into two kinds of sections: coding regions (exons) separated by noncoding regions (introns).

Problem Statement

In this assignment, students will investigate how to find introns and exons in DNA sequences.

Instructions

IMPORTANT: Complete the steps below in the order they are given. Completing the steps out of order may complicate the assignment or produce an incorrect result.

1. Download and extract the provided Data Files ZIP file into the same directory as your source code. It contains the following file for use in this assignment:
   a. dna.txt – Sample DNA sequence [1].
2. Create a new Python script named lastname_firstname_hw3_dsp.py.
3. At the beginning of your script, insert a comment block containing the following information:
      # Firstname Lastname
      # Date
      # Computer Science 293B Section YY
      # Homework #3: DNA Sequencing Problem
4. We wish to read data from the text file.
   a. Create a function named read_data(). It should take the following parameter:
      Parameter Name   Type     Description
      filename         String   Name of file to read. Optional parameter with a default value of dna.txt.
   b. In the body of the read_data() function, perform the following tasks:
      i.
         Use a try/except block to provide error handling while performing file I/O.
         (1) In the body of the try block, perform the following tasks:
             (a) Read the entire contents of the text file specified by the filename variable into a list named data.
         (2) In the event of an IOError exception, perform the following tasks:
             (a) Print “File not found.”.
      ii. Return the data variable.
5. We need to calculate the percentage of the DNA sequence that is composed of A or T nucleotides.
   a. Create a function named get_dna_stats(). It should take the following parameter:
      Parameter Name   Type     Description
      dna_strand       String   DNA sequence to process.
   b. In the body of the get_dna_stats() function, perform the following tasks:
      i. Return the percentage of characters in the dna_strand variable that are As and Ts. You can calculate the A and T nucleotide content using the following formula:
         AT content = (number of A characters + number of T characters) / (length of dna_strand)
6. We also wish to find the complement of a given DNA sequence.
   a. Create a function named get_dna_complement(). It should take the following parameter:
      Parameter Name   Type     Description
      dna_strand       String   DNA sequence to process.
   b. In the body of the get_dna_complement() function, perform the following tasks:
      i. Create the following variable:
         Variable Name    Type     Initial Value(s)
         dna_complement   String   “”
      ii. Use a loop to iterate through each character in the dna_strand variable. Each loop iteration must do the following:
         (1) Append a new character to the dna_complement variable based on the value of the current iteration’s nucleotide character:
             DNA Nucleotide   DNA Complement
             A                T
             T                A
             G                C
             C                G
      iii. Return the dna_complement variable.
7. We want to print the DNA sequence and its complement.
   a. Create a function named print_dna().
      It should take the following parameter:
      Parameter Name   Type     Description
      dna_strand       String   DNA sequence to process.
   b. In the body of the print_dna() function, perform the following tasks:
      i. Call get_dna_complement() and store the result in a variable named dna_complement. Specify the following parameter:
         Parameter Name   Value(s)
         dna_strand       dna_strand
      ii. Use a loop to simultaneously iterate through the dna_strand and dna_complement strings. Each loop iteration must do the following:
         (1) Print a statement listing the current DNA strand value and its complement.
      iii. Format each printed statement as the nucleotide and its complement separated by an equals sign, following this example: “A = T”.
8. We want to determine the corresponding RNA sequence from a DNA sequence.
   a. Create a function named get_rna_sequence(). It should take the following parameter:
      Parameter Name   Type     Description
      dna_strand       String   DNA sequence to process.
   b. In the body of the get_rna_sequence() function, perform the following tasks:
      i. Create the following variable:
         Variable Name    Type     Initial Value(s)
         rna_complement   String   ""
      ii. Use a loop to iterate through each character in the dna_strand variable. Each loop iteration must do the following:
         (1) Append a new character to the rna_complement variable based on the value of the current iteration’s nucleotide character:
             DNA Nucleotide   RNA Complement
             A                U
             T                A
             G                C
             C                G
      iii. Return the rna_complement variable.
9. Exons are parts of genes that encode the final RNA sequences [2]. We wish to extract specific exons from larger DNA sequences.
   a. Create a function named extract_exon(). It should take the following parameters:
      Parameter Name   Type     Description
      dna_strand       String   DNA sequence to process.
      start            Int      Starting position in the DNA sequence.
      end              Int      Ending position in the DNA sequence.
   b. In the body of the extract_exon() function, perform the following task:
      i. Return a string containing the portion of the dna_strand variable between the start and end positions, inclusive.
10. We want to calculate what percentage of the DNA sequence is composed of the given exons.
   a. Create a function named calculate_exon_pctg(). It should take the following parameters:
      Parameter Name   Type     Description
      dna_strand       String   DNA sequence to process.
      exons            List     Exons to compare.
   b. In the body of the calculate_exon_pctg() function, perform the following tasks:
      i. Create the following variable:
         Variable Name   Type   Initial Value(s)
         exons_length    Int    0
      ii. Use a loop to iterate through the exons list. Each loop iteration must do the following:
         (1) Add the length of the current list element to exons_length.
      iii. Return the percentage of the dna_strand variable that is composed of elements in the exons list. You can calculate the percentage using the following formula:
         exon percentage = exons_length / (length of dna_strand)
11. We want to format the DNA sequence with exon regions in uppercase letters and intron (non-exon) regions in lowercase letters.
   a. Create a function named format_dna(). It should take the following parameter:
      Parameter Name   Type     Description
      dna_strand       String   DNA sequence to process.
   b. In the body of the format_dna() function, perform the following tasks:
      i. Return a string derived from the dna_strand variable where characters in positions 0 to 62 are uppercase, positions 63 to 90 are lowercase, and positions 91 to the end are uppercase.
12. We need to write results to a file.
   a. Create a function named write_results(). It should take the following parameters:
      Parameter Name   Type     Description
      output           List     List containing output to write to a file.
      filename         String   Name of file to write output to.
   b. In the body of the write_results() function, perform the following tasks:
      i. Use a try/except block to provide error handling while performing file I/O.
         (1) In the body of the try block, perform the following tasks:
             (a) Write the contents of the output list to the file specified by the filename variable, one row per list element.
         (2) In the event of an IOError exception, perform the following tasks:
             (a) Print “Error writing file.”.
13. We also wish to create a main function.
   a. Create a function named main(). It should take no parameters.
   b. In the body of the main() function, perform the following tasks:
      i. We wish to read a DNA sequence from the provided data file. Call read_data() and store the result in a variable named dna_sequence. Specify the following parameter:
         Parameter Name   Value(s)
         filename         dna.txt
      ii. Create the following variable:
         Variable Name   Type   Initial Value(s)
         output          List   [] (empty list)
      iii. Append “The AT content is get_dna_stats()% of the DNA sequence.” to the output list. Call get_dna_stats() and substitute the result where indicated. Format the result as a percentage with 1 decimal place. Specify the following parameter:
         Parameter Name   Value(s)
         dna_strand       dna_sequence
      iv. Append “The DNA complement is get_dna_complement().” to the output list. Call get_dna_complement() and substitute the result where indicated. Specify the following parameter:
         Parameter Name   Value(s)
         dna_strand       dna_sequence
      v. Append “The RNA sequence is get_rna_sequence().” to the output list. Call get_rna_sequence() and substitute the result where indicated.
         Specify the following parameter:
         Parameter Name   Value(s)
         dna_strand       dna_sequence
      vi. Call extract_exon() and store the result in a variable named exon1. Specify the following parameters:
         Parameter Name   Value(s)
         dna_strand       dna_sequence
         start            0
         end              62
      vii. Call extract_exon() and store the result in a variable named