ORF analysis
Practical 3 (Online delivery): Open reading frame sequence analysis and PCR primer design

Functional proteins are encoded by sections of DNA called genes. These sections of DNA are first transcribed into messenger RNA (mRNA), and the mRNA molecules are then translated into proteins. Proteins are made up of amino acids, so we can determine the amino acid sequence of the final protein by examining the section of DNA that encodes it. An open reading frame (ORF) is a section of DNA (more appropriately, of mRNA) that runs from a start codon to a stop codon and encodes a protein without being interrupted by an in-frame stop codon, which would make the protein too short (hence 'open'). You have already been allocated a DNA sequence (identified by its ORF number) for this practical exercise. Using that, you will need to access the DNA sequence from NCBI (GenBank) and identify the longest open reading frame. You then need to identify the properties of the putative (predicted/potential) protein that it encodes, and design a pair of primers for the amplification of the gene.
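The definition of an ORF given above can be illustrated with a minimal Python sketch. It is not how ORFfinder is implemented; it only mirrors the idea of scanning both strands for an ATG followed by the first in-frame stop codon. The function names (`reverse_complement`, `orfs_in_strand`, `longest_orf`) and the way frame and positions are reported are assumptions made for this illustration, so they may not match the numbering shown by ORFfinder, particularly on the minus strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_len):
    """Yield (start, end) for each ATG...stop ORF; 0-based, end exclusive."""
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if pos + 3 - start >= min_len:
                    yield start, pos + 3
                break


def longest_orf(dna, min_len=75):
    """Return a dict describing the longest ORF on either strand, or None."""
    dna = "".join(dna.split()).upper()  # drop spaces and line breaks
    best = None
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for start, end in orfs_in_strand(seq, min_len):
            if best is None or end - start > len(best["sequence"]):
                best = {
                    "strand": strand,
                    "frame": start % 3 + 1,  # hypothetical convention: 1-3 within the strand searched
                    "start": start + 1,      # 1-based position of the A of ATG on that strand
                    "stop": end,             # 1-based position of the last base of the stop codon
                    "sequence": seq[start:end],
                }
    return best
```

Calling `longest_orf(your_sequence)` on the allocated sequence returns the strand, frame, start, stop and nucleotide sequence of the longest ATG-initiated ORF of at least 75 nt. Nested ORFs are not ignored here. The result is only a cross-check; the values recorded in the report should come from the tool you cite.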
ORF sequence allocation: Orf116

TTGAAAACCACGACAAGCTCACAAAAGCCATCTCGTTCCATTCCAGCTCAGGCGAGCAGCGACCACCTTCCG
GCGAGGGTAGCAGCGATGGAGGCCAAGGACCGGGAGGACGTGCGGCTCGGAGCGAACAAGTACTCGGAGCGG
CAGCCCATCGGCACGGCGGCGCAGGGGTCCGAGGACGCCGACTACAAGGAGCCCCCGCCGCCGCCGCCCTTC
GAGCCCGGCGAGCTCCCGTCCTGGTCCTTCTACCGCGCCGGCATCGCCGAGTTCGTGGCCACCTTCTTCCTC
TACGTCACCATCCTCACCGTCCTGGGCTACAGCGGCGCCAGGTCCAAGTGCGCCACCGTCGGCATCCAGGGC
ATCGCCTGGTCCTTCGGCGGCACGATCTTCGTCGTCTACTGCCCCGCCGGCATCTCCGGCGGGCACATCAAC
CCGGCGGTGACCTTCGGGCTGTTCCTGGCGAGGAAGCTGTCGCTGACCAGGGCGGTGTTCTACATCGTCACC
CAGTGCCTGGGCGCCATCTGCGGCGCTGGCGTGGTGAAGGGGTTCCAGCAGCCCCTGTACATGGGCAACGGC
GGCGGCGCCAACGTGGTGGCGCCCGGCTACACCAAGGGCTCCGGCCTCGGCGCCGAGATCATCGGCACCTTT
GTCCCCGTCTACACCGTCTTCTCCGCCACCGACGCCAAGAGGAACGCCAGGGACTCCCACGTTATCCTCGCC
CCGCTGCCCATCGGGTTCGCCGTGTTCCTGGTCCACCTGCCCACCATCCCCATCACCGGCACCGGCATCAAC
CCGGCGAGGAGCCTCGGCGCGGCCATCATCTACAACAGGGAGCACGCCTGGTCAGACCACTGGATCTTCTGG
GTCGGCCCCTTCATCGGCGCCGCGCTGGCCGCCGTGTACCACCAGCCGGTCATCAGAGCGATCCCGCCCAAG
ACCAAGTCCTAAGCCGCTCCTGCTGCTGCAAAGAAGATGCCAGCCCAAACCGAAGAGCAGGCTGCGTGTTCT
GAATTTCTGATGGGGCGGCTATCTCTTCCCATCATCATCCGTGTTACTACCGTGGAATTCCAATTTGTCTTC
CAAGTTTGATGTACTAGTCTTGCTCTGTATACACCTGAACCTGGACCCAGACTTGTATGTAATCTCACCAGT
ACTCAGTTGTGTGTGCAAATCCAATCCAATCAAGTTTATTCAGGACTT

A. Finding the longest ORF and functional protein

1. Click on the following link: https://www.ncbi.nlm.nih.gov/orffinder/ (please note that other software can be used to find the ORF; if used, it should be accompanied by the correct URL and references)
2. Paste your sequence of bases into the Enter Query Sequence window (leave other parameters blank)
3. Set the following parameters in the Choose Search Parameters window:
   Minimal ORF length (nt): 75
   Genetic code: 1. Standard
   ORF start codon to use: “ATG” only
   Untick Ignore nested ORFs
4. Click Submit to find the ORF
5. You will be directed to the Open Reading Frame Viewer, where you will see an image with all the ORFs in your DNA sequence (top), a list of these ORFs (bottom right) and the sequence of the selected ORF in the display window (bottom left)
6. Click on the longest ORF in the table and note down the strand, frame, start and stop codon details, and the length of the ORF in nucleotides and amino acids, from the table
7. The sequence of the ORF will be displayed (as protein sequence) in the window on the left. Copy this sequence for your report.

As Protein sequence for report:
>lcl|ORF1
MEAKDREDVRLGANKYSERQPIGTAAQGSEDADYKEPPPPPPFEPGELPS
WSFYRAGIAEFVATFFLYVTILTVLGYSGARSKCATVGIQGIAWSFGGTI
FVVYCPAGISGGHINPAVTFGLFLARKLSLTRAVFYIVTQCLGAICGAGV
VKGFQQPLYMGNGGGANVVAPGYTKGSGLGAEIIGTFVPVYTVFSATDAK
RNARDSHVILAPLPIGFAVFLVHLPTIPITGTGINPARSLGAAIIYNREH
AWSDHWIFWVGPFIGAALAAVYHQPVIRAIPPKTKS

8. Click Display ORF as… and select Nucleotide Sequence. Copy this sequence for your report.

As Nucleotide sequence for report:
>lcl|ORF1 CDS
ATGGAGGCCAAGGACCGGGAGGACGTGCGGCTCGGAGCGAACAAGTACTC
GGAGCGGCAGCCCATCGGCACGGCGGCGCAGGGGTCCGAGGACGCCGACT
ACAAGGAGCCCCCGCCGCCGCCGCCCTTCGAGCCCGGCGAGCTCCCGTCC
TGGTCCTTCTACCGCGCCGGCATCGCCGAGTTCGTGGCCACCTTCTTCCT
CTACGTCACCATCCTCACCGTCCTGGGCTACAGCGGCGCCAGGTCCAAGT
GCGCCACCGTCGGCATCCAGGGCATCGCCTGGTCCTTCGGCGGCACGATC
TTCGTCGTCTACTGCCCCGCCGGCATCTCCGGCGGGCACATCAACCCGGC
GGTGACCTTCGGGCTGTTCCTGGCGAGGAAGCTGTCGCTGACCAGGGCGG
TGTTCTACATCGTCACCCAGTGCCTGGGCGCCATCTGCGGCGCTGGCGTG
GTGAAGGGGTTCCAGCAGCCCCTGTACATGGGCAACGGCGGCGGCGCCAA
CGTGGTGGCGCCCGGCTACACCAAGGGCTCCGGCCTCGGCGCCGAGATCA
TCGGCACCTTTGTCCCCGTCTACACCGTCTTCTCCGCCACCGACGCCAAG
AGGAACGCCAGGGACTCCCACGTTATCCTCGCCCCGCTGCCCATCGGGTT
CGCCGTGTTCCTGGTCCACCTGCCCACCATCCCCATCACCGGCACCGGCA
TCAACCCGGCGAGGAGCCTCGGCGCGGCCATCATCTACAACAGGGAGCAC
GCCTGGTCAGACCACTGGATCTTCTGGGTCGGCCCCTTCATCGGCGCCGC
GCTGGCCGCCGTGTACCACCAGCCGGTCATCAGAGCGATCCCGCCCAAGA
CCAAGTCCTAA

9. Select Non-redundant protein sequences (nr) as the BLAST database and click SmartBLAST ORF
10. When the SmartBLAST search is complete, you will be directed to a SmartBLAST summary window
11. Right click on the NCBI Tree in the Summary window and Sort by Distance
12. Right click on the updated NCBI Tree and click Full View. This will open a new tab, NCBI Tree Viewer.
13. Click on Tools (top right corner) and Download as a PDF file. Attach this PDF to your report and comment on this tree in your Discussion
14. Go back to the SmartBLAST summary window. Inspect the closest hit at the top of the list of Best hits (it might seem identical but there will be differences). Make sure that the nearest hit is a natural protein and not an artificial construct (e.g. an engineered sequence, often in a plasmid). If the match to the database entry is not close (below 90% identity), you may have to go back to ORFfinder and select the next longest open reading frame. If you have found the correct ORF, it will translate to a protein with amino acid sequence identity to a database entry in the 90-100% range
15. Click on this closest hit and you will see the alignments of this and other hits in the list
16. When you have found an alignment for your amino acid sequence, take a screenshot of the alignment with the matches in the database and include this in your report
17. Summarise in words the differences between your protein and the database entry
18. Click on the Sequence ID and copy and paste information into your document to include the following: species, locus, definition, accession number and source organism
19. Give a function (a physiological function) and description of the protein
20. Once you have an open reading frame, identify the start and stop codons in your DNA sequence (you will need these for designing primers later, in Part D below). You should have the nucleotide sequence from step 8.

B. Determination of molecular weight, isoelectric point and hydropathy plot of the putative (predicted) protein

1. The MW and isoelectric point can be obtained by pasting the amino acid sequence at: http://www.expasy.ch/tools/pi_tool.html
   Example output: pI and Mw: 4.99 / 72293.30
2. The hydropathy plot (Kyte and Doolittle) can be obtained by pasting the amino acid sequence at: http://au.expasy.org/tools/protscale.html
3. Comment on the isoelectric point and hydropathy plot in your description of the protein (below).
   [Figure: example hydropathy plot from http://au.expasy.org/tools/protscale.html for the pasted protein/amino acid sequence, with hydrophobic and hydrophilic regions labelled]

C. Generation of codon usage for the open reading frame

1. A table of codon usage for your open reading frame can be generated at: http://www.kazusa.or.jp/codon/countcodon.html
2. Paste in the DNA sequence for just the ORF (from start to stop codons), i.e. the nucleotide sequence from Part A step 8
   • Go to the homepage of the codon usage site and search for the hit you obtained using SmartBLAST.
3. Comment on and compare the codon usage: is it typical of animals, plants, microbes etc., or is it unusual?

D. Designing of a pair of primers to flank the coding sequence of the above gene

To amplify this gene section by PCR, you will need the nucleotide sequence (ORF) encoding your protein (Part A above). The primers should flank the coding sequence, so you must first have identified the start and stop codons in the sequence. Please follow the guidelines for primer design given below. You will be assessed on the quality and likely success of your PCR based on these criteria.
• GC Clamp: The presence of G or C bases within the last five bases from the 3' end of primers (GC clamp) helps promote specific binding at the 3' end
• Repeats and Runs increase the likelihood of false priming
• Secondary structure of primers
  - Hairpins (should be avoided)
  - Dimers (Self-Dimers, Cross-Dimers): decrease product yield, problematic at the 3' end

Please refer to Dr Rohan Shah’s lecture notes (on Canvas) for more details.

▪ The primer pair (one forward, one reverse) must flank, and be close to, the ORF, but the primers do not have to be exactly adjacent to the start or stop codons: you can shift them a few bases 5' or 3' around the ORF to address the following criteria.
▪ Primer length: 15-30 bases each.
▪ Base composition: 40-60% G + C content for each primer.
▪ Less than 5°C difference in the melting temperatures (Tm) of the two primers in a pair. The Tm of either primer should also not exceed the optimal temperature for Taq polymerase, or the denaturation temperature of the PCR.
▪ The 3' end of each primer should be a G or a C.
▪ Secondary structures should be avoided, especially at the 3' ends.
▪ To make a start, select a tentative primer sequence around the ORF and use the computer tool given below to test the quality of your primers as you go.

E. Analysis of primer properties

You are required to test your primers as you are designing them, using the online tool called OligoAnalyzer 3.1. This can be accessed using this link: https://sg.idtdna.com/calc/analyzer. It will ask you to create an account; you can register with your student email and then sign in. Once you are on the OligoAnalyzer page, click on the Instructions button on the right (above the orange Analyze button) to familiarise yourself with this online tool. Once you have familiarised yourself with the software and its functions, test your designed upstream and downstream primers. Interpret all of the output results for your designed primer sequences.

You will most likely have to experiment a little (on the computer!) with your sequences to achieve a good set of primers. Try extending/truncating the primers within the given range (15-30 bases) and repeat the analysis. As you change the primer sequences to get the best pair, record the output of the analysis of the trial primers in a table similar to the following. Please provide the details of at least two examples for the forward primer trials and two for the reverse primer trials, other than the final primers.

Primer name | Sense or Antisense | Sequence (5' to 3') | Tm (°C) | Length (bases) | Other primer in the pair | Secondary structures
Trial Fwd 1 | | | | | |
Trial Rev 1 | | | | | |
Trial Fwd 2 | | | | | |
Trial Rev 2 | | | | | |
Final Fwd | | | | | |
Final Rev | | | | | |

Once you have selected primers that address the criteria given in the previous section, print out the reports for each of your selected upstream and downstream primers for your submission. You are required to submit these reports for your final upstream and downstream primer selection.

Example. Reverse and complement DNA sequence: http://reverse-complement.com
[Figure: example forward sequence annotated with the start codon, Fwd primer, stop codon and Rev primer]
Fwd. sequence: TTGAAAACCACGACAAGCTCACAAAAGCCATCTCGTTCCATTCCAGCTCAGGCGAGCAGCGACCACCTTCCG
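
The primer criteria in Part D, and the reverse-complement step needed for the reverse primer, can be pre-screened with a short Python sketch before running each trial through OligoAnalyzer. This is only a rough aid under stated assumptions. The Tm here uses the simple Wallace rule (2 °C per A/T and 4 °C per G/C), which is a crude estimate and will generally differ from the values OligoAnalyzer reports. Hairpins and dimers are not checked at all. All function names are hypothetical. The two example primers are simply the first 18 bases of the ORF from Part A and the reverse complement of its last 18 bases; they are used to exercise the checks and are not suggested answers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement, as needed to write the reverse primer 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_percent(primer):
    """Percentage of G + C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough Tm estimate (Wallace rule): 2*(A+T) + 4*(G+C), in degrees C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def longest_run(primer):
    """Length of the longest stretch of a single repeated base."""
    best = run = 1
    for prev, cur in zip(primer, primer[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def check_primer(primer):
    """Apply the handout's length, G + C and 3'-end criteria to one primer."""
    primer = primer.upper()
    return {
        "length_ok": 15 <= len(primer) <= 30,     # 15-30 bases
        "gc_ok": 40 <= gc_percent(primer) <= 60,  # 40-60% G + C
        "ends_in_gc": primer[-1] in "GC",         # 3' end is G or C
        "tm_estimate": wallace_tm(primer),        # Wallace-rule estimate only
        "longest_run": longest_run(primer),       # reported, no threshold assumed
    }


def tm_difference_ok(fwd, rev, max_diff=5):
    """True if the estimated Tm values differ by less than max_diff degrees."""
    return abs(wallace_tm(fwd) - wallace_tm(rev)) < max_diff


if __name__ == "__main__":
    fwd = "ATGGAGGCCAAGGACCGG"                      # first 18 bases of the ORF
    rev = reverse_complement("CCCAAGACCAAGTCCTAA")  # last 18 bases of the ORF
    print(fwd, check_primer(fwd))
    print(rev, check_primer(rev))
    print("Tm difference OK:", tm_difference_ok(fwd, rev))
```

Any trial that fails one of these checks can be shifted, extended or truncated before it is entered in the table; the values recorded in the report should still be those given by OligoAnalyzer.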