BCHS3201: Microarray Paper

Background

You will be working with data generated using Affymetrix Arabidopsis thaliana (ATH1) full genome chips. Please watch the microarray lecture posted in Blackboard for information on how the chips are constructed and how they are used. Step-by-step instructions are provided here for managing the data. While I have provided details here, keep in mind that in a real research lab, you would have to decide for yourself how to organize the data and make sense of it.

Arabidopsis thaliana

Arabidopsis thaliana is a small, flowering plant found all over the world. It is commonly considered a weed in the United States and can be found in the Midwest (Texas is too hot; the plant likes temperatures around 68°F). Arabidopsis serves as a model plant because it has a number of characteristics that make it amenable to study. The plant is small, reaching only 30 cm in height when fully grown. It grows well in both soil and nutrient media, making it easier to develop carefully controlled studies (Meyerowitz, 1989). It is easily grown indoors in a laboratory, whereas crop plants require much larger facilities and land to study. The life cycle of Arabidopsis is only 6 weeks from seed to seed-producing plant. This allows a much faster pace for experiments than most crop plants, where only one generation of plants can be grown in a calendar year (unless your university is fortunate enough to have land in two hemispheres so you can get two growing seasons in). Arabidopsis plants produce thousands of seeds per plant, and these seeds are tiny, making them easy to store in microcentrifuge tubes in the freezer (Meyerowitz, 1989).

Arabidopsis has a haploid genome of 5 chromosomes consisting of approximately 125 megabases (The Arabidopsis Genome Initiative, 2000). This is a very small genome compared to that of crop species. The maize genome, for example, is around 2,500 megabases in size (Adam, 2000). Most genes in Arabidopsis exist at a single locus in the genome.
Crop plant genomes are large in part because they contain large duplicated sections. This duplication makes creating a complete knock-out of a particular gene difficult. Arabidopsis is amenable to genetic manipulation either through traditional cross-breeding techniques or more modern genetic modification techniques (mutation through T-DNA inserts, chemical agents, or CRISPR-Cas9). Studies conducted in Arabidopsis are often directly transferable to crop species, as many of the genes have homologues in crop plants. Studying them first in Arabidopsis is easier, cheaper, and faster.

Sugar and Phytohormone Signaling Pathways

Sugars have a role in basic plant metabolism as a carbon source and also act as signaling molecules, contributing to the regulation of a number of pathways in plants. The expression of genes involved in mobilization of starch and lipid reserves is usually repressed by high sugar levels in the plant, while genes involved in storage of carbohydrates are upregulated (Jang & Sheen, 1997; Yu, 1999). Soluble sugar levels in plants also play a role in a number of developmental processes, including time to flowering (Bernier et al., 1993), shoot to root ratios (Wilson, 1988), and senescence (the stage at which cells stop dividing and normal biological processes begin to deteriorate) (Dai et al., 1999). The DNA chip data you will be analyzing for class is part of a larger study to elucidate the full impact of sugar signaling in Arabidopsis and to identify potential components of signaling pathways for future study.

Phytohormones are involved in a wide array of plant responses. The phytohormones ethylene and abscisic acid are also intertwined with the sugar response signaling pathways. Ethylene plays a role in a plant’s development as well as its response to environmental conditions. Ethylene has a role in shoot and root elongation, sex determination, petal senescence, and fruit ripening.
It is also involved in the plant’s response to flooding and pathogens.

Abscisic acid is involved in preventing premature germination of seeds, root elongation, and stomatal closure. Stomata are pores in the leaf epidermis which control the rate of gas exchange. Each pore is surrounded by two bean-shaped guard cells that regulate the size of the pore opening. Abscisic acid plays a critical role in the closure of the guard cells. Plants with mutations in the abscisic acid biosynthesis pathway have a “wilty” phenotype because they are unable to close their stomata during the day, when loss of water to evaporative processes is high. The mutant aba2 has been found to be allelic to the glucose insensitive 1 (gin1) mutant (meaning the mutations for both aba2 and gin1 lie in the same gene).

Signaling pathways often work together to fine-tune plant development and responses. Seed germination, for example, is finely controlled by antagonistic interactions between sugar and abscisic acid, which inhibit germination, and gibberellin and ethylene, which promote germination (figure 1).

Figure 1. Seed germination is controlled by a combination of signals from sugar levels, abscisic acid, gibberellin, and ethylene.

The sugar-insensitive 6 (sis6) mutant is slightly resistant to the inhibitory effects of abscisic acid on germination (Pattison, 2004). When seeds are grown in a petri plate with nutrient medium supplemented with abscisic acid, germination is delayed in wild-type plants. The sugar-insensitive 3 (sis3) mutant is slightly resistant to the effect of abscisic acid in comparison to wild-type (Columbia ecotype) seeds. The abscisic acid insensitive 4-1 (abi4-1) mutant displays precocious seed germination, germinating despite the presence of exogenous abscisic acid (ABA), which should significantly delay germination (figure 2).

Figure 2. The sis6 mutant is insensitive to the inhibitory effects of ABA on germination.
Seeds were sown on the indicated media and grown in continuous white fluorescent light. Germination was scored every 12 hours for four days and then every 24 hours thereafter. Error bars represent the mean ± standard deviation (n=3). This experiment was conducted three times with similar results. From Pattison, 2004.

How the Data Were Collected for This Set of Experiments

In order to conduct a chip experiment, RNA must be collected from the samples. In our experiments, wild-type Arabidopsis seeds (ecotype Columbia) were surface sterilized, cold treated at 4 °C in the dark for three days, and then plated on Nytex mesh screens placed in petri dishes containing minimal nutrient media. After 20 hours under continuous light at 21 °C, the Nytex meshes were transferred to plates containing minimal media supplemented with 100 mM sorbitol (control) or 100 mM glucose. Seeds were grown on the new media for 12.5 hours and then frozen in liquid nitrogen. RNA was extracted using a phenol/chloroform extraction (Verwoerd et al., 1989). RNA samples were sent to the Molecular Genomics Core Facility at the University of Texas Medical Branch in Galveston for processing.

Control versus Experimental Conditions

Minimal medium is a basic growth medium. For experiments utilizing glucose, the sugar itself creates osmotic stress on the plant. To differentiate between the impact of glucose and the impact of osmotic stress, sorbitol is used as the control. Sorbitol is a sugar alcohol which is not metabolized by the plant. It should mimic the osmotic stress created by the glucose but not impact sugar-regulated metabolism or signaling to any great extent.

Part 1. Identifying differences in gene regulation between control and experimental conditions.

1. Download the spreadsheet corresponding to your selected control and experimental conditions to your computer.

2. Take a few minutes to familiarize yourself with the spreadsheet layout.

Column A: AGI#. AGI stands for Arabidopsis Genome Initiative.
Every gene in the Arabidopsis genome was assigned a unique identifier during the genome sequencing project. The Affymetrix DNA chip contains over 22,000 genes, representing nearly every known gene in the genome of Arabidopsis.

Column B: Affy Probe Index #. The Affymetrix probe index # refers to the probe array that corresponds to each gene. Each probe array contains 11 pairs of probes for the same gene. One probe in each pair is a perfect match to the gene and the other contains a mismatch in the center of the probe. The software uses the data from the perfect match sets and the mismatch sets to subtract out signal that may have arisen from near (but not quite perfect) matches. The names of the probe sets are based on what was known about the gene sequence at the time the chip was created.

Names ending in    means
_at                all probes match one known transcript
_a                 all probes match alternate transcripts from the same gene
_s                 all probes match transcripts from different genes
_x                 some probes match transcripts from different genes

Notice that rows 2 through 65 do not have AGI#’s and the Probe Index #’s all begin with AFFX. These are the quality control probe arrays for the chip. They are included so that researchers know that there were no technical issues with the chip or samples. A mix of probes that will result in positive and absent calls is included. There are also some cells in the AGI# column that are listed as a “0” instead of an Arabidopsis Genome Initiative number. We will not be utilizing these rows.

Signal Columns: Each experiment in this data set was conducted 5 times. The columns that contain the word “Signal” in the header represent the value for the signal reads.

Detection Columns: The column to the right of each signal column is the Detection Column.
P = present
A = absent
M = marginal
Present means the gene was expressed in the sample, resulting in a measurable signal above a minimal detection threshold.
Absent means the gene was not expressed under the experimental conditions. Marginal means the expression was very near the detection threshold. Marginal calls require further investigation and experimentation to confirm.

Converted Detection Columns: The column to the right of each Detection Column is the Converted Detection Column. The PMA calls are converted to a numeric value, which allows the researcher to average the detection calls and decide whether or not to include a particular gene in the data set.
P = 2
A = 0
M = 1

Descriptions: what was known about the gene’s identity or function at the time the chip was created.

3. Open the WT on sorbitol_germinating seeds (control) and the WT on glucose_germinating seeds (experimental) Excel files found on Blackboard. For both experimental and control conditions, delete the rows containing the controls. These will be the rows that lack an AGI# or have a “0” in the AGI# column. You can highlight the entire spreadsheet and use the custom sort feature to sort on column A from smallest to largest. This will group your “0”’s and your blank cells to make it easy to delete them as a block. The 0’s will end up at the top and the blank cells at the bottom.

4. Open a new Excel file and name it as follows: Lastname_firstname_microarray.

5. Change the name of Sheet 1 to “control” by right-clicking on the tab and selecting “rename” from the pop-up menu. Copy and paste Row 1 to capture the headers and all the rows assigned for your group (see list below) from your control sheet (WT seeds on sorbitol) into the “control” tab.
Group 1: Rows 2-3765
Group 2: Rows 3766-7531
Group 3: Rows 7532-11296
Group 4: Rows 11297-15062
Group 5: Rows 15063-18828
Group 6: Rows 18829-22592

6. Click the “+” sign to add another tab at the bottom of the Excel sheet. Rename the new sheet “experimental”.
Copy and paste Row 1 to capture the headers and all the rows assigned for your group (see list above) from your experimental sheet into the “experimental” tab.

7. Scroll to the right. Skip a column after the “Descriptions” column. Label the next column to the right “AVG control PMA” or “AVG experimental PMA”. Calculate the average PMA call for each gene using the converted detection column values for each condition. For example, if converted PMA detection calls are located in cells E2, I2, M2, and Q2, the formula you enter into the cell would be “=(E2+I2+M2+Q2)/4”. Do this for both your control and experimental sheets. Enter the formula and copy/paste it down the column. The row numbers will change automatically.

8. Click the “+” sign to add another tab to the bottom of the Excel sheet. Rename the new sheet “combined”.

9. Copy the following columns into the “combined” data sheet. You will need to paste “values” for any columns containing formulas. It’s under paste options.
a. AGI#
b. Affy Probe Index #
c. Description
d. Signal columns for the control
e. Leave a blank column
f. Signal columns for the experimental
g. Leave a blank column
h. AVG control PMA column
i. AVG experimental PMA column

10. In the combined data sheet, add another column to the right of your AVG control PMA and AVG experimental PMA columns. Label this one “final PMA call”. Type in the formula “=MAX(XX2:XY2)”, where XX is the column labeled “AVG control PMA” and XY is the column labeled “AVG experimental PMA” (substitute your actual column letters for XX and XY). This formula will transfer the maximum value for the two