
This is a combination of R and Excel. I can write the report myself; I just need the solution and an explanation.


Task 1. Data Set Preparation:

a) From ArrayExpress, select experiment E-MEXP-76.
· Download the processed data.
· Open the data in Excel.
· For each sample, use only the signal (gene expression level) and the detection call (Present, Absent, or Marginal) columns. This will give you a data matrix with a single probe set column and two columns (signal and detection call) for each biological sample.
• Note that the class names are included in the sample names. The data set includes three classes. The required experiments should be performed for two of them: CVID and CTR.

In Excel:

b) Perform gene-expression-level quality assessment; at a very minimum, provide (see footnote 4):
• MA plots (for each array versus a virtual reference array; see DMGP).
• A single figure with all box plots for the samples (arrays).

c) Filter out probe sets whose expression measurements are not reliable or represent noise (see footnote 5):
• Filter by the fraction of Present calls in a class. Use a threshold fraction of 25%. Refer to DMGP for the details of this filtering method.
• Then filter by the range of expression values. Remove probe sets whose amplitude (max-min) of expression level is less than the noise level determined for these data (see the note below).

Note: Since many genes in a tissue are not expressed at biologically significant levels, the MAS5 trimmed mean of expression values for each array (which should be the same for all arrays if their expressions were scaled; check it!) can be used as a crude approximation of the noise level. Make sure that you calculate the trimmed means for the complete data of each array (not after any filtering). Refer to DMGP for the details of this filtering method and for MAS5 normalization and trimmed mean information.

Footnote 3: Multivariate analysis of gene expression data and biomarker discovery will be covered in DATA 522 and DATA 525, the subsequent courses in the Bioinformatics track of our Data Science program.
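For the MA plots in step b), the coordinates of each point can be computed before any charting. The sketch below is in Python rather than Excel, purely to show the arithmetic. It assumes that the virtual reference array is the per-probe-set median of the log2 signals across all arrays; that definition is an assumption here and should be checked against DMGP. The function name and the toy numbers are made up for illustration.

```python
import math
from statistics import median


def ma_values(signals, array_index):
    """Return (A, M) pairs for one array against a virtual reference array.

    signals: list of rows, one per probe set; each row holds the positive
    signal values of that probe set on every array.
    ASSUMPTION: the virtual reference is the per-probe-set median of the
    log2 signals across all arrays; check DMGP for its exact definition.
    """
    points = []
    for row in signals:
        logs = [math.log2(v) for v in row]
        ref = median(logs)
        m = logs[array_index] - ref          # log ratio (vertical axis)
        a = (logs[array_index] + ref) / 2.0  # mean log intensity (horizontal axis)
        points.append((a, m))
    return points


# Made-up signals for three probe sets measured on three arrays.
toy_signals = [
    [100.0, 200.0, 400.0],
    [50.0, 50.0, 50.0],
    [8.0, 16.0, 32.0],
]
points = ma_values(toy_signals, array_index=0)
```

In Excel, the same two quantities can be built as helper columns with LOG(value, 2) and then drawn as a scatter chart, one chart per array.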
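The two filters in step c) could be sketched as follows, again in Python only to make the logic explicit. Several things here are assumptions rather than facts from the assignment: that detection calls are coded "P", "A", and "M"; that a probe set is kept when at least one class reaches the 25% Present fraction (the exact rule is in DMGP); and that the MAS5 trimmed mean drops 2% of the values from each tail, which roughly corresponds to Excel's TRIMMEAN(range, 0.04). All names and numbers are hypothetical.

```python
from statistics import mean


def trimmed_mean(values, trim_each_tail=0.02):
    """Mean after dropping a fraction of the values from each tail.

    ASSUMPTION: 2% per tail, the trimming usually quoted for MAS5 scaling.
    """
    ordered = sorted(values)
    k = int(len(ordered) * trim_each_tail)
    return mean(ordered[k:len(ordered) - k])


def passes_present_filter(calls_by_class, threshold=0.25):
    """calls_by_class maps a class name to that class's detection calls.

    ASSUMPTION: keep the probe set if at least one class has a fraction of
    Present ("P") calls at or above the threshold.
    """
    return any(
        calls.count("P") / len(calls) >= threshold
        for calls in calls_by_class.values()
    )


def passes_range_filter(signals, noise_level):
    """Keep the probe set unless its amplitude (max - min) is below the noise level."""
    return max(signals) - min(signals) >= noise_level


# Made-up data: name -> (detection calls per class, signals over all arrays).
toy = {
    "probe_a": ({"CVID": ["P", "P", "A"], "CTR": ["A", "A", "A"]},
                [900.0, 650.0, 300.0, 280.0, 310.0, 250.0]),
    "probe_b": ({"CVID": ["A", "A", "A"], "CTR": ["A", "M", "A"]},
                [900.0, 100.0, 300.0, 280.0, 310.0, 250.0]),
    "probe_c": ({"CVID": ["P", "P", "P"], "CTR": ["P", "P", "P"]},
                [500.0, 510.0, 505.0, 495.0, 520.0, 490.0]),
}

# Made-up noise level; in the assignment it comes from the trimmed means of
# the complete, unfiltered arrays.
noise_level = 150.0

kept = [
    name
    for name, (calls, signals) in toy.items()
    if passes_present_filter(calls) and passes_range_filter(signals, noise_level)
]
```

In this toy example only probe_a survives: probe_b has no Present calls in either class, and probe_c varies by less than the noise level. In Excel, COUNTIF over each class's detection-call columns gives the Present counts, and MAX minus MIN over the signal columns gives the amplitude.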
Report: Provide a detailed and informative description of each step of your experiments performed in Task 1. Present the results and discuss them. Also submit supporting evidence: the Excel file showing all steps of your experiments.

Footnote 4: If you have an old version of Excel that does not have the Box & Whisker chart, you may create the boxplot figure using a different software package.

Footnote 5: If you do not have experience with Excel, then some of its functions, such as TRIMMEAN or COUNTIF, would be a good topic to start class discussions for this unit. You may also look at examples of their use in Chapter 6.

Task 2. Perform basic exploratory analysis using a t-test:

In Excel, perform a classical t-test on the data set prepared in Task 1 to identify differentially expressed genes (probe sets). Decide whether the t-test for equal or for unequal population variances should be used, and provide your reasoning. Remember to adjust for multiple comparisons (you are performing as many t-tests as there are variables after filtering the data). Apply the following corrections:
· Single-step Bonferroni procedure,
· Step-down Holm procedure,
· Step-up Benjamini and Hochberg procedure.

Report: Provide a detailed description of the experiments performed in Task 2. Provide the most important results and discuss them. Compare the results of the three methods of correction for multiple comparisons. Your report has to be comprehensive and informative. Also submit the Excel file showing all steps of your experiments. Make sure that you follow all report and submission requirements as specified in the Syllabus.

Note: The goal of this exercise is to get familiar with the main quality assessment and filtering steps, and with the univariate analysis of gene expression data. The selected data set is too small for the results to be biologically relevant.
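To make the equal-versus-unequal-variance choice in Task 2 concrete, the sketch below computes both forms of the two-sample t statistic with their degrees of freedom for a single probe set. It is written in Python for illustration only; the function names and sample values are made up, and p-values are left out to keep the example short. In Excel, the corresponding choice is the last argument of T.TEST (2 for equal variances, 3 for unequal variances).

```python
import math
from statistics import mean, variance


def student_t(x, y):
    """Equal-variance (pooled) two-sample t statistic and degrees of freedom."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def welch_t(x, y):
    """Unequal-variance (Welch) t statistic with Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Made-up expression values for one probe set in two classes.
cvid = [1.0, 2.0, 3.0, 4.0, 5.0]
ctr = [2.0, 4.0, 6.0, 8.0, 10.0]

t_equal, df_equal = student_t(cvid, ctr)
t_welch, df_welch = welch_t(cvid, ctr)
```

With equal group sizes the two statistics coincide and only the degrees of freedom differ, so the practical difference between the tests shows up mainly when the group sizes or the variances are clearly unequal.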
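The three corrections listed in Task 2 can be expressed as adjusted p-values, which makes the comparison between them straightforward. This is a minimal Python sketch of the standard procedures, not the assignment's Excel workflow; the function names and the p-values are made up for illustration.

```python
def bonferroni(pvals):
    """Single-step Bonferroni: multiply every p-value by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Step-down Holm: walk from the smallest p-value up, enforcing monotonicity."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvals):
    """Step-up Benjamini and Hochberg: walk from the largest p-value down."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        running_min = min(running_min, pvals[i] * m / (rank + 1))
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values from four t-tests.
raw = [0.01, 0.04, 0.03, 0.005]
alpha = 0.05

counts = {
    "Bonferroni": sum(p <= alpha for p in bonferroni(raw)),
    "Holm": sum(p <= alpha for p in holm(raw)),
    "Benjamini-Hochberg": sum(p <= alpha for p in benjamini_hochberg(raw)),
}
```

On these four made-up p-values, Bonferroni and Holm each keep two tests at the 0.05 level, while Benjamini and Hochberg keeps all four. Bonferroni and Holm control the family-wise error rate, whereas Benjamini and Hochberg controls the false discovery rate, which is why it is typically the least conservative of the three.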
Extra credit tasks (optional)

Note: Consider doing the extra credit tasks only after you are very satisfied with your experiments and your report for the required tasks. In the case of any major deficiency in your required tasks, I will not even look at any extra work.

Extra credit Task A: Repeat all steps of the required experiments, but now using R.

Extra credit Task B: Select a few of the most differentially expressed genes, and search for their annotation information using one of the databases or browsers covered in Unit 3.
Apr 05, 2021