Hi, I am a master's student in bioinformatics. You will be doing Homework 1. We are using R for Homework 1, which I attached below, and for the class. You will need the renal carcinoma zip files because you run those files to do the homework. I also added Lectures 2 and 3 because the homework is based on the lectures, and you will find code relevant to the homework at the end of the lectures. I also have video guidance about what the homework is asking. I can't attach it here, but I will send it to you if we make arrangements.


Homework 1.docx

Gene Expression Analysis and Visualization
HW #1

For this homework, we will be working with a study from Gene Expression Omnibus (GEO) with the accession GDS2880. This is an Affymetrix microarray experiment (HGU133A array). The data researchers were investigating patient matched normal and stage 1 or stage 2 clear cell renal cell carcinoma (cRCC) tumors to provide insight into the molecular pathogenesis of cRCC. We will be conducting outlier analysis using various methods to identify aberrant samples, followed by missing value imputation to assess the accuracy of two different algorithms.

1. Download and load the renal cell carcinoma data file into R. Make sure that the row names are in the correct location (Affymetrix fragment names). Look at the dimensions and verify that you have 22 arrays and 22,283 probesets. (2pts)
2. Label the header columns of your data frame maintaining the GSM ID, but adding the Normal/Tumor identity. (2pts)
3. Identify any outlier samples using the following visual plots:
   I. Correlation plot (heat map) (2pts)
   II. Hierarchical clustering dendrogram (2pts)
   III. CV vs. mean plot (2pts)
   IV. Average correlation plot (2pts)
   For all plots, make sure you label the points appropriately, title plots, and label axes. You will also need to provide a legend for the correlation plot. You can use the gplots package for a color gradient, or just use the default colors.
4. Install and load the impute library.
5. Remove the outlier samples you identified in the first part of this assignment. (2pts)
6. Now we are going to use a couple of transcripts that were determined in this study to be indicative of normal renal function. The genes we will assess are kininogen 1 (KNG1) and aquaporin 2 (AQP2). Using either NetAffx or Gene Cards websites (or other resources, if you like), extract the probesets for these two genes. Hint: KNG1 has two while AQP2 has one. Then plot a profile plot (expression intensity vs. samples) for each probeset for these two genes. You may have to convert the data frame row to a vector to plot it. Do the plots of these genes seem to indicate normal renal function? Explain. (6pts)
7. We want to assess the accuracy of missing value imputation. So assign the KNG1 probeset (206054_at) an NA value, only for array GSM146784. Be sure to first save the original value before replacing it with an NA. Also cast the data frame to a matrix to run this function. (2pts)
8. Now estimate the missing values in the array using 6 nearest neighbors and Euclidean distance with the impute.knn() function. (2pts)
9. Look at the value that was imputed for your gene and calculate the relative error of this value using the actual value that you saved. (2pts)
10. Now impute the missing values using the SVD imputation method. This is in the pcaMethods package and the function is called pca() with method svdImpute and set nPcs=9. To retrieve the output matrix, see the help file. (2pts)
11. Finally, plot a gene profile plot of the probeset for this gene, where the two different imputed values are represented as different colored points and the actual value is a third point. (6pts)

Generate the code and plots for each. Turn in the visuals, code, and an explanation of the questions asked. Paste all information into a PDF doc.
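The four outlier checks in step 3 can be sketched in base R as follows. This is only an illustration on simulated data, not the homework solution: the matrix `expr`, the array names, and the planted outlier `array8` are all made up, and clustering on 1 minus the Pearson correlation is one reasonable choice among several. The heat map here has no legend; the gplots package mentioned in the assignment could supply one.

```r
# Simulated stand-in for the expression matrix (probesets in rows, arrays in columns).
set.seed(42)
n_probes <- 500
baseline <- rlnorm(n_probes, meanlog = 6, sdlog = 1)
expr <- sapply(1:8, function(i) baseline * rlnorm(n_probes, meanlog = 0, sdlog = 0.1))
colnames(expr) <- paste0("array", 1:8)
expr[, 8] <- rlnorm(n_probes, meanlog = 6, sdlog = 1)  # unrelated profile: planted outlier

# Pairwise Pearson correlation between arrays.
cor_mat <- cor(expr, use = "pairwise.complete.obs")
n <- ncol(cor_mat)

# I. Correlation heat map.
image(1:n, 1:n, cor_mat, axes = FALSE, xlab = "", ylab = "",
      main = "Pearson correlation between arrays")
axis(1, at = 1:n, labels = colnames(cor_mat), las = 2)
axis(2, at = 1:n, labels = colnames(cor_mat), las = 2)

# II. Hierarchical clustering on the dissimilarity 1 - correlation.
hc <- hclust(as.dist(1 - cor_mat), method = "average")
plot(hc, main = "Hierarchical clustering of arrays", xlab = "Array", sub = "")

# III. Coefficient of variation vs. mean, one point per array.
arr_mean <- colMeans(expr)
arr_cv <- apply(expr, 2, sd) / arr_mean
plot(arr_mean, arr_cv, xlab = "Mean intensity", ylab = "CV", main = "CV vs. mean")
text(arr_mean, arr_cv, labels = colnames(expr), pos = 3)

# IV. Average correlation of each array with all the others (self-correlation excluded).
avg_cor <- (rowSums(cor_mat) - 1) / (n - 1)
plot(avg_cor, xaxt = "n", xlab = "", ylab = "Average correlation",
     main = "Average correlation per array")
axis(1, at = seq_along(avg_cor), labels = names(avg_cor), las = 2)

# The array with the lowest average correlation is the first outlier candidate.
names(which.min(avg_cor))
```

On real data, an array would normally be dropped only when several of these views agree that it stands apart.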
renal_carcinoma_annotation.txt

GSM146778 = Value for GSM146778: Stage 1, PT2, Normal (HG-U133A); src: Human Renal Epithelium
GSM146780 = Value for GSM146780: Stage 1, PT3, Normal (HG-U133A); src: Human Renal Epithelium
GSM146782 = Value for GSM146782: Stage 1, PT4, Normal (HG-U133A); src: Human Renal Epithelium
GSM146784 = Value for GSM146784: Stage 1, PT5, Normal (HG-U133A); src: Human Renal Epithelium
GSM146786 = Value for GSM146786: Stage 1, PT6, Normal (HG-U133A); src: Human Renal Epithelium
GSM146789 = Value for GSM146789: Stage 2, PT8, Normal (HG-U133A); src: Human Renal Epithelium
GSM146790 = Value for GSM146790: Stage 2, PT9, Normal (HG-U133A); src: Human Renal Epithelium
GSM146792 = Value for GSM146792: Stage 2, PT10, Normal (HG-U133A); src: Human Renal Epithelium
GSM146794 = Value for GSM146794: Stage 2, PT11, Normal (HG-U133A); src: Human Renal Epithelium
GSM146798 = Value for GSM146798: Stage 2, PT1, Normal (HG-U133A); src: Human Renal Epithelium
GSM146796 = Value for GSM146796: Stage 2, PT12, Normal (HG-U133A); src: Human Renal Epithelium
GSM146779 = Value for GSM146779: Stage 1, PT2, Tumor (HG-U133A); src: Human Renal Epithelium
GSM146781 = Value for GSM146781: Stage 1, PT3, Tumor (HG-U133A); src: Human Renal Epithelium
GSM146783 = Value for GSM146783: Stage 1, PT4, Tumor (HG-U133A); src: Human Renal Epithelium
GSM146785 = Value for GSM146785: Stage 1, PT5, Tumor (HG-U133A); src: Human Renal Epithelium
GSM146787 = Value for GSM146787: Stage 1, PT6, Tumor (HG-U133A); src: Human Renal Epithelium
GSM146788 = Value for GSM146788: Stage 2, PT8, Tumor (HG-U133A); src: Human Renal Epithelium
GSM146791 = Value for GSM146791: Stage 2, PT9, Tumor (HG-U133A); src: Human Renal Epithelium
GSM146799 = Value for GSM146799: Stage 2, PT1, Tumor (HG-U133A); src: Human Renal Epithelium
GSM146793 = Value for GSM146793: Stage 2, PT10, Tumor (HG-U133A); src: Human Renal Epithelium
GSM146795 = Value for GSM146795: Stage 2, PT11, Tumor (HG-U133A); src: Human Renal Epithelium
GSM146797 = Value for GSM146797: Stage 2, PT12, Tumor (HG-U133A); src: Human Renal Epithelium

renal_cell_carcinoma.txt

Columns: GSM146778 GSM146780 GSM146782 GSM146784 GSM146786 GSM146789 GSM146790 GSM146792 GSM146794 GSM146798 GSM146796 GSM146779 GSM146781 GSM146783 GSM146785 GSM146787 GSM146788 GSM146791 GSM146799 GSM146793 GSM146795 GSM146797

[Preview rows 1007_s_at through 200019_s_at: one row of 22 expression values per probeset. The numeric columns were fused together in this page's file preview, and the preview is cut off partway through 200019_s_at.]
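The save, mask, impute, and compare pattern of steps 7 to 9 can be illustrated without the real data. The sketch below is a hand-rolled nearest-neighbour average in base R on simulated values; it is not `impute.knn()` and will not reproduce that function's output. The matrix `expr`, the names `probe1` and `array4`, and the noise levels are all assumptions made for the example. The relative error is computed as |imputed - actual| / |actual|.

```r
# Simulated matrix: 30 probesets that follow a shared pattern across 10 arrays.
set.seed(7)
n_arrays <- 10
pattern <- rnorm(n_arrays, mean = 8, sd = 1)
expr <- t(sapply(1:30, function(i) pattern + rnorm(n_arrays, sd = 0.2)))
rownames(expr) <- paste0("probe", 1:30)
colnames(expr) <- paste0("array", 1:10)

target_row <- "probe1"   # hypothetical probeset to mask
target_col <- "array4"   # hypothetical array to mask

# Save the true value first, then replace it with NA.
actual <- expr[target_row, target_col]
expr[target_row, target_col] <- NA

# Euclidean distance from the target probeset to every other probeset,
# computed over the arrays where the target is observed.
k <- 6
obs_cols <- setdiff(colnames(expr), target_col)
others <- setdiff(rownames(expr), target_row)
d <- sapply(others, function(r) {
  sqrt(sum((expr[r, obs_cols] - expr[target_row, obs_cols])^2))
})

# Average the k nearest neighbours' values in the masked array.
neighbours <- names(sort(d))[1:k]
imputed <- mean(expr[neighbours, target_col])

# Relative error of the imputed value against the saved true value.
rel_error <- abs(imputed - actual) / abs(actual)
c(actual = actual, imputed = imputed, rel_error = rel_error)
```

The same `actual` value and the same error formula can be reused to score the SVD-based imputation from step 10, which makes the two methods directly comparable.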
Answered 2 days after Sep 30, 2021


Mohd answered on Oct 03 2021
---
title: '-'
author: '-'
date: "10/3/2021"
output: word_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = TRUE, warning = FALSE, message = FALSE,
                      dpi = 180, fig.width = 8, fig.height = 5)
```
Loading Packages
```{r}
library(readr)
library(magrittr)
library(dplyr)
library(ggplot2)
library(rmarkdown)
library(MASS)
```
```{r}
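# Expression data: tab-delimited, one row per probeset, one column per array
ren_carci <- read.delim("renal_cell_carcinoma.txt")
# Annotation: one entry per sample, split at ";" into description and source
ren_car_anno <- read.csv("renal_carcinoma_annotation.txt",
                         sep = ";", header = FALSE)
# read.delim() already returns a data frame, so this conversion is a no-op
renc_df <- as.data.frame(ren_carci)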
```
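A quick way to see how the row names and column labels from steps 1 and 2 behave is to try them on a tiny stand-in. The sketch below assumes the expression file's header line has one field fewer than the data lines, in which case `read.delim()` uses the first column as row names; that layout is an assumption about the real file. The two-sample table and its values are invented, and `anno_lines` holds two entries copied from the annotation file shown in the question.

```r
# Toy tab-delimited text: the header has 2 fields, each data line has 3.
toy_text <- paste(
  "GSM146778\tGSM146779",
  "1007_s_at\t10\t20",
  "1053_at\t30\t40",
  sep = "\n"
)
toy <- read.delim(text = toy_text)

# Probeset IDs should be row names, not a data column.
dim(toy)        # rows = probesets, columns = arrays
rownames(toy)

# Two annotation entries in the format of renal_carcinoma_annotation.txt.
anno_lines <- c(
  "GSM146778 = Value for GSM146778: Stage 1, PT2, Normal (HG-U133A); src: Human Renal Epithelium",
  "GSM146779 = Value for GSM146779: Stage 1, PT2, Tumor (HG-U133A); src: Human Renal Epithelium"
)

# Pull out the GSM ID and the Normal/Tumor identity from each entry.
gsm_id <- sub("^(GSM[0-9]+) = .*$", "\\1", anno_lines)
tissue <- sub("^.*, (Normal|Tumor) \\(.*$", "\\1", anno_lines)
labels <- paste(gsm_id, tissue, sep = "_")

# Match on GSM ID so the labels follow the column order of the data.
colnames(toy) <- labels[match(colnames(toy), gsm_id)]
colnames(toy)
```

Matching on the GSM ID rather than relying on position guards against the annotation file and the data columns being in different orders. On the real data, `dim()` should report 22,283 rows and 22 columns.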
```{r}
colnames(renc_df) <- c("GSM146778_Normal",
"GSM146780_Normal",
"GSM146782_Normal",
"GSM146784_Normal",
"GSM146786_Normal",
"GSM146789_Normal",
"GSM146790_Normal",
"GSM146792_Normal",
"GSM146794_Normal",
"GSM146798_Normal",
"GSM146796_Normal",
"GSM146779_Tumor",
"GSM146781_Tumor",
"GSM146783_Tumor",
"GSM146785_Tumor",
"GSM146787_Tumor",
"GSM146788_Tumor",
"GSM146791_Tumor",
"GSM146799_Tumor",
"GSM146793_Tumor",
"GSM146795_Tumor",
"GSM146797_Tumor") #column names...
```