
INSTRUCTIONS


The purpose of this exercise is to obtain a better understanding of your technical skills and capabilities in the areas of data processing, R programming and data visualization / reporting.


Please use R (https://www.r-project.org/) to complete these tasks.


The R package “NanoStringNorm” (https://cran.r-project.org/web/packages/NanoStringNorm/index.html) can be used to read in RCC files.
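For orientation, the count table inside an RCC file can also be read with base R alone. The sketch below assumes the usual RCC layout, in which a `<Code_Summary>` block holds comma-separated `CodeClass,Name,Accession,Count` rows. The helper name `read_code_summary`, the accession placeholders, and all counts are invented for illustration. It is not a substitute for the NanoStringNorm reader recommended above.

```r
# Sketch: read the <Code_Summary> block of one RCC file using base R only.
# read_code_summary() is a hypothetical helper, not part of any package.
read_code_summary <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  start <- which(lines == "<Code_Summary>")
  end   <- which(lines == "</Code_Summary>")
  stopifnot(length(start) == 1, length(end) == 1, end > start + 1)
  body <- lines[(start + 1):(end - 1)]
  body <- body[nzchar(body)]  # drop empty lines inside the block
  read.csv(text = paste(body, collapse = "\n"), stringsAsFactors = FALSE)
}

# Tiny stand-in file with invented counts and placeholder accessions.
demo_path <- tempfile(fileext = ".RCC")
writeLines(c(
  "<Header>",
  "FileVersion,1.7",
  "</Header>",
  "<Code_Summary>",
  "CodeClass,Name,Accession,Count",
  "Positive,POS_A(128),ACC_POS_A,20000",
  "Negative,NEG_A(0),ACC_NEG_A,8",
  "Endogenous,MCL1,ACC_MCL1,1500",
  "</Code_Summary>"
), demo_path)

demo_counts <- read_code_summary(demo_path)
print(demo_counts)
```

Applying such a helper to each of the 10 files and merging the results on `Name` would give one count column per sample.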


Please provide the following in your response to this assessment:


1. R code used to complete the tasks for quality control (Section 3.1), data analysis (Section 3.2) and reporting (Section 3.3) as indicated below. R code should include comments that allow the reviewer to follow the process used to complete this case study.


2. Brief report that summarizes what you did and shows all figures and tables created for this case study. You may choose the format for reporting (e.g., PowerPoint, HTML report).



3 CASE STUDY


A pharmaceutical company generated gene expression data using the NanoString nCounter assay for five subjects across two timepoints (i.e. 10 samples total). Raw data from these 10 samples in NanoString’s Reporter Code Count (RCC) format have been made available along with an annotation file connecting RCC files with subjects and timepoints as shown below:

RCC File                      Subject   Timepoint
GSM2055823_01_4353_PD_mRNA    1         Baseline
GSM2055824_02_4355_PD_mRNA    1         Post-Treatment
GSM2055825_03_3366_PD_mRNA    2         Baseline
GSM2055826_04_4078_PD_mRNA    2         Post-Treatment
GSM2055827_05_4846_PD_mRNA    3         Baseline
GSM2055828_06_3746_PD_mRNA    3         Post-Treatment
GSM2055829_07_3760_PD_mRNA    4         Baseline
GSM2055830_08_3790_PD_mRNA    4         Post-Treatment
GSM2055831_09_4436_PD_mRNA    5         Baseline
GSM2055832_10_4050_PD_mRNA    5         Post-Treatment

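The annotation above can be held in a small data frame so that later steps can look up subject and timepoint by file name. This is a minimal sketch: the column names `rcc_file`, `subject` and `timepoint` are my own choice, and the values are copied from the annotation table.

```r
# Annotation table linking RCC files to subjects and timepoints.
annotation <- data.frame(
  rcc_file = c(
    "GSM2055823_01_4353_PD_mRNA", "GSM2055824_02_4355_PD_mRNA",
    "GSM2055825_03_3366_PD_mRNA", "GSM2055826_04_4078_PD_mRNA",
    "GSM2055827_05_4846_PD_mRNA", "GSM2055828_06_3746_PD_mRNA",
    "GSM2055829_07_3760_PD_mRNA", "GSM2055830_08_3790_PD_mRNA",
    "GSM2055831_09_4436_PD_mRNA", "GSM2055832_10_4050_PD_mRNA"
  ),
  subject = rep(1:5, each = 2),
  timepoint = factor(
    rep(c("Baseline", "Post-Treatment"), times = 5),
    levels = c("Baseline", "Post-Treatment")
  ),
  stringsAsFactors = FALSE
)

# Sanity check: every subject has exactly one sample per timepoint.
design <- table(annotation$subject, annotation$timepoint)
stopifnot(all(design == 1))
print(annotation)
```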
3.1 Quality Control



3.1.1 Overview


The NanoString assay contains a set of negative and positive control genes indicated by code class “positive” or “negative” in the raw files as shown in the following example extracted from file GSM2055832_10_4050_PD_mRNA.RCC:



These positive and negative control genes are used to assess quality by evaluating signal and noise levels.
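One common way to put numbers on "signal" and "noise" is to compare, per sample, the geometric mean of the positive controls with a background level derived from the negative controls. The sketch below uses invented counts. The background rule (mean plus two standard deviations of the negative controls) is an assumption chosen for illustration, not something the assessment prescribes.

```r
# Sketch with invented control counts: rows are control probes, columns are samples.
set.seed(1)
samples  <- paste0("sample_", 1:3)
pos_conc <- c(128, 32, 8, 2, 0.5, 0.125)  # assumed relative spike-in levels

pos <- outer(pos_conc * 200, c(1, 1.2, 0.8))
dimnames(pos) <- list(paste0("POS_", LETTERS[1:6]), samples)

neg <- matrix(rpois(8 * 3, lambda = 10), nrow = 8,
              dimnames = list(paste0("NEG_", LETTERS[1:8]), samples))

# Noise: mean + 2 SD of the negative controls (an assumed threshold rule).
background <- colMeans(neg) + 2 * apply(neg, 2, sd)

# Signal: geometric mean of the positive controls.
signal <- exp(colMeans(log(pos)))

qc <- data.frame(
  sample = samples,
  background = background,
  signal = signal,
  signal_to_background = signal / background
)
print(qc)
```

A sample whose positive-control signal sits close to its background would be a candidate for closer inspection.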



3.1.2 Task


Please generate a heatmap showing positive and negative control genes in columns and samples in rows. Please consider the potentially different scales of the data and transform the data appropriately if needed.
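A possible shape for this task, sketched with simulated counts in place of the real RCC data: positive controls span several orders of magnitude while negative controls sit near zero, so the counts are transformed with log2(count + 1) before plotting. The sample names, probe names and counts below are invented, and base R's `heatmap()` is used only to keep the sketch free of extra packages.

```r
# Simulated control counts: 10 samples (rows) x 14 control probes (columns).
set.seed(42)
samples   <- paste0("sample_", 1:10)
pos_conc  <- c(128, 32, 8, 2, 0.5, 0.125)  # assumed relative spike-in levels
pos_names <- paste0("POS_", LETTERS[1:6])
neg_names <- paste0("NEG_", LETTERS[1:8])

pos <- sapply(pos_conc, function(conc) rpois(10, lambda = conc * 150))
neg <- sapply(seq_along(neg_names), function(i) rpois(10, lambda = 8))

controls <- cbind(pos, neg)
dimnames(controls) <- list(samples, c(pos_names, neg_names))

# log2(count + 1) puts both control classes on a comparable scale
# and avoids taking the log of zero.
log_controls <- log2(controls + 1)

# Samples in rows, control genes in columns; no clustering, no rescaling.
heatmap(log_controls, Rowv = NA, Colv = NA, scale = "none",
        col = hcl.colors(50, "viridis"), margins = c(6, 8),
        main = "Control probes, log2(count + 1)")
```

Turning clustering off keeps the probes in their original order, which makes the expected gradient across the positive controls easy to see.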





Technical Assessment for Data Scientist Position



3.2 Data Analysis



3.2.1 Overview


The Pharma client is interested in showing differences between the baseline and post-treatment timepoints for two genes of interest: MCL1 and CXCL1.



3.2.2 Task


Please generate a figure showing boxplots of summary statistics (minimum, 25th percentile, mean, median, 75th percentile, and maximum) for each timepoint by gene. If possible, please also overlay the individual data points for each sample.
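The figure could be sketched in base R roughly as follows, with simulated expression values in place of the real counts. Setting `range = 0` makes the whiskers reach the minimum and maximum, and the mean is added as a cross because a standard boxplot does not show it. Note that `boxplot()` draws hinges, which can differ slightly from the 25th and 75th percentiles returned by `quantile()`. The data frame `expr` and its column names are assumptions for illustration.

```r
# Simulated log2 expression values: 2 genes x 2 timepoints x 5 subjects.
set.seed(7)
expr <- data.frame(
  gene = rep(c("MCL1", "CXCL1"), each = 10),
  timepoint = factor(
    rep(rep(c("Baseline", "Post-Treatment"), times = 5), times = 2),
    levels = c("Baseline", "Post-Treatment")
  ),
  log2_count = c(rnorm(10, mean = 10, sd = 0.5), rnorm(10, mean = 7, sd = 0.8))
)

op <- par(mfrow = c(1, 2))  # one panel per gene
for (g in unique(expr$gene)) {
  d <- expr[expr$gene == g, ]
  # range = 0: whiskers extend to the most extreme data points (min and max).
  boxplot(log2_count ~ timepoint, data = d, range = 0,
          main = g, xlab = "", ylab = "log2(count + 1)")
  # Overlay the individual samples with a little horizontal jitter.
  stripchart(log2_count ~ timepoint, data = d, vertical = TRUE,
             method = "jitter", jitter = 0.1, pch = 19, add = TRUE)
  # Mark the mean of each timepoint with a cross.
  means <- tapply(d$log2_count, d$timepoint, mean)
  points(seq_along(means), means, pch = 4, cex = 1.5, lwd = 2)
}
par(op)
```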


In addition, please generate a table that lists the numeric values of the summary statistics shown in the boxplot.
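A matching table can be built with `aggregate()`. The values below are invented so that the arithmetic is easy to check by hand. The column names and the helper `summary_stats` are assumptions for illustration.

```r
# Invented values: 5 per gene/timepoint combination.
expr <- data.frame(
  gene = rep(c("MCL1", "CXCL1"), each = 10),
  timepoint = rep(rep(c("Baseline", "Post-Treatment"), each = 5), times = 2),
  log2_count = c(1, 2, 3, 4, 10,   2, 4, 6, 8, 10,
                 5, 5, 5, 5, 5,    0, 1, 2, 3, 4)
)

# The six statistics requested for the boxplot.
summary_stats <- function(x) {
  c(min    = min(x),
    p25    = unname(quantile(x, 0.25)),
    mean   = mean(x),
    median = median(x),
    p75    = unname(quantile(x, 0.75)),
    max    = max(x))
}

# aggregate() returns the statistics as a matrix column;
# do.call(data.frame, ...) flattens it into one column per statistic.
stats_table <- do.call(
  data.frame,
  aggregate(log2_count ~ gene + timepoint, data = expr, FUN = summary_stats)
)
print(stats_table)
```

For the invented MCL1 baseline values (1, 2, 3, 4, 10) this gives a minimum of 1, 25th percentile of 2, mean of 4, median of 3, 75th percentile of 4 and maximum of 10.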





3.3 Reporting



3.3.1 Overview


The Pharma client asked for a report that contains all figures and tables generated for this project.



3.3.2 Task


Please provide a brief report in a format of your choosing per the instructions for this case study.
Mohd answered on Aug 04 2021
knitr::opts_chunk$set(echo = TRUE,cache = TRUE,warning = FALSE,message = FALSE,dpi = 180,fig.width = 8,fig.height = 5)
#if (!requireNamespace("BiocManager", quietly = TRUE))
#install.packages("BiocManager")
#BiocManager::install("ComplexHeatmap")
library(ComplexHeatmap)
#BiocManager::install("dendextend")
library(dendextend)
#install.packages("corrplot")
library(corrplot)
library(rcc)
#BiocManager::install("NanoStringNorm")
#install.packages("nanostringr")  # read_rcc() below comes from nanostringr
library(nanostringr)
rcc_file<-read_rcc(path = "/cloud/project/files")
Heatmap
# Columns 4 to 13 of the raw table hold the counts for the 10 samples
my_matrix<-as.matrix(rcc_file$raw[,c(4:13)])
class(rcc_file)
## [1] "list"
class(my_matrix)
## [1] "matrix" "array"
fontsize<-0.5
#Gene_symbol<-data.frame(Gene=mmc$Gene.Symbol)
Heatmap(my_matrix)
Heatmap(my_matrix,cluster_columns = FALSE,
        row_names_side = "left",
        row_names_gp=gpar(cex=fontsize))
COL.OVD <- "#66C2A5"
COL.OVO <- "#A6D854"
COL.OVCL <- "#FC8D62"
COL.HLD <- "#8DA0CB"
COL.HLO <- "#E78AC3"
# Return the second underscore-delimited field of each string
getNum <- function(str.vect) {
  sapply(strsplit(str.vect, "[_]"), "[[", 2)
}
#boxplot(perFOV ~ fove.counted, ylab = "% fov", main = "% FOV by fove.counted", data = rcc_file$exp, pch = 20,
#col = c(COL.HLD, COL.OVD, COL.OVCL, COL.HLO, COL.OVO))
#abline(h = 75, lty = 2, col = "red")
#grid(NULL, NULL, lwd = 1)
#Correlation matrix of raw counts between samples
res <- cor(rcc_file$raw[4:13])
round(res, 2)
## gsm2055823014353pdmrna-zxlba3er
## gsm2055823014353pdmrna-zxlba3er 1.00
## gsm2055824024355pdmrna-cplirsoi 0.90
## gsm2055825033366pdmrna-vdzuy1ic 0.91
## gsm2055826044078pdmrna-ir4cfdoi 0.98
## gsm2055827054846pdmrna-1og2mkza 0.90
## gsm2055828063746pdmrna-owqwv5us 0.77
## gsm2055829073760pdmrna-hgjac45r 0.90
## gsm2055830083790pdmrna-av1oifdi 0.94
## gsm2055831094436pdmrna-3ubwxwbn 0.91
## gsm2055832104050pdmrna-xftlqjmo 0.95
## gsm2055824024355pdmrna-cplirsoi
## gsm2055823014353pdmrna-zxlba3er 0.90
## gsm2055824024355pdmrna-cplirsoi 1.00
## gsm2055825033366pdmrna-vdzuy1ic 0.71
## gsm2055826044078pdmrna-ir4cfdoi 0.88
## gsm2055827054846pdmrna-1og2mkza 1.00
## gsm2055828063746pdmrna-owqwv5us 0.57
## gsm2055829073760pdmrna-hgjac45r 0.99
##...