INSTRUCTIONS
The purpose of this exercise is to obtain a better understanding of your technical skills and capabilities in the areas of data processing, R programming and data visualization / reporting.
Please use R (https://www.r-project.org/) to complete these tasks.
The R package “NanoStringNorm” (https://cran.r-project.org/web/packages/NanoStringNorm/index.html) can be used to read in RCC files.
Please provide the following in your response to this assessment:
1. R code used to complete the tasks for quality control (Section 3.1), data analysis (Section 3.2) and reporting (Section 3.3) as indicated below. The R code should include comments that allow the reviewer to follow the process used to complete this case study.
2. Brief report that summarizes what you did and shows all figures and tables created for this case study. You may choose the format for reporting (e.g., PowerPoint, HTML report).
3 CASE STUDY
A pharmaceutical company generated gene expression data using the NanoString nCounter assay for five subjects across two timepoints (i.e. 10 samples total). Raw data from these 10 samples in NanoString’s Reporter Code Count (RCC) format have been made available along with an annotation file connecting RCC files with subjects and timepoints as shown below:
RCC File | Subject | Timepoint
GSM2055823_01_4353_PD_mRNA | 1 | Baseline
GSM2055824_02_4355_PD_mRNA | 1 | Post-Treatment
GSM2055825_03_3366_PD_mRNA | 2 | Baseline
GSM2055826_04_4078_PD_mRNA | 2 | Post-Treatment
GSM2055827_05_4846_PD_mRNA | 3 | Baseline
GSM2055828_06_3746_PD_mRNA | 3 | Post-Treatment
GSM2055829_07_3760_PD_mRNA | 4 | Baseline
GSM2055830_08_3790_PD_mRNA | 4 | Post-Treatment
GSM2055831_09_4436_PD_mRNA | 5 | Baseline
GSM2055832_10_4050_PD_mRNA | 5 | Post-Treatment
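For the analysis it can help to hold the annotation in a data frame. The following is a minimal sketch that transcribes the table above; the column names (rcc_file, subject, timepoint) are illustrative choices, not part of the supplied annotation file.

```r
# Annotation table transcribed from the case study description.
# Column names are illustrative assumptions, not taken from the annotation file.
annotation <- data.frame(
  rcc_file = c(
    "GSM2055823_01_4353_PD_mRNA", "GSM2055824_02_4355_PD_mRNA",
    "GSM2055825_03_3366_PD_mRNA", "GSM2055826_04_4078_PD_mRNA",
    "GSM2055827_05_4846_PD_mRNA", "GSM2055828_06_3746_PD_mRNA",
    "GSM2055829_07_3760_PD_mRNA", "GSM2055830_08_3790_PD_mRNA",
    "GSM2055831_09_4436_PD_mRNA", "GSM2055832_10_4050_PD_mRNA"
  ),
  subject = factor(rep(1:5, each = 2)),
  timepoint = factor(
    rep(c("Baseline", "Post-Treatment"), times = 5),
    levels = c("Baseline", "Post-Treatment")
  ),
  stringsAsFactors = FALSE
)

# Sanity check: every subject should have exactly one sample per timepoint.
design <- table(annotation$subject, annotation$timepoint)
stopifnot(all(design == 1))

print(annotation)
```

Fixing the factor levels so that "Baseline" comes first keeps the timepoints in chronological order in later plots and tables.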
3.1 Quality Control
3.1.1 Overview
The NanoString assay contains a set of negative and positive control genes indicated by code class “positive” or “negative” in the raw files as shown in the following example extracted from file GSM2055832_10_4050_PD_mRNA.RCC:
These positive and negative control genes are used to assess quality by evaluating signal and noise levels.
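As an illustration of selecting control probes by code class, here is a base-R sketch (used instead of the NanoStringNorm reader so that it runs without extra packages). It assumes that the counts sit between `<Code_Summary>` and `</Code_Summary>` tags under a `CodeClass,Name,Accession,Count` header. The helper name `read_code_summary` is hypothetical, and the toy file contents, including all counts and accessions, are made-up placeholders and not an excerpt from any of the supplied RCC files.

```r
# Hypothetical helper: extract the Code_Summary block of one RCC file.
# Assumes the block is delimited by <Code_Summary> ... </Code_Summary> tags
# and starts with a "CodeClass,Name,Accession,Count" header line.
read_code_summary <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  start <- which(lines == "<Code_Summary>")
  end <- which(lines == "</Code_Summary>")
  stopifnot(length(start) == 1, length(end) == 1, end > start + 1)
  body <- lines[(start + 1):(end - 1)]
  read.csv(text = body, stringsAsFactors = FALSE)
}

# Toy file with MADE-UP placeholder values, only to exercise the helper.
toy_rcc <- c(
  "<Header>",
  "FileVersion,1.7",
  "</Header>",
  "<Code_Summary>",
  "CodeClass,Name,Accession,Count",
  "Positive,POS_A(128),ACC_PLACEHOLDER_1,20000",
  "Positive,POS_B(32),ACC_PLACEHOLDER_2,5000",
  "Negative,NEG_A(0),ACC_PLACEHOLDER_3,8",
  "Negative,NEG_B(0),ACC_PLACEHOLDER_4,12",
  "Endogenous,MCL1,ACC_PLACEHOLDER_5,1500",
  "</Code_Summary>"
)
path <- tempfile(fileext = ".RCC")
writeLines(toy_rcc, path)

counts <- read_code_summary(path)

# Keep only the control probes; compare case-insensitively because the
# capitalization of the code class may differ between files and tools.
controls <- counts[tolower(counts$CodeClass) %in% c("positive", "negative"), ]
print(controls)
```

With real data the same filter would be applied to each of the 10 files and the counts combined into a samples-by-probes matrix.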
3.1.2 Task
Please generate a heatmap showing positive and negative control genes in columns and samples in rows. Please consider the potentially different scales of the data and transform the data appropriately if needed.
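One possible approach, sketched below with base R, is to log2-transform the counts (adding 1 to avoid log2(0)) so that high positive-control counts and near-zero negative-control counts are visible on the same colour scale. The matrix is filled with simulated placeholder counts, and the sample and probe names are hypothetical; with real data the matrix would come from the RCC files.

```r
set.seed(1)

# SIMULATED placeholder counts: 10 samples (rows) x 10 control probes (columns).
samples <- sprintf("Sample_%02d", 1:10)
probes <- c(paste0("POS_", LETTERS[1:6]), paste0("NEG_", LETTERS[1:4]))
pos_means <- c(20000, 5000, 1200, 300, 80, 20)  # made-up expected counts
neg_mean <- 8                                    # made-up background level

raw <- cbind(
  sapply(pos_means, function(m) rpois(length(samples), m)),
  matrix(rpois(length(samples) * 4, neg_mean), nrow = length(samples))
)
dimnames(raw) <- list(samples, probes)

# Raw counts span several orders of magnitude, so transform before plotting.
log_counts <- log2(raw + 1)

# Samples in rows, control probes in columns; no clustering and no per-row
# scaling, so the colours reflect the log2 counts directly.
heatmap(
  log_counts,
  Rowv = NA, Colv = NA, scale = "none",
  col = rev(heat.colors(50)),
  margins = c(7, 8),
  xlab = "Control probe", ylab = "Sample"
)
```

Setting `scale = "none"` is a deliberate choice here: the default row scaling of `heatmap()` would standardise each sample and hide differences in overall signal between samples.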
Technical Assessment for Data Scientist Position
3.2 Data Analysis
3.2.1 Overview
The Pharma client is interested in showing differences between the baseline and post-treatment timepoints for two genes of interest: MCL1 and CXCL1.
3.2.2 Task
Please generate a figure showing boxplots of summary statistics (minimum, 25th percentile, mean, median, 75th percentile, and maximum) for each timepoint by gene. If possible, please also overlay the individual data points for each sample.
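A minimal base-R sketch of such a figure follows. The expression values are simulated placeholders, and the column names (log2_count, group) are assumptions for illustration. A standard boxplot does not show the mean, so it is added as a separate symbol.

```r
set.seed(42)

# SIMULATED placeholder data: 5 subjects x 2 timepoints x 2 genes.
expr <- expand.grid(
  subject = factor(1:5),
  timepoint = factor(c("Baseline", "Post-Treatment"),
                     levels = c("Baseline", "Post-Treatment")),
  gene = factor(c("MCL1", "CXCL1"), levels = c("MCL1", "CXCL1"))
)
expr$log2_count <- round(rnorm(nrow(expr), mean = 8, sd = 1), 2)  # made-up values

# One grouping factor per timepoint/gene combination.
expr$group <- interaction(expr$timepoint, expr$gene, sep = " / ")

# range = 0 makes the whiskers extend to the data extremes (minimum and maximum).
bp <- boxplot(
  log2_count ~ group, data = expr, range = 0,
  col = c("grey85", "lightblue"),
  xlab = "Timepoint / gene", ylab = "log2 count (placeholder values)",
  cex.axis = 0.7, las = 1
)

# Overlay the individual samples with a little horizontal jitter.
stripchart(
  log2_count ~ group, data = expr,
  vertical = TRUE, method = "jitter", jitter = 0.08,
  pch = 16, add = TRUE
)

# Add the group means, which the boxplot itself does not display.
group_means <- tapply(expr$log2_count, expr$group, mean)
points(seq_along(group_means), group_means, pch = 4, col = "red", cex = 1.5, lwd = 2)
legend("topright", legend = c("Sample", "Mean"), pch = c(16, 4),
       col = c("black", "red"), bty = "n")
```

Note that `boxplot()` draws the box at Tukey's hinges, which can differ slightly from the 25th and 75th percentiles returned by `quantile()` with its default settings.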
In addition, please generate a table that lists the numeric values of the summary statistics shown in the boxplot.
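The summary table could be assembled along the following lines. This is a sketch on simulated placeholder values; the helper name `summarise_group` and the column names are hypothetical.

```r
set.seed(42)

# SIMULATED placeholder data: 5 subjects x 2 timepoints x 2 genes.
expr <- expand.grid(
  subject = 1:5,
  timepoint = c("Baseline", "Post-Treatment"),
  gene = c("MCL1", "CXCL1"),
  stringsAsFactors = FALSE
)
expr$log2_count <- round(rnorm(nrow(expr), mean = 8, sd = 1), 2)  # made-up values

# Hypothetical helper: one row of summary statistics for a numeric vector.
summarise_group <- function(x) {
  data.frame(
    n = length(x),
    min = min(x),
    q25 = unname(quantile(x, 0.25)),
    mean = mean(x),
    median = median(x),
    q75 = unname(quantile(x, 0.75)),
    max = max(x)
  )
}

# One row per gene/timepoint combination.
keys <- unique(expr[, c("gene", "timepoint")])
rows <- lapply(seq_len(nrow(keys)), function(i) {
  in_group <- expr$gene == keys$gene[i] & expr$timepoint == keys$timepoint[i]
  cbind(keys[i, , drop = FALSE], summarise_group(expr$log2_count[in_group]))
})
summary_table <- do.call(rbind, rows)
rownames(summary_table) <- NULL

print(format(summary_table, digits = 3), row.names = FALSE)
```

The percentiles here use the default method of `quantile()`; if the table must match the box edges of a boxplot exactly, `fivenum()` could be used instead.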
3.3 Reporting
3.3.1 Overview
The Pharma client asked for a report that contains all figures and tables generated for this project.
3.3.2 Task
Please provide a brief report in a format of your choosing, per the instructions for this case study.