A file combining clinical data and genetic data is provided and has been preprocessed (BRCAMerged.csv). A number of functions have been provided in cells#1-12 of the attached notebook...


A file combining clinical data and genetic data is provided and has been preprocessed (BRCAMerged.csv).


A number of functions are provided in cells #1-12 of the attached notebook (AssignmentWeek3.pdf). The corresponding script is provided both as a Jupyter notebook and as an R script, each named AssignmentWeek3.


The goal of this assignment is to determine a selection of genes, called a genetic signature, that best separates normal from cancer samples. You can use a trial-and-error approach and visualize the result on a heatmap.
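To illustrate how a heatmap can show whether a signature separates the two groups, here is a minimal sketch on synthetic data. The matrix `expr`, the labels `isCancer`, and the choice to draw cancer samples in red are all assumptions made for illustration; they are not taken from the assignment notebook. Counting the runs of identical colors along the clustered column order gives a rough number for "how grouped" the red samples are (fewer runs means tighter grouping).

```r
# Sketch on synthetic data; every name below is illustrative, not from the notebook.
set.seed(42)
nGenes <- 20
nSamples <- 30

# heatmap() expects a numeric matrix; here rows = genes, columns = samples
expr <- matrix(rnorm(nGenes * nSamples), nrow = nGenes,
               dimnames = list(paste0("gene", seq_len(nGenes)),
                               paste0("s", seq_len(nSamples))))

# Assumed labelling: second half of the samples are "cancer"
isCancer <- rep(c(FALSE, TRUE), each = nSamples / 2)

# Make the first 5 genes differ between the groups so there is structure to find
expr[1:5, isCancer] <- expr[1:5, isCancer] + 3

# Color bar above the columns: red = cancer, blue = normal (an assumption)
sideColors <- ifelse(isCancer, "red", "blue")

# heatmap() clusters rows and columns and invisibly returns the reordering
hm <- heatmap(expr, ColSideColors = sideColors, scale = "row")

# Runs of identical colors in dendrogram order: fewer runs = better grouping
runs <- rle(sideColors[hm$colInd])
print(length(runs$lengths))
```

Comparing the run count before and after changing the gene selection is one simple way to back up a visual impression from the two heatmaps.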


1) Download the attached files and place them in the same folder:



  • BRCAMerged.csv

  • AssignmentWeek3.ipynb

  • AssignmentWeek3.R


2) Run the script either as AssignmentWeek3.ipynb (Jupyter Notebook installation) or as AssignmentWeek3.R (RStudio installation).


3) Through trial and error, for example by adjusting the number of genes in Cell #12 (fixed at 100), find a grouping of genes that better separates normal from cancer samples. You can judge this by whether most red samples are grouped together on the heatmap. You can use any method of your choice to determine an optimal genetic signature.



  • You need to find a set of genes to generate a heatmap which groups more red samples together than the original heatmap.

  • Columns 1-40 of the input data contain the values of clinical features. As explained in the assignment description, we are selecting features that represent genes expressed differently between the two groups. Therefore, you cannot select the clinical features.

  • If you want to use the BSS/WSS method, you can change the number of genes selected for the original heatmap in the R script. In other words, you can choose a subset of the genes selected for the original heatmap.

  • You can use any feature selection method other than the BSS/WSS method if you wish.
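As a rough illustration of the BSS/WSS idea mentioned above, the sketch below scores each gene by the ratio of its between-group to within-group sum of squares and keeps the highest-scoring genes. It runs on synthetic data, and the function name `bssWssRatio` and all variables are hypothetical; this is not the `bssWssFast` function from the notebook, whose body is not shown in full here.

```r
# Sketch of BSS/WSS gene ranking on synthetic data (all names are illustrative).
set.seed(1)
nSamples <- 40
nGenes <- 200

# rows = samples, columns = genes
expr <- matrix(rnorm(nSamples * nGenes), nrow = nSamples,
               dimnames = list(NULL, paste0("gene", seq_len(nGenes))))
classLabel <- rep(c(0, 1), each = nSamples / 2)  # assumed coding: 0 = normal, 1 = cancer

# Shift the first 10 genes in one group so they are genuinely informative
expr[classLabel == 1, 1:10] <- expr[classLabel == 1, 1:10] + 3

bssWssRatio <- function(X, cls) {
  overallMean <- colMeans(X)
  bss <- numeric(ncol(X))
  wss <- numeric(ncol(X))
  for (k in unique(cls)) {
    Xk <- X[cls == k, , drop = FALSE]
    classMean <- colMeans(Xk)
    # between-group: class size times squared distance of class mean from overall mean
    bss <- bss + nrow(Xk) * (classMean - overallMean)^2
    # within-group: squared deviations of each sample from its class mean
    wss <- wss + colSums(sweep(Xk, 2, classMean)^2)
  }
  setNames(bss / wss, colnames(X))
}

ratio <- bssWssRatio(expr, classLabel)
topGenes <- names(sort(ratio, decreasing = TRUE))[1:10]
print(topGenes)
```

On the real data, the same scoring would presumably be applied only to the gene columns, that is, after dropping the clinical columns 1-40 (for example with `mrnaNorm[, -(1:40)]`), and the number of genes kept would be the quantity to vary by trial and error.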


4) Turn in the assignment as a plain R script file (not a Jupyter Notebook file), together with an image of the original heatmap and an image of the heatmap for the selected genes, attached to your submission.
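One possible way to produce the heatmap image files from a plain R script is to open a `png()` graphics device, draw the heatmap, and close the device. The sketch below uses a synthetic matrix and a hypothetical output file name; it assumes the R installation has PNG support (which can be checked with `capabilities("png")`).

```r
# Sketch: write a heatmap to a PNG file (synthetic data, illustrative file name).
set.seed(7)
expr <- matrix(rnorm(200), nrow = 20)

# tempdir() is used here only so the example does not clutter the working folder
outFile <- file.path(tempdir(), "heatmap_selected_genes.png")

png(outFile, width = 800, height = 800)  # open the file device
heatmap(expr)                            # draw into it
dev.off()                                # close the device so the file is written

print(file.exists(outFile))
```

Running the same three steps once with the original gene selection and once with the new selection would yield the two images requested for submission.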


Note: the file BRCAMerged.csv can be downloaded from Google Drive: https://drive.google.com/file/d/1I8yySge8gTfKR2WlpQ_Q1SSAR-O8dtwn/view?usp=sharing


Answered 1 day after Jul 24, 2021

Subhanbasha answered on Jul 26 2021
# cell #1 loading the data set, which has patients as rows and variables as columns
mrnaNorm <- read.table("BRCAMerged.csv", header = TRUE, sep = ",")
class(mrnaNorm)
# cell #2
dim(mrnaNorm)
mrnaNorm5x5 = mrnaNorm[1:5, 1:5] # first 5 rows and columns
head(mrnaNorm5x5, 2) # display first two rows
summary(mrnaNorm[,1:5]) # summary statistics for the first 5 variables
# cell #3
summary(mrnaNorm[,"type"])
# 1 1212
# cell #4
unique(mrnaNorm[,"type"])
# extracts how many unique objects there are
tab <- table(unlist(mrnaNorm[,"type"]))
tab
# count how many of each type
# C M MN
# 1093 7 112
# cell #5
sampClass <- lapply(mrnaNorm[,"type"], function(t) (if (t == "MN") return("0") else return("1")))
mrnaClass <- as.data.frame(sampClass)
dim(mrnaClass)
table(unlist(sampClass))
# 0 1
# 112 1100
# cell #6
sampClassNum <- lapply(mrnaNorm[,"type"], function(t) (if (t == "MN") return(0) else return(1)))
mrnaClassNum <- as.data.frame(sampClassNum)
table(unlist(mrnaClassNum))
# cell #7
bssWssFast <- function (X, givenClassArr, numClass=2)
# feature selection by the ratio of between-group to within-group sums of squares (BSS/WSS)
{
    classVec <- matrix(0, numClass,...