A file combining clinical data and genetic data is provided and has been preprocessed (BRCAMerged.csv).

A number of functions are provided in cells #1-12 of the attached notebook (AssignmentWeek3.pdf). The corresponding script is provided as both a Jupyter notebook and an R script, each named AssignmentWeek3.

The task in this assignment is to determine a selection of genes, called a genetic signature, that best separates normal samples from cancer samples. You can use a trial-and-error approach and visualize the result on a heatmap.

1) Download the attached files and place them in the same folder:

BRCAMerged.csv

AssignmentWeek3.ipynb

AssignmentWeek3.R

2) Run the script either as AssignmentWeek3.ipynb (Jupyter Notebook installation) or as AssignmentWeek3.R (RStudio installation).

3) Through trial and error, for example by adjusting the number of genes in Cell #12 (fixed at 100), find a better grouping of genes that separates normal from cancer samples as cleanly as possible. You can judge this on the heatmap by whether most red samples are grouped together. You can use any method of your choice to determine an optimal genetic signature.

You need to find a set of genes that produces a heatmap in which more red samples are grouped together than in the original heatmap.
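Judging "more red samples grouped together" by eye can be ambiguous, so one option is to put a number on it. The sketch below clusters the samples hierarchically, cuts the tree into two clusters, and reports the fraction of samples that fall in the majority class of their cluster. It runs on simulated data; `clusterPurity` and the simulated matrix are hypothetical names for illustration and are not part of the provided notebook.

```r
# Hypothetical helper: fraction of samples that sit in a cluster
# dominated by their own class (1 means the two classes separate perfectly).
clusterPurity <- function(exprMat, classVec, k = 2) {
  # exprMat: samples in rows, genes in columns; classVec: one label per sample
  tree <- hclust(dist(exprMat), method = "complete")
  clusters <- cutree(tree, k = k)
  counts <- table(clusters, classVec)
  sum(apply(counts, 1, max)) / length(classVec)
}

set.seed(1)
nNormal <- 10; nCancer <- 30; nGenes <- 20
classVec <- c(rep(0, nNormal), rep(1, nCancer))

# Simulated expression: the first 5 genes are shifted in cancer samples,
# the remaining 15 genes are pure noise.
exprMat <- matrix(rnorm((nNormal + nCancer) * nGenes), ncol = nGenes)
exprMat[classVec == 1, 1:5] <- exprMat[classVec == 1, 1:5] + 6

purityInformative <- clusterPurity(exprMat[, 1:5], classVec)   # informative genes only
purityNoise <- clusterPurity(exprMat[, 6:20], classVec)        # noise genes only
```

Comparing this score for the original gene set and for a candidate signature gives a simple check that the new heatmap really does group the samples better.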

Columns 1-40 of the input data contain clinical features. As explained in the assignment description, the features to select are genes that are expressed differently between the two groups, so the clinical features cannot be selected.
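In code, excluding the clinical features amounts to restricting the candidate columns to column 41 onward. The following is a minimal sketch on a simulated table with the same layout (40 clinical columns followed by gene columns). The names `merged`, `geneCols`, and `geneMat` are hypothetical, and the real column names in BRCAMerged.csv will differ.

```r
# Simulated stand-in for the merged table: 40 clinical columns, then gene columns
set.seed(2)
nSamples <- 6; nClinical <- 40; nGenes <- 8
clinical <- as.data.frame(matrix("x", nrow = nSamples, ncol = nClinical))
names(clinical) <- paste0("clin", seq_len(nClinical))
genes <- as.data.frame(matrix(rnorm(nSamples * nGenes), nrow = nSamples))
names(genes) <- paste0("gene", seq_len(nGenes))
merged <- cbind(clinical, genes)

# Keep only the gene columns (41 onward) as candidates for the signature
geneCols <- seq(nClinical + 1, ncol(merged))
geneMat <- as.matrix(merged[, geneCols])
```

Any feature selection method is then applied to `geneMat` only, so a clinical column can never end up in the signature.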

If you want to use the BSS/WSS method (the ratio of the between-group to the within-group sum of squares), you can change the number of genes selected for the original heatmap in the R script. In other words, you can choose a subset of the genes selected for the original heatmap.
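To make the BSS/WSS idea concrete, the sketch below computes, for each gene, the between-group sum of squares divided by the within-group sum of squares, and keeps the top-ranked genes. It is an illustration on simulated data, not the notebook's own `bssWssFast` function. `bssWssRatio` and `numGenes` are hypothetical names.

```r
# Hypothetical helper: BSS/WSS ratio per gene (column) for a given class vector
bssWssRatio <- function(X, classVec) {
  overallMean <- colMeans(X)
  bss <- numeric(ncol(X))
  wss <- numeric(ncol(X))
  for (cl in unique(classVec)) {
    Xk <- X[classVec == cl, , drop = FALSE]
    classMean <- colMeans(Xk)
    # between-group part: class size times squared distance of class mean to overall mean
    bss <- bss + nrow(Xk) * (classMean - overallMean)^2
    # within-group part: squared deviations of each sample from its class mean
    wss <- wss + colSums(sweep(Xk, 2, classMean)^2)
  }
  setNames(bss / wss, colnames(X))
}

set.seed(3)
classVec <- c(rep(0, 10), rep(1, 30))
X <- matrix(rnorm(40 * 50), ncol = 50,
            dimnames = list(NULL, paste0("gene", 1:50)))
X[classVec == 1, 1:5] <- X[classVec == 1, 1:5] + 3   # 5 truly different genes

ratio <- bssWssRatio(X, classVec)
numGenes <- 5   # the value to vary by trial and error (the notebook fixes it at 100)
topGenes <- names(sort(ratio, decreasing = TRUE))[seq_len(numGenes)]
```

Because the genes are ranked, lowering `numGenes` always yields a subset of the genes chosen at a higher value, which is what choosing "a subset of the genes selected for the original heatmap" means.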

You can use any feature selection method other than the BSS/WSS method if you wish.
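As one example of an alternative, genes can be ranked by a two-sample Welch t-test between the groups. This is a sketch on simulated data and only one of many possible methods. The variable names are hypothetical, and the signature size of 5 is arbitrary.

```r
set.seed(4)
classVec <- c(rep(0, 10), rep(1, 30))   # 0 = normal, 1 = cancer (simulated labels)
X <- matrix(rnorm(40 * 50), ncol = 50,
            dimnames = list(NULL, paste0("gene", 1:50)))
X[classVec == 1, 1:5] <- X[classVec == 1, 1:5] + 3   # 5 truly different genes

# Welch t-test per gene: cancer samples versus normal samples
pVals <- apply(X, 2, function(g) {
  t.test(g[classVec == 1], g[classVec == 0])$p.value
})

# Benjamini-Hochberg adjustment, since many genes are tested at once
adjP <- p.adjust(pVals, method = "BH")

# Signature: the genes with the smallest p-values
signature <- names(sort(pVals))[seq_len(5)]
```

The adjusted p-values in `adjP` can also serve as a cutoff (for example, keeping every gene below a chosen threshold) instead of fixing the signature size in advance.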

4) Turn in the assignment as a plain R script file (not Jupyter Notebook file), an original heatmap image, and a heatmap image of the selected genes, attached to your submission.
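The heatmap images requested in step 4 can be written to files directly from the R script. The sketch below uses base R's `heatmap` with a colored side bar marking each sample's class, on simulated data. Which class the notebook draws in red is not stated here, so the color mapping, the file name, and the variable names are all assumptions.

```r
set.seed(5)
classVec <- c(rep(0, 10), rep(1, 30))
X <- matrix(rnorm(40 * 12), ncol = 12,
            dimnames = list(NULL, paste0("gene", 1:12)))
X[classVec == 1, 1:4] <- X[classVec == 1, 1:4] + 3

# Assumed color mapping: one color per sample, by class
sideColors <- ifelse(classVec == 0, "red", "blue")

# Write the heatmap for the selected genes (here columns 1-4) to a PNG file
outFile <- file.path(tempdir(), "heatmap_selected_genes.png")
png(outFile, width = 800, height = 800)
hm <- heatmap(X[, 1:4],
              RowSideColors = sideColors,  # class color bar next to the sample dendrogram
              scale = "column",            # standardize each gene
              labRow = NA,                 # hide the sample labels
              main = "Selected genes")
dev.off()
```

Running the same block once with the original gene set and once with the chosen signature produces the two images to attach.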

Note: the file BRCAMerged.csv can be downloaded from Google Drive: https://drive.google.com/file/d/1I8yySge8gTfKR2WlpQ_Q1SSAR-O8dtwn/view?usp=sharing

Attachments

Data Dictionary.pdf

AssignmentWeek3.pdf

AssignmentWeek3.ipynb

AssignmentWeek3.r

BRCAMerged.csv: https://drive.google.com/file/d/1I8yySge8gTfKR2WlpQ_Q1SSAR-O8dtwn/view?usp=sharing

Answered 1 day after Jul 24, 2021

Subhanbasha answered on Jul 26 2021

148 Votes

# cell #1: load the data set, which has patients as rows and variables as columns
mrnaNorm <- read.table("BRCAMerged.csv", header = TRUE, sep = ",")
class(mrnaNorm)

# cell #2: dimensions and a first look at the data
dim(mrnaNorm)
mrnaNorm5x5 <- mrnaNorm[1:5, 1:5]   # first 5 rows and columns
head(mrnaNorm5x5, 2)                # display the first two rows
summary(mrnaNorm[, 1:5])            # summary statistics for the first 5 variables

# cell #3: summary of the sample type column
summary(mrnaNorm[, "type"])
# 1 1212

# cell #4: list the distinct sample types and count how many of each there are
unique(mrnaNorm[, "type"])
tab <- table(unlist(mrnaNorm[, "type"]))
tab
# C M MN
# 1093 7 112

# cell #5: class labels as strings ("0" for type "MN", "1" for every other type)
sampClass <- lapply(mrnaNorm[, "type"], function(t) if (t == "MN") "0" else "1")
mrnaClass <- as.data.frame(sampClass)
dim(mrnaClass)
table(unlist(sampClass))
# 0 1
# 112 1100

# cell #6: the same labels as numbers (0 for type "MN", 1 otherwise)
sampClassNum <- lapply(mrnaNorm[, "type"], function(t) if (t == "MN") 0 else 1)
mrnaClassNum <- as.data.frame(sampClassNum)
table(unlist(mrnaClassNum))
# cell #7
bssWssFast <- function (X, givenClassArr, numClass=2)
# between squares / within square feature selection
{
classVec <- matrix(0, numClass,...
