Given a subset of genes and their expression values (mRNA-seq), perform a descriptive analysis and visualization, following the example in the worksheet.
A template in R Jupyter Notebook format is provided (AssignmentWeek2.pdf). Each comment line in the template describes a statement; add the corresponding R code after it.
1) Install package 'ggplot2' if it is not already installed.
2) Load package 'ggplot2'.
3) Read the dataset from file 'BRCAMergedWAVE.csv'. The documentation is in file 'DataDictionary.pdf'.
4) List the number of rows, the number of columns, and the dimensions of the dataset.
5) Display the list of variables (which is the first row, numbered 0).
6) Provide descriptive statistics for all variables in the dataset.
7) Draw a histogram of age (Diagnosis.Age) showing the frequency of cancer for each age.
8) Draw a scatterplot of variable/gene ABI1 as a function of age. Add a smoothing line.
9) Draw a scatterplot of variable/gene WASF1 as a function of age. Add a smoothing line.
10) Turn in the assignment as a plain R script file (do NOT submit a Jupyter Notebook file).
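The steps above can be sketched as a plain R script, the format step 10 asks for. This is a minimal sketch, not the worksheet's solution, and it covers steps 1, 2 and 7 to 9 only. It rests on a few assumptions: the CRAN mirror URL is one common choice, and, so that the script runs without the data file, brca is a hypothetical six-row stand-in built from the Diagnosis.Age, ABI1 and WASF1 values of the first six sample rows in the data excerpt below. In the assignment, brca would instead come from step 3, read.csv("BRCAMergedWAVE.csv").

```r
# Step 1: install ggplot2 only if it is not already available
# (the CRAN mirror URL is an assumption; any CRAN mirror works)
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}

# Step 2: load ggplot2
library(ggplot2)

# Step 3 in the assignment would be: brca <- read.csv("BRCAMergedWAVE.csv")
# Hypothetical stand-in: six sample rows from the data excerpt, three columns only
brca <- data.frame(
  Diagnosis.Age = c(55, 50, 62, 52, 50, 42),
  ABI1  = c(-2.0640, -1.2726, -0.5440, -1.2824, -0.7593, -0.9428),
  WASF1 = c(-0.8868, -0.6140, -0.7096, -0.0637, 0.8592, -0.7001)
)

# Step 7: histogram of age; binwidth = 1 gives one bar per year of age
p_age <- ggplot(brca, aes(x = Diagnosis.Age)) +
  geom_histogram(binwidth = 1) +
  labs(x = "Age at diagnosis (years)", y = "Number of cases")

# Step 8: ABI1 expression as a function of age, with a loess smoothing line
p_abi1 <- ggplot(brca, aes(x = Diagnosis.Age, y = ABI1)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# Step 9: the same plot for WASF1
p_wasf1 <- ggplot(brca, aes(x = Diagnosis.Age, y = WASF1)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# In a plain R script, plots are drawn reliably only when printed explicitly
print(p_age)
print(p_abi1)
print(p_wasf1)
```

geom_smooth() with no arguments would also add a smoothing line: it chooses loess for fewer than 1,000 observations and a GAM otherwise, and prints a message naming its choice. Setting method = "loess" explicitly is a design choice, not a requirement of the assignment. With only six stand-in points, R may print loess warnings.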
7/15/2019 AssignmentWeek2 localhost:8888/notebooks/ml/BHI550-R/AssignmentWeek2.ipynb
Install package 'ggplot2' - if not already installed
Load package 'ggplot2'
Read dataset from file 'BRCAMergedWAVE.csv'. The documentation is located in file 'DataDictionaryWeek2.pdf'.
List number of rows, number of columns, and dimension of the dataset.
  # number of rows
  # number of columns
  # number of rows and columns
Display the list of variables (which is the first row, numbered 0).
  # returns the list of variables
Provide descriptive statistics for all variables in the dataset.
Draw a histogram of age (Diagnosis.Age) showing the frequency of cancer for each age.
Draw a scatterplot of variable/gene ABI1 as a function of age. Add a smoothing line.
Draw a scatterplot of variable/gene WASF1 as a function of age. Add a smoothing line.
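Following the template's convention of adding code after each comment line (# number of rows, # number of columns, # number of rows and columns, # returns the list of variables), steps 4 to 6 might be filled in roughly as below. This is a sketch under one assumption: the three-row brca stand-in, read from inline text with four columns taken from the sample rows, exists only to make the example self-contained. In the assignment, brca is the full data frame read from 'BRCAMergedWAVE.csv'.

```r
# Hypothetical stand-in for brca: three sample rows, four of the columns
brca <- read.csv(text = c(
  '"Sample.ID","Diagnosis.Age","ABI1","WASF1"',
  '"TCGA-3C-AAAU-01","55","-2.0640","-0.8868"',
  '"TCGA-3C-AALJ-01","62","-0.5440","-0.7096"',
  '"TCGA-3C-AALK-01","52","-1.2824","-0.0637"'
))

# number of rows
print(nrow(brca))
# number of columns
print(ncol(brca))
# number of rows and columns
print(dim(brca))
# returns the list of variables (the header row of the CSV file)
print(names(brca))

# Step 6: descriptive statistics; numeric columns get min, quartiles, mean and max
print(summary(brca))
```

The explicit print() calls make the values appear even when the script is run with source(), which does not auto-print top-level results by default.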
"Sample.ID","Patient.ID","sampleType","type","stage","Diagnosis.Age","American.Joint.Committee.on.Cancer.Metastasis.Stage.Code","Neoplasm.Disease.Lymph.Node.Stage.American.Joint.Committee.on.Cancer.Code","Neoplasm.Disease.Stage.American.Joint.Committee.on.Cancer.Code","American.Joint.Committee.on.Cancer.Tumor.Stage.Code","Cancer.Type.Detailed","Birth.from.Initial.Pathologic.Diagnosis.Date","Days.to.Sample.Collection.","Death.from.Initial.Pathologic.Diagnosis.Date","Last.Alive.Less.Initial.Pathologic.Diagnosis.Date.Calculated.Day.Value","Days.to.Last.Followup","Disease.Free..Months.","Disease.Free.Status","ER.Status.By.IHC","ER.Status.IHC.Percent.Positive","Ethnicity.Category","Fraction.Genome.Altered","HER2.fish.status","HER2.ihc.percent.positive","HER2.ihc.score","Neoplasm.Histologic.Type.Name","Prior.Cancer.Diagnosis.Occurence","IHC.HER2","Year.Cancer.Initial.Diagnosis","Menopause.Status","Patient.Metastatic.Sites","Overall.Survival..Months.","Overall.Survival.Status","Patient.Primary.Tumor.Site","PR.status.by.ihc","PR.status.ihc.percent.positive","Race.Category","Sample.Type","Sex","Person.Neoplasm.Status","ABI1","ABI2","ABI3","BRK1","CYFIP1","CYFIP2","NCKAP1L","NCKAP1","WASF1","WASF2","WASF3" "TCGA-3C-AAAU-01","TCGA-3C-AAAU","01","C","NA","55","MX","NX","Stage X","TX",NA,"-20211","3599",NA,"0","3767","59.4","Recurred/Progressed","Positive","50-59%","NOT HISPANIC OR LATINO","0.7787","[Not Evaluated]",NA,NA,"Infiltrating Lobular Carcinoma","No","Negative","2004","Pre (<6 months="" since="" lmp="" and="" no="" prior="" bilateral="" ovariectomy="" and="" not="" on="" estrogen="" replacement)",na,"132.95","living","left="" lower="" outer="" quadrant","positive","50-59%","white","primary","female","with="" tumor","-2.0640","-0.2766","-0.4904","-2.0366","-0.6276","-0.1848","-0.7496","-0.1460","-0.8868","2.1615","0.0376"="" "tcga-3c-aali-01","tcga-3c-aali","01","c","2","50","m0","n1a","stage="" iib","t2","breast="" invasive="" ductal="">6><10%","not hispanic="" or="" 
latino","0.7164",na,na,na,"infiltrating="" ductal="" carcinoma","no","positive","2003","post="" (prior="" bilateral="" ovariectomy="" or="">12 mo since LMP with no prior hysterectomy)",NA,"131.57","LIVING","Right Upper Outer Quadrant","Positive","<10%","black or="" african="" american","primary","female","tumor="" free","-1.2726","-0.3246","1.3957","-0.9959","-1.5814","-0.6934","-0.1818","-0.8773","-0.6140","-2.1910","0.3597"="" "tcga-3c-aalj-01","tcga-3c-aalj","01","c","2","62","m0","n1a","stage="" iib","t2","breast="" invasive="" ductal="" carcinoma","-22848","1026",na,"0","1228","48.42","diseasefree","positive","90-99%","not="" hispanic="" or="" latino","0.534",na,na,na,"infiltrating="" ductal="" carcinoma","no","indeterminate","2011","post="" (prior="" bilateral="" ovariectomy="" or="">12 mo since LMP with no prior hysterectomy)",NA,"48.42","LIVING","Right","Positive","30-39%","BLACK OR AFRICAN AMERICAN","Primary","Female","TUMOR FREE","-0.5440","-1.0187","-0.0233","-0.7693","-1.8434","-0.4144","-0.1828","-1.5121","-0.7096","-0.8078","-0.6161" "TCGA-3C-AALK-01","TCGA-3C-AALK","01","C","1","52","M0","N0 (i+)","Stage IA","T1c","Breast Invasive Ductal Carcinoma","-19074","1022",NA,"0","1217","47.57","DiseaseFree","Positive","70-79%","NOT HISPANIC OR LATINO","0.0764",NA,NA,NA,"Infiltrating Ductal Carcinoma","No","Positive","2011",NA,NA,"47.57","LIVING","Right","Positive","80-89%","BLACK OR AFRICAN AMERICAN","Primary","Female","TUMOR FREE","-1.2824","0.1820","-0.0180","0.1454","-0.3691","-0.8713","-0.5229","-0.7449","-0.0637","-0.9419","-0.6488" "TCGA-4H-AAAK-01","TCGA-4H-AAAK","01","C","3","50","M0","N2a","Stage IIIA","T2","Breast Invasive Lobular Carcinoma","-18371","35",NA,"0","158","11.43","DiseaseFree","Positive","60-69%","NOT HISPANIC OR LATINO","0.2364",NA,"10-19%","2","Infiltrating Lobular Carcinoma","No","Equivocal","2013","Post (prior bilateral ovariectomy OR >12 mo since LMP with no prior hysterectomy)",NA,"11.43","LIVING","Left|Left Upper Outer 
Quadrant","Positive","70-79%","WHITE","Primary","Female","TUMOR FREE","-0.7593","0.5282","-0.2688","0.2040","-1.5453","-0.3216","-0.4157","-1.0018","0.8592","-0.3453","0.1656" "TCGA-5L-AAT0-01","TCGA-5L-AAT0","01","C","2","42","M0","N0","Stage IIA","T2","Breast Invasive Lobular Carcinoma","-15393","1320",NA,"0","1477","48.52","DiseaseFree","Positive","70-79%","HISPANIC OR LATINO","0.0702",NA,NA,"1","Infiltrating Lobular Carcinoma","Yes","Negative","2010","Post (prior bilateral ovariectomy OR >12 mo since LMP with no prior hysterectomy)",NA,"48.52","LIVING","Right|Right Lower Outer Quadrant","Positive","50-59%","WHITE","Primary","Female","TUMOR FREE","-0.9428","-0.4424","0.9949","0.3311","-0.4513","-0.6734","-0.0338","-1.1991","-0.7001","-0.0123","-0.6567" "TCGA-5L-AAT1-01","TCGA-5L-AAT1","01","C","4","63","M1","N0","Stage IV","T2","Breast Invasive Lobular Carcinoma","-23225","1259",NA,"0","1471","48.32","DiseaseFree","Positive","80-89%","HISPANIC OR LATINO","0.0798",NA,NA,"2","Infiltrating Lobular Carcinoma","Yes","Equivocal","2010","Post (prior bilateral ovariectomy OR >12 mo since LMP with no prior hysterectomy)","Bone","48.32","LIVING","Left","Positive","10-19%","WHITE","Primary","Female","WITH TUMOR","-0.5366","-1.0874","2.0167","1.7459","-1.0919","0.2976","0.6149","-1.3837","0.1618","-0.6376","-0.1869" "TCGA-5T-A9QA-01","TCGA-5T-A9QA","01","C","2","52","MX","NX","Stage IIA","T2","Breast Invasive Ductal Carcinoma","-19031","31",NA,"0","12","9.95","DiseaseFree","Positive","70-79%","NOT HISPANIC OR LATINO","0.4133","Negative","10-19%","2","Other, specify","No","Equivocal","2013",NA,NA,"9.95","LIVING","Left","Negative",NA,"BLACK OR AFRICAN AMERICAN","Primary","Female",NA,"0.2962","-0.6337","-0.8979","1.2512","-1.5282","-0.9661","-0.8398","-0.6190","-0.9522","-1.7920","-1.0734" "TCGA-A1-A0SB-01","TCGA-A1-A0SB","01","C","1","70","M0","N0","Stage I","T1c","Adenoid Cystic Breast Cancer","-25833","764",NA,"0","259","8.51","DiseaseFree","Positive","70-79%","NOT HISPANIC 
OR LATINO","8e-04",NA,NA,NA,"Other, specify","No","Negative","2008","Post (prior bilateral ovariectomy OR >12 mo since LMP with no prior hysterectomy)",NA,"8.51","LIVING","Left","Negative",NA,"WHITE","Primary","Female","TUMOR FREE","0.2204","2.6328","-1.1748","-0.9635","0.8227","-0.1755","-0.8893","-0.0589","0.6404","2.4596","2.2202" "TCGA-A1-A0SD-01","TCGA-A1-A0SD","01","C","2","59","M0","N0","Stage IIA","T2","Breast Invasive Ductal Carcinoma","-21793","1697",NA,"0","437","14.36","DiseaseFree","Positive","90-99%","NOT HISPANIC OR LATINO","0.2474",NA,NA,NA,"Infiltrating Ductal Carcinoma","No","Negative","2005",NA,NA,"14.36","LIVING","Left","Positive","90-99%","WHITE","Primary","Female",NA,"0.3694","1.2054","-0.2017","-0.7820","-0.2027","0.7166","-0.5284","0.8973","0.6958","-0.9287","-0.4558" "TCGA-A1-A0SE-01","TCGA-A1-A0SE","01","C","1","56","M0","N0 (i-)","Stage I","T1c","Breast Mixed Ductal and Lobular Carcinoma","-20717","1672",NA,"0","1321","43.4","DiseaseFree","Positive","80-89%","NOT HISPANIC OR LATINO","0.2134","Negative",NA,"1","Mixed Histology (please specify)","No","Negative","2005","Pre (<6 months="" since="" lmp="" and="" no="" prior="" bilateral="" ovariectomy="" and="" not="" on="" estrogen="" replacement)",na,"43.4","living","left="" upper="" outer="" quadrant","positive","90-99%","white","primary","female","tumor="" free","1.8649","0.1701","-0.6978","0.9491","1.2939","-0.7610","-0.6581","0.6584","-0.2906","1.5437","-0.2752"="" "tcga-a1-a0sf-01","tcga-a1-a0sf","01","c","2","54","m0","n0","stage="" iia","t2","breast="" invasive="" ductal="" carcinoma","-19731","1648",na,"0","1463","48.06","diseasefree","positive","90-99%","not="" hispanic="" or="" latino","0.2015",na,na,na,"infiltrating="" ductal="" carcinoma","no","negative","2006","pre="">6><6 months="" since="" lmp="" and="" no="" prior="" bilateral="" ovariectomy="" and="" not="" on="" estrogen="" replacement)",na,"48.06","living","left","positive","90-99%","white","primary","female","tumor="" 
free","0.2632","-0.6611","-0.3641","-0.2014","-0.6814","-0.5363","-0.3075","-0.9419","-0.6697","-0.3586","-0.4173"="" "tcga-a1-a0sg-01","tcga-a1-a0sg","01","c","2","61","m0","n1a","stage="" iib","t2","invasive="" breast="" carcinoma","-22380","1581",na,"0","434","14.26","diseasefree","positive","90-99%","not="" hispanic="" or="" latino","0.0969",na,na,na,"other,="" specify","no","negative","2006","post="" (prior="" bilateral="" ovariectomy="" or="">12 mo since LMP with no prior hysterectomy)",NA,"14.26","LIVING","Right","Positive","90-99%","WHITE","Primary","Female",NA,"-0.1346","-0.6149","-0.5361","1.0430","0.5070","0.0674","-0.3274","-0.0372","-0.4914","-0.2831","-0.8145" "TCGA-A1-A0SH-01","TCGA-A1-A0SH","01","C","2","39","M0","N0 (i-)","Stage IIA","T2","Breast Invasive Ductal Carcinoma","-14595","1305",NA,"0","1437","47.21","DiseaseFree","Negative",NA,"NOT HISPANIC OR LATINO","0.1789","Negative",NA,"2","Infiltrating Ductal Carcinoma","No","Equivocal","2006","Pre (<6 months="" since="" lmp="" and="" no="" prior="" bilateral="" ovariectomy="" and="" not="" on="" estrogen="" replacement)",na,"47.21","living","left="" upper="" inner="" quadrant","positive","90-99%","white","primary","female","tumor="" free","0.0217","0.2793","-0.4688","-0.0671","2.1251","-0.8735","-0.1567","1.3566","0.6122","0.4163","0.0555"="" "tcga-a1-a0si-01","tcga-a1-a0si","01","c","2","52","m0","n1a","stage="" iib","t2","breast="" invasive="" ductal="" carcinoma","-19250","1267",na,"0","635","20.86","diseasefree","positive","50-59%","not="" hispanic="" or="" latino","0.3313",na,na,na,"infiltrating="" ductal="" carcinoma","no","negative","2007",na,na,"20.86","living","right","positive","10-19%","white","primary","female","tumor="" free","-0.1307","-0.2494","-0.3230","-0.4739","-0.4776","-0.3999","0.0934","0.0707","-0.5371","0.1495","-0.5414"="" "tcga-a1-a0sj-01","tcga-a1-a0sj","01","c","3","39","m0","n1a","stage="" iiia","t3","breast="" invasive="" ductal="" 
carcinoma","-14383","1399",na,"0","416","13.67","diseasefree","positive","70-79%","not="" hispanic="" or="" latino","0.46","negative",na,"2","infiltrating="" ductal="" carcinoma","no","equivocal","2006",na,na,"13.67","living","left","positive","10-19%","black="" or="" african="" american","primary","female","tumor="" free","0.7440","-0.0381","-0.5955","-1.0935","0.5130","0.3843","-0.6697","1.0343","0.4641","-0.2471","0.3303"="" "tcga-a1-a0sk-01","tcga-a1-a0sk","01","c","2","54","m0","n0="" (i-)","stage="" iia","t2","invasive="" breast="" carcinoma","-20048","1164","967","0",na,na,na,"negative",na,"not="" hispanic="" or="" latino","0.4163",na,na,"0","other,="" specify","no","negative","2007","indeterminate="" (neither="" pre="" or="" postmenopausal)",na,"31.77","deceased","right="" upper="" outer="" quadrant","negative",na,"asian","primary","female",na,"-0.8968","7.1119","-1.2353","-1.9544","-1.9080","1.0988","-0.8968","0.4063","7.9363","0.7024","9.9796"="" "tcga-a1-a0sm-01","tcga-a1-a0sm","01","c","2","77","m0","n0="" (i-)","stage="" iia","t2","breast="" invasive="" ductal="" carcinoma","-28198","1115",na,"0","242","7.95","diseasefree","positive","20-29%","not="" hispanic="" or="" latino","0.1807","positive",na,"3","infiltrating="" ductal="" carcinoma","no","positive","2007",na,na,"7.95","living","left","negative",na,"white","primary","male","tumor="" free","-0.0499","1.4591","-0.2197","-0.3564","1.4313","0.0360","-0.3689","0.7793","-0.7286","-1.1166","-0.4732"="" "tcga-a1-a0sn-01","tcga-a1-a0sn","01","c","2","50","mx","n1","stage="" iia","t1c","breast="" invasive="" ductal="" carcinoma","-18401","1091",na,"0","1196","39.29","diseasefree","positive","90-99%","not="" hispanic="" or="" latino","0.5518",na,na,na,"infiltrating="" ductal="" carcinoma","no","positive","2007","post="" (prior="" bilateral="" ovariectomy="" or="">12 mo since LMP with no prior hysterectomy)",NA,"39.29","LIVING","Left","Positive","60-69%","WHITE","Primary","Female","TUMOR 
FREE","-0.2034","-0.7616","-0.2425","-1.0514","0.8993","-0.3103","0.3351","-0.4011","-0.7831","-0.6539","-0.3999"6>6>10%","black>10%","not>
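The excerpt shows three features of the file format that matter for step 3: numbers are quoted, missing values appear as a bare NA, and some text fields such as "Other, specify" contain a comma inside the quotes. The minimal sketch below suggests that base R's read.csv() handles all three with its defaults. The three rows and five columns are copied from the excerpt as a hypothetical subset, not the real file.

```r
# Hypothetical subset of the file: three rows and five columns copied from the excerpt
csv_lines <- c(
  '"Sample.ID","Diagnosis.Age","Fraction.Genome.Altered","Neoplasm.Histologic.Type.Name","Menopause.Status"',
  '"TCGA-3C-AALK-01","52","0.0764","Infiltrating Ductal Carcinoma",NA',
  '"TCGA-4H-AAAK-01","50","0.2364","Infiltrating Lobular Carcinoma","Post (prior bilateral ovariectomy OR >12 mo since LMP with no prior hysterectomy)"',
  '"TCGA-A1-A0SB-01","70","8e-04","Other, specify","Post (prior bilateral ovariectomy OR >12 mo since LMP with no prior hysterectomy)"'
)

# read.csv defaults: header = TRUE, sep = ",", quote = "\"", na.strings = "NA".
# Quoted numbers are still converted to numeric columns, and the comma inside
# "Other, specify" does not split the field because the field is quoted.
sample_df <- read.csv(text = csv_lines, stringsAsFactors = FALSE)

str(sample_df)                    # column types after conversion
print(colSums(is.na(sample_df)))  # missing values per column
```

For the real file, step 3 reduces to brca <- read.csv("BRCAMergedWAVE.csv"); running str(brca) before plotting is a quick way to confirm that Diagnosis.Age and the gene columns were read as numbers.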