SIT718 Real World Analytics
Assessment Task 3: Problem Solving
Using aggregation functions for data analysis
© Deakin University, FutureLearn

This document supplies detailed information on assessment tasks for this unit.

Key information
Due: Friday 3rd May 2019 11.30 pm AEST
Weighting: 30%
Reference style: Harvard

Learning Outcomes
This assessment assesses the following Unit Learning Outcomes (ULO) and related Graduate Learning Outcomes (GLO):

Unit Learning Outcome (ULO)
ULO1 – assessed through student ability to apply knowledge of multivariate functions, data transformations and data distributions to summarise data sets.
ULO2 – assessed through student ability to analyse datasets by interpreting summary statistics, model and function parameters.
ULO4 – assessed through student ability to develop software codes to solve computational problems for real world analytics.

Graduate Learning Outcome (GLO)
GLO1 – Discipline knowledge and capabilities
GLO4 – Critical thinking
GLO5 – Problem solving

Purpose
This assignment will test your knowledge and understanding of aggregation functions and their applications for data summarization and prediction. It will also test your ability in R programming, in using specific R commands as well as R packages.

Instructions
The work is individual. Solutions and answers to the assignment must be explained concisely and presented carefully. Use of books, articles and/or online resources on share price related to SIT718 Real World Analytics is allowed. Students are expected to refer to suitable literature where appropriate. The assessment consists of FOUR tasks. Students must attempt all tasks and provide an individual written report prepared in an appropriate word processor.
The detailed problem description and data set will be released to students on Friday 5th April.

Submission details
• No more than 7 A4 sides, including Figures, Tables, Appendices and References. The report should be typed. Use a minimum font size of 11pt and 2.5cm side margins. If the page limit is exceeded, only the first 7 pages will be marked.
• The assignment (a report in pdf format, software code and/or data) must be submitted via the assignment folder in the unit site (accessed via the unit Program page).
• No e-mail or hardcopy submissions are accepted.

Extension requests
Requests for extensions should be made to Unit/Campus Chairs well in advance of the assessment due date. If you wish to seek an extension for an assignment, you will need to apply by email directly to Prof. Maia Angelova ([email protected]) as soon as you become aware that you will have difficulty in meeting the scheduled deadline, and at least 3 days before the due date. When you make your request, you must include appropriate documentation (medical certificate, death notice) and a copy of your draft assignment.

Conditions under which an extension will normally be approved include:
Medical: To cover medical conditions of a serious nature, e.g. hospitalisation, serious injury or chronic illness. Note: Temporary minor ailments such as headaches, colds and minor gastric upsets are not serious medical conditions and are unlikely to be accepted. However, serious cases of these may be considered.
Compassionate: e.g. death of a close family member, significant family and relationship problems.
Hardship/Trauma: e.g. sudden loss or gain of employment, severe disruption to domestic arrangements, victim of crime.
Note: Misreading the timetable, exam anxiety or returning home will not be accepted as grounds for consideration.

Special consideration
You may be eligible for special consideration if circumstances beyond your control prevent you from undertaking or completing an assessment task at the scheduled time. See the following link for advice on the application process: http://www.deakin.edu.au/students/studying/assessment-and-results/special-consideration

Assessment feedback
Students will receive written feedback to aid reflection and analysis of problem strategies and solutions for consideration in the upcoming problem-solving task.

Referencing
You must correctly use the Harvard method in this assessment. See the Deakin referencing guide.

Academic integrity, plagiarism and collusion
Plagiarism and collusion constitute extremely serious breaches of academic integrity. They are forms of cheating, and severe penalties are associated with them, including cancellation of marks for a specific assignment, for a specific unit or even exclusion from the course.
If you are ever in doubt about how to properly use and cite a source of information, refer to the referencing site above.

Plagiarism occurs when a student passes off as the student’s own work, or copies without acknowledgement as to its authorship, the work of any other person, or resubmits their own work from a previous assessment task.

Collusion occurs when a student obtains the agreement of another person for a fraudulent purpose, with the intent of obtaining an advantage in submitting an assignment or other work.

Work submitted may be reproduced and/or communicated by the university for the purpose of assuring academic integrity of submissions: https://www.deakin.edu.au/students/studysupport/referencing/academic-integrity

Total Marks 100, Weighting 30%

Energy Prediction of Domestic Appliances

Dataset
The given dataset, "Energy19.txt", can be used to create models of energy use of appliances in an energy-efficient house. The dataset provides the energy use of appliances (denoted as Y) using 671 samples. It is a modified version of data used in the study [1]. The dataset includes five input variables, denoted as X1, X2, X3, X4, X5, and the variable of interest Y, described as follows:
X1: Temperature in kitchen area, in Celsius
X2: Humidity in kitchen area, given as a percentage
X3: Temperature outside (from weather station), in Celsius
X4: Humidity outside (from weather station), given as a percentage
X5: Visibility (from weather station), in km
Y: Energy use of appliances, in Wh

Assignment Tasks

1. Understand the data [20 marks]
(i) Download the txt file (Energy19.txt) from Future Learn and save it to your R working directory.
(ii) Assign the data to a matrix, e.g. using
    the.data <- as.matrix(read.table("Energy19.txt"))
(iii) The variable of interest is energy use of appliances (Y). To investigate Y, generate a subset of 300 data, e.g. using:
    my.data <- the.data[sample(1:671,300),c(1:6)]
Download "sit718_assessment-task_3-t1_2019-data and script.zip". It contains the data file [Energy19.txt] and the R code [AggWaFit718.R] to use with the following tasks; include these in your R working directory.
(iv) Using scatter plots and histograms, report on the general relationship between each of the variables X1, X2, X3, X4, X5 and the variable of interest Y. Include 5 scatter plots, 6 histograms, and 1 or 2 sentences for each of the variables, including the variable of interest Y.

2. Transform the data [20 marks]
(i) Choose any four from the five variables (X1, X2, ..., X5). Make appropriate transformations to the chosen four variables and the variable of interest Y so that the values can be aggregated in order to predict the variable of interest. Assign your transformed data along with your transformed variable of interest to an array (it should be 300 rows and 5 columns). Save it to a txt file titled "name-transformed.txt" using
    write.table(your.data,"name-transformed.txt")
where "name" is replaced with your name - you can use your surname or first name.
(ii) Briefly explain the transformations applied for the selected four variables and the variable of interest. (1-2 sentences each)

3. Build models and investigate the importance of each variable [40 marks]
(i) Download the AggWaFit718.R file (from Future Learn) to your working directory and load it into the R workspace using
    source("AggWaFit718.R")
(ii) Use the fitting functions to learn the parameters for
• a weighted arithmetic mean (WAM),
• weighted power means (WPM) with p = 0.5 and p = 2,
• an ordered weighted averaging function (OWA), and
• a Choquet integral.
(iii) Include two tables in your report: one with the error measures and correlation coefficients, and one summarising the weights/parameters and any other useful information learned for your data.
(iv) Compare and interpret the data in your tables. Comment on
a. how good the model is,
b. the importance of each of the variables (the four variables that you have selected),
c. any interaction between any of those variables (are they complementary or redundant?), and
d. whether better models favour higher or lower inputs.
(1-3 paragraphs for part 3(iv))

4. Use your model for prediction [20 marks]
(i) Choose your best fitting model. Using your best fitting model, predict the energy use of appliances for the following input: X1=18; X2=44; X3=4; X4=74.8; X5=31.4.
(ii) Give your result and comment on whether you think it is reasonable. (1-2 sentences)
(iii) Comment on the best conditions (in terms of your chosen four variables) under which a low energy use of appliances will occur. (1-2 sentences)

Submit to the SIT718 CloudDeakin Dropbox. Your final submission should include the following three files:
1. A report, "name-report.pdf", in pdf format (created in any word processor), covering all of the items above (where "name" is replaced with your name - you can use your surname or first name). With plots and tables it should be up to 7 pages.
2. A data file named "name-transformed.txt" (just to help us distinguish them!).
3. The R code file (that you have written to produce your results) named "name-code.R" (where "name" is replaced with your name - you can use your surname or first name).

References:
1. Luis M. Candanedo, Veronique Feldheim, Dominique Deramaix. Data driven prediction models of energy use of appliances in a low-energy house, Energy and Buildings, Volume 140, 1 April 2017, Pages 81-97, ISSN 0378-7788. http://archive.ics.uci.edu/ml/datasets/appliances+energy+prediction
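
A minimal sketch of the sampling step in Task 1(iii) and the transformation step in Task 2(i), offered as one possible approach and not as the expected solution. Because Energy19.txt is not reproduced here, the.data is filled with random stand-in numbers. The helper minmax, the selection in chosen, and the log step applied to Y are all illustrative assumptions.

```r
set.seed(718)  # arbitrary seed so the sketch is repeatable

# Stand-in for the.data: random placeholder numbers, NOT the Energy19.txt values.
# With the real file: the.data <- as.matrix(read.table("Energy19.txt"))
the.data <- cbind(X1 = runif(671, 15, 25), X2 = runif(671, 30, 60),
                  X3 = runif(671, -5, 25), X4 = runif(671, 30, 100),
                  X5 = runif(671, 1, 65),  Y  = 10 + rexp(671, rate = 1 / 90))

# Task 1(iii): random subset of 300 rows, all six columns.
my.data <- the.data[sample(1:671, 300), c(1:6)]

# Hypothetical helper: min-max scaling onto [0, 1].
minmax <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical choice of four predictors; pick your own from the plots.
chosen <- c(1, 2, 3, 4)

# Task 2(i): scale each chosen column; log Y first to reduce right skew (assumption).
your.data <- cbind(apply(my.data[, chosen], 2, minmax),
                   Y = minmax(log(my.data[, 6])))

dim(your.data)   # 300 rows, 5 columns
# write.table(your.data, "name-transformed.txt")
```

If a variable turns out to be negatively related to Y in the scatter plots, one common option is to use 1 - minmax(x) instead, so that larger transformed inputs go with larger transformed Y.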
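
As background for interpreting the fitted weights in Task 3, the sketch below writes out three of the aggregation functions directly. The names wam, wpm and owa are illustrative and are not the functions defined in the supplied script. The input row and the weights are made-up numbers. The OWA here sorts inputs in decreasing order, which is an assumption and may differ from the convention used by the script. The Choquet integral is omitted for brevity.

```r
# Weighted arithmetic mean: sum of w_i * x_i, with weights summing to 1.
wam <- function(x, w) sum(w * x)

# Weighted power mean for p != 0: (sum of w_i * x_i^p)^(1/p).
wpm <- function(x, w, p) sum(w * x^p)^(1 / p)

# Ordered weighted average: weights attach to rank positions, not to variables.
# Assumption: inputs sorted in decreasing order, so w[1] weights the largest input.
owa <- function(x, w) sum(w * sort(x, decreasing = TRUE))

x <- c(0.2, 0.5, 0.9, 0.4)   # one row of transformed inputs (made up)
w <- c(0.4, 0.3, 0.2, 0.1)   # made-up weights, non-negative, summing to 1

wam(x, w)          # 0.45
wpm(x, w, 0.5)     # below the WAM
wpm(x, w, 2)       # above the WAM
owa(x, w)          # 0.61
```

Under the decreasing-order convention assumed here, OWA weights concentrated on the first positions mean the function favours higher inputs, which is the kind of observation part 3(iv)d asks for. Likewise, a power mean with p = 2 is pulled towards larger inputs and one with p = 0.5 towards smaller inputs.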