How do you solve these questions in the file using MATLAB?
ENG3104 Engineering Simulations and Computations
Semester 2, 2018
Assessment: Assignment 2
Due: 8 October 2018 (deadline is two weeks after date in course spec)
Marks: 400
Value: 40%

1 (worth 100 marks)

Introduction

To do something useful with big data, models are devised from the large numbers of observations in order to predict what will occur for some other observation(s). A simple linear model¹ is of the form:

$$y_i = \sum_{j=1}^{N} x_{ij} a_j \qquad (1)$$

where $y_i$ is the dependent variable, $i$ is the observation number (there are a total of $M$ observations), $x_{ij}$ is the set of independent variables, $N$ is the number of independent variables (for big data, $M \gg N$), and $a_j$ are the set of model coefficients. Equation (1) lends itself to a matrix formulation:

$$Y = XA \qquad (2)$$

The model coefficients $a_j$ are determined by measuring $y_i$ and $x_{ij}$. One of the dangers of developing such a model is “over-fitting” the data. This is where $a_j$ are tuned for the $M$ observations so that $a_j$ is an excellent model for $y_i$, $i \le M$, but is a poor model for $y_i$, $i > M$.
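As a rough illustration of Eq. (2), the coefficients can be obtained in MATLAB with the backslash operator, which returns the exact solution for a square, invertible X and a least-squares solution when there are more observations than unknowns. This is only a sketch: the matrix X, the vector A_true and the resulting Y below are made-up values chosen for illustration, not data from the assignment.

```matlab
% Sketch: solve Y = X*A (Eq. 2) for the coefficient vector A.
% X, A_true and Y are made-up illustrative values, not assignment data.
X = [1 2; 3 4; 5 6; 7 8];    % M = 4 observations, N = 2 independent variables
A_true = [0.5; -1.5];        % coefficients used here to generate Y
Y = X * A_true;              % dependent variable built from Eq. (2)

A = X \ Y;                   % least-squares solution of X*A = Y

% Verify by substituting the result back into Eq. (2)
residual = norm(X*A - Y);
fprintf('a1 = %.4f, a2 = %.4f, residual = %.2e\n', A(1), A(2), residual);
```

Substituting the computed coefficients back into Eq. (2) and checking that the residual is close to zero is one simple way to "verify the results" for a small hand-checkable case.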
Good practice is therefore to split the $M$ observations into a “training dataset” (with $M_1$ observations, $M_1 \ge N$ and typically $M_1 \gg N$) and a “test dataset” (with $M_2$ observations, $M_1 + M_2 = M$, and typically $M_2 < M_1$). The values of $a_j$ are determined from Eq. (1) using the training dataset (with $M_1$ observations). The values can then be validated using the test dataset by calculating $y_i$ using Eq. (1) and calculating the error from the actual values $\hat{y}_i$.

In this question, you are going to apply this methodology to determine if it is possible to estimate the mean pressure for the year based on temperature readings from each month. The ideal gas law is:

$$p = \rho R T \qquad (3)$$

where $p$ is the pressure, $\rho$ is the density, $R$ the ideal gas constant and $T$ the temperature. Your computational task is to use Eq. (1)

$$\bar{p}_i = \sum_{j=1}^{12} T_{ij} a_j \qquad (4)$$

in the form of Eq. (2)

$$P = TA \qquad (5)$$

for the 9:00 readings. Here $\bar{p}_i$ is the average pressure across all months for day $i$, $T_{ij}$ is the temperature on day $i$ and month $j$ and $a_j$ is the average coefficient for month $j$. You will use the entire 12 months’ worth of data ($N = 12$) to calculate the average pressure for each calendar date ($M = 28$ since there are only 28 days in February), i.e. $\bar{p}_1$ is the average pressure calculated using the 1st of July, 1st of August, 1st of September, etc.

¹ Examples of non-linear models are:
1. having $x_{ij}$ raised to some power other than 1
2. having $x_{ij} x_{i(j-k)}$ where $k$ is some integer
3. having $x_{ij}$ inside some function, e.g. $\ln x_{ij}$, $\sin x_{ij}$

For your assignment, the following value is to be used: $M_2 = 2 + \frac{2.9244}{2}$, where $M_2$ is to be rounded to the nearest integer. Because $M > N$ (we don’t have $M \gg N$), we will work with $M_2 \le N$, which is not ideal, but is pragmatic, since it guarantees that $M_1 > N$ to produce statistically-good estimates of $a_j$.

Requirements

For this assessment item, you must perform hand calculations using Eq. (5):
1. Calculate $a_1$ using only the 1st of July (i.e. $M = 1$, $N = 1$).
2. Calculate $a_1$ and $a_2$ using only the 1st and 2nd of both July and August (i.e. $M = 2$, $N = 2$).

You must also produce MATLAB code which uses Eq. (5):
3. Repeats Requirements 1 and 2. Reports and verifies the results.
4. *Successfully loads all the relevant data.
5. *Repeats Requirement 2 using the loaded data. Reports and verifies the results.
6. **Reports the value of $M_2$ before it is rounded, to confirm the values of $M_1$ and $M_2$ you are to use. Calculates all the $a_j$ using the training dataset of $M_1$ values and reports $a_j$.
7. **Uses the test dataset of $M_2$ values to assess the quality of the modelled values of $\bar{p}_j$.
8. ***The accuracy of the results is limited because the variability in the temperature and pressure data is in the 3rd or 4th significant figure, and also because we do not have big data. To remedy the problem of significant figures, the data should be normalised. The first normalisation technique to use in this circumstance is to “centre” the data in the matrix $T$ (subtract a constant value, sometimes the mean, from all the data), which will make the variability in the 1st or 2nd significant figure.
Use 15 °C to centre the temperature data, produce new $a_j$ from your training dataset and test the coefficients. See if you achieve some further numerical improvement in this case by “scaling” the data in $T$ (non-dimensionalising, normally by dividing by the standard deviation) so that all the quantities are of the same order of magnitude².
9. Discusses the results.
10. Has appropriate comments throughout.

The projected difficulty of a Requirement is indicated by the number of * at the start. All students are expected to be able to complete Requirements which do not have an *.

² Scaling the centred data in this case may not do much, since the quantities are already of a similar order of magnitude. If you had different types of variables in $T$ with some much bigger than others (e.g. temperature, pressure, the size of grains of sand), then scaling would vastly improve the outcome.

Assessment Criteria

Your code will be assessed using the following scheme. Note that you are marked based on how well you perform for each category, so the correct answer determined in a basic way will receive half marks and the correct answer determined using an excellent method/code will receive full marks.

Quality of hand calculations          20 marks
Quality of Requirement 3              20 marks
Quality of Requirement 5              10 marks
Quality of Requirement 6              15 marks
Quality of Requirement 7              10 marks
Quality of normalisation(s)           10 marks
Quality of discussion(s)               5 marks
Quality of header(s) and comments      5 marks
Quality of code                        5 marks

2 (worth 100 marks)

Introduction

When data is being measured, it is common for there to be data missing, which could be due to a fault in the measuring equipment, or the variable being unmeasurable at that moment.
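In MATLAB, a missing reading of this kind is commonly stored as NaN, and the built-in function isfinite can be used to keep only the records where every needed value is present. The following is a minimal sketch under that assumption; the vectors T3 and Tmax hold made-up sample values (hypothetical, not the Dalby data), and the names valid, T3v and Tmaxv are introduced here purely for illustration.

```matlab
% Made-up sample readings; NaN marks a missing value (hypothetical data).
T3   = [28.1 30.4 NaN  27.5 31.0];   % temperature at 3:00 pm, deg C
Tmax = [29.0 31.2 30.1 NaN  32.3];   % daily maximum temperature, deg C

% Logical mask: true only on days where both readings were recorded
valid = isfinite(T3) & isfinite(Tmax);

% Keep only the complete pairs for any later curve-fitting
T3v   = T3(valid);
Tmaxv = Tmax(valid);

fprintf('%d of %d days have both readings\n', nnz(valid), numel(valid));
```

Filtering both vectors with the same mask keeps the pairs aligned, so each remaining value of one variable still corresponds to the same day in the other.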
In the weather data for Dalby, the maximum temperature was not recorded on 29th October 2017, presumably because not all the temperature readings were recorded for that day, so therefore it is impossible to know whether the highest recorded temperature was actually the maximum. Leaving unknown/unreliable readings blank is the best option, since inserting a value (such as zero) could be a valid value, and therefore pollutes the data (this is why my preferred option is to fill an empty slot in an array with NaN, since it is unlikely to have occurred from a calculation). If you need to be able to use a value where there is one missing, then you need to use some method of including an intelligent guess. In this question, you will use a global curve-fit to provide the guess.

All of you will use $T_3$ (the temperature measured at 3:00 pm) as the independent variable to model the maximum daily temperature, $T_{\max}$. You will also compare the outcome for this modelling to using another variable, $V$, as the independent variable.

For your assignment, the following value is to be used: $Q_2 = 1.7949$. The independent variable (besides $T_3$) you are to use is based on your value of $Q_2$:

$$V \equiv \begin{cases} T_{\min}, & Q_2 \le 5 \\ T_9, & Q_2 > 5 \end{cases}$$

where $T$ is temperature and the subscript refers to either the daily minimum or the particular time of day. Your task is to estimate the value of $T_{\max}$ on 29th October 2017 using both $T_3$ and $V$ as the independent variable.

Requirements

For this assessment item, you must perform hand calculations using $T_{\max}$ and $T_3$:
1. Take the values from 28th and 30th October 2017 and estimate the coefficients of the three standard curve-fitting functions. These data points will provide a qualitative representation of the overall trend.

You must also produce MATLAB code which:
2. Repeats Requirement 1 and verifies the results.
3. *Performs curve-fits of all the data for $T_{\max}$ and $T_3$.
Use the MATLAB function isfinite to filter the dataset so that only those dates with recordings of both $T_{\max}$ and $T_3$ are included.
4. Validates the three standard curve-fitting functions obtained in Requirement 3 by comparing with the parameters obtained in Requirement 1. Given the limited data used in Requirement 1 and the overall scatter in data, don’t expect the values to be very close.
5. Determines which curve-fit is the best.
6. Demonstrates that the chosen curve-fit is the best both graphically and numerically, showing both the data and the relevant curve-fit.
7. Displays a message in the Command Window stating which type of curve-fit was chosen, stating the parameters of the curve-fit and the result of the numerical test of the curve-fit.
8. Plots the best curve-fit along with the data in a separate figure with normal-scale axes.
9. Uses the best curve-fit to estimate $T_{\max}$ for 29th October 2017.
10. *Reports the value of $Q_2$, leading to the selection of $V$. Repeats Requirements 3, 8 and 9 using only a linear curve-fit with $V$ the independent variable instead of $T_3$. Plots the curve-fit along with the data. Compares and discusses the two estimates for $T_{\max}$.
11. Has appropriate comments throughout.

The projected difficulty of a Requirement is indicated by the number of * at the start. All students are expected to be able to complete Requirements which do not have an *.

Assessment Criteria

Your code will be assessed using the following scheme. Note that you are marked based on how well you perform for each category, so the correct answer determined in a basic way will receive half marks and the correct answer determined using an excellent method/code will receive full marks.

Quality of hand calculations                         20 marks
Quality of determination of appropriate curve-fit    30 marks
Quality of verifications/validations                 10 marks
Quality of reporting of curve-fit                     5 marks
Quality of plots (e.g. axis labels, titles)           5 marks
Quality of Requirement 10                            10 marks
Quality of 29th October 2017 estimations             10 marks
Quality of header(s) and comments                     5 marks
Quality of code                                       5 marks

3 (worth 100 marks)

Introduction

This question provides an alternative methodology to