First Draft: The Impact of Family Factors on Students' Academic Performance

Introduction

Conventionally, high academic performance is linked to the quality of a student's academic progression, in the sense that a student's standing in real-life situations is judged by the quality of the knowledge he or she has acquired. However, the attributes of home life are also believed to have an impact on a student's academic performance. For example, the parents' behavior can significantly influence a learner's performance: a parent who abuses alcohol may harm a child's academic performance. The socioeconomic status of the family may also affect the student's learning process.

Research Questions/Problem

The research described in this report addresses an important issue that affects our society. The research problem and question is: how does family life affect academic performance? This question matters because the academic performance of learners is a key concern for every stakeholder, not only the government but also the learners' families. The family is the root of every person, and a person spends most of his or her academic life with his or her family. This problem statement was chosen because a suitable data source was available for hypothesis testing.

Background and Literature Review

Family-based factors have been linked to low academic performance, particularly in national exams. Some of these factors stem from hard economic times, which prevent many families from meeting their responsibilities, such as providing the basic needs of a healthy and literate family. First, the size of the family in which a child is raised can affect his or her academic performance, because a large family may be unable to provide quality medical care and adequate food; in addition, the parents may not be able to give the child the attention required, causing him or her to lose focus on academics.
Other contributions families make to children's academic performance include paying fees, providing security, and taking an interest in homework. Research suggests that education is key to achieving both economic and human development, and that a society needs education to thrive and grow in all aspects (Anonymous, 2017). Nationally, education contributes to development: knowledgeable personnel help the country raise productivity and reduce poverty, ignorance, disease, and hunger among its citizens. Education is also regarded as liberating society from socio-political problems.

Many studies have found that family background affects a child's academic performance, and this effect is present at almost all levels of education. A child's achievement and motivation depend on the quality of the family's interaction with the child. Some literature comparing the stresses faced by children from low and high socioeconomic status finds that children from low-socioeconomic-status families suffer more from poor academic performance (Egalite, 2016). Parents who were raised in, and remain in, low socioeconomic status may be unable to provide financial resources for their children and may have little time to support them with academic work such as homework and assignments.

Data Collection and Variables

As noted in the problem statement, the data for this research are relatively easy to source and collect. Data collection will involve randomly selecting a school and administering a questionnaire covering the following variables: the student's home address type (binary: rural or urban), family size, parental cohabitation status, the mother's and father's education levels, the parents' job types, family education support, and the quality of family relationships.
Lastly, data will be collected on the student's final grade (Cortez, 2014). The questionnaire will be in table format, with grades recorded in ranges.

Data Pre-processing

Our dataset required little cleaning. There were no missing values and no points that were obvious outliers or appeared to be errors from the data collection process. Summary statistics of our raw data are shown in Table 1.

Table 1

Variable                     | Min  | Mean   | Max  | Standard Deviation | n
-----------------------------|------|--------|------|--------------------|----
Urban Address (a)            |      | 77.72% |      |                    | 395
Family Size > 3 (b)          |      | 71.14% |      |                    | 395
Parents Together (c)         |      | 89.62% |      |                    | 395
Mother's Education (d)       | 0    | 2.749  | 4    | 1.095              | 395
Father's Education (d)       | 0    | 2.522  | 4    | 1.088              | 395
Family Education Support (e) |      | 61.27% |      |                    | 395
Family Relationships (f)     | 1.00 | 3.94   | 5.00 | 0.90               | 395
Final Math Score             | 0    | 10.42  | 20   | 4.581              | 395

(a) Omitted category is Rural Address: 22.28%.
(b) Omitted category is Family Size ≤ 3: 28.86%.
(c) Omitted category is Parents Apart: 10.38%.
(d) Measured on a scale of 0 to 4: 0 = no education, 1 = through 4th grade, 2 = 5th through 9th grade, 3 = 10th through 12th grade, 4 = higher education.
(e) Omitted category is No Family Education Support: 38.73%.
(f) Measured on a scale of 1 to 5: 1 = very bad, 5 = excellent.

We next created dummy variables for all of our binary inputs (Table 2). Following the precedent set by existing literature, we also re-coded the raw Final Math Score variable as ordered categorical data according to the standard Portuguese grading system. A breakdown of the system and the results of our recoding are shown in Table 3.

Table 2

Variable                 | Dummy Values
-------------------------|-----------------------------------------------
Address                  | 0 = Rural; 1 = Urban
Family Size              | 0 = Less than or equal to 3; 1 = Greater than 3
Parents' Status          | 0 = Apart; 1 = Together
Family Education Support | 0 = No; 1 = Yes

Table 3

Category     | Raw Score Range | n
-------------|-----------------|----
Fail         | 0-9             | 130
Sufficient   | 10-11           | 103
Satisfactory | 12-13           | 62
Good         | 14-15           | 60
Excellent    | 16-20           | 40

Graphing the distribution of the raw final math score revealed that it is not normally distributed.
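The dummy coding in Table 2 and the grade recoding in Table 3 can be sketched in Python as below. This is a minimal illustration, not the code used for this report: it assumes the raw values use the codes of the UCI student performance data set (address "U"/"R", famsize "GT3"/"LE3", Pstatus "T"/"A", famsup "yes"/"no", and a final score column G3 on a 0 to 20 scale), and the helper names `dummy_code` and `grade_category` are hypothetical.

```python
# Assumed raw codes (UCI student performance data set conventions);
# each binary input is mapped to the 0/1 dummy values of Table 2.
DUMMY_CODES = {
    "address": {"R": 0, "U": 1},      # 0 = Rural, 1 = Urban
    "famsize": {"LE3": 0, "GT3": 1},  # 0 = <= 3, 1 = > 3
    "Pstatus": {"A": 0, "T": 1},      # 0 = Apart, 1 = Together
    "famsup": {"no": 0, "yes": 1},    # 0 = No, 1 = Yes
}

# Inclusive raw-score bands from Table 3, listed from lowest to highest.
GRADE_BANDS = [
    (0, 9, "Fail"),
    (10, 11, "Sufficient"),
    (12, 13, "Satisfactory"),
    (14, 15, "Good"),
    (16, 20, "Excellent"),
]
GRADE_ORDER = [label for _, _, label in GRADE_BANDS]  # ordered categories


def dummy_code(record):
    """Return a copy of record with the four binary inputs mapped to 0/1."""
    coded = dict(record)
    for column, mapping in DUMMY_CODES.items():
        coded[column] = mapping[record[column]]
    return coded


def grade_category(score):
    """Map a raw 0-20 final score to its ordered Table 3 category."""
    for low, high, label in GRADE_BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"score out of range 0-20: {score}")


# Hypothetical student record, for illustration only.
student = {"address": "U", "famsize": "GT3", "Pstatus": "T", "famsup": "no", "G3": 11}
print(dummy_code(student))  # {'address': 1, 'famsize': 1, 'Pstatus': 1, 'famsup': 0, 'G3': 11}
print(grade_category(student["G3"]))  # Sufficient
```

Keeping the bands in a single ordered list means the category order (Fail through Excellent) is defined in one place, which matters when the outcome is treated as ordered rather than nominal.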
A skewed normal distribution could arguably describe the scores from about 4 to 20, but no students scored between 1 and 3, and there is a spike of students who scored zero. Because the number of students scoring zero was relatively high (38 students, about 9.6% of our sample), we could not justify treating these scores as data collection errors or outliers, so we kept them in the data set.

We next examined our input variables for strong correlations or possible collinearity. A correlation matrix revealed a correlation coefficient of 0.623 between the mother's and the father's levels of education, and a plot of the two variables showed a positive relationship. To avoid collinearity in our models, we will replace them with a single variable, parents' education, defined as the average of the mother's and father's education levels.

We also investigated the relationship between family size and the parents' cohabitation status (together or apart), since the two variables seemed potentially to carry similar information and therefore might not be independent, despite a correlation coefficient of only 0.150. A chi-squared test on the cross-tabulation of these two variables yielded a p-value of 0.005, indicating that they are not independent. Because the two variables still represent different information, we will not drop either one entirely; instead, we will fit each model using only one of them at a time.

Preliminary Data Analysis

Because of the nature of our output variable, the final math score, we could treat the data as either numerical, using the raw score, or categorical, using the ordered buckets shown in Table 3.
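Treating the raw score as numerical amounts to an ordinary least squares fit. The sketch below, in Python with NumPy, shows how such a fit and its adjusted R-squared, 1 - (1 - R^2)(n - 1)/(n - p - 1) for n observations and p inputs, can be computed. It assumes the inputs are already numeric (for example, the dummy variables and the averaged parents' education); the function name `fit_ols_adjusted_r2` and the toy data are hypothetical, and the sketch does not reproduce the significance tests or residual diagnostics discussed in this report.

```python
import numpy as np


def fit_ols_adjusted_r2(X, y):
    """Fit y = b0 + X @ b by least squares and return (coefficients, adjusted R^2).

    X is an (n, p) array of numeric inputs without an intercept column;
    y is a length-n array. Requires n > p + 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)           # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)  # penalize extra inputs
    return coef, adj_r2


# Hypothetical toy data: one input with an exact linear relationship y = 1 + 2x.
X_toy = [[0], [1], [2], [3], [4]]
y_toy = [1, 3, 5, 7, 9]
coef, adj_r2 = fit_ols_adjusted_r2(X_toy, y_toy)
print(coef, adj_r2)
```

Adjusted R-squared is the more honest summary here because plain R-squared never decreases when inputs are added, whereas the adjusted version can fall, and even go negative, when the inputs explain almost nothing.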
We found that fitting a linear model to the raw scores resulted in a very poor fit (adjusted R-squared: 0.05), with parents' education as the only significant input. We ran a few diagnostic tests to further check whether a linear regression model was suitable. The distribution of the residuals looked approximately normal apart from an apparent left tail, but a Q-Q plot confirmed that the residuals were not normally distributed. However, the Breusch-Pagan and Durbin-Watson tests did not indicate heteroskedasticity or autocorrelation, respectively.

Since previous research suggested that linear regression is not the most appropriate approach for this type of data, we next treated our data set as a classification problem, using the buckets defined in Table 3. Because one of our goals is to identify which input variables have the strongest effect on the output, we decided not to use k-nearest neighbors (KNN), since it provides no direct information about the effects of individual inputs. We next tested the assumption, made by discriminant analysis methods, that the inputs within each class follow a multivariate Gaussian distribution (logistic regression, which we also considered, does not itself require this assumption). Unfortunately, both Mardia's test and the Henze-Zirkler test rejected multivariate normality, so we cannot make this assumption.

Limitations and Future Work

The data set has a few limitations that could cause confusion when drawing conclusions. We would like to use the variable famsize to indicate whether the student is an only child; however, a single parent with two children is also a family of three. It is also unclear what the variable famsup means. We take it to mean that the student receives educational support from their family, but we cannot tell whether this support is monetary, emotional, or academic.

Since the modelling methods tested above do not seem suitable for our data, we intend to use decision trees in our future analysis.
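As a hedged illustration of the planned decision-tree approach, the sketch below shows the core step of a CART-style classification tree: choosing the single split that minimizes the weighted Gini impurity of the resulting groups. It is a pure-Python sketch, not a full tree and not any specific library's implementation; Gini impurity is one common split criterion (entropy is another), and the function names and toy data are hypothetical. A full tree would repeat the same search recursively on each side of the chosen split. Because every candidate threshold is an observed level, binary dummies and ordered inputs with few levels can be split directly.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(features, labels):
    """Return (feature name, threshold, weighted impurity) of the best single split.

    features maps each input name to a list of numeric values (0/1 dummies or
    ordered levels). Returns None if no input has at least two distinct levels.
    """
    best = None
    for name, values in features.items():
        levels = sorted(set(values))
        # Every observed level except the largest, so both sides are non-empty.
        for threshold in levels[:-1]:
            score = split_impurity(values, labels, threshold)
            if best is None or score < best[2]:
                best = (name, threshold, score)
    return best


# Hypothetical toy data (not from the study): two inputs, six students.
features = {
    "famsup": [0, 0, 1, 1, 1, 0],
    "Medu": [1, 3, 4, 2, 3, 1],
}
labels = ["Fail", "Fail", "Good", "Good", "Good", "Fail"]
print(best_split(features, labels))  # ('famsup', 0, 0.0)
```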
We expect decision trees to suit our data, which has a relatively large number of binary inputs and ordered inputs with few levels.

References

Anonymous. (2017). The family effect on academic performance in school: A case study of selected schools in Kabale District. Kabale: Atlantic International University (Education Foundations).

Cortez, P. (2014, November 27). Student performance data set. Retrieved from UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/student+performance

Egalite, A. (2016). How family background influences student achievement. Education Next, 1-8. Retrieved from https://www.educationnext.org/how-family-background-influences-student-achievement/

Mekonnen, M. A. (2017). Effects of family educational background, dwelling and parenting style on students' academic achievement: The case of secondary schools in Bahir Dar. Educational Research and Reviews, 12(18), 939-949.

Muñoz, K., Olson, W. A., Twohig, M. P., Preston, E., Blaiser, K., & White, K. R. (2015). Pediatric hearing aid use: Parent-reported challenges. Ear and Hearing, 36(2), 279-287.

Olaitan, A. W. (2017). Impact of family structure on the academic performance of secondary school students in Yewa Local Government Area of Ogun State, Nigeria. International Journal of Sociology and Anthropology.