In this final project, we will explore a dataset provided by the New York City Department of Education. Of particular interest is whether characteristics of NYC middle schools predict admission to one of 8 highly selective public high schools in New York (Stuyvesant, Bronx High School of Science, etc.), from now on called HSPHS. Admission to these schools is contingent on applying AND scoring sufficiently high on the Specialized High Schools Admissions Test (SHSAT), an independently produced and anonymously graded standardized test.

The Dataset: The dataset (‘middleSchoolData.csv’) contains data from all 594 NYC middle schools, comprising 485 public schools and 109 charter schools (in the last 109 rows), from a randomly picked year in the past 5 years. Each row of the dataset represents a particular school, so the unit of analysis is “school”. Here is what the columns represent:

A-B: NYC DOE school code and name, respectively
C: Number of applications to HSPHS originating from this school
D: Number of applicants to HSPHS accepted from this school
E: Per-student spending, in $
F: Average class size
G-K: Self-described ethnic identity of the student body
L-Q: Average rating of “school climate” factors as perceived by the students, e.g. trust
R: Percentage of students who have been evaluated as disabled
S: Percentage of students living in households below the poverty line
T: Percentage of ESL students
U: School size (number of students in the entire school)
V: Average student achievement on a state-wide standardized test
W-X: Proportion of students exceeding state-wide expectations in reading and math

This dataset is comprehensive, but some data is missing. If data in a cell is missing, you have to handle it (clean or impute) in order to do the analyses. Sometimes data is missing systematically. For instance, data for columns E and F is missing for all charter schools.

Format: The project consists of your answers to 10 questions.
Each answer should ideally include a paragraph of text (describing what you did and what you found), a figure that illustrates the findings and some numbers (e.g. test statistics or p-values). Please save it as a Word, PDF or Pages document. This document should be 4-6 pages long (font size and margins are up to you); about half a page per question is reasonable. In addition, open your document with a brief statement of how you handled dimension reduction, data cleaning and data transformation, as this will apply to all answers. Make sure to include your name.

Academic integrity: You are expected to do this project by yourself, individually, so that we are able to determine a grade for you. There are enough degrees of freedom (e.g. how to clean the data, what variables to compare, aesthetic choices in the figures, etc.) that no two reports will be alike. We’ll be on the lookout for suspicious similarities, so please refrain from collaborating.
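One way to handle the systematic gap in columns E and F is to treat public and charter schools separately: impute only the sporadic gaps among public schools and leave the charter values for E and F missing, so that charter schools are simply excluded from analyses involving those two columns. The sketch below uses plain Python with no third-party libraries. It assumes each row has already been loaded as a dict with None for empty cells; the keys per_pupil_spending and avg_class_size are hypothetical stand-ins for the real CSV headers, and median imputation is just one defensible choice, not a prescribed method.

```python
import statistics

# Hypothetical keys for columns E and F; the real CSV headers may differ.
SPENDING = "per_pupil_spending"   # column E
CLASS_SIZE = "avg_class_size"     # column F


def split_and_impute(rows, n_charter=109, cols=(SPENDING, CLASS_SIZE)):
    """Split schools into public and charter rows, then median-impute
    sporadic gaps in `cols` among public schools only.

    Assumes rows are dicts with None for missing cells and that the last
    `n_charter` rows are charter schools. Charter values for E and F are
    left as None: they are missing systematically, so they are excluded
    from E/F analyses rather than imputed.
    """
    if not 0 < n_charter < len(rows):
        raise ValueError("n_charter must be between 1 and len(rows) - 1")
    # Copy each row so the caller's data is not mutated.
    public = [dict(r) for r in rows[:-n_charter]]
    charter = [dict(r) for r in rows[-n_charter:]]
    for col in cols:
        observed = [r[col] for r in public if r[col] is not None]
        fill = statistics.median(observed)  # less sensitive to outliers than the mean
        for r in public:
            if r[col] is None:
                r[col] = fill
    return public, charter
```

Splitting before imputing keeps the 109 charter rows from being filled with values borrowed from public schools, which would otherwise blur exactly the public/charter contrast that some questions may ask about.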
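For the dimension-reduction statement, one common option is to collapse the six correlated school-climate ratings (columns L-Q) into a single score with principal component analysis (PCA). The following is a minimal, dependency-free sketch that standardizes the columns and finds the first principal component by power iteration on the correlation matrix; in practice a library such as NumPy or scikit-learn would normally be used instead. It assumes complete, numeric rows (e.g. after cleaning), and the function names are illustrative, not part of any library.

```python
import math


def zscore_columns(X):
    """Standardize each column of X (a list of equal-length rows) to mean 0, SD 1."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) for j in range(p)]
    if any(sd == 0 for sd in sds):
        raise ValueError("a column has zero variance and cannot be standardized")
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def first_principal_component(X, iters=500):
    """Return (loadings, explained_variance_ratio, scores) for the first
    principal component of the standardized columns of X, via power iteration."""
    Z = zscore_columns(X)
    n, p = len(Z), len(Z[0])
    # Correlation matrix: covariance of the z-scored columns.
    C = [[sum(Z[i][a] * Z[i][b] for i in range(n)) / n for b in range(p)]
         for a in range(p)]
    v = [1.0] * p  # start vector; assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:  # eigenvector sign is arbitrary; make loadings mostly positive
        v = [-x for x in v]
    # The Rayleigh quotient gives the top eigenvalue; the trace of C equals p.
    eigval = sum(v[a] * sum(C[a][b] * v[b] for b in range(p)) for a in range(p))
    scores = [sum(z[j] * v[j] for j in range(p)) for z in Z]
    return v, eigval / p, scores
```

If the explained-variance ratio turns out to be high, the per-school scores could stand in for all six climate columns in later questions. The 500-iteration default is an arbitrary choice; power iteration converges quickly when the leading eigenvalue is well separated from the next one.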
May 22, 2022