Homework using: Files, Lists, Strings, Looping Statements, Conditional Statements, Functions, Variables, and Expressions.
Homework 7: Files
CS-UY 1114
NYU Tandon

Your submission for this assignment must be via Gradescope. Submit your solutions to all problems in a single file named YourNetID_hw7.py.

Your submitted file must start with the mandatory comment header, containing your name, the class number, the date, and the homework number.

Your submitted code may use only the material that we’ve covered so far this semester.

Download the file colleges.csv from Classes and place it in the same folder as your Python program. If your program is not able to access the file, make sure that it is in the same folder.

Problem 1

You are given a file colleges.csv, containing a data set of all US colleges and universities.

Open this file in a spreadsheet program (e.g. Microsoft Excel, LibreOffice Calc, or Apple Numbers) and take a look at its contents. Notice which data is provided for each college. Some of this data will be relevant to this assignment, and some won’t. Notice the headers in the first row. For example, the SAT_AVG column indicates the average SAT score for that college. Not all colleges release that information, so some colleges have the special value NULL instead.

Now open the same file in a text editor (e.g. IDLE, Notepad, TextEdit, or SublimeText) and notice how its data is represented. You’ll see that the first line in the file consists of a sequence of comma-separated column headers. Subsequent lines in the file contain comma-separated values. The position of an element in a row matches the corresponding header: for example, the fifth column header is CITY, so the fifth value in each row indicates the college’s city. The lines are very long and may wrap around to fit in the window, but each is still just one line!

Write a function read_file with the following header:

    def read_file(filename):
        """
        sig: str -> list(list(str))
        Loads the named file and returns all data rows in the file
        as a list of lists of strings.
        """

Your function should open the file named in its parameter, skip over the header row, parse each remaining line into a list of strings, and return a list of all such lists. The list returned by your function should not include the header row.

For example:

    >>> colleges = read_file("colleges.csv")  # read the file into a variable
    >>> somecollege = colleges[50]            # select college at row 50
    >>> print(somecollege[3])                 # print the college's name
    Snead State Community College
    >>> print(somecollege[6])                 # print the college's zip code
    35957-0734

Test your code thoroughly. Make sure that it works as shown above. Don’t forget to close the file after reading it!

Hint: open the file in read mode using open, then use a for loop to iterate through each line in the file. Then use the split method to separate each line into a list of values.

Problem 2

Write a function with the following signature:

    def find_most_exclusive_womens_college(colleges):
        """
        sig: list(list(str)) -> str
        Returns the name of the women's college with the lowest admission rate
        """

That is, your function takes the data structure produced by read_file and finds the name of the women’s college with the lowest admission rate. A college will have a 1 in the WOMENONLY column if it’s a women’s college. The admission rate for a college is given as a float in the ADM_RATE column.

For example:

    >>> colleges = read_file("colleges.csv")
    >>> print("The most selective women's college is " + find_most_exclusive_womens_college(colleges))

Hint: you’ll have to iterate through all colleges (using a for loop) and keep track of the lowest admission rate you’ve seen so far. Every time you find a lower admission rate, you’ll also need to keep track of the name of that college.
Problem 3

Write a function with the following signature:

    def average_sat_score_in_ny(colleges):
        """
        sig: list(list(str)) -> float
        Returns the value of the average SAT score of all colleges in New York
        """

That is, your function takes the data structure produced by read_file and finds the average of the average SAT scores of all colleges located in the state of New York. The STABBR field tells you the state of a college, and its SAT_AVG field tells you its average SAT score. Not all colleges release their SAT scores, so you should skip over a college if its SAT_AVG column contains NULL.

Hint: you’ll have to iterate through all colleges (using a for loop) and sum up the total of all SAT scores, then divide by the number of schools that provide an SAT score.

Problem 4

Write a function with the following signature:

    def distance(first, second):
        """
        sig: tuple(float, float), tuple(float, float) -> float
        Returns the Cartesian distance between two points, each expressed as a
        latitude/longitude pair
        """

That is, given two tuples, each containing a pair of floats representing the latitude and longitude of a point on Earth, calculate their Cartesian distance. You can use the usual formula:

    $d = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$

where $(x_1, x_2)$ is the first point and $(y_1, y_2)$ is the second.

For example, given the location of New York City and Los Angeles:

    >>> print(distance((40.730610, -73.935242), (34.052235, -118.243683)))
    44.808912467176725

Problem 5

Write a function with the following signature:

    def find_college_nearest_center_of_us(colleges):
        """
        sig: list(list(str)) -> str
        Returns the name of the college nearest the geographical center of the
        contiguous United States
        """

The geographical center of the contiguous United States is located at the latitude/longitude position (39.833333, -98.583333). The location of each college is given in the LATITUDE and LONGITUDE columns. Find which college is nearest the geographical center of the US.
Not all colleges provide their location, so if either of the position columns contains NULL, skip over that college.

Hint: use your distance function.

Hint: you’ll have to iterate through all colleges (using a for loop) and keep track of the smallest distance to the center point you’ve seen so far. Every time you find a nearer college, you’ll also need to keep track of the name of that college.

Problem 6

Tying together all the code you’ve written in this assignment, write a function main that calls the read_file function to load the file colleges.csv and then prints out the results of the other functions you’ve written. When run, the final output should look like this:

    Most exclusive women's college: [college name here]
    Average SAT of all NY schools: [number here]
    College nearest geographical center of contiguous US: [college name here]

Your main function should call read_file only once.
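The "load once, reuse everywhere" shape of main can be sketched as follows. This is only an illustration under stated assumptions: load_rows, count_rows, and average_score are hypothetical stand-ins (load_rows returns hard-coded rows instead of reading a file), and the two-column row layout is made up for the example.

```python
def load_rows():
    """Stand-in for a file-reading function: returns hard-coded rows."""
    return [
        ["Alpha College", "1200"],
        ["Beta College", "NULL"],
        ["Gamma College", "1000"],
    ]


def count_rows(rows):
    """Return how many rows were loaded."""
    return len(rows)


def average_score(rows):
    """Average the numeric scores, skipping rows whose score is NULL."""
    total = 0.0
    count = 0
    for row in rows:
        if row[1] != "NULL":
            total += float(row[1])
            count += 1
    return total / count  # assumes at least one row reports a score


def main():
    rows = load_rows()  # load the data a single time...
    # ...then pass the same list to every function that needs it.
    print("Number of rows:", count_rows(rows))
    print("Average score:", average_score(rows))


main()
```

Storing the loaded list in one variable and passing it along avoids reading the same data again for each result.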