follow every steps from the instruction, using python
Microsoft Word - homework4.docx Data Focused Python Homework 4 Due: 11:59 PM Sunday, November 27, 2022 This is an individual assignment. Problem 1 a. Read the file 'expenses.txt' into a DataFrame named expenses – the data is separated by a colon ( : ). The first line contains column headers. Display expenses Read the file 'people.txt’, also colon-separated with column headers in the first line, into DataFrame people. Display people. Read the file ‘departments.txt’ into DataFrame departments – same formatting. Display departments. b. Clean people by removing any row with null data – display the rows removed (hint: use a variant of isna() ). Then remove anyone whose department is not in departments (recall the isin( ) method from Lab 7) – again display the rows removed. Display the cleaned version of people. c. Clean expenses by removing any row with null data – display rows removed. Then remove rows whose id is not in people – display rows removed. Then remove rows with a malformed date field (only check for the correct number of digits, 8 – use a lambda and apply() ) – display rows removed. Display the cleaned version of expenses. d. Merge expenses and people on the ‘ID’ column using an inner join – name the result alldata. Sort alldata on ‘ID’. Create sums by grouping the rows (use groupby( ) ) on ‘ID’ and then use agg(‘sum’) to get the total of each person’s expenses. Display sums. Problem 2 Use the expenses and people DataFrames from Problem 1 to display the following graphs. Make sure you label the graphs and axes correctly. Use the pandas functions groupby( ) with the column you need, together with either count( ) or sum( ) – as in, groupby().sum(). Also, use the groupby parameter as_index=False to store the grouping column as an actual column instead of as the index. The following graph shows the idea, although the values may not be correct (it's sample data, and I was lazy about labels): a. Bar chart containing the departments and the number of people employed in each. b. Bar chart containing the id's and the total expense amount for each id. c. Bar chart containing the expense categories and the total expense amount for each category. d. Pie chart containing the same information as #c. Read the file “countryData.csv” into a DataFrame named countryData; this is a version of the GDP – Olympic medals data from Lab 7. Use countryData to create regression plots for the following using Seaborn’s regplot. Compute the correlation coefficient and use it as an annotation on the graph. e. Population against GDP f. GDP against Weighted g. Formula against Total h. Formula against Weighted JTW Problem 3 Write a script by copying the relevant parts of the Lab 8 code and modify it to do these things: a. Create a list with these topic strings: Python; Data Science; Data Analysis; Machine Learning; and Deep Learning. Use these topics, one at a time, to query the Google Books API. For each returned JSON string: b. Convert the JSON string to a dict using loads( ) (as in the lab), then use this to convert it to a DataFrame: pd.io.json.json_normalize ( thedict['items'] ) c. Extract just the 'volumeInfo.title' and 'volumeInfo.authors' columns. d. Relabel those two columns as 'Title' and 'Authors'. After creating the five DataFrame objects, use concat( ) to create one table called bigTable (use ignore_index=True). The function takes a list of the DataFrames to concatenate (i.e., in [ ]'s). e. Display bigTable. f. Re-display bigTable in the following way, using regular Python to display data extracted from bigTable. Create the table headers (left justified), then use a for loop over bigTable.index, which will count on the index number starting at 0. Display at most 25 characters of the title (just use [:25], even if the title has fewer characters and only the first author. It should look something like this (your data may vary): ID:Amount:Category:Date:Description 5:5.25:supply:20220222:box of staples 7:79.81:meal:20220222:lunch with ABC Corp. clients Al, Bob, and Cy 4:43.00:travel:20220222:cab back to office 4:383.75:travel:20220223:flight to Boston, to visit ABC Corp. 22:55.00:travel:20220223:cab to ABC Corp. in Cambridge, MA 17:23.25:meal:20220223:dinner at Logan Airport 5:318.47:supply:20220224:paper, toner, pens, paperclips, tape 22:142.12:meal:20220226:host dinner with ABC clients, Al, Bob, Cy, Dave, Ellie 20:20.20:: 20:20.20::: 49:303.94:util:20220227:Peoples Gas 49:121.07:util:20220227:Verizon Wireless 8:7.59:supply:20220227:Python book (used) 8:79.99:supply:20220227:spare 20" monitor 13:49.86:supply:20220228:Stoch Cal for Finance II 7:6.53:meal:20220302:Dunkin Donuts, drive to Big Inc. near DC 7:127.23:meal:20220302:dinner, Tavern64 22:33.07:meal:20220303:dinner, Uncle Julio's 7:86.00:travel:20220304:mileage, drive to/from Big Inc., Reston, VA 7::travel:20220304:mileage, drive to/from Big Inc., Reston, VA 50:22.00:travel:20220304:tolls 7:378.81:travel:20220304:Hyatt Hotel, Reston VA, for Big Inc. meeting 8:1247.49:supply:20220306:Dell 7000 laptop/workstation 40:6.99:supply:20220306:HDMI cable 49:212.06:util:20220308:Duquesne Light 8:23.86:supply:20220309:Practical Guide to Quant Finance Interviews 5:195.89:supply:20220309:black toner, HP 304A, 2-pack 5:195.89:supply:20220309: 22:86.00:travel:20220317:mileage, drive to/from Big Inc., Reston, VA 18:32.27:meal:20220317:lunch at Clyde's with Fred and Gina, Big Inc. 7:22.00:travel:20220317:tolls 5:119.56:util:20220319:Verizon Wireless 5:284.23:util:20220323:Peoples Gas 5:8.98:supply:20220325:Flair pens 5:8.98:supply:202325:Flair pens 5:22.95:supply:20220412:Bic pens 4:149.95:travel:20220415:Car rental 28:2245.25:supply:2022512:party 28::supply:2022512:party 18:77.75:meal:20220416:Lunch with investors 7:950.15:travel:20220418:flight to Chicago 5:22.95:supply:20220412:Bic pens 4:149.95:travel:20220415:Car rental 18:77.75:meal:20220416:Lunch with investors 7:950.15:travel:20220418:flight to Chicago 7:950.15:travel:220418:flight to Chicago 5:22.95:supply:20220412:Bic pens 4:5.19:meal:20190519:McDonalds 4:149.95:travel:20220415:Car rental 18:77.75:meal:20220416:Lunch with investors 18:77.75::20220416:Lunch with investors 7:950.15:travel:20220418:flight to Chicago Department:Location Sales:New York Office:New York Research:San Francisco Manufacturing:Pittsburgh Marketing:Chicago