11/29/21, 6:03 PM mids-w200-fall21-Lucas-Charles/W200_Final_Exam.ipynb at main · UC-Berkeley-I-School/mids-w200-fall21-Lucas-Charles...

The instructions are in the PDF. Please complete the assignment in Jupyter Notebook format, and make sure to complete every section, down to these last questions:

4) Findings / What would you present to your boss?




https://github.com/UC-Berkeley-I-School/mids-w200-fall21-Lucas-Charles/blob/main/submissions/W200_Final_Exam.ipynb
W200 Introduction to Data Science Programming, UC Berkeley MIDS

Instructions

The final exam is designed to evaluate your grasp of Python theory as well as Python coding.
This is an individual exam.
You have 48 hours to complete the exam, starting from the point at which you first access it.
You will be graded on the quality of your answers. Use clear, persuasive arguments based on concepts we covered in class.
Please double-click the markdown cells where it says "Your answer here" to input answers (if you need more cells, please make them markdown cells).
Use only Python standard libraries, matplotlib, seaborn, NumPy and Pandas for this exam.
Please push the exam to your github repo in the folder /SUBMISSIONS/final_exam/

YOUR NAME HERE

1: Short Answer Questions (25 pts - each question = 5 pts)

a) The following class Cart and method add_to_cart are parts of a larger program used by a mobile phone company. The method add_to_cart will work when an object of type MobileDevice or of type ServiceContract is passed to it. State whether the method add_to_cart is a demonstration of the following items (yes/no) and the reasoning (1-2 sentences):

1. Inheritance
2. Polymorphism
3. Duck typing
4. Top-down design
5. Functional programming

In [ ]:
# Method:
class Cart():
    def __init__(self):
        self.cart = []
        self.total = 0

    def add_to_cart(self, item):
        self.cart.append(item)
        self.total += item.price

a) Your answer here

1. Inheritance:
2. Polymorphism:
3. Duck typing:
4. Top-down design:
5. Functional programming:

b) Suppose you have a long list of digits (0-9) that you want to write to a file. From a storage standpoint, would it be more efficient to use ASCII or UTF-8 as an encoding? What is the most efficient way to create an even smaller file to store the information?

b) Your answer here

c) Why is it important to sanity-check your data before you begin your analysis? What could happen if you don't?

c) Your answer here

d) How do you determine which variables in your dataset you should check for issues prior to starting an analysis?

d) Your answer here

e1) Explain why the following code prints what it does.

In [ ]:
def f():
    pass
print(type(f))

e1) Your answer here

e2) Explain why the following code prints something different.

In [ ]:
def f():
    pass
print(type(f()))

e2) Your answer here

2: General Coding Questions (15 pts - each question 5 pts)

a) Using a list comprehension: Make a list of the squared numbers greater than 25 that are the square of a non-negative integer less than 10. Fill in a list comprehension below so that we get this desired output.

In [ ]:
# 2a) Your code here

b) Below is a data frame of customers that have different cooling systems. Your data science team lead wants the column cooling_system to be labeled with the integer numbers 1-4 instead of the text as shown below:

1 = Air Conditioning / AC / Air Con
2 = Heat Pump / HP
3 = Evaporative Cooler / Evap Cooler
4 = Fan

Make a new column called cooling_type that maps the text values to the new numeric values. Filter out the values that are not included in the mapping above. Print out/display this new data frame. Be sure to list any assumptions also!

In [ ]:
import pandas

# creating a data frame from scratch - list of lists
data = [
    [101, 'AC'],
    [102, 'Heat Pump'],
    [103, 'Air Con'],
    [104, 'Air Conditioning'],
    [105, 'Fan'],
    [106, 'None'],
    [107, 'Evap Cooler'],
    [108, None],
    [109, 'AC'],
    [110, 'Evaporative Cooler'],
    [111, 'geothermal'],
    [112, 1]
]

# create a data frame with column names - list of lists
col_names = ['Cust_Number', 'Cooling_System']
df = pandas.DataFrame(data, columns=col_names)
df

In [ ]:
# 2b) Your code here

c) From the dataframe below, use groupby in Pandas to show how many total delegates were obtained grouped by favorite color. Print this out.

In [ ]:
import pandas

# creating a data frame from scratch - list of lists
data = [
    ['marco', 165, 'blue', 'FL'],
    ['jeb', 0, 'red', 'FL'],
    ['chris', 0, 'white', 'NJ'],
    ['donald', 1543, 'white', 'NY'],
    ['ted', 559, 'blue', 'TX'],
    ['john', 161, 'red', 'OH']
]

# create a data frame with column names - list of lists
col_names = ['name', 'delegates', 'color', 'state']
df = pandas.DataFrame(data, columns=col_names)
df

In [ ]:
# 2c) Your code here

3: Bitcoin coding problem (20 points):

Bitcoin

Consider a record of a one-time investment in bitcoin with value of that investment tracked monthly, provided as an (ordered) tuple of dictionaries, where each dictionary comprises one key for the month and corresponding one value for the value of the investment, and the first entry (Jan 2018) is the initial investment made on 01 Jan 2018, shown in data below.

Write Python code to take such a record of any length (the below data is only a sample), and output a table/dataframe comprising a row for each month with columns for date, start balance, and return. Print out this table/dataframe.

Also, visualize the record as two vertically arranged plots.
The top plot should show a line plot of start balance vs. month.
The bottom plot should show a bar plot of return vs. month, with a black horizontal line at return=0, and bars color-coded such that positive returns are green and negative returns are red.
The two plots' horizontal axes should align.

Demonstrate that your code works by applying it to data.

Notes:
The gain for each period is the end balance minus the start balance.
The growth factor for each period is the end balance divided by the start balance.
The return for each period is the growth factor minus 1.

In [ ]:
data = ({"Jan 2018":1000},{"Feb 2018":1100},{"Mar 2018":1400},{"Apr 2018":700},{
data

In [ ]:
# 3) Your code here

4: Clinical disease data (40 pts)

Your boss comes to you Monday morning and says “I figured out our next step; we are going to pivot from an online craft store and become a data center for genetic disease information! I found ClinVar which is a repository that contains expert curated data, and it is free for the taking. This is a gold mine! Look at the file and tell me what gene and mutation combinations are classified as dangerous.”

Make sure that you only give your boss the dangerous mutations and include:

1) Gene name
2) Mutation ID number
3) Mutation Position (chromosome & position)
4) Mutation value (reference & alternate bases)
5) Clinical significance (CLNSIG)
6) Disease that is implicated

Requirements

1) The deliverables are the final result as a dataframe with a short discussion of any specifics (that is, what data you would present to your boss with the explanation of your results).
2) Limit your output to the first 100 harmful mutations and tell your boss how many total harmful mutations were found in the file.
3) Use the instructor-modified "clinvar_final.txt" at this link: https://drive.google.com/file/d/1Zps0YssoJbZHrn6iLte2RDLlgruhAX1s/view?usp=sharing This file was modified to be not exactly the same as 'standard' .vcf file to test your data parsing skills. This is a large file so do NOT upload it into your github repo!
4) Replace missing values in the dataframe with: 'Not_Given'. Print or display this (including the Not_Given count) for the column CLNSIG by using pandas value_counts() function (https://pandas.pydata.org/docs/reference/api/pandas.Series.value_counts.html).
5) State in your answer how you define harmful mutations.
6) Do your best on getting to above requirements and submit whatever you do before the deadline. If your work is incomplete be sure to describe the blockers that got in your way and how you might get past them (if given more time).
7) You can use as many code blocks as you need. Please clean-up your code and make it readable for the graders!

Hints

• We do not expect you to have any medical knowledge to solve this problem; look at the data, read the documentation provided, and write down your assumptions!
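For question 1a, it may help to see add_to_cart running against two unrelated item types. This is a minimal sketch, not the exam's expected answer: the MobileDevice and ServiceContract classes below are hypothetical stand-ins (the exam names them but does not show their definitions), and the only assumption is that each exposes a price attribute.

```python
class Cart:
    """Cart from question 1a, reproduced so the sketch is self-contained."""

    def __init__(self):
        self.cart = []
        self.total = 0

    def add_to_cart(self, item):
        self.cart.append(item)
        self.total += item.price  # relies only on the item having a .price


# Hypothetical item classes: two unrelated types that both expose `price`.
class MobileDevice:
    def __init__(self, price):
        self.price = price


class ServiceContract:
    def __init__(self, price):
        self.price = price


cart = Cart()
cart.add_to_cart(MobileDevice(699))
cart.add_to_cart(ServiceContract(40))
print(cart.total)  # 739
```

Nothing in add_to_cart checks the item's type up front, so passing an object without a price attribute would raise AttributeError at the `+=` line.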
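For question 1b, the byte counts can be checked directly. The sketch below compares the ASCII and UTF-8 encodings of a digit string, then shows one possible way to shrink the data further by packing two digits into each byte. The sample digits and the helper names pack_digits and unpack_digits are illustrative assumptions, and packing is only one of several reasonable answers (general-purpose compression is another).

```python
digits = [3, 1, 4, 1, 5, 9, 2, 6]  # illustrative sample, not exam data

# Digits fall in the ASCII range, which UTF-8 also encodes as one byte each.
text = "".join(str(d) for d in digits)
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
print(len(ascii_bytes), len(utf8_bytes))  # 8 8


def pack_digits(ds):
    """Pack two decimal digits into each byte (4 bits per digit).

    Assumes an even number of digits to keep the sketch short.
    """
    return bytes((ds[i] << 4) | ds[i + 1] for i in range(0, len(ds), 2))


def unpack_digits(packed):
    """Invert pack_digits: split each byte into its high and low 4 bits."""
    out = []
    for byte in packed:
        out.extend((byte >> 4, byte & 0x0F))
    return out


packed = pack_digits(digits)
print(len(packed))  # 4
```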
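For question 2a, one reading of the wording is sketched below. The exam's "desired output" is not shown in this copy, so the resulting list is inferred from the text (squares of the integers 0 through 9, keeping those strictly greater than 25).

```python
# Squares of the integers 0..9, keeping only those strictly greater than 25.
squares = [n ** 2 for n in range(10) if n ** 2 > 25]
print(squares)  # [36, 49, 64, 81]
```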
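For question 3, the three definitions in the notes (gain, growth factor, return) can be turned into a small calculation. The sketch below uses only the standard library and only the four sample values that are visible in this copy of the exam; the function name monthly_returns is hypothetical. It also assumes that a month's end balance is the following month's start balance, which the exam does not state explicitly.

```python
# Only the four values visible in the exam's (truncated) sample are used here.
sample = ({"Jan 2018": 1000}, {"Feb 2018": 1100}, {"Mar 2018": 1400}, {"Apr 2018": 700})


def monthly_returns(record):
    """Return a list of (month, start_balance, return) rows.

    Assumption: a month's end balance is the next month's start balance,
    so the final month has no return yet (None).
    """
    # Flatten the single-entry dictionaries into ordered (month, balance) pairs.
    pairs = [(month, balance) for entry in record for month, balance in entry.items()]
    rows = []
    for i, (month, start) in enumerate(pairs):
        if i + 1 < len(pairs):
            end = pairs[i + 1][1]
            growth_factor = end / start   # end balance divided by start balance
            ret = growth_factor - 1       # return = growth factor minus 1
        else:
            ret = None
        rows.append((month, start, ret))
    return rows


for month, start, ret in monthly_returns(sample):
    shown = "n/a" if ret is None else f"{ret:+.2%}"
    print(f"{month}  {start:>6}  {shown}")
```

The rows could then be handed to pandas.DataFrame(rows, columns=[...]) for the table and to matplotlib for the two stacked plots; those steps are left out here to keep the sketch dependency-free.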
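For question 4, a possible starting point for the parsing step is sketched below. Everything about the file layout here is an assumption: the sample lines are made up, the column order follows the standard VCF layout (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), and the INFO keys GENEINFO, CLNSIG and CLNDN are taken from standard ClinVar VCF files. The instructor-modified clinvar_final.txt is stated to differ from the standard, so real lines may need extra handling. The definition of "harmful" (a CLNSIG term of Pathogenic or Likely_pathogenic) is likewise one possible choice, and collections.Counter stands in for pandas value_counts() to keep the sketch standard-library only.

```python
import re
from collections import Counter

MISSING = "Not_Given"

# Hypothetical records in standard VCF column order; all values are made up.
SAMPLE_LINES = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t111\tG\tA\t.\t.\tGENEINFO=GENE1:1;CLNSIG=Pathogenic;CLNDN=Disease_A",
    "1\t2000\t222\tC\tT\t.\t.\tGENEINFO=GENE2:2;CLNSIG=Benign;CLNDN=Disease_B",
    "2\t3000\t333\tT\tC\t.\t.\tGENEINFO=GENE3:3;CLNDN=Disease_C",
]

# Assumed definition of "harmful": these exact CLNSIG terms.
HARMFUL_TERMS = {"pathogenic", "likely_pathogenic"}


def is_harmful(clnsig):
    """True if any '/', ',' or '|' separated CLNSIG term is a harmful term."""
    tokens = re.split(r"[/,|]", clnsig.lower())
    return any(tok.strip("_") in HARMFUL_TERMS for tok in tokens)


def parse_line(line):
    """Parse one tab-separated data line into the fields the boss asked for."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt = fields[:5]
    info = {}
    for item in fields[7].split(";"):
        key, sep, value = item.partition("=")
        if sep:  # skip flag-style entries that have no '='
            info[key] = value
    return {
        "GENE": info.get("GENEINFO", MISSING).split(":")[0],
        "ID": var_id,
        "CHROM": chrom,
        "POS": pos,
        "REF": ref,
        "ALT": alt,
        "CLNSIG": info.get("CLNSIG", MISSING),
        "CLNDN": info.get("CLNDN", MISSING),
    }


# Skip header lines, which start with '#'.
records = [parse_line(line) for line in SAMPLE_LINES if not line.startswith("#")]
counts = Counter(rec["CLNSIG"] for rec in records)  # stand-in for value_counts()
harmful = [rec for rec in records if is_harmful(rec["CLNSIG"])]

print(counts)
print(len(harmful), "harmful mutation(s); first 100:", harmful[:100])
```

Splitting CLNSIG into terms, instead of searching for the substring "pathogenic", keeps values such as Conflicting_interpretations_of_pathogenicity out of the harmful set under this definition.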
Answered 2 days after Nov 29, 2021

Answer To: 11/29/21, 6:03 PM mids-w200-fall21-Lucas-Charles/W200_Final_Exam.ipynb at main ·...

Sandeep Kumar answered on Dec 02 2021
