Python programming using any of these: matplotlib.pyplot, NumPy, pandas

PYTHON ASSIGNMENT
Send the code, paste the generated plots into a Word document, and explain what conclusions can be drawn from them.

Let us use data analytical skills to determine which factors contribute to higher medical costs. The insurance.csv dataset contains individual medical costs billed by health insurance companies, along with some personal information about each beneficiary.

Data Description
· age: age of the primary beneficiary
· sex: insurance contractor gender, 1 (female), 0 (male)
· bmi: body mass index, which indicates whether a person's weight is relatively high or low for their height; an objective index of body weight in kg/m^2, computed as weight divided by the square of height (ideally 18.5 to 24.9)
· children: number of children covered by health insurance (number of dependents)
· smoker: 1 (smoker), 0 (non-smoker)
· region: the beneficiary's residential area in the US, 0 (southwest), 1 (southeast), 2 (northwest), 3 (northeast)
· charges: individual medical costs billed by health insurance

Questions
1. We will examine whether bmi has an impact on medical costs. Put bmi on the x-axis and charges on the y-axis. Set the color of each point according to whether the patient is a smoker, and set the transparency to 0.7. Be sure to include the colorbar, and set appropriate labels for the x-axis, the y-axis and the colorbar. What business insights can you get?
2. We further compare the distribution of medical costs of smokers with that of non-smokers. Plot the distribution of medical costs of smokers first. Then, on the same figure, plot the distribution of medical costs of non-smokers with the transparency set to 0.6. Use 12 bins for both plots. Set appropriate labels and legends.
3. We study whether age is an important factor by comparing the distribution of medical costs of younger people with that of older people.
On the same plot, generate a histogram of the medical costs of patients younger than 40 years old, and then another histogram representing the rest of the patients. Set the transparency of the second histogram to 0.7. Use 15 bins for both histograms. Set appropriate labels and legends. What can you conclude from this figure?
4. Open-ended question. Now it is your turn to discover something interesting and valuable! What else can you conclude from this dataset using the data visualization skills we learned? Generate two more figures and explain your findings.

PART 2 of Assignment: Visualization Practice: Bike Sharing Systems

Bike sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user can easily rent a bike at one location and return it at another. Currently, there are over 500 bike-sharing programs around the world, comprising over 500 thousand bicycles. Today, there is great interest in these systems because of their important role in traffic, environmental and health issues.

Apart from the interesting real-world applications of bike sharing systems, the characteristics of the data these systems generate make them attractive for research. As opposed to other transport services such as bus or subway, the duration of travel and the departure and arrival positions are explicitly recorded in these systems. This feature turns a bike sharing system into a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most important events in the city could be detected by monitoring these data.

Data Description: We will be using the daily version of the Capital Bikeshare System dataset from the UCI Machine Learning Repository. This dataset contains the daily count of bike rental checkouts in Washington, D.C.’s bikeshare program between 2011 and 2012.
It also includes information about the weather and seasonal/temporal features for that day (such as whether it was a weekday).
• day: day of the record (day 1 = 2011-01-01)
• season: season (1: winter, 2: spring, 3: summer, 4: fall)
• weekday: day of the week (0 = Sunday, 6 = Saturday)
• workingday: 1 if the day is neither a weekend nor a holiday, otherwise 0
• weathersit:
  – 1: Clear, Few clouds, Partly cloudy
  – 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
  – 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
• temp: normalized temperature in Celsius
• windspeed: normalized wind speed
• casual: count of checkouts by casual/non-registered users
• registered: count of checkouts by registered users
• cnt: total checkouts

import pandas as pd

# Load the daily Capital Bikeshare data and preview the first five rows
daily = pd.read_csv('day.csv')
daily.head()

Questions:
1. Understand Trends. Generate a line chart showing the checkouts over time, using the day column as the x-axis and the cnt column as the y-axis. Label the x-axis ‘Day’ and the y-axis ‘Check Outs’. What can you conclude?
2. Explore Relationships. We will plot the daily count of bikes checked out by casual/non-registered users against the temperature. Color the points ‘#539cab’ and set the transparency to 0.7. Be sure to include appropriate labels for the x-axis and y-axis. What insight can you get?
3. Explore Relationships with Multidimensional Information. We will plot the daily count of bikes checked out by casual/non-registered users against the temperature. Set the color of each point according to whether it is a working day, and set the transparency to 0.7. Be sure to include appropriate labels for the x-axis and y-axis. Label the colorbar to indicate whether it is a working day. What additional insights can you get?
4. Examine Distributions. First, build a histogram of the registered bike checkouts with 10 bins. Set appropriate labels.
Also set the title to “Distribution of Registered Check Outs”.
5. Compare Distributions. We now compare the distributions of registered and casual checkouts. To make the figure easy to understand, in addition to the histogram from the previous question, plot the casual checkouts on the same figure with the transparency set to 0.8 and the number of bins set to 5. Set appropriate labels.
6. How do the temperatures change across the seasons? Choose the type of visualization that best serves this purpose. What are the mean and median temperatures?
7. What else can you conclude from this dataset using various data exploration techniques?

insurance.csv:
age,sex,bmi,children,smoker,region,charges
19,1,27.9,0,1,0,16884.924
18,0,33.77,1,0,1,1725.5523
28,0,33,3,0,1,4449.462
33,0,22.705,0,0,2,21984.47061
32,0,28.88,0,0,2,3866.8552
31,1,25.74,0,0,1,3756.6216
46,1,33.44,1,0,1,8240.5896
37,1,27.74,3,0,2,7281.5056
37,0,29.83,2,0,3,6406.4107
60,1,25.84,0,0,2,28923.13692
25,0,26.22,0,0,3,2721.3208
62,1,26.29,0,1,1,27808.7251
23,0,34.4,0,0,0,1826.843
56,1,39.82,0,0,1,11090.7178
27,0,42.13,0,1,1,39611.7577
19,0,24.6,1,0,0,1837.237
52,1,30.78,1,0,3,10797.3362
23,0,23.845,0,0,3,2395.17155
56,0,40.3,0,0,0,10602.385
30,0,35.3,0,1,0,36837.467
60,1,36.005,0,0,3,13228.84695
30,1,32.4,1,0,0,4149.736
18,0,34.1,0,0,1,1137.011
34,1,31.92,1,1,3,37701.8768
37,0,28.025,2,0,2,6203.90175
59,1,27.72,3,0,1,14001.1338
63,1,23.085,0,0,3,14451.83515
55,1,32.775,2,0,2,12268.63225
23,0,17.385,1,0,2,2775.19215
31,0,36.3,2,1,0,38711
22,0,35.6,0,1,0,35585.576
18,1,26.315,0,0,3,2198.18985
19,1,28.6,5,0,0,4687.797
63,0,28.31,0,0,2,13770.0979
28,0,36.4,1,1,0,51194.55914
19,0,20.425,0,0,2,1625.43375
62,1,32.965,3,0,2,15612.19335
26,0,20.8,0,0,0,2302.3
35,0,36.67,1,1,3,39774.2763
60,0,39.9,0,1,0,48173.361
24,1,26.6,0,0,3,3046.062
31,1,36.63,2,0,1,4949.7587
41,0,21.78,1,0,1,6272.4772
37,1,30.8,2,0,1,6313.759
38,0,37.05,1,0,3,6079.6715
55,0,37.3,0,0,0,20630.28351
18,1,38.665,2,0,3,3393.35635
28,1,34.77,0,0,2,3556.9223
60,1,24.53,0,0,1,12629.8967
36,0,35.2,1,1,1,38709.176
18,1,35.625,0,0,3,2211.13075
21,1,33.63,2,0,2,3579.8287
48,0,28,1,1,0,23568.272
36,0,34.43,0,1,1,37742.5757
40,1,28.69,3,0,2,8059.6791
58,0,36.955,2,1,2,47496.49445
58,1,31.825,2,0,3,13607.36875
18,0,31.68,2,1,1,34303.1672
53,1,22.88,1,1,1,23244.7902
34,1,37.335,2,0,2,5989.52365
43,0,27.36,3,0,3,8606.2174
25,0,33.66,4,0,1,4504.6624
64,0,24.7,1,0,2,30166.61817
28,1,25.935,1,0,2,4133.64165
20,1,22.42,0,1,2,14711.7438
19,1,28.9,0,0,0,1743.214
61,1,39.1,2,0,0,14235.072
40,0,26.315,1,0,2,6389.37785
40,1,36.19,0,0,1,5920.1041
28,0,23.98,3,1,1,17663.1442
27,1,24.75,0,1,1,16577.7795
31,0,28.5,5,0,3,6799.458
53,1,28.1,3,0,0,11741.726
58,0,32.01,1,0,1,11946.6259
44,0,27.4,2,0,0,7726.854
57,0,34.01,0,0,2,11356.6609
29,1,29.59,1,0,1,3947.4131
21,0,35.53,0,0,1,1532.4697
22,1,39.805,0,0,3,2755.02095
41,1,32.965,0,0,2,6571.02435
31,0,26.885,1,0,3,4441.21315
45,1,38.285,0,0,3,7935.29115
22,0,37.62,1,1,1,37165.1638
48,1,41.23,4,0,2,11033.6617
37,1,34.8,2,1,0,39836.519
45,0,22.895,2,1,2,21098.55405
57,1,31.16,0,1,2,43578.9394
56,1,27.2,0,0,0,11073.176
46,1,27.74,0,0,2,8026.6666
55,1,26.98,0,0,2,11082.5772
21,1,39.49,0,0,1,2026.9741
53,1,24.795,1,0,2,10942.13205
59,0,29.83,3,1,3,30184.9367
35,0,34.77,2,0,2,5729.0053
64,1,31.3,2,1,0,47291.055
28,1,37.62,1,0,1,3766.8838
54,1,30.8,3
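A minimal sketch for Part 1, Question 1. It assumes insurance.csv has the columns listed in the data description. The function name plot_bmi_vs_charges and the coolwarm colormap are illustrative choices, not part of the assignment. The 0/1 smoker column is passed as c=, so matplotlib maps it onto a colormap, and fig.colorbar adds the required colorbar with its own label.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_bmi_vs_charges(df: pd.DataFrame):
    """Scatter BMI (x) against medical charges (y), colored by smoker status."""
    fig, ax = plt.subplots(figsize=(8, 5))
    # c= maps the 0/1 smoker flag onto the colormap; alpha=0.7 as the question asks
    points = ax.scatter(df["bmi"], df["charges"], c=df["smoker"],
                        cmap="coolwarm", alpha=0.7)
    ax.set_xlabel("BMI (kg/m^2)")
    ax.set_ylabel("Medical charges")
    ax.set_title("Medical charges vs. BMI")
    # The colorbar tells the reader which color means smoker
    cbar = fig.colorbar(points, ax=ax, label="Smoker (1 = smoker, 0 = non-smoker)")
    cbar.set_ticks([0, 1])
    return fig, ax, cbar


# Usage, assuming insurance.csv is in the working directory:
# df = pd.read_csv("insurance.csv")
# plot_bmi_vs_charges(df)
# plt.show()
```

The function returns the figure objects instead of calling plt.show(), so the same code works in a notebook, in a script, or when saving the plot for the Word report with fig.savefig.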
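Part 1 Questions 2 and 3 both overlay two histograms on one axes. One hypothetical helper, overlay_histograms, can therefore serve both questions. This is a sketch that assumes the column names from the data description. The axis labels and titles are placeholders that you may want to reword.

```python
import matplotlib.pyplot as plt
import pandas as pd


def overlay_histograms(first, second, labels, bins, second_alpha, title):
    """Draw two histograms of charges on one axes; the second is semi-transparent."""
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.hist(first, bins=bins, label=labels[0])
    ax.hist(second, bins=bins, alpha=second_alpha, label=labels[1])
    ax.set_xlabel("Medical charges")
    ax.set_ylabel("Number of patients")
    ax.set_title(title)
    ax.legend()
    return fig, ax


def plot_smoker_costs(df: pd.DataFrame):
    # Question 2: smokers first, then non-smokers with alpha=0.6; 12 bins each
    return overlay_histograms(
        df.loc[df["smoker"] == 1, "charges"],
        df.loc[df["smoker"] == 0, "charges"],
        labels=("Smokers", "Non-smokers"),
        bins=12,
        second_alpha=0.6,
        title="Medical charges: smokers vs. non-smokers",
    )


def plot_age_costs(df: pd.DataFrame):
    # Question 3: patients under 40 first, everyone else with alpha=0.7; 15 bins each
    young = df["age"] < 40
    return overlay_histograms(
        df.loc[young, "charges"],
        df.loc[~young, "charges"],
        labels=("Younger than 40", "40 and older"),
        bins=15,
        second_alpha=0.7,
        title="Medical charges by age group",
    )


# Usage, assuming insurance.csv is in the working directory:
# df = pd.read_csv("insurance.csv")
# plot_smoker_costs(df)
# plot_age_costs(df)
# plt.show()
```

Note that an integer bins value lets each histogram choose its own bin edges, so the bars of the two groups do not line up exactly. If you prefer aligned bars, one option is to compute shared edges with np.histogram_bin_edges on the combined charges and pass those edges as bins.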
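For Part 2 Questions 1 to 3, the sketch below assumes that day.csv contains the day, cnt, temp, casual and workingday columns named in the data description. The function names and the viridis colormap are hypothetical choices. Questions 2 and 3 plot the same two columns, so a single function handles both. A flag switches between the fixed ‘#539cab’ color and coloring by working day with a labeled colorbar.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_checkouts_over_time(daily: pd.DataFrame):
    """Question 1: line chart of total checkouts (cnt) against day."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(daily["day"], daily["cnt"])
    ax.set_xlabel("Day")
    ax.set_ylabel("Check Outs")
    return fig, ax


def plot_casual_vs_temp(daily: pd.DataFrame, by_workingday: bool = False):
    """Questions 2 and 3: casual checkouts against normalized temperature."""
    fig, ax = plt.subplots(figsize=(8, 5))
    if by_workingday:
        # Question 3: color by the 0/1 workingday flag and explain it with a colorbar
        points = ax.scatter(daily["temp"], daily["casual"], c=daily["workingday"],
                            cmap="viridis", alpha=0.7)
        cbar = fig.colorbar(points, ax=ax, label="Working day (1 = yes, 0 = no)")
        cbar.set_ticks([0, 1])
    else:
        # Question 2: every point gets the fixed color from the question
        ax.scatter(daily["temp"], daily["casual"], color="#539cab", alpha=0.7)
    ax.set_xlabel("Normalized temperature")
    ax.set_ylabel("Casual check outs")
    return fig, ax


# Usage, assuming day.csv uses the column names from the data description:
# daily = pd.read_csv("day.csv")
# plot_checkouts_over_time(daily)
# plot_casual_vs_temp(daily)
# plot_casual_vs_temp(daily, by_workingday=True)
# plt.show()
```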
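One possible approach to Part 2 Questions 4 to 6 is shown below. It makes the same column-name assumption and uses hypothetical function names. For Question 6 it draws one box per season, because a box plot shows each season's median and spread side by side; this is a design choice, and a bar chart of seasonal means would also work. Season codes are mapped to names following the data description (1: winter, 2: spring, 3: summer, 4: fall). Temperatures stay in the dataset's normalized units.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Season codes as given in the data description
SEASON_NAMES = {1: "Winter", 2: "Spring", 3: "Summer", 4: "Fall"}


def plot_checkout_distributions(daily: pd.DataFrame, include_casual: bool = True):
    """Question 4 (registered only) and Question 5 (registered plus casual)."""
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.hist(daily["registered"], bins=10, label="Registered")
    title = "Distribution of Registered Check Outs"
    if include_casual:
        # Question 5: casual checkouts on the same axes, 5 bins, transparency 0.8
        ax.hist(daily["casual"], bins=5, alpha=0.8, label="Casual")
        title = "Distribution of Registered and Casual Check Outs"
    ax.set_xlabel("Check outs per day")
    ax.set_ylabel("Number of days")
    ax.set_title(title)
    ax.legend()
    return fig, ax


def plot_temp_by_season(daily: pd.DataFrame):
    """Question 6: one box per season, showing median, spread and outliers."""
    seasons = sorted(int(s) for s in daily["season"].unique())
    data = [daily.loc[daily["season"] == s, "temp"].to_numpy() for s in seasons]
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.boxplot(data)
    ax.set_xticklabels([SEASON_NAMES[s] for s in seasons])
    ax.set_xlabel("Season")
    ax.set_ylabel("Normalized temperature")
    ax.set_title("Temperature by season")
    return fig, ax


def temp_summary(daily: pd.DataFrame) -> pd.DataFrame:
    """Mean and median temperature per season, plus an overall row."""
    summary = daily.groupby("season")["temp"].agg(["mean", "median"])
    summary.index = summary.index.map(SEASON_NAMES)
    summary.loc["All seasons"] = [daily["temp"].mean(), daily["temp"].median()]
    return summary


# Usage, assuming day.csv uses the column names from the data description:
# daily = pd.read_csv("day.csv")
# plot_checkout_distributions(daily, include_casual=False)  # Question 4
# plot_checkout_distributions(daily)                        # Question 5
# plot_temp_by_season(daily)                                # Question 6
# print(temp_summary(daily))
# plt.show()
```

The question asks for "the mean and median temperatures" without saying whether it means per season or overall, so temp_summary reports both.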