Create a Python program, with associated comments to make the code understandable, which performs searches and analyses in the medical dataset loaded in a Pandas dataframe. The dataset is the same as for Assignment #5, and contains the variables provided on this documentation page: https://vincentarelbundock.github.io/Rdatasets/doc/Ecdat/DoctorContacts.html.


An iPython notebook is provided with comments asking for the tasks to complete, with associated output. It is attached both as a notebook and a Python script.


Turn in your program in the message section of this drop-box, or as an attached iPython notebook (.ipynb extension), or as an attached Python file (.py extension).


The tasks to complete are explained in Assignment#6.pdf, in the associated notebook, in the associated Python program, and below:




1) Read the Medical Record Dataset

a. Read the dataset from https://vincentarelbundock.github.io/Rdatasets/csv/Ecdat/

b. Format numbers to be displayed with two decimal positions

c. View first 5 rows in the Medical Dataset

d. Customize the Column Names so that the first column’s label becomes ‘id’.
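
The steps of task 1 can be sketched as follows. This is only an illustration under stated assumptions: the exact CSV file name is not spelled out above, so the sketch reads a small in-memory sample (`SAMPLE_CSV`, with made-up values and only a few of the columns) instead of the real file, and the helper name `load_medical` is hypothetical.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt standing in for the real CSV (values are illustrative only).
# The empty first header mimics an unnamed row-number column.
SAMPLE_CSV = """,mdu,lc,ndisease,educdec,age
1,0,0.0,13.73196,12,42.87748
2,2,0.0,13.73196,12,43.87748
3,0,4.6151,13.0,16,17.59138
"""


def load_medical(source):
    """Read the CSV and relabel its first column as 'id'."""
    df = pd.read_csv(source)
    # Rename by position so the code does not depend on how the unnamed column is labelled
    return df.rename(columns={df.columns[0]: "id"})


# Display floats with two decimals without altering the stored values
pd.options.display.float_format = "{:.2f}".format

medical = load_medical(io.StringIO(SAMPLE_CSV))

# View the first 5 rows (the sample only has 3)
print(medical.head())
```

Setting the display option changes only how numbers are printed; rounding the columns themselves would also change the values used in later calculations. With the real data, `load_medical` would be given the CSV URL or a local file path instead of the `StringIO` object.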



2) Create a Function Analyzing the Association between Two Variables

a. Create the function medical_stats calculating the regression line between two variables var1 and var2, and returning the slope and intercept. The function should display in a message whether:

i. There is a significant association or not (pvalue

ii. In the case where there is a significant association, whether the association is positive (slope >0), negative (slope



b. Apply the function for the following pairs of variables, to determine whether there is an association between them:

i. age and ndisease

ii. mdu and ndisease

iii. educdec and ndisease

iv. lpi and ndisease

v. linc and ndisease

vi. lfam and ndisease

vii. idp and ndisease

viii. lc and ndisease

ix. fmde and ndisease.
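
One possible shape for the function in task 2 is sketched below. It uses `scipy.stats.linregress`, which is just one of several tools that fit a simple regression line and is not necessarily what the assignment notebook expects. The significance threshold `alpha=0.05` is an assumption, since the p-value cutoff is cut off in the task text, and the data are synthetic rather than taken from the medical dataset.

```python
import numpy as np
from scipy import stats


def medical_stats(var1, var2, alpha=0.05):
    """Fit var2 = intercept + slope * var1 and report the association.

    alpha is an assumed significance threshold, not one confirmed by the task text.
    """
    result = stats.linregress(var1, var2)

    if result.pvalue >= alpha:
        print("There is no significant association")
    elif result.slope > 0:
        print("There is a significant positive association")
    else:
        print("There is a significant negative association")

    return result.slope, result.intercept


# Synthetic data with a known positive slope of 0.3 and intercept of 5
rng = np.random.default_rng(0)
x = rng.uniform(20, 60, size=200)
y = 5.0 + 0.3 * x + rng.normal(0, 1.0, size=200)

slope, intercept = medical_stats(x, y)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

With the real dataframe, the same call would take two numeric columns, for example the age and ndisease columns. A Boolean column such as idp would need converting to 0/1 first.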






3) Predict the number of chronic diseases for a family with head of household of educational years of 15.
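
Once a slope and intercept are available, the prediction in task 3 amounts to evaluating the fitted line at educdec = 15. The sketch below shows the arithmetic on a hypothetical, exactly linear toy sample. The numbers are not from the dataset, and `predict_ndisease` is an illustrative name.

```python
import numpy as np


def predict_ndisease(slope, intercept, educdec):
    """Evaluate the fitted line ndisease = intercept + slope * educdec."""
    return intercept + slope * educdec


# Hypothetical, exactly linear toy data (not taken from the dataset)
educ = np.array([8.0, 10.0, 12.0, 14.0, 16.0])
ndis = np.array([14.0, 13.0, 12.0, 11.0, 10.0])

# np.polyfit with deg=1 returns the coefficients as [slope, intercept]
slope, intercept = np.polyfit(educ, ndis, deg=1)

predicted = predict_ndisease(slope, intercept, 15)
print(f"Predicted ndisease at educdec=15: {predicted:.2f}")
```

In the assignment itself, the slope and intercept would come from applying medical_stats to educdec and ndisease.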


4) Create a Function Plotting the Association between Two Variables and the Regression Line.

a. The function plots the graph between the two variables in argument and returns the plot, after displaying it using the ‘jointplot’ function of module ‘seaborn’.

b. Apply the function to plot the association between following pairs of variables:

i. age and ndisease.

ii. educdec and ndisease.












Answered 2 days after Jul 02, 2021


Pritam Kumar answered on Jul 05 2021
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Read the Medical Record Dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Read the dataset"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
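{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Load the dataset from a local copy of DoctorContacts.csv\n",
"data=pd.read_csv('D:\\\\New\\\\DoctorContacts.csv')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# read_csv already returns a DataFrame; keep it under the name data_df used below\n",
"data_df=pd.DataFrame(data)"
]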
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Format numbers to be displayed with two decimal positions"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Round the continuous columns to two decimal places\n",
"float_cols = ['lc', 'lpi', 'fmde', 'ndisease', 'linc', 'lfam', 'educdec', 'age']\n",
"data_df[float_cols] = data_df[float_cols].round(2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"View first 5 rows in the Medical Dataset"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"   Unnamed: 0  mdu   lc   idp   lpi  fmde  physlim  ndisease  health  linc  \\\n",
"0           1    0  0.0  True  6.91   0.0    False     13.73    good  9.53   \n",
"1           2    2  0.0  True  6.91   0.0    False     13.73    good  9.53   \n",
"2           3    0  0.0  True  6.91   0.0    False     13.73    good  9.53   \n",
"3           4    0  0.0  True  6.91   0.0    False     13.73    good  9.53   \n",
"4           5    0  0.0  True  6.91   0.0    False     13.73    good  9.53   \n",
"\n",
"   lfam  educdec    age   sex  child  black  \n",
"0  1.39     12.0  42.88  male  False   True  \n",
"1  1.39     12.0  43.88  male  False   True  \n",
"2  1.39     12.0  44.88  male  False   True  \n",
"3  1.39     12.0  45.88  male  False   True  \n",
"4  1.39     12.0  46.88  male  False   True  "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Customize the Column Names so that the first column’s label becomes ‘id’"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"   id  mdu   lc   idp   lpi  fmde  physlim  ndisease  health  linc  lfam  \\\n",
"0   1    0  0.0  True  6.91   0.0    False     13.73    good  9.53  1.39   \n",
"1   2    2  0.0  True  6.91   0.0    False     13.73    good  9.53  1.39   \n",
"2   3    0  0.0  True  6.91   0.0    False     13.73    good  9.53  1.39   \n",
"3   4    0  0.0  True  6.91   0.0    False     13.73    good  9.53  1.39   \n",
"4   5    0  0.0  True  6.91   0.0    False     13.73    good  9.53  1.39   \n",
"\n",
"   educdec    age   sex  child  black  \n",
"0     12.0  42.88  male  False   True  \n",
"1     12.0  43.88  male  False   True  \n",
"2     12.0  44.88  male  False   True  \n",
"3     12.0  45.88  male  False   True  \n",
"4     12.0  46.88  male  False   True  "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Assign the result so the renamed column persists in data_df\n",
"data_df = data_df.rename(columns={'Unnamed: 0': 'id'})\n",
"data_df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create a Function Analyzing the Association between 2 Variables"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Creating the function"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"import statsmodels.api as sm\n",
"\n",
"# Function for continuous variables only: regress var2 on var1\n",
"def medical_stats(var1,var2):\n",
"\n",
"    # Add an intercept column, then fit ordinary least squares\n",
"    var1 = sm.add_constant(var1)\n",
"    model1 = sm.OLS(var2,var1)\n",
"    fitted1 = model1.fit()\n",
"    p_values = fitted1.summary2().tables[1]['P>|t|']\n",
"\n",
"    # Position 0 is the constant (intercept), position 1 is var1 (slope)\n",
"    parameters = np.asarray(fitted1.params)\n",
"\n",
"    slope = parameters[1]\n",
"    intercept = parameters[0]\n",
"\n",
"    # Test the slope's p-value (position 1), not the intercept's\n",
"    if p_values.values[1] > 0.05: # 5% significance level\n",
"        print(\"The association is not significant\")\n",
"    else:\n",
"        if slope...