{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select...

1 answer below ยป
Python coding


{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\\rightarrow$Run All).\n", "\n", "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\". (After you have done that, you can delete the 'raise NotImplementedError()' line, and then run your code to check that it works).\n", "\n", "Also, enter your NAME in the next cell.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "NAME = \"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "01648f2c7a2b86d733aea6aeb05487e8", "grade": false, "grade_id": "jupyter", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "# ICT706 SouthBank 2020 Semester 1 Task 2\n", "\n", "This assignment will be done completely inside this Jupyter notebook.\n", "\n", "### Background\n", "A medium-size company has given you one year of data about the online purchases that their customers have made. They want you to analyse the data using statistical and machine learning techniques and produce:\n", "* a prediction algorithm for predicting how much money each customer is likely to spend in a year;\n", "* a classification algorithm for predicting which customers will be 'big spenders';\n", "* some recommendations on what marketing strategy they should use to attract more 'big spender' customers.\n", "\n", "### Instructions\n", "Follow all the instructions in this notebook to complete these tasks. Note that some cells contain 'assert' statements - these will automatically mark your work so that you can check that you have done the preceeding steps correctly. (If they give errors, then go back and correct your previous work until you fix those errors. Once those 'assert' cells execute without errors, you know that you have achieved the marks for that step.) \n", "\n", "When you have finished, this notebook is the only file that you will need to submit to Blackboard.\n", "\n", "Note: If you want some space to try out some Python code of your own, feel free to add extra cells into this notebook. Just make sure that before you submit your notebook, that those extra cells execute without error, or that you delete them before submitting.\n", "\n", "### Overview\n", "You have five sections to complete in this Notebook (total = 100 marks):\n", "* Part A: Load and Clean Data (20 points)\n", "* Part B Data Exploration (30 points)\n", "* Part C: Predicting Spending Levels (20 points)\n", "* Part D: Predicting Big Spenders (20 points)\n", "* Part E: Business Recommendations (10 points)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "deletable": false, "nbgrader": { "checksum": "8e8e40c5312c2594db509b8e4c9f731d", "grade": false, "grade_id": "imports", "locked": false, "schema_version": 1, "solution": true } }, "outputs": [], "source": [ "# add all your imports here.\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "bbcbe1db68acf76763db3e19b29162d9", "grade": false, "grade_id": "cell-56b1c85226f679a1", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "---\n", "# Part A: Load and Clean Data (20 points)\n", "\n", "Save your CSV data file into the same folder as this notebook.\n", "\n", "Write Python code to load your dataset into a Pandas DataFrame called 'sales'." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "deletable": false, "nbgrader": { "checksum": "f1866306460b18ba8285f9073b7870bb", "grade": false, "grade_id": "read_sales", "locked": false, "schema_version": 1, "solution": true } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "465472c32b09a4d2bd97325a77ab7dae", "grade": false, "grade_id": "cell-08fd91c8f6a3f1ab", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "After you have loaded the data correctly, you should have 10,000 rows. \n", "Run the following cells and tests to check that you have done this correctly." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "30e709e48a4861e49bfcd0f34e07af3b", "grade": false, "grade_id": "cell-802dd990ff7bb39a", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "sales.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "18fa2c34a21ec341e571e978461c2d74", "grade": true, "grade_id": "data_loaded", "locked": false, "points": 5, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "\"\"\"Check that 'sales' has the right shape and number of rows (5 points).\"\"\"\n", "assert len(sales.columns) == 10\n", "assert sales.columns[0] == \"CustNum\"\n", "assert sales.shape == (10000, 10)" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "290dd5079318da97a499eb5f4e56e8c0", "grade": false, "grade_id": "cell-cbd5370682d8937a", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "## Cleaning the Data\n", "\n", "Some of the columns are strings, with dollar signs. But we need to convert them to numbers (float) so that we can do calculations on them. The next cell shows what will go wrong if we try doing calculations *before* converting them floats!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "f37fc394053f62b401f14b6d76a4c800", "grade": false, "grade_id": "cell-c0f6f29476bf6fc8", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "s2 = sales[\"Spend\"] * 4\n", "s2.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "checksum": "656807b5b835cf1f7968d34cea95417c", "grade": false,
Answered Same DayJun 11, 2021

Answer To: { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Before you turn this problem...

Sandeep Kumar answered on Jun 17 2021
140 Votes
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\\rightarrow$Run All).\n",
"\n",
"Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\". (After you have done that, you can delete the 'raise NotImplementedError()' line, and then run your code to check that it works).\n",
"\n",
"Also, enter your NAME in the next cell.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"NAME = \"Ashma Dhakal\""
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "01648f2c7a2b86d733aea6aeb05487e8",
"grade": false,
"grade_id": "jupyter",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"# ICT706 SouthBank 2020 Semester 1 Task 2\n",
"\n",
"This assignment will be done completely inside this Jupyter notebook.\n",
"\n",
"### Background\n",
"A medium-size company has given you one year of data about the online purchases that their customers have made. They want you to analyse the data using statistical and machine learning techniques and produce:\n",
"* a prediction algorithm for predicting how much money each customer is likely to spend in a year;\n",
"* a classification algorithm for predicting which customers will be 'big spenders';\n",
"* some recommendations on what marketing strategy they should use to attract more 'big spender' customers.\n",
"\n",
"### Instructions\n",
"Follow all the instructions in this notebook to complete these tasks. Note that some cells contain 'assert' statements - these will automatically mark your work so that you can check that you have done the preceeding steps correctly. (If they give errors, then go back and correct your previous work until you fix those errors. Once those 'assert' cells execute without errors, you know that you have achieved the marks for that step.) \n",
"\n",
"When you have finished, this notebook is the only file that you will need to submit to Blackboard.\n",
"\n",
"Note: If you want some space to try out some Python code of your own, feel free to add extra cells into this notebook. Just make sure that before you submit your notebook, that those extra cells execute without error, or that you delete them before submitting.\n",
"\n",
"### Overview\n",
"You have five sections to complete in this Notebook (total = 100 marks):\n",
"* Part A: Load and Clean Data (20 points)\n",
"* Part B Data Exploration (30 points)\n",
"* Part C: Predicting Spending Levels (20 points)\n",
"* Part D: Predicting Big Spenders (20 points)\n",
"* Part E: Business Recommendations (10 points)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false,
"deletable": false,
"nbgrader": {
"checksum": "8e8e40c5312c2594db509b8e4c9f731d",
"grade": false,
"grade_id": "imports",
"locked": false,
"schema_version": 1,
"solution": true
}
},
"outputs": [
],
"source": [
"# add all your imports here.\n",
"# YOUR CODE HERE\n",
"import pandas as pd\n",
"import numpy as np\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.preprocessing import LabelEncoder\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.svm import SVC\n",
"from sklearn import model_selection\n",
"from sklearn.model_selection import train_test_split"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "bbcbe1db68acf76763db3e19b29162d9",
"grade": false,
"grade_id": "cell-56b1c85226f679a1",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"---\n",
"# Part A:
Load and Clean Data (20 points)\n",
"\n",
"Save your CSV data file into the same folder as this notebook.\n",
"\n",
"Write Python code to load your dataset into a Pandas DataFrame called 'sales'."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false,
"deletable": false,
"nbgrader": {
"checksum": "f1866306460b18ba8285f9073b7870bb",
"grade": false,
"grade_id": "read_sales",
"locked": false,
"schema_version": 1,
"solution": true
}
},
"outputs": [
],
"source": [
"# YOUR CODE HERE\n",
"sales = pd.read_csv(\"greenhatsales.csv\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "465472c32b09a4d2bd97325a77ab7dae",
"grade": false,
"grade_id": "cell-08fd91c8f6a3f1ab",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"After you have loaded the data correctly, you should have 10,000 rows. \n",
"Run the following cells and tests to check that you have done this correctly."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "30e709e48a4861e49bfcd0f34e07af3b",
"grade": false,
"grade_id": "cell-802dd990ff7bb39a",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"
CustNumNameSexAgeStateIncomeClicksLastSpendPurchasesSpend
00Brandon Bendermale67NSW120000709$2488.598$1615.00
11Andre Mccormickmale38VIC140000630$4295.3414$1927.20
22Ashley Smithfemale47NSW50000554$1986.098$1660.80
33Ann Rileyfemale33NSW100000309$1532.6410$3041.10
44Timothy Chavezmale49NSW140000520$2082.088$1764.40
\n",
"
"
],
"text/plain": [
" CustNum Name Sex Age State Income Clicks LastSpend \\\n",
"0 0 Brandon Bender male 67 NSW 120000 709 $2488.59 \n",
"1 1 Andre Mccormick male 38 VIC 140000 630 $4295.34 \n",
"2 2 Ashley Smith female 47 NSW 50000 554 $1986.09 \n",
"3 3 Ann Riley female 33 NSW 100000 309 $1532.64 \n",
"4 4 Timothy Chavez male 49 NSW 140000 520 $2082.08 \n",
"\n",
" Purchases Spend \n",
"0 8 $1615.00 \n",
"1 14 $1927.20 \n",
"2 8 $1660.80 \n",
"3 10 $3041.10 \n",
"4 8 $1764.40 "
]
},
"execution_count": 4,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"sales.head()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false,
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "18fa2c34a21ec341e571e978461c2d74",
"grade": true,
"grade_id": "data_loaded",
"locked": false,
"points": 5,
"schema_version": 1,
"solution": false
}
},
"outputs": [
],
"source": [
"\"\"\"Check that 'sales' has the right shape and number of rows (5 points).\"\"\"\n",
"assert len(sales.columns) == 10\n",
"assert sales.columns[0] == \"CustNum\"\n",
"assert sales.shape == (10000, 10)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "290dd5079318da97a499eb5f4e56e8c0",
"grade": false,
"grade_id": "cell-cbd5370682d8937a",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"## Cleaning the Data\n",
"\n",
"Some of the columns are strings, with dollar signs. But we need to convert them to numbers (float) so that we can do calculations on them. The next cell shows what will go wrong if we try doing calculations *before* converting them floats!"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false,
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "f37fc394053f62b401f14b6d76a4c800",
"grade": false,
"grade_id": "cell-c0f6f29476bf6fc8",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"0 $1615.00$1615.00$1615.00$1615.00\n",
"1 $1927.20$1927.20$1927.20$1927.20\n",
"2 $1660.80$1660.80$1660.80$1660.80\n",
"3 $3041.10$3041.10$3041.10$3041.10\n",
"4 $1764.40$1764.40$1764.40$1764.40\n",
"Name: Spend, dtype: object"
]
},
"execution_count": 6,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"s2 = sales[\"Spend\"] * 4\n",
"s2.head()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false,
"deletable": false,
"nbgrader": {
"checksum": "656807b5b835cf1f7968d34cea95417c",
"grade": false,
"grade_id": "remove_dollars",
"locked": false,
"schema_version": 1,
"solution": true
}
},
"outputs": [
],
"source": [
"# Complete the following remove_dollar function \n",
"# so that it removes any dollar signs and spaces\n",
"# and then returns the string as a number (float).\n",
"def remove_dollar(s):\n",
" \"\"\"Removes dollar signs and spaces from s.\n",
" Returns it as a float.\n",
" \"\"\"\n",
" # YOUR CODE HERE\n",
" s = float(s.replace('$', '').replace(' ', ''))\n",
" return s"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false,
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "5f43a2833f4cb63651fe509dadb38345",
"grade": true,
"grade_id": "test_remove_dollars",
"locked": false,
"points": 5,
"schema_version": 1,
"solution": false
}
},
"outputs": [
],
"source": [
"\"\"\"Check that remove_dollar() removes dollars and spaces properly (5 points).\"\"\"\n",
"assert remove_dollar(\"12\") == 12.0\n",
"assert remove_dollar(\"$123\") == 123.0\n",
"assert remove_dollar(\" $1234\") == 1234.0\n",
"assert remove_dollar(\" $42.3 \") == 42.3"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "0f12e4450e96bddc301b3dd171c141f7",
"grade": false,
"grade_id": "cell-2674b20169c63acf",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"## Clean up the Spend columns\n",
"\n",
"Apply your remove_dollar function to the \"Spend\" column (every row), and put the cleaned-up float values into a new column of your 'sales' DataFrame called **\"SpendValue\"**.\n",
"\n",
"Then do the same for the \"LastSpend\" column and put the float values into a new column called **\"LastSpendValue\"**."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false,
"deletable": false,
"nbgrader": {
"checksum": "2c3bc0d06aef53b2152af1d9de89ac5f",
"grade": false,
"grade_id": "clean_spends",
"locked": false,
"schema_version": 1,
"solution": true
}
},
"outputs": [
],
"source": [
"# YOUR CODE HERE\n",
"sales['SpendValue'] = sales['Spend'].apply(remove_dollar)\n",
"sales['LastSpendValue'] = sales['LastSpend'].apply(remove_dollar)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"
CustNumNameSexAgeStateIncomeClicksLastSpendPurchasesSpendSpendValueLastSpendValue
00Brandon Bendermale67NSW120000709$2488.598$1615.001615.02488.59
11Andre Mccormickmale38VIC140000630$4295.3414$1927.201927.24295.34
22Ashley Smithfemale47NSW50000554$1986.098$1660.801660.81986.09
33Ann Rileyfemale33NSW100000309$1532.6410$3041.103041.11532.64
44Timothy Chavezmale49NSW140000520$2082.088$1764.401764.42082.08
\n",
"
"
],
"text/plain": [
" CustNum Name Sex Age State Income Clicks LastSpend \\\n",
"0 0 Brandon Bender male 67 NSW 120000 709 $2488.59 \n",
"1 1 Andre Mccormick male 38 VIC 140000 630 $4295.34 \n",
"2 2 Ashley Smith female 47 NSW 50000 554 $1986.09 \n",
"3 3 Ann Riley female 33 NSW 100000 309 $1532.64 \n",
"4 4 Timothy Chavez male 49 NSW 140000 520 $2082.08 \n",
"\n",
" Purchases Spend SpendValue LastSpendValue \n",
"0 8 $1615.00 1615.0 2488.59 \n",
"1 14 $1927.20 1927.2 4295.34 \n",
"2 8 $1660.80 1660.8 1986.09 \n",
"3 10 $3041.10 3041.1 1532.64 \n",
"4 8 $1764.40 1764.4 2082.08 "
]
},
"execution_count": 13,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"sales.head()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false,
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "0a2f7f9048af24c8c44cc2c73e6ea5e9",
"grade": false,
"grade_id": "cell-a2b9fa129543cf1f",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"CustNum int64\n",
"Name object\n",
"Sex object\n",
"Age int64\n",
"State object\n",
"Income int64\n",
"Clicks int64\n",
"LastSpend object\n",
"Purchases int64\n",
"Spend object\n",
"SpendValue float64\n",
"LastSpendValue float64\n",
"dtype: object"
]
},
"execution_count": 14,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"sales.dtypes"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false,
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "63d529e18e524b173fd0bcc1425a4120",
"grade": true,
"grade_id": "test_clean_spends",
"locked": true,
"points": 5,
"schema_version": 1,
"solution": false
}
},
"outputs": [
{
"ename": "AttributeError",
"evalue": "'Index' object has no attribute 'contains'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# check the new SpendValue columns (5 points)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0msales\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcontains\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"SpendValue\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0msales\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcontains\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"LastSpendValue\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;31m# check that they are floats\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0msales\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"SpendValue\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"float64\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mAttributeError\u001b[0m: 'Index' object has no attribute 'contains'"
]
}
],
"source": [
"# check the new SpendValue columns (5 points)\n",
"assert sales.columns.contains(\"SpendValue\")\n",
"assert sales.columns.contains(\"LastSpendValue\")\n",
"# check that they are floats\n",
"assert sales[\"SpendValue\"].dtype == \"float64\"\n",
"assert sales[\"LastSpendValue\"].dtype == \"float64\"\n",
"# check that the values are greater than zero.\n",
"assert (sales[\"SpendValue\"] > 0.0).all()\n",
"assert (sales[\"LastSpendValue\"] >= 0.0).all()"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"## above code is wrong it should be sales.columns.str.contains ('SpendValue')"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "00496c84c110bdfaa8d1ab0f6a9bfdbc",
"grade": false,
"grade_id": "cell-9317487e7c923ef9",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"## Make Sex and State numeric\n",
"\n",
"To use the Sex and State columns as input features for the machine learning algorithms in Scikit-Learn they must be numeric.\n",
"\n",
"Use the **LabelEncoder** object from the sklearn.preprocessing package to convert the 'Sex' column into an integer column called **\"SexValue\"**. \n",
"\n",
"Also convert the \"State\" column into a integer column called **\"StateValue\"**. "
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false,
"deletable": false,
"nbgrader": {
"checksum": "7dd8627d56479eb8657776ba1ea3e8a8",
"grade": false,
"grade_id": "sexvalue_statevalue",
"locked": false,
"schema_version": 1,
"solution": true
}
},
"outputs": [
],
"source": [
"enc = LabelEncoder()\n",
"sales[\"SexValue\"] = enc.fit_transform(sales['Sex'])\n",
"sales[\"StateValue\"] = enc.fit_transform(sales['State'])"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false,
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "0ae13631cb08d1c0cbff3e5bf3de78f3",
"grade": false,
"grade_id": "cell-7282fcbc237bd3c9",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"
NameSexSexValue
0Brandon Bendermale1
1Andre Mccormickmale1
2Ashley Smithfemale0
3Ann Rileyfemale0
4Timothy Chavezmale1
\n",
"
"
],
"text/plain": [
" Name Sex SexValue\n",
"0 Brandon Bender male 1\n",
"1 Andre Mccormick male 1\n",
"2 Ashley Smith female 0\n",
"3 Ann Riley female 0\n",
"4 Timothy Chavez male 1"
]
},
"execution_count": 19,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"# see if Sex has been mapped to ints properly?\n",
"cols = [\"Name\", \"Sex\", \"SexValue\"]\n",
"sales[cols].head()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false,
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "e49114bef2381621dd5b70443e207dfe",
"grade": false,
"grade_id": "cell-c7ca24f1daefbe4c",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"
NameStateStateValue
0Brandon BenderNSW1
1Andre MccormickVIC6
2Ashley SmithNSW1
3Ann RileyNSW1
4Timothy ChavezNSW1
5John BennettVIC6
6Teresa WiseQLD3
7Andrew NelsonQLD3
8Jon AguilarNSW1
9Priscilla BriggsNSW1
\n",
"
"
],
"text/plain": [
" Name State StateValue\n",
"0 Brandon Bender NSW 1\n",
"1 Andre Mccormick VIC 6\n",
"2 Ashley Smith NSW 1\n",
"3 Ann Riley NSW 1\n",
"4 Timothy Chavez NSW 1\n",
"5 John Bennett VIC 6\n",
"6 Teresa Wise QLD 3\n",
"7 Andrew Nelson QLD 3\n",
"8 Jon Aguilar NSW 1\n",
"9 Priscilla Briggs NSW 1"
]
},
"execution_count": 20,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"# see if State has been mapped to ints properly?\n",
"cols = [\"Name\", \"State\", \"StateValue\"]\n",
"sales[cols].head(10)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false,
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "9568081227ace8e87156458f4c6b4715",
"grade": true,
"grade_id": "test_sexvalue_statevalue",
"locked": true,
"points": 5,
"schema_version": 1,
"solution": false
}
},
"outputs": [
{
"ename": "AttributeError",
"evalue": "'Index' object has no attribute 'contains'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# test the new SexValue and StateValue columns (5 points)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0msales\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcontains\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"SexValue\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0msales\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcontains\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"StateValue\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;31m# check that they are integer\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msales\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"SexValue\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstartswith\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"int\"\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# \"int32\" or \"int64\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mAttributeError\u001b[0m: 'Index' object has no attribute 'contains'"
]
}
],
"source": [
"# test the new SexValue and StateValue columns (5 points)\n",
"assert sales.columns.contains(\"SexValue\")\n",
"assert sales.columns.contains(\"StateValue\")\n",
"# check that they are integer\n",
"assert str(sales[\"SexValue\"].dtype).startswith(\"int\") # \"int32\" or \"int64\"\n",
"assert str(sales[\"StateValue\"].dtype).startswith(\"int\") # \"int32\" or \"int64\"\n",
"# check that the values are greater than zero.\n",
"assert sales[\"SexValue\"].max() == 1 # 0 and 1 only\n",
"assert sales[\"StateValue\"].max() == 7 # 7 states in Australia"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false,
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "0d29c2c288046210e1115c9171223383",
"grade": false,
"grade_id": "cell-251af098637b4c0a",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"
CustNumSexValueAgeStateValueIncomeClicksPurchasesSpendValue
00167112000070981615.0
111386140000630141927.2
2204715000055481660.8
330331100000309103041.1
44149114000052081764.4
\n",
"
"
],
"text/plain": [
" CustNum SexValue Age StateValue Income Clicks Purchases SpendValue\n",
"0 0 1 67 1 120000 709 8 1615.0\n",
"1 1 1 38 6 140000 630 14 1927.2\n",
"2 2 0 47 1 50000 554 8 1660.8\n",
"3 3 0 33 1 100000 309 10 3041.1\n",
"4 4 1 49 1 140000 520 8 1764.4"
]
},
"execution_count": 22,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"# Finally, let us view just the numeric columns.\n",
"numcols = [\"CustNum\", \"SexValue\", \"Age\", \"StateValue\",\n",
" \"Income\", \"Clicks\", \"Purchases\", \"SpendValue\"]\n",
"sales[numcols].head()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"
CustNumNameSexAgeStateIncomeClicksLastSpendPurchasesSpendSpendValueLastSpendValueSexValueStateValue
00Brandon Bendermale67NSW120000709$2488.598$1615.001615.02488.5911
11Andre Mccormickmale38VIC140000630$4295.3414$1927.201927.24295.3416
22Ashley Smithfemale47NSW50000554$1986.098$1660.801660.81986.0901
33Ann Rileyfemale33NSW100000309$1532.6410$3041.103041.11532.6401
44Timothy Chavezmale49NSW140000520$2082.088$1764.401764.42082.0811
\n",
"
"
],
"text/plain": [
" CustNum Name Sex Age State Income Clicks LastSpend \\\n",
"0 0 Brandon Bender male 67 NSW 120000 709 $2488.59 \n",
"1 1 Andre Mccormick male 38 VIC 140000 630 $4295.34 \n",
"2 2 Ashley Smith female 47 NSW 50000 554 $1986.09 \n",
"3 3 Ann Riley female 33 NSW 100000 309 $1532.64 \n",
"4 4 Timothy Chavez male 49 NSW 140000 520 $2082.08 \n",
"\n",
" Purchases Spend SpendValue LastSpendValue SexValue StateValue \n",
"0 8 $1615.00 1615.0 2488.59 1 1 \n",
"1 14 $1927.20 1927.2 4295.34 1 6 \n",
"2 8 $1660.80 1660.8 1986.09 0 1 \n",
"3 10 $3041.10 3041.1 1532.64 0 1 \n",
"4 8 $1764.40 1764.4 2082.08 1 1 "
]
},
"execution_count": 23,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"sales.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"deletable": false,
"editable": false,
"nbgrader": {
"checksum": "1f77126ed4a81a595f68f8d426ce81c1",
"grade": false,
"grade_id": "cell-e459377e95816230",
"locked": true,
"schema_version": 1,
"solution": false
}
},
"source": [
"---\n",
"\n",
"# Part B Data Exploration (30 points)\n",
"\n",
"In this section, you will explore the data statistically and visually, to get a feel for what kinds...
SOLUTION.PDF

Answer To This Question Is Available To Download

Related Questions & Answers

More Questions ยป

Submit New Assignment

Copy and Paste Your Assignment Here