In [ ]: # code here In [ ]: # code here In [ ]: # code here In [ ]: # code here In [ ]: # code here In [ ]: # code here In [ ]: # code here In [ ]: # code here In [ ]: # code here In [ ]: # code here...

The zip folder has the assignment and data set.


Answered Same Day Mar 15, 2021


Ximi answered on Mar 17 2021
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, we'll be working with some data from the Indego bikeshare company:\n",
"\n",
"- `./data/indego-trips-2017-q3.csv`\n",
"\n",
"Our goal is to look at a particular numeric aspect:\n",
"\n",
"- how often bikes get used (and worn out).\n",
"\n",
"The entire data set covers a single quarter of 2017, so every bike is observed over the same length of time, right? If so, and if each bike is rented at random at a fixed rate, $\\lambda$, then the distribution of bike usage probabilities:\n",
"\n",
"$$P(\\text{a bike gets rented }\\:x\\:\\text{ times in a quarter})$$\n",
"\n",
"will be a Poisson distribution! Let's investigate to see if we can support this possibility."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__C1.__ _(2 pts)_ To get started, import pandas and load the data as usual. Print the spreadsheet's head so that the data's structure is close at hand."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th></th><th>trip_id</th><th>duration</th><th>start_time</th><th>end_time</th><th>start_station</th><th>start_lat</th><th>start_lon</th><th>end_station</th><th>end_lat</th><th>end_lon</th><th>bike_id</th><th>plan_duration</th><th>trip_route_category</th><th>passholder_type</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>144361832</td><td>12</td><td>2017-07-01 00:04:00</td><td>2017-07-01 00:16:00</td><td>3160</td><td>39.956619</td><td>-75.198624</td><td>3163</td><td>39.949741</td><td>-75.180969</td><td>11883</td><td>30</td><td>One Way</td><td>Indego30</td></tr>\n",
"<tr><th>1</th><td>144361829</td><td>31</td><td>2017-07-01 00:06:00</td><td>2017-07-01 00:37:00</td><td>3046</td><td>39.950119</td><td>-75.144722</td><td>3101</td><td>39.942951</td><td>-75.159554</td><td>5394</td><td>0</td><td>One Way</td><td>Walk-up</td></tr>\n",
"<tr><th>2</th><td>144361830</td><td>15</td><td>2017-07-01 00:06:00</td><td>2017-07-01 00:21:00</td><td>3006</td><td>39.952202</td><td>-75.203110</td><td>3101</td><td>39.942951</td><td>-75.159554</td><td>3331</td><td>30</td><td>One Way</td><td>Indego30</td></tr>\n",
"<tr><th>3</th><td>144361831</td><td>15</td><td>2017-07-01 00:06:00</td><td>2017-07-01 00:21:00</td><td>3006</td><td>39.952202</td><td>-75.203110</td><td>3101</td><td>39.942951</td><td>-75.159554</td><td>3515</td><td>30</td><td>One Way</td><td>Indego30</td></tr>\n",
"<tr><th>4</th><td>144361828</td><td>30</td><td>2017-07-01 00:07:00</td><td>2017-07-01 00:37:00</td><td>3046</td><td>39.950119</td><td>-75.144722</td><td>3101</td><td>39.942951</td><td>-75.159554</td><td>11913</td><td>0</td><td>One Way</td><td>Walk-up</td></tr>\n",
"</tbody>\n",
"</table>"
],
"text/plain": [
" trip_id duration start_time end_time \\\n",
"0 144361832 12 2017-07-01 00:04:00 2017-07-01 00:16:00 \n",
"1 144361829 31 2017-07-01 00:06:00 2017-07-01 00:37:00 \n",
"2 144361830 15 2017-07-01 00:06:00 2017-07-01 00:21:00 \n",
"3 144361831 15 2017-07-01 00:06:00 2017-07-01 00:21:00 \n",
"4 144361828 30 2017-07-01 00:07:00 2017-07-01 00:37:00 \n",
"\n",
" start_station start_lat start_lon end_station end_lat end_lon \\\n",
"0 3160 39.956619 -75.198624 3163 39.949741 -75.180969 \n",
"1 3046 39.950119 -75.144722 3101 39.942951 -75.159554 \n",
"2 3006 39.952202 -75.203110 3101 39.942951 -75.159554 \n",
"3 3006 39.952202 -75.203110 3101 39.942951 -75.159554 \n",
"4 3046 39.950119 -75.144722 3101 39.942951 -75.159554 \n",
"\n",
" bike_id plan_duration trip_route_category passholder_type \n",
"0 11883 30 One Way Indego30 \n",
"1 5394 0 One Way Walk-up \n",
"2 3331 30 One Way Indego30 \n",
"3 3515 30 One Way Indego30 \n",
"4 11913 0 One Way Walk-up "
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# code here\n",
"import pandas as pd\n",
"data = pd.read_csv('data/indego-trips-2017-q3.csv')\n",
"data.head()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(276785, 14)"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__C2.__ _(5 pts)_ Now, let's start things out by counting the number of trips that each bike has in total, using pandas `df.groupby()` to group the trips, and a counter, `NumBikes`, to store the number of bikes, $n$, rented $x$ times in the quarter, $n(x)$."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# code here\n",
"from collections import Counter\n",
"grouped = data.groupby('bike_id').agg({\"trip_id\": \"count\"})\n",
"NumBikes = Counter()"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"# n(x): for each trip total x, count how many bikes were rented exactly x times\n",
"for trips in grouped[\"trip_id\"].values:\n",
"    NumBikes[trips] += 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__C3.__ _(5 pts)_ Now that we've got our bikes counted up, let's compute the empirical probabilities:\n",
"\n",
"$$P(x) = P(\\text{a bike is rented }\\:x\\:\\text{ times in a quarter}) = \n",
"\\frac{n(x)}{\\sum n(x)}.$$\n",
"\n",
"We already have $n(x)$ in our `Counter()` from __C2__, so let's start by turning its keys and values into numpy arrays (vectors), `x` and `n`, respectively. After this is done, we can make the probabilities, `p`, by rescaling `n`: divide it by its sum."
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"# code here\n",
"import numpy as np\n",
"# x: distinct trip totals; n: number of bikes with each total, i.e. n(x)\n",
"x, n = np.unique(grouped[\"trip_id\"].values, return_counts=True)"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [],
"source": [
"# Empirical probabilities P(x) = n(x) / sum of n(x)\n",
"p = n / n.sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__C4.__ _(2 pts)_ Now it's time to find the average number of times a bike gets rented in a quarter. We'll call this quantity $\\lambda$. So far, we've talked about averages of data, e.g., the arithmetic...
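The chain from C2 to C4 (tally $n(x)$, normalize it into $P(x)$, then average to get $\lambda$) can be sketched on a toy example. This is a minimal illustration, not the notebook's solution: the `trips_per_bike` dictionary holds made-up numbers rather than Indego data, `poisson_pmf` is a hypothetical helper, and only the standard library is used so the block runs anywhere.

```python
from collections import Counter
import math

# Hypothetical per-bike trip totals (bike_id -> trips in the quarter).
# These are made-up numbers, not values from the Indego file.
trips_per_bike = {101: 3, 102: 5, 103: 3, 104: 4, 105: 5, 106: 3}

# n(x): number of bikes rented exactly x times.
num_bikes = Counter(trips_per_bike.values())

x = sorted(num_bikes)                   # distinct rental counts
n = [num_bikes[k] for k in x]           # number of bikes at each count
total = sum(n)
p = [count / total for count in n]      # empirical P(x) = n(x) / sum of n(x)

# Mean rentals per bike: lambda = sum over x of x * P(x).
lam = sum(k * pk for k, pk in zip(x, p))


def poisson_pmf(k, rate):
    """Poisson probability of exactly k events at the given rate."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))


# Print the empirical and Poisson probabilities side by side.
for k, pk in zip(x, p):
    print(k, round(pk, 3), round(poisson_pmf(k, lam), 3))
```

With real data, `trips_per_bike` would come from the `groupby('bike_id')` counts, and a large gap between the two printed columns would count against the fixed-rate Poisson assumption.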