Question

The zip folder has the assignment and data set.

qfile_636882049742876181_125278_1.zip qfile_636882049742876181_125278_2.docx

Ximi · Accepted Answer

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, we'll be working with some data from the Indego bikeshare company:
",
    "
",
    "- `./data/indego-trips-2017-q3.csv`
",
    "
",
    "Our goal is to look at a particular numeric aspect:
",
    "
",
    "- how often bikes get used (and worn out).
",
    "
",
    "The entire data set covers one quarter of 2017. So all of the bikes are observed over the same amount of time, right? If so, and if each bike gets rented randomly at a fixed rate, $\lambda$, then the distribution of bike usage probabilities:
",
    "
",
    "$$P(\text{a bike gets rented }\:x\:\text{ times in a quarter})$$
",
    "
",
    "will be a Poisson distribution! Let's investigate to see if we can support this possibility."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__C1.__ _(2 pts)_ To get started, import pandas and load the data as usual. Print the spreadsheet's head so that the data's structure is close at hand."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "     trip_id  duration           start_time             end_time  \
",
       "0  144361832        12  2017-07-01 00:04:00  2017-07-01 00:16:00   
",
       "1  144361829        31  2017-07-01 00:06:00  2017-07-01 00:37:00   
",
       "2  144361830        15  2017-07-01 00:06:00  2017-07-01 00:21:00   
",
       "3  144361831        15  2017-07-01 00:06:00  2017-07-01 00:21:00   
",
       "4  144361828        30  2017-07-01 00:07:00  2017-07-01 00:37:00   
",
       "
",
       "   start_station  start_lat  start_lon  end_station    end_lat    end_lon  \
",
       "0           3160  39.956619 -75.198624         3163  39.949741 -75.180969   
",
       "1           3046  39.950119 -75.144722         3101  39.942951 -75.159554   
",
       "2           3006  39.952202 -75.203110         3101  39.942951 -75.159554   
",
       "3           3006  39.952202 -75.203110         3101  39.942951 -75.159554   
",
       "4           3046  39.950119 -75.144722         3101  39.942951 -75.159554   
",
       "
",
       "   bike_id  plan_duration trip_route_category passholder_type  
",
       "0    11883             30             One Way        Indego30  
",
       "1     5394              0             One Way         Walk-up  
",
       "2     3331             30             One Way        Indego30  
",
       "3     3515             30             One Way        Indego30  
",
       "4    11913              0             One Way         Walk-up  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# code here
",
    "import pandas as pd
",
    "data = pd.read_csv('data/indego-trips-2017-q3.csv')
",
    "data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(276785, 14)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__C2.__ _(5 pts)_ Now, let's start things out by counting the number of trips that each bike has in total, using pandas `df.groupby()` to group the trips, and a counter, `NumBikes`, to store the number of bikes, $n$, rented $x$ times in the quarter, $n(x)$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# code here
",
    "from collections import Counter
",
    "grouped = data.groupby('bike_id').agg({"trip_id": "count"})
",
    "NumBikes = Counter()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Tally how many bikes share each trip total, so NumBikes[x] = n(x)
",
    "for trips in grouped['trip_id'].tolist():
",
    "    NumBikes[trips] += 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__C3.__ _(5 pts)_ Now that we've got our bikes counted up, let's compute the empirical probabilities:
",
    "
",
    "$$P(x) = P(\text{a bike is rented }\:x\:\text{ times in a quarter}) = 
",
    "\frac{n(x)}{\sum n(x)}.$$
",
    "
",
    "We already have $n(x)$ in our `Counter()` from __C2__, so let's start by turning its keys and values into numpy arrays (vectors), `x` and `n`, respectively. After this is done, we can make the probabilities, `p`, from a scalar multiple of `n`: divide it by its sum."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [],
   "source": [
    "# code here
",
    "import numpy as np
",
    "# x: distinct trip totals; n: number of bikes with each total, n(x).
",
    "# These match the keys and values of a Counter of trips per bike, sorted by x.
",
    "x, n = np.unique(grouped['trip_id'].values, return_counts=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np
",
    "p = n / n.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__C4.__ _(2 pts)_ Now it's time to find the average number of times a bike gets rented in a quarter. We'll call this quantity $\lambda$. So far, we've talked about averages of data, e.g., the arithmetic mean of $x$:
",
    "
",
    "$$\overline{x} = \frac{1}{n}\sum_{i=1}^n x_i$$
",
    "
",
    "But what we're now interested in is the average—center—of our probability distribution, $P(x)$. This quantity has a special name: the _expectation of $x$_, which is computed as:
",
    "
",
    "$$E[x] = \sum_{i=1}^n x_i P(x_i)$$
",
    "
",
    "This is actually a generalization of arithmetic mean above, if you view the arithmetic mean as utilizing a _uniform_ probability distribution, having equal probability ($1/n$) for each value, $x_i$. Here's the nice part for us: looking at the equation for $E[x]$, we simply have an inner product between $P(x)$ and $x$.
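
The steps from C2 through C4 can be sketched end to end on a tiny example. This is only an illustration under stated assumptions: the list `trip_bike_ids` is made-up toy data (not the Indego file), and the names `trips_per_bike`, `num_bikes`, `lam`, and `poisson_pmf` are hypothetical helpers chosen for this sketch. It uses only the standard library; with numpy arrays `x` and `p`, the expectation step would be the inner product `np.dot(x, p)`.

```python
import math
from collections import Counter

# Hypothetical toy data (made up, not the Indego file): one bike_id per trip.
trip_bike_ids = [1, 1, 1, 2, 3, 3, 3, 4, 4]

# Trips per bike: bike_id -> number of trips (the groupby/count step in C2).
trips_per_bike = Counter(trip_bike_ids)

# n(x): trip total x -> number of bikes with exactly x trips.
num_bikes = Counter(trips_per_bike.values())

# Empirical probabilities P(x) = n(x) / sum of n(x), as in C3.
total_bikes = sum(num_bikes.values())
p = {x: n / total_bikes for x, n in num_bikes.items()}

# Expectation E[x] = sum of x * P(x): the inner product of x and P(x), as in C4.
lam = sum(x * px for x, px in p.items())


def poisson_pmf(k, rate):
    """Poisson probability of exactly k events at the given mean rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)


# Print empirical and Poisson probabilities side by side for each observed x.
for x in sorted(p):
    print(x, p[x], poisson_pmf(x, lam))
```

Because every bike contributes exactly one trip total, the expectation computed this way equals the plain arithmetic mean of trips per bike, which is a quick sanity check on `p`.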

