The assignment is in the .ipynb file I have attached:
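The notebook below asks for session-level features and an Analytics Base Table (ABT) with one row per click session. Below is a minimal pandas sketch of that idea, not the instructor's method. It assumes the `click` columns shown by `click.head()` (`SessionID`, `TimeStamp`, `ItemID`, `Category`) and that `buy` has at least a `SessionID` column. The helper name `build_session_abt` and every feature in it are hypothetical illustrations. Treating `Category == "S"` as a special-offer flag is also an assumption about the RecSys 2015 data.

```python
import pandas as pd


def build_session_abt(click: pd.DataFrame, buy: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: aggregate click events into one row per SessionID.

    Assumes `click` has SessionID, TimeStamp, ItemID and Category columns
    (as shown by click.head()) and `buy` has at least a SessionID column.
    """
    clicks = click.copy()
    # Parse the ISO timestamps; utc=True keeps the +00:00 offset consistent.
    clicks["TimeStamp"] = pd.to_datetime(clicks["TimeStamp"], utc=True)
    # Category mixes codes like "S" and digits, so compare it as text.
    clicks["Category"] = clicks["Category"].astype(str)
    # Assumption: "S" marks a special-offer item in the RecSys 2015 data.
    clicks["is_special"] = clicks["Category"].eq("S")

    # Named aggregation: one output row per SessionID.
    abt = clicks.groupby("SessionID").agg(
        n_clicks=("ItemID", "count"),
        n_unique_items=("ItemID", "nunique"),
        n_unique_categories=("Category", "nunique"),
        has_special_offer=("is_special", "max"),
        first_click=("TimeStamp", "min"),
        last_click=("TimeStamp", "max"),
    )
    # Session length in seconds (0 for single-click sessions).
    abt["duration_sec"] = (abt["last_click"] - abt["first_click"]).dt.total_seconds()
    # Share of clicks that revisit an item already clicked in the session.
    abt["repeat_click_ratio"] = 1 - abt["n_unique_items"] / abt["n_clicks"]
    # When the session started (Monday=0).
    abt["start_dow"] = abt["first_click"].dt.dayofweek
    abt["start_hour"] = abt["first_click"].dt.hour
    abt["has_special_offer"] = abt["has_special_offer"].astype(int)
    # Task 1 label: 1 if the session appears in the buy file, else 0.
    abt["buy"] = abt.index.isin(buy["SessionID"].unique()).astype(int)
    return abt.drop(columns=["first_click", "last_click"]).reset_index()


# Tiny made-up frames, for illustration only (not the homework data).
toy_click = pd.DataFrame({
    "SessionID": [1, 1, 1, 2],
    "TimeStamp": ["2014-09-01 10:00:00+00:00", "2014-09-01 10:00:30+00:00",
                  "2014-09-01 10:01:00+00:00", "2014-09-02 18:00:00+00:00"],
    "ItemID": [100, 200, 100, 300],
    "Category": ["S", "0", "S", "3"],
})
toy_buy = pd.DataFrame({"SessionID": [1]})
abt = build_session_abt(toy_click, toy_buy)
print(abt)
```

On the real files, `build_session_abt(click, buy)` returns one row per unique `SessionID` in `click`, which matches the ABT requirement. Named aggregation keeps each feature vectorized rather than looping over sessions. Additional features can be added as further `agg` entries or derived columns.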
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Homework 5\n", "\n", "In this week's class, we went through the RecSys Challenge 2015: \\\n", "https://2015.recsyschallenge.com/challenge.html\\\n", "For this homework, we will work on Task 1. So, when you create features, you will need to think at the session level, not the item level.\n", "\n", "The click and buy datasets are uploaded to iCollege. These 2 files are sampled down to ~50k buy and ~50k non-buy sessions to simplify your homework.\n", "\n", "In this homework, please do feature engineering to create new features from the click and buy datasets. You can use the ideas I provided in class, but I encourage you to be creative and think on your own as well.\n", "\n", "Each feature you create is worth 10 points, and the points from feature engineering are capped at 80. But do not limit yourself to 8 features: the more features you create, the more they will help you with machine learning modeling in your next homework (Homework 6).\n", "\n", "In the end, you will need to create the Analytics Base Table (ABT), which is worth 20 points. In the ABT, each row should represent a unique click session, not a session-item pair. This is different from my code on GitHub because we are only doing Task 1 for this homework." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# loading data\n", "import pandas as pd\n", "\n", "click = pd.read_csv('click_sep.csv', low_memory=False)\n", "\n", "buy = pd.read_csv('buy_sep.csv', low_memory=False)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " SessionID TimeStamp ItemID Category\n", "0 9293568 2014-09-01 18:07:00.855000+00:00 214853225 S\n", "1 9293653 2014-09-01 10:38:47.087000+00:00 214834871 S\n", "2 9293653 2014-09-01 10:39:49.115000+00:00 214849327 S\n", "3 9293653 2014-09-01 10:40:31.736000+00:00 214828970 S\n", "4 9293653 2014-09-01 10:41:01.640000+00:00 214849327 S" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "click.head()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "unique click session\n", "99998\n" ] } ], "source": [ "print('unique click session')\n", "print(len(click.SessionID.unique()))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "