Initial_Project_Directions: Course Project, Your Only Assignment, Due Last Day of Class

The assignment is to analyze any set of data.


Course Project: Your Only Assignment
Due Last Day of Class

Why a Project
• A project helps build a portfolio.
• It forces you to do analysis (usually) and communicate your results.
• Far more interesting to grade than an exam.

“It doesn’t matter how great your analysis is unless you can explain it to others: you need to communicate your results.”
R for Data Science, Hadley Wickham & Garrett Grolemund

General Project Guidelines
A) Find a Data Source and Analyze It
• You can search Kaggle (or other dataset repositories)
• Work*
• Make your own dataset.
• Use an API (like the Twitter API) to gather data (not easy)
• Web Scraping (hard)
• Improve on Previous Analysis Projects
B) Other Python-related options (pick any)
• Code an ML Algorithm from scratch
• Improve a Python project from another class (doesn’t have to be analysis related)
• Translate your old code to Python

General Project Guidelines
• Choose option A or B.

Grading
• Because data analytics is a bit subjective and people can choose between different project options, anyone who turns in a project will get an A (95%) in this class.
• Students may get up to an A+ (100%) if the project is impressive.
• I will still provide feedback and areas for improvement on your project.

General Project Advice: Option A
• The following couple of slides are just advice.

Option A: Task 1
• Make a problem statement
• After picking your dataset, it is important to figure out what problem you are trying to solve.

Option A: Task 2
• Identify who may use your result
• In other words, figure out the potential usefulness of your analysis.

Option A: Task 3
• Make some preliminary goals for your project (what may come out of your work)

Option A: Task 4
• Think of some success metrics for your analysis
• For machine learning tasks, accuracy can help
• Hypothesis testing
• How do we know it works/improves?

Option A: Task 5
• Mention any uncertainty/risks that may be a challenge to completing your project.
• Example: For machine learning tasks, it helps to have more than 250 rows.
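Option A, Task 4 suggests accuracy as a success metric and asks how we know a model works or improves. One common sanity check, sketched here under our own assumptions rather than taken from the course, is to compare a model's accuracy with a majority-class baseline that always predicts the most frequent label. The labels and predictions below are hypothetical, and `accuracy` and `majority_baseline_accuracy` are illustrative helper names, not a library API.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_baseline_accuracy(y_true):
    """Accuracy of a trivial model that always predicts the most common label."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Hypothetical labels and model predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1, 1, 1]

model_acc = accuracy(y_true, y_pred)
baseline_acc = majority_baseline_accuracy(y_true)
print(f"model accuracy:    {model_acc:.2f}")
print(f"baseline accuracy: {baseline_acc:.2f}")
print("improves on baseline:", model_acc > baseline_acc)
```

With these toy values the model scores 0.80 against a baseline of 0.70. A model that cannot beat the baseline has likely learned little beyond the class balance, even if its raw accuracy looks high.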
Option B
B) Python-related options (pick any)
• Code an ML Algorithm from scratch
• Improve a Python project from another class
• Translate your code to Python

Option B
Since this is incredibly open ended, you will have to figure out what you want to show.

Data Analytics using Python Syllabus
Course Number: CSE-41204

Instructor Information
Name: Michael Galarnyk
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/michaelgalarnyk/

Communication Policy
You may contact me by email. It usually helps to contact me a couple of days before your project is due, as most students tend to ask around that time.

Course Information
Course Description (Goals and Objectives)
In this course, you will learn the rich set of tools, libraries, and packages that comprise the highly popular and practical Python data analysis ecosystem. This course is primarily taught via screen-sharing programming videos. Topics range from basic Python syntax all the way to more advanced topics like supervised and unsupervised machine learning techniques.

Key Topics
• Installing Python/Jupyter/IPython on Windows and Mac
• Python Basics (variables, strings, simple math, conditional logic, for loops, lists, tuples, dictionaries, etc.)
• Using the Pandas library to manipulate data (filtering and sorting data, combining files, GroupBy, etc.)
• Plotting data in Python using Matplotlib and Seaborn
• Logistic Regression using Scikit-Learn
• Classification and Regression Metrics
• Decision Trees using Scikit-Learn
• Random Forests (Scikit-Learn)
• Clustering Algorithms (K-Means, Hierarchical Clustering)
• Dimensionality Reduction (Principal Component Analysis)

Course Materials and Textbooks
Suggested Texts: None

Student Learning Outcomes
By the end of this course, students will be able to:
a) Interpret trends in data
b) Produce a project that they can use as part of their data analytics/science portfolio.
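Option B's first suggestion is to code an ML algorithm from scratch. As one hedged illustration, here is a minimal K-Means (one of the clustering algorithms listed under Key Topics) using Lloyd's algorithm in plain Python. This is our own sketch, not course material: the helper names `assign` and `kmeans`, the random initialisation from input points, and the toy 2-D points are all assumptions chosen for clarity, and a real project would also handle multiple restarts and feature scaling.

```python
import math
import random


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster points (tuples of floats) into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct input points
    for _ in range(n_iter):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    # Final labels always correspond to the returned centroids.
    return centroids, assign(points, centroids)


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, labels = kmeans(points, k=2)
print("centroids:", centroids)
print("labels:   ", labels)
```

Re-running the assignment step after the loop is a deliberate choice: if the iteration cap is hit before convergence, the returned labels still match the returned centroids.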
Course Schedule
While a lot of the students in this class know the basics, reviewing the basics is important even for experienced Python programmers. That is why the first two weeks are dedicated to the basics, which will be reviewed continuously throughout the remainder of the course during more advanced topics.

Session 1: Intro + Setup + Basics (strings, lists, tuples, etc.)
Session 2: Tuples, dictionaries, sets, functions
Session 3: Pandas Part 1
Session 4: Pandas Part 2
Session 5: Matplotlib + Logistic Regression
Session 6: Decision Trees
Session 7: Decision Trees + Random Forests
Session 8: Unsupervised Learning (KMeans + dimensionality reduction)
Session 9: Topics of Interest and How to Learn Them* (Final Project Due)
* Lecture about the topics we didn’t cover in this class and how to learn them.

Grading and Assignment Information
Letter grades are based on the UC San Diego Extension Grading Scale. Your final course grade is based on the percentage of points you have earned.

Passing Grades
A+ 100%
A 90-99%
A- 88-89%
B+ 86-87%
B 83-85%
B- 80-82%
C+ 76-80%
C 71-75%
C- 20-70%

Weighted Grading Criteria
UC San Diego Extension does NOT have a requirement about how instructors weight their grading criteria. I have decided to make nearly 100% of your grade the project. Details and rationale for this are explained in the assignments section of Blackboard.
Assignments (Class Project): 100%
TOTAL: 100%

Grading Policies
This course can be taken as part of the Python Programming certificate. For the class to count towards your certificate, it must be taken for a letter grade or as pass/no pass. Classes taken as NFC cannot count towards a certificate. You can change your grading option any time BEFORE the last day of class through My Extension.

Late Policy: The final project is due on the date specified in the course schedule. An assignment is considered late if it is posted or sent after the due date/time.
Late assignments will be accepted at the discretion of the instructor and cannot be accepted more than 1 week late. A couple of hours or a day late is typically okay. I don’t take off points for late assignments.

Assignments
Due to the nature of this course (a final project), any type of submission you see fit is typically acceptable. I normally see some variation of:
a) .ipynb file
b) .ipynb file + PowerPoint file
c) .ipynb file + report (I highly discourage writing a report, as a blog post is better for an online presence.)
d) .ipynb file + blog post
e) .py file
f) .py file + PowerPoint file
The reason for allowing different types of projects is that I want this class to be a way for students to improve themselves as they see fit. Everyone coming into the class has different goals, so I let students show me, in whatever form they choose, that they learned something or improved on previous knowledge in this class.

Discussion Board
Feel free to ask questions on the board or on the unlisted YouTube videos for the course.

Quizzes & Tests
No quizzes or tests.

UC San Diego Extension Policies and Resources
Academic Policies and Procedures
Please refer to UC San Diego Extension’s website (Student Resources tab) for specific details about academic policies and procedures: Student Resources.

MyExtension
Your MyExtension account is your student records portal. Log into MyExtension (https://myextension.ucsd.edu/) to enroll in a course, drop a course, request verification of enrollment, request official transcripts, and more.

Campus Emergencies
In the event of an emergency, information will be posted at UC San Diego Extension (http://extension.ucsd.edu/). Extension students must access the website to find out the status of the emergency situation. Email and/or phone lines may not be accessible. Information will be updated online as the situation progresses, and an ALL CLEAR will be posted once the situation is resolved.
Code of Conduct
All participants in a course at UC San Diego Extension are bound by the University of California Code of Conduct found at Student Conduct Code.

Academic Integrity Policy
The University is an institution of learning, research, and scholarship predicated on the existence of an environment of honesty and integrity. As members of the academic community, faculty, students, and administrative officials share responsibility for maintaining this environment. It is essential that all members of the academic community subscribe to the ideal of academic honesty and integrity and accept individual responsibility for their work. Academic dishonesty is unacceptable and will not be tolerated at the University of California. Cheating, forgery, dishonest conduct, plagiarism, and collusion in dishonest activities erode the University's educational, research, and social roles. Students who knowingly or intentionally conduct or help another student perform dishonest conduct, acts of cheating, or plagiarism will be subject to disciplinary action at the discretion of UC San Diego Extension. Please refer to the UC San Diego Extension website to view this policy: Student Conduct Policy.

Access and Accommodations
At UC San Diego Extension, we strive to make learning experiences as accessible as possible. If you anticipate or experience physical or academic barriers based on disability, we encourage you to contact the Extension Disability Coordinator to apply for reasonable accommodations. Visit our website: Services for Students with Disabilities. Please note that it is your responsibility to initiate contact with the Disability Coordinator.
Phone: 858-822-1366
Email: [email protected]
Answered Same Day, Nov 02, 2021


Ximi answered on Nov 14 2021
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"df = pd.read_csv('new-york-city-airbnb-open-data/AB_NYC_2019.csv')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(48895, 16)"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index([u'id', u'name', u'host_id', u'host_name', u'neighbourhood_group',\n",
" u'neighbourhood', u'latitude', u'longitude', u'room_type', u'price',\n",
" u'minimum_nights', u'number_of_reviews', u'last_review',\n",
" u'reviews_per_month', u'calculated_host_listings_count',\n",
" u'availability_365'],\n",
" dtype='object')"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" id name host_id \\\n",
"0 2539 Clean & quiet apt home by the park 2787 \n",
"1 2595 Skylit Midtown Castle 2845 \n",
"2 3647 THE VILLAGE OF HARLEM....NEW YORK ! 4632 \n",
"3 3831 Cozy Entire Floor of Brownstone 4869 \n",
"4 5022 Entire Apt: Spacious Studio/Loft by central park 7192 \n",
"\n",
" host_name neighbourhood_group neighbourhood latitude longitude \\\n",
"0 John Brooklyn Kensington 40.64749 -73.97237 \n",
"1 Jennifer Manhattan Midtown 40.75362 -73.98377 \n",
"2 Elisabeth Manhattan Harlem 40.80902 -73.94190 \n",
"3 LisaRoxanne Brooklyn Clinton Hill 40.68514 -73.95976 \n",
"4 Laura Manhattan East Harlem 40.79851 -73.94399 \n",
"\n",
" room_type price minimum_nights number_of_reviews last_review \\\n",
"0 Private room 149 1 9 2018-10-19 \n",
"1 Entire home/apt 225 1 45 2019-05-21 \n",
"2 Private room 150 3 0 NaN \n",
"3 Entire home/apt 89 1 270 2019-07-05 \n",
"4 Entire home/apt 80 10 9 2018-11-19 \n",
"\n",
" reviews_per_month calculated_host_listings_count availability_365 \n",
"0 0.21 6 365 \n",
"1 0.38 2 355 \n",
"2 NaN 1 365 \n",
"3 4.64 1 194 \n",
"4 0.10 1 0 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" id host_id latitude longitude price \\\n",
"count 4.889500e+04 4.889500e+04 48895.000000 48895.000000 48895.000000 \n",
"mean 1.901714e+07 6.762001e+07 40.728949 -73.952170 152.720687 \n",
"std 1.098311e+07 7.861097e+07 0.054530 0.046157 240.154170 \n",
"min 2.539000e+03 2.438000e+03 40.499790 -74.244420 0.000000 \n",
"25% 9.471945e+06 7.822033e+06 40.690100 -73.983070 69.000000 \n",
"50% 1.967728e+07 3.079382e+07 40.723070 -73.955680 106.000000 \n",
"75% 2.915218e+07 1.074344e+08 40.763115 -73.936275 175.000000 \n",
"max 3.648724e+07 2.743213e+08 40.913060 -73.712990 10000.000000 \n",
"\n",
" minimum_nights number_of_reviews reviews_per_month \\\n",
"count 48895.000000 48895.000000 38843.000000 \n",
"mean 7.029962 23.274466 1.373221 \n",
"std 20.510550 44.550582 1.680442 \n",
"min 1.000000 0.000000 0.010000 \n",
"25% 1.000000 1.000000 0.190000 \n",
"50% 3.000000 5.000000 0.720000 \n",
"75% 5.000000 24.000000 2.020000 \n",
"max 1250.000000 629.000000 58.500000 \n",
"\n",
" calculated_host_listings_count availability_365 \n",
"count 48895.000000 48895.000000 \n",
"mean 7.143982 112.781327 \n",
"std 32.952519 131.622289 \n",
"min 1.000000 0.000000 \n",
"25% 1.000000 0.000000 \n",
"50% 1.000000 45.000000 \n",
"75% 2.000000 227.000000 \n",
"max 327.000000 365.000000 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"import seaborn as sns\n",
"sns.set()\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(16, 6))\n",
"sns.boxplot(x=\"neighbourhood_group\", y=\"price\", data=df[df['price'] < 1000])"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(16, 6))\n",
"sns.boxplot(x=\"room_type\", y=\"minimum_nights\", data=df[df['minimum_nights'] > 100])"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(16, 6))\n",
"sns.boxplot(x=\"neighbourhood_group\", y=\"minimum_nights\", data=df[df['minimum_nights'] > 100])"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(16, 6))\n",
"sns.scatterplot(x=\"neighbourhood_group\", y=\"price\", data=df.sort_values('price', ascending=False).head(20))"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"('Total Neighbourhoods: ', 221)\n"
]
}
],
"source": [
"# 2\n",
"import numpy as np\n",
"print (\"Total Neighbourhoods: \", len(np.unique(df.neighbourhood)))"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Mean listing price per neighbourhood, most expensive first\n",
"df_top_prices_by_neighbourhood = df.groupby('neighbourhood').agg({'price': 'mean'}).sort_values('price', ascending=False).reset_index()"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(25, 10))\n",
"sns.barplot(x=\"neighbourhood\", y=\"price\", data=df_top_prices_by_neighbourhood.head(20))"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" id name host_id \\\n",
"0 2539 Clean & quiet apt home by the park 2787 \n",
"1 2595 Skylit Midtown Castle 2845 \n",
"2 3647 THE VILLAGE OF HARLEM....NEW YORK ! 4632 \n",
"3 3831 Cozy Entire Floor of Brownstone 4869 \n",
"4 5022 Entire Apt: Spacious Studio/Loft by central park 7192 \n",
"\n",
" host_name neighbourhood_group neighbourhood latitude longitude \\\n",
"0 John Brooklyn Kensington 40.64749 -73.97237 \n",
"1 Jennifer Manhattan Midtown 40.75362 -73.98377 \n",
"2 Elisabeth Manhattan Harlem 40.80902 -73.94190 \n",
"3 LisaRoxanne Brooklyn Clinton Hill 40.68514 -73.95976 \n",
"4 Laura Manhattan East Harlem 40.79851 -73.94399 \n",
"\n",
" room_type price minimum_nights number_of_reviews last_review \\\n",
"0 Private room 149 1 9 2018-10-19 \n",
"1 Entire home/apt 225 1 45 2019-05-21 \n",
"2 Private room 150 3 0 NaN \n",
"3 Entire home/apt 89 1 270 2019-07-05 \n",
"4 Entire home/apt 80 10 9 2018-11-19 \n",
"\n",
" reviews_per_month calculated_host_listings_count availability_365 \n",
"0 0.21 6 365 \n",
"1 0.38 2 355 \n",
"2 NaN 1 365 \n",
"3 4.64 1 194 \n",
"4 0.10 1 0 "
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 171,
"metadata": {},
"outputs": [],
"source": [
"# Total reviews per host; host_id stays as the index\n",
"df_host_popularity = df.groupby('host_id').agg({'number_of_reviews': 'sum'})"
]
},
{
"cell_type": "code",
"execution_count": 172,
"metadata": {},
"outputs": [],
"source": [
"total_reviews = df.number_of_reviews.sum()"
]
},
{
"cell_type": "code",
"execution_count": 173,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1138005"
]
},
"execution_count": 173,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"total_reviews"
]
},
{
"cell_type": "code",
"execution_count": 174,
"metadata": {},
"outputs": [],
"source": [
"# Each host's share of all reviews, as a percentage (vectorized, no apply needed)\n",
"df_host_popularity['popularity_index'] = df_host_popularity['number_of_reviews'] / float(total_reviews) * 100"
]
},
{
"cell_type": "code",
"execution_count": 175,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" number_of_reviews popularity_index\n",
"host_id \n",
"37312959 2273 0.199736\n",
"344035 2205 0.193760\n",
"26432133 2017 0.177240\n",
"35524316 1971 0.173198\n",
"40176101 1818 0.159753\n",
"4734398 1798 0.157996\n",
"16677326 1355 0.119068\n",
"6885157 1346 0.118277\n",
"219517861 1281 0.112565\n",
"23591164 1269 0.111511"
]
},
"execution_count": 175,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_host_popularity.sort_values('popularity_index', ascending=False).head(10)"
]
},
{
"cell_type": "code",
"execution_count": 176,
"metadata": {},
"outputs": [],
"source": [
"df_host_popularity = df_host_popularity.reset_index()\n",
"del df_host_popularity['number_of_reviews']"
]
},
{
"cell_type": "code",
"execution_count": 177,
"metadata": {},
"outputs": [],
"source": [
"# Attach each host's popularity_index to their listings, joining explicitly on host_id\n",
"df = pd.merge(df, df_host_popularity, on='host_id')"
]
},
{
"cell_type": "code",
"execution_count": 178,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(48895, 17)"
]
},
"execution_count": 178,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "code",
"execution_count": 179,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"40233 Theater District\n",
"40284 Financial District\n",
"40165 Theater District\n",
"40169 Chelsea\n",
"40171 Theater District\n",
"40281 Financial District\n",
"40175 Theater District\n",
"40176 Theater District\n",
"40177 Theater District\n",
"40178 Theater District\n",
"40179 Theater District\n",
"40182 Theater District\n",
"40183 Theater District\n",
"40197 Upper East Side\n",
"40198 Upper East Side\n",
"40220 Financial District\n",
"40221 Financial District\n",
"40280 Financial District\n",
"40223 Financial District\n",
"40283 Financial District\n",
"Name: neighbourhood, dtype: object"
]
},
"execution_count": 179,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 5\n",
"df[df.minimum_nights > 10].sort_values('popularity_index', ascending=False).head(20)['neighbourhood']"
]
},
{
"cell_type": "code",
"execution_count": 108,
"metadata": {},
"outputs": [],
"source": [
"# 6\n",
"df_popular_regions = df.groupby('neighbourhood').agg({'popularity_index': 'sum'}).reset_index()"
]
},
{
"cell_type": "code",
"execution_count": 112,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 112,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png":...
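The notebook above derives a per-host "popularity_index" (a host's share of all reviews, in percent) and merges it back onto the listings. The sketch below shows the same idea in vectorized pandas on a small hypothetical DataFrame; it is not the notebook's own code. The toy values are invented, and `how="left"` is our choice: it keeps every listing even if a host were missing from the per-host table, whereas `pd.merge`'s default inner join gives the same rows here because every `host_id` appears in both frames. Naming `on="host_id"` explicitly prevents pandas from silently joining on any other shared column such as `number_of_reviews`.

```python
import pandas as pd

# Hypothetical toy listings: three hosts, five listings.
df = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 3],
    "neighbourhood": ["Chelsea", "Chelsea", "Harlem", "Midtown", "Harlem"],
    "number_of_reviews": [10, 30, 20, 25, 15],
})

# Sum reviews per host, keeping host_id as a regular column.
host_reviews = df.groupby("host_id", as_index=False)["number_of_reviews"].sum()

# Each host's share of all reviews, as a percentage (vectorized division).
total_reviews = df["number_of_reviews"].sum()
host_reviews["popularity_index"] = host_reviews["number_of_reviews"] / total_reviews * 100

# Attach the index to every listing, joining on host_id only.
df = df.merge(host_reviews[["host_id", "popularity_index"]], on="host_id", how="left")
print(df)
```

Here the three hosts hold 40%, 20%, and 40% of the 100 toy reviews, and each listing inherits its host's value.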