

Create Optimal Hotel Recommendations




Data set: found at Kaggle: Expedia Hotel Recommendations & train.csv




Requirements:


· Write a one-page summary of the approach you took to arrive at your prediction methods.


· Use a combination of R and Python in your coding answer, in an R Markdown file.




Task:


All online travel agencies are scrambling to meet the AI-driven personalization standard set by Amazon and Netflix. In addition, online travel has become a highly competitive space where brands try to capture our attention (and wallet) by recommending, comparing, matching, and sharing. In this project, we aim to create optimal hotel recommendations for Expedia users who are searching for a hotel to book. You need to predict which "hotel cluster" a user is likely to book, given his (or her) search details. In doing so, you should demonstrate your ability to use four different algorithms of your choice. The data set can be found at Kaggle: Expedia Hotel Recommendations. To get started, I suggest using train.csv, which contains logs of user behavior, and destinations.csv, which contains features derived from hotel reviews made by users.




Answered Same Day Oct 09, 2021


Kshitij answered on Oct 12 2021
OPTIMAL HOTEL RECOMMENDATION
The task was to predict the hotel cluster a user would book, based on their search details and booking history. The data included users' search trends, their interactions with search results, and their booking history. In addition, destinations.csv provided a set of features for each search destination, derived from hotel reviews.
The data provided for the problem had over 9 million rows with 24 columns in train.csv, and over 62 thousand rows with 150 columns in destinations.csv. Because the data set was this large, the main consideration when choosing an algorithm was handling its size. Exploratory analysis also showed a roughly uniform distribution of bookings across hotel clusters, so dimensionality reduction became the main focus.
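The write-up does not state which columns PCA was applied to. A common choice, shown here only as a sketch under that assumption, is to compress the latent columns of destinations.csv (assumed to be `d1` to `d149`, keyed by `srch_destination_id`) into a few principal components with scikit-learn. The function name `compress_destinations` and `n_components=3` are hypothetical, illustrative choices, and random data stands in for the real file.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def compress_destinations(destinations: pd.DataFrame,
                          n_components: int = 3) -> tuple[pd.DataFrame, np.ndarray]:
    """Hypothetical helper: project destination features onto a few principal components.

    Assumes every column except `srch_destination_id` is a numeric feature.
    n_components=3 is an illustrative value, not a tuned one.
    """
    feature_cols = [c for c in destinations.columns if c != "srch_destination_id"]
    pca = PCA(n_components=n_components)
    reduced = pca.fit_transform(destinations[feature_cols].to_numpy())
    out = pd.DataFrame(reduced, columns=[f"pca{i + 1}" for i in range(n_components)])
    # Keep the join key so the compressed features can be merged back onto train.csv.
    out.insert(0, "srch_destination_id", destinations["srch_destination_id"].to_numpy())
    return out, pca.explained_variance_ratio_


# Synthetic stand-in for destinations.csv: 50 destinations x 149 latent features.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(50, 149)), columns=[f"d{i}" for i in range(1, 150)])
demo.insert(0, "srch_destination_id", np.arange(50))

reduced, ratio = compress_destinations(demo)
print(reduced.shape)
print(ratio)
```

Inspecting `explained_variance_ratio_` is one way to judge how many components are worth keeping before merging them onto the search logs.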
1. Data Preprocessing: We assumed that users who made more than 20 bookings were travel agents and therefore did not represent the target population, so they were removed from the data. Rows that recorded only clicks rather than bookings were also outside our target population and were removed. Three columns contained dates; since the raw dates were not useful as they were, we extracted the month and year from each as additional features and dropped the original columns. After chi-square tests and PCA, we concluded that the current feature set, excluding the customer ID, was suitable as the final data.
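The preprocessing step above can be sketched in pandas as follows. This is a minimal sketch, assuming the Kaggle column names `user_id`, `is_booking`, `date_time`, `srch_ci` and `srch_co`, and reading the "click" rule as dropping rows with `is_booking == 0`. The function `preprocess` and the toy frame are hypothetical and only exercise the filtering logic.

```python
import pandas as pd

BOOKING_LIMIT = 20  # from the write-up: more than 20 bookings is treated as a travel agent
DATE_COLS = ["date_time", "srch_ci", "srch_co"]  # assumed names of the three date columns


def preprocess(train: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the preprocessing rules described above."""
    # Count bookings per user and drop users above the travel-agent threshold.
    bookings_per_user = train.loc[train["is_booking"] == 1].groupby("user_id").size()
    agents = bookings_per_user[bookings_per_user > BOOKING_LIMIT].index
    df = train[~train["user_id"].isin(agents)]

    # Keep only actual bookings; click-only rows are removed.
    df = df[df["is_booking"] == 1].copy()

    # Replace each date column with its month and year (unparseable values become NaN).
    for col in DATE_COLS:
        parsed = pd.to_datetime(df[col], errors="coerce")
        df[f"{col}_month"] = parsed.dt.month
        df[f"{col}_year"] = parsed.dt.year
    df = df.drop(columns=DATE_COLS)

    # The customer ID is excluded from the final feature set.
    return df.drop(columns=["user_id"])


# Toy data: user 1 has 21 bookings (treated as an agent); user 2 has one booking and one click.
demo = pd.DataFrame({
    "user_id": [1] * 22 + [2, 2],
    "is_booking": [1] * 21 + [0] + [1, 0],
    "date_time": ["2014-08-11 07:46:59"] * 24,
    "srch_ci": ["2014-08-27"] * 24,
    "srch_co": ["2014-08-31"] * 24,
    "hotel_cluster": [1] * 24,
})
clean = preprocess(demo)
print(clean)
```

Counting bookings before dropping clicks keeps the agent rule tied to bookings only, which is how the write-up phrases it.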
2. Data Splitting: After joining the destination data to the training data on their common key (srch_destination_id), we split the data into training and test sets in a 3:1 ratio.
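The merge and 3:1 split could look like the following sketch, assuming pandas and scikit-learn. `merge_and_split` is a hypothetical helper, and `test_size=0.25` reproduces the 3:1 ratio. A left join is used here as a design choice so that searches whose destination has no row in destinations.csv are kept (with missing feature values) rather than silently dropped.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def merge_and_split(train: pd.DataFrame, destinations: pd.DataFrame, seed: int = 42):
    """Hypothetical helper: join destination features, then split 3:1 into train/test."""
    # Left join keeps every search row, even if its destination has no features.
    merged = train.merge(destinations, on="srch_destination_id", how="left")
    X = merged.drop(columns=["hotel_cluster"])
    y = merged["hotel_cluster"]
    # test_size=0.25 gives a 75% / 25% (3:1) split.
    return train_test_split(X, y, test_size=0.25, random_state=seed)


# Synthetic stand-ins: 40 searches over 5 destinations, one compressed destination feature.
rng = np.random.default_rng(0)
train = pd.DataFrame({
    "srch_destination_id": rng.integers(0, 5, size=40),
    "hotel_cluster": rng.integers(0, 100, size=40),
})
destinations = pd.DataFrame({"srch_destination_id": range(5), "pca1": rng.normal(size=5)})

X_train, X_test, y_train, y_test = merge_and_split(train, destinations)
print(len(X_train), len(X_test))
```

Passing `stratify=y` to `train_test_split` is an option when every hotel cluster has enough rows to appear in both sets.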
3. Modeling: We tried four different modeling approaches for the task. Each model's accuracy was judged from the confusion matrix built on the test set. Since the data still had a few anomalies, no model performed exceptionally well. The task is a multiclass classification problem. Through initial data preprocessing we found that...
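The four algorithms are not named in the visible part of the answer, so the selection below (logistic regression, random forest, Gaussian naive Bayes and k-nearest neighbours) is purely an assumption. The sketch fits each model with scikit-learn and reports accuracy together with the confusion matrix on the held-out set. Synthetic data from `make_classification` with 5 classes stands in for the merged Expedia features and their 100 hotel clusters; `evaluate` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical choice of four algorithms; not taken from the original answer.
MODELS = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "naive_bayes": GaussianNB(),
    "knn": KNeighborsClassifier(n_neighbors=5),
}


def evaluate(models, X_train, X_test, y_train, y_test):
    """Fit each model and collect test accuracy plus its confusion matrix."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "confusion": confusion_matrix(y_test, pred),
        }
    return results


# Synthetic multiclass data standing in for the merged features.
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

results = evaluate(MODELS, X_train, X_test, y_train, y_test)
for name, r in results.items():
    print(f"{name}: accuracy={r['accuracy']:.3f}")
```

Accuracy equals the trace of the confusion matrix divided by its total, so the two can be read together; the off-diagonal cells show which clusters a model confuses.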