Text Analytics and Natural Language Processing (NLP)
Business insight report
A3: Business Insight Report
Due Sunday by 11:59pm | Points 4 | Submitting a website url, a media recording, or a file upload | Available until Dec 7 at 11:59pm

For this assignment, you must find an area of interest, then collect text data (you can use the TwitterR library as a last resort) and analyze it in R using at least 2 frameworks covered in class. Here are a few examples of what I mean by "framework": N-grams, sentiment analysis, tf-idf, correlograms, classification with Bayes, LDA, etc. Your area of interest can be anything that you are passionate about (e.g. I would choose to compare Tweets from people who are kite-surfers, windsurfers, and surfers, to see what differences there are in those tweets and in their sentiment).

Submit 1 pdf file that has two parts:
1. A report detailing applied business insight (include any charts that support your business insight). Build a strong argument for a business decision based upon your findings. You should mention at least 2 frameworks, covered in class, that helped you gain the business insight. (750-1500 words)
2. In the appendix, please include your R code and R code output.

Guidelines:
- PDF submission; the word count should be between 750 and 1500 (this applies to the 1st part of your pdf file).
- You must follow HULT’s guidelines for citation and referencing outside resources.
- Build a strong argument for a business decision based upon your findings.

This assignment is worth 20% of your final grade.

This is an individual assignment! Do NOT work in teams or groups on this assignment. Your paper will be flagged for similarities to other papers and reported to the Academic Integrity Committee for review.

Your report will be graded using the attached grading rubric. Please read the rubric before submitting your work. The rubric includes the following requirements (more details in grading rubric):
1. Collect text or other unstructured data that aligns with your area of interest - you must perform the data collection and imputation yourself (you can use the TwitterR library if collecting data becomes impossible).
2. Submit a pdf file that has 2 parts (1. a business insight report of between 750 and 1500 words, 2. R code with R output).
3. Mention at least 2 frameworks covered in class - you should include charts - and explain how they impact your business decision (in the 1st part of your report).
4. Build a strong argument for a business decision based upon your findings.
5. Follow HULT’s guidelines for citation and referencing outside resources.

Business Insight Report Rubric

Criteria | Ratings | Pts

Text data or unstructured data collected by student (threshold: 4.0 pts)
- 4 pts: Student collected text data and/or unstructured data independently, for a selected topic/industry.
- 0 pts: Student did not collect text data or unstructured data.

Usage of topics covered in class, calculations, and charts (threshold: 2.0 pts)
- 4 pts: Includes all necessary evidence and/or calculations. The first part of the report has all the elements: at least 2 frameworks covered in class, charts, and explanations of the charts.
- 3 pts: Includes all necessary evidence and/or calculations. In the first part of the report, one of the following is missing: at least 2 frameworks covered in class, charts, explanations of the charts.
- 2 pts: In the first part of the report, two of the following are missing: at least 2 frameworks covered in class, charts, explanations of the charts.
- 1 pts: In the first part of the report, three of the following are missing: at least 2 frameworks covered in class, charts, explanations of the charts.
- 0 pts: Analysis is missing or entirely inadequate.
Report Content (threshold: 2.0 pts)
- 4 pts: Report has both parts: 1. Business insight report with supporting quantitative analysis, 2. R code. The first part has fewer than 1500 words.
- 2 pts: Report is missing one part: 1. Business insight report with supporting quantitative analysis, 2. R code. Or the first part has more than 1500 words.
- 0 pts: Report is missing both parts: 1. Business insight report with supporting analysis, 2. R code. Or one part is missing and the first part has more than 1500 words.

Total Points: 4

Criteria | Ratings | Pts

Presentation

Conclusion (4 pts)
- 4 pts: Conclusion is clear, compelling, and supported by quality research and analysis. Includes a review of key points.
- 3 pts: Conclusion is clear and supported by research and analysis. Includes a review of key points.
- 2 pts: Conclusion is not clear and only partially supports the analysis or key points.
- 1 pts: Conclusion is not clear and does not seem to support the analysis or key points.
- 0 pts: Conclusion is either missing or significantly lacking basic analysis.

Overall (4 pts)
- 4 pts: A
- 3 pts: B
- 2 pts: C
- 1 pts: D
- 0 pts: F

Last Export: Dec 3 at 12:22am

REQUIRED TEXTBOOK
Silge, J., & Robinson, D. (2017). Text Mining with R: A Tidy Approach. Sebastopol, CA: O'Reilly Media. (https://www.amazon.com/Text-Mining-R-Tidy-Approach/dp/1491981652/ref=sr_1_3?ie=UTF8&qid=1543620629&sr=8-3&keywords=Text+Mining+with+R)

Class 1: Intro and getting ready with code
Pre-class work and readings:
Read this website to understand what DPLYR does in R (https://dplyr.tidyverse.org/)
Read this to get a sense of what a pipe %>% is (https://r4ds.had.co.nz/pipes.html)
----------------------
Topic(s) to be Covered
Data Science in R - overview of data science process
Shiny - how can we use it to create addiction?
dplyr :: new coding technique: PIPING
----------------------
Mentimeter:
Menti 1: R class review (https://www.mentimeter.com/s/653e94a9c019c534bca4188844ee7a4f)
Menti 2: Piping (https://www.mentimeter.com/s/ea185df93f88f68772badb3c6da1b80b)

Class 2: Structuring the unstructured
Pre-class work and readings:
Required readings: Silge, J., & Robinson, D. (2017). Text Mining with R: A Tidy Approach, Chapter 1
List of libraries needed for the course - install prior to class
Print this file before the session: print - document for tokenizing.pdf
Pre-class video: Episode 1: Unstructured data (Nov 18 | 10 pts)
Pre-class video: Episode 2: Tokenizing data and creating DTM (Nov 18 | 10 pts)
Pre-class video: Episode 2.1: Tokenization and frequencies (Nov 18 | 10 pts)
Upload your R script: Pre-class video: Episode 2.1: Tokenization and frequencies (Nov 18 | 10 pts)
----------------------
IN-CLASS SIMULATION! Get ready for this simulation:
1. Have your (smart)phone fully charged.
2. Make sure you can connect your (smart)phone to your computer and transfer photos and videos from phone to computer (do this prior to class).
3. Bring scissors.
----------------------
Topic(s) to be Covered
Unstructured data :: types, volume, challenges, opportunities
Text data :: types, sources, how can the business digest this data
1st text mining code in R :: Tokenizing the BACON
Storing, Importing, Managing :: Text data and other unstructured data types
----------------------
Files:
usaecondata.csv
europeecondata.csv
netflix.csv
Twitter API.R
Downloading Netflix data from tidytuesdayR studnet template.R
2. Day 1 - Text- text formats files studnet template.R
2.1. Day 1 - rtweet - downloaidng rtweet data.R
3. Day1 - bacon tokenizing with proper ggplot - student template.R
4. Day1 - Netflix movies student template.R
5. Day1 - gutenberg project student template.R
5.1 Day 1 correlations for Netflix student template.R
6. Day1 - gutenberg and twitter student template.R
--------------
Mentimeter:
Menti 1 (https://www.mentimeter.com/s/e99e41fd1744ec2a326430f25513d4e0)
Menti 2 (https://www.mentimeter.com/s/27b5b4aa1cc1ee01bd6eb6b13425fc8d)
Menti 3 (https://www.mentimeter.com/s/f6f117c34d72ddce0f9da8627eeaab90)

Class 3: NLP (Natural Language Processing)
Pre-class work and readings:
Required readings: Silge, J., & Robinson, D. (2017). Text Mining with R: A Tidy Approach, Chapter 5
Watch this video about NLP: NLP explained by Google Research (https://www.youtube.com/watch?v=MNvT5JekDpg&feature=youtu.be)
----------------------
Topic(s) to be Covered
DTM
VCorpus
Mechanics behind NLP (sound waves)
----------------------
Files:
MBA.txt
MIB.txt
1. Day2 - App text file import and anlysis studnet template.R
2. Day2 - DTM with Netflix student template.R
3. Day2 - VCorpus with Reuters Articles student template.R
------------------------------------
Mentimeter:
Menti 1 (https://www.mentimeter.com/s/a2292c92e4e8fa22ffce84e8bf68ebc9)
-----------------
Code Solutions from class:
Downloading Netflix data from tidytuesdayR.R
4. Day1 - Netflix movies.R
5.1 Day 1 correlations for Netflix.R
2. Day 1 - Text- text formats files_xxx.R
2. Day2 - DTM with Netflix.R
3. Day2 - VCorpus with Reuters Articles.R
-------------------------------
Practice script ::
5. Day1 - gutenberg project.R

Class 4
MUST DO before session 4: Sign up for the free Otter.ai account, an NLP tool for Zoom calls (links below)
Sign up for Otter.ai (https://otter.ai/signup)
How to use Otter.ai with your online meeting (https://blog.otter.ai/how-to-transcribe-any-video-meetings-with-otter-ai/#:~:text=Transcribe%20my%20video%20meetings&text=1.,you%20and%20your%20computer's%20speaker.)
Upload your R script with code: Pre-class video: Episode 5: Sentiment lexicons (Nov 23 | 10 pts)
Pre-class video: Episode 5: Sentiment lexicons (Nov 23 | 10 pts)
----------------------
Pre-class work and readings:
Required readings: Silge, J., & Robinson, D. (2017). Text Mining with R: A Tidy Approach, Chapter 2
----------------------
Topic(s) to be Covered
NLP Simulation (teams)
Post simulation exercises and discussions
Sentiment analysis in text mining
----------------------
Files:
0.Day3 - adding your stop words to lexicon student template.R
2. Day3 - sentiment - lexicons student template.R
3. Day3 - Netflix sentiment analysis student template.R
---------------
Mentimeter:
Menti 1 (https://www.mentimeter.com/s/387f9173761b800d5a24634c0e22d6fa)
Menti 2 (https://www.mentimeter.com/s/5186198286ea4c48c59c0562ad735ccd)
Menti 3 (https://www.mentimeter.com/s/99101799787ce7a29b7fa0ece58449cc)