BUS5CA Customer Analytics and Social Media
Semester 2 2019
Assignment 1
Social Media Analysis for Understanding Customer Preferences and Sentiments

Release Date: 22nd August 2019
Due Date: 9th September 2019 @ 9:00am
Assignment Type: Individual
Weight: 30%
Format of Submission: A report (electronic form) and electronic submissions of project files (SAS project files and R scripts) in LMS site.

Learning Objective:

The learning objective of Assignment 1 is to further develop your understanding and skills on social media analytics via performing analysis on two case studies:

1. Case Study A: you will work as a social marketing analyst in a consulting company to uncover the impacts of online advertising and communication with customers. The aim of the study is to educate the marketing teams of their clients (in diverse industries) to market their products and/or services on social media to maximise customers’ involvement (positive interest and sharing). The company is interested in finding out the relationship between the keywords, shares, sentiments and whether there is a relationship in different topic categories such as entertainment, technology, business, etc. that are of interest to different clients in various industries.

2. Case Study B: you will be a data scientist working for a hotel review firm to develop a sentiment analytics engine for Twitter, which is used to predict consumers’ review sentiments. The aim is to develop both dictionary-based and machine learning-based sentiment analytics scripts using a number of R libraries and SAS Sentiment Analysis Studio (covered in the workshop activities on Week 4 and Week 5). You are required to use the developed engine to predict hotel reviewers’ sentiments and benchmark various algorithms and analytics tools.

Case Study A (15%)

Leveraging the power of content and social media marketing can help elevate the audience and customer base in a dramatic way. However, using social media for marketing without any previous experience or insight could be challenging. It is vital for a marketing team to understand social media marketing fundamentals. If a company publishes exciting, high-quality content and builds an online audience of quality followers, they can share it with their own follower audience on Twitter, Facebook, LinkedIn, Google+, their own blogs and many other social media platforms. This sharing and discussing of content opens up new entry points for search engines like Google to find it in a keyword search. Those entry points could grow to hundreds or thousands or more potential ways for people to find a company, product or service online. Finding and understanding the online influencers in the market who have quality audiences and are likely to be interested in the product, service or business could make a huge positive impact.

The consulting company collected information on articles that were shared by people on social media. The dataset contains approximately 39000 articles, and a large number of features (31 in total) were extracted from the HTML code of each article, including the title and the content of each article. (The description of the dataset is provided as an appendix.) Some of the features depend on characteristics of the service used, which could be analysed based on the meta-data provided: articles have meta-data such as keywords, data channel type and the total number of shares (on Facebook, Twitter, Google+, LinkedIn, Pinterest), etc. The data channel categories are: ‘Lifestyle’, ‘Business’, ‘Entertainment’, ‘Social Media’, ‘Technology’, and ‘World’. In addition, several natural language processing features were also extracted.

Task Requirements

As a data analytics team member for the consultancy firm, you are required to carry out a number of data analytics tasks for the consulting company using the data collected. You are given access to a sample of the data where some of the variables have been removed as they are not considered important for the analysis of this assignment.
For each data channel, the company is interested in the following:
• Investigate the impact of the article properties on sharing;
• Use the SAS Text Miner for text analysis to identify key features in the articles and analyse their contribution towards low and high sharing.

To achieve the above, you need to carry out the following data analytics tasks:

1. Task 1: Explore the impact of article properties (7%)
Explore the data and investigate what properties of the article correlate with the high number of shares of the article on social media.
• Open the dataset ‘online_news_popularity.xlsx’ using Microsoft Excel.
• Explore the dataset to understand and manage the six types of data channels (lifestyle, entertainment, bus, socmed, tech, world) and the associated data. In each data channel column, the value of 1 represents that the data in the row is of the corresponding data channel.
• Copy the separate datasets for each channel to different Excel sheets (sort and filter by each data channel to separate).
• In each data channel, identify the articles with a high number of shares (with the threshold of top 10% in the dataset).
• Investigate the following properties and explain how they could have affected the high number of shares. You should provide the explanation to support your argument.
  o Number of tokens in the title
  o Number of tokens in the content
  o Was the article published on the weekend
  o Number of links
  o Number of images
  o Number of videos
(Hint: To do this, you can create plots in R between the corresponding columns and the number of shares. You may want to include a fitted line to your plots to investigate the correlation for continuous variables.)

2. Task 2: Use SAS Text Miner for keyword analysis (8%)
• Use the SAS Text Miner to extract the keywords from the title in each data channel.
(Hint: To do this, you can refer to the workshop activities in Week 3 and Week 4; by setting ‘Title’ column as the only ‘Text’ role in the variable setting.)
• What are the highly used (top 10) topics in each category? Use the SAS Result window to explain your answers.
(Hint: ‘Topic’ column will need to be set as the only ‘Text’ role.)
• Are there common topics which span across data channels and relate to a high number of shares and a low number of shares? Use the whole dataset in the SAS Text Miner to identify the relationship. You should provide the explanation to support your argument.
(Hint: Use the whole dataset to identify the articles with the high number of shares and the low number of shares, by using appropriate thresholds with the top 10% and the bottom 10% in the dataset. Separate the dataset using Excel based on this before the analysis and use these two datasets to analyse the common topics in each of them. In this question, please use ‘Title’ column as the only ‘Text’ role for topic modelling.)

You are required to:
a) Prepare a report for the Case Study A with all the analytics results to the above two key tasks. (You can use an appendix for any additional screenshots, figures and tables, which you feel are important for the report). The report should be named as: Assignment1A_Report.doc
b) Save the R script after Task 1 above as: Assignment1A.r
c) Save the SAS project for Task 2 above as Assignment1_Task1.spk. You may zip the SPK files if you have multiple of them. The SAS project file should be named as: Assignment1_SAS1.zip

Case Study B (15%)

Sentiment analysis is the technique aiming to gauge the attitudes of customers in relation to topics, products and services of interest. It is a pivotal technology for providing insights to enhance the business bottom line in campaign tracking, customer-centric marketing strategy and brand awareness. Sentiment analytics approaches are used to produce sentiment categories such as ‘positive’, ‘negative’ and ‘neutral’. More specific human emotions are also a topic of interest. There are two major streams of methods to develop a sentiment analytics engine: the dictionary-based and machine learning-based approaches. In this assignment, you are required to perform sentiment analytics based on both approaches.
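To make the dictionary-based stream concrete, the idea can be sketched in a few lines of base R: each tweet is split into words, each word is looked up in a lexicon of scored terms, and the sign of the total decides the category. The lexicon, the function names (score_tweet, label_tweet) and the example tweets below are hypothetical and chosen only for illustration; this is not the NRC lexicon used by ‘syuzhet’ and not a model answer for the tasks.

```r
# Toy lexicon: hypothetical words and scores, NOT the NRC lexicon used by 'syuzhet'.
lexicon <- c(great = 1, clean = 1, friendly = 1,
             dirty = -1, rude = -1, noisy = -1)

score_tweet <- function(text) {
  # Lower-case the text, then split on anything that is not a letter.
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  # Look each word up by name; words missing from the lexicon give NA and are dropped.
  sum(lexicon[words], na.rm = TRUE)
}

label_tweet <- function(text) {
  s <- score_tweet(text)
  if (s > 0) "positive" else if (s < 0) "negative" else "neutral"
}

# Made-up example tweets (not taken from 'hotel_tweets.csv').
tweets <- c("Great location and friendly staff",
            "Dirty room and rude reception",
            "We stayed two nights")

labels <- vapply(tweets, label_tweet, character(1), USE.NAMES = FALSE)
print(labels)
```

A machine learning-based engine differs in that the word weights are not fixed in advance but learned from labelled training tweets.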
Task Requirements

As a data scientist, you are required to perform a number of data analytics tasks. You are tasked to develop both dictionary-based and machine learning-based sentiment analytics engines using the R programming language and apply them to predict the sentiments of hotel review tweets from a sample of data. You are also required to use the SAS Sentiment Analysis Studio to compare the results.

To achieve the above, you need to carry out the following data analytics tasks:

1. Develop a dictionary-based sentiment analytics engine based on the R library ‘syuzhet’ to analyse the different emotions from hotel review tweets (5%).
• Analyse and aggregate the eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise and trust) from the hotel review tweets file ‘hotel_tweets.csv’ using the function ‘get_nrc_sentiment’.
• You are required to plot a chart to visualise these emotions using the R library ‘ggplot2’.
• You should combine both negative and positive tweets into one before conducting the analysis.

2. Develop a machine learning-based model using the R libraries ‘tm’ and ‘e1071’ as well as evaluate the predictive accuracies of SVM classifier (5%).
• Develop R scripts and import the data set ‘hotel_tweets.csv’ for training and testing.
• Use the first 200 negative tweets and the first 200 positive tweets as the training dataset; and use the rest of the 63 negative tweets and 63 positive tweets as the testing dataset.
(Hint: You may need to use as.character() function to convert a data frame column from factors to characters.)
• Develop a machine learning-based sentiment analytics engine and predict sentiment categories (only ‘positive’ and ‘negative’) using ‘tm’ and ‘e1071’ with the SVM classifier.
• Evaluate the testing accuracies and report the predicted results.

3. Develop a statistical model using SAS Sentiment Analysis Studio and evaluate the accuracies (5%).
• Use the data folder ‘hotel_tweets’, which contains ‘negative’ and ‘positive’ tweets for training and testing.
• Build a statistical model using SAS Sentiment Analysis (either simple or advanced). You may change configurations in the advanced model to obtain the best training accuracy.
(Hint: Refer to the SAS Sentiment Analysis Studio tutorial.)
• Evaluate and compare the testing accuracies for different models and report the results.
• Compare this result with the previous predictive results using R and discuss.

You are required to:
a) Prepare a report for Case Study B with all the analytics results to the above three key tasks. (You can use an appendix for any additional screenshots which you feel are important for the report). The report should be named as: Assignment1B_Report.doc
b) Save the R script after Task 2 above as: Assignment1B.r
c) Save the SAS Sentiment Studio project as: Assignment1_SAS2.zip

Important: You should submit all the reports, R scripts and SAS Sentiment Studio project via the LMS Assignment 1 submission link.

Report Guidelines

1. The report should consist of a table of contents, an introduction, and logically organised sections/topics (such as ‘case study A’, ‘case study B’), a conclusion and a list of references where necessary.
2. Choose a fitting sequence of sections/topics for the body of the report. Two sections for the two case studies are essential; you may add other sub-sections deemed relevant.
3. You must include diagrams, tables and charts from the analytics solutions to effectively present your results. (Consider using Alt + Print Screen to capture screenshots if needed.)
4. Page limit: For each case study, five (5) pages for report writing but not more than ten (10) pages including appendices.
5. Reports should be written in Microsoft Word (font size 11) and submitted as a Word file.
6. Final submission will comprise six separate files:
   a. Assignment1A_Report.doc (should not be zipped);
   b. Assignment1B_Report.doc (should not be zipped);
   c. Assignment1A.r;
   d. Assignment1B.r;
   e. Assignment1_SAS1.zip;
   f. Assignment1_SAS2.zip.

Marking Rubrics

A grade will be awarded to each of the tasks and then an overall mark determined for the entire assessment. The rubric below gives you an idea of what you must achieve to earn a certain ‘grade’.

As a general rule, to meet a ‘C’, you must first satisfy the requirements of a ‘D’. And for an ‘A’, you must first satisfy the requirements of a ‘B’, which must of course first meet the requirements of a ‘C’ and so on.
The marking rubric for this assignment is given below.

Criterion: Case study one: Impact of article properties (7 marks)
  Pass: Limited effort to structure and present information and insights.
  Credit: Fair effort to structure and present information and insights.
  Distinction: Excellent effort to structure and present information and insights.
  High Distinction: Exceptional effort to structure and present information and insights.

Criterion: Case study one: Use SAS Text Miner for keyword analysis (8 marks)
  Pass: Limited effort to structure and present information and insights. Limited knowledge of SAS Text Miner.
  Credit: Fair effort to structure and present information and insights. Fair knowledge of SAS Text Miner.
  Distinction: Excellent effort to structure and present information and insights. Excellent knowledge of SAS Text Miner.
  High Distinction: Exceptional effort to structure and present information and insights. Comprehensive knowledge of SAS Text Miner.

Criterion: Case study two: Develop dictionary-based sentiment analytic engine and analyse emotions (5 marks)
  Pass: Limited effort to structure and present insights for emotions from tweets. Limited knowledge of the R programming.
  Credit: Fair effort to structure and present insights for emotions from tweets. Fair knowledge of the R programming.
  Distinction: Excellent effort to structure and present insights for emotions from tweets. Excellent knowledge of the R programming.
  High Distinction: Exceptional effort to structure and present insights for emotions from tweets. Comprehensive knowledge of the R programming.

Criterion: Case study two: Develop machine learning-based sentiment analytic engine and evaluate predictive accuracies using R (5 marks)
  Pass: Limited effort to structure and present information and insights. Limited knowledge of the R programming.
  Credit: Fair effort to structure and present information and insights. Fair knowledge of the R programming.
  Distinction: Excellent effort to structure and present information and insights. Excellent knowledge of the R programming.
  High Distinction: Exceptional effort to structure and present information and insights. Comprehensive knowledge of the R programming.

Criterion: Case study two: Develop sentiment analytic engine using SAS Sentiment Analysis Studio (5 marks)
  Pass: Limited effort to structure and present information and insights. Limited knowledge of SAS Sentiment Studio.
  Credit: Fair effort to structure and present information and insights. Fair knowledge of SAS Sentiment Studio.
  Distinction: Excellent effort to structure and present information and insights. Excellent knowledge of SAS Sentiment Studio.
  High Distinction: Exceptional effort to structure and present information and insights. Comprehensive knowledge of SAS Sentiment Studio.

Other information

• Standard plagiarism and collusion policy, and extension and special consideration policy of this university apply to this assignment.
• A cover sheet is NOT required. By submitting your work online, the declaration on the university’s assignment cover sheet is implied and agreed to by you.

Appendix – Attribute Information

1. This section contains a description of the attributes of the dataset ‘online_news_popularity.xlsx’.
{‘name of the column’: ‘description’}
   1. url: URL of the article (unique)
   2. title: Title of the article
   3. topic: Topics related to the article
   4. content: Content of the article
   5. timedelta: Days between the article publication and the dataset acquisition
   6. n_tokens_title: Number of words in the title
   7. n_tokens_content: Number of words in the content
   8. n_unique_tokens: Rate of unique words in the content
   9. n_non_stop_words: Rate of non-stop words in the content
   10. n_non_stop_unique_tokens: Rate of unique non-stop words in the content
   11. num_hrefs: Number of links
   12. num_self_hrefs: Number of links to other articles published by Mashable
   13. num_imgs: Number of images
   14. num_videos: Number of videos
   15. average_token_length: Average length of the words in the content
   16. num_keywords: Number of keywords in the metadata
   17. data_channel_is_lifestyle: Is data channel 'Lifestyle'?
   18. data_channel_is_entertainment: Is data channel 'Entertainment'?
   19. data_channel_is_bus: Is data channel 'Business'?
   20. data_channel_is_socmed: Is data channel 'Social Media'?
   21. data_channel_is_tech: Is data channel 'Tech'?
   22. data_channel_is_world: Is data channel 'World'?
   23. weekday_is_monday: Was the article published on a Monday?
   24. weekday_is_tuesday: Was the article published on a Tuesday?
   25. weekday_is_wednesday: Was the article published on a Wednesday?
   26. weekday_is_thursday: Was the article published on a Thursday?
   27. weekday_is_friday: Was the article published on a Friday?
   28. weekday_is_saturday: Was the article published on a Saturday?
   29. weekday_is_sunday: Was the article published on a Sunday?
   30. is_weekend: Was the article published on the weekend?
   31. shares: Number of shares

2. The description for the dataset ‘hotel_tweets.csv’.
{‘name of the column’: ‘description’}
   1. Negative: negative tweets content
   2. Positive: positive tweets content

3. The description for the dataset ‘hotel_tweets’ folder.
Same content as the dataset in (2), grouped as negative and positive.
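Using the column names listed in this appendix, the top 10% share threshold and the fitted-line plot mentioned in the Task 1 hint could be sketched in base R as follows. The data frame below is synthetic, generated only so the sketch runs without the real file, and only a few of the 31 columns are included. Taking the 90th percentile within a single channel is one possible reading of ‘top 10% in the dataset’ and is an assumption here, not a stated requirement.

```r
set.seed(1)  # synthetic data only; replace with the real spreadsheet in practice
n <- 200
news <- data.frame(
  data_channel_is_tech  = rep(c(1, 0), each = n / 2),
  data_channel_is_world = rep(c(0, 1), each = n / 2),
  num_imgs              = rpois(n, 3),
  shares                = round(rexp(n, rate = 1 / 1500))
)

# Rows of one channel are those whose indicator column equals 1.
tech <- news[news$data_channel_is_tech == 1, ]

# Assumed reading of "top 10%": at or above the 90th percentile of shares in the channel.
threshold <- quantile(tech$shares, probs = 0.90)
tech$high_shares <- tech$shares >= threshold

# Scatter plot of one continuous property against shares, with a least-squares fitted line.
fit <- lm(shares ~ num_imgs, data = tech)
plot(tech$num_imgs, tech$shares, xlab = "num_imgs", ylab = "shares")
abline(fit)
```

The same pattern can be repeated for the other channels and for the other continuous properties, such as n_tokens_title or num_hrefs; a binary property such as is_weekend is usually better compared by group than with a fitted line.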
Aug 29, 2021