Objective: Implement a keyword search interface that enables: (A) web search via a web search API; (B) local search on a given dataset.

(A) Web search (40 points): The search interface should look and function similar to a mainstream web search engine. Search results should contain title and snippet information. Titles should be clickable, linking to their corresponding pages. You may use either the Google, Bing, or Yahoo web search API. You may use any programming language you prefer. The yahooAPI.pdf (may be outdated) is provided as a reference only. You do not need it to complete this assignment. Your implementation should be hosted as a web service. The following link has info for setting up your TxState Linux account URL (e.g., you could create a simple “hello world” html file and name it index.html, put it in the public_html directory under your home directory, then visit http://cs.txstate.edu/~NetID). Feel free to use a different URL. https://cs.txstate.edu/resources/labs/accounts/linux/

(B) Local search (60 points): You may use the same search interface as in Part (A), where you can use buttons to allow the user to switch between web search and local search. You may also use a separate interface. Ideally, the search interface should be hosted as a web service that allows public access. Note that your university Linux account will not work for this purpose because you don’t have installation permission on university servers. You’ll need to set up your own server (e.g., using Apache) and URL.
You’re responsible for reading online tutorials and working on it independently. If you fail to host the web service, you’ll need to schedule an in-person demo with me for grading, and your implementation will be subject to a minor deduction (only a few points, as this is not one of the learning objectives for this course. Nonetheless, it’s an integral part of a real-life search engine prototype and it’s a useful skill for CS students/IT practitioners).

For this implementation, you may use Lucene, or another open source platform (such as Solr or Elasticsearch) of your choice. You’re responsible for reading online tutorials and working on it independently. The provided dataset, lyrics.csv, contains 50 years of pop music lyrics (modified from a source file in https://github.com/walkerkq/musiclyrics). It’s up to your own interest (no bonus/credit) to index additional datasets such as Wikipedia and Amazon reviews. For the lyrics.csv dataset, each song is considered a document containing rank, title, year, artist and lyrics information.

Your search interface should allow the user to enter keywords as queries (similar to Google). For each query, a list of search results should be clearly displayed. Each search result corresponds to a document (song). For each search result, show the title, rank, year and artist information (but no actual lyrics) for the corresponding document, as well as a dynamically generated snippet. The title should be made clickable and upon clicking, the entire document (including actual lyrics) should be displayed, either on another page or in a pop-out window.

Major web search engines such as Google all display snippets for search results. A snippet is a short summary of a document. It can be generated dynamically in a KWIC (keyword in context) style.
Brief explanations about snippets can be found in https://nlp.stanford.edu/IR-book/html/htmledition/results-snippets-1.html (or textbook Introduction to Information Retrieval, Section 8.7 Results snippets, on page 157). While you may design your own simple algorithms, it’s a much better idea to generate snippets using tools/API already provided by Lucene or Elasticsearch. Search “Lucene highlighter” for more information and start from there.
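
To make the KWIC idea concrete, here is a minimal hand-rolled sketch in Python. It is for illustration only and is not how the Lucene or Elasticsearch highlighters work internally. The function names (kwic_snippet, highlight) and the parameters (width, max_fragments) are hypothetical and chosen for this example. It assumes the query is a whitespace-separated list of keywords and that the document text is plain text such as the lyrics field.

```python
import html
import re


def highlight(fragment, pattern):
    """HTML-escape a fragment and wrap every query-term hit in <b>...</b>."""
    out, pos = [], 0
    for m in pattern.finditer(fragment):
        out.append(html.escape(fragment[pos:m.start()]))
        out.append("<b>" + html.escape(m.group(0)) + "</b>")
        pos = m.end()
    out.append(html.escape(fragment[pos:]))
    return "".join(out)


def kwic_snippet(text, query, width=40, max_fragments=2):
    """Build a keyword-in-context snippet: up to `max_fragments` windows of
    roughly `width` characters on each side of a query-term hit."""
    terms = [re.escape(t) for t in query.split()]
    if not terms:
        return html.escape(text[: 2 * width].strip())
    # Match any query term as a whole word, ignoring case.
    pattern = re.compile(r"\b(?:" + "|".join(terms) + r")\b", re.IGNORECASE)

    fragments, last_end = [], -1
    for m in pattern.finditer(text):
        if m.start() < last_end:
            continue  # this hit already falls inside the previous window
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        # Snap the window to spaces so that no word is cut in half.
        if start > 0:
            idx = text.find(" ", start, m.start())
            start = idx + 1 if idx != -1 else m.start()
        if end < len(text):
            idx = text.rfind(" ", m.end(), end)
            end = idx if idx != -1 else m.end()
        piece = highlight(text[start:end], pattern)
        if start > 0:
            piece = "... " + piece
        if end < len(text):
            piece = piece + " ..."
        fragments.append(piece)
        last_end = end
        if len(fragments) >= max_fragments:
            break

    if not fragments:
        # No keyword found in the text: fall back to the beginning of the document.
        return html.escape(text[: 2 * width].strip())
    return " ".join(fragments)


if __name__ == "__main__":
    lyrics = ("you are my sunshine my only sunshine "
              "you make me happy when skies are gray")
    print(kwic_snippet(lyrics, "sunshine", width=10, max_fragments=1))
```

The returned string contains <b> tags around the matched keywords and can be placed directly into the result list of an HTML page. A library highlighter is still the better choice in practice, since it works on the analyzed tokens (stemming, stop words) rather than on raw string matches.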
Submission: Prepare a short report in txt format that includes the following information:
• URL of the web service. Make sure the URL works until the grade is released. We assume the service works in a straightforward way. Otherwise please include a short instruction on how it works. If you use two separate URLs for Part A and Part B, provide both of them.
• Write a short summary in free style describing your implementation, observations and comments.
Submit this short report as a txt file to Tracs. Please also submit a separate zip file to Tracs that includes your main source code for verification purposes.