Write a program that calculates the vocabulary richness of a text in a file and the frequency of the most common word. The vocabulary richness is the number of words in the text divided by the number of distinct words. The frequency of a word is the number of times the word occurs in the text divided by the total number of words in the text.
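As a quick check of the two definitions, here is a minimal sketch that applies them to the six-word text "to be or not to be" (four distinct words). The class and method names are illustrative only and are not part of the required design.

```java
public class RichnessExample {
    // Richness as defined above: total words divided by distinct words.
    static double richness(int totalWords, int distinctWords) {
        return (double) totalWords / distinctWords;
    }

    // Frequency as defined above: occurrences of a word divided by total words.
    static double frequency(int occurrences, int totalWords) {
        return (double) occurrences / totalWords;
    }

    public static void main(String[] args) {
        // "to be or not to be": 6 words, 4 distinct; "to" occurs twice.
        System.out.println(richness(6, 4));   // prints 1.5
        System.out.println(frequency(2, 6));  // one third
    }
}
```

The casts to double matter: without them, Java would perform integer division and 6 / 4 would evaluate to 1.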
Define and implement class WordCounter with two private fields String word and int count, constructor WordCounter(String word), and public methods String getName(), int getCount(), and void addToCounter().
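One possible shape for this class is sketched below. The assignment does not say what the initial value of count should be; this sketch assumes a counter is created when a word is seen for the first time, so it starts at 1.

```java
public class WordCounter {
    private String word;
    private int count;

    // Assumption: a counter is created on the first occurrence, so count starts at 1.
    public WordCounter(String word) {
        this.word = word;
        this.count = 1;
    }

    // Returns the word this counter belongs to.
    public String getName() {
        return word;
    }

    // Returns how many times the word has been counted so far.
    public int getCount() {
        return count;
    }

    // Records one more occurrence of the word.
    public void addToCounter() {
        count++;
    }
}
```

If you prefer to start at 0, remember to call addToCounter() right after creating the object, or the totals will be off by one per distinct word.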
Define and implement class Corpus (as in text corpus) with one private field ArrayList<WordCounter> words, constructor Corpus(BufferedReader infile), and public methods double getVocabularyRichness() and String getMostFrequentWord().
Implement a test program (as the public static void main method in Corpus) that reads all files in a specific folder, creates a Corpus object from each (previously opened) file, and saves the requested statistics into another file, stats.csv. You can either create a new Corpus object for each file or define an ArrayList of the corpora.
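The folder-scanning part of the test program might look like the sketch below. The class name FolderListing and the helper regularFiles are hypothetical. The sketch relies on File.listFiles(), which returns null when the path is not a directory, and it skips subfolders on the assumption that only plain files hold texts.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class FolderListing {
    // Returns the regular files directly inside the folder (subfolders are skipped).
    static List<File> regularFiles(File folder) {
        List<File> result = new ArrayList<>();
        File[] entries = folder.listFiles(); // null if folder is not a directory
        if (entries == null) {
            return result;
        }
        for (File entry : entries) {
            if (entry.isFile()) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Print the names of all regular files in the current folder.
        for (File f : regularFiles(new File("."))) {
            System.out.println(f.getName());
        }
    }
}
```

Note that if stats.csv is written into the same folder, a second run will pick it up as an input file unless you filter it out or write it elsewhere.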
Each line of the CSV file must consist of three fields separated by commas (but no spaces!): the file name, the vocabulary richness, and the most frequently used word. Run your program on all of Shakespeare's plays. Submit the CSV file together with the Java file.
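A minimal sketch of building one such line follows. The class CsvLine and the file name hamlet.txt are made up for illustration. Plain string concatenation is used because Double.toString always writes a decimal point, regardless of the system locale.

```java
public class CsvLine {
    // Builds one line: fileName,richness,mostFrequentWord (commas, no spaces).
    static String format(String fileName, double richness, String word) {
        return fileName + "," + richness + "," + word;
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        System.out.println(format("hamlet.txt", 1.5, "the")); // prints hamlet.txt,1.5,the
    }
}
```

Because punctuation is kept as part of a word, the most frequent word could in principle contain a comma, which would add a fourth field to the line. The assignment does not say how to handle that case, so the sketch leaves it alone.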
Suggestions:
- ArrayList + WordCounter is not an efficient implementation of a Corpus, but it is quite acceptable as an exercise. Any "production" implementation should use a HashMap.
- Use folder.listFiles() to obtain the list of files in the folder. The folder must be a previously created File object (e.g., new File(".") for the current folder).
- Consider anything returned by the next() method as one word (even if it has punctuation).
- To count the next word, check if the word is already present in the ArrayList (you will have to use a loop). If it is, increment the word's counter. Otherwise, add another WordCounter to the list.
- Convert "words" to lower case before adding them to the corpus.
- The requested statistics can be obtained by combining the counts of the words (e.g., the sum of all counts is the total number of words in the text).
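
The counting and statistics suggestions above can be sketched as follows. This is one possible arrangement, not the required Corpus class: the class CountingSketch and its static methods are hypothetical, the nested WordCounter is a minimal stand-in that assumes a new counter starts at 1, and a Scanner is assumed as the source of next(). A Scanner can also wrap a BufferedReader, as in new Scanner(infile).

```java
import java.util.ArrayList;
import java.util.Scanner;

public class CountingSketch {
    // Minimal stand-in for the WordCounter class described in the assignment.
    static class WordCounter {
        private final String word;
        private int count = 1; // assumption: a new counter starts at 1
        WordCounter(String word) { this.word = word; }
        String getName() { return word; }
        int getCount() { return count; }
        void addToCounter() { count++; }
    }

    // Counts every token returned by next(), lowercased, using a linear search.
    static ArrayList<WordCounter> count(Scanner in) {
        ArrayList<WordCounter> words = new ArrayList<>();
        while (in.hasNext()) {
            String token = in.next().toLowerCase();
            boolean found = false;
            for (WordCounter wc : words) {
                if (wc.getName().equals(token)) {
                    wc.addToCounter();
                    found = true;
                    break;
                }
            }
            if (!found) {
                words.add(new WordCounter(token));
            }
        }
        return words;
    }

    // Total words is the sum of all counts; distinct words is the list size.
    static double richness(ArrayList<WordCounter> words) {
        int total = 0;
        for (WordCounter wc : words) {
            total += wc.getCount();
        }
        return (double) total / words.size();
    }

    // Returns the word with the highest count (the first one found on ties).
    // Assumes the list is not empty.
    static String mostFrequent(ArrayList<WordCounter> words) {
        WordCounter best = words.get(0);
        for (WordCounter wc : words) {
            if (wc.getCount() > best.getCount()) {
                best = wc;
            }
        }
        return best.getName();
    }

    public static void main(String[] args) {
        ArrayList<WordCounter> words = count(new Scanner("To be or not to be"));
        System.out.println(richness(words));     // prints 1.5
        System.out.println(mostFrequent(words)); // prints to
    }
}
```

The inner loop scans the whole list for every token, which is why the first suggestion calls this approach inefficient; a HashMap would replace that scan with a single lookup.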