Part 1: Spell Checker App - Enhancement
- Implement the SpellCheck app in Chapter 15.3 of your text. To do this, create a new project in IntelliJ called WordListReader or a name of your choice. Add a Java class to your src directory called SpellCheck.java. DownloadSpellCheck.java
downloadhere and replace the empty code file in IntelliJ.
Follow the tutorial on setting up a file path in IntelliJ using a resources directory.
- Add thewords
downloadandalice30.txt
downloadfile to your new resources directory.
- Test the SpellCheck executable using the provided data files.
- Create your own data files and add them to the resources directory. Try to find other large text files on the internet so that you will have a reasonable dataset. Add a new method with code to test SpellCheck to test your new files in addition to the code already included.
- Submit your .java source code files, screenshots or text files of your test output, and all data files to Canvas.
Part 2: HashSets and TreeSets - Testing Algorithm Performance
1. Patterning your code on the SpellCheck class, add a new class your own.
In the previous exercise, we demonstrated a static method called readWords (String fileName) which reads a file and added the words to a HashSet. In this exercise, we will use other data structures, and then test the performance ofthese alternatives. Before you proceed too far, make sure you read the tutorials to become familiar with HashSet and TreeSet, and how these compare to each other and to ArrayList, with which you are by now, quite familiar.
HashSet offers constant time cost while TreeSet offers log(n)
time cost for such operations. Review the meaning of these Big O performance metrics and which was is faster. 2) HashSet does not maintain any order
of elements while TreeSet elements are sorted in ascending order by default.
Which of these two data structures is likely to be faster according to Big O Notation theory?
Steps:
- Create a new class called HashAndTreeSetTester and add it to your WordListReader project. Your new class should have its own main method.
- Implement a new method readWordsArrayList(String filename)similar to the static method readWords(String filename) method from your SpellChecker class. You can put your new method either inside of SpellChecker so that this class has more reading options. I find it more readable to add the method to SpellChecker, but if you wish it could also be inside HashAndTreeSetTester. The new method will load your words into an ArrayList instead of into a HashSet.
- Implement a second new method called readWordsTreeSet(String filename) and add it to your SpellChecker or HashAndTreeSetTester class.You will now have three reading options: readWordsTreeSet, readWordsArrayList, and readWords from the SpellChecker. If you wish you can change the name readWords to readWordsHashSet to better denote what data structure it uses.
- Download the class fileStopWatch.java
download.Add a new class to your project. Name the class StopWatch. Replace the code in your new class with the code in StopWatch.java. Chapter 14.2 of your textbook demonstrates how to use this class for profiling your code.
- You will need a large text file for testing your new class such as the one you used in Lab 1. Otherwise, you can use the text from the bookWar-and-Peace.txt
download, one of the longest novels ever written.
- Make sure your test text file (which can beWar-and-Peace.txt
download) is installed in your resources directory as before.
- You will call each of your reading methods in sequence from your main class but we want to report the execution time for each algorithm. You will do something like the following each of the three tests you will run. Here is one for the HashSet test. (I suggested you rename readWords to readWordHashSet) but here is uses the original name. You will need to do the same as below for the other two new methodsreadWordsTreeSetandreadWordsArrayList:
System.out.println("Starting Read HashSet Test");long hashSetTime;StopWatch timer = new StopWatch();timer.start();SetString> words=readWordsHashSet("./resources/War-andPeace.txt");
timer.stop();
hashSetTime = timer.getElapsedTime();
timer.reset();System.out.println("Elapsed time for hashset "
+ hashSetTime + " milliseconds. Number of words: " + words.size());
Your program print these statistics for ArrayList,Hashset, and TreeSet cases.
You will probably get better results if you run the ArrayList test first.
7. Comparing these data structures, can you explain the differences in these results? Why does ArrayList have so many more words than the others? What is the biggest difference between ArrayList and the other two Data Structures? Why would you not use ArrayList for a large dictionary project?Which Big-O notation time costs of TreeSet versus HashSet are greater? Why would you choose TreeSet over HashSet? Is the data the same between the two structures? Run your test several times. Do the elapsed times vary significantly between test runs? What does this tell you if anything about best practices for profiling and measuring the comparative efficiency of algorithms?
Include a short document including your thoughts on these questions. Your narrative will count for 20% of your grade.