Complete this lab? CS0012 Introduction to Computing for the Humanities Project 2 Lab 3 ...

Question

Complete this lab?

CS0012 Introduction to Computing for the Humanities
Project 2 Lab 3

Lab 3

Part 1: Counting unique words per week

Looking over the lists of words you generated at the end of Lab 2, you should see that each list contains many duplicates. To perform some meaningful analysis of the New York Times data, we’ll need to keep a count of how many times each unique word is used per week. To keep these counts, you should modify process_file so that, instead of returning a list of lists, it returns a list of dictionaries. Each nested dictionary should have each unique word for that week as a key, and the corresponding value should be the number of times that word appears.

Assuming your main function is unchanged from Lab 2, here is the (partial) output from a run of the desired program when processing nytimes_news_articles_SMALL.txt:

Enter a filename: nytimes_news_articles_SMALL.txt
[{'behsud': 4, 'afghanistan': 6, 'the': 2136, 'first': 61, 'time': 42, 'noor':
6, 'ul-haq': 6, 'died': 13, 'his': 203, 'afghan': 12, 'army': 9, 'outpost': 1,
'was': 293, 'completely': 4, 'cut': 9, 'off': 18, 'by': 189, 'taliban': 14,
'on': 300, 'a': 975, 'bleak': 1, 'southern': 9, 'battleground': 1, 'hundreds':
10, 'of': 908, 'insurgent': 3, 'fighters': 2, 'swept': 3, 'in': 857, 'and':
...
SKIPPING 1300 LINES OF OUTPUT
...
'pledges': 1, 'brawl': 1, '1995': 1, 'installed': 1, 'camera': 1, 'ordered': 1,
'complained': 1, 'untold': 1, 'peek': 1, 'verdes’s': 1, 'splendor': 1,
'warshaw': 1, 'edits': 1, 'encyclopedia': 1, 'undercover': 1, 'guardian': 1,
'harassing': 1, 'jeff': 1, 'kepley': 2, 'swell': 1, 'majestic': 1, 'watched':
1, 'porch': 1, 'snacks': 1, 'equipment': 1, 'stashed': 1, 'hawk': 1,
'magazine': 1}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}]

Part 2: Using the counts

Now that you have a count of how frequently each word is used each week, you should look through this list of dictionaries to determine and print out the 5 most commonly used words for each week. Write a function named top5 that:

• Takes one argument named tokens, which should be a dictionary containing all of the unique words for a week as keys, with the frequency of use of each word as the associated value (i.e., one of the nested dictionaries from the result of process_file).
• Returns a dictionary containing 5 key/value pairs. Each of the top 5 most frequently used words in that week should appear as a key in the returned dictionary, with the associated value being the number of times that word is used in that week.

Once you have top5 working, you should modify your main function so that it:

1. Calls get_filename and assigns the result to the variable name fname.
2. Calls process_file with fname and assigns the result to the variable name weekly_tokens.
3. For each week number (0-10, inclusive), calls top5 with the nested dictionary from weekly_tokens for the current week number, then prints out the result.

Here is the (partial) output from a run of the desired program when processing nytimes_news_articles_FULL.txt:

Enter a filename: nytimes_news_articles_FULL.txt
Top five words for Week0:
the: 21734
a: 9276
to: 8972
of: 8773
and: 8258
Top five words for Week1:
the: 42232
a: 19222
to: 18207
of: 18182
and: 17002
...
SKIPPING 8 WEEKS OF OUTPUT
...
Top five words for Week10:
the: 42216
a: 18665
to: 17949
of: 17768
and: 16928

NOTE: Slight differences in the counts output by your program could be due to the use of slightly different cleaning (from Part 2 of Lab 2).

As you can see, our initial analysis is... relatively boring. But this is to be expected. “the”, “a”, “to”, “of”, and “and” are some of the most commonly used words in the English language. In order to pick out words that are actually relevant per week, we can use TF-IDF (https://en.wikipedia.org/wiki/Tf%E2%80%93idf). Now, you won’t be expected to implement TF-IDF to complete Project 2.

Once you have completed all parts of the lab, be sure to show your work to the lab instructor.

Rubric

Your program will be evaluated according to the following rubric:

get_filename works as specified                                   10
process_file works as specified                                   25
top5 works as specified                                           30
Counts from top 5 per week are reasonably close                   15
All functions are defined in global scope                         10
All other code appears inside of a function
(aside from a call to main and any magic value definitions)       10
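One possible way to approach the counting in Part 1 and the top5 function in Part 2 is sketched below. This is only a sketch under assumptions: count_words and print_top5 are hypothetical helper names that do not appear in the lab, and the Lab 2 versions of get_filename and process_file are not shown here, so the example works on a small hand-made word list instead of a New York Times file. Ties between equally frequent words are broken by the order in which the words were first seen, which the lab does not specify.

```python
def count_words(words):
    """Hypothetical helper: turn one week's list of words into a dict of counts."""
    counts = {}
    for word in words:
        # Start at 0 the first time a word is seen, then add 1.
        counts[word] = counts.get(word, 0) + 1
    return counts


def top5(tokens):
    """Return a dict holding the (up to) 5 most frequent words in tokens."""
    # Sort (word, count) pairs by count, largest first. sorted() is stable,
    # so words with equal counts stay in their original order.
    ranked = sorted(tokens.items(), key=lambda pair: pair[1], reverse=True)
    return dict(ranked[:5])


def print_top5(week_number, tokens):
    """Hypothetical helper: print one week's result in the lab's output style."""
    print(f"Top five words for Week{week_number}:")
    for word, count in top5(tokens).items():
        print(f"{word}: {count}")


def main():
    # Stand-in for one week of cleaned words from process_file.
    sample_week = ["the", "a", "the", "of", "the", "a", "to", "and", "in"]
    print_top5(0, count_words(sample_week))


if __name__ == "__main__":
    main()
```

A week with no words yields an empty dictionary from top5, which matches the empty {} entries in the SMALL file's output above.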

