The assignment is in the file named "Assignment_3-2.pdf". All other necessary files and instructions are there as well.
CS1026: Assignment 3 - Sentiment Analysis
Weight: 11%

Learning Outcome: By completing this assignment, you will gain skills relating to using functions, complex data structures, nested loops, text processing, file input and output, exceptions in Python, and writing code that is used by other programs.

Background: With the emergence of Internet companies such as Google, Facebook, and Twitter, more and more of the data accessible online consists of text. Textual data, and the computational means of processing it and extracting information from it, is also increasingly important in areas such as business, humanities, social sciences, etc. In this assignment, you will deal with textual analysis. Twitter has become very popular, with many people “tweeting” aspects of their daily lives. This “flow of tweets” has recently become a way to study or guess how people feel about various aspects of the world or their own life. For example, analysis of tweets has been used to try to determine how certain geographical regions may be voting; this is done by analyzing the content, the words, and the phrases in tweets. Similarly, analysis of keywords or phrases in tweets can be used to determine how popular or unpopular a movie might be. This is often referred to as sentiment analysis.

Task: In this assignment, you will write a Python module called sentiment_analysis.py (this is the name of the file that you should use) and a main program, main.py, that uses the module to analyze Twitter information. In this module, you will create a function that will perform simple sentiment analysis on Twitter data. The Twitter data contains comments from individuals about how they feel about their lives and comes from individuals across the continental United States. The objective is to determine which timezone (Eastern, Central, Mountain, Pacific; see below for more information on how to do this) is the “happiest”.
To do this, your program will need to:

Analyze each individual tweet to determine a score, a “happiness score”. The “happiness score” for a single tweet is found by looking for certain keywords (which are given) in a tweet and, for each keyword found in that tweet, totaling their “sentiment values”. In this assignment, each value is an integer from 1 to 10. The happiness score for the tweet is simply the sum of the “sentiment values” for keywords found in the tweet divided by the number of keywords found in the tweet. If there are none of the given keywords in a tweet, it is just ignored, i.e., you do NOT count it.

To determine the words in a tweet, you should do the following:
o Separate a tweet into words based on white space. A “word” is any sequence of characters surrounded by white space (blank, tab, end of line, etc.).
o You should remove any punctuation from the beginning or end of the word. So, “#lonely” would become “lonely” and “happy!!” would become “happy”.
o You should convert the “word” into just lower case letters. This gives you a “word” from the tweet.
o If you match the “word” to any of the sentiment keywords (see below), you add the score of that sentiment keyword to a total for the tweet; you can just do exact matches.

A “counted tweet” is a tweet in which there was at least one matched keyword.

The “happiness score” for a timezone is just the total of the scores for all the counted tweets in that region divided by the number of counted tweets in that region; again, if a tweet has NO keywords, then it is NOT counted as a tweet in that timezone.

A file called tweets.txt contains the tweets and a file called keywords.txt contains keywords and scores for determining the “sentiment” of an individual tweet. These files are described in more detail below.

File tweets.txt
The file tweets.txt contains the tweets; one per line (some lines are quite long).
The format of a tweet is:
[lat, long] value date time text
where:
[lat, long] - the latitude and longitude of where the tweet originated. You will need these values to determine the timezone in which the tweet originated.
value - not used; this can be skipped.
date - the date of the tweet; not used, this can be skipped.
time - the time of day that the tweet was sent; not used, this can be skipped.
text - the text in the tweet.

File keywords.txt
The file keywords.txt contains sentiment keywords and their “happiness scores”; one per line. The format of a line is:
keyword, value
where:
keyword - the keyword to look for.
value - the value of the keyword; values are limited to 1, 5, 7 and 10, where 1 represents “very unhappy” and 10 represents “very happy”.
(You are free to explore different sets of keywords and values at your leisure for the sheer fun of it!)

Determining timezones across the continental United States
Given a latitude and longitude, the task of determining exactly the location that it corresponds to can be very challenging given the geographical boundaries of the United States. For this assignment, we simply approximate the regions corresponding to the timezones by rectangular areas defined by latitude and longitude points. Our approximation looks like:

p9    p7    p5    p3    p1
p10   p8    p6    p4    p2

So the Eastern timezone, for example, is defined by latitude-longitude points p1, p2, p3, and p4. To determine the origin of a tweet, then, one simply has to determine in which region the latitude and longitude of the tweet belongs.
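The rectangle lookup described above can be sketched as follows. This is only an illustration under stated assumptions, not the required solution: the function name find_timezone and the constant names are hypothetical, the boundary numbers are the corner values the assignment lists for p1 through p10, and the handout does not say how to treat a point lying exactly on a boundary, so this sketch assigns it to the first matching (more eastern) region.

```python
# Latitude band shared by all four rectangles (from the assignment's points).
LAT_MIN = 24.660845   # latitude of p2, p4, p6, p8, p10
LAT_MAX = 49.189787   # latitude of p1, p3, p5, p7, p9

# Longitudes of the vertical boundaries, east to west.
LON_P1 = -67.444574
LON_P3 = -87.518395
LON_P5 = -101.998892
LON_P7 = -115.236428
LON_P9 = -125.242264

# (name, western edge, eastern edge), in the order results are reported.
REGIONS = [
    ("Eastern", LON_P3, LON_P1),
    ("Central", LON_P5, LON_P3),
    ("Mountain", LON_P7, LON_P5),
    ("Pacific", LON_P9, LON_P7),
]


def find_timezone(lat, lon):
    """Return the region name for a coordinate, or None if it is outside all regions."""
    if not (LAT_MIN <= lat <= LAT_MAX):
        return None
    for name, west, east in REGIONS:
        # Assumption: boundaries are inclusive and the first match wins,
        # so a point on a shared edge goes to the more eastern region.
        if west <= lon <= east:
            return name
    return None
```

Because all four rectangles share the same latitude band, only the longitude has to be compared against the region edges once the latitude check passes.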
The values of the points are:
p1 = (49.189787, -67.444574)
p2 = (24.660845, -67.444574)
p3 = (49.189787, -87.518395)
p4 = (24.660845, -87.518395)
p5 = (49.189787, -101.998892)
p6 = (24.660845, -101.998892)
p7 = (49.189787, -115.236428)
p8 = (24.660845, -115.236428)
p9 = (49.189787, -125.242264)
p10 = (24.660845, -125.242264)

From west to east, the regions between consecutive pairs of points are: Pacific (p9/p10 to p7/p8), Mountain (p7/p8 to p5/p6), Central (p5/p6 to p3/p4), and Eastern (p3/p4 to p1/p2).

Functional Specifications:
Developing code for the processing of the tweets and sentiment analysis.

1. Your module sentiment_analysis.py must include a function compute_tweets that has two parameters. The first parameter will be the name of the file with the tweets and the second parameter will be the name of the file with the keywords. This function will use these two files to process the tweets and output the results. This function should also check to make sure that both files exist; if either does not exist, then your program should generate an exception and the function compute_tweets should return an empty list (see the note in part 1.c below).

a. The function should input the keywords and their “happiness values” and store them in a data structure in your program (the data structure is of your choice, but you might consider a list).

b. Your function should then process the file of tweets, computing the “happiness score” for each tweet and computing the “happiness score” for each timezone. You will need to read the file of tweets line by line as text and break it apart. The string processing functions in Python (see Chapter 7) are very useful for doing this. Your program should not duplicate code. It is important to determine places where code can be reused and create functions. Your program should ignore tweets with no keywords and also ignore tweets from outside the time zones.

c. Your function, compute_tweets, should return a list of tuples. The tuples contain the results of each of the regions, in order: Eastern, Central, Mountain, Pacific.
Each tuple contains two values: (average, count), where average is the average “happiness value” of that region and count is the number of tweets found in that region. Note: if there is an exception from a file name that does not exist, then an empty list should be returned.

2. Your main program, main.py, will prompt the user for the names of the two files: the file containing the keywords and the file containing the tweets. It will then call the function compute_tweets with the two files to process the tweets using the given keywords. Your main program will get the results from compute_tweets (a list) and print the results; it should print the results in a readable fashion (i.e., not just numbers).

3. You are also given a program, driver.py, and some test files. The test files are small files of tweets and keywords that driver.py uses to test your program; that is, it will import your program and will make use of the function compute_tweets. The files tweets1.txt and tweets2.txt are small files with tweets and the files key1.txt and key2.txt contain keywords and “happiness values”. The program driver.py will use these to test your function. You should use the program and these files to test your code. Note: while driver.py does some testing, it is by no means guaranteed to test for all possibilities; you should do some of your own testing.

Additional Information
For both files, it is advised that when you read in the files you use one of the lines below to avoid encoding errors:
    open("fileName.txt", "r", encoding="utf-8")
or
    open('fileName.txt', encoding='utf-8', errors='ignore')

Non-functional Specifications:
1. Include brief comments in your code identifying yourself, describing the program, and describing key portions of the code.
2. Assignments are to be done individually and must be your own work. Software may be used to detect cheating.
3. Use Python coding conventions and good programming techniques, for example:
o Meaningful variable names
o Conventions for naming variables and constants
o Use of constants where appropriate
o Readability: indentation, white space, consistency

You should submit the files main.py and sentiment_analysis.py (others are not required). Make sure you attach your Python files to your assignment; DO NOT put the code inline in the textbox.

Functional specifications:
o Is the program named correctly for testing, i.e., is the module correctly named sentiment_analysis.py?
o Is there a function compute_tweets?
o Is there a program main.py which imports and makes use of the module sentiment_analysis.py?
o Does the program behave according to specifications?
o Does it work with the test program, driver.py?
o Does the program handle incorrect function names?
o Is there an effective use of functions beyond compute_tweets?
o Is the output according to specifications?
Note: A program like driver.py and test files will be used to test your program as well.

Non-functional specifications: as described above.

[41.298669629999999, -81.915329330000006] 6 2011-08-28 19:02:36 Work needs to fly by ... I'm so excited to see Spy Kids 4 with then love of my life ... ARREIC
[33.702900329999999, -117.95095704000001] 6 2011-08-28 19:03:13 Today is going to be the greatest day of my life. Hired to take pictures at my best friend's gparents 50th anniversary. 60 old people. Woo.
[38.809954939999997, -77.125144050000003] 6 2011-08-28 19:07:05 I just put my life in like 5 suitcases
[27.994195699999999, -82.569434900000005] 6 2011-08-28 19:08:02 @Miss_mariiix3 is the love of my life
[18.41688954, -65.971682740000006] 6 2011-08-28 19:11:58 Wahhhhhh I need to figure out what to do wifff my life #lost
[34.01402392, -81.016954580000004] 6 2011-08-28 19:14:33 Ready for a change in my life
[44.775352220000002, -93.511740079999996] 6 2011-08-28 19:15:24 My life be like. http://t.co/dimoc0m
[49.258952729999997, -123.09881353] 6 2011-08-28 19:15:46 @MetaRayMek EVERY DAY OF MY LIFE
[33.929915829999999, -84.509663439999997] 6 2011-08-28 19:16:58 Had the best weekend of my life .. All smile MDR I love you
[41.923916200000001, -88.777469199999999] 6 2011-08-28 19:24:18 My life is a moviee.
[39.734711050000001, -86.406778689999996] 6 2011-08-28 19:24:38 @iAmXquisite lol I can't help it that I work everyday of my life! Once I get a day off.. you'll be the FIRST person I text :)
[32.724921999999999, -96.656328000000002] 6 2011-08-28 19:26:40 Moving forward Gota live my life n stay focused for my dream
[33.657056169999997, -86.662976869999994] 6 2011-08-28 19:27:06 Been reading n studying since I woke up n I'll b doing it til I fall asleep......story of my life
[32.626591400000002, -83.638168800000003] 6 2011-08-28 19:30:32 Wait a minute maybe I put it the wrong way once squad always squad I jux need to get my life together and make sure I'm str8 y'all know SOE
[50.929687190000003, -114.03530239] 6 2011-08-28 19:31:44 you two get me through everything. im so incredibly happy to have you both in my life. I love you so much. seriously. the best people I know
[29.9311826, -90.050494999999998] 6 2011-08-28 19:31:53 Snap fitness in da a.m. @iDARR_you I NEED IT IN MY LIFE
[36.8385642, -76.105177100000006] 6 2011-08-28 19:33:06 I just sang along to All My Life on 107.7! :D
[40.048521000000001, -82.924336800000006] 6 2011-08-28 19:42:16 For the rest of my life, I promise myself I will love me first.
[33.749389299999997, -116.9860817] 6 2011-08-28 19:44:14 Top Ramen noodles. For the 5th year in my life. An I still think they're great x) I just wish i could cook em quicker x)
[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol.
[37.715399429999998, -89.21166221] 6 2011-08-28 19:45:41 Ate more veggie and fruit than meat for the first time in my life
[22.190894440000001, -100.99684750999999] 6 2011-08-28 19:48:11 @Karymitaville you can dance you can give having the time of my life(8) ABBA :D
[40.311551649999998, -79.963715829999998] 6 2011-08-28 19:48:55 Haha I have no idea what I want to do with my life.
[38.992178320000001, -76.898361480000005] 6 2011-08-28 19:51:15 @PrivilegedQueen lls crazy love lls I'm living my life?
[40.748330000000003, -73.878609999999995] 6 2011-08-28 19:52:47 Sometimes I wish my life was a movie ; #unreal I hate the fact I feel lonely surronded by so many ppl
[33.93656395, -84.521893829999996] 6 2011-08-28 19:54:05 I let people take advantage of me most of my life but not anymore #thatsreal
[37.786221300000001, -122.1965002] 6 2011-08-28 19:55:26 I wish I could lay up with the love of my life And watch cartoons all day.
[32.703995999999997, -89.648030000000006] 6 2011-08-28 19:56:27 Homework....story of my life. =T (@ Perry Place) http://t.co/3YhBp8O
[37.004904949999997, -86.451382580000001] 6 2011-08-28 19:56:32 I know I have done a lot of mistakes in my life but now I have done what's right n best for me. #clearheadedtweet
[37.945367150000003, -87.411140849999995] 6 2011-08-28 20:00:46 My 450th tweet goes out to @fallon_salder for being in my life for 5 years, love youu so much.
[42.29619864, -82.88389823] 6 2011-08-28 20:07:56 I wish I had a pepsi slushee in my life
[39.944235599999999, -82.962411200000005] 6 2011-08-28 20:11:52 I never laughed that hard alone in my life lol
[41.647450020000001, -111.92970578000001] 6 2011-08-28 20:13:07 Hardest mission of my life
[39.9603003, -75.238981100000004] 6 2011-08-28 20:13:32 For the rest of my life I promise myself I will love me FIRST genuinely <3
[33.201227439999997, -117.24271275] 6 2011-08-28 20:21:59 sorry peppertree... just kind of wanting a little sonic in my life!!! (@ sonic drive-in) http://t.co/ajvudhz
[34.085503869999997, -118.37565850999999] 6 2011-08-28 20:22:42 hahaha my life #beverlycenter
[31.9685691, -99.958251799999999] 6 2011-08-28 20:23:52 #mynewfollowers whattup? nice having all of ya'll on board. my life's a trip! #respect
[39.160368390000002, -76.723692600000007] 6 2011-08-28 20:27:13 @smilejusbcz welcome to my life for the past year.
[32.437866, -80.638589999999994] 6 2011-08-28 20:28:05 i love working at my old job; but i have to get my life together
[30.324021089999999, -81.655625760000007] 6 2011-08-28 20:31:19 i don't really care people follow me or not i'm still goign to live my life
[39.827189079999997, -86.219724409999998] 6 2011-08-28 20:42:38 @katelynn_rose that basically summarizes my life.
[39.282490600000003, -75.591431299999996] 6 2011-08-28 20:45:22 i just witnessed rick ankiel make the best throw i've ever saw in my life dawg... i almost cried after that...
[48.939800699999999, -122.78351676] 6 2011-08-28 17:19:05 only wearing vans for the rest of my life
[48.939800699999999, -122.78351676] 6 2011-08-28 17:19:05 only wearing vans for the rest of my life
[19.367807760000002, -99.165478640000003] 6 2011-08-28 17:21:01 el mejor buffet ever in my life jajajajaja
[43.117619509999997, -87.961661210000003] 6 2011-08-28 17:22:08 i'm living my life with no regrets, just lessons.
[35.621957100000003, -78.495483300000004] 6 2011-08-28 17:23:00 soo the love of my life txts me then my first txts me right after him ... awkward
[30.479496019999999, -91.149228489999999] 6 2011-08-28 17:41:45 my life you entertainment, you watch it while i live it...
[29.8885696, -97.921196320000007] 6 2011-08-28 17:41:50 my life is complete! rt "@stevenjotv: @nomatiic thanks u ! i love u 2!"
[40.005128419999998, -82.955168380000003] 6 2011-08-28 17:46:25 my life iz a movie and looking for my main actress
[40.606001919999997, -73.563368839999995] 6 2011-08-28 17:48:13 @deekug i wish i had that in my life right now, baked ziti=muy bueno. save me some leftovers please
[35.108278089999999, -92.458651119999999] 6 2011-08-28 17:48:47 my life is soo much better when your by my side
[32.769935480000001, -96.769793800000002] 6 2011-08-28 17:49:39 - first time i swore i'd never forget you, this time, i promise you i won't spend another min in my life thinking of you.
[30.299692799999999, -97.742068700000004] 6 2011-08-28 17:51:15 @realshanderson is officially in charge of approving my life decisions.
[50.96866765, -114.00972145] 6 2011-08-28 17:53:01 i'm not going to text you so you don't have my number, then you won't come back into my life whenever you please.
[38.257180310000003, -85.763450860000006] 6 2011-08-28 17:56:03 finally found the worlds largest bat! my life no longer has purpose @ louisville slugger museum & factory http://t.co/fiytimk
[40.40726471, -80.094566349999994] 6 2011-08-28 17:57:08 she's blowing my life
[32.455104300000002, -93.769132999999997] 6 2011-08-28 18:03:58 rt @thtgtdmnf_staff: fck my life rt @irideyourtounqe: wat #fml mean??> ohh dang lolss
[35.144727199999998, -80.8895421] 6 2011-08-28 18:07:13 Have a chance to live like. Celeb for a day I mean really they need to think of.this like for once. I wanna see my life on tv for once
[39.094157699999997, -77.146576300000007] 6 2011-08-28 18:08:28 But it could b worse. Im Jus waiting for things to get better in my life
[41.521142810000001, -74.059786549999998] 6 2011-08-28 18:12:01 @nic0ke omg really Nicole I have never felt that way in my life! My chest really hurt and I couldn't breath. It was #scary
[29.68442001, -98.114491950000001] 6 2011-08-28 18:12:40 Idk anymore. The love of my life won't talk to anymore, and my bestfriend decided to walk out of my life. Ryan and Josh.<3 ...
[37.707042129999998, -97.281977139999995] ...
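
Tweet lines like the ones above can be taken apart and scored roughly as follows. This is a minimal sketch, not the assignment's reference solution: the helper names parse_keywords, parse_tweet_line and tweet_score are hypothetical, and the keywords are stored in a dictionary here purely for convenient exact-match lookup (the handout leaves the data structure up to you and suggests a list).

```python
from string import punctuation


def parse_keywords(lines):
    """Build a {keyword: value} dict from 'keyword, value' lines."""
    keywords = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        word, value = line.rsplit(",", 1)
        keywords[word.strip().lower()] = int(value)
    return keywords


def parse_tweet_line(line):
    """Split one '[lat, long] value date time text' line into (lat, lon, text)."""
    coords, rest = line.strip().split("]", 1)
    lat_str, lon_str = coords.lstrip("[").split(",")
    # value, date and time are not used; everything after them is the text.
    fields = rest.split(maxsplit=3)
    text = fields[3] if len(fields) > 3 else ""
    return float(lat_str), float(lon_str), text


def tweet_score(text, keywords):
    """Return the tweet's happiness score, or None if no keyword matched."""
    total = 0
    matches = 0
    for raw_word in text.split():
        # Strip leading/trailing punctuation, then lower-case, as specified.
        word = raw_word.strip(punctuation).lower()
        if word in keywords:
            total += keywords[word]
            matches += 1
    if matches == 0:
        return None  # not a "counted tweet"
    return total / matches
```

For example, with a made-up keyword list containing only "love, 10" and "hate, 1", the text "I love it but I hate it" matches two keywords and scores (10 + 1) / 2, while a tweet with no matches returns None and would be skipped when averaging a region.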