given files
Microsoft Word - Assignment1 SP2 2020.ipynb { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#
COMP 5070 Statistical Programming for Data Science\n", "##
Assignment 1\n", "###
DUE: by 11.00pm (CST), Sunday 19 April, 2020" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "•\tThis assignment is worth
25% of your overall grade. \n", "•\t
The assignment is out of 100 marks. To obtain the maximum available marks you should aim to:\n", "\n", " 1. Code all requested components
(60%). \n", " 2. Provide a written analysis of the output produced by your code
(20%). \n", " 3. Aim for optimised code in terms of computational overhead
(5%). It is not always possible to avoid loops; however, you should avoid them where possible (e.g. use NumPy vectorisation as much as possible). \n", " 4. Use proper coding style
(5%). Code clarity is an important part of your submission. Thus you should choose meaningful variable names and adopt the use of comments - you don't need to comment every single line, as this will affect readability - however you should aim to comment at least each section of code. \n", " 5. Have the code run successfully when I try to run it
(5%). If you have special files not pre-supplied with the assignment, you should provide these as a final part of your submission (ask how if you're unsure). Also, **do not hardcode** your computer's directory path into your program - **I should be able to open your .ipynb file and run the code successfully without editing your code.** \n", " 6. Document any code limitations including, but not limited to, the requested functionalities
(5%). \n", " \n", "\n", "•\tAssignments can be submitted as either a .zip file (if you have extra files to supply) or as a single *.ipynb* file using LearnOnline. This includes the coversheet. A partially filled-in coversheet is provided below. You will need to fill in the fields marked in blue. \n", "•\tPlease sign your coversheet (typewritten name acceptable). \n", "•\tWhen answering the following questions, do not include everything you tried that worked/did not work - present only your final solution. \n", "•\tAssignments submitted late, without an extension being granted, will attract a penalty of 10 marks for each day or any part thereof beyond the due date and time. \n", "•\t
*Plagiarism is a specific form of academic misconduct.* Although the University encourages discussing work with others and the Social Forum will support this, ultimately this assignment is to represent your individual work. If plagiarism is found, all parties will be penalised. You should retain copies of all your assignment computer files used during development of the solution to Assignment 1. These files must remain unchanged after submission, for the purpose of checking if required. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "##
Assignment Cover Sheet\n", "\n", " \n", "**This cover sheet needs to be included with your assignment.**
Fields highlighted in blue need to be completed before submission.\n", "\n", "**You can complete the coversheet within this .ipynb file, replacing the text *(insert ... here ... a line break)* with the requested information.**\n", "\n", "\n", "
**Name:**
*(insert name here and append with two spaces to create a line break)* \n", "
**Student ID:** *(insert ID here and append with two spaces to create a line break)* \n", "
**e-mail:** *(insert e-mail here and append with two spaces to create a line break)* \n", "**Course code and title:** Statistical Programming for Data Science \n", "**School:** ITMS \n", "**Program Code:** COMP 5070 \n", "**Course Coordinator:** Dr Tim Bogomolov \n", "**Assignment number:** 1\t\n", "**Due date:**
19/04/20 by 11:00pm\n", "**Assignment topic as stated in Outline:** Data manipulation in Python using NumPy and Pandas, including outputting information based on user-driven queries.\n", "\n", "\n", "**Further Information:** (e.g. state if extension was granted and attach evidence of approval, Revised Submission Date)\n", "________________________________________\n", "\n", " \n", "________________________________________\n", "\n", "*I declare that the work contained in this assignment is my own, except where acknowledgement of sources is made.*\n", "\n", "*I authorise the University to test any work submitted by me, using text comparison software, for instances of plagiarism. I understand this will involve the University or its contractor copying my work and storing it on a database to be used in future to test work submitted by others.*\n", "\n", "*I understand that I can obtain further information on this matter at http://www.unisa.edu.au/ltu/students/study/integrity.asp*\n", "\n", "*Note: The attachment of this statement on any electronically submitted assignments will be deemed to have the same authority as a signed statement.*\n", "\n", "\n", "
Signed: *(insert signature here (typed is OK) and append with two spaces to create a line break)* \n", "\n", "
Date: *(insert date here and append with two spaces to create a line break)* \n", "\n", "**Date received from student:** \n", "\n", "\n", "**Assessment/grade and Comments:** " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " \n", "
\n", "\n", "## We Feel Fine! (50 marks)\n", "\n", "This task is dedicated to the memory of the project **[We Feel Fine](http://www.wefeelfine.org)**. This project aimed to provide a data collection engine that periodically scoured the Internet to record human emotions or feelings from different blogs from sources such as LiveJournal, Blogger, Flickr, Technorati, Feedster, Ice Rocket and Google. \n", "\n", "We Feel Fine scanned blog posts for occurrences of the phrases \"I feel\" and \"I am feeling\". Once a sentence was found to contain either of these key phrases, the full sentence was saved and scanned to see if it included one of about 5,000 \n", "pre-identified feelings. The full list of feelings (link given in (3) below) contains these valid feelings, as well as the total count of each feeling and the colour assigned to each feeling. For example, the first few lines of the file are:\n", "\n", "
\n", "\n", "The first line can be discarded. The second line contains the feeling (better), the count (872884) and the associated colour in hexadecimal (FFA401). I have provided code in the hints section that will show you how to use this colour. \n", "\n", "**The project website no longer works, and you do not need it for the assignment.** You are provided with a zip file **countries.zip** containing an archived folder **countries** with feelings collected for different countries.\n", "\n", "**For this assignment you are asked to write a program that will analyse and visualise mined feelings from the We Feel Fine data sets based on a default search and then user-driven searches. Note: You do NOT have to search for the phrases \"I feel\" and \"I am feeling\" as We Feel Fine have already done this work for you. We are going to analyse what they found.** \n", "\n", "There are five components in this job: user prompt, data loading, data