Assignment is attached below
Project Specification
301116 Social Media Intelligence (Quarter 2, 2019)
Due Date: Friday of Week 10

Aim
The Project provides us with a chance to analyse social media using the knowledge obtained from this unit, with assistance from a computer-based statistical package. For this project, we will focus on Twitter.

Method
To complete this project:
1. Read through this specification.
2. Form groups of 3 students. This project is to be done in a group. Group work does not mean that the work should be divided up and done in isolation; rather, each member may be allocated a task as part of the group activity and should complete that task themselves. The group should set aside at least one weekend before the due date to meet and select the best parts of each member’s contribution while creating an integrated solution.
3. Complete the data analysis required by the specification.
4. Write up your analysis using your favourite word processing/typesetting program, making sure that all of the working is shown and that it is presented well.
5. Include the student declaration text on the front page of your report. Please make sure that your name and student number are clearly displayed on the front page.
6. Each group should submit one report as a PDF file by the due date.

Due date and Submission
The project report is due by 11:59 p.m. on the Friday of Week 10. The report must be submitted as a PDF file using the assignment submission facilities in the Project section of 301116 Q2 2019 in vUWS.

Report Format
Once the required analysis is performed, write up the analysis as a report. Remember that the assessor will only see the report and will be marking the analysis based on your report. Therefore, the report should contain a clear and concise description of the procedures carried out, the analysis of results and any conclusions reached from the analysis.
Twitter: http://www.twitter.com/

The required analysis in this specification covers the material presented in lectures and labs. Students should use the computer software R to carry out the required analysis and then present the results from the analysis in the report.

Marks
This project is worth 30% of your final grade, and so the project will be marked out of 30. The project consists of five parts, each contributing equally to your final mark. Each part will be marked using the following criteria:

Marks  Criteria Satisfied
0      The method does not lead to insightful analysis.
1      The method is flawed, but the analysis would have provided insight had the method been correct.
2      The method provides the reader with some understanding of the data.
3      The correct method leads to partially correct results and analysis.
4      The correct method leads to correct results and analysis.
5      The correct method leads to correct results and analysis, with an insightful aim and conclusion.

Also, the overall quality of the report will be marked using the following criteria:

Marks  Criteria Satisfied
0      The report is poorly formatted and has little structure.
1      The report is structured to allow the reader to identify each problem.
2-3    The report is clearly structured and formatted.
4      The report is clearly structured and formatted, and correct grammar and mathematical notation are used.
5      The report has the look of a professional report (clearly structured and formatted, correct grammar and mathematical notation, and the report is written so that it flows from start to end, not as a series of five disjoint problems).

If a report is submitted late, the maximum mark it can achieve will be reduced by 10% (3 marks) per day. For example, if a report is submitted five days late, it can receive at most 15 marks.

Declaration
The following declaration must be included in a clearly visible and readable place on the first page of the report.
By including this statement, we the authors of this work, verify that:
• I hold a copy of this assignment that we can produce if the original is lost or damaged.
• I hereby certify that no part of this assignment/product has been copied from any other student’s work or from any other source except where due acknowledgement is made in the assignment.
• No part of this assignment/product has been written/produced for us by another person except where such collaboration has been authorised by the subject lecturer/tutor concerned.
• I am aware that this work may be reproduced and submitted to plagiarism detection software programs for the purpose of detecting possible plagiarism (which may retain a copy on its database for future plagiarism checking).
• I hereby certify that we have read and understand what the School of Computing and Mathematics defines as minor and substantial breaches of misconduct as outlined in the learning guide for this unit.

Note: An examiner or lecturer/tutor has the right not to mark this project report if the above declaration has not been added to the cover of the report.

Project Description
The “Let’s all be friends” political party has hired you as a consultant to examine existing relationships between political parties. They want to determine if the political parties are disjoint or if most of the members are friends, with only a few members making the parties seem like enemies. Your team leader has assigned the analysis to you with the following general instructions.
The following R libraries are required for manipulating Twitter details:

    library("rtweet")
    library("twitteR")

The following Twitter information is available as a download file, politicians.RData, on vUWS:
Twitter id and the ids of a set of 10 friends (not followers) from each of:
• Scott Morrison (Liberal Party of Australia)
• Bill Shorten (Australian Labor Party)
• Michael McCormack (National Party of Australia)

Note: Assuming the file is in your current working directory, you may load it in R using:

    load("politicians.RData")

The following variables will be in memory once you load the file:

b: Bill Shorten’s Twitter details, which were obtained using b = getUser("billshortenmp")
bFds: Bill Shorten’s friends, to a maximum of 300. The friends were downloaded using, say, y = b$getFriends(300)
bFds.10MostFollowers.100friends: 100 friends of each of the top 10 friends in bFds. The top 10 friends in bFds were chosen by the follower count of each of the 300 friends of Bill Shorten. Only the top 10 were considered, to reduce the complexity of the network.

Note that friends are stored in the list y, so the first friend can be accessed using, say, y[[1]].

Similarly, s, sFds and sFds.10MostFollowers.100friends hold the values for Scott Morrison, and m, mFds and mFds.10MostFollowers.100friends hold Michael McCormack’s details.

Question 1: Creating the network from Twitter data
a) How many friends does each politician have?
b) What are their screen names?
c) Are there any common friends between Bill Shorten and Scott Morrison? If so, how many?
d) Are there any common friends among the three politicians? If so, how many?
e) Find the follower count of each user in bFds, sFds and mFds. Who is the most popular based on follower count?
f) Find the screen names of each of the users in sFds.10MostFollowers.100friends, mFds.10MostFollowers.100friends and bFds.10MostFollowers.100friends.
g) Find all unique users in your system.
Note that some of the politicians’ friends may be shared, and the friends of friends may be shared as well. Hence the number of unique users will generally be smaller than the total obtained by adding up the individual lists.
h) Create an edge list of the screen name of Bill Shorten and the screen names of all his friends. The first column of the edge list should hold a user’s screen name and the second column should hold the screen name of one of that user’s friends.
i) Create another edge list of the screen name of each friend of Bill Shorten in bFds and his/her/its 100 friends.
j) Create an edge list for the whole network by following the principles in (h) and (i).
k) Create an undirected graph from the edge list created in (j), using the igraph library’s graph.edgelist() function.

Question 2: Plotting the graph
a) Find the vertex in the graph with the highest degree.
b) Change the size of each vertex in the graph so that a vertex with a higher degree has a larger area than a vertex with a lower degree.
c) Get the adjacency matrix from the graph.
d) Find the position (i.e. x-y co-ordinate) of each of the three politicians in the adjacency matrix.
e) Find the edges in the adjacency matrix from each politician to his friends.
f) Set the colour of edges from Bill Shorten to his friends to Red, from Scott Morrison to his friends to Blue and from Michael McCormack to his friends to Orange.
g) Create a sub-graph containing only those vertices with degree greater than 3.
h) Plot the sub-graph.
i) Which politician appears to be most popular?
j) Determine if there are friend relationships between the friends.

Question 3: Analysis of the data
a) Assuming each politician has a positive relationship with each of their friends but the three politicians have negative relationships with each other, set the corresponding edges to appropriate values (+1 or -1).
b) Compute the density and neighbourhood overlap of each edge and determine which nodes have the greatest social capital.
c) Determine whether there is homophily between the Labor and combined Liberal/National parties. To do this, assume that any friends of a politician share their political views. Use a significance level of α = 0.05.

Write up a report containing your code and analysis of the data, with each section clearly labelled. Clearly annotate your code and make sure to state any conclusions you make from each piece of analysis.
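As an illustration of the set operations and edge-list construction asked for in Question 1 (c), (g) and (h) to (k), the following is a minimal sketch on toy data. The screen names and the variables bFriends and sFriends are hypothetical stand-ins and are not taken from politicians.RData. The sketch uses graph_from_edgelist(), which is the current igraph name for the graph.edgelist() function mentioned in the specification.

```r
library(igraph)

# Hypothetical screen names standing in for two politicians' friend lists.
bFriends <- c("alice", "bob", "carol")
sFriends <- c("bob", "carol", "dave")

# Common friends are the intersection of the two screen-name vectors.
common <- intersect(bFriends, sFriends)

# Unique users: pool every name, then drop the duplicates.
allUsers <- unique(c("polB", "polS", bFriends, sFriends))

# Two-column edge list: the user in column 1, one of their friends in column 2.
# cbind() recycles the single politician name down the first column.
edges <- rbind(
  cbind("polB", bFriends),
  cbind("polS", sFriends)
)

# Undirected graph built from the character edge list.
g <- graph_from_edgelist(edges, directed = FALSE)
```

On the real data the friend vectors would first have to be extracted from the loaded user objects; that step is left out here because it depends on the contents of the download file.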
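The degree, adjacency-matrix and sub-graph steps of Question 2 can be sketched in the same way. The edge list below is hypothetical toy data, and the degree threshold of 2 is chosen only so that the small example keeps some vertices; the specification asks for a threshold of 3 on the real network.

```r
library(igraph)

# Hypothetical toy edge list (user in column 1, friend in column 2).
edges <- rbind(
  c("polB", "alice"), c("polB", "bob"), c("polB", "carol"), c("polB", "dave"),
  c("polS", "bob"), c("polS", "carol"), c("polS", "dave"),
  c("bob", "carol")
)
g <- graph_from_edgelist(edges, directed = FALSE)

# Degree of every vertex, and the name of the vertex with the highest degree.
deg <- degree(g)
topVertex <- names(deg)[which.max(deg)]

# A circle's area grows with the square of its radius, so a size
# proportional to sqrt(degree) gives an area proportional to degree.
V(g)$size <- 5 * sqrt(as.numeric(deg))

# Dense adjacency matrix with screen names as row and column names.
A <- as_adjacency_matrix(g, sparse = FALSE)
posB <- which(rownames(A) == "polB")         # row/column index of polB
friendsOfB <- colnames(A)[A["polB", ] > 0]   # non-zero entries mark polB's edges

# Colour edges by the politician they touch; all other edges stay grey.
el <- as_edgelist(g)
E(g)$color <- "grey"
E(g)$color[el[, 1] == "polB" | el[, 2] == "polB"] <- "red"
E(g)$color[el[, 1] == "polS" | el[, 2] == "polS"] <- "blue"

# Sub-graph of the vertices whose degree exceeds the (toy) threshold.
sub <- induced_subgraph(g, names(deg)[deg > 2])
```

Calling plot(sub) would then draw the sub-graph with the vertex sizes and edge colours set above.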
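For Question 3 (b) and (c), one possible approach is sketched below on a hypothetical toy network of two tightly knit groups joined by a single edge. The helper neighbourhoodOverlap() is an illustrative name, not part of igraph. The homophily check compares the observed number of cross-group edges with the fraction 2pq expected under random mixing, using a one-sided binomial test; this choice of test and its treatment of edges as independent trials are assumptions, and the method taught in the unit may differ.

```r
library(igraph)

# Hypothetical toy network: two fully connected groups of four, one bridge.
groupA <- c("a1", "a2", "a3", "a4")
groupB <- c("b1", "b2", "b3", "b4")
edges <- rbind(t(combn(groupA, 2)), t(combn(groupB, 2)), c("a4", "b1"))
g <- graph_from_edgelist(edges, directed = FALSE)
A <- as_adjacency_matrix(g, sparse = FALSE)

# Density of the graph: edges present divided by edges possible.
graphDensity <- edge_density(g)

# Neighbourhood overlap of edge (u, v): neighbours shared by u and v divided
# by neighbours of either, with u and v themselves excluded.
neighbourhoodOverlap <- function(A, u, v) {
  nu <- setdiff(colnames(A)[A[u, ] > 0], v)
  nv <- setdiff(colnames(A)[A[v, ] > 0], u)
  either <- union(nu, nv)
  if (length(either) == 0) return(0)
  length(intersect(nu, nv)) / length(either)
}
el <- as_edgelist(g)
overlap <- mapply(function(u, v) neighbourhoodOverlap(A, u, v), el[, 1], el[, 2])

# With group proportions p and q = 1 - p, random mixing would place a
# fraction 2pq of edges across groups; notably fewer suggests homophily.
party <- ifelse(V(g)$name %in% groupA, "A", "B")
names(party) <- V(g)$name
p <- mean(party == "A")
expectedCross <- 2 * p * (1 - p)
crossEdges <- sum(party[el[, 1]] != party[el[, 2]])

# One-sided binomial test of the observed cross-group edge count
# (assumes edges behave as independent trials, a simplification).
test <- binom.test(crossEdges, nrow(el), p = expectedCross, alternative = "less")
homophily <- test$p.value < 0.05
```

In this toy network the bridge edge has no shared neighbours, which is the pattern associated with edges that connect otherwise separate groups.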