Project Specification
COMP7025 Social Media Intelligence

Aim

The project requires us to analyse social media data using the knowledge obtained from this unit, with assistance from a computer-based statistical package. For this project, we will focus on Twitter.

Method

To complete this project:

1. Read through this specification.
2. Complete the data analysis required by the specification.
3. Write up your analysis using your favourite word processing/typesetting program, making sure that all of the working is shown and that it is presented well.
4. Include the student declaration text on the front page of your report. Please make sure that your name and student number are clearly displayed on the front page.
5. Submit the report as a PDF by the due date.

Report Format

Once the required analysis is performed, write up the analysis as a report. Remember that the assessor will only see the report and will mark the analysis based on it. The report should therefore contain a clear and concise description of the procedures carried out, the analysis of results, and any conclusions reached from the analysis.

The required analysis in this specification covers material presented in lectures and labs. Students should use R to carry out the required analysis and then present the results in the report.

Marks

This project is worth 30% of your final grade, so it is marked out of 30. The project consists of six parts, and each part contributes equally to your final mark. Each part is marked out of 5 using the following criteria:

Marks  Criteria Satisfied
0      The method does not lead to insightful analysis.
1      The method is flawed, but the analysis would have provided insight had the method been correct.
2      The correct method leads to partially correct results and analysis.
3      The correct method leads to correct results and analysis.
4      The correct method leads to correct results and analysis, with an insightful aim and conclusion.
5      The correct method leads to correct results and analysis, with an insightful aim and conclusion. Limitations of the analysis are identified and suggestions for further analysis are provided.

If a report is submitted late, the maximum mark it can achieve is reduced by 10% (3 marks) per day. For example, a report submitted five days late can receive at most 15 marks.

Declaration

The following declaration must be included in a clearly visible and readable place on the first page of the report.

By including this statement, I, the author of this work, verify that:

· I hold a copy of this assignment that I can produce if the original is lost or damaged.
· I hereby certify that no part of this assignment/product has been copied from any other student's work or from any other source except where due acknowledgement is made in the assignment.
· No part of this assignment/product has been written/produced for me by another person except where such collaboration has been authorised by the subject lecturer/tutor concerned.
· I am aware that this work may be reproduced and submitted to plagiarism detection software programs for the purpose of detecting possible plagiarism (which may retain a copy on its database for future plagiarism checking).
· I hereby certify that I have read and understand what the School of Computing and Mathematics defines as minor and substantial breaches of misconduct as outlined in the learning guide for this unit.

Note: An examiner or lecturer/tutor has the right not to mark this project report if the above declaration has not been added to the cover of the report.

Project Description

A social and behavioural research group at Western Sydney University is studying social activists. They have consulted you to investigate the flow of information regarding environmental activist Greta Thunberg on Twitter.
Researchers have provided a set of tasks below that need completion. The results are to be presented at the International Social and Behaviour Change Communication (SBCC) Summit.

Perform this analysis using R with the rtweet and igraph libraries. Use the rtweet documentation to find functions that will assist your analysis:

· https://cran.r-project.org/web/packages/rtweet/vignettes/intro.html
· https://cran.r-project.org/web/packages/rtweet/rtweet.pdf

1 Followed by Greta

Find the 12 people followed by Greta who have the most followers. Use only people, not company Twitter handles. Examine the Twitter accounts and summarise the types of people.

2 Followers of Greta

Find the 12 people who follow Greta and have the most followers, and examine whether they have a positive or negative relationship with Greta based on their tweets. Examine their Twitter accounts and summarise the types of people.

3 Bypassing Greta

Plot the graph containing the people followed by Greta and the 12 followers. Identify whether any of the accounts found (following or followers) are friends with each other and add these edges to the graph. Then determine whether any of the following and followers should be friends, based on their background, and add those edges to the graph.

4 Graph Statistics

Compute the diameter and density of the graph and the neighbourhood overlap of each edge, and determine which nodes have the greatest social capital. State whether the results are obvious from the graph structure and why.

5 Graph Homophily

Test whether there is homophily in the graph. To do this, label each node as either a supporter or non-supporter of Greta using the information gathered in parts 1, 2 and 3. Write out the hypotheses, the test statistic and the conclusion of the test. Use a significance level of α = 0.05.

6 Structural Balance

Finally, determine if the signed network is weakly balanced (using hierarchical clustering) and identify if any within or between signed relationships are not as expected.
To perform this analysis, first label all existing edges as either positive or negative, based on their association to Greta.

Write up a report containing your code and analysis of the data with each section clearly labelled. Clearly annotate your code and make sure to state any conclusions you make from each piece of analysis. The report is being marked using the marking criteria, so make sure that each piece of analysis covers all of the criteria. Remember that you are examining the relationship of Twitter users to Greta, so make sure that the conclusion of each section refers back to this.

## ASSIGNMENT SOCIAL MEDIA INTELLIGENCE COMP7025
## STUDENT_NAME : SUHAS THOTA
## STUDENT_ID : 19914060

# Print the R version in use
version

# Install the required packages (only needed once)
install.packages("rtweet")
install.packages("base64enc")
install.packages("httpuv")
install.packages("dplyr")
install.packages("tidytext")
install.packages("tidyr")
install.packages("textdata")

# Load the packages
library("rtweet")
library("base64enc")
library("httpuv")
library("magrittr")
library("dplyr")
library("textdata")

# (Personal app name, API keys and access tokens were listed here, commented out; they are omitted.)
### Using those personal keys resulted in an API error [403] from Twitter; to avoid this,
## I used the keys supplied in the 6a solutions.
## Twitterkeys.txt

# Authenticating with Twitter API credentials.
# The credentials are read from environment variables instead of being hard-coded;
# the environment variable names below are illustrative.
app <- "SMIProject_2023"
api_key <- Sys.getenv("TWITTER_API_KEY")
api_secret_key <- Sys.getenv("TWITTER_API_SECRET_KEY")
acc_token <- Sys.getenv("TWITTER_ACCESS_TOKEN")
acc_secret_token <- Sys.getenv("TWITTER_ACCESS_SECRET")

# Generate the token
create_token(
  app = app,
  consumer_key = api_key,
  consumer_secret = api_secret_key,
  access_token = acc_token,
  access_secret = acc_secret_token
)

# Retrieve five recent tweets mentioning Greta Thunberg (retweets excluded)
tweets <- search_tweets("Greta Thunberg", n = 5, include_rts = FALSE)
print(tweets)

################################################################################
####################### Q1.) Followed by Greta Thunberg ########################
################################################################################

# Get Greta Thunberg's friends (people followed by Greta)
friends_data <- get_friends("gretathunberg", n = 1000)

# Extract the user IDs of the friends
friend_ids <- friends_data$to_id

# Fetch complete user information for the friends
full_friends_data <- lookup_users(users = friend_ids)

# Filter out company accounts based on user description (first attempt)
filtered_friends <- full_friends_data[!grepl("^[A-Za-z0-9_]{1,15}$", tolower(full_friends_data$description)), ]
## The code above attempts to filter out accounts by matching their user description
## against a regular expression. However, the pattern "^[A-Za-z0-9_]{1,15}$" only matches
## descriptions that consist entirely of 1 to 15 alphanumeric or underscore characters.
## This pattern is ineffective at filtering out company accounts and does not provide
## meaningful results.

# Filter out company accounts based on user description (second attempt)
filtered_friends <- full_friends_data[!grepl("company|corporation|organization", tolower(full_friends_data$description)), ]
# This strategy searches the lowercase user descriptions for the words "company",
# "corporation" or "organization" using grepl() with a regular expression.
# The ! before grepl() negates the match, so rows whose description contains
# one of these words are dropped.
# filtered_friends therefore holds the accounts whose descriptions do not
# identify them as a company, and the analysis continues with these users.

# Sort filtered friends by follower count in descending order
sorted_friends <- filtered_friends[order(-filtered_friends$followers_count), ]

# Select the top 12 friends with the most followers
top_friends <- head(sorted_friends, 12)

# Summarise the types of people
summary(top_friends$description)
top_friends$description

# Keep only rows where name, location, screen name and description are all present
filtered_top_friends <- top_friends[complete.cases(top_friends$name, top_friends$location, top_friends$screen_name, top_friends$description), ]

## Group by the desired columns and summarise the type of friends
summary_friends <- filtered_top_friends %>%
  group_by(name, location, screen_name, description) %>%
  summarize(Count = n()) %>%
  ungroup()
print(summary_friends)

################################################################################
######################## Q2.) Greta Thunberg Followers #########################
################################################################################

library(tidytext) # Functions for text mining and analysis
library(dplyr)    # Tools for data manipulation and transformation
library(tidyr)    # Functions for data tidying and reshaping

# List of Greta Thunberg's followers
follower_ids <- get_followers("gretathunberg", n = 100)
# Retrieves 100 follower IDs using get_followers() from the rtweet package.

# Get the followers' profiles and sort them by the number of followers
follower_profiles <- lookup_users(users = follower_ids$from_id)
# lookup_users() takes the follower IDs as input and returns their profiles.

sorted_profiles <- follower_profiles[order(follower_profiles$followers_count, decreasing = TRUE), ]
# Sorts the profiles by follower count in descending order, so the profiles
# with the most followers are at the top.

top_followers <- head(sorted_profiles, 12)
# Selects the 12 followers who themselves have the most followers.
View(top_followers)

# Examine their relationship with Greta Thunberg based on their tweets
follower_tweets <- get_timeline(user = top_followers$id_str, n = 100)
# get_timeline() takes the user IDs of the top followers and retrieves
# 100 tweets from each of them.
View(follower_tweets)
colnames(follower_tweets)

follower_sentiments <- follower_tweets %>%
  select(in_reply_to_screen_name, text) %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(in_reply_to_screen_name, sentiment) %>%
  spread(sentiment, n, fill = 0)
View(follower_sentiments)
# Performs sentiment analysis on the follower tweets. It selects the relevant columns
# (in_reply_to_screen_name and text), tokenizes the text using unnest_tokens(),
# joins the Bing sentiment lexicon using inner_join() and counts each sentiment.
# Note that grouping by in_reply_to_screen_name counts sentiment per account
# being replied to, not per tweet author.
# Finally, it spreads the sentiment counts into separate columns using spread().
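The tokenise, join and count steps described above can be sketched on made-up data. This is a minimal illustration in base R, not the script's tidytext pipeline: the six-word lexicon, the account names and the tweet texts are all hypothetical, and the counts are grouped by a hypothetical author column.

```r
# Hypothetical mini-lexicon standing in for the Bing lexicon
lexicon <- data.frame(
  word = c("great", "hope", "brave", "fraud", "hoax", "angry"),
  sentiment = c("positive", "positive", "positive",
                "negative", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Hypothetical tweets from two made-up accounts
tweets <- data.frame(
  screen_name = c("user_a", "user_a", "user_b"),
  text = c("Great speech, so brave!", "There is hope",
           "Climate hoax. What a fraud"),
  stringsAsFactors = FALSE
)

# Tokenise: lowercase each tweet and split on anything that is not a letter
tokens <- strsplit(tolower(tweets$text), "[^a-z]+")
token_table <- data.frame(
  screen_name = rep(tweets$screen_name, lengths(tokens)),
  word = unlist(tokens),
  stringsAsFactors = FALSE
)

# Inner join: keep only tokens that appear in the lexicon
scored <- merge(token_table, lexicon, by = "word")

# Count sentiment words per account (rows) and sentiment (columns)
sentiment_counts <- table(scored$screen_name, scored$sentiment)
print(sentiment_counts)
```

An account whose positive count clearly exceeds its negative count would be a candidate supporter, although word counts alone ignore negation and sarcasm, so the tweets themselves should still be read.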
summary_followers_1 <- data.frame(
  Name = table(top_followers$name),
  Location = table(top_followers$location),
  ScreenName = table(top_followers$screen_name),
  Description = table(top_followers$description),
  stringsAsFactors = FALSE
)
# Creates a data frame called summary_followers_1 with columns for Name, Location,
# ScreenName and Description. table() counts the occurrences of each value in the
# respective columns of top_followers. data.frame() may fail here if the columns
# do not have the same number of distinct values.

summary_followers_2 <- subset(summary_followers_1, select = -c(Name.Freq, Location.Freq, ScreenName.Freq, Description.Freq))
# Creates summary_followers_2 from summary_followers_1, excluding the columns with
# frequency counts (Name.Freq, Location.Freq, ScreenName.Freq, Description.Freq).
View(summary_followers_2)

################################################################################
############################ Q3.) Bypassing Greta ##############################
################################################################################

library(igraph) # Graph construction; run install.packages("igraph") first if needed

# Retrieve the user IDs of Greta's followers and the people she follows
follower_ids <- get_followers("gretathunberg", n = 1000)$from_id
following_ids <- get_friends("gretathunberg", n = 1000)$to_id

# Get the profiles of Greta's followers and the people she follows
follower_profiles <- lookup_users(users = follower_ids)
following_profiles <- lookup_users(users = following_ids)

# Extract the screen names of the followers and the people Greta follows
follower_screen_names <- follower_profiles$screen_name
following_screen_names <- following_profiles$screen_name

# Find common screen names between followers and following
common_screen_names <- intersect(follower_screen_names, following_screen_names)
### No account appears both among Greta's followers and among the people she follows.
### For a broader perspective, we investigate whether there are indirect connections
### or shared interests among them, considering factors such as shared locations,
### similar interests or common affiliations.

# The IDs and profiles retrieved above are reused here rather than fetched again.
# Extract the locations of the followers and the people Greta follows
follower_locations <- follower_profiles$location
following_locations <- following_profiles$location

# Find common locations between followers and following
common_locations <- intersect(follower_locations, following_locations)
# Drop blank locations so accounts are not matched merely because both left the field empty
common_locations <- common_locations[!is.na(common_locations) & common_locations != ""]

# Filter profiles based on common locations
follower_profiles_common <- follower_profiles[follower_locations %in% common_locations, ]
following_profiles_common <- following_profiles[following_locations %in% common_locations, ]

# Identify connections between followers
follower_friends <- get_friends(users = follower_profiles_common$id_str, n = 250)
follower_friends_common <- follower_friends[follower_friends$to_id %in% follower_profiles_common$id_str, ]

# Identify connections between the accounts Greta follows
following_friends <- get_friends(users = following_profiles_common$id_str, n = 250)
following_friends_common <- following_friends[following_friends$to_id %in% following_profiles_common$id_str, ]

# Vertex 1 is Greta, then the followers, then the accounts she follows
vertex_names <- c("Greta Thunberg", follower_profiles_common$screen_name, following_profiles_common$screen_name)
n_followers <- nrow(follower_profiles_common)
follower_idx <- 1 + seq_len(n_followers)
following_idx <- 1 + n_followers + seq_len(nrow(following_profiles_common))

# Edges between Greta and every follower and followed account
greta_edges <- rbind(1, c(follower_idx, following_idx))

# Edges within each group.
# Assumption: from_id holds the ID that was passed to get_friends().
follower_edges <- rbind(
  1 + match(follower_friends_common$from_id, follower_profiles_common$id_str),
  1 + match(follower_friends_common$to_id, follower_profiles_common$id_str)
)
following_edges <- rbind(
  1 + n_followers + match(following_friends_common$from_id, following_profiles_common$id_str),
  1 + n_followers + match(following_friends_common$to_id, following_profiles_common$id_str)
)

# add_edges() expects a flat vector of vertex pairs: from1, to1, from2, to2, ...
edges <- c(as.vector(greta_edges), as.vector(follower_edges), as.vector(following_edges))

# Create an empty undirected graph
graph <- make_empty_graph(n = 0, directed = FALSE)

# Add vertices for Greta, followers, and following
graph <- add_vertices(graph, nv = length(vertex_names), name = vertex_names)

# Add edges to the graph and merge duplicate edges from mutual follows
graph <- add_edges(graph, edges)
graph <- simplify(graph)

# Determine if any of the followers and following should be friends based on their background
# You can add logic here based on your criteria for determining friendship

# Print the graph
print(graph)
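The Q3 graph construction can be checked on a small made-up network before running it against live Twitter data. The sketch below assumes the igraph package is installed; every account name and edge is hypothetical. It builds the undirected graph from a named edge list, which avoids vertex index bookkeeping, and then computes the density and diameter asked for in part 4.

```r
library(igraph)

# Hypothetical accounts: Greta, two followers and two accounts she follows
accounts <- data.frame(
  name = c("Greta", "follower_a", "follower_b", "friend_a", "friend_b"),
  role = c("ego", "follower", "follower", "friend", "friend"),
  stringsAsFactors = FALSE
)

# Hypothetical edges: Greta is tied to every account, and one follower
# is also tied directly to one followed account (an edge that bypasses Greta)
edge_list <- data.frame(
  from = c("Greta", "Greta", "Greta", "Greta", "follower_a"),
  to = c("follower_a", "follower_b", "friend_a", "friend_b", "friend_a"),
  stringsAsFactors = FALSE
)

# Build the undirected graph; vertex names come from the accounts table
g <- graph_from_data_frame(edge_list, directed = FALSE, vertices = accounts)

# Density: observed edges divided by the number of possible edges
graph_density <- edge_density(g)

# Diameter: the longest shortest path between any two vertices
graph_diameter <- diameter(g)

cat("Vertices:", vcount(g), "Edges:", ecount(g), "\n")
cat("Density:", graph_density, "Diameter:", graph_diameter, "\n")
```

Naming vertices by screen name keeps the edge list readable and makes it easy to append further rows for accounts that should be friends based on their background.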