75012/Social Web Analytics/1 Analysis of Twitter about Jon Barilaro.Rmd
---
title: "8.1 Analysis of Twitter about Jon Barilaro"
---
You can embed an R code chunk like this:
```{r}
tweets = readRDS("tweets.rds")
```
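## Preprocessing and TF-IDF Weighting
The chunk below loads the tweets, tabulates raw word frequencies, cleans the text with `tm`, and weights the resulting document-term matrix by TF-IDF: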
```{r}
library(twitteR)
library(tm)
library(SnowballC)
library(dplyr)
set.seed(21)
tweets = readRDS("SWA-Group-Project/tweets.rds")
names(tweets)
tweets$text
wordsInTweets = strsplit(tweets$text, "[^a-zA-Z]+")
freq.tab = table(unlist(wordsInTweets))
print(freq.tab)
sort(freq.tab, decreasing = TRUE)[1:20]
CorpusTweets = Corpus(VectorSource(tweets$text))
# Remove @usernames
CorpusTweets = tm_map(CorpusTweets, content_transformer(function(x) gsub("@(.*?)\\>", "", x)))
# Convert to UTF-8, then drop any characters that cannot be represented in ASCII
# (plain functions are wrapped in content_transformer so the corpus structure is preserved)
CorpusTweets = tm_map(CorpusTweets, content_transformer(function(x) iconv(x, to = "UTF-8", sub = "byte")))
CorpusTweets = tm_map(CorpusTweets, content_transformer(function(x) iconv(x, to = "ASCII", sub = "")))
CorpusTweets = tm_map(CorpusTweets, removeNumbers)
CorpusTweets = tm_map(CorpusTweets, removePunctuation)
CorpusTweets = tm_map(CorpusTweets, stripWhitespace)
CorpusTweets = tm_map(CorpusTweets, content_transformer(tolower))
# Remove stop words plus Twitter-specific tokens ("amp" from "&amp;", "rt" from retweets);
# this runs after lower-casing so that "RT" is caught as well
CorpusTweets = tm_map(CorpusTweets, removeWords, c(stopwords(), "amp", "rt"))
CorpusTweets = tm_map(CorpusTweets, stemDocument)
# Remove links; this needs to come after punctuation removal, since punctuation breaks the regex
CorpusTweets = tm_map(CorpusTweets, content_transformer(function(x) gsub("(http)(.*?)\\>", "", x)))
CorpusTweets[[100]]
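# Build the document-term matrix (rows = tweets, columns = stemmed terms)
documentMatrix = DocumentTermMatrix(CorpusTweets)
TweetMatrix = as.matrix(documentMatrix)
# TF-IDF weighting: IDF = log(n / document frequency), TF = log(count + 1)
n = nrow(TweetMatrix)
IDF = log(n/colSums(TweetMatrix > 0))
TF = log(TweetMatrix + 1)
WeightedMatrix = TF %*% diag(IDF)
# Make sure the weighted columns are labelled with their terms
colnames(WeightedMatrix) = colnames(TweetMatrix)
# Total weight of each term, and the term indices ordered from heaviest to lightest
w = colSums(WeightedMatrix)
o = order(w, decreasing = TRUE)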
```
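The cleaning steps in the chunk above can be tried on a single string without building a corpus. The following is a minimal sketch in base R, not the pipeline used above: the helper name `clean_tweet` and the sample tweet are made up for illustration, and the patterns (`http\\S+`, `@\\w+`) are simpler substitutes for the regexes in the chunk. Unlike the chunk, it removes links before stripping punctuation, so each URL is still a single token when it is matched.

```{r}
# Hypothetical helper (not part of the original analysis): cleans one or more tweets
clean_tweet <- function(x) {
  x <- gsub("http\\S+", "", x)               # drop links while they are still intact
  x <- gsub("@\\w+", "", x)                  # drop @usernames
  x <- gsub("&amp;", " ", x, fixed = TRUE)   # drop the HTML-escaped ampersand
  x <- tolower(x)
  x <- gsub("[^a-z ]+", " ", x)              # keep letters and spaces only
  x <- gsub("\\brt\\b", "", x, perl = TRUE)  # drop the retweet marker
  x <- gsub("\\s+", " ", x)                  # collapse runs of whitespace
  trimws(x)
}

# Made-up sample tweet
raw <- "RT @some_user: Great news &amp; more at https://t.co/abc123 !!"
cleaned <- clean_tweet(raw)
print(cleaned)  # "great news more at"
```

Stop-word removal and stemming are left out here; they would still be handled by `tm` and `SnowballC` as in the chunk above.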
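To see what the TF-IDF weighting in the chunk above does, it can help to apply the same formulas, `TF = log(count + 1)` and `IDF = log(n / document frequency)`, to a tiny hand-made matrix. The sketch below assumes a made-up 3 by 3 count matrix with the illustrative terms "koala", "budget" and "nsw"; it uses `sweep()` to scale each column by its IDF, which gives the same numbers as multiplying by `diag(IDF)`.

```{r}
# Toy document-term matrix (made-up counts): 3 documents by 3 terms
counts <- matrix(
  c(2, 0, 1,
    0, 1, 1,
    0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3), c("koala", "budget", "nsw"))
)

n   <- nrow(counts)
idf <- log(n / colSums(counts > 0))  # a term found in every document gets IDF = log(1) = 0
tf  <- log(counts + 1)

# Scale column j of tf by idf[j]
weighted <- sweep(tf, 2, idf, `*`)

# Total weight per term, heaviest first
term_weight <- sort(colSums(weighted), decreasing = TRUE)
print(round(term_weight, 3))
```

In this toy case "nsw" appears in every document, so its weight is zero, while "koala", which is frequent in only one document, ranks first.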
75012/Social Web Analytics/2 Clustering Tweets.Rmd
```{r}
rmarkdown::render(input = "
",
output_dir = "",
knit_root_dir = "")
```
---
title: "8.2 Clustering Tweets"
output: word_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r}
tweets = readRDS("tweets.rds")
print(tweets)
```
## Using RMarkdown
```{r}
x = 1 + 2
```
## Including Plots
You can also embed plots, for example:
```{r pressure, echo=FALSE}
plot(pressure)
```
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
75012/Social Web Analytics/3 Who to follow.Rmd
---
title: "8.3 Who to follow"
output: word_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r}
#Load libraries that are required
library(rtweet)
library(twitteR)
#Setting up authentication to access Twitter for Q13
#Credentials are read from environment variables instead of being hard-coded in the file.
#The variable names below are illustrative; use whatever names you have set (e.g. in .Renviron).
key = Sys.getenv("TWITTER_CONSUMER_KEY")
secret = Sys.getenv("TWITTER_CONSUMER_SECRET")
access_tokens = Sys.getenv("TWITTER_ACCESS_TOKEN")
access_secret = Sys.getenv("TWITTER_ACCESS_SECRET")
setup_twitter_oauth(key, secret, access_tokens, access_secret)
#Q11
#Reading the tweets.rds file
tweets = readRDS("tweets.rds")
#Attaching tweets so that its columns (e.g. retweet_count) can be referenced by name
attach(tweets)
#Keeping the 100 most retweeted tweets, ordered by retweet count in descending order
top100 <- tweets[order(retweet_count, decreasing=TRUE)[1:100], ]
#Extracting only the retweet text from the "top100" dataset and storing it in its own variable
top100.retweeted = top100$retweet_text
#Removing duplicates so that each retweeted tweet appears only once
unq.top100Retweeted <- top100.retweeted[!duplicated(top100.retweeted)]
#Printing the top 100 retweets with no duplicates
print(unq.top100Retweeted)
#Q12
#Using the "top100" dataset, we store the screen names of the users who posted the top 100 tweets
UsersOfTweets = top100$screen_name
#Printing the screen names to identify the users
print(UsersOfTweets)
#Q13
#Looking up every user in "UsersOfTweets" and storing their profile data in "UsersData"
UsersData <- lookupUsers(UsersOfTweets)
#Using "sapply", we extract each user's follower count and store the counts in "followerCounts"
followerCounts = sapply(UsersData, function(x) x$followersCount)
#Using the "sapply" function we extract the statuses counts of all the users and store it in a variable set...