
Customer Sentiment Analysis of Cola Brands Using Twitter Data


Introduction

Coca-Cola and PepsiCo are well-known brands in the soft-drink industry, and both companies rank in the Fortune 500. With broad product lines in a fiercely competitive market, the two companies compete intensely with each other and keep fighting for market share in almost every product vertical. We analyse customer sentiment toward the two brands by downloading 5,000 tweets for each company's official Twitter handle and examining them in R. Through this analysis we can see how customer sentiment can be derived from a brand's social media engagement (in this case, Twitter).

Contents
Packages involved and their application
What is sentiment analysis?
Cleaning the text
Word cloud
Tweets across the day and week
Sentiment scores of the Twitter data
Emotion analysis of customer tweets
Conclusion

Packages used in R
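The article does not show the package list or the data-collection step, so the sketch below is an assumption based on the functions used later in the code: it loads the libraries the snippets rely on and pulls 5,000 tweets per brand with the rtweet package. The search queries, the include_rts and lang options, and the use of search_tweets() are illustrative, not the author's exact method.

# Libraries inferred from the functions called throughout the article
library(rtweet)      # collecting tweets (assumed; any Twitter client would work)
library(tm)          # corpus cleaning and document-term matrices
library(wordcloud2)  # word clouds
library(ggplot2)     # plots
library(lubridate)   # hour() and day() on the created_at timestamps
library(sentimentr)  # sentiment_by() and get_sentences()
library(syuzhet)     # get_nrc_sentiment()
library(data.table)  # data.table() inside createNgram()
library(tau)         # textcnt() for bigrams

# Hypothetical collection step: 5,000 tweets mentioning each brand
# (requires an authenticated Twitter API token configured for rtweet)
data_pepsi <- search_tweets("@pepsi", n = 5000, include_rts = FALSE, lang = "en")
data_cola  <- search_tweets("@CocaCola", n = 5000, include_rts = FALSE, lang = "en")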

What is sentiment analysis?

Sentiment analysis is a text-mining technique that provides context for text, making it possible to understand information in subjective, unstructured source material. Using online conversations on social media platforms such as Facebook, Instagram and Twitter, or in e-mails, it helps us understand the public sentiment toward a brand's products or services. As we know, computers do not understand our natural language; for them to make sense of it, we first convert the words into a numeric format. Below we go through this process step by step.

Cleaning the text

We have already downloaded the dataset from Twitter. Because raw tweets contain links, hashtags, Twitter handles and emoticons, we write functions in R to remove them. After this noise is stripped out, all text is converted to lowercase; English stop words (articles, prepositions and so on), punctuation and numbers are removed; and the documents are then converted into a document-term matrix. Document-term matrix: a matrix that records how many times each word occurs in each document.

removeURL <- function(x) gsub("(f|ht)tp(s?)://\\S+", "", x, perl = TRUE)
removeHashTags <- function(x) gsub("#\\S+", "", x)
removeTwitterHandles <- function(x) gsub("@\\S+", "", x)
removeSlash <- function(x) gsub("\n", " ", x)
removeEmoticons <- function(x) gsub("[^\x01-\x7F]", "", x)
# Clean the Pepsi tweets and build a document-term matrix
data_pepsi$text <- iconv(data_pepsi$text, to = "utf-8")
pepsi_corpus <- Corpus(VectorSource(data_pepsi$text))
pepsi_corpus <- tm_map(pepsi_corpus, tolower)
pepsi_corpus <- tm_map(pepsi_corpus, removeWords, stopwords("en"))
pepsi_corpus <- tm_map(pepsi_corpus, content_transformer(removeHashTags))
pepsi_corpus <- tm_map(pepsi_corpus, content_transformer(removeTwitterHandles))
pepsi_corpus <- tm_map(pepsi_corpus, content_transformer(removeURL))
pepsi_corpus <- tm_map(pepsi_corpus, content_transformer(removeSlash))
pepsi_corpus <- tm_map(pepsi_corpus, removePunctuation)
pepsi_corpus <- tm_map(pepsi_corpus, removeNumbers)
pepsi_corpus <- tm_map(pepsi_corpus, content_transformer(removeEmoticons))
pepsi_corpus <- tm_map(pepsi_corpus, stripWhitespace)
pepsi_clean_df <- data.frame(text = get("content", pepsi_corpus))
dtm_pepsi <- DocumentTermMatrix(pepsi_corpus)
dtm_pepsi <- removeSparseTerms(dtm_pepsi, 0.999)   # drop terms appearing in fewer than 0.1% of tweets
pepsi_df <- as.data.frame(as.matrix(dtm_pepsi))
data_cola$text <- iconv(data_cola$text, to = "utf-8")
cola_corpus <- Corpus(VectorSource(data_cola$text))
cola_corpus <- tm_map(cola_corpus, tolower)
cola_corpus <- tm_map(cola_corpus, removeWords, stopwords("en"))
cola_corpus <- tm_map(cola_corpus, content_transformer(removeHashTags))
cola_corpus <- tm_map(cola_corpus, content_transformer(removeTwitterHandles))
cola_corpus <- tm_map(cola_corpus, content_transformer(removeURL))
cola_corpus <- tm_map(cola_corpus, content_transformer(removeSlash))
cola_corpus <- tm_map(cola_corpus, removePunctuation)
cola_corpus <- tm_map(cola_corpus, removeNumbers)
cola_corpus <- tm_map(cola_corpus, content_transformer(removeEmoticons))
cola_corpus <- tm_map(cola_corpus, stripWhitespace)
cola_clean_df <- data.frame(text = get("content", cola_corpus))
dtm_cola <- DocumentTermMatrix(cola_corpus)
dtm_cola <- removeSparseTerms(dtm_cola, 0.999)
cola_df <- as.data.frame(as.matrix(dtm_cola))
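As a quick sanity check on the document-term matrices (not part of the original article), the tm package's inspect() and findFreqTerms() can be used to look at a corner of the matrix and at the terms that occur often; the frequency threshold of 50 below is arbitrary.

inspect(dtm_pepsi[1:5, 1:5])            # counts of five terms across the first five tweets
findFreqTerms(dtm_pepsi, lowfreq = 50)  # terms that appear at least 50 times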
Word cloud

A word cloud is a representation of text data that highlights the most frequently used words by increasing their size; the technique visualises the text as an image built from the words (or tags) themselves. In R it can be produced with the wordcloud2 package; the code that generates the output is shown below.

word_pepsi_df <- data.frame(names(pepsi_df), colSums(pepsi_df))
names(word_pepsi_df) <- c("words", "freq")
word_pepsi_df <- subset(word_pepsi_df, word_pepsi_df$freq > 0)
wordcloud2(data = word_pepsi_df, size = 1.5, color = "random-light", backgroundColor = "dark")
word_cola_df <- data.frame(names(cola_df), colSums(cola_df))
names(word_cola_df) <- c("words", "freq")
word_cola_df <- subset(word_cola_df, word_cola_df$freq > 0)
wordcloud2(data = word_cola_df, size = 1.5, color = "random-light", backgroundColor = "dark")
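A quick optional check, not in the original article, is to sort the frequency tables to see which terms will dominate the clouds:

head(word_pepsi_df[order(-word_pepsi_df$freq), ], 10)  # ten most frequent terms in the Pepsi tweets
head(word_cola_df[order(-word_cola_df$freq), ], 10)    # ten most frequent terms in the Coca-Cola tweets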
Word clouds of the Pepsi and Coca-Cola Twitter data

As we know, the size of a word in the cloud depends on how often it appears in the tweets, so the prominent words keep changing. Words such as just, native, right and racism appear frequently in the tweets of Pepsi's customers, while words such as get and support appear more often in the tweets of Coca-Cola's customers.

Tweets across the day and week

Since the tweets were collected over a span of more than a week, we can analyse at which hours and on which weekdays users are most active, i.e. when they tweet most about the brand. This can be visualised with density and bar charts from the ggplot2 library. The code and its output are shown below.

data_pepsi$Date <- as.Date(data_pepsi$created_at)
data_pepsi$hour <- hour(data_pepsi$created_at)   # hour() from lubridate
data_pepsi$weekday <- factor(weekdays(data_pepsi$Date), levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))
ggplot(data_pepsi, aes(x = hour)) + geom_density() + theme_minimal() + ggtitle("Pepsi")
ggplot(data_pepsi, aes(x = weekday)) + geom_bar(color = "#CC79A7", fill = "#CC79A7") + theme_minimal() + ggtitle("Pepsi") + ylim(0, 1800)
data_cola$Date <- as.Date(data_cola$created_at)
data_cola$Day <- day(data_cola$created_at)       # day() from lubridate
data_cola$hour <- hour(data_cola$created_at)
data_cola$weekday <- factor(weekdays(as.Date(data_cola$Date)), levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))
ggplot(data_cola, aes(x = hour)) + geom_density() + theme_minimal() + ggtitle("Coca-Cola")
ggplot(data_cola, aes(x = weekday)) + geom_bar(color = "#CC79A7", fill = "#CC79A7") + theme_minimal()

From the charts above we can see that tweets about both Pepsi and Coca-Cola peak around 3-4 pm and again around 1 am. People like to use social media when they are bored at work or late at night, and this is clearly visible in our data.

Distribution of tweets across the week

When the daily tweet counts are shown as bar charts, Thursday was the day with the highest number of tweets for Pepsi, which coincided with the release of their quarterly report; for Coca-Cola, Tuesday saw the lowest number of tweets.

Sentiment scores of the Twitter data

In this section we classify the tweets as positive, negative or neutral. This can be done with the sentimentr package, which assigns each dictionary word a sentiment score between -1 and +1 and averages the scores of the words in a tweet to obtain the tweet's final sentiment score.

sentiments <- sentiment_by(get_sentences(pepsi_clean_df$text))
data_pepsi$sentiment_score <- round(sentiments$ave_sentiment, 2)
# Bucket the numeric score into a categorical label
data_pepsi$sentiment <- "Neutral"
data_pepsi$sentiment[data_pepsi$sentiment_score > 0] <- "Positive"
data_pepsi$sentiment[data_pepsi$sentiment_score < 0] <- "Negative"
data_pepsi$sentiment <- as.factor(data_pepsi$sentiment)
ggplot(data_pepsi, aes(x = sentiment)) + geom_bar(color = "steelblue", fill = "steelblue") + theme_minimal()
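To make the scoring concrete, here is a toy example (not from the article's data) of what sentiment_by() returns; ave_sentiment is the column that the code above rounds and buckets into Positive, Negative and Neutral.

toy <- sentiment_by(get_sentences(c("I really love this drink", "That ad was terrible")))
toy$ave_sentiment   # a positive score for the first sentence, a negative one for the second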
Almost 75% of the tweets are positive, since both brands are quite popular with their customers.

Emotion analysis of customer tweets

The emotions in the tweets are computed with the syuzhet package, which scores every dictionary word against ten emotion indices: anger, anticipation, disgust, fear, joy, sadness, surprise, trust, negative and positive. If we add up the value of every word on each index, the emotions across all the tweets can be shown as bar charts.

cols <- c("red", "pink", "green", "orange", "yellow", "skyblue", "purple", "blue", "black", "grey")
pepsi_sentimentsdf <- get_nrc_sentiment(names(pepsi_df))
barplot(colSums(pepsi_sentimentsdf),
        main = "Pepsi", col = cols, space = 0.05, horiz = F, angle = 45, cex.axis = 0.75, las = 2, srt = 60, border = NA)
cola_sentimentsdf <- get_nrc_sentiment(names(cola_df))
barplot(colSums(cola_sentimentsdf),
        main = "Coca-Cola", col = cols, space = 0.05, horiz = F, angle = 45, cex.axis = 0.75, las = 2, srt = 60, border = NA)
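For reference, get_nrc_sentiment() returns one row per input string with the ten emotion and polarity columns plotted above; a toy call (not from the article's data) shows the shape of the output.

get_nrc_sentiment(c("delicious", "angry"))
# a data frame with the columns anger, anticipation, disgust, fear, joy,
# sadness, surprise, trust, negative and positive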

The output above shows all the emotions as bar charts. As the charts make clear, positive sentiment dominates for both companies, which further reinforces our earlier observation. Tracking how these charts change over time can serve as feedback on a new product or an advertising campaign.

Most frequent words

word_pepsi_df$words <- factor(word_pepsi_df$words, levels = word_pepsi_df$words[order(word_pepsi_df$freq)])
word_cola_df$words <- factor(word_cola_df$words, levels = word_cola_df$words[order(word_cola_df$freq)])
# Sort by frequency so that rows 1:15 are the fifteen most frequent terms
word_pepsi_df <- word_pepsi_df[order(-word_pepsi_df$freq), ]
word_cola_df <- word_cola_df[order(-word_cola_df$freq), ]
ggplot(word_pepsi_df[1:15, ], aes(x = freq, y = words)) + geom_bar(stat = "identity", color = "#C4961A", fill = "#C4961A") + theme_minimal() + ggtitle("Pepsi")
ggplot(word_cola_df[1:15, ], aes(x = freq, y = words)) + geom_bar(stat = "identity", color = "#C4961A", fill = "#C4961A") + theme_minimal() + ggtitle("Coca-Cola")
createNgram <- function(stringVector, ngramSize) {
  # textcnt() comes from the tau package; data.table() from the data.table package
  ngram <- data.table()
  ng <- textcnt(stringVector, method = "string", n = ngramSize, tolower = FALSE)
  if (ngramSize == 1) {
    ngram <- data.table(w1 = names(ng), freq = unclass(ng), length = nchar(names(ng)))
  } else {
    ngram <- data.table(w1w2 = names(ng), freq = unclass(ng), length = nchar(names(ng)))
  }
  return(ngram)
}
pepsi_bigrams_df <- createNgram(pepsi_clean_df$text, 2)
cola_bigrams_df <- createNgram(cola_clean_df$text, 2)
pepsi_bigrams_df$w1w2 <- factor(pepsi_bigrams_df$w1w2, levels = pepsi_bigrams_df$w1w2[order(pepsi_bigrams_df$freq)])
cola_bigrams_df$w1w2 <- factor(cola_bigrams_df$w1w2, levels = cola_bigrams_df$w1w2[order(cola_bigrams_df$freq)])
names(pepsi_bigrams_df) <- c("words", "freq", "length")
names(cola_bigrams_df) <- c("words", "freq", "length")
# Sort by frequency so that rows 1:15 are the fifteen most frequent bigrams
pepsi_bigrams_df <- pepsi_bigrams_df[order(-freq)]
cola_bigrams_df <- cola_bigrams_df[order(-freq)]
ggplot(pepsi_bigrams_df[1:15, ], aes(x = freq, y = words)) + geom_bar(stat = "identity", color = "#00AFBB", fill = "#00AFBB") + theme_minimal() + ggtitle("Pepsi")
ggplot(cola_bigrams_df[1:15, ], aes(x = freq, y = words)) + geom_bar(stat = "identity", color = "#00AFBB", fill = "#00AFBB") + theme_minimal() + ggtitle("Coca-Cola")

Bigrams

A bigram is a pair of words produced when a sentence is split into consecutive two-word chunks. Bigrams are useful for capturing the context in which a word appears, since a single word on its own often provides no context.
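A toy call to textcnt() (not from the article's data) shows the word pairs it extracts when ngramSize = 2:

ng <- textcnt("coke and pepsi are both popular", method = "string", n = 2, tolower = FALSE)
names(ng)   # the consecutive word pairs found in the sentence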
