TextAnalysis

This is an R package documents the workflow of text mining, topic modeling, and sentiment analysis. Specifically, it was used for the project to analyze Twitter and news articles related to the Orlando Shooting incidence. At the time we work on this project, another shooting incidence occurred in Las Vegas. We also compared the response to Orlando Shooting v.s. Las Vegas Shooting.

This is version_1.

There are bugs. Will fix them.

Loading data

We store the LexisNexis news data into the SQLite database.

LexisNexis_Orlando <- read_LexisNexis("LexisNexis_v1.db", metadata=TRUE, format=TRUE)
document <- LexisNexis_Orlando$document
meta <- LexisNexis_Orlando$meta

Document-term matrix

The create_DTM function can create the document term matrix conveniently.

text_stemmed <- create_DTM(document, ID=ID, text=FULL_TEXT, n_gram=1, stemming=TRUE)
text_nonstemmed <- create_DTM(document, ID=ID, text=FULL_TEXT, n_gram=1, stemming=FALSE)

Visualize the top 10% frequent words

The nonstemmed version of text

freq_plot <- plot_word_freq(text_nonstemmed, q=0.1, display = TRUE)

Sentiment analysis

We can use extract_sentiment function to calculate the sentiment scores with different lexicons, for example, afinn and nrc.

stm_affin <- extract_sentiment(text, "afinn")
stm_nrc <- extract_sentiment(text, "nrc")

LDA topics

run_LDA wraps the results from Latent Dirichlet Allocation (LDA) model.

lda_result <- run_LDA(text_stem, topic_num = 6, topic_term_n = 20, q = 0.1)
Beta <- lda_result$Beta
Gamma <- lda_result$Gamma %>% rename(ID = document)
print(lda_result$topic_term_plot)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

TextAnalysis

Loading data

Document-term matrix

Visualize the top 10% frequent words

Sentiment analysis

LDA topics

Files

README.md

Latest commit

History

README.md

File metadata and controls

TextAnalysis

Loading data

Document-term matrix

Visualize the top 10% frequent words

Sentiment analysis

LDA topics