stepthom/sandbox

Sandbox

This repository holds scripts and notebooks for Steve's musings, investigations, case studies, animations, and slides.

Here's a high-level snapshot of each script.

Non-text Analytics

| File | Language | Dataset | Package | Notes |
|---|---|---|---|---|
| NB.R | R | NaiveBayes.csv | e1071 | Simple example of Naive Bayes (NB). |
| arules.Rmd | R | arules::Groceries | arules, arulesViz | |
| bigdata.Rmd | R | N/A | tidyverse | Just some charts for the big data slides. |
| classifiers.R | R | laheart.csv | rpart, e1071, MLmetrics | Compares Naive Bayes (NB) and a decision tree (DT). |
| intro.Rmd | R | gapminder | tidyr, dplyr, ggplot2 | An intro to R and the tidyverse. |
| recSys.R | R | recommenderlab::MovieLense | recommenderlab | Recommendation system for the MovieLense data. Uses collaborative filtering (CF). |
| slide_plots.Rmd | R | chirps.csv, Prestige.txt, clusters.csv | tidytext, tm, tidyverse | Just a script to create some plots/charts I've used in slides. |
| spark-sample.mdR | R | nycflights13, Lahman | sparklyr | Simple example of how to use sparklyr. |
| sql.Rmd | R | customer.csv, transaction.csv | sqldf | Shows how to use the sqldf package. Used for some of my slides on SQL. |
| sqlChallenge.Rmd | R | Lahman | sqldf | Used for creating the SQL challenge. |
| titanic.Rmd | R | titanic | tidyverse, rpart, MLmetrics | Titanic case study. Builds a decision tree (DT) to predict survival. |
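NB.R and classifiers.R rely on the e1071 package for Naive Bayes. As a language-neutral illustration of what such a classifier computes (log prior plus the sum of per-feature log likelihoods, picking the class with the highest score), here is a minimal Python sketch. It assumes categorical features and add-alpha (Laplace) smoothing; the function names `train_nb`/`predict_nb` and the toy weather-style data are hypothetical and are not taken from the repository's R scripts.

```python
from collections import Counter, defaultdict
import math


def train_nb(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with add-alpha (Laplace) smoothing."""
    class_counts = Counter(labels)
    # feature_counts[c][j][v] = number of class-c rows whose feature j equals v
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = [set() for _ in range(len(rows[0]))]  # distinct values per feature
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            feature_counts[c][j][v] += 1
            values[j].add(v)
    return class_counts, feature_counts, values, alpha


def predict_nb(model, row):
    """Return the class maximising log P(c) + sum_j log P(x_j | c)."""
    class_counts, feature_counts, values, alpha = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for j, v in enumerate(row):
            count = feature_counts[c][j][v]  # 0 if v never seen with class c
            score += math.log((count + alpha) / (n_c + alpha * len(values[j])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("rainy", "mild")))  # yes
print(predict_nb(model, ("sunny", "hot")))   # no
```

Working in log space avoids underflow when many small probabilities are multiplied, and the smoothing keeps an unseen feature value from zeroing out a class.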

Text Analytics

| File | Language | Dataset | Package | Notes |
|---|---|---|---|---|
| cluster_20.ipynb | Python | sklearn.datasets::20newsgroups | nltk, sklearn | Clustering the 20 Newsgroups dataset. |
| imdb.Rmd | R | all.imdb.pipe.csv | tidytext, cleanNLP, tm | Classifying IMDB data. |
| kiva.Rmd | R | kiva.csv | tidytext, topicmodels, rpart, MLmetrics | Classifying Kiva loans. Used as a case study. |
| nltk-cluster.py | Python | sklearn.datasets::20newsgroups | nltk, sklearn | I'm not sure how this is different from cluster_20.ipynb. |
| sentiment-manning.Rmd | R | manning.csv, brady.csv | tidytext | Sentiment analysis on tweets about Peyton Manning and Tom Brady. |
| slides_sentiment.R | R | N/A | tidytext | Just a script to do some simple tidy-based sentiment analysis on some made-up data. |
| slides_text_amazon.Rmd | R | reviews_Grocery_and_Gourmet_Food_5_50000.csv | tidytext, tm, wordcloud | Descriptive stats on Amazon reviews (Grocery and Gourmet Food category). |
| slides_text_amazon_classify.R | R | reviews_Grocery_and_Gourmet_Food_5_50000.csv | tidytext, tm, caret | Classifying Amazon reviews. |
| slides_text_reuters.Rmd | R | reutersCSV.csv | tidytext, tm, wordcloud | Descriptive stats on the Reuters dataset. |
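Tidy-style sentiment analysis, as done with tidytext in scripts like slides_sentiment.R, typically splits text into one word per row, keeps only the words found in a sentiment lexicon, and sums their scores per document. The Python sketch below mirrors that general workflow; it is not the scripts' code, and the six-word `LEXICON`, the helper names, and the example documents are all hypothetical stand-ins for a real lexicon and real tweets.

```python
import re

# Hypothetical toy lexicon standing in for a real one;
# +1 marks a positive word, -1 a negative word.
LEXICON = {"great": 1, "love": 1, "good": 1, "bad": -1, "awful": -1, "hate": -1}


def tokenize(text):
    """Lower-case the text and split it into word tokens (one word per 'row')."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text, lexicon=LEXICON):
    """Sum lexicon scores over the tokens. Words not in the lexicon add 0,
    which has the same effect as dropping them with an inner join."""
    return sum(lexicon.get(tok, 0) for tok in tokenize(text))


def score_documents(docs):
    """Map each document id to its net sentiment score."""
    return {doc_id: sentiment_score(text) for doc_id, text in docs.items()}


# Hypothetical made-up documents
docs = {
    "d1": "I love this, it's great",
    "d2": "Awful game, I hate it",
    "d3": "good but bad",
}
print(score_documents(docs))  # {'d1': 2, 'd2': -2, 'd3': 0}
```

A net score of 0 can mean either neutral text or balanced positive and negative words, so counting positive and negative hits separately is often more informative.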

Data

Note: the source isn't actually "Unknown" for most of the data files below. I just haven't documented the sources yet.

| File | Source |
|---|---|
| HR_comma_sep.csv | Unknown |
| Master.csv | Unknown |
| NaiveBayes.csv | Unknown |
| Prestige.txt | Unknown |
| Salaries.csv | Unknown |
| all.imdb.pipe.csv | Unknown |
| alltweets.csv | Unknown |
| beta.csv | Unknown |
| beta_12.csv | Unknown |
| chirps.csv | Unknown |
| clusters.csv | Unknown |
| customer.csv | Unknown |
| gamma.csv | Unknown |
| gamma_12.csv | Unknown |
| jackastors.csv | Unknown |
| kiva..csv | Unknown |
| laheart.csv | Unknown |
| laheart2.csv | Unknown |
| site.csv | Unknown |
| student.csv | Unknown |
| survey.csv | Unknown |
| topicnames_12.csv | Unknown |
| transaction.csv | Unknown |
| visited.csv | Unknown |
| groceries.csv | Unknown |
| loan_small.csv | Unknown |
| brady.csv | Unknown |
| manning.csv | Unknown |
| reutersCSV.csv | Unknown |
| reviews_Grocery_and_Gourmet_Food_5_50000.csv | Unknown |