This is Benjamin Reese's repo for assignment06 and assignment07

BenjaminFReese/assignment06

assignment06

This is Benjamin Reese's repo for assignment06 and assignment07. The first few exercises are analytical in nature and do not include much coding. They calculate mean squared error, root mean squared error, mean absolute error, and a series of confusion matrices and model quality indicators for both binary and multiclass classification data tables. After that, I provide an analytical response to questions about random guessing and discuss the role of context in assessing the quality of our predictions.

Finally, Exercise 5 includes quite a lot of code designed to predict marble color based only on marble size. For this, you will need to create a data folder in your working directory and place the marbles.csv file from Canvas in it. After that, you will only need the proper packages, and all of the code should run.

The code first splits the data into training and testing sets, with 20% reserved for testing. Then I discuss an intuitive/mental model of color prediction based on a bar plot and a simple count of marbles of each size and color. From this, I derive conditional probabilities to better inform my prediction. I then create a simple function, color_prediction(), that takes an input of sizes and returns color predictions based entirely on the size of the marble, big or small. I pass the reserved testing data to this function to generate color predictions. The code then creates a second function, confusion_matrix(), designed to generate confusion matrices and calculate accuracy. Using the predictions generated by color_prediction(), I create a confusion matrix from the predicted colors and the real colors in the reserved testing data. The resulting accuracy is .76. The last portion of the code uses tidymodels to estimate a decision tree (CART) model that predicts the color of marbles, fitting the model on the training data and then applying it to the testing data.
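The size-only workflow described above can be sketched in base R. This is a minimal illustration and not the code from the assignment. The toy data frame, the column names `size` and `color`, the color labels, and the rule "big means black, small means white" are all assumptions made here; the real marbles.csv and the real function bodies may differ.

```r
# Hypothetical stand-in for data/marbles.csv. The column names and the
# color labels are assumptions for illustration only.
marbles <- data.frame(
  size  = c("big", "big", "small", "small", "big",
            "small", "big", "small", "big", "small"),
  color = c("black", "black", "white", "white", "white",
            "white", "black", "black", "black", "white"),
  stringsAsFactors = FALSE
)

# Reserve 20% of the rows for testing; the rest is training data.
set.seed(20220301)
test_rows     <- sample(nrow(marbles), size = round(0.2 * nrow(marbles)))
marbles_test  <- marbles[test_rows, ]
marbles_train <- marbles[-test_rows, ]

# Predict color from size alone. The direction of the rule
# (big -> black, small -> white) is an assumed example.
color_prediction <- function(size) {
  ifelse(size == "big", "black", "white")
}

# Cross-tabulate predicted against actual colors and compute accuracy
# as the share of observations on the diagonal.
confusion_matrix <- function(predicted, actual) {
  classes <- sort(unique(c(predicted, actual)))
  counts <- table(
    predicted = factor(predicted, levels = classes),
    actual    = factor(actual, levels = classes)
  )
  list(matrix = counts, accuracy = sum(diag(counts)) / sum(counts))
}

predictions <- color_prediction(marbles_test$size)
result      <- confusion_matrix(predictions, marbles_test$color)
print(result$matrix)
print(result$accuracy)
```

Using the union of predicted and actual labels as factor levels keeps the table square even when a class is missing from the test set, so the diagonal always lines up with correct predictions.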
The tidymodels code also creates and plots a decision tree and creates a confusion matrix from the real color values in the test data and the predicted values from the CART model. The accuracy is the same as before, .76. I end the assignment by discussing why: the size of a marble is such a meaningful predictor of its color that, in this simple case, we could work out the prediction rule from a simple plot, and the model confirms that result.

The assignment07.html file is my submission for Assignment 7. The assignment.qmd file is the Quarto file that contains the code that created assignment07.html. Assignment 7 is focused on machine learning. The code reads in the data needed to replicate the analysis found in assignment07.qmd.

The first exercise in Assignment 7 creates a CART model to predict whether or not a New York City restaurant will receive an "A" rating. The model finds that the most important predictors of a restaurant's rating are the inspection type and the location of the restaurant. The second exercise creates and tests three different models, one with hyperparameter tuning, to predict ridership at Chicago's Clark-Lake station. The models confirm what we can learn from exploratory data analysis: the day of the week is the strongest predictor of ridership. After testing a KNN model, a basic linear regression model, and a random forest model, the random forest model has the lowest root mean squared error. Overall, it predicted ridership to within about 2 thousand riders on average. While this error may be quite significant in comparison to the average ridership, the model generally predicts, with some accuracy, when ridership will be high and when it will be low.
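For reference, the error metrics named in this README have the standard definitions below. The notation is an assumption introduced here and does not come from the assignment files: $y_i$ is an observed value, $\hat{y}_i$ is the model's prediction for it, and $n$ is the number of observations.

```latex
\[
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert
\]
```

Because RMSE is expressed in the same units as the outcome, an RMSE for the ridership models can be read directly as a typical error in riders.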
