This guide will help you get started with participating in Data Science Club projects and competitions! The primary focus is Kaggle, but most of the concepts covered apply to almost any other problem.
The guide is split into a Python version and an R version. You can go with either, but learning both is recommended in the long run (focus on one first, though!). Both have their pros and cons, but both get the job done well.
Clayton's Note: I've used both for my co-op and you can't go wrong with either. Python's general-purpose nature makes it nice for gathering data (web scraping, APIs). R is built for statistical analysis; many libraries make cleaning data, doing analysis, and making beautiful graphs extremely easy (think a few lines of code).
This guide will go over the following topics:
- Importing data
- Cleaning data
- Data exploration
- Predictions
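To give a feel for how these four topics fit together, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the toy data, the column names (`id`, `rooms`, `price`), and the helper function names are all hypothetical, and it uses only the standard library so it runs anywhere. The actual sections of the guide may well use dedicated data libraries instead.

```python
import csv
import io
import statistics

# Hypothetical toy data standing in for a competition's training file.
RAW_CSV = """id,rooms,price
1,3,300
2,,250
3,2,200
4,4,400
"""


def load_rows(text):
    """Importing data: parse CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Cleaning data: drop rows with any missing value and convert fields to numbers."""
    cleaned = []
    for row in rows:
        if any(value == "" for value in row.values()):
            continue  # skip incomplete rows (one simple strategy among many)
        cleaned.append(
            {
                "id": int(row["id"]),
                "rooms": int(row["rooms"]),
                "price": float(row["price"]),
            }
        )
    return cleaned


def explore(rows):
    """Data exploration: compute a couple of summary statistics."""
    prices = [row["price"] for row in rows]
    return {"count": len(prices), "mean_price": statistics.mean(prices)}


def predict_baseline(rows):
    """Predictions: a naive baseline that always predicts the mean price."""
    return statistics.mean(row["price"] for row in rows)


if __name__ == "__main__":
    rows = clean_rows(load_rows(RAW_CSV))
    print(explore(rows))
    print(predict_baseline(rows))
```

A mean-value baseline like this is rarely competitive, but it is a common first sanity check: any real model should beat it.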
Everything about Kaggle's scoring system can be found here.
As a club, we want to encourage members to create kernels and submit to competitions. How everything will be structured will be announced shortly.