GSoC 2020 Apply Shogun to the Real World
This project is about using Shogun on “real world” data to demonstrate its capabilities beyond recognising handwritten digits or Iris species. In practice, this will involve a mix of using and modifying Shogun. The idea is that the data project is stand-alone and the result is a really cool show-off for Shogun :).
Important: While these projects have an analysis part where you try to figure out what is going on in your data, the major part should be of a technical/productive nature, i.e. you create a product such as interactive graphics, an analysis pipeline or an article/blog/how-to. Therefore, ideally most of the data analysis should be done before GSoC starts so that you can hit the ground running.
- Heiko (github: karlnapf, IRC: HeikoS)
- Viktor (github: vigsterkr, IRC: wiking)
- Lea (github: lgoetz, IRC: leagoetz)
This depends very much on the exact project. We encourage you to get in touch with one of the mentors to discuss the project, your strengths and where you need help.
Most important requirements:
- you are extremely motivated and ready to work independently (!!!)
- some Shogun and Machine Learning basics
- Docker (optional)
We are looking for cool ideas of what is possible with the help of Shogun. Imagine you had 3 months and 5000 USD: what would you do?
We imagine the project roughly like this: you have data and an interesting question, and you will use Shogun to extract the answer and finally document and visualise the whole process. What exactly the project looks like will depend on the question you want to answer and the data involved - it could be anything from mapping bike accidents in cities, to researching the link between national parks legislation and biodiversity, to predicting election outcomes.
Projects are meant to be self-contained. This means that they do not necessarily need to be integrated into the main part of the library. They do not even need to be C++ code. Consequently, you have more freedom when working, as the sometimes hard-to-dodge quality checks for our framework do not need to be passed. At the end, we expect a Docker container with an installed (potentially modified) Shogun and the code to reproduce everything that you did. In addition, we require a number of major in-depth blog posts about your project and the problem you are solving.
While the project is of a much less technical nature than the others, experience tells us that for this kind of project, frequent communication with the mentor is very important, so you need to be able to work independently (on your code) as well as be very responsive and seek advice (on what it is you should be coding up). We also expect a detailed timeline, and together we will work out a number of milestones that determine whether the project passes or fails.
Below is an idea for a project that will give you a flavour.
Are you a creative person, whose interests are beyond "implementing new algorithms", and who thrives in applying Machine Learning to real problems? Then this project is for you!
- Data Project - Elections
- Application Project - Interactive Webdemos
- Time series prediction
- Integrate Shogun with Zeppelin
- First step: a dataset or problem to work on, and an Ansatz
- Most important: a very detailed outline of the project goals and parts, and a detailed timeline
- Topics for 3 major blog posts about the project
- Final step: Docker image to reproduce all results
- Map data examples
- Example of mapping bike accidents in London
- Example of spatial/temporal analysis of Twitter rumors and how it was done