This is a small attempt to create an application that analyzes a real-time stream of data (via the Twitter Streaming API) and clusters posts according to their meaning. It is intended for grouping tweets in real time and visualizing trends in the observed topic.
Various clustering algorithms will be tested as part of this app.
The goal of this project is to produce a meaningful clustering of a textual stream of information and to visualize it accordingly.
See the discussion of possible algorithms on Data Science Stack Exchange: http://datascience.stackexchange.com/questions/979/algorithms-for-text-clustering
One of the approaches is to use the Lingo algorithm from Carrot2.
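To make the idea of clustering a text stream concrete, here is a minimal baseline sketch in Python. It is not Lingo and not this project's implementation: it is a simple single-pass scheme in which each incoming post joins the most similar existing cluster (by cosine similarity of word counts) or starts a new one. The names `tokenize`, `cosine`, `StreamClusterer` and the `threshold` value are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a post and split it into word tokens, dropping URLs."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[a-z0-9#@']+", text)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class StreamClusterer:
    """Single-pass clustering: a post joins the closest cluster or starts a new one."""

    def __init__(self, threshold=0.3):
        # Hypothetical default; a real value would need tuning on actual tweets.
        self.threshold = threshold
        self.centroids = []  # summed term counts, one Counter per cluster
        self.members = []    # original posts, one list per cluster

    def add(self, post):
        """Assign one incoming post to a cluster and return that cluster's index."""
        vector = Counter(tokenize(post))
        best_index, best_score = -1, 0.0
        for index, centroid in enumerate(self.centroids):
            score = cosine(vector, centroid)
            if score > best_score:
                best_index, best_score = index, score
        if best_index == -1 or best_score < self.threshold:
            # Nothing is similar enough: open a new cluster seeded with this post.
            self.centroids.append(vector)
            self.members.append([post])
            return len(self.centroids) - 1
        # Fold the post's term counts into the chosen cluster's centroid.
        self.centroids[best_index].update(vector)
        self.members[best_index].append(post)
        return best_index
```

Posts would be fed one at a time through `add()` as they arrive from the stream, and the contents of `members` could then be handed to the visualization layer. A baseline like this mainly serves as a reference point when comparing more capable algorithms.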
A web-based interface is also proposed for observing fluctuations in the clusters in real time. To simplify the presentation of clusters, the Masonry JavaScript layout library is used.
Send me a message on Twitter if you are interested in the topic or want to contribute: @MaximGalushka