Confusion: 1) What is the difference between topic modelling and clustering? 2) How does CountVectorizer perform tokenization and stop-word removal after embedding? Does it work on numeric data? #2225
-
After clustering, you still do not have any descriptions of the clusters, right? Topic modeling is meant to give descriptions to the clusters you found, as a way to extract topics. It's a bit more involved than that, but it is basically the difference between topic assignment (clustering) and topic representation (topic modeling).
CountVectorizer performs the tokenization on the text of the aggregated groups of documents (all documents in a cluster joined together), not on the embeddings.
Topics are formed through c-TF-IDF, not after it. For more information on the procedure of BERTopic, I would highly advise reading through the documentation here or the tutorial here.
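To make the three points above concrete, here is a minimal sketch in plain Python rather than BERTopic's actual code. The toy documents, cluster labels, stop-word list, and the helper names `tokenize`, `c_tf_idf`, and `top_words` are all hypothetical. The weighting assumes the commonly cited c-TF-IDF form, term frequency within the cluster times log(1 + A / f), where A is the average number of words per cluster and f is the word's frequency across all clusters. In BERTopic itself the counting step is handled by scikit-learn's CountVectorizer.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy data: raw documents and the cluster label each one
# received from the clustering step (which ran on the embeddings).
docs = [
    "the cat sat on the mat",
    "a cat and a kitten play",
    "stocks fell as the market closed",
    "the market rallied and stocks rose",
]
labels = [0, 0, 1, 1]
stop_words = {"the", "a", "and", "on", "as"}  # illustrative list only

# 1) Aggregate: join the raw text of all documents in each cluster.
grouped = defaultdict(list)
for doc, label in zip(docs, labels):
    grouped[label].append(doc)
cluster_text = {c: " ".join(texts) for c, texts in grouped.items()}

# 2) Tokenize the text (not the embeddings) and drop stop words.
def tokenize(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stop_words]

counts = {c: Counter(tokenize(text)) for c, text in cluster_text.items()}

# 3) c-TF-IDF (assumed form): tf within the cluster * log(1 + A / f).
total = Counter()
for cluster_counts in counts.values():
    total.update(cluster_counts)
avg_words = sum(total.values()) / len(counts)  # A: average words per cluster

def c_tf_idf(cluster):
    n_words = sum(counts[cluster].values())
    return {
        word: (tf / n_words) * math.log(1 + avg_words / total[word])
        for word, tf in counts[cluster].items()
    }

def top_words(cluster, k=3):
    scores = c_tf_idf(cluster)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]

for cluster in sorted(counts):
    print(cluster, top_words(cluster))
```

The highest-scoring words per cluster are what end up describing each topic. Note that the `[a-z]+` pattern only matches Latin letters, so for Nepali text you would need a tokenizer and stop-word list suited to Devanagari.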
-
Dear @MaartenGr, which similarity measure does BERTopic apply: cosine similarity or Euclidean distance? I am also a little confused about the difference between DBSCAN and HDBSCAN beyond the basics, specifically with respect to cluster size and the clustering process. Thanks in advance.
-
I am currently working on my thesis on topic modelling, using BERTopic for Nepali text, but I am confused about the exact difference between topic modelling and clustering. The BERTopic documentation lists a tokenization step after clustering. Is that tokenization performed on the numeric data produced by the embedding step? Please clarify. Also, how exactly are topics formed after c-TF-IDF?