
Collapsing BERTopic topics using similarities which are outliers from the distribution of topic similarities.

stewartjollymore/Topic-Collapsing

Topic-Collapsing

Collapsing BERTopic topics using similarities which are outliers from the distribution of topic similarities. This is an augmentation of a process created by Maarten Grootendorst. It differs by looking at all pairwise topic similarities and finding those that are upper outliers: more than 1.5 times the interquartile range (IQR) above the third quartile, i.e. beyond the upper whisker of a box plot. It then collapses the topic pair with the largest outlying similarity.
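Below is a minimal sketch of the outlier test described above. It assumes the topic similarities are already available as a square, symmetric matrix, for example cosine similarities between topic c-TF-IDF vectors. The function names `upper_fence` and `largest_outlier_pair` are hypothetical and are not part of BERTopic. Only the upper triangle is used, so each topic pair is counted once and the diagonal self-similarities are ignored.

```python
import statistics
from itertools import combinations


def upper_fence(values, k=1.5):
    """Tukey upper fence: Q3 + k * IQR (k = 1.5 by convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def largest_outlier_pair(sim, k=1.5):
    """Return (i, j) for the most similar topic pair if its similarity is an
    upper outlier among all pairwise similarities, otherwise None.

    `sim` is a square, symmetric similarity matrix given as a list of lists.
    """
    # Upper triangle only: each pair once, diagonal (self-similarity) skipped.
    pairs = list(combinations(range(len(sim)), 2))
    if len(pairs) < 2:
        return None  # too few pairs to estimate quartiles
    values = [sim[i][j] for i, j in pairs]
    fence = upper_fence(values, k)
    best = max(pairs, key=lambda p: sim[p[0]][p[1]])
    return best if sim[best[0]][best[1]] > fence else None
```

If every pair is roughly equally similar, the IQR collapses toward zero. No pair then exceeds the fence, so nothing is merged.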

Once the two most similar topics are collapsed, the c-TF-IDF representations are recalculated and a new similarity matrix is built. The process repeats until there are no more outliers. This is not an optimized process but a first pass.
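The iterative loop can be sketched without BERTopic itself. This is a simplified, hypothetical re-implementation, not the repository's code, and it rests on four assumptions:

- Each topic is a bag of word counts.
- c-TF-IDF is approximated as normalized term frequency times log(1 + A / f_t), where A is the average number of words per topic and f_t is the total frequency of term t.
- Similarity is cosine similarity.
- Merging two topics simply adds their word counts before c-TF-IDF is recomputed.

```python
import math
import statistics
from collections import Counter
from itertools import combinations


def c_tf_idf(topic_counts):
    """Simplified class-based TF-IDF: each topic's word counts act as one
    document. Weight = (count / topic length) * log(1 + A / f_t)."""
    vocab = sorted(set().union(*topic_counts.values()))
    total = Counter()
    for counts in topic_counts.values():
        total.update(counts)
    avg_words = sum(total.values()) / len(topic_counts)
    vectors = {}
    for topic, counts in topic_counts.items():
        n = sum(counts.values())
        vectors[topic] = [
            (counts[w] / n) * math.log(1 + avg_words / total[w]) for w in vocab
        ]
    return vectors


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def collapse_topics(topic_counts, k=1.5):
    """Repeatedly merge the most similar topic pair while its similarity is an
    upper outlier (above Q3 + k * IQR) among all pairwise similarities.
    Returns the remaining topics and the list of merged pairs, in order."""
    topics = {t: Counter(c) for t, c in topic_counts.items()}
    merges = []
    # At least 3 topics, so there are several pairwise similarities to take quartiles of.
    while len(topics) >= 3:
        vectors = c_tf_idf(topics)  # recomputed after every merge
        pairs = list(combinations(sorted(topics), 2))
        sims = {p: cosine(vectors[p[0]], vectors[p[1]]) for p in pairs}
        q1, _, q3 = statistics.quantiles(sims.values(), n=4, method="inclusive")
        best = max(pairs, key=sims.get)
        if sims[best] <= q3 + k * (q3 - q1):
            break  # no outlier left, so stop
        keep, drop = best
        topics[keep].update(topics.pop(drop))  # merge word counts into one topic
        merges.append(best)
    return topics, merges
```

In a real BERTopic pipeline, the merge and the c-TF-IDF recomputation would come from the library rather than these helper functions. The sketch only illustrates the recompute-and-repeat structure and the outlier stopping rule.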

This was deemed necessary because some of the smaller topics, which Grootendorst's approach collapsed first, were solid topics in their own right within the corpus I was working with.

IN DEV