Topic-Collapsing

Collapses BERTopic topics by merging pairs whose similarity is an outlier in the distribution of all topic-to-topic similarities. This is an augmentation of a process created by Maarten Grootendorst. It differs by looking at all the topic similarities, finding those that are outliers (more than 1.5 times the IQR above the third quartile, i.e. beyond the upper whisker of a box plot), and collapsing the pair with the largest outlying similarity.
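
The outlier test can be sketched as follows. This is a minimal illustration, not the code in topic_collapse.py. It assumes a square, symmetric similarity matrix, considers only off-diagonal pairs (i < j), and uses the Tukey fence Q3 + 1.5 × IQR as the cut-off. The helper names `percentile` and `most_similar_outlier_pair` are hypothetical.

```python
from itertools import combinations


def percentile(values, q):
    """Percentile with linear interpolation (NumPy's default convention)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def most_similar_outlier_pair(sim):
    """Return the (i, j) topic pair with the highest similarity if that value is
    an upper outlier (> Q3 + 1.5 * IQR) among all pairwise similarities, else None.

    `sim` is a square, symmetric similarity matrix given as a list of lists; only
    the upper triangle (i < j) is used, so the diagonal self-similarities are ignored.
    """
    pairs = list(combinations(range(len(sim)), 2))
    if not pairs:  # fewer than two topics: nothing to collapse
        return None
    values = [sim[i][j] for i, j in pairs]
    q1, q3 = percentile(values, 25), percentile(values, 75)
    upper_fence = q3 + 1.5 * (q3 - q1)  # end of the box plot's upper whisker
    best = max(pairs, key=lambda p: sim[p[0]][p[1]])
    return best if sim[best[0]][best[1]] > upper_fence else None


# Toy 5-topic matrix: topics 0 and 1 are far more similar than any other pair.
sim = [
    [1.00, 0.90, 0.20, 0.15, 0.10],
    [0.90, 1.00, 0.25, 0.20, 0.15],
    [0.20, 0.25, 1.00, 0.30, 0.20],
    [0.15, 0.20, 0.30, 1.00, 0.25],
    [0.10, 0.15, 0.20, 0.25, 1.00],
]
print(most_similar_outlier_pair(sim))  # (0, 1)
```

In the toy matrix the fence works out to 0.25 + 1.5 × 0.0875 ≈ 0.381. Only the 0.90 similarity exceeds it, so topics 0 and 1 are the pair to collapse. The size of the topics plays no part in the decision.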

Once the two most similar topics are collapsed, the c-TF-IDF is recalculated, a new similarity matrix is created, and the process repeats until there are no more outliers. This is not an optimized process but a first pass.
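
The loop described above can be sketched end to end with the standard library. This sketch makes several assumptions:

- Each topic is represented as a bag of word counts.
- Topic similarity is the cosine similarity of c-TF-IDF vectors.
- The c-TF-IDF weight roughly follows the formula in the BERTopic documentation, W = tf · log(1 + A / f). Here tf is the L1-normalised term frequency within a topic, A is the average number of words per topic, and f is the term's frequency across all topics.

All function names are hypothetical. This is neither the code in topic_collapse.py nor BERTopic's own API.

```python
import math
from collections import Counter
from itertools import combinations


def c_tf_idf(topic_counts):
    """c-TF-IDF per topic: (count / topic size) * log(1 + A / f_t), where
    A = average word count per topic and f_t = count of t across all topics."""
    total = Counter()
    for counts in topic_counts:
        total.update(counts)
    avg_words = sum(total.values()) / len(topic_counts)
    vectors = []
    for counts in topic_counts:
        n = sum(counts.values())
        vectors.append({t: (c / n) * math.log(1 + avg_words / total[t])
                        for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def percentile(values, q):
    """Percentile with linear interpolation (NumPy's default convention)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def collapse_topics(topic_counts):
    """Repeatedly merge the most similar topic pair while its similarity is an
    upper outlier (> Q3 + 1.5 * IQR); return the remaining topics' word counts."""
    topics = [Counter(c) for c in topic_counts]
    while len(topics) > 1:  # need at least one pair to compare
        vecs = c_tf_idf(topics)  # recomputed from scratch after every merge
        pairs = list(combinations(range(len(topics)), 2))
        sims = {p: cosine(vecs[p[0]], vecs[p[1]]) for p in pairs}
        q1, q3 = percentile(sims.values(), 25), percentile(sims.values(), 75)
        i, j = max(pairs, key=sims.get)
        if sims[(i, j)] <= q3 + 1.5 * (q3 - q1):
            break  # no outlier left: stop collapsing
        merged = topics.pop(j)  # j > i, so popping j leaves index i unchanged
        topics[i] += merged     # merge the word counts of the pair
    return topics


topics = [
    {"cat": 5, "dog": 3, "pet": 2},
    {"cat": 4, "dog": 4, "pet": 1},
    {"stock": 5, "market": 4},
    {"rain": 5, "weather": 3},
    {"goal": 5, "match": 4},
]
result = collapse_topics(topics)
print(len(result))  # 4
print(result[0])    # Counter({'cat': 9, 'dog': 7, 'pet': 3})
```

The toy vocabularies are disjoint, so every pair except (0, 1) has similarity 0. The fence therefore collapses to 0, and any positive similarity counts as an outlier. Real corpora give a much wider spread of similarities. Recomputing c-TF-IDF and the full similarity matrix after every merge is the simple, unoptimized choice described above.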

This was deemed necessary because some of the smaller topics, which Grootendorst's process was collapsing first, were solid topics in their own right within the corpus I was working with.

IN DEV

About

Releases

Packages

Languages

stewartjollymore/Topic-Collapsing

Folders and files

Latest commit

History

Repository files navigation

Topic-Collapsing

IN DEV

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages