I have a lot of outliers in my project and want the probability of each sentence belonging to each topic. If I use topics, probs = topic_model.fit_transform(sentences, embeddings=embeddings), the resulting probs contain every sentence (including outliers). Is there a way to extract these outliers, and do outliers also have topic probabilities? #2220

superseanyoung · 2024-11-20T09:16:09Z

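from bertopic import BERTopic

# transformer_model, umap_model, hdbscan_model, ctfidf_model and
# representation_model are assumed to be defined earlier (not shown here).
topic_model = BERTopic(
    embedding_model=transformer_model,
    min_topic_size=3,
    verbose=True,
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
    ctfidf_model=ctfidf_model,
    representation_model=representation_model,
    # top_n_words=10,
    # min_topic_size=10,
    # nr_topics=None,
    # low_memory=False,
    calculate_probabilities=True,  # return per-topic probabilities for each document
)
topics, probs = topic_model.fit_transform(sentences, embeddings=embeddings)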

superseanyoung · 2024-11-20T09:22:13Z

Author

I do not want the obtained probs to include sentences that belong to outliers. How should I post-process them?

3 replies

MaartenGr Nov 20, 2024
Maintainer

Have you checked out the documentation of BERTopic? You should be able to find the solution there.

To help you out, please refer to the FAQ about this specific question.
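For readers following the thread, here is a minimal sketch of one way to drop outlier rows after fitting. It assumes BERTopic's convention that outlier documents are labelled with topic -1, and that probs has one row per document (as noted in the question, outlier sentences still get a row). The helper name drop_outliers and the toy data are hypothetical, not part of BERTopic; the lists stand in for the real topics and probs returned by fit_transform.

```python
def drop_outliers(topics, probs, outlier_label=-1):
    """Return (indices, topics, probs) for documents not labelled as outliers.

    Hypothetical helper for illustration; it works on plain lists and also on
    array-like objects that support integer indexing.
    """
    kept = [i for i, topic in enumerate(topics) if topic != outlier_label]
    return kept, [topics[i] for i in kept], [probs[i] for i in kept]


# Toy stand-ins for the output of fit_transform (assumed values).
topics = [0, -1, 1, -1]
probs = [
    [0.90, 0.10],  # document 0, assigned to topic 0
    [0.30, 0.20],  # document 1, outlier, but it still has a probability row
    [0.20, 0.70],  # document 2, assigned to topic 1
    [0.05, 0.10],  # document 3, outlier
]

kept_idx, kept_topics, kept_probs = drop_outliers(topics, probs)
print(kept_idx)     # positions of the non-outlier sentences
print(kept_topics)  # their topic labels
```

The kept indices can be used to select the matching sentences, so the filtered sentences, topics, and probability rows stay aligned.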

superseanyoung Nov 20, 2024
Author

If I update the topics after reducing the outliers, will this change the previous probs?

MaartenGr Nov 20, 2024
Maintainer

The probabilities should stay the same after reducing outliers if you do not merge topics together.
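To illustrate the point, here is a small sketch of reassigning outliers from their probability rows. It is an illustration only, not BERTopic's implementation: the function reassign_outliers, the threshold rule, and the toy data are assumptions made for this example. In BERTopic itself, the corresponding step would be topic_model.reduce_outliers followed by topic_model.update_topics. The sketch shows that only the topic labels change, while the probability matrix is left untouched.

```python
import copy


def reassign_outliers(topics, probs, threshold=0.0, outlier_label=-1):
    """Give each outlier its highest-probability topic if it exceeds threshold.

    Hypothetical helper for illustration; it reads probs but never modifies it.
    """
    new_topics = []
    for topic, row in zip(topics, probs):
        if topic == outlier_label:
            best = max(range(len(row)), key=lambda k: row[k])
            if row[best] > threshold:
                topic = best
        new_topics.append(topic)
    return new_topics


# Toy stand-ins for the output of fit_transform (assumed values).
topics = [0, -1, 1, -1]
probs = [[0.90, 0.10], [0.30, 0.20], [0.20, 0.70], [0.05, 0.10]]
probs_before = copy.deepcopy(probs)

new_topics = reassign_outliers(topics, probs, threshold=0.15)
print(new_topics)             # document 1 moves to topic 0; document 3 stays an outlier
print(probs == probs_before)  # the probability rows are unchanged
```

Merging topics is a different case, since it changes the set of topics that the probability columns refer to.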
