


amirhosseinazami1373/Tweet-clustering

Tweet-clustering

Cluster tweets using the Jaccard Distance metric and the K-means clustering algorithm. This approach groups similar tweets together, which is useful for applications like trend analysis and content organization on Twitter.

The data is accessible through the link below:

https://archive.ics.uci.edu/dataset/438/health+news+in+twitter

Steps:

Tokenize Tweets: Convert tweets into sets of words.

Define Jaccard Distance: Measure the dissimilarity between two word sets as 1 minus the size of their intersection divided by the size of their union.

Initialize Centroids: Randomly pick k tweets as initial centroids.

Cluster Assignment: Assign each tweet to the nearest centroid based on Jaccard distance.

Update Centroids: Update each centroid to the tweet in the cluster that minimizes the sum of distances to the other tweets in the same cluster. Repeat the assignment and update steps until the centroids no longer change.

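Calculate SSE: Compute the Sum of Squared Errors, i.e. the sum of squared Jaccard distances between each tweet and its cluster's centroid, to evaluate the clustering.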
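The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the repository's actual code: the function names (`tokenize`, `jaccard_distance`, `assign`, `update_centroid`, `sse`, `kmeans`), the preprocessing rules (lowercasing, dropping URLs and @mentions), the `max_iter` and `seed` parameters, and the handling of empty clusters are all assumptions made for the example.

```python
import random
import re


def tokenize(tweet):
    """Lowercase a tweet, drop URLs and @mentions, and return its set of words.
    (Assumed preprocessing; the repository may clean tweets differently.)"""
    text = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return frozenset(re.findall(r"[a-z0-9']+", text))


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def assign(tweets, centroids):
    """Put each tweet into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for t in tweets:
        nearest = min(range(len(centroids)),
                      key=lambda i: jaccard_distance(t, centroids[i]))
        clusters[nearest].append(t)
    return clusters


def update_centroid(cluster):
    """Pick the tweet with the smallest total distance to its cluster mates."""
    return min(cluster,
               key=lambda c: sum(jaccard_distance(c, t) for t in cluster))


def sse(clusters, centroids):
    """Sum of squared Jaccard distances from each tweet to its centroid."""
    return sum(jaccard_distance(t, c) ** 2
               for cluster, c in zip(clusters, centroids)
               for t in cluster)


def kmeans(tweets, k, max_iter=20, seed=0):
    """Cluster tokenized tweets; returns (clusters, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(tweets, k)  # k random tweets as initial centroids
    for _ in range(max_iter):
        clusters = assign(tweets, centroids)
        # Keep the old centroid if a cluster ended up empty (assumed policy).
        new_centroids = [update_centroid(cl) if cl else c
                         for cl, c in zip(clusters, centroids)]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    # Re-assign so the returned clusters match the returned centroids.
    return assign(tweets, centroids), centroids


if __name__ == "__main__":
    raw = [
        "Flu shots are now available at local clinics http://example.com/a",
        "Local clinics offer flu shots this week",
        "New study links sleep and heart health",
        "Sleep quality tied to heart health, study says @someone",
    ]
    tweets = [tokenize(t) for t in raw]
    clusters, centroids = kmeans(tweets, k=2)
    print("SSE:", sse(clusters, centroids))
```

Because the mean of a collection of word sets is not defined, the centroid here is always one of the tweets in the cluster, which makes this variant closer to k-medoids than to textbook K-means. Running the sketch for several values of `k` and comparing the resulting SSE is one way to choose the number of clusters.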
