Hashtag context #1986
Hi! I have read that you should do only minimal preprocessing before topic modeling. However, since I am topic modeling tweets, I am wondering whether I should keep hashtags in or not. Hashtags are sometimes used in the middle of a sentence as just a word, which means that deleting every word that begins with a # would remove a lot of the structure of the sentence. Is a # something the multilingual embedding model takes into account? Can I leave it in? Or would it be better to delete the hashtags, or only the #? Thanks in advance.
Replies: 1 comment
Good question. I would guess that it depends on the embedding model. If it was trained on Twitter-like data, then I can imagine that it would embed those tags properly and that they would help the embedding representations. If not, then the hashtags might be noise that is better removed.
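
If it helps to compare the options from the question side by side, here is a minimal sketch of the two preprocessing variants (dropping only the `#`, or dropping the whole hashtag). The helper names `strip_hash_symbol` and `remove_hashtags` are made up for illustration and are not part of any library, and the regex assumes a hashtag is a `#` followed by word characters. Leaving the text untouched is the third option.

```python
import re

# Assumption: a hashtag is '#' followed by one or more word characters
# (letters, digits, underscore; Unicode-aware for str patterns in Python 3).
HASHTAG = re.compile(r"#(\w+)")


def strip_hash_symbol(text: str) -> str:
    """Keep the hashtag word but drop the leading '#' (hypothetical helper)."""
    return HASHTAG.sub(r"\1", text)


def remove_hashtags(text: str) -> str:
    """Delete hashtags entirely, then collapse leftover whitespace (hypothetical helper)."""
    without_tags = HASHTAG.sub("", text)
    return re.sub(r"\s+", " ", without_tags).strip()


tweet = "Loving the #weather in Amsterdam today #sunny"
print(strip_hash_symbol(tweet))  # Loving the weather in Amsterdam today sunny
print(remove_hashtags(tweet))    # Loving the in Amsterdam today
```

The second output shows the concern from the question: removing a hashtag that is used as a regular word breaks the sentence. One way to decide would be to run the topic model on each variant and compare the resulting topics.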