Preparing text for generating word vectors with Floret #11285
orglce started this conversation in Help: Best practices
I would recommend only tokenizing. The static vectors currently look up a token's vector by the token text (… (It's technically possible to have vectors for a token attribute other than …
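A minimal sketch of why the lookup key matters. It assumes a plain table keyed by exact token text, like spaCy's default (non-floret) vectors, and it assumes the vectors were trained on lowercased text. The names `build_table` and `lookup` are hypothetical, and the vector values are placeholders, not trained weights.

```python
def build_table(training_tokens, dim=3):
    """Map each distinct training token (exact text) to a placeholder vector."""
    table = {}
    for i, tok in enumerate(sorted(set(training_tokens))):
        # Placeholder values; a real trainer learns these from the corpus.
        table[tok] = [float(i + 1)] * dim
    return table


def lookup(table, token_text, dim=3):
    """Look up a vector by the raw token text; unknown keys get a zero vector."""
    return table.get(token_text, [0.0] * dim)


raw_tokens = ["Berlin", "is", "big"]
# Assumption for illustration: the vectors were trained on lowercased text.
lowercased = [t.lower() for t in raw_tokens]
table = build_table(lowercased)

print(lookup(table, "Berlin"))  # [0.0, 0.0, 0.0]: the raw token is not a key
print(lookup(table, "berlin"))  # [1.0, 1.0, 1.0]
```

The same mismatch would arise for any normalization applied only when training the vectors, since the pipeline queries the table with the token text as it appears in the input.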
I have been trying to generate my own word vectors with Floret, and I was wondering whether there are any recommended preprocessing steps besides tokenization. Would it improve the downstream accuracy of the pipeline if the text were …
I reckon none of these would prove beneficial as part of the whole pipeline (POS, NER, lemmatization, ...), but I don't know exactly how spaCy uses the vectors under the hood.
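For floret specifically, a token's vector is built from character n-grams rather than only from an exact-text key, so a normalization mismatch degrades the vector instead of producing a clean miss. The sketch below is simplified: it assumes fastText-style n-grams of the word wrapped in `<` and `>`, uses illustrative lengths of 3 to 5 (not necessarily the settings you trained with), and omits the hashing of the whole word and the n-grams into buckets. `char_ngrams` is a hypothetical helper.

```python
def char_ngrams(word, minn=3, maxn=5):
    """Return the set of character n-grams of `word` wrapped in '<' and '>'.

    Simplified fastText-style subwords; minn/maxn are illustrative values.
    """
    wrapped = f"<{word}>"
    grams = set()
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.add(wrapped[i:i + n])
    return grams


# n-grams seen when training on lowercased text
trained = char_ngrams("berlin")
# n-grams queried for the raw token at runtime
runtime = char_ngrams("Berlin")
shared = trained & runtime

print(len(runtime), len(shared))   # 15 9
print(sorted(runtime - trained))   # n-grams the lowercased training never saw
```

Here 6 of the 15 n-grams for "Berlin" (every one containing the capital "B") would never have been updated during training on lowercased text, so the runtime vector is assembled partly from untrained rows.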