Analyzing support tickets with spaCy #12130
-
It is hard to say a priori which representations work for document similarity and clustering. It really depends on the data set: its vocabulary, the amount of noise, etc. At any rate, I would recommend setting up reproducible evaluations, so that you can easily compare how well different measures and document representations work. It probably also makes sense to tackle document similarity before document clustering, since many clustering methods require a document similarity measure. spaCy itself has the |
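To make the idea of a reproducible evaluation concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the ticket texts in `TRIPLETS` are made up, and the two measures (word-count cosine and character-trigram cosine) are just simple baselines, not something the thread prescribes. The harness checks, for each hand-labeled triplet, whether a measure scores the related ticket higher than the unrelated one.

```python
import math
from collections import Counter

# Hypothetical labeled data: (anchor, ticket that should be similar,
# ticket that should NOT be similar). Replace with real, hand-labeled tickets.
TRIPLETS = [
    ("VPN conn drops after login",
     "vpn connection lost right after logging in",
     "printer out of toner on 3rd floor"),
    ("cannot reset pwd in portal",
     "password reset fails in the portal",
     "VPN conn drops after login"),
    ("printer jams on duplex print",
     "paper jam when printing double-sided on printer",
     "cannot reset pwd in portal"),
]


def word_counts(text):
    """Bag-of-words representation: lowercase, whitespace-split tokens."""
    return Counter(text.lower().split())


def char_ngram_counts(text, n=3):
    """Character n-gram representation; tolerant of abbreviations and typos."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Each measure maps two raw texts to a similarity score.
MEASURES = {
    "words": lambda x, y: cosine(word_counts(x), word_counts(y)),
    "char_trigrams": lambda x, y: cosine(char_ngram_counts(x),
                                         char_ngram_counts(y)),
}


def triplet_accuracy(measure, triplets):
    """Fraction of triplets where the similar ticket outranks the dissimilar one."""
    hits = sum(
        1 for anchor, similar, dissimilar in triplets
        if measure(anchor, similar) > measure(anchor, dissimilar)
    )
    return hits / len(triplets)


def evaluate_all(triplets=TRIPLETS):
    """Score every registered measure on the same labeled triplets."""
    return {name: triplet_accuracy(fn, triplets) for name, fn in MEASURES.items()}


if __name__ == "__main__":
    for name, accuracy in evaluate_all().items():
        print(f"{name}: {accuracy:.2f}")
```

Because every measure is just a function of two texts, other candidates can be dropped into `MEASURES` and compared on the same data. A spaCy-based entry could, for example, wrap `Doc.similarity` (assuming a pipeline with word vectors is loaded). The same harness can later be rerun after training custom vectors or a transformer to see whether the change actually helps on your tickets.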
-
Hi, I am using spaCy to analyse support tickets. The tickets are written by humans and contain a lot of domain-specific words. With little additional training of the standard model, I managed to implement a sentiment analysis tool to calculate ticket urgency.
Now I want to move on to the next NLP tasks, like clustering and text similarity.
The tickets have a very technical background, with abbreviations, technical terms and gibberish.
I am a bit lost on how to approach those next NLP tasks. For example, should I train custom word vectors, or just NER? Train BERT on unlabeled text? Or do I need POS tags, dependency tags, etc.?
In short: I want a solid model based on my "non-standard" data and need some guidance on what kind of training results in the most solid model that can be used for a multitude of NLP tasks.