Custom component on filtered sentences of doc object #11946
-
It sounds like you should write a custom suggester for the spancat that only suggests the sentences you want to keep. To be clear, do you want spancat to classify whole sentences, or to classify spans only within certain sentences? Either way you can do it with a custom suggester, it'll just be slightly more complicated if it's the latter option.
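Here is a minimal sketch of what such a suggester could look like, assuming the latter option (spans only within certain sentences). The registry name `"keyword_filtered_ngram_suggester.v1"` and the `KEYWORDS` set are illustrative assumptions, not part of spaCy; the structure mirrors the built-in n-gram suggester but only generates candidates inside sentences that contain one of the keywords:

```python
from functools import partial
from typing import Iterable, List, Optional

from spacy.tokens import Doc
from spacy.util import registry
from thinc.api import Ops, get_current_ops
from thinc.types import Ragged

# Hypothetical pre-filter word list; replace with your own.
KEYWORDS = {"acquisition", "merger"}


@registry.misc("keyword_filtered_ngram_suggester.v1")
def build_keyword_filtered_ngram_suggester(sizes: List[int]):
    return partial(keyword_filtered_ngram_suggester, sizes=sizes)


def keyword_filtered_ngram_suggester(
    docs: Iterable[Doc], *, ops: Optional[Ops] = None, sizes: List[int]
) -> Ragged:
    if ops is None:
        ops = get_current_ops()
    spans = []    # one (start, end) token-index pair per candidate
    lengths = []  # number of candidates per doc
    for doc in docs:
        length = 0
        for sent in doc.sents:
            # Skip sentences that don't contain any keyword.
            if not any(token.lower_ in KEYWORDS for token in sent):
                continue
            # Enumerate n-grams only within the kept sentence.
            for size in sizes:
                for start in range(sent.start, sent.end - size + 1):
                    spans.append((start, start + size))
                    length += 1
        lengths.append(length)
    lengths_array = ops.asarray1i(lengths)
    if spans:
        data = ops.asarray2i(spans)
    else:
        data = ops.xp.zeros((0, 2), dtype="i")
    return Ragged(data, lengths_array)
```

Note that this relies on `doc.sents`, so a sentence-boundary component (parser or senter) has to run before spancat. Depending on your spaCy version you may also want to guarantee at least one candidate per doc (e.g. fall back to a single-token span) so the batch is never empty.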
-
So here is the actual problem: we have very large documents to process, and the number of candidates generated by the n-gram suggester is huge; a single doc can produce 550K candidates, so the model runs out of GPU memory (16 GB). That is why we want to reduce the number of candidates spancat generates, and why we are looking for a way to run spancat only on pre-filtered sentences from the doc.
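If a custom suggester like the sketch above were registered under the hypothetical name `"keyword_filtered_ngram_suggester.v1"`, it could replace the default n-gram suggester when the spancat component is configured, which directly cuts the number of candidates. A rough example (the `spans_key` value and sizes are assumptions):

```python
import spacy

nlp = spacy.blank("en")          # or your existing pipeline
nlp.add_pipe("sentencizer")      # the suggester needs sentence boundaries
nlp.add_pipe(
    "spancat",
    config={
        "spans_key": "sc",
        # Swap the default spacy.ngram_suggester.v1 for the custom one.
        "suggester": {
            "@misc": "keyword_filtered_ngram_suggester.v1",
            "sizes": [1, 2, 3],
        },
    },
)
```

The equivalent in a training config would be setting `@misc = "keyword_filtered_ngram_suggester.v1"` in the `[components.spancat.suggester]` block, so the reduced candidate set is also used during training.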
-
Hi,
I am trying to build a pipeline with a transformer, NER, spancat, and another classification model.
pipeline = ["transformer", "tagger", "parser", "attribute_ruler", "lemmatizer", "ner", "spancat", "custom_component"]
The spancat here is a custom-trained model.
I need to pre-filter sentences in the doc that contain any word from a list, and only those sentences should be processed by the spancat model, instead of the entire doc object. However, all the other components need the entire document.
For example, if a document has 20 sentences and only 8 of them match the pre-filter list, then the spancat component should annotate only those 8 sentences.
How can I achieve that?
One way I can think of is to create another custom component and load the spancat model in the `__init__` of a language factory. But I don't want to load the model for every single document it processes.
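On that last point: a component created through `@Language.factory` is constructed once when the pipeline is assembled, not once per document, so a wrapper that loads the spancat model in its `__init__` would not reload it per doc. A rough sketch of that idea, assuming a hypothetical factory name `"filtered_spancat"`, a keyword list, and a model path:

```python
from typing import List

import spacy
from spacy.language import Language
from spacy.tokens import Doc


@Language.factory("filtered_spancat", default_config={"keywords": []})
def create_filtered_spancat(nlp: Language, name: str, keywords: List[str]):
    return FilteredSpancat(keywords)


class FilteredSpancat:
    def __init__(self, keywords: List[str]):
        # Runs once, when the pipeline is built, not per document.
        self.keywords = set(keywords)
        self.spancat_nlp = spacy.load("path/to/spancat_model")  # hypothetical path

    def __call__(self, doc: Doc) -> Doc:
        # Run spancat only on sentences containing a keyword and map the
        # predicted spans back onto the original doc.
        spans = []
        for sent in doc.sents:
            if not any(t.lower_ in self.keywords for t in sent):
                continue
            sent_doc = self.spancat_nlp(sent.text)
            for span in sent_doc.spans.get("sc", []):
                start_char = sent.start_char + span.start_char
                end_char = sent.start_char + span.end_char
                mapped = doc.char_span(start_char, end_char, label=span.label_)
                if mapped is not None:
                    spans.append(mapped)
        doc.spans["filtered_sc"] = spans
        return doc
```

The drawback of this wrapper is that it runs a second, separate model per kept sentence and duplicates the transformer work, so the custom-suggester approach from the replies is likely the cheaper way to get the same filtering.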