What does 'existing models can not be extended' mean? #1288
4 comments · 3 replies
-
Basically, when training is done, the optimizer is thrown out, we put a big
AS IS stamp on the model, and ship it. Without the final state of the
optimizer, restarting training with a new set of data would lead to
suboptimal results. However, it would be less of an issue here than it
would in a deep learning model, I believe. The CoreNLP code doesn't
support that workflow, though. Perhaps you could retrain an entire new
model once a week or so - it only takes an hour on a decent CPU.
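For concreteness, retraining from scratch follows the recipe in the CRF FAQ linked in this discussion: a small properties file pointing at tab-separated training data, then one java invocation. A minimal sketch, with illustrative file names and a trimmed-down feature set (the FAQ lists the full one):

```properties
# ner.prop -- minimal CRFClassifier training configuration
trainFile = invoices-train.tsv
serializeTo = my-ner-model.ser.gz
# column 0 holds the word, column 1 the gold label
map = word=0,answer=1

useClassFeature = true
useWord = true
useNGrams = true
noMidNGrams = true
maxNGramLeng = 6
usePrev = true
useNext = true
usePrevSequences = true
maxLeft = 1
wordShape = chris2useLC
```

Then train with (the jar name and version are illustrative, use whatever release you have):

```sh
java -cp stanford-corenlp-4.5.7.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop ner.prop
```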
There's a gazette feature which can be updated after finishing a model,
although in general the models included in CoreNLP don't use the gazettes.
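If you do try the gazette route, it's driven by a few feature flags in the same properties file. A sketch (the file name is illustrative, and as far as I recall each gazette line is a class name followed by the phrase, so double-check against the NERFeatureFactory documentation):

```properties
# add to ner.prop
useGazettes = true
gazette = suppliers.gazette
cleanGazette = true
```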
You can absolutely train an entire new classifier as long as you have data
to start from. The CoNLL NER dataset is available on HuggingFace, for
example.
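If you start from that dataset, here is a hedged Python sketch for pulling it down and writing the word-per-line TSV format the CRF trainer expects (the `datasets` package and the `conll2003` dataset id are assumptions about your environment):

```python
# pip install datasets
# Convert the HuggingFace conll2003 train split into CoreNLP's word<TAB>label format.
from datasets import load_dataset

ds = load_dataset("conll2003")
labels = ds["train"].features["ner_tags"].feature.names  # e.g. O, B-PER, I-PER, ...

with open("conll-train.tsv", "w", encoding="utf-8") as out:
    for example in ds["train"]:
        for token, tag in zip(example["tokens"], example["ner_tags"]):
            out.write(f"{token}\t{labels[tag]}\n")
        out.write("\n")  # blank line between sentences
```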
-
But on the other hand, I expected that a continuous flow of new training data would be a good approach for deep learning?
For deep learning I would expect the issue will be exactly the same - if
you train once on a new item, the model won't learn anything. If you train
for several iterations on just a new item, the model will overfit to that
new item. If you train for several iterations on a new item and all the
existing data, it starts being questionable why you aren't just retraining
the whole model anyway. It'll be a hard balance to get right so that the
model knows both the old data and the new data.
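To make that balance concrete, the usual compromise is replay: every fine-tuning step mixes the new item with examples sampled from the old training set. A minimal sketch under those assumptions; `model.train_step` is a hypothetical stand-in for one gradient update in whatever framework you use, not a CoreNLP or Stanza API:

```python
import random

def finetune_with_replay(model, new_examples, old_dataset,
                         steps=50, replay_per_step=8):
    """Fine-tune on new data while replaying old data to limit forgetting."""
    for _ in range(steps):
        # Mix the handful of new examples with a fresh sample of old ones,
        # so each gradient update sees both distributions.
        batch = list(new_examples) + random.sample(old_dataset, replay_per_step)
        random.shuffle(batch)
        model.train_step(batch)  # hypothetical: one gradient update on a batch
```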
There is work done on fine tuning models, and our deep learning model in
Python does in fact allow for it, but what I'm hearing is that you'll be
repeatedly fine tuning the model, which I expect to cause a lot of
performance degradation over time. I'd be curious to know if the spaCy
models are still working well for the original data after several rounds of
this process.
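If anyone does run that experiment, the measurement is straightforward: freeze a held-out slice of the original data and score it after every fine-tuning round. A sketch reusing the hypothetical `finetune_with_replay` and interface from above (`model.evaluate` is likewise a stand-in, think entity F1 on the original data):

```python
def finetune_rounds(model, rounds, old_dataset, old_heldout, budget=0.02):
    baseline = model.evaluate(old_heldout)
    for new_examples in rounds:  # one round per batch of user corrections
        finetune_with_replay(model, new_examples, old_dataset)
        if baseline - model.evaluate(old_heldout) > budget:
            print("forgetting exceeded budget; retrain from scratch instead")
            break
```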
-
@AngledLuffa Thanks for this discussion! You say '... if you train once on a new item, the model won't learn anything ...' OK, but how would you solve the following problem: we want to extract entities (date, total, IBAN, BIC, customer name) from invoice documents. Now the company receives a new (completely different) invoice. The model detects only half of the entities, because the layout and context are different, and the user enters the missing data manually. So we now have a new invoice with correct training data. How should we train on this?
As far as I understand you, sending a single new training example into the model (even if it were possible) will not have any effect. So you recommend retraining a completely new model that includes the new invoice, right?
Since we have tens of thousands of invoices from 100 suppliers, would you then recommend training on only one invoice per supplier, or really on the full stack? We know that invoices from suppliers who send many invoices are recognized better than invoices from exotic suppliers who send only one invoice a year. That was the reason I thought re-training on only the exotic ones was a good idea...
-
What about writing solutions for each invoice type? On the one hand, doing this 100 times sounds pretty annoying, and it requires human intervention when a new invoice type is added. On the other hand, writing 100 sets of regular expressions will probably get you over 99% accuracy, and it won't take that long. The idea of training new models for each situation is also a possibility.
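For what it's worth, here is a hedged Python sketch of what one such per-supplier rule set could look like. The patterns are simplified illustrations, not production-grade IBAN/BIC validation, and if you want to stay inside CoreNLP, TokensRegex/RegexNER cover the same ground:

```python
import re

# One rule set per supplier, keyed on something stable in the invoice header.
ACME_RULES = {
    "date":  re.compile(r"Invoice date:\s*(\d{2}\.\d{2}\.\d{4})"),
    "total": re.compile(r"Total:\s*([\d.,]+)\s*EUR"),
    # Simplified IBAN: country code, check digits, 11-30 BBAN characters.
    "iban":  re.compile(r"\b([A-Z]{2}\d{2}[A-Z0-9]{11,30})\b"),
    # Simplified BIC: bank, country, location, optional branch code.
    "bic":   re.compile(r"\b([A-Z]{4}[A-Z]{2}[A-Z0-9]{2}(?:[A-Z0-9]{3})?)\b"),
}

def extract(text, rules):
    """Return the first match for each field, or None if the pattern misses."""
    results = {}
    for field, pattern in rules.items():
        match = pattern.search(text)
        results[field] = match.group(1) if match else None
    return results
```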
-
Hi,
in the FAQ at https://nlp.stanford.edu/software/crf-faq.shtml, question 11 says that existing models cannot be extended.
Does this mean I cannot add new entities or classifiers to an existing model? Or does it mean I cannot re-train an existing model with new training data?
We have an open-source workflow system that uses a continuous-learning approach: each new process instance eventually provides one new training example. We want to refine our model day by day, based on the data users enter into the workflow platform. Is this possible?