What does 'existing models can not be extended' mean? #1288
4 comments · 3 replies
-
Basically, when training is done, the optimizer is thrown out, we put a big
AS IS stamp on the model, and ship it. Without the final state of the
optimizer, restarting training with a new set of data would lead to
suboptimal results. However, it would be less of an issue here than it
would in a deep learning model, I believe. The CoreNLP code doesn't
support that workflow, though. Perhaps you could retrain an entire new
model once a week or so - it only takes an hour on a decent CPU.
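For concreteness, retraining from scratch follows the recipe in the CRF FAQ linked in this discussion: a small properties file pointing at tab-separated training data, then one java invocation. A minimal sketch, with illustrative file names and a trimmed-down feature set (the FAQ lists the full one):

```properties
# ner.prop -- minimal CRFClassifier training configuration
trainFile = invoices-train.tsv
serializeTo = my-ner-model.ser.gz
# column 0 holds the word, column 1 the gold label
map = word=0,answer=1

useClassFeature = true
useWord = true
useNGrams = true
noMidNGrams = true
maxNGramLeng = 6
usePrev = true
useNext = true
usePrevSequences = true
maxLeft = 1
wordShape = chris2useLC
```

Then train with (the jar name and version are illustrative, use whatever release you have):

```sh
java -cp stanford-corenlp-4.5.7.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop ner.prop
```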
There's a gazette feature which can be updated after finishing a model,
although in general the models included in CoreNLP don't use the gazettes.
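If you do try the gazette route, it's driven by a few feature flags in the same properties file. A sketch (the file name is illustrative, and as far as I recall each gazette line is a class name followed by the phrase, so double-check against the NERFeatureFactory documentation):

```properties
# add to ner.prop
useGazettes = true
gazette = suppliers.gazette
cleanGazette = true
```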
You can absolutely train an entire new classifier as long as you have data
to start from. The CoNLL NER dataset is available on HuggingFace, for
example.
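If you start from that dataset, here is a hedged Python sketch for pulling it down and writing the word-per-line TSV format the CRF trainer expects (the `datasets` package and the `conll2003` dataset id are assumptions about your environment):

```python
# pip install datasets
# Convert the HuggingFace conll2003 train split into CoreNLP's word<TAB>label format.
from datasets import load_dataset

ds = load_dataset("conll2003")
labels = ds["train"].features["ner_tags"].feature.names  # e.g. O, B-PER, I-PER, ...

with open("conll-train.tsv", "w", encoding="utf-8") as out:
    for example in ds["train"]:
        for token, tag in zip(example["tokens"], example["ner_tags"]):
            out.write(f"{token}\t{labels[tag]}\n")
        out.write("\n")  # blank line between sentences
```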
-
But on the other hand, I expected that a continuous flow of new training data would be a good approach for deep learning?
For deep learning I would expect the issue will be exactly the same - if
you train once on a new item, the model won't learn anything. If you train
for several iterations on just a new item, the model will overfit to that
new item. If you train for several iterations on a new item and all the
existing data, it starts being questionable why you aren't just retraining
the whole model anyway. It'll be a hard balance to get right so that the
model knows both the old data and the new data.
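To make that balance concrete, the usual compromise is replay: every fine-tuning step mixes the new item with examples sampled from the old training set. A minimal sketch under those assumptions; `model.train_step` is a hypothetical stand-in for one gradient update in whatever framework you use, not a CoreNLP or Stanza API:

```python
import random

def finetune_with_replay(model, new_examples, old_dataset,
                         steps=50, replay_per_step=8):
    """Fine-tune on new data while replaying old data to limit forgetting."""
    for _ in range(steps):
        # Mix the handful of new examples with a fresh sample of old ones,
        # so each gradient update sees both distributions.
        batch = list(new_examples) + random.sample(old_dataset, replay_per_step)
        random.shuffle(batch)
        model.train_step(batch)  # hypothetical: one gradient update on a batch
```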
There is work done on fine tuning models, and our deep learning model in
Python does in fact allow for it, but what I'm hearing is that you'll be
repeatedly fine tuning the model, which I expect to cause a lot of
performance degradation over time. I'd be curious to know if the spaCy
models are still working well for the original data after several rounds of
this process.
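If anyone does run that experiment, the measurement is straightforward: freeze a held-out slice of the original data and score it after every fine-tuning round. A sketch reusing the hypothetical `finetune_with_replay` and interface from above (`model.evaluate` is likewise a stand-in, think entity F1 on the original data):

```python
def finetune_rounds(model, rounds, old_dataset, old_heldout, budget=0.02):
    baseline = model.evaluate(old_heldout)
    for new_examples in rounds:  # one round per batch of user corrections
        finetune_with_replay(model, new_examples, old_dataset)
        if baseline - model.evaluate(old_heldout) > budget:
            print("forgetting exceeded budget; retrain from scratch instead")
            break
```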
-
@AngledLuffa Thanks for this discussion! You say '... if you train once on a new item, the model won't learn anything ...' OK, but how would you solve the following problem: we want to extract entities (date, total, IBAN, BIC, customer name) from invoice documents. Now the company receives a new (completely different) invoice. The model detects only half of the entities, because the layout and context are different, and the user enters the missing data manually. So we now have a new invoice with correct training data. How should we train on this?
As far as I understand you, sending a single new training example into the model (even if it were possible) will not have any effect. So you recommend retraining a completely new model that includes the new invoice, right?
Since we have tens of thousands of invoices from 100 suppliers, would you then recommend training on only one invoice per supplier, or really on the full stack? We know that invoices from suppliers who send many invoices are recognized better than invoices from exotic suppliers who send only one invoice a year. That was the reason I thought re-training on only the exotic ones was a good idea...
-
What about writing solutions for each invoice type? On the one hand, doing this 100 times sounds pretty annoying, and it requires human intervention when a new invoice type is added. On the other hand, writing 100 sets of regular expressions will probably get you over 99% accuracy, and it won't take that long. The idea of training new models for each situation is also a possibility.
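For what it's worth, here is a hedged Python sketch of what one such per-supplier rule set could look like. The patterns are simplified illustrations, not production-grade IBAN/BIC validation, and if you want to stay inside CoreNLP, TokensRegex/RegexNER cover the same ground:

```python
import re

# One rule set per supplier, keyed on something stable in the invoice header.
ACME_RULES = {
    "date":  re.compile(r"Invoice date:\s*(\d{2}\.\d{2}\.\d{4})"),
    "total": re.compile(r"Total:\s*([\d.,]+)\s*EUR"),
    # Simplified IBAN: country code, check digits, 11-30 BBAN characters.
    "iban":  re.compile(r"\b([A-Z]{2}\d{2}[A-Z0-9]{11,30})\b"),
    # Simplified BIC: bank, country, location, optional branch code.
    "bic":   re.compile(r"\b([A-Z]{4}[A-Z]{2}[A-Z0-9]{2}(?:[A-Z0-9]{3})?)\b"),
}

def extract(text, rules):
    """Return the first match for each field, or None if the pattern misses."""
    results = {}
    for field, pattern in rules.items():
        match = pattern.search(text)
        results[field] = match.group(1) if match else None
    return results
```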
-
Hi,
in the FAQ at https://nlp.stanford.edu/software/crf-faq.shtml, question 11 says that existing models cannot be extended.
Does this mean I cannot add new entities or classifiers to an existing model? Or does it mean I cannot re-train an existing model with new training data?
We have an open-source workflow system that uses a continuous-learning approach: each new process instance eventually provides one new training example. We want to refine our model day by day, based on the data users enter into the workflow platform. Is this possible?