It's sometimes difficult to initialize pipeline components in code #7027
Labels: enhancement, feat / pipeline, feat / ux
The workflow for setting up a pipeline component in code sometimes feels a bit rough. This came up while I was investigating #6958.

Let's say we have some pipeline component that assumes its `.initialize()` method will be called before it's in a valid state, as the transformer does --- but the component doesn't necessarily need to be trained, as such, before it's in a functional state. Say we've added it to an `nlp` pipeline and hold the component in a variable called `transformer`.

So now we need to call `transformer.initialize()`. How to do that?

- `nlp.initialize()`? That does work --- but if I were adding the component alongside other components, I'd have problems, as I'd wipe their weights.
- `nlp.resume_training()`? It seemed like that ought to work, even though it's not the most obvious choice. It doesn't though, because it doesn't call `.initialize()` on the components, as it can't know what weights that would reset.
- `transformer.initialize(get_examples=lambda: [], nlp=nlp)`? This runs into an error in `validate_get_examples`, which complains that the list is empty. The component does support an empty list though.
- `transformer.initialize(nlp=nlp)`? This doesn't work, even though the docstring refers to it as an "optional `get_examples` callback".
- Construct an `Example` object, so that I can return it in `get_examples`? Kind of a hassle.
- `transformer.model.initialize()`? This happens to work here, but it wouldn't if the component required other initialization, so it's not a generalizable solution.

A quick improvement is to add an argument to `validate_get_examples` indicating whether the component can work with no examples. I'm not sure how to help components that do need some data though.

Maybe some components should check whether they're initialized, and do that on first usage if necessary? It does feel dirty, though.
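To make those last two ideas concrete, here's a rough sketch in plain Python. It doesn't touch spaCy at all: `validate_get_examples_sketch`, its `allow_empty` flag and `LazyComponent` are hypothetical names I made up for illustration, not existing spaCy APIs, and the real `validate_get_examples` may check more than this toy does.

```python
from typing import Any, Callable, Iterable, List, Optional


def validate_get_examples_sketch(
    get_examples: Optional[Callable[[], Iterable[Any]]],
    method: str,
    *,
    allow_empty: bool = False,  # hypothetical flag, not an existing spaCy argument
) -> List[Any]:
    """Toy validator: check the callback and optionally tolerate zero examples."""
    if get_examples is None or not callable(get_examples):
        raise TypeError(
            f"{method} expects a callable get_examples, got {type(get_examples)}"
        )
    examples = list(get_examples())
    if not examples and not allow_empty:
        raise ValueError(f"{method} received no examples")
    return examples


class LazyComponent:
    """Toy component that needs no data and initializes itself on first use."""

    def __init__(self) -> None:
        self.initialized = False

    def initialize(self, get_examples=lambda: [], *, nlp=None) -> None:
        # The component itself declares that an empty example list is fine.
        validate_get_examples_sketch(
            get_examples, "LazyComponent.initialize", allow_empty=True
        )
        self.initialized = True

    def __call__(self, doc):
        # Initialize on first usage if nobody called initialize() explicitly.
        if not self.initialized:
            self.initialize()
        return doc
```

With something like this, a component that can cope without data opts in via `allow_empty=True`, while components that do need examples keep the strict default. The first-use check in `__call__` is the part that feels dirty, since it hides a potentially expensive step inside what looks like an ordinary call.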