Best practice when packaging multiple models with shared custom components #12031
-
Hi! Sorry to hear you're running into issues. Can you give us a little bit more background on what exactly you're trying to do? When you're calling
-
This question is really multi-part, and the second part depends on the first. I am using v3.4.4 and training 4 NER pipelines and 3 textcats. My custom code is provided during training by running spacy train with a config file and passing --code functions.py. I was initially getting an error that one of the functions was missing when I performed a spacy.load(); that was from a model that had been trained under 3.3. Now there are no errors when I load the models. The custom tokenizer, hooked in via [initialize.before_init], is pretty simple, but it is used by every pipeline. I also have a custom scorer.
Question 1: When I perform a spacy.load() of a trained model, are the various suffixes and prefixes I have specified already built into the model, so that the custom code is no longer required? The package command will include it, so am I missing something by using it "as-is"?
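For context, a minimal sketch of the kind of setup described above, assuming the prefixes/suffixes are added to the default tokenizer through a callback registered for [initialize.before_init]; the regex strings and file layout here are illustrative, not from the original post:

```python
# functions.py -- passed to training via:
#   python -m spacy train config.cfg --code functions.py
import spacy
from spacy.util import compile_prefix_regex, compile_suffix_regex


@spacy.registry.callbacks("customize_tokenizer")
def make_customize_tokenizer():
    def customize_tokenizer(nlp):
        # Hypothetical extra prefixes/suffixes layered on top of the defaults.
        prefixes = list(nlp.Defaults.prefixes) + [r"\$\$"]
        suffixes = list(nlp.Defaults.suffixes) + [r"\+\+"]
        nlp.tokenizer.prefix_search = compile_prefix_regex(prefixes).search
        nlp.tokenizer.suffix_search = compile_suffix_regex(suffixes).search

    return customize_tokenizer


# Referenced from the training config:
#
# [initialize.before_init]
# @callbacks = "customize_tokenizer"
```

The same functions.py could also hold the custom scorer registration, so a single --code file covers all of the pipelines.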
-
I have several NER models with trained vectors and several textcat models, all using the same custom tokenizer and scorer. As I am migrating from 2.3, I was disappointed but not surprised that spacy.load(disk location) did not work and that I would need to package the models in order to include the custom components.
Is there a best practice to avoid duplication and combine these multiple models into one package? Is there any way to avoid packaging altogether, since my code will never be in the public space and the interim step of packaging is a little burdensome?
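One pattern that may help here, sketched under the assumption that the shared registrations live in a single importable module (the paths and module name below are hypothetical): as long as the module that registers the tokenizer callback and scorer has been imported, spacy.load() also accepts a plain directory path, so the trained pipelines can be loaded straight from the training output without building packages.

```python
# load_models.py -- a minimal sketch; paths and module name are illustrative.
import functions  # noqa: F401  # importing runs the @spacy.registry decorators
import spacy

# spacy.load() accepts a directory path as well as an installed package name,
# so unpackaged training output can be loaded directly once the registered
# functions are importable.
ner_a = spacy.load("training/ner_a/model-best")
textcat_a = spacy.load("training/textcat_a/model-best")
```

If packaging is still preferred, spacy package also takes a --code option to copy the same functions.py into each built package, which at least avoids maintaining per-model copies of the custom code.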