Best practice when packaging multiple models with shared custom components #12031
-
Hi! Sorry to hear you're running into issues. Can you give us a little bit more background on what exactly you're trying to do? When you're calling
-
This question is really multi-part, and the second part depends on the first. I am using v3.4.4 and training 4 NER pipelines and 3 textcats. My custom code is provided during training by running spacy train with a config file and passing --code functions.py. I was initially getting an error that one of the functions was missing when I performed a spacy.load(); that was from a model that had been trained under 3.3. Now there are no errors when I load the models. The custom tokenizer, hooked in via [initialize.before_init], is pretty simple, but it is used by every pipeline. I also have a custom scorer.
Question 1: When I perform a spacy.load() of a trained model, are the various suffixes and prefixes I have specified already built into the model, so that the custom code is no longer required? The package command will include it, so am I missing something by using it "as-is"?
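For context, a minimal sketch of the kind of setup described above, assuming the prefixes/suffixes are added to the default tokenizer through a callback registered for [initialize.before_init]; the regex strings and file layout here are illustrative, not from the original post:

```python
# functions.py -- passed to training via:
#   python -m spacy train config.cfg --code functions.py
import spacy
from spacy.util import compile_prefix_regex, compile_suffix_regex


@spacy.registry.callbacks("customize_tokenizer")
def make_customize_tokenizer():
    def customize_tokenizer(nlp):
        # Hypothetical extra prefixes/suffixes layered on top of the defaults.
        prefixes = list(nlp.Defaults.prefixes) + [r"\$\$"]
        suffixes = list(nlp.Defaults.suffixes) + [r"\+\+"]
        nlp.tokenizer.prefix_search = compile_prefix_regex(prefixes).search
        nlp.tokenizer.suffix_search = compile_suffix_regex(suffixes).search

    return customize_tokenizer


# Referenced from the training config:
#
# [initialize.before_init]
# @callbacks = "customize_tokenizer"
```

The same functions.py could also hold the custom scorer registration, so a single --code file covers all of the pipelines.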
-
I have several NER models with trained vectors and several textcat models, all using the same custom tokenizer and scorer. As I am migrating from 2.3, I was disappointed but not surprised that spacy.load(disk location) did not work and that I would need to package the models in order to include the custom components.
Is there a best practice to avoid duplication and combine these multiple models into one package? Is there any way to avoid packaging altogether, since my code will never be in the public space and the interim step of packaging is a little burdensome?
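One pattern that may help here, sketched under the assumption that the shared registrations live in a single importable module (the paths and module name below are hypothetical): as long as the module that registers the tokenizer callback and scorer has been imported, spacy.load() also accepts a plain directory path, so the trained pipelines can be loaded straight from the training output without building packages.

```python
# load_models.py -- a minimal sketch; paths and module name are illustrative.
import functions  # noqa: F401  # importing runs the @spacy.registry decorators
import spacy

# spacy.load() accepts a directory path as well as an installed package name,
# so unpackaged training output can be loaded directly once the registered
# functions are importable.
ner_a = spacy.load("training/ner_a/model-best")
textcat_a = spacy.load("training/textcat_a/model-best")
```

If packaging is still preferred, spacy package also takes a --code option to copy the same functions.py into each built package, which at least avoids maintaining per-model copies of the custom code.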