How to use different Transformers models in spaCy #10327
polm started this conversation in Help: Best practices
Using `spacy-transformers`, many Hugging Face models can be loaded and used in spaCy. However, it's important to understand how they're used, and their limitations, before you try to incorporate a model.

Models from the Hugging Face Hub can be specified by name and loaded into a `transformer` component to be used as a source of features, similar to a `tok2vec` component. This has a number of implications.

First, task-specific heads are not supported for training within spaCy. This means that if you load an NER model from Hugging Face, you can't use it directly for NER with `spacy-transformers`. This isn't supported because there's too much variation in how task-specific heads are implemented. That doesn't mean you can't use these models, though: you can use the wrappers in `spacy-huggingface-pipelines`, or write your own custom component to wrap them and get their predictions. The downside is that such components won't be trainable within spaCy, and serialization also won't be handled automatically.

Changing the base model requires retraining. Keep in mind that any components that use a Transformer for features, like NER, textcat, or other components, rely on mutually learned representations. If you change the base model in a `transformer` component, you therefore have to retrain your pipeline. If you don't, downstream components will receive embeddings completely different from what they expect, and you'll get nonsense results.
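To illustrate the custom-component route, here is a minimal sketch of wrapping an external NER predictor as a spaCy pipeline component. The `fake_hf_ner` function is a stand-in so the sketch runs without downloading anything; in practice you would replace it with a call to a Hugging Face `transformers` token-classification pipeline. The component name and all prediction details here are illustrative, not part of any library:

```python
import spacy
from spacy.language import Language

# Stand-in for a Hugging Face token-classification pipeline. In a real
# wrapper you'd call transformers.pipeline("token-classification", ...)
# and adapt its output to this (start, end, label) shape.
def fake_hf_ner(text):
    start = text.find("Berlin")
    if start == -1:
        return []
    return [{"start": start, "end": start + len("Berlin"), "entity_group": "LOC"}]

@Language.component("hf_ner_wrapper")
def hf_ner_wrapper(doc):
    spans = []
    for pred in fake_hf_ner(doc.text):
        # char_span returns None if the prediction doesn't align with
        # spaCy's token boundaries, so check before keeping it.
        span = doc.char_span(pred["start"], pred["end"], label=pred["entity_group"])
        if span is not None:
            spans.append(span)
    doc.ents = spans
    return doc

nlp = spacy.blank("en")
nlp.add_pipe("hf_ner_wrapper")
doc = nlp("I live in Berlin.")
print([(ent.text, ent.label_) for ent in doc.ents])  # → [('Berlin', 'LOC')]
```

As noted above, a component like this only copies predictions over: it isn't trainable within spaCy, and you'd have to handle loading and saving the wrapped model yourself.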
To be perfectly clear, this also means that you cannot take a trained pipeline like `en_core_web_trf`, replace the `transformer` component, and get meaningful results.

Not all models are supported. `spacy-transformers` works by handling common conventions for models on the Hugging Face Hub, but there's no fixed standard, so some models may simply not work. In rare cases we've seen models that don't give an error but also don't give meaningful results. If you're not sure whether your model is working, try replacing it with `roberta-base` and see if you're able to train a model that way; if so, it may be a model compatibility issue.

OK, with that out of the way, here's how you specify a different model in a config when training a model from scratch:
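A minimal config fragment looks like the following, using `bert-base-cased` as an illustrative model name; any model name from the Hugging Face Hub can go in `name`. Depending on your `spacy-transformers` version, the architecture version string may differ:

```ini
[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "bert-base-cased"
```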
That's it! You can also see the docs for more details. Have fun with Transformers, and if you make something cool, remember to share it on the Show & Tell board.