v3.2.1 - Patch CLIP loading, small ONNX fix, compatibility with other libraries
This patch release fixes several small bugs, including issues with loading CLIP models and with automatic model card generation, and it ensures compatibility with third-party libraries.
Install this version with
# Training + Inference
pip install sentence-transformers[train]==3.2.1
# Inference only, use one of:
pip install sentence-transformers==3.2.1
pip install sentence-transformers[onnx-gpu]==3.2.1
pip install sentence-transformers[onnx]==3.2.1
pip install sentence-transformers[openvino]==3.2.1
Fix loading of non-Transformer models
In v3.2.0, a non-Transformer based model (e.g. CLIP) would not load correctly if the model was saved in the root of the model repository/directory. This has been resolved in #3007.
Throw error if StaticEmbedding-based model is finetuned with incompatible losses
The following losses are not compatible with StaticEmbedding-based models:
- CachedGISTEmbedLoss
- CachedMultipleNegativesRankingLoss
- CachedMultipleNegativesSymmetricRankingLoss
- DenoisingAutoEncoderLoss
- GISTEmbedLoss
An error is now thrown when one of these is used with a StaticEmbedding-based model. I recommend using MultipleNegativesRankingLoss to finetune these models, e.g. as in https://huggingface.co/tomaarsen/static-bert-uncased-gooaq.
Note: to get good performance, you must use much higher learning rates than usual. In my experiments, 2e-1 worked well.
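To illustrate the kind of guard described above, here is a minimal stand-alone sketch. It is not the library's implementation: the names INCOMPATIBLE_LOSSES and check_loss_compatibility are hypothetical and exist only for this example, and the check works on loss class names rather than real loss objects. Only the list of incompatible losses is taken from these release notes.

```python
# Illustrative sketch only; NOT how sentence-transformers implements the check.
# INCOMPATIBLE_LOSSES and check_loss_compatibility are hypothetical names.
INCOMPATIBLE_LOSSES = frozenset(
    {
        "CachedGISTEmbedLoss",
        "CachedMultipleNegativesRankingLoss",
        "CachedMultipleNegativesSymmetricRankingLoss",
        "DenoisingAutoEncoderLoss",
        "GISTEmbedLoss",
    }
)


def check_loss_compatibility(loss_name: str, uses_static_embedding: bool) -> None:
    """Raise ValueError if a StaticEmbedding-based model is paired with an incompatible loss."""
    if uses_static_embedding and loss_name in INCOMPATIBLE_LOSSES:
        raise ValueError(
            f"{loss_name} is not compatible with StaticEmbedding-based models; "
            "consider MultipleNegativesRankingLoss instead."
        )


# A compatible loss passes silently.
check_loss_compatibility("MultipleNegativesRankingLoss", uses_static_embedding=True)

# An incompatible loss raises an error up front, before any training starts.
try:
    check_loss_compatibility("GISTEmbedLoss", uses_static_embedding=True)
except ValueError as exc:
    print(exc)
```

Failing early like this means a misconfigured training run is rejected when it is set up, rather than crashing or silently misbehaving partway through training.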
Patch ONNX model when the model uses output_hidden_states
For example, this script used to fail, but passes now:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer(
    "distiluse-base-multilingual-cased",
    backend="onnx",
    model_kwargs={"provider": "CPUExecutionProvider"},
)
sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
print(embeddings.shape)
All changes
- Bump optimum version by @echarlaix in #2984
- [docs] Update the training snippets for some losses that should use the v3 Trainer by @tomaarsen in #2987
- [enh] Throw error if StaticEmbedding-based model is trained with incompatible loss by @tomaarsen in #2990
- [fix] Fix semantic_search_usearch with 'binary' by @tomaarsen in #2989
- [enh] Add support for large_string in model card create by @yaohwang in #2999
- [model cards] Prevent crash on generating widgets if dataset column is empty by @tomaarsen in #2997
- [fix] Added model2vec import compatible with current and newer version by @Pringled in #2992
- Fix cache_dir issue with loading CLIPModel by @BoPeng in #3007
- [warn] Throw a warning if compute_metrics is set, as it's not used by @tomaarsen in #3002
- [fix] Prevent IndexError if output_hidden_states & ONNX by @tomaarsen in #3008
New Contributors
- @echarlaix made their first contribution in #2984
- @yaohwang made their first contribution in #2999
- @Pringled made their first contribution in #2992
- @BoPeng made their first contribution in #3007
Full Changelog: v3.2.0...v3.2.1