Release 0.15.1

@alanakbik released this 05 Feb 14:46 · 5325bee

This release fixes compatibility bugs with the newest PyTorch and SciPy versions, and adds a number of small improvements and new features.

Improvements and new features

  • SegtokTokenizer: Add option to customize SegtokTokenizer, by @alanakbik in #3592
  • RegexpTagger: Add option to define matching groups to RegexpTagger, by @alanakbik in #3598
  • RelationClassifier: Optimize RelationClassifier by adding the option to filter long sentences and truncate context, by @alanakbik in #3593
  • RelationClassifier: Modify printouts in RelationClassifier evaluation to remove clutter, by @alanakbik in #3591
  • Add sentence labeler, by @MattGPT-ai in #3570
  • Add a Deep Nearest Class Means Classifier model to Flair, by @sheldon-roberts in #3532
  • Add per-task metrics, by @ntravis22 in #3605
  • Add options to load full documents as Sentence objects, by @alanakbik in #3595

New Model: Deep Nearest Class Means Classifier (#3532)

Adds a new Nearest Class Mean (NCM) classification approach to Flair, which assigns each data point to the class whose mean (computed over that class's data points) is closest. This approach can be used as an alternative to fitting a softmax classifier. It is now available for any class in Flair that implements DefaultClassifier. For instance, to train a TextClassifier with a DeepNCM decoder, you can use the following code:

from flair.data import Corpus
from flair.datasets import TREC_50
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.nn import DeepNCMDecoder
from flair.trainers import ModelTrainer
from flair.trainers.plugins import DeepNCMPlugin

# load the TREC dataset
corpus: Corpus = TREC_50()

label_type = "class"

# make a transformer document embedding
document_embeddings = TransformerDocumentEmbeddings("distilbert-base-uncased", fine_tune=True)

# create the label_dictionary
label_dictionary = corpus.make_label_dictionary(label_type=label_type)

# create a text classifier with a special DeepNCM decoder
classifier = TextClassifier(
    document_embeddings,
    label_type=label_type,
    label_dictionary=label_dictionary,
    decoder=DeepNCMDecoder(
        mean_update_method="condensation",
        embeddings_size=document_embeddings.embedding_length,
        label_dictionary=label_dictionary,
    ),
)

# initialize the trainer
trainer = ModelTrainer(classifier, corpus)

# train the model using the DeepNCM plugin
trainer.fine_tune(
    "resources/taggers/deepncm_baseline",
    plugins=[DeepNCMPlugin()],
)

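To illustrate the nearest-class-mean idea on its own, here is a minimal pure-Python sketch. It is not Flair's implementation: the helper names (class_means, squared_distance, predict), the toy 2-D points, and the choice of squared Euclidean distance are assumptions made for illustration, and it leaves out how the means are updated during training (for example the "condensation" setting used above).

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def class_means(points: Sequence[Vector], labels: Sequence[str]) -> Dict[str, List[float]]:
    """Return the per-class mean vector of the given labeled points."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for point, label in zip(points, labels):
        if label not in sums:
            sums[label] = [0.0] * len(point)
        # accumulate a running coordinate-wise sum for this class
        sums[label] = [s + x for s, x in zip(sums[label], point)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(point: Vector, means: Dict[str, List[float]]) -> str:
    """Assign the point to the class whose mean is closest."""
    return min(means, key=lambda label: squared_distance(point, means[label]))


# hypothetical toy data: two classes in a 2-D space
train_points = [(0.0, 0.0), (0.0, 2.0), (4.0, 4.0), (6.0, 4.0)]
train_labels = ["A", "A", "B", "B"]

means = class_means(train_points, train_labels)
print(means)                       # {'A': [0.0, 1.0], 'B': [5.0, 4.0]}
print(predict((1.0, 1.0), means))  # A
print(predict((4.0, 3.0), means))  # B
```

In the Flair example above, the vectors being compared would be the learned document embeddings rather than fixed 2-D points, so the class means change as the encoder is fine-tuned.
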
Contributed by @sheldon-roberts in #3532

Datasets

Bug Fixes

  • Fix model loading for compatibility with PyTorch 2.6, by @helpmefindaname in #3608
  • Fix SciPy compatibility by updating scipy .A to toarray(), by @sg-wbi in #3606
  • Fix: use proper default main evaluation metrics for text regression model, by @MattGPT-ai in #3602
  • Fix: cast indices tensor to int, by @MattGPT-ai in #3601

New Contributors

Full Changelog: v0.15.0...v0.15.1